Introduction to IL Academic Compute Environment (ACE)

Overview

The IL Academic Compute Environment (ACE) provides compute resources to support and advance academic research in a variety of areas. It currently consists of Intel® Xeon® based servers and FPGAs.

Cluster Architecture

Compute Nodes

Each compute node has access to shared NFS-based filesystems. The Cascade Lake-based nodes are connected to each other by the Intel Omni-Path (Intel OPA) 100 Series interconnect. All nodes are connected via 10Gb Ethernet.

Access Nodes

Upon login, users are placed onto an access node, by default in their $HOME directory. This access node has the same compilers and tools installed as each of the compute nodes. You are free to do compilations and text editing from this node, but please keep compute-intensive tasks confined to the compute nodes. If you require direct access to a compute node for compilation or prototyping purposes, you can run the following to get an interactive session on a compute node:

qsub -V -I

How to request an account

You’ll need an account to access the IL Academic Compute Environment. To request an account, please fill out the form here.

After filling out and submitting the registration form, you’ll receive an email within 48 hours during business days. This email will give you further instructions on how to reach the nodes through the SSH gateway. You will also receive a second email from Duo with instructions on how to set up the two-factor authentication required to log in to the nodes.

Account lifetime

Your account is active for 3 months at a time. Before the 3-month period is up, you will receive an email asking if you’d like to extend your account. If you are still actively using the cluster, follow the directions listed in the email. If you are no longer using the cluster, you can ignore the message. If we have not heard from you before the expiration date sent in the email, your account will be deactivated.

In most circumstances, the data associated with your account will be retained for 30 days after your account expires. After that time, your data is subject to deletion.

How to access the nodes

  • Once you have been given access to the IL Academic Compute Environment, you will be able to reach the cluster by first ssh’ing into the access node.

  • As part of the account creation process your public key will be installed in your home directory. Ensure that your ssh client is using the proper corresponding private key and connect to ssh-iam.intel-research.net using the username provided.

  • Once you’ve successfully connected via ssh to the access node you will be in your home directory on the access node.

  • From here you can push data to the shared storage located at /tier2/<university name> or submit a job to be run on the cluster through the PBS scheduler. These procedures are described in the following sections, and a minimal connection example is sketched below.
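For reference, a connection from a typical Linux or macOS client might look like the sketch below; the private-key path and <username> are placeholders for the values set up during account creation:

    # Connect to the access node using the private key that matches the installed public key
    # (the key path and <username> below are placeholders)
    ssh -i ~/.ssh/<your-private-key> <username>@ssh-iam.intel-research.net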

Data Storage in the Cluster

The cluster has multiple areas in which data resides. Your personal files live in your home directory, but we also provide school directories as a shared collaboration space and large-file storage area. The tier1 school directories are backed by SSDs and are the best place to store smaller files that you need fast access to. The tier2 school directories are backed by HDDs and are where you should store large files.

Directory                  Storage Limit
/homes/<user>              20GB
/tier1/<school>            300GB
/tier2/<school>            2TB+
/export/shared             NA
/export/software           NA
/export/<school>           deprecated
/export/<school>/lustre    deprecated

Any data pushed to any of these should be accessible from any of the nodes on the respective cluster. Please use your preferred SCP or SFTP program for pushing datasets to this location.
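For example, a dataset can be pushed from your own machine through the access node with a standard SCP client; the sketch below assumes a local directory named my_dataset and uses placeholder key, username, and <school> values:

    # Copy a local dataset into the tier2 school share (placeholders throughout)
    scp -r -i ~/.ssh/<your-private-key> ./my_dataset <username>@ssh-iam.intel-research.net:/tier2/<school>/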

NOTE: No data in this cluster is backed up. It is up to you to ensure that your scripts, datasets, and results are stored outside of the cluster in case of failure.

How to access additional compilers and libraries

Multiple compilers and libraries are available on the cluster via the module command. To see the available compilers, use the module avail command.

  $ module avail

-------------------------- /opt/ohpc/pub/modulefiles --------------------------
   cmake/3.13.4    gnu8/8.3.0    intel/18.0.0.128    intel/19.0.3.199 (D)    prun/1.3

  Where:
   D:  Default Module

Use "module spider" to find all possible modules.
Use "module keyword key1 key2 ..." to search for all possible modules matching any of the "keys".

To load a specific compiler version, use the module load command:

$ module load intel-oneapi
$ icc -v
icc version 2021.4.0 (gcc version 4.8.5 compatibility)

Note that after a compiler family is loaded, you will also have access to the MPI libraries built against it. After you’ve loaded the compiler family, use module avail again to find out what MPI libraries are available; a short compile-and-run sketch follows the listings below.

$ module load gnu8
$ module avail

------------------------------------------------------------------------------------- /opt/ohpc/pub/moduledeps/gnu8 --------------------------------------------------------------------------------------
   hdf5/1.10.5    metis/5.1.0    mpich/3.3.1    mvapich2/2.3.2    openblas/0.3.7    openmpi3/3.1.4    superlu/5.2.1

--------------------------------------------------------------------------------------- /opt/ohpc/pub/modulefiles ----------------------------------------------------------------------------------------
   cmake/3.15.4    gnu8/8.3.0      (L)    intel-oneapi/2021.3        intel/19.0.3.199    intel/2020.1    intel/2020.4 (D)    opam/2.0.7       opencv/3.2.0        openmpi1           prun/1.3
   gnu4/4.5.2      inaccel/default        intel-oneapi/2021.4 (D)    intel/19.0.5.281    intel/2020.2    llvm5/5.0.1         opencv/latest    opencv/4.4.0 (D)    openvino/2021.1    singularity/3.4.1

Where:
   D:  Default Module
   L:  Module is loaded

Use "module spider" to find all possible modules.
Use "module keyword key1 key2 ..." to search for all possible modules matching any of the "keys".


$ module load openmpi3
$ module avail

--------------------------------------------------------------------------------- /opt/ohpc/pub/moduledeps/gnu8-openmpi3 ---------------------------------------------------------------------------------
   boost/1.71.0    hypre/2.18.1    mumps/5.2.1     opencoarrays/2.8.0    phdf5/1.10.5      scalapack/2.0.2    superlu_dist/6.1.1
   fftw/3.3.8      mfem/4.0        netcdf/4.7.1    petsc/3.12.0          ptscotch/6.0.6    slepc/3.12.0       trilinos/12.14.1

------------------------------------------------------------------------------------- /opt/ohpc/pub/moduledeps/gnu8 --------------------------------------------------------------------------------------
   hdf5/1.10.5    metis/5.1.0    mpich/3.3.1    mvapich2/2.3.2    openblas/0.3.7    openmpi3/3.1.4 (L)    superlu/5.2.1

--------------------------------------------------------------------------------------- /opt/ohpc/pub/modulefiles ----------------------------------------------------------------------------------------
   cmake/3.15.4    gnu8/8.3.0      (L)    intel-oneapi/2021.3        intel/19.0.3.199    intel/2020.1    intel/2020.4 (D)    opam/2.0.7       opencv/3.2.0        openmpi1           prun/1.3
   gnu4/4.5.2      inaccel/default        intel-oneapi/2021.4 (D)    intel/19.0.5.281    intel/2020.2    llvm5/5.0.1         opencv/latest    opencv/4.4.0 (D)    openvino/2021.1    singularity/3.4.1

Where:
   D:  Default Module
   L:  Module is loaded

Use "module spider" to find all possible modules.
Use "module keyword key1 key2 ..." to search for all possible modules matching any of the "keys".
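
As a quick end-to-end check of a loaded toolchain, a sketch like the following should compile and launch a small MPI program once gnu8 and openmpi3 are loaded; hello.c is a hypothetical source file and the rank count is only illustrative:

$ module load gnu8 openmpi3
$ mpicc -O2 hello.c -o hello      # OpenMPI's C compiler wrapper
$ mpirun -np 4 ./hello            # small local smoke test; use the PBS scheduler for real runs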

For more information see the OpenHPC - Developer Tooling page.

How to submit a task

The IL Academic Compute Environment utilizes the PBS scheduler. Jobs submitted to the cluster will run on one or more of the available nodes. Ensure that your data exists in the shared storage location (/tier1/<school> or /tier2/<school>) so that it is available to the node that ends up processing your job.

To submit your task, first add the following line to your ~/.bashrc and source it so that you have access to the PBS commands.

export PATH=$PATH:/opt/pbs/default/bin

After this is complete, you can verify that your environment is set up properly by running qstat -B, which should give the output listed below.

qstat -B

Server           Max   Tot   Que   Run   Hld   Wat   Trn   Ext   Status
---------------- ----- ----- ----- ----- ----- ----- ----- ----- -----------
iam-pbs          0     3     0     1     0     0     0     2     Active
  • The default queue is xeon. If you submit a job and don’t specify the queue, your job will be submitted to the xeon queue automatically. There is a limit of 10 Xeon nodes per user. If you require more nodes, you can reach out to us and we will try to accommodate your request.

  • To submit a job to a non-default queue, the queue must be specified in the qsub command:

    qsub -q <queue-name> <job-script>
    qsub -q <queue-name> -I
    
  • To request a single xeon node (default queue); a multi-node variant is sketched after this list:

    qsub -lselect=1:ncpus=160 <job-script>
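
The same select syntax scales to multiple nodes and can be combined with a queue name; for example, the sketch below requests two exclusive Xeon nodes (keep requests within the 10-node-per-user limit):

    qsub -q <queue-name> -lselect=2:ncpus=160 -lplace=excl <job-script>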
    

To submit your first job, use the qsub command. This job’s output will be written to ~/shared.txt.

qsub ./HelloWorld.sh
68.iam-pbs (where 68 is a unique job ID)

HelloWorld.sh is a simple script that sleeps for 30 seconds on a node and writes the directory listing of /export/software to shared.txt in your home directory.

#PBS -l select=1:ncpus=160 -lplace=excl
echo "Sleeping for 30 seconds on `cat $PBS_NODEFILE`" >> ~/shared.txt
sleep 30
echo "Writing directory listing of /export/software to ~/shared.txt" >> ~/shared.txt
ls -l /export/software >> ~/shared.txt

To view the status of your running jobs, use the qstat command:

-bash-4.2$ qstat

Job id           Name             User             Time Use S Queue
---------------- ---------------- ---------------- -------- - -----
67.iam-pbs       64_node_pbs.sh   labuser         5512:16: R xeon
69.iam-pbs       HelloWorld.sh    labuser         00:00:00 E xeon

To stop a task prior to its completion, use qdel with the job ID:

-bash-4.2$ qdel 67.iam-pbs

Requesting support

If you have issues connecting or running jobs in the cluster, feel free to reach out to our support team by emailing ace@intel-research.net. In this email, please be as descriptive as possible. Include any relevant job numbers, machines used, etc.

Important Takeaways

  • The first line of your script defines how many nodes you want to run on and whether or not you want your job to have exclusive access to the nodes while running.

    • select=1 – increase this value to increase the number of nodes your job runs on

    • lplace=excl – omit this if you don’t require exclusive access to the nodes your job is scheduled on for the duration of your run

  • Make sure any script you put together is executable (chmod 0755 ./HelloWorld.sh).

  • When submitting jobs, ensure that you include the proper path. For example, if you have a “HelloWorld.sh” script in your home directory, do this (a combined example follows this list):

    qsub ./HelloWorld.sh
    

    Not this.

    qsub HelloWorld.sh
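
Putting these takeaways together, a typical first submission from your home directory looks like the sketch below (using the HelloWorld.sh example from above):

    chmod 0755 ./HelloWorld.sh    # make the script executable
    qsub ./HelloWorld.sh          # submit with an explicit ./ path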