======================================================
Introduction to IL Academic Compute Environment (ACE)
======================================================

Overview
========

The IL Academic Compute Environment (ACE) provides an environment for supporting and advancing academic research in a variety of areas. It currently consists of Intel® Xeon® based servers and `FPGAs`_.

Cluster Architecture
====================

Compute Nodes
-------------

Each compute node has access to shared NFS-based filesystems. The Cascade Lake based nodes are connected to each other by the Intel Omni-Path (Intel OPA) 100 Series interconnect. All nodes are connected via 10Gb Ethernet.

Access Nodes
------------

Upon login, users are placed onto an access node, by default in their $HOME directory. This access node has the same compilers and tools installed as each of the compute nodes. You are free to do compilations and text editing from this node, but please keep compute-intensive tasks confined to the compute nodes. If you require direct access to a compute node for compilation/prototyping purposes, you can run the following to get an interactive session on a compute node::

    qsub -V -I

How to request an account
=========================

You'll need an account to be able to access the IL Academic Compute Environment. To request an account, please fill out the form `here`_. After filling out and submitting the registration, you'll receive an email within 48 hours on business days. This email will give you further instructions on how to reach the nodes through the SSH gateway. You will also receive a second email from Duo with instructions on how to set up the two-factor authentication required to log in to the nodes.

Account lifetime
================

Your account is active for 3 months at a time. Before the 3-month period is up, you will receive an email asking if you'd like to extend your account. If you are still actively using the cluster, follow the directions listed in the email. If you are no longer using the cluster, you can ignore the message. If we have not heard from you before the expiration date sent in the email, your account will be deactivated. In most circumstances, the data associated with your account will be retained for 30 days after your account expires. After that time your data is subject to deletion.

How to access the nodes
=======================

- Once you have been given access to the IL Academic Compute Environment, you will be able to reach the cluster by first ssh'ing into the access node.
- As part of the account creation process, your public key will be installed in your home directory. Ensure that your ssh client is using the proper corresponding private key and connect to ssh-iam.intel-research.net using the username provided (a sketch of the connection command follows this list).
- Once you've successfully connected via ssh to the access node, you will be in your home directory on the access node.
- From here you can push data to the shared storage located at /tier2/ or submit a job to be run on the cluster through the PBS scheduler. These procedures are described in the following sections.
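As a concrete illustration, a minimal connection command might look like the sketch below. The username ``labuser`` and the key path ``~/.ssh/ace_key`` are placeholders; substitute the username and key from your account-creation email::

    # Connect to the SSH gateway / access node with an explicit private key
    # (username and key path are illustrative placeholders)
    ssh -i ~/.ssh/ace_key labuser@ssh-iam.intel-research.net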
Data Storage in the Cluster
===========================

The cluster has multiple areas in which data resides. Your personal files will live in your home directory, but we also have school directories as a shared collaboration space and large-file storage area. The tier1 school directories are backed by SSDs and are the best place to store smaller files that you need fast access to. The tier2 school directories are backed by HDDs and are where you should store large files.

+-------------------------+---------------+
| Directory               | Storage Limit |
+=========================+===============+
| /homes/                 | 20GB          |
+-------------------------+---------------+
| /tier1/                 | 300GB         |
+-------------------------+---------------+
| /tier2/                 | 2TB+          |
+-------------------------+---------------+
| /export/shared          | N/A           |
+-------------------------+---------------+
| /export/software        | N/A           |
+-------------------------+---------------+
| /export/                | deprecated    |
+-------------------------+---------------+
| /export//lustre         | deprecated    |
+-------------------------+---------------+

Any data pushed to any of these locations should be accessible from any of the nodes on the respective cluster. Please use your preferred SCP or SFTP program for pushing datasets to this location.

**NOTE:** No data in this cluster is backed up. It is up to you to ensure that your scripts, datasets, and results are stored outside of the cluster in case of failure.

How to access additional compilers and libraries
================================================

Multiple compilers and libraries are available on the cluster via the ``module`` command. To see the available compilers, use the ``module avail`` command::

    $ module avail

    -------------------------- /opt/ohpc/pub/modulefiles --------------------------
       cmake/3.13.4    gnu8/8.3.0    intel/18.0.0.128    intel/19.0.3.199 (D)    prun/1.3

      Where:
       D:  Default Module

    Use "module spider" to find all possible modules.
    Use "module keyword key1 key2 ..." to search for all possible modules matching
    any of the "keys".

To load a specific compiler version, use the ``module load`` command::

    $ module load intel-oneapi
    $ icc -v
    icc version 2021.4.0 (gcc version 4.8.5 compatibility)

Note that after a compiler family is loaded, you will then have access to various MPI libraries. After you've loaded the compiler family, use ``module avail`` again to find out what MPI libraries are available::

    $ module load gnu8
    $ module avail

    ----------------------- /opt/ohpc/pub/moduledeps/gnu8 -------------------------
       hdf5/1.10.5     metis/5.1.0     mpich/3.3.1     mvapich2/2.3.2
       openblas/0.3.7  openmpi3/3.1.4  superlu/5.2.1

    -------------------------- /opt/ohpc/pub/modulefiles --------------------------
       cmake/3.15.4             gnu8/8.3.0 (L)      intel-oneapi/2021.3   intel/19.0.3.199
       intel/2020.1             intel/2020.4 (D)    opam/2.0.7            opencv/3.2.0
       openmpi1                 prun/1.3            gnu4/4.5.2            inaccel/default
       intel-oneapi/2021.4 (D)  intel/19.0.5.281    intel/2020.2          llvm5/5.0.1
       opencv/latest            opencv/4.4.0 (D)    openvino/2021.1       singularity/3.4.1

      Where:
       D:  Default Module
       L:  Module is loaded

    Use "module spider" to find all possible modules.
    Use "module keyword key1 key2 ..." to search for all possible modules matching
    any of the "keys".

Loading an MPI library in turn exposes the packages built against it::

    $ module load openmpi3
    $ module avail

    ------------------ /opt/ohpc/pub/moduledeps/gnu8-openmpi3 ---------------------
       boost/1.71.0    hypre/2.18.1     mumps/5.2.1      opencoarrays/2.8.0
       phdf5/1.10.5    scalapack/2.0.2  superlu_dist/6.1.1
       fftw/3.3.8      mfem/4.0         netcdf/4.7.1     petsc/3.12.0
       ptscotch/6.0.6  slepc/3.12.0     trilinos/12.14.1

    ----------------------- /opt/ohpc/pub/moduledeps/gnu8 -------------------------
       hdf5/1.10.5     metis/5.1.0         mpich/3.3.1     mvapich2/2.3.2
       openblas/0.3.7  openmpi3/3.1.4 (L)  superlu/5.2.1

    -------------------------- /opt/ohpc/pub/modulefiles --------------------------
       cmake/3.15.4             gnu8/8.3.0 (L)      intel-oneapi/2021.3   intel/19.0.3.199
       intel/2020.1             intel/2020.4 (D)    opam/2.0.7            opencv/3.2.0
       openmpi1                 prun/1.3            gnu4/4.5.2            inaccel/default
       intel-oneapi/2021.4 (D)  intel/19.0.5.281    intel/2020.2          llvm5/5.0.1
       opencv/latest            opencv/4.4.0 (D)    openvino/2021.1       singularity/3.4.1

      Where:
       D:  Default Module
       L:  Module is loaded

    Use "module spider" to find all possible modules.
    Use "module keyword key1 key2 ..." to search for all possible modules matching
    any of the "keys".
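To confirm that a compiler/MPI pairing works end to end, the following sketch compiles and runs a trivial MPI program. The source file name ``hello_mpi.c`` is illustrative, and the run is assumed to happen on a compute node (for example inside an interactive ``qsub -V -I`` session) rather than on the access node::

    # Load a compiler family and an MPI stack shown in the listings above
    $ module load gnu8 openmpi3

    # Compile an MPI source file (hello_mpi.c is a placeholder name)
    $ mpicc hello_mpi.c -o hello_mpi

    # Small smoke test with 4 ranks on the current node
    $ mpirun -np 4 ./hello_mpi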
For more information see the :doc:`OpenHPC` page.

How to submit a task
====================

The IL Academic Compute Environment utilizes the PBS scheduler. Jobs submitted to the cluster will run on one or more of the available nodes. Ensure that your data exists in the shared storage location (/tier1/ or /tier2/) so that it is available to the node that ends up processing your job.

To submit your task, first add the following to your ~/.bashrc and 'source' it so that you have access to the PBS commands::

    export PATH=$PATH:/opt/pbs/default/bin

After this is complete, you can verify that your environment is set up properly by running ``qstat -B``, which should give output like that listed below::

    qstat -B
    Server           Max   Tot   Que   Run   Hld   Wat   Trn   Ext   Status
    ---------------- ----- ----- ----- ----- ----- ----- ----- ----- -----------
    iam-pbs              0     3     0     1     0     0     0     2  Active

- The default queue is xeon. If you submit a job and don't specify the queue, your job will be submitted to the xeon queue automatically. There is a limit of 10 Xeon nodes per user. If you require more nodes, you can reach out to us and we will try to accommodate your request.
- To submit a job to a non-default queue, the queue must be specified in the qsub command::

      qsub -q <queue_name>
      qsub -q <queue_name> -I

- To request a single xeon node (default queue)::

      qsub -lselect=1:ncpus=160

To submit your first job, use the qsub command. This job's output will be sent to ~/shared.txt::

    qsub ./HelloWorld.sh
    68.iam-pbs

(where 68 is a unique job id)

HelloWorld.sh is a simple script which sleeps for 30 seconds on a node and writes the directory listing of /export/software to your home directory at ~/shared.txt::

    #PBS -l select=1:ncpus=160 -lplace=excl
    # Report which node the job landed on, then pause briefly
    echo "Sleeping for 30 seconds on `cat $PBS_NODEFILE`" >> ~/shared.txt
    sleep 30
    # Write the directory listing of /export/software to ~/shared.txt
    echo "Writing directory listing of /export/software to ~/shared.txt" >> ~/shared.txt
    ls -l /export/software >> ~/shared.txt
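If you want more control over how the job appears in ``qstat`` and where its log files go, standard PBS submission options can be added on the ``qsub`` line. A minimal sketch, assuming the ``HelloWorld.sh`` script above; the job name and log paths are illustrative::

    # Name the job and pick explicit stdout/stderr locations
    # (without -o/-e, PBS writes <name>.o<jobid> and <name>.e<jobid>
    #  files in the directory the job was submitted from)
    qsub -N hello_test -o ~/hello_test.out -e ~/hello_test.err ./HelloWorld.sh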
To view the status of your running jobs, use the qstat command::

    -bash-4.2$ qstat
    Job id            Name             User             Time Use S Queue
    ----------------  ---------------- ---------------- -------- - -----
    67.iam-pbs        64_node_pbs.sh   labuser          5512:16: R xeon
    69.iam-pbs        HelloWorld.sh    labuser          00:00:00 E xeon

To stop a task prior to its completion, use qdel with the job id::

    -bash-4.2$ qdel 67.iam-pbs

Requesting support
==================

If you have issues connecting or running jobs in the cluster, feel free to reach out to our support team by emailing ace@intel-research.net. In this email, please be as descriptive as possible. Include any relevant job numbers, machines used, etc.

Important Takeaways
===================

- The first line of your script defines how many nodes you want to run on and whether or not you want your job to have exclusive access to those nodes while running.

  - ``select=1`` – increase this value to run your job on more nodes (see the sketch at the end of this section).
  - ``-lplace=excl`` – exclude this if you don't require exclusive access to the nodes your job is scheduled on for the duration of your run.

- Make sure any script you put together is executable (``chmod 0755 ./HelloWorld.sh``).
- When submitting jobs, ensure that you include the proper path to the script. For example, if you have a "HelloWorld.sh" in your home directory, do this::

      qsub ./HelloWorld.sh

  Not this::

      qsub HelloWorld.sh
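For example, a job script header that requests two nodes and drops the exclusive-placement request might start like the following sketch. The node and CPU counts are illustrative and reuse the values from the HelloWorld example; adjust them to your workload, keeping in mind the 10-node limit mentioned above::

    #PBS -l select=2:ncpus=160
    #PBS -q xeon
    # $PBS_NODEFILE lists the node(s) assigned to this job
    cat $PBS_NODEFILE >> ~/shared.txt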