======================================================
Introduction to IL Academic Compute Environment (ACE)
======================================================

Overview
========

The IL Academic Compute Environment (ACE) provides an environment for supporting and advancing academic research in a variety of areas. It currently consists of Intel® Xeon® based servers and `FPGAs`_.

Cluster Architecture
====================

Compute Nodes
-------------

Each compute node has access to shared NFS-based filesystems. The Cascade Lake based nodes are connected to each other by the Intel Omni-Path (Intel OPA) 100 Series interconnect. All nodes are connected via 10Gb Ethernet.

Access Nodes
------------

Upon login, users are placed onto an access node, by default in their $HOME directory. This access node has the same compilers and tools installed as each of the compute nodes. You are free to do compilations and text editing from this node, but please keep compute-intensive tasks confined to the compute nodes. If you require direct access to a compute node for compilation/prototyping purposes, you can run the following to get an interactive session on a compute node::

    qsub -V -I

How to request an account
=========================

You'll need an account to be able to access the IL Academic Compute Environment. To request an account, please fill out the form `here`_. After filling out and submitting the registration, you'll receive an email within 48 hours on business days. This email will give you further instructions on how to reach the nodes through the SSH gateway. You will also receive a second email from Duo with instructions on how to set up the two-factor authentication required to log in to the nodes.

Account lifetime
================

Your account is active for 3 months at a time. Before the 3-month period is up, you will receive an email asking if you'd like to extend your account. If you are still actively using the cluster, follow the directions listed in the email. If you are no longer using the cluster, you can ignore the message. If we have not heard from you before the expiration date sent in the email, your account will be deactivated. In most circumstances, the data associated with your account will be retained for 30 days after your account expires. After that time your data is subject to deletion.

How to access the nodes
=======================

- Once you have been given access to the IL Academic Compute Environment, you will be able to reach the cluster by first ssh'ing into the access node.
- As part of the account creation process, your public key will be installed in your home directory. Ensure that your ssh client is using the proper corresponding private key and connect to ssh-iam.intel-research.net using the username provided (a sketch of the connection command follows this list).
- Once you've successfully connected via ssh to the access node, you will be in your home directory on the access node.
- From here you can push data to the shared storage located at /tier2/ or submit a job to be run on the cluster through the PBS scheduler. These procedures are described in the following sections.
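As a concrete illustration, a minimal connection command might look like the sketch below. The username ``labuser`` and the key path ``~/.ssh/ace_key`` are placeholders; substitute the username and key from your account-creation email::

    # Connect to the SSH gateway / access node with an explicit private key
    # (username and key path are illustrative placeholders)
    ssh -i ~/.ssh/ace_key labuser@ssh-iam.intel-research.net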
Data Storage in the Cluster
===========================

The cluster has multiple areas in which data resides. Your personal files will live in your home directory, but we also have school directories as a shared collaboration space and large-file storage area. The tier1 school directories are backed by SSDs and are the best place to store smaller files that you need fast access to. The tier2 school directories are backed by HDDs and are where you should store large files.

+-------------------------+---------------+
| Directory               | Storage Limit |
+=========================+===============+
| /homes/                 | 20GB          |
+-------------------------+---------------+
| /tier1/                 | 300GB         |
+-------------------------+---------------+
| /tier2/                 | 2TB+          |
+-------------------------+---------------+
| /export/shared          | N/A           |
+-------------------------+---------------+
| /export/software        | N/A           |
+-------------------------+---------------+
| /export/                | deprecated    |
+-------------------------+---------------+
| /export//lustre         | deprecated    |
+-------------------------+---------------+

Any data pushed to any of these locations should be accessible from any of the nodes on the respective cluster. Please use your preferred SCP or SFTP program for pushing datasets to this location.

**NOTE:** No data in this cluster is backed up. It is up to you to ensure that your scripts, datasets, and results are stored outside of the cluster in case of failure.

How to access additional compilers and libraries
================================================

Multiple compilers and libraries are available on the cluster via the ``module`` command. To see the available compilers, use the ``module avail`` command::

    $ module avail

    -------------------------- /opt/ohpc/pub/modulefiles --------------------------
       cmake/3.13.4    gnu8/8.3.0    intel/18.0.0.128    intel/19.0.3.199 (D)    prun/1.3

      Where:
       D:  Default Module

    Use "module spider" to find all possible modules.
    Use "module keyword key1 key2 ..." to search for all possible modules matching
    any of the "keys".

To load a specific compiler version, use the ``module load`` command::

    $ module load intel-oneapi
    $ icc -v
    icc version 2021.4.0 (gcc version 4.8.5 compatibility)

Note that after a compiler family is loaded, you will then have access to various MPI libraries. After you've loaded the compiler family, use ``module avail`` again to find out what MPI libraries are available::

    $ module load gnu8
    $ module avail

    ----------------------- /opt/ohpc/pub/moduledeps/gnu8 -------------------------
       hdf5/1.10.5     metis/5.1.0     mpich/3.3.1     mvapich2/2.3.2
       openblas/0.3.7  openmpi3/3.1.4  superlu/5.2.1

    -------------------------- /opt/ohpc/pub/modulefiles --------------------------
       cmake/3.15.4             gnu8/8.3.0 (L)      intel-oneapi/2021.3   intel/19.0.3.199
       intel/2020.1             intel/2020.4 (D)    opam/2.0.7            opencv/3.2.0
       openmpi1                 prun/1.3            gnu4/4.5.2            inaccel/default
       intel-oneapi/2021.4 (D)  intel/19.0.5.281    intel/2020.2          llvm5/5.0.1
       opencv/latest            opencv/4.4.0 (D)    openvino/2021.1       singularity/3.4.1

      Where:
       D:  Default Module
       L:  Module is loaded

    Use "module spider" to find all possible modules.
    Use "module keyword key1 key2 ..." to search for all possible modules matching
    any of the "keys".

Loading an MPI library in turn exposes the packages built against it::

    $ module load openmpi3
    $ module avail

    ------------------ /opt/ohpc/pub/moduledeps/gnu8-openmpi3 ---------------------
       boost/1.71.0    hypre/2.18.1     mumps/5.2.1      opencoarrays/2.8.0
       phdf5/1.10.5    scalapack/2.0.2  superlu_dist/6.1.1
       fftw/3.3.8      mfem/4.0         netcdf/4.7.1     petsc/3.12.0
       ptscotch/6.0.6  slepc/3.12.0     trilinos/12.14.1

    ----------------------- /opt/ohpc/pub/moduledeps/gnu8 -------------------------
       hdf5/1.10.5     metis/5.1.0         mpich/3.3.1     mvapich2/2.3.2
       openblas/0.3.7  openmpi3/3.1.4 (L)  superlu/5.2.1

    -------------------------- /opt/ohpc/pub/modulefiles --------------------------
       cmake/3.15.4             gnu8/8.3.0 (L)      intel-oneapi/2021.3   intel/19.0.3.199
       intel/2020.1             intel/2020.4 (D)    opam/2.0.7            opencv/3.2.0
       openmpi1                 prun/1.3            gnu4/4.5.2            inaccel/default
       intel-oneapi/2021.4 (D)  intel/19.0.5.281    intel/2020.2          llvm5/5.0.1
       opencv/latest            opencv/4.4.0 (D)    openvino/2021.1       singularity/3.4.1

      Where:
       D:  Default Module
       L:  Module is loaded

    Use "module spider" to find all possible modules.
    Use "module keyword key1 key2 ..." to search for all possible modules matching
    any of the "keys".
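To confirm that a compiler/MPI pairing works end to end, the following sketch compiles and runs a trivial MPI program. The source file name ``hello_mpi.c`` is illustrative, and the run is assumed to happen on a compute node (for example inside an interactive ``qsub -V -I`` session) rather than on the access node::

    # Load a compiler family and an MPI stack shown in the listings above
    $ module load gnu8 openmpi3

    # Compile an MPI source file (hello_mpi.c is a placeholder name)
    $ mpicc hello_mpi.c -o hello_mpi

    # Small smoke test with 4 ranks on the current node
    $ mpirun -np 4 ./hello_mpi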
For more information see the :doc:`OpenHPC` page.

How to submit a task
====================

The IL Academic Compute Environment utilizes the PBS scheduler. Jobs submitted to the cluster will run on one or more of the available nodes. Ensure that your data exists in the shared storage location (/tier1/ or /tier2/) so that it is available to the node that ends up processing your job.

To submit your task, first add the following to your ~/.bashrc and 'source' it so that you have access to the PBS commands::

    export PATH=$PATH:/opt/pbs/default/bin

After this is complete, you can verify that your environment is set up properly by running ``qstat -B``, which should give output like that listed below::

    qstat -B
    Server           Max   Tot   Que   Run   Hld   Wat   Trn   Ext   Status
    ---------------- ----- ----- ----- ----- ----- ----- ----- ----- -----------
    iam-pbs              0     3     0     1     0     0     0     2  Active

- The default queue is xeon. If you submit a job and don't specify the queue, your job will be submitted to the xeon queue automatically. There is a limit of 10 Xeon nodes per user. If you require more nodes, you can reach out to us and we will try to accommodate your request.
- To submit a job to a non-default queue, the queue must be specified in the qsub command::

      qsub -q <queue_name>
      qsub -q <queue_name> -I

- To request a single xeon node (default queue)::

      qsub -lselect=1:ncpus=160

To submit your first job, use the qsub command. This job's output will be sent to ~/shared.txt::

    qsub ./HelloWorld.sh
    68.iam-pbs

(where 68 is a unique job id)

HelloWorld.sh is a simple script which sleeps for 30 seconds on a node and writes the directory listing of /export/software to your home directory at ~/shared.txt::

    #PBS -l select=1:ncpus=160 -lplace=excl
    # Report which node the job landed on, then pause briefly
    echo "Sleeping for 30 seconds on `cat $PBS_NODEFILE`" >> ~/shared.txt
    sleep 30
    # Write the directory listing of /export/software to ~/shared.txt
    echo "Writing directory listing of /export/software to ~/shared.txt" >> ~/shared.txt
    ls -l /export/software >> ~/shared.txt
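If you want more control over how the job appears in ``qstat`` and where its log files go, standard PBS submission options can be added on the ``qsub`` line. A minimal sketch, assuming the ``HelloWorld.sh`` script above; the job name and log paths are illustrative::

    # Name the job and pick explicit stdout/stderr locations
    # (without -o/-e, PBS writes <name>.o<jobid> and <name>.e<jobid>
    #  files in the directory the job was submitted from)
    qsub -N hello_test -o ~/hello_test.out -e ~/hello_test.err ./HelloWorld.sh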
To view the status of your running jobs, use the qstat command::

    -bash-4.2$ qstat
    Job id            Name             User             Time Use S Queue
    ----------------  ---------------- ---------------- -------- - -----
    67.iam-pbs        64_node_pbs.sh   labuser          5512:16: R xeon
    69.iam-pbs        HelloWorld.sh    labuser          00:00:00 E xeon

To stop a task prior to its completion, use qdel with the job id::

    -bash-4.2$ qdel 67.iam-pbs

Requesting support
==================

If you have issues connecting or running jobs in the cluster, feel free to reach out to our support team by emailing ace@intel-research.net. In this email, please be as descriptive as possible. Include any relevant job numbers, machines used, etc.

Important Takeaways
===================

- The first line of your script defines how many nodes you want to run on and whether or not you want your job to have exclusive access to those nodes while running.

  - ``select=1`` – increase this value to run your job on more nodes (see the sketch at the end of this section).
  - ``-lplace=excl`` – exclude this if you don't require exclusive access to the nodes your job is scheduled on for the duration of your run.

- Make sure any script you put together is executable (``chmod 0755 ./HelloWorld.sh``).
- When submitting jobs, ensure that you include the proper path to the script. For example, if you have a "HelloWorld.sh" in your home directory, do this::

      qsub ./HelloWorld.sh

  Not this::

      qsub HelloWorld.sh
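For example, a job script header that requests two nodes and drops the exclusive-placement request might start like the following sketch. The node and CPU counts are illustrative and reuse the values from the HelloWorld example; adjust them to your workload, keeping in mind the 10-node limit mentioned above::

    #PBS -l select=2:ncpus=160
    #PBS -q xeon
    # $PBS_NODEFILE lists the node(s) assigned to this job
    cat $PBS_NODEFILE >> ~/shared.txt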