=================
FPGA Accelerators
=================

Intel is building a family of FPGA accelerators aimed at data centers. The
family shares a common software layer, the Open Programmable Acceleration
Engine (`OPAE`_), as well as a common hardware-side Core Cache Interface
(`CCI-P`_). We are making a collection of these systems available to
researchers through the IL Academic Compute Environment and the Intel
Hardware Accelerator Research Program (HARP).

Support
=======

For support, please join the `HARP Forum`_ and post questions there.

FPGA system classes
===================

System class names are used for multiple purposes. Classes match queue names
in the batch system. Class names are also used in setup scripts to configure
the Quartus synthesis environment for particular hardware.

.. _fpga-classes-label:

.. list-table:: FPGA Classes
   :widths: 20 80
   :header-rows: 1

   * - Class
     - Description
   * - fpga-pac-s10
     - | PCIe `D5005 Programmable Acceleration Cards`_ (PACs) with a Stratix 10 SX
       | FPGA (1SX280HN2F43E2VG). OpenCL and RTL are both supported. These
       | cards offer PCIe Gen 3 x16 and 4 channels of 8GB DDR4-2400 with ECC.
   * - fpga-pac-s10-2
     - | Two PCIe `D5005 Programmable Acceleration Cards`_ (PACs) with a Stratix 10 SX
       | FPGA (1SX280HN2F43E2VG). OpenCL and RTL are both supported. The
       | cards are not yet networked with QSFP+ ports, but could be used for
       | projects requiring a pair of cards communicating through system
       | memory. Please use the larger fpga-pac-s10 pool if your application
       | uses only a single FPGA. Please note the instructions below for
       | configuring :ref:`multi-FPGA systems <multi-fpga-config-label>`.
   * - fpga-pac-a10
     - | PCIe `Programmable Acceleration Cards`_ (PACs) with an Arria 10 GX
       | FPGA (10AX115N2F40E2LG). Two cards are installed in each system and the
       | 40GbE QSFP+ ports are connected to each other in order to facilitate
       | research on networked FPGAs. OpenCL and RTL are both supported.
       | Please note the instructions below for configuring
       | :ref:`multi-FPGA systems <multi-fpga-config-label>`.
   * - fpga-bdx-opae
     - | Broadwell Xeon CPUs (E5-2600v4) with an integrated in-package Arria 10
       | GX1150 FPGA (10AX115U3F45E2SGE3). These systems are for workloads
       | written in RTL.
   * - fpga-bdx-opencl
     - | The same Broadwell Xeon+FPGA systems as fpga-bdx-opae, but configured
       | for use with logic written in OpenCL. Unlike later OPAE-managed
       | systems, Broadwell uses different FPGA-side base logic for OpenCL,
       | forcing a separation of RTL and OpenCL servers.
   * - fpga-bdx-aal
     - | The same Broadwell Xeon+FPGA systems as fpga-bdx-opae, but the loaded
       | Linux kernel driver is Intel's legacy Accelerator Abstraction Layer
       | (AAL). New projects should use OPAE and not AAL.

.. _config-label:

Configuring a build environment
===============================

Connect to the IL Academic Compute Environment access node,
**ssh-iam.intel-research.net**, as described in the `introduction`_. Resources
required for working with FPGAs are stored in the **/export/fpga** tree. FPGA
hardware release trees, Quartus, ModelSim, OPAE and setup scripts are all
stored there.

Configure a shell for working with an FPGA class, where class is one of the
options from the :ref:`table above <fpga-classes-label>`:

::

    source /export/fpga/bin/setup-fpga-env <class>

By default, the configured environment will instantiate a shared version of
the `OPAE SDK`_ and the Basic Building Blocks (`BBBs`_). Legacy AAL will be
chosen instead of OPAE for class fpga-bdx-aal. The version of Quartus matching
the hardware class is chosen automatically.
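
For example, a shell could be configured for the Arria 10 PAC class and then
checked to confirm that the expected tools are available. This is a minimal
sketch: the class argument to setup-fpga-env and the specific binaries placed
on the PATH (quartus_sh, afu_synth_setup) are assumptions based on the
descriptions above and below.

::

    # Configure the shell for the Arria 10 PAC class (see the class table above)
    source /export/fpga/bin/setup-fpga-env fpga-pac-a10

    # Sanity check: the Quartus and OPAE tools should now be on the PATH
    which quartus_sh
    which afu_synth_setup
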

Users requiring a private copy of the OPAE SDK are free to build their own
copies. To use a private copy, set **OPAE_INSTALL_PATH** before invoking
setup-fpga-env:

::

    # Select a non-default version of the OPAE SDK
    export OPAE_INSTALL_PATH=<path to your private OPAE SDK>
    source /export/fpga/bin/setup-fpga-env <class>

Working with RTL
================

Please use generic IL Academic Compute Environment compute nodes for synthesis
and simulation. Use actual FPGA systems and queues only when running on real
FPGA hardware. When you reserve an FPGA system, you are necessarily given
exclusive access to the entire machine, locking out all other users. Please
use the **xeon** queue for synthesis and simulation. The distributed file
system is shared across the entire lab. The qsub compilation scripts described
below automatically choose the proper queues.

These instructions apply to the OPAE software stack. If you are a new RTL
user, choose OPAE. Please see the :ref:`AAL section <aal-section-label>` if
you must use the legacy code.

Synthesizing RTL designs
------------------------

Quartus and OPAE are pre-installed in /export/fpga and added to the path when
the :ref:`configuration script <config-label>` setup-fpga-env is sourced. The
configuration script also adds **/export/fpga/bin** to the PATH environment
variable.

The **qsub-synth** script in **/export/fpga/bin** submits a standard OPAE
hardware build job to the proper queue. The configured environment, including
the target FPGA class, is copied to the remote job's state. The build is
performed relative to the current working directory.

The following sequence clones the Basic Building Blocks (`BBB`_) Git
repository, which contains a tutorial, and then synthesizes the first example
for execution on the currently configured FPGA:

::

    git clone https://github.com/OPAE/intel-fpga-bbb
    cd intel-fpga-bbb/samples/tutorial/01_hello_world/hw

    # Configure a Quartus build area
    afu_synth_setup -s rtl/axi/sources.txt build_fpga
    cd build_fpga

    # Run Quartus in the vLab batch queue
    qsub-synth

    # Monitor the build (the file is created after the job starts)
    tail -f build.log

qsub-synth invokes the usual OPAE run.sh synthesis script and will typically
run for a long time. During compilation, output is written to build.log. The
batch system generates an output log file with the same contents when the job
exits. The `tutorial itself`_ explains the details of the configuration and
synthesis steps.

Executing on FPGAs
------------------

Once your synthesis completes and meets timing, allocate an interactive
session on a machine with an FPGA by invoking **qsub-fpga**. The script
selects a queue to match the FPGA class chosen during
:ref:`configuration <config-label>` and opens a new shell. The **-V** argument
is automatically passed to qsub, causing the environment to be copied to the
new shell.

Execute the synthesized image compiled above:

::

    # Open a shell on an FPGA system of the configured class
    qsub-fpga

    # Go to the directory where qsub-synth was run above
    cd $PBS_O_WORKDIR

    # Load the image onto an FPGA
    fpgaconf cci_hello.gbs

    # Compile the matching software
    cd ../../sw
    make

    # Run the program
    ./cci_hello

The program should print "Hello World!".
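
If fpgaconf or the software cannot find an accelerator, it can help to confirm
that the driver sees the device before loading the image. A short sketch,
assuming the **fpgainfo** utility (used with the ``port`` subcommand in the
multi-FPGA section below) is available in the interactive shell:

::

    # Still inside the qsub-fpga interactive shell:
    # list the accelerator ports and their PCIe bus numbers
    fpgainfo port

    # Reload the synthesized image if needed
    fpgaconf cci_hello.gbs
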

Instead of having to change to $PBS_O_WORKDIR every time you invoke qsub-fpga,
consider adding the following code to your ~/.bashrc to match the working
directory where qsub was invoked:

::

    if [ "$PBS_ENVIRONMENT" == "PBS_INTERACTIVE" ] &&
       [ "$PBS_O_WORKDIR" != "" ] &&
       [ "$PBS_O_HOME" == "$PWD" ] &&
       [ "$PBS_O_WORKDIR" != "$PWD" ]; then
        echo "Moving to PBS workdir: $PBS_O_WORKDIR"
        cd "$PBS_O_WORKDIR"
    fi

Now, qsub-fpga will open an interactive shell on an FPGA host in the same
working directory.

Simulating designs with ASE
---------------------------

Run your simulations as interactive sessions on generic batch machines:

::

    qsub-sim

The `tutorial`_ describes the process for running ASE simulations in great
detail. Simulation requires two windows: one to run the RTL simulator and the
other to run the application's software. Both processes must be on the same
host. For this example, we use `tmux`_ to split the screen into two shells.
You could choose other methods for opening two shells on a simulation host,
such as running an xterm tunneled through your ssh session. ``^b`` below is
control-b, the tmux command escape sequence:

::

    qsub-sim

    # Go to the same path as the intel-fpga-bbb example above
    cd intel-fpga-bbb/samples/tutorial/01_hello_world/hw

    # Construct a simulation build directory
    afu_sim_setup --source rtl/axi/sources.txt build_sim

    # Split the screen
    tmux
    ^b"

    # Compile and run the RTL simulator
    cd build_sim
    make
    make sim

    # Switch to the other (software) pane
    ^bo
    cd ../sw
    make

    # Copy the "export ASE_WORKDIR=..." line printed in the RTL simulator
    # pane and invoke it here
    export ASE_WORKDIR=<path printed by the simulator>
    with_ase ./hello_world_ase

.. _aal-section-label:

AAL legacy software
-------------------

A few Broadwell machines remain available with the AAL driver loaded if you
must continue to use AAL. The AAL user-space development environment is loaded
by setup-fpga-env when the fpga-bdx-aal class is chosen.

In general, the instructions for using AAL within the IL Academic Compute
Environment are similar to OPAE. Configure the environment with
**setup-fpga-env**, run synthesis jobs with **qsub-synth**, run simulation
with **qsub-sim** and allocate an FPGA with **qsub-fpga**.

Working with OpenCL
===================

OpenCL designs may be executed on all OPAE-based systems except for Broadwell
(fpga-bdx-opae). For Broadwell, the separate fpga-bdx-opencl class systems are
required. The separation of RTL and OpenCL was necessary on Broadwell due to
limitations in Quartus when the base bitstream was written.

The IL Academic Compute Environment-specific instructions for running OpenCL
designs are largely similar to the instructions for working with RTL. Please
use generic IL Academic Compute Environment compute nodes for synthesis and
simulation. Use actual FPGA systems and queues only when running on real FPGA
hardware. When you reserve an FPGA system, you are necessarily given exclusive
access to the entire machine, locking out all other users. Please use the
**xeon** queue for synthesis and simulation. The distributed file system is
shared across the entire lab.

Configure an OpenCL environment
-------------------------------

Follow the instructions for :ref:`configuring your environment <config-label>`,
sourcing **setup-fpga-env** with an FPGA class that supports OpenCL. The setup
scripts automatically configure the target-specific board support package
(BSP), Quartus version and OpenCL compiler. All classes support OpenCL except
fpga-bdx-opae and fpga-bdx-aal.
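
As a quick check that the BSP and OpenCL compiler were picked up, the
configured variables and tool versions can be inspected after sourcing the
script. A minimal sketch: fpga-pac-a10 is just one OpenCL-capable class, and
it is an assumption that **$AOCL_BOARD_PACKAGE_ROOT** and the **aoc** compiler
are configured on generic compute nodes as well as on FPGA systems.

::

    # Configure the shell for an OpenCL-capable class
    source /export/fpga/bin/setup-fpga-env fpga-pac-a10

    # The BSP root and the OpenCL compiler should now be configured
    echo $AOCL_BOARD_PACKAGE_ROOT
    aoc -version
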

Executing compiled designs
--------------------------

The **qsub-fpga** script allocates an FPGA matching the class passed to
setup-fpga-env and opens a shell on the target machine. Once running on the
FPGA system, there are no IL Academic Compute Environment-specific differences
for running workloads.

Each release ships with example designs, found in its top-level **opencl**
directory. This is typically **$AOCL_BOARD_PACKAGE_ROOT/..**, the board
support directory's parent. You may copy the examples to your own directories
to compile and execute them. Most examples include pre-built bitfiles.

Synthesizing hardware with OpenCL
---------------------------------

We provide the **qsub-aoc** script for invoking the OpenCL compiler in the
batch system. The script assumes that the target platform class has already
been configured with setup-fpga-env. To synthesize the hardware side of the
mem_bandwidth example:

::

    # Change from the execution directory above to the kernel source directory
    cd device

    # Invoke aoc
    qsub-aoc mem_bandwidth.cl

This will take a while, even for the simplest of designs. The output of the
AOC job is redirected into two log files. During the build, output is written
to **build.log**. Once the build completes, the same output stream is copied
by the batch system to a file prefixed with build_qsub_aoc. For this
mem_bandwidth example, the AOC toolchain took about 245 minutes to generate a
bitfile.

You can check on your build status by running ``qstat -u $USER``. While the
job is running it will have state "R"; when complete, it will show up as "C".
Complete jobs are drained from the scheduler, so a job may not show up at all
if enough time has passed since completion.

.. _multi-fpga-config-label:

Working on Multi-FPGA Systems
=============================

When reconfiguring systems that have more than one FPGA of the same class, it
is necessary to indicate the specific FPGA to update. On multi-FPGA systems,
fpgaconf accepts a ``--bus`` argument. Bus numbers are included in the output
of **fpgainfo port**, e.g. 0xaf.

OpenCL defines its own namespace for enumerating FPGAs -- usually acl0, acl1,
etc. **aocl list-devices** shows all available targets.
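
For example, to reprogram only one of the two cards in an fpga-pac-a10 system,
the bus number reported by fpgainfo can be passed to fpgaconf. A minimal
sketch: the bus number 0xaf and the cci_hello.gbs bitstream are placeholders
taken from the examples above.

::

    # Find the PCIe bus numbers of the installed FPGAs
    fpgainfo port

    # Program only the FPGA on bus 0xaf
    fpgaconf --bus 0xaf cci_hello.gbs

    # List the OpenCL device names (acl0, acl1, ...) visible on this host
    aocl list-devices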