FPGA Accelerators

Intel is building a family of FPGA accelerators aimed at data centers. The family shares a common software layer, the Open Programmable Acceleration Engine (OPAE), as well as a common hardware-side Core Cache Interface (CCI-P). We are making a collection of systems available to researchers through the IL Academic Compute Environment and the Intel Hardware Accelerator Research Program (HARP).

Support

For support, please join the HARP Forum and post questions there.

FPGA system classes

System class names serve multiple purposes: they match queue names in the batch system, and they are used in the setup scripts to configure the Quartus synthesis environment for the corresponding hardware.

fpga-pac-s10
  PCIe D5005 Programmable Acceleration Cards (PACs) with a Stratix 10 SX
  FPGA (1SX280HN2F43E2VG). OpenCL and RTL are both supported. These
  cards offer PCIe Gen 3 x16 and 4 channels of 8GB DDR4-2400 with ECC.

fpga-pac-s10-2
  Systems with two PCIe D5005 Programmable Acceleration Cards (PACs), each with
  a Stratix 10 SX FPGA (1SX280HN2F43E2VG). OpenCL and RTL are both supported.
  The cards are not yet networked through their QSFP+ ports, but they can be
  used for projects that require a pair of cards communicating through system
  memory. Please use the larger fpga-pac-s10 pool if your application uses
  only a single FPGA, and note the instructions below for configuring
  multi-FPGA systems.

fpga-pac-a10
  PCIe Programmable Acceleration Cards (PACs) with an Arria 10 GX
  FPGA (10AX115N2F40E2LG). Two cards are installed in each system and their
  40GbE QSFP+ ports are connected to each other to facilitate research on
  networked FPGAs. OpenCL and RTL are both supported. Please note the
  instructions below for configuring multi-FPGA systems.

fpga-bdx-opae
  Broadwell Xeon CPUs (E5-2600v4) with an integrated in-package Arria 10
  GX1150 FPGA (10AX115U3F45E2SGE3). These systems are for workloads
  written in RTL.

fpga-bdx-opencl
  The same Broadwell Xeon+FPGA systems as fpga-bdx-opae, but configured
  for logic written in OpenCL. Unlike later OPAE-managed systems, Broadwell
  uses different FPGA-side base logic for OpenCL, forcing a separation of
  RTL and OpenCL servers.

fpga-bdx-aal
  The same Broadwell Xeon+FPGA systems as fpga-bdx-opae, but with Intel’s
  legacy Accelerator Abstraction Layer (AAL) kernel driver loaded. New
  projects should use OPAE rather than AAL.
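
Because class names double as batch queue names, a standard PBS query shows the state of a class’s queue. The following is only a sketch and assumes the usual qstat client behavior:

# Show batch queue status for one class (queue name taken from the table above)
qstat -Q fpga-pac-s10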

Configuring a build environment

Connect to the IL Academic Compute Environment access node, ssh-iam.intel-research.net, as described in the introduction. Everything required for working with FPGAs, including the FPGA hardware release trees, Quartus, ModelSim, OPAE and the setup scripts, is stored in the /export/fpga tree.
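
For example, connect with ssh, replacing <username> with your lab account name:

ssh <username>@ssh-iam.intel-research.net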

Configure a shell for working with an FPGA class, where <fpga-class> is one of the class names from the table above:

source /export/fpga/bin/setup-fpga-env <fpga-class>

By default, the configured environment will instantiate a shared version of the OPAE SDK and the Basic Building Blocks (BBBs). Legacy AAL will be chosen instead of OPAE for class fpga-bdx-aal. The version of Quartus matching the hardware class is chosen automatically.

Users requiring a private copy of the OPAE SDK are free to build their own. To use a private copy, set OPAE_INSTALL_PATH before invoking setup-fpga-env:

# Select a non-default version of the OPAE SDK
export OPAE_INSTALL_PATH=<path to private OPAE SDK installation>
source /export/fpga/bin/setup-fpga-env <fpga-class>
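
As an illustration, a private copy can be built with a standard out-of-tree CMake build of the OPAE SDK sources. The repository location, CMake options and install prefix below are assumptions; adjust them for the SDK release you need:

# Sketch: build a private OPAE SDK under $HOME (repository location and
# CMake options are assumptions; adjust for the release you need)
git clone https://github.com/OPAE/opae-sdk
cd opae-sdk
mkdir build && cd build
cmake .. -DCMAKE_INSTALL_PREFIX=$HOME/opae-install
make && make install
# Then point OPAE_INSTALL_PATH at $HOME/opae-install as shown above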

Working with RTL

Please use generic IL Academic Compute Environment compute nodes for synthesis and simulation, and use actual FPGA systems and queues only when running on real FPGA hardware. Reserving an FPGA system gives you exclusive access to the entire machine and locks out all other users. Use the xeon queue for synthesis and simulation; because the distributed file system is shared across the entire lab, your build trees are visible from every node. The qsub compilation scripts described below automatically choose the proper queues.
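
If you need a general-purpose shell outside of the wrapper scripts, an ordinary interactive batch request along the following lines should land on a generic compute node. This is a sketch assuming standard PBS options and the xeon queue named above:

# Request an interactive shell on a generic (non-FPGA) compute node
qsub -I -q xeon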

These instructions apply to the OPAE software stack. If you are a new RTL user, choose OPAE. Please see the AAL section if you must use the legacy code.

Synthesizing RTL designs

Quartus and OPAE are pre-installed in /export/fpga and added to the path when the configuration script setup-fpga-env is sourced. The configuration script also adds /export/fpga/bin to the PATH environment variable.
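
A quick sanity check after sourcing the script is to confirm that the tools resolve on your PATH; the quartus_sh executable name is the usual Quartus command-line shell and is assumed here:

# Confirm that the wrapper scripts and Quartus are visible
which qsub-synth
which quartus_sh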

The qsub-synth script in /export/fpga/bin submits a standard OPAE hardware build job to the proper queue. The configured environment, including the target FPGA class, is copied to the remote job’s state. The build is performed relative to the current working directory.

The following sequence clones the Basic Building Blocks (BBB) Git repository, which contains a tutorial, and then synthesizes the first example (01_hello_world) for execution on the currently configured FPGA:

git clone https://github.com/OPAE/intel-fpga-bbb
cd intel-fpga-bbb/samples/tutorial/01_hello_world/hw
# Configure a Quartus build area
afu_synth_setup -s rtl/axi/sources.txt build_fpga
cd build_fpga
# Run Quartus in the vLab batch queue
qsub-synth
# Monitor the build (the file is created after the job starts)
tail -f build.log

qsub-synth invokes the usual OPAE run.sh synthesis script and will typically run for a long time. During compilation, output is written to build.log. The batch system generates an output log file with the same contents when the job exits.

The tutorial itself explains the details of the configuration and synthesis steps.

Executing on FPGAs

Once your synthesis completes and meets timing, allocate an interactive session on a machine with an FPGA by invoking qsub-fpga. The script selects a queue to match the FPGA class chosen during configuration and opens a new shell. The -V argument is automatically passed to qsub, causing the environment to be copied to the new shell.

Execute the synthesized image compiled above:

# Open a shell on an FPGA system of the configured class
qsub-fpga
# Go to the directory where qsub-synth was run above
cd $PBS_O_WORKDIR
# Load the image onto an FPGA
fpgaconf cci_hello.gbs
# Compile the matching software
cd ../../sw
make
# Run the program
./cci_hello

The program should print “Hello World!”.

Instead of having to change to $PBS_O_WORKDIR every time you invoke qsub-fpga, consider adding the following code to your ~/.bashrc so that interactive jobs start in the working directory where qsub was invoked:

if [ "$PBS_ENVIRONMENT" == "PBS_INTERACTIVE" ] && [ "$PBS_O_WORKDIR" != "" ] &&
   [ "$PBS_O_HOME" == "$PWD" ] && [ "$PBS_O_WORKDIR" != "$PWD" ]; then
  echo "Moving to PBS workdir: $PBS_O_WORKDIR"
  cd "$PBS_O_WORKDIR"
fi

Now, qsub-fpga will open an interactive shell on an FPGA host in the same working directory.

Simulating designs with ASE

Run your simulations as interactive sessions on generic batch machines:

qsub-sim

The tutorial describes the process for running ASE simulations in great detail. Simulation requires two windows: one to run the RTL simulator and the other to run the application’s software. Both processes must be on the same host. For this example, we use tmux to split the screen into two shells. You could choose to use other methods for opening two shells on a simulation host, such as running an xterm tunneled through your ssh session.

^b below is control-b, the tmux command escape sequence:

qsub-sim
# Go to the same path as the intel-fpga-bbb example above
cd <path to intel-fpga-bbb/samples/tutorial/01_hello_world/hw>
# Construct a simulation build directory
afu_sim_setup --source rtl/axi/sources.txt build_sim
# Split the screen
tmux
^b"
# Compile and run the RTL simulator
cd build_sim
make
make sim
# Switch to the other (software) pane
^bo
cd ../sw
make
# Copy the export ASE_WORKDIR=<path> from the RTL simulator pane and invoke it here
export ASE_WORKDIR=<path to intel-fpga-bbb/samples/tutorial/01_hello_world/hw/build_sim/work>
with_ase ./hello_world_ase

AAL legacy software

A few Broadwell machines remain available with the AAL driver loaded if you must continue to use AAL. The AAL user-space development environment is loaded by setup-fpga-env when the fpga-bdx-aal class is chosen. In general, the instructions for using AAL within the IL Academic Compute Environment are similar to OPAE: configure the environment with setup-fpga-env, run synthesis jobs with qsub-synth, run simulation with qsub-sim and allocate an FPGA with qsub-fpga.
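
For example, a complete AAL session uses the same wrapper scripts, only with the fpga-bdx-aal class selected at setup time:

# Configure the shell for the AAL-based Broadwell systems
source /export/fpga/bin/setup-fpga-env fpga-bdx-aal
# Submit synthesis from a prepared build directory
qsub-synth
# Allocate an AAL-equipped FPGA system once synthesis completes
qsub-fpga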

Working with OpenCL

OpenCL designs may be executed on all OPAE-based systems except for Broadwell (fpga-bdx-opae). For Broadwell, separate fpga-bdx-opencl class systems are required. The separation of RTL and OpenCL was necessary on Broadwell due to limitations in Quartus when the base bitstream was written.

The IL Academic Compute Environment-specific instructions for running OpenCL designs are largely the same as those for working with RTL. Please use generic IL Academic Compute Environment compute nodes for synthesis and simulation, and use actual FPGA systems and queues only when running on real FPGA hardware. Reserving an FPGA system gives you exclusive access to the entire machine and locks out all other users. Use the xeon queue for synthesis and simulation; the distributed file system is shared across the entire lab.

Configure an OpenCL environment

Follow the instructions for configuring your environment, sourcing setup-fpga-env with an FPGA class that supports OpenCL. The setup scripts automatically configure the target-specific board support package (BSP), Quartus version and OpenCL compiler. All systems support OpenCL except fpga-bdx-opae and fpga-bdx-aal.
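
For example, to target the Arria 10 PAC class and confirm that a BSP was picked up (the -list-boards query is standard in the Intel FPGA SDK for OpenCL, but treat it as an assumption for the installed version):

# Configure the shell for an OpenCL-capable class
source /export/fpga/bin/setup-fpga-env fpga-pac-a10
# List the boards known to the configured BSP
aoc -list-boards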

Executing compiled designs

The qsub-fpga script allocates an FPGA matching the class passed to setup-fpga-env and opens a shell on the target machine. Once running on the FPGA system there are no IL Academic Compute Environment-specific differences for running workloads.

Each release ships with example designs in its top-level opencl directory. This is typically $AOCL_BOARD_PACKAGE_ROOT/.., the parent of the board support directory. You may copy the examples into your own directories to compile and execute them. Most examples include pre-built bitfiles.
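
For instance, to locate the shipped examples and copy one into a writable directory; the exact layout under the release tree varies, so treat the paths below as placeholders:

# The opencl examples sit next to the board support package
ls $AOCL_BOARD_PACKAGE_ROOT/..
# Copy an example into your home directory before building or running it
cp -r $AOCL_BOARD_PACKAGE_ROOT/../<example directory> ~/opencl-examples/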

Synthesizing hardware with OpenCL

We provide the qsub-aoc script for invoking the OpenCL compiler in the batch system. The script assumes that the target platform class has already been configured with setup-fpga-env.

To synthesize the hardware side of the mem_bandwidth example:

# Change from the execution directory above to the kernel source directory
cd device
# Invoke aoc
qsub-aoc mem_bandwidth.cl

This will take a while for even the simplest of designs. The output of the AOC job will be redirected into two log files. During the build, output is written to build.log. Once the build completes, the same output stream is copied by the batch system to a file prefixed with build_qsub_aoc.

For this mem_bandwidth example, the AOC toolchain took about 245 minutes to generate a bitfile. You can check on your build status by running the command “qstat -u $USER”. While the job is running it will have state “R” and, when complete, it will show up as “C”. Completed jobs are eventually drained from the scheduler, so it is also possible the job won’t show up at all if enough time has passed since completion.

Working on Multi-FPGA Systems

When reconfiguring systems that have more than one FPGA of the same class, you must indicate which FPGA to update. On multi-FPGA systems, fpgaconf accepts a --bus argument. Bus numbers are included in the output of fpgainfo port, e.g. 0xaf.
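
For example, to reprogram only the card on bus 0xaf with the image built earlier (a sketch using the bus number format shown by fpgainfo):

# List the FPGAs and their PCIe bus numbers
fpgainfo port
# Program only the card selected by bus number
fpgaconf --bus 0xaf cci_hello.gbs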

OpenCL defines its own namespace for enumerating FPGAs – usually acl0, acl1, etc. aocl list-devices shows all available targets.
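
For example, to enumerate the devices and load a compiled kernel onto the second card; the bitfile name follows from the mem_bandwidth compile above, and the aocl program usage is an assumption about the installed SDK:

# Enumerate the OpenCL device names on this host
aocl list-devices
# Program a specific device with a compiled .aocx image
aocl program acl1 mem_bandwidth.aocx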