TNN FPGA Accelerator

Ternary Neural Networks (TNNs), whose weights and activations are restricted to the values {-1, 0, +1}, enable very efficient implementations on FPGAs.
For more information, please read our initial paper on Ternary Neural Networks:
https://arxiv.org/abs/1609.00222

This page contains our demonstration hardware implementation of a Ternary Neural Network, targeting the Xilinx VC709 FPGA board.
http://www.xilinx.com/products/boards-and-kits/dk-v7-vc709-g.html

Contacts:
Frédéric Pétrot (permanent staff)
Adrien Prost-Boucle (now with Synopsys)
Alban Bourge (now with Atos-Bull)
Email: <firstname>.<lastname>@univ-grenoble-alpes.fr

Requirements

Our designs and tools are built for GNU/Linux operating systems.

The RIFFA framework is used to interface the hardware design with the software application.
The RIFFA driver and library must be installed on your machine.
http://riffa.ucsd.edu/

You will need root rights to ask the Linux kernel to remove and rescan PCI-Express devices when programming or re-programming the board.
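The remove/rescan itself goes through sysfs. Below is a minimal sketch: the sysfs paths are standard Linux, but the PCI address 0000:03:00.0 is a placeholder (locate your board with `lspci -d 10ee:`, 10ee being the Xilinx vendor ID). By default the script only prints the commands; set APPLY=1 and run it as root to actually execute them.

```shell
#!/bin/sh
# Sketch of the remove/rescan steps around (re)programming the board.
# The PCI address is a placeholder: find yours with `lspci -d 10ee:`.
DEV="${1:-0000:03:00.0}"

# Print the command by default; execute it only when APPLY=1 is set
# (the sysfs writes below require root rights).
run() {
    if [ "${APPLY:-0}" -eq 1 ]; then eval "$1"; else echo "would run: $1"; fi
}

# 1. Detach the device from the kernel before touching the FPGA.
run "echo 1 > /sys/bus/pci/devices/$DEV/remove"

# 2. (Re)program the bitstream here, e.g. with xc3sprog or Vivado.

# 3. Rescan the bus so the kernel re-enumerates the new design.
run "echo 1 > /sys/bus/pci/rescan"
```

After the rescan, the RIFFA driver should bind to the device again without a reboot.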

To program the board, you will need either the Xilinx Vivado tool suite, or the open-source program xc3sprog (this is the default command in the provided Makefile).
http://xc3sprog.sourceforge.net/

To program the board and/or to use the UART, you may need to install a library to talk to FTDI chips: libftdi or libftd2xx. Check which packages your distribution provides.

The power consumption is measured with the on-board PMBus, an I2C-based bus that makes it possible to read power figures directly from the on-board power converters.
To enable users to monitor the power consumption without interfering with PCI-Express workloads, our hardware design includes a simple UART-to-PMBus interface.
You need access rights to the UART ports, which are probably exposed over USB.
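On most distributions, access to the USB UART device is granted through group membership. A sketch, assuming a Debian-style dialout group (the group is uucp on some distributions, and the exact device name varies):

```shell
# Find the serial device exposed by the board's FTDI chip
# (the exact name varies; /dev/ttyUSB0 is typical).
ls -l /dev/ttyUSB*

# The owning group is usually dialout (Debian/Ubuntu) or uucp (Arch).
# Add your user to it, then log out and back in for it to take effect.
sudo usermod -a -G dialout "$USER"
```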

You also need one VC709 board!

IJCNN 2017 paper

This section provides demonstration configurations for our IJCNN 2017 paper. The draft can be obtained here:
https://arxiv.org/abs/1609.00222

Download the archive (14.6 MB, format tar.gz)
For instructions, see the file README inside the archive.

Link to training and ternarization project:
https://github.com/slide-lig/tnn-train

Note: our designs have been improved since the initial paper submission.
FPGA power is now much lower than in the paper.
We also reach slightly higher accuracy for datasets CIFAR10, GTSRB and SVHN (see our FPL2017 draft paper below).

FPL 2017 paper

This section provides demonstration configurations for our FPL 2017 paper (upcoming – September 4-8th).
The draft can be obtained here:
https://hal.archives-ouvertes.fr/hal-01563763

We will participate in the Demo Night on Wednesday, September 6th, with a live demo of our accelerator designs on the VC709 board.

Download the archive (90.6 MB, format tar.gz)
For instructions, see the file README inside the archive.
The data in this archive allows you to reproduce the speed and power results and to verify functionality, but not accuracy yet: some data was lost and we have to perform the training again…

Link to training and ternarization project:
https://github.com/slide-lig/tnn-train

ACM TRETS 2018 paper

This paper is a (large) extension of our FPL 2017 paper, in which we detail how we extract parallelism and describe the many optimizations that let us fit more efficient networks into the same FPGA. Quite a bit of low-level hacking! Note that this is yet another improvement over the FPL 2017 version: we reach a throughput of 60 kfps (32×32 frames) at 11 W (more than 5 kfps/W).
The draft can be obtained here:
https://hal.archives-ouvertes.fr/hal-01686718v2

IEEE TVLSI 2019 paper

This paper focuses solely on the decompression of ternary sequences encoded as binary strings. This is a must for ASIC implementations of TNNs, but it is quite interesting in itself. If anyone can come up with a better decompression scheme than ours, we'd like to know about it.
The draft can be obtained here:
https://hal.archives-ouvertes.fr/hal-02103214v1