A Fully Pipelined FPGA Architecture of a Factored Restricted Boltzmann Machine Artificial Neural Network

by deepx
LOK-WON KIM, Cisco Systems
SAMEH ASAAD and RALPH LINSKER, IBM T. J. Watson Research Center

Artificial neural networks (ANNs) are a natural target for hardware acceleration by FPGAs and GPGPUs
because commercial-scale applications can require days to weeks to train using CPUs, and the algorithms
are highly parallelizable. Previous work on FPGAs has shown how hardware parallelism can be used to
accelerate a “Restricted Boltzmann Machine” (RBM) ANN algorithm, and how to distribute computation
across multiple FPGAs.
Here we describe a fully pipelined parallel architecture that exploits “mini-batch” training (combining
many input cases to compute each set of weight updates) to further accelerate ANN training. We implement
on an FPGA, for the first time to our knowledge, a more powerful variant of the basic RBM, the “Factored
RBM” (fRBM). The fRBM has proved valuable in learning transformations and in discovering features that
are present across multiple types of input. We obtain (in simulation) a 100-fold acceleration (vs. CPU software)
for an fRBM having N = 256 units in each of its four groups (two input, one output, one intermediate
group of units) running on a Virtex-6 LX760 FPGA. Many of the architectural features we implement are
applicable not only to fRBMs, but to basic RBMs and other ANN algorithms more broadly.
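
To make the "mini-batch" idea concrete, the following is a minimal software sketch, not the authors' FPGA design. It assumes a basic RBM (not the factored variant), omits bias terms, and uses one step of contrastive divergence with probabilities in place of sampled binary states so that the result is deterministic. The function names and the learning rate are hypothetical choices made for illustration. The point it shows is that the contributions of all input cases in the batch are accumulated first and the weights are updated once per batch.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def hidden_probs(v, W):
    # W[i][j] connects visible unit i to hidden unit j (assumed layout).
    return [sigmoid(sum(v[i] * W[i][j] for i in range(len(v))))
            for j in range(len(W[0]))]


def visible_probs(h, W):
    return [sigmoid(sum(h[j] * W[i][j] for j in range(len(h))))
            for i in range(len(W))]


def minibatch_update(batch, W, lr=0.1):
    """Return new weights after one mini-batch (illustrative sketch only)."""
    n_vis, n_hid = len(W), len(W[0])
    grad = [[0.0] * n_hid for _ in range(n_vis)]
    for v0 in batch:
        # Positive phase: hidden activations driven by the data.
        h0 = hidden_probs(v0, W)
        # Negative phase: one reconstruction step (mean-field simplification).
        v1 = visible_probs(h0, W)
        h1 = hidden_probs(v1, W)
        # Accumulate this case's contribution; weights are not changed yet.
        for i in range(n_vis):
            for j in range(n_hid):
                grad[i][j] += v0[i] * h0[j] - v1[i] * h1[j]
    # A single weight update from the average over all cases in the batch.
    scale = lr / len(batch)
    return [[W[i][j] + scale * grad[i][j] for j in range(n_hid)]
            for i in range(n_vis)]


if __name__ == "__main__":
    W = [[0.0, 0.0], [0.0, 0.0]]
    batch = [[1.0, 0.0], [1.0, 0.0]]
    print(minibatch_update(batch, W))
```

Because the per-case contributions inside the loop do not depend on one another, they are the part of the computation that a parallel or pipelined implementation can overlap.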