[Follow-up] Development of an FPGA Accelerator framework around Apache Arrow

Johan Peltenburg - EWI Fri, 09 Feb 2018 13:11:43 -0800

Dear community,

As a follow-up to the e-mail below, we have made public the repository that 
contains our framework, Fletcher: a framework for integrating FPGA 
accelerators with Apache Arrow.


https://github.com/johanpel/fletcher

With this framework you provide an Arrow schema, from which an easy-to-use 
hardware interface for FPGAs is generated, reaping all the benefits that Arrow 
already offers. On top of that, it increases the programmability of any 
acceleration project you'd want to build on top of Arrow. At run time, you 
simply pass your Arrow table to the run-time part of the framework, and your 
hardware can read from it using row index ranges, receiving streams of data of 
the type you've defined through the schema.

Currently there is an example project that does regular expression matching on 
an Arrow table with strings, running on the Amazon EC2 F1 platform. We are not 
sponsored by Amazon, but as anyone can launch an instance with an FPGA there, 
we thought it would be a good starting point to hopefully gain some interest, 
even if you don't have an FPGA card yourself.

FPGA accelerators can be so fast that more often than not serialization kills a 
relatively large part of the performance. Our measurements in this (relatively 
simple) example show that by using Arrow to prevent serialization, we sometimes 
get up to 6X improvement in performance over not using Arrow, especially if we 
start in languages that run on JVMs, for example. (Thanks everyone!)

We look forward to people with a little bit of FPGA experience trying it out, 
and to receiving their thoughts, comments, etc. Please drop me an e-mail.

With kind regards,

Johan Peltenburg
Computer Engineering Lab
Delft University of Technology
________________________________________
From: Johan Peltenburg [j.w.peltenb...@tudelft.nl]
Sent: Tuesday, November 28, 2017 16:29
To: dev@arrow.apache.org
Subject: Development of an FPGA Accelerator framework around Apache Arrow

Dear community,

Over the last year we have been looking into integration of FPGA accelerators
with big data frameworks such as Spark. Before Arrow took off, we experienced
many issues like serialization overhead but also garbage collection issues,
as well as language interoperability issues with our low-level stack. These
are all problems that Arrow is now already solving for us in a very nice
manner.

We see a growing amount of support from infrastructure providers such as
Amazon, which already offer instances with FPGA resources. Also, we see very
rapid advancements from the hardware technology side, where soon enough
accelerators can (cache-coherently) be attached to host memory (for example in
OpenCAPI), allowing accelerators to work in the same virtual address space as
the host process.

We believe that a somewhat standardized format for data in-memory like Arrow
can help us generalize big data processing in FPGAs tremendously. At the same
time, we know that FPGAs are notorious for their high development time and low
programmability. Therefore, to alleviate some of the burdens put upon an
accelerator developer, we are building a generalized framework around Arrow
that abstracts away a very cumbersome aspect of FPGA design: interfacing with
the data.

The framework takes Arrow Schemas as input, and generates a layer that on the
one side interfaces with whatever the host platform provides to access host
memory (our initial framework will target support for AXI and OpenCAPI), and
on the other side will interface with the user kernel.

The user can express requests for access to the data in terms of row index
ranges. The generated layer will then provide data streams to the user, which
the user may read using some kernel that they designed using high-level
synthesis (for example they could write the kernel in OpenCL). Thus, they no
longer need to go into the specifics of the Arrow in-memory format, or bother
with creating hardware constructs to deal with index buffers and validity
buffers, interfacing with the host-side bus, implementing FIFOs, and so on.
Hopefully this will be beneficial to faster deployment of FPGA accelerated
applications based on data represented in the Arrow format.
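
To give an idea of what the generated layer hides, here is a minimal software
sketch in plain Python of reading a row index range from a string column laid
out in the Arrow style (a validity bitmap, an offsets buffer and a data
buffer). This is only an illustration and not part of the framework: the names
build_string_column and read_row_range are hypothetical, and the offsets are
kept in a Python list rather than a packed 32-bit buffer for brevity.

```python
# Illustrative model only; these helpers are hypothetical and are not
# part of Fletcher or of the Arrow libraries.

def build_string_column(values):
    """Pack a list of str-or-None into (validity, offsets, data)."""
    validity = bytearray((len(values) + 7) // 8)  # one bit per row
    offsets = [0]                                 # len(values) + 1 entries
    data = bytearray()                            # concatenated UTF-8 bytes
    for i, value in enumerate(values):
        if value is not None:
            # Arrow numbers validity bits LSB-first; a set bit means "valid".
            validity[i // 8] |= 1 << (i % 8)
            data += value.encode("utf-8")
        # A null row repeats the previous offset, so its length is zero.
        offsets.append(len(data))
    return bytes(validity), offsets, bytes(data)


def read_row_range(column, first, last):
    """Yield the rows with index first <= i < last, or None for nulls."""
    validity, offsets, data = column
    for i in range(first, last):
        if (validity[i // 8] >> (i % 8)) & 1:
            yield data[offsets[i]:offsets[i + 1]].decode("utf-8")
        else:
            yield None


column = build_string_column(["fpga", None, "arrow", ""])
rows = list(read_row_range(column, 1, 4))
print(rows)
```

In hardware, each of these steps (fetching offsets, checking validity bits,
fetching the right bytes) becomes bus requests and stream handling, which is
the part the user should not have to design by hand.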

Currently the framework supports schemas of primitive data types, (nested)
lists and structs. The major challenge here was to be able to generate
hardware structures from the many forms of schemas that users may provide, but
these challenges have been solved. We are in the process of testing the
framework in simulation, and will soon move to a test on real FPGA systems.
With a bit of luck we hope to initially release our framework in January.

We will fully open-source this framework and will attempt to make it as
vendor-independent as possible. Initially we hope to provide some example
applications that demonstrate some of the benefits of using our framework in
terms of productivity and the benefits of using FPGAs for specific problems in
big data in general.

We are reaching out for your comments, questions, suggestions, etc... Please
give us your thoughts about this. Thank you in advance.

With kind regards,

Johan Peltenburg
Computer Engineering Lab
Delft University of Technology
