Hello,

My colleagues at Deephaven Data Labs and I have been addressing problems at
the intersection of data-driven applications, data science, and updating
(or "ticking") data for some years.

Deephaven has a query engine that supports updating tabular data via a
protocol that communicates precise changes about datasets, such as 1) which
rows were removed, 2) which rows were added, 3) which rows were modified
(and for which columns). We are inspired by Arrow and would like to adopt a
version of this protocol that adheres to goals similar to Arrow and Arrow
Flight.
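
As a rough illustration of the kind of change message described above, here
is a minimal Python sketch. It is not the Barrage wire format: the names
TableUpdate and apply_update, the integer row keys, and the order in which
changes are applied (removals, then additions, then modifications) are all
assumptions made for this example.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class TableUpdate:
    """Hypothetical change message; not Barrage's actual wire format."""

    # Row keys that no longer exist after this update.
    removed: set[int] = field(default_factory=set)
    # Row key -> full row contents for newly added rows.
    added: dict[int, dict[str, object]] = field(default_factory=dict)
    # Column name -> {row key -> new value}, so only changed columns are sent.
    modified: dict[str, dict[int, object]] = field(default_factory=dict)


def apply_update(table: dict[int, dict[str, object]], update: TableUpdate) -> None:
    """Apply one update in place: removals, then additions, then modifications."""
    for key in update.removed:
        del table[key]
    for key, row in update.added.items():
        table[key] = dict(row)
    for column, changes in update.modified.items():
        for key, value in changes.items():
            table[key][column] = value


# A client-side snapshot keyed by row key (an assumed way to identify rows).
table = {
    0: {"sym": "AAPL", "price": 100.0},
    1: {"sym": "MSFT", "price": 200.0},
}

update = TableUpdate(
    removed={0},
    added={2: {"sym": "GOOG", "price": 300.0}},
    modified={"price": {1: 201.5}},
)
apply_update(table, update)
```

The point of the sketch is that every change refers to a row by a stable
key, which is what lets a client keep its copy of a table current without
re-reading the whole dataset.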

Out of the box, Arrow Flight is insufficient to represent such a stream of
changes. For example, because you cannot identify a particular row within
an Arrow Flight, you cannot indicate which rows were removed or modified.

The project integrates with Arrow Flight at the header-metadata level. We
have preliminarily named the project Barrage, as in a "barrage of arrows,"
which plays in the same "namespace" as a "flight of arrows."

We built this as part of an initiative to modernize and open up our table
IPC mechanisms. This is part of a larger open source effort which will
become more visible in the next month or so once we've finished the work
necessary to share our core software components, including a unified static
and real time query engine complete with data visualization tools, a REPL
experience, Jupyter integration, and more.

I would like to find out:
- if we have understood the primary goals of Arrow, and are honoring them
as closely as possible
- if there are other projects that might benefit from sharing this
extension of Arrow Flight
- if there are any gaps that are best addressed early on to maximize future
compatibility

A great place to digest the concepts that differ from Arrow Flight is here:
https://deephaven.github.io/barrage/Concepts.html

The proposed protocol can be perused here:
https://github.com/deephaven/barrage

Internally, we already have a Java server and a Java client implemented as
a working proof of concept for our use case.

I really look forward to your feedback; thank you!

Nate Bauernfeind

Deephaven Data Labs - https://deephaven.io/