Re: Arrow streaming computation engine

2022-01-12 Thread Weston Pace
> Would appreciate it if you can give some pointers to how to
> start playing with that code.
I have a (somewhat) minimal example here: https://gist.github.com/westonpace/e555a3b1c269c31de7176d34f47a2fb0
The PR I mentioned earlier (https://github.com/apache/arrow/pull/12033) has more examples (tha

Re: Arrow streaming computation engine

2022-01-12 Thread Li Jin
Weston - Thanks for the pointer. The C++ streaming engine you pointed out is a lot like what I have in mind. Will take a close look at that. Would appreciate it if you can give some pointers to how to start playing with that code.
Hou - Glad to hear that the DataFusion community has similar ideas.

Re: Arrow streaming computation engine

2022-01-11 Thread QP Hou
For DataFusion (the Rust engine that Weston mentioned), the community is about to start building a PoC for a streaming engine. The discussion is happening at https://github.com/apache/arrow-datafusion/issues/1544.
On Tue, Jan 11, 2022 at 3:29 PM Weston Pace wrote:
>
> First, note that there are dif

Re: Arrow streaming computation engine

2022-01-11 Thread Weston Pace
First, note that there are different computation engines in different languages. The Rust implementation has DataFusion[1], for example. For the rest of this email, I will speak in more detail specifically about the C++ computation engine (which I am more familiar with) that is in place today. The

Arrow streaming computation engine

2022-01-11 Thread Li Jin
Hi,
This is a somewhat lengthy email with thoughts on a streaming computation engine for Arrow datasets, on which I would like to hear feedback from Arrow devs. The main use cases that we have in mind for the streaming engine involve time series data, i.e., data that arrives in time order (e.g. daily US st