Re: Arrow as a streaming format

2020-09-21 Thread Micah Kornfield
> Is there any chance you could point me to those abstractions so that I may
> have a look and play around with them?

Sorry if nothing like this exists in Java (and I realize that might have been what you were expecting). I was thinking of C++/Python, which have ChunkedArray classes. The c

Re: Arrow as a streaming format

2020-09-20 Thread Pedro Silva
Is there any chance you could point me to those abstractions so that I may have a look and play around with them?

> On 20 Sep 2020, at 05:17, Micah Kornfield wrote:
>
>> Furthermore, these types of queries seem to fit what I would call (for lack
>> of a better word) "s

Re: Arrow as a streaming format

2020-09-19 Thread Micah Kornfield
> Furthermore, these types of queries seem to fit what I would call (for
> lack of a better word) "sliding" dataframes. Arrow's aim (as I understand
> it) is to standardize the static dataframe data structure memory model;
> can it also support a sliding version?

I don't think there are any ex
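
To make the "sliding" idea in the question above concrete, here is a minimal sketch in plain Python. It is not an Arrow API: `SlidingChunks`, `max_rows`, and `window()` are hypothetical names made up for illustration, and plain lists stand in for record batches. It assumes the window is kept as a queue of whole chunks and that the oldest chunk is dropped once the remaining chunks still cover the window.

```python
from collections import deque
from typing import Deque, List


class SlidingChunks:
    """Hypothetical helper: keeps the most recent rows as a queue of chunks."""

    def __init__(self, max_rows: int) -> None:
        if max_rows <= 0:
            raise ValueError("max_rows must be positive")
        self.max_rows = max_rows
        self._chunks: Deque[List[float]] = deque()
        self._num_rows = 0

    def append(self, chunk: List[float]) -> None:
        """Add a new chunk and drop old chunks that fall out of the window."""
        self._chunks.append(chunk)
        self._num_rows += len(chunk)
        # Drop the oldest chunk only while the rest still holds max_rows rows,
        # so chunks are never copied or split on append.
        while self._num_rows - len(self._chunks[0]) >= self.max_rows:
            self._num_rows -= len(self._chunks.popleft())

    def window(self) -> List[float]:
        """Return the last max_rows rows (fewer if not enough have arrived)."""
        rows = [value for chunk in self._chunks for value in chunk]
        return rows[-self.max_rows:]


w = SlidingChunks(max_rows=4)
w.append([1.0, 2.0, 3.0])
w.append([4.0, 5.0])
w.append([6.0, 7.0, 8.0])
print(w.window())
```

Keeping whole chunks and trimming only when the window is read is one possible design choice; it avoids rewriting buffers on every append at the cost of holding slightly more than `max_rows` rows.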

Re: Arrow as a streaming format

2020-09-10 Thread Pedro Silva
Hi Micah, Thank you for your reply and the links; the threads were quite interesting. You are right, I opened the Flink issue regarding Arrow support to understand whether it was on their roadmap to take a look at. My use-case is processing a stream of events (or rows if you will) to compute ~100

Re: Arrow as a streaming format

2020-09-09 Thread Mark Farnan
+1 on this also. As per previous questions, this is something I am also looking into. With IIoT real-time streaming, it can be as low as one data point per 'message' / block / packet etc., or at best one 'row', i.e. 1 second streaming sensor data, or faster, which also has a 1 second latency / u

Re: Arrow as a streaming format

2020-09-09 Thread Fan Liya
+1 for introducing Arrow in stream processing, as we have made some attempts at this. IMO, the metadata overhead is not likely to be a problem. If the streaming data has a high arrival rate, we can compensate for this with a large batch size without impacting the response time, while if
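
The batch-size trade-off described above can be sketched in plain Python. This is an illustration under stated assumptions, not Arrow code: `MicroBatcher`, `max_rows`, and `max_delay_s` are hypothetical names, and the delay bound for slow streams is an assumption, since the message is cut off before it says how that case is handled. Rows are buffered until either the batch is full (high arrival rate) or the oldest buffered row has waited too long (low arrival rate).

```python
import time
from typing import Any, Callable, List, Optional


class MicroBatcher:
    """Hypothetical helper: groups rows into batches by size or by age."""

    def __init__(self, max_rows: int, max_delay_s: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.max_rows = max_rows
        self.max_delay_s = max_delay_s
        self._clock = clock  # injectable so the behaviour can be tested
        self._rows: List[Any] = []
        self._first_arrival: Optional[float] = None

    def add(self, row: Any) -> Optional[List[Any]]:
        """Buffer one row; return a batch if a flush is due, else None."""
        now = self._clock()
        if not self._rows:
            self._first_arrival = now
        self._rows.append(row)
        full = len(self._rows) >= self.max_rows
        stale = now - self._first_arrival >= self.max_delay_s
        # The age check only runs when a row arrives; a real system would
        # also need a timer to flush a stale batch on an idle stream.
        return self.flush() if full or stale else None

    def flush(self) -> List[Any]:
        """Return the buffered rows and start a new, empty batch."""
        batch, self._rows = self._rows, []
        self._first_arrival = None
        return batch
```

With a high arrival rate the size limit is hit quickly, so large batches add little latency; with a low rate the delay limit bounds the response time at the cost of small batches.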

Re: Arrow as a streaming format

2020-09-04 Thread Micah Kornfield
Hi Pedro, I think the answer is that it likely depends. The main trade-off in using Arrow in a streaming process is the high metadata overhead if you have very few rows. There have been prior discussions on the mailing list about row-based and streaming that might be useful [1][2] in expanding on the

Re: Arrow as a streaming format

2020-09-04 Thread Radu Teodorescu
Hi Pedro, You should be able to use Flight for this: pack your subscription call in a DoGet and listen on the FlightDataStream for new data. I think you can control the granularity of your messages through the size of the record batches you are writing, but I am not a Flight developer so don’t t