I'll add my perspective (which hopefully doesn't confuse things more).
I think the fragment concept is a little too specific, and the key
abstraction here is "stream of homogeneous record batches". This
manifests in a few different flavors (synchronous, asynchronous,
push/pull) but we have some gene
Hi David,
This is a perfect answer. I was looking for the Fragment concept and the
issues you linked make it easy to follow.
I understand this is a really hard area with a ton of work: getting
chunking, prefetching and backpressure right, plus adding filter predicate and
other computation pushdown i
TL;DR yes, if and when all is said and done.
Breaking this down…
Substrait isn't really relevant here. It's a format for serializing a query
that's agnostic to whatever's actually generating or executing that query.
But if you have a Substrait plan, that can get converted by the Arrow C++ Que
Hi James,
Your answer helps, yes.
My question is whether I will be able to join two datasets (producing a new
dataset) in a streaming way or do I have to fetch the whole response and
keep it in memory?
So if my local node has memory constraints, will it be able to stream data
from an Apache Flight
Hey Adam,
Good question. There are outstanding JIRAs to integrate Flight [1] and HTTP/FTP
[2] into Datasets/Filesystems. There are also some JIRAs about various RDBMSes
[3] that could perhaps also be viewed through a Datasets lens.
Note that this work all proceeds in layers, e.g. it's the C++ qu
Hi Adam,
Arrow Flight can be used to provide an RPC framework that returns datasets
(sent over the wire as Arrow buffers) and exposes them from a FlightClient
as Arrow RecordBatches without serialization. Is this what you mean by
remote datasets?
Arrow Flight SQL is an application layer built on t