Re: Remote datasets

2022-04-12 Thread Weston Pace
> So if my local node has memory constraints, will it be able to stream data from an Apache Flight datasource and stream it back to a different Apache Flight target? If the answer is yes, is it because there will be a Remote Dataset c

Re: Remote datasets

2022-04-12 Thread Adam Lippai
> Adam Lippai > On Tue, Apr 12, 2022 at 4:14 PM James Duong wrote: >> Hi Adam, >> Arrow Flight can be used to provide an RPC framework that returns datasets (sent over the wire as ar

Re: Remote datasets

2022-04-12 Thread David Li
sed to provide an RPC framework that returns datasets (sent over the wire as arrow buffers) and exposes them from a FlightClient as Arrow RecordBatches without serialization. Is this what you mean by remote datasets? Arrow Flight SQL is an application layer

Re: Remote datasets

2022-04-12 Thread Adam Lippai
> Hi Adam, Arrow Flight can be used to provide an RPC framework that returns datasets (sent over the wire as arrow buffers) and exposes them from a FlightClient as Arrow RecordBatches without serialization. Is this what you mean by remote datasets? Arrow Flight SQL is an ap

Re: Remote datasets

2022-04-12 Thread David Li
pr 12, 2022, at 15:51, Adam Lippai wrote: > Hi, I saw really nice features like groupby and join developed recently. I like how Dataset is supported for joins and how streamed processing is gaining momentum in Arrow. Does Apache Arrow have the concept of remote datase

Re: Remote datasets

2022-04-12 Thread James Duong
Hi Adam, Arrow Flight can be used to provide an RPC framework that returns datasets (sent over the wire as arrow buffers) and exposes them from a FlightClient as Arrow RecordBatches without serialization. Is this what you mean by remote datasets? Arrow Flight SQL is an application layer built on

Remote datasets

2022-04-12 Thread Adam Lippai
Hi, I saw really nice features like groupby and join developed recently. I like how Dataset is supported for joins and how streamed processing is gaining momentum in Arrow. Does Apache Arrow have the concept of remote datasets, e.g. using Arrow Flight? Or will this happen directly using S3 and
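
For the streaming question raised in this thread (reading from one Flight endpoint and writing to another on a memory-constrained node), here is a minimal sketch of a batch-at-a-time relay. It is not taken from the thread. It assumes the two clients behave like pyarrow.flight.FlightClient objects, that is, do_get returns an iterable reader with a schema attribute and do_put returns a (writer, metadata reader) pair. The function name relay_stream and the host names and ticket in the usage comment are hypothetical.

```python
def relay_stream(source, target, ticket, descriptor):
    """Copy a Flight stream from `source` to `target` one batch at a time.

    `source` and `target` are assumed to expose the pyarrow.flight.FlightClient
    methods do_get and do_put. Only the batch currently being forwarded is
    referenced here, so the relay itself does not accumulate the dataset.
    Returns the number of record batches forwarded.
    """
    reader = source.do_get(ticket)
    writer, _metadata_reader = target.do_put(descriptor, reader.schema)
    forwarded = 0
    try:
        for chunk in reader:
            # A chunk may carry only application metadata and no batch.
            if chunk.data is not None:
                writer.write_batch(chunk.data)
                forwarded += 1
    finally:
        # Close the upload stream even if reading fails part-way.
        writer.close()
    return forwarded


# Hypothetical usage with real Flight endpoints (placeholder hosts and ticket):
#
#   import pyarrow.flight as flight
#   source = flight.connect("grpc://source-host:8815")
#   target = flight.connect("grpc://target-host:8815")
#   relay_stream(source, target,
#                flight.Ticket(b"example-ticket"),
#                flight.FlightDescriptor.for_path("copied-dataset"))
```

Whether memory stays bounded in practice also depends on batch sizes chosen by the source server and on transport buffering, which this sketch does not control.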