Hi Julien,

Having standard RPC/REST messaging protocols for systems to implement sounds like a great idea to me. Some systems might choose to pack Arrow files or streams into a Protocol Buffer or Thrift message, but it would be good to have a "native" protocol, in particular for the streaming file format.
I'd be happy to provide feedback on a spec for this and to help solicit input from other projects that may use it.

Thanks,
Wes

On Wed, Mar 15, 2017 at 11:02 PM, Julien Le Dem <jul...@dremio.com> wrote:
> We’re working on finalizing a few types and writing the integration tests
> that go with them.
>
> At this point we have a solid foundation in the Arrow project.
>
> As a next step I’m going to look into adding an Arrow RPC/REST interface
> dedicated to data retrieval.
>
> We had several discussions about this and I’m going to formalize a spec and
> ask for review.
>
> This Arrow-based data access interface is intended to be used by systems
> that need access to data for processing (SQL engines, processing
> frameworks, …) and implemented by storage layers or really anything that
> can produce data (including processing frameworks returning result sets, for
> example). That will greatly simplify integration between the many actors in
> each category.
>
> The basic premise is to be able to fetch data in Arrow format while
> benefitting from zero-overhead serialization/deserialization and getting
> the data in columnar format.
>
> Some obvious topics that come to mind:
>
> - How do we identify a dataset?
> - How do we specify projections?
> - What about predicate pushdowns, or parameters in general?
> - What underlying protocol should we use? HTTP/2?
> - Push vs. pull?
> - Build a reference implementation (suggestions?)
>
> Potential candidates for using this:
>
> - to consume data or to expose result sets: Drill, Hive, Presto, Impala,
>   Spark, RecordService, ...
> - as a server: Kudu, HBase, Cassandra, …
>
> --
> Julien
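To make the open questions above (dataset identity, projections, predicate pushdown, push vs. pull) a little more concrete, here is a minimal sketch of one possible pull-based request/response shape. Everything in it is an assumption for illustration: `FetchRequest`, `InMemoryServer`, `fetch`, `equals` and `batch_size` are hypothetical names, not part of Arrow or of any proposed spec, and the "batches" are plain dicts of column lists standing in for Arrow record batches. It only shows the division of labor (the server applies the projection and pushed-down filters, and the client pulls columnar batches), not a wire protocol.

```python
from dataclasses import dataclass, field
from typing import Dict, Iterator, List, Optional

# Hypothetical stand-in for an Arrow record batch: column name -> values.
# All value lists in one batch have the same length (columnar layout).
Batch = Dict[str, List[object]]


@dataclass
class FetchRequest:
    """Hypothetical request shape; not part of any Arrow spec."""
    dataset: str                           # dataset identity: here just a name
    columns: Optional[List[str]] = None    # projection; None means all columns
    # Predicate pushdown, modeled only as column == value equality filters.
    equals: Dict[str, object] = field(default_factory=dict)
    batch_size: int = 1024                 # max rows per returned batch


class InMemoryServer:
    """Toy 'storage layer' serving columnar batches on demand (pull model)."""

    def __init__(self, datasets: Dict[str, Batch]):
        self.datasets = datasets

    def fetch(self, req: FetchRequest) -> Iterator[Batch]:
        table = self.datasets[req.dataset]
        names = req.columns if req.columns is not None else list(table)
        n_rows = len(next(iter(table.values()), []))
        # Evaluate the pushed-down predicates on the server, so rows that
        # fail them are never sent to the client.
        keep = [i for i in range(n_rows)
                if all(table[c][i] == v for c, v in req.equals.items())]
        # Lazily yield one projected, columnar batch per batch_size rows;
        # the client "pulls" the next batch by advancing the iterator.
        for start in range(0, len(keep), req.batch_size):
            rows = keep[start:start + req.batch_size]
            yield {name: [table[name][i] for i in rows] for name in names}


server = InMemoryServer({"events": {"id": [1, 2, 3, 4],
                                    "kind": ["a", "b", "a", "a"],
                                    "score": [10, 20, 30, 40]}})
req = FetchRequest("events", columns=["id", "score"],
                   equals={"kind": "a"}, batch_size=2)
batches = list(server.fetch(req))
# batches == [{"id": [1, 3], "score": [10, 30]}, {"id": [4], "score": [40]}]
```

A push model would instead have the server drive delivery (e.g. a callback or a streamed response); the generator here is just the simplest way to express pull in a few lines.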