If your use case is SQL RPC, then you are getting close to Avatica's territory. Avatica[1] is a protocol for implementing language-independent JDBC and ODBC stacks.

Now, I agree that many ODBC implementations are inefficient. Some ODBC stacks make more round trips than necessary, and do more copying than necessary. In Avatica we are trying to squeeze out those inefficiencies, for example by minimizing the number of RPCs. We would also love to use Arrow as the data format and reduce copying on both the server side and the client side.

But conversely, people who start with a simple RPC use case - send SQL, get the results - may soon find themselves needing a more complex protocol - authentication, sessions, prepared statements, bind variables, getting metadata before executing, cursors, skipping over rows. In other words, they find themselves wanting substantial portions of an ODBC or JDBC driver. You could find yourselves building Avatica all over again. We saw all of this happen in XML-RPC, and it was sad.

I suggest keeping Flight for the truly simple use case and, for the more complex use case, investing effort in putting Arrow into Avatica. We are always happy to welcome new contributors.

Julian

[1] https://calcite.apache.org/avatica/docs/

On Thu, Aug 16, 2018 at 7:56 AM Wes McKinney <wesmck...@gmail.com> wrote:
>
> To give some extra color on my personal motivation for interest in Arrow
> Flight:
>
> Systems that expose databases on a network frequently send data very
> slowly. For example, ODBC is in general extremely slow. What I would
> like to see is servers that can expose a "sql" action type.
>
> So, in consideration of the protocol as it stands now [1], example
> session goes like this:
>
> * Client issues ListActions -> returns one or more ActionType, suppose
>   one is "sql"
> * Client issues DoAction with type sql and body "select * from $TABLE"
> * Server returns stream URI for query result set and Ticket in the Result
>   proto
> * Client issues GetFlightInfo using URI to obtain schema of result set
> * Client issues DoGet with ticket returned by sql DoAction
>
> There's some possible refinements to this workflow; for example, if we
> wanted to enable DoAction to return more structured results (e.g. to
> avoid the extra GetFlightInfo RPC to get the schema of the query
> result set)
>
> - Wes
>
> [1]:
> https://github.com/apache/arrow/blob/c52897274035f8b5192d7647b9711c68d9c54ccc/java/flight/src/main/protobuf/flight.proto
>
> On Thu, Aug 16, 2018 at 10:29 AM, Jacques Nadeau <jacq...@apache.org> wrote:
> > I'm out of town this week (vacation) and will be reviewing your feedback
> > next week. Thanks for the feedback!
> >
> > On Thu, Aug 9, 2018, 8:45 PM Wes McKinney <wesmck...@gmail.com> wrote:
> >
> >> hi folks,
> >>
> >> I left some feedback on this PR. If others could take a look
> >> (particularly at the .proto service definition) that would be useful.
> >>
> >> We should decide on an approach to getting multiple production-worthy
> >> Flight/RPC implementations ready to go. It would be a good goal to
> >> deliver (end-to-end send/receive data between Python and Java, or
> >> Python and other Python processes) in the next couple releases.
> >>
> >> - Wes
> >>
> >> On Wed, May 30, 2018 at 12:44 PM, Jacques Nadeau <jacq...@apache.org> wrote:
> >> > Correct, I'm maintaining standard protobuf encoding so a consumer that
> >> > doesn't go byte by byte can still consume/produce the messages.
> >> >
> >> > More impls: for sure.
> >> >
> >> > On Wed, May 30, 2018 at 9:01 AM, Wes McKinney <wesmck...@gmail.com> wrote:
> >> >
> >> >> I see; looking more closely I see you've sidestepped the standard
> >> >> Protobuf serialization to write the stream as tagged components:
> >> >>
> >> >> https://github.com/apache/arrow/compare/master...jacques-n:flight#diff-02cfc9235e22653fce8a7636c9f95507R241
> >> >>
> >> >> and then reading the fields of the message tag by tag
> >> >>
> >> >> https://github.com/apache/arrow/compare/master...jacques-n:flight#diff-02cfc9235e22653fce8a7636c9f95507R159
> >> >>
> >> >> Would it be correct that if a GRPC implementation doesn't provide
> >> >> sufficient access to the byte stream (or if it doesn't care enough
> >> >> about zero copy) that you could allow GRPC to return an instance of
> >> >> the FlightData structure?
> >> >>
> >> >> I expect we'd want to see a few interoperable implementations (I
> >> >> suggest Java, C++, Go) to harden the fine details.
> >> >>
> >> >> - Wes
> >> >>
> >> >> On Mon, May 28, 2018 at 3:32 PM, Jacques Nadeau <jacq...@apache.org> wrote:
> >> >> > Cutting through the layers of GRPC will be a per language approach
> >> >> > thing. Assuming that each GRPC language implementation does a good
> >> >> > job of separating message encapsulation from the base library, this
> >> >> > should be straight-forward-ish. Hope improves around this as I see
> >> >> > creation of non-protobuf protocols built on top of the base GRPC [1].
> >> >> > How to do this in each language will probably take time looking at
> >> >> > the GRPC internals for that language but can be a secondary step
> >> >> > once you get the protocol working (you can just pay for extra copies
> >> >> > until then).
> >> >> >
> >> >> > In my Java approach I believe I do one read copy and zero write copies
> >> >> > (needs more testing) which was my target.
> >> >> > (Getting to zero-copy on read means a lot more complexity because
> >> >> > your socket-reading has to be protocol aware: even our bespoke
> >> >> > layer in Dremio doesn't try to do that. I'd guess KRPC does the
> >> >> > same but haven't reviewed the code to confirm.)
> >> >> >
> >> >> > Will try to get some more slides/readme and a proper proposed patch
> >> >> > up soon.
> >> >> >
> >> >> > [1] https://grpc.io/blog/flatbuffers
> >> >> >
> >> >> > On Mon, May 28, 2018 at 1:05 AM, Wes McKinney <wesmck...@gmail.com> wrote:
> >> >> >
> >> >> >> hey Jacques,
> >> >> >>
> >> >> >> This is great news, I look forward to digging into this. My biggest
> >> >> >> initial question is the Protobuf encapsulation, specifically:
> >> >> >>
> >> >> >> https://github.com/jacques-n/arrow/blob/flight/java/flight/src/main/protobuf/flight.proto#L99
> >> >> >>
> >> >> >> My understanding of Protocol Buffers is that on read, the "data_body"
> >> >> >> memory would be copied out of the serialized protobuf that came
> >> >> >> across the wire. Your comment in the .proto says this "comes last in
> >> >> >> the definition to help with sidecar patterns" -- my read is that it
> >> >> >> would be up to us to do our own sidecar implementation, similar to
> >> >> >> how Apache Kudu has zero-copy sidecars in their KRPC system [1] (the
> >> >> >> comment there describes pretty much exactly the problem we have). I
> >> >> >> saw that you also replied on a GRPC thread about this issue [2].
> >> >> >> Could you summarize what (if anything) stands in the way to get
> >> >> >> zero-copy on write and read?
> >> >> >>
> >> >> >> - Wes
> >> >> >>
> >> >> >> [1]: https://github.com/apache/kudu/blob/master/src/kudu/rpc/rpc_sidecar.h#L34
> >> >> >> [2]: https://github.com/grpc/grpc-java/issues/1054#issuecomment-391692087
> >> >> >>
> >> >> >> On Thu, May 24, 2018 at 6:57 AM, Jacques Nadeau <jacq...@apache.org> wrote:
> >> >> >> > FYI, if you want to see an example server you can run with a GRPC
> >> >> >> > generated client, you can run the ExampleFlightServer located at
> >> >> >> > [1]. Very basic 'test' with that class and client is located at [2].
> >> >> >> >
> >> >> >> > [1]
> >> >> >> > https://github.com/jacques-n/arrow/tree/flight/java/flight/src/main/java/org/apache/arrow/flight/example
> >> >> >> > [2]
> >> >> >> > https://github.com/jacques-n/arrow/blob/flight/java/flight/src/test/java/org/apache/arrow/flight/example/TestExampleServer.java
> >> >> >> >
> >> >> >> > On Thu, May 24, 2018 at 11:51 AM, Jacques Nadeau <jacq...@apache.org> wrote:
> >> >> >> >
> >> >> >> >> Hey All,
> >> >> >> >>
> >> >> >> >> I used my Strata talk today as a forcing function to make
> >> >> >> >> additional progress on a GRPC-based Arrow RPC protocol [1]. I’m
> >> >> >> >> calling it “Apache Arrow Flight”. You can take a look at the work
> >> >> >> >> here [2]. I’ll work to clean up my work and explain my thoughts
> >> >> >> >> about the protocol in the coming days. High-level: use protobuf as
> >> >> >> >> an encapsulation format so that any client that is supported in
> >> >> >> >> GRPC will work. However, we can optimize the read/write path for
> >> >> >> >> targeted languages and hand control the
> >> >> >> >> serialization/deserialization and memory handling. (I did that in
> >> >> >> >> this Java patch [3][4][5].)
> >> >> >> >> I also looked at starting to use GRPC generated bindings within
> >> >> >> >> Python but it looks like some glue code may be needed in the C++
> >> >> >> >> layer since Python delegates down frequently. I also am still
> >> >> >> >> trying to understand GRPC back-pressure patterns and whether the
> >> >> >> >> protocol realistically needs to change to cover real-world high
> >> >> >> >> performance use cases.
> >> >> >> >>
> >> >> >> >> I’ll send out some slides about the ideas and update README, etc.
> >> >> >> >> soon.
> >> >> >> >>
> >> >> >> >> Thanks,
> >> >> >> >> Jacques
> >> >> >> >>
> >> >> >> >> [1] https://github.com/jacques-n/arrow/blob/flight/java/flight/src/main/protobuf/flight.proto
> >> >> >> >> [2] http://github.com/jacques-n/arrow/
> >> >> >> >> [3] https://github.com/jacques-n/arrow/tree/flight/java/flight/src/main/java/org/apache/arrow/flight/grpc
> >> >> >> >> [4] https://github.com/jacques-n/arrow/blob/flight/java/flight/src/main/java/org/apache/arrow/flight/ArrowMessage.java#L253
> >> >> >> >> [5] https://github.com/jacques-n/arrow/blob/flight/java/flight/src/main/java/org/apache/arrow/flight/ArrowMessage.java#L185
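
For anyone skimming the thread, the example session Wes describes (ListActions, DoAction with type "sql", GetFlightInfo on the returned URI, DoGet with the returned Ticket) could be sketched roughly as below. This is only a toy in-memory model to make the sequence of calls concrete: ToyServer, SqlResult, run_sql, the toy:// URI scheme and the "trips" table are all hypothetical names invented for illustration, and nothing here is the actual Flight API or involves gRPC or Arrow.

```python
# Toy, in-memory model of the example session from the thread. Every class and
# method name here is hypothetical; this is NOT the Arrow Flight API, and no
# gRPC or Arrow code is involved.
import uuid
from dataclasses import dataclass


@dataclass(frozen=True)
class SqlResult:
    """Stand-in for the Result proto: a stream URI plus an opaque ticket."""
    uri: str
    ticket: bytes


class ToyServer:
    def __init__(self, tables):
        self._tables = tables    # table name -> (schema, rows)
        self._by_uri = {}        # stream URI -> schema
        self._by_ticket = {}     # ticket -> rows

    def list_actions(self):
        return ["sql"]

    def do_action(self, action_type, body):
        if action_type != "sql":
            raise ValueError(f"unsupported action type: {action_type}")
        # Toy "planner": only understands `select * from <table>`.
        prefix = "select * from "
        if not body.lower().startswith(prefix):
            raise ValueError("toy server only handles 'select * from <table>'")
        table = body[len(prefix):].strip()
        schema, rows = self._tables[table]
        uri = f"toy://results/{uuid.uuid4().hex}"
        ticket = uuid.uuid4().bytes
        self._by_uri[uri] = schema
        self._by_ticket[ticket] = rows
        return SqlResult(uri=uri, ticket=ticket)

    def get_flight_info(self, uri):
        return self._by_uri[uri]

    def do_get(self, ticket):
        # Stream the rows back one at a time.
        yield from self._by_ticket[ticket]


def run_sql(server, query):
    """Walk the steps of the example session; returns (schema, rows)."""
    if "sql" not in server.list_actions():         # ListActions
        raise RuntimeError("server does not expose a 'sql' action")
    result = server.do_action("sql", query)        # DoAction -> URI + Ticket
    schema = server.get_flight_info(result.uri)    # GetFlightInfo (schema)
    rows = list(server.do_get(result.ticket))      # DoGet (result stream)
    return schema, rows


server = ToyServer({"trips": (["id", "fare"], [(1, 9.5), (2, 12.0)])})
schema, rows = run_sql(server, "select * from trips")
```

In this sketch the get_flight_info call is the extra round trip that Wes suggests could be avoided if DoAction returned a more structured result that already carried the schema.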