To give some extra color on my personal motivation for interest in Arrow Flight:

Systems that expose databases on a network frequently send data very
slowly. For example, ODBC is in general extremely slow. What I would
like to see is servers that can expose a "sql" action type. So, in
consideration of the protocol as it stands now [1], an example session
goes like this:

* Client issues ListActions -> returns one or more ActionType, suppose
  one is "sql"
* Client issues DoAction with type sql and body "select * from $TABLE"
* Server returns stream URI for query result set and Ticket in the
  Result proto
* Client issues GetFlightInfo using URI to obtain schema of result set
* Client issues DoGet with ticket returned by sql DoAction

There are some possible refinements to this workflow; for example, we
could enable DoAction to return more structured results (e.g. to avoid
the extra GetFlightInfo RPC needed to get the schema of the query
result set).

- Wes

[1]: https://github.com/apache/arrow/blob/c52897274035f8b5192d7647b9711c68d9c54ccc/java/flight/src/main/protobuf/flight.proto

On Thu, Aug 16, 2018 at 10:29 AM, Jacques Nadeau <jacq...@apache.org> wrote:
> I'm out of town this week (vacation) and will be reviewing your feedback
> next week. Thanks for the feedback!
>
> On Thu, Aug 9, 2018, 8:45 PM Wes McKinney <wesmck...@gmail.com> wrote:
>
>> hi folks,
>>
>> I left some feedback on this PR. If others could take a look
>> (particularly at the .proto service definition) that would be useful.
>>
>> We should decide on an approach to getting multiple production-worthy
>> Flight/RPC implementations ready to go. It would be a good goal to
>> deliver (end-to-end send/receive data between Python and Java, or
>> Python and other Python processes) in the next couple releases.
>>
>> - Wes
>>
>> On Wed, May 30, 2018 at 12:44 PM, Jacques Nadeau <jacq...@apache.org>
>> wrote:
>> > Correct, I'm maintaining standard protobuf encoding so a consumer that
>> > doesn't go byte by byte can still consume/produce the messages.
>> >
>> > More impls: for sure.
>> >
>> > On Wed, May 30, 2018 at 9:01 AM, Wes McKinney <wesmck...@gmail.com> wrote:
>> >
>> >> I see; looking more closely I see you've sidestepped the standard
>> >> Protobuf serialization to write the stream as tagged components:
>> >>
>> >> https://github.com/apache/arrow/compare/master...jacques-n:flight#diff-02cfc9235e22653fce8a7636c9f95507R241
>> >>
>> >> and then reading the fields of the message tag by tag
>> >>
>> >> https://github.com/apache/arrow/compare/master...jacques-n:flight#diff-02cfc9235e22653fce8a7636c9f95507R159
>> >>
>> >> Would it be correct that if a GRPC implementation doesn't provide
>> >> sufficient access to the byte stream (or if it doesn't care enough
>> >> about zero copy) that you could allow GRPC to return an instance of
>> >> the FlightData structure?
>> >>
>> >> I expect we'd want to see a few interoperable implementations (I
>> >> suggest Java, C++, Go) to harden the fine details.
>> >>
>> >> - Wes
>> >>
>> >> On Mon, May 28, 2018 at 3:32 PM, Jacques Nadeau <jacq...@apache.org>
>> >> wrote:
>> >> > Cutting through the layers of GRPC will be a per language approach thing.
>> >> > Assuming that each GRPC language implementation does a good job of
>> >> > separating message encapsulation from the base library, this should be
>> >> > straight-forward-ish. Hope improves around this as I see creation of
>> >> > non-protobuf protocols built on top of the base GRPC [1]. How to do this in
>> >> > each language will probably take time looking at the GRPC internals for
>> >> > that language but can be a secondary step once you get the protocol working
>> >> > (you can just pay for extra copies until then).
>> >> >
>> >> > In my Java approach I believe I do one read copy and zero write copies
>> >> > (needs more testing) which was my target.
>> >> > (Getting to zero-copy on read
>> >> > means a lot more complexity because your socket-reading has to be protocol
>> >> > aware: even our bespoke layer in Dremio doesn't try to do that. I'd guess
>> >> > KRPC does the same but haven't reviewed the code to confirm.)
>> >> >
>> >> > Will try to get some more slides/readme and a proper proposed patch up soon.
>> >> >
>> >> > [1] https://grpc.io/blog/flatbuffers
>> >> >
>> >> > On Mon, May 28, 2018 at 1:05 AM, Wes McKinney <wesmck...@gmail.com> wrote:
>> >> >
>> >> >> hey Jacques,
>> >> >>
>> >> >> This is great news, I look forward to digging into this. My biggest
>> >> >> initial question is the Protobuf encapsulation, specifically:
>> >> >>
>> >> >> https://github.com/jacques-n/arrow/blob/flight/java/flight/src/main/protobuf/flight.proto#L99
>> >> >>
>> >> >> My understanding of Protocol Buffers is that on read, the "data_body"
>> >> >> memory would be copied out of the serialized protobuf that came across
>> >> >> the wire. Your comment in the .proto says this "comes last in the
>> >> >> definition to help with sidecar patterns" -- my read is that it would
>> >> >> be up to us to do our own sidecar implementation, similar to how
>> >> >> Apache Kudu has zero-copy sidecars in their KRPC system [1] (the
>> >> >> comment there describes pretty much exactly the problem we have). I
>> >> >> saw that you also replied on a GRPC thread about this issue [2]. Could
>> >> >> you summarize what (if anything) stands in the way to get zero-copy on
>> >> >> write and read?
>> >> >>
>> >> >> - Wes
>> >> >>
>> >> >> [1]: https://github.com/apache/kudu/blob/master/src/kudu/rpc/rpc_sidecar.h#L34
>> >> >> [2]: https://github.com/grpc/grpc-java/issues/1054#issuecomment-391692087
>> >> >>
>> >> >> On Thu, May 24, 2018 at 6:57 AM, Jacques Nadeau <jacq...@apache.org>
>> >> >> wrote:
>> >> >> > FYI, if you want to see an example server you can run with a GRPC generated
>> >> >> > client, you can run the ExampleFlightServer located at [1]. Very basic
>> >> >> > 'test' with that class and client is located at [2].
>> >> >> >
>> >> >> > [1]
>> >> >> > https://github.com/jacques-n/arrow/tree/flight/java/flight/src/main/java/org/apache/arrow/flight/example
>> >> >> > [2]
>> >> >> > https://github.com/jacques-n/arrow/blob/flight/java/flight/src/test/java/org/apache/arrow/flight/example/TestExampleServer.java
>> >> >> >
>> >> >> > On Thu, May 24, 2018 at 11:51 AM, Jacques Nadeau <jacq...@apache.org> wrote:
>> >> >> >
>> >> >> >> Hey All,
>> >> >> >>
>> >> >> >> I used my Strata talk today as a forcing function to make additional
>> >> >> >> progress on a GRPC-based Arrow RPC protocol [1]. I’m calling it “Apache
>> >> >> >> Arrow Flight”. You can take a look at the work here [2]. I’ll work to clean
>> >> >> >> up my work and explain my thoughts about the protocol in the coming days.
>> >> >> >> High-level: use protobuf as an encapsulation format so that any client that
>> >> >> >> is supported in GRPC will work. However, we can optimize the read/write
>> >> >> >> path for targeted languages and hand control the
>> >> >> >> serialization/deserialization and memory handling. (I did that in this Java
>> >> >> >> patch [3][4][5].)
>> >> >> >> I also looked at starting to use GRPC generated bindings
>> >> >> >> within Python but it looks like some glue code may be needed in the C++
>> >> >> >> layer since Python delegates down frequently. I also am still trying to
>> >> >> >> understand GRPC back-pressure patterns and whether the protocol
>> >> >> >> realistically needs to change to cover real-world high performance use
>> >> >> >> cases.
>> >> >> >>
>> >> >> >> I’ll send out some slides about the ideas and update README, etc. soon.
>> >> >> >>
>> >> >> >> Thanks,
>> >> >> >> Jacques
>> >> >> >>
>> >> >> >> [1] https://github.com/jacques-n/arrow/blob/flight/java/flight/src/main/protobuf/flight.proto
>> >> >> >> [2] http://github.com/jacques-n/arrow/
>> >> >> >> [3] https://github.com/jacques-n/arrow/tree/flight/java/flight/src/main/java/org/apache/arrow/flight/grpc
>> >> >> >> [4] https://github.com/jacques-n/arrow/blob/flight/java/flight/src/main/java/org/apache/arrow/flight/ArrowMessage.java#L253
>> >> >> >> [5] https://github.com/jacques-n/arrow/blob/flight/java/flight/src/main/java/org/apache/arrow/flight/ArrowMessage.java#L185
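
Illustrative sketch of the "sql" action session outlined at the top of
this message (ListActions, DoAction, GetFlightInfo, DoGet). This is a
toy, in-memory stand-in written in plain Python, not the Flight API or
any existing client library. The class ToyFlightServer, the SqlResult
payload, the "flight://results/N" URI scheme, the ticket format and the
helper run_sql_session are all hypothetical names made up for
illustration, and the "query engine" only understands
"select * from <table>".

```python
from dataclasses import dataclass, field


@dataclass
class SqlResult:
    """Hypothetical payload of the Result returned by the "sql" action."""
    stream_uri: str
    ticket: bytes


@dataclass
class ToyFlightServer:
    """In-memory stand-in for a Flight server (assumption, not the real API)."""
    tables: dict
    _streams: dict = field(default_factory=dict)

    def list_actions(self):
        # ListActions: advertise the action types this server supports.
        return ["sql"]

    def do_action(self, action_type, body):
        # DoAction: run the query, register a result stream, and hand back
        # a stream URI plus a ticket for fetching it later.
        if action_type != "sql":
            raise ValueError(f"unsupported action type: {action_type}")
        prefix = "select * from "
        if not body.lower().startswith(prefix):
            raise ValueError("toy server only supports 'select * from <table>'")
        rows = self.tables[body[len(prefix):].strip()]
        stream_id = len(self._streams)
        uri = f"flight://results/{stream_id}"   # made-up URI scheme
        ticket = f"ticket-{stream_id}".encode()  # made-up ticket format
        self._streams[uri] = (ticket, rows)
        return SqlResult(stream_uri=uri, ticket=ticket)

    def get_flight_info(self, stream_uri):
        # GetFlightInfo: return the schema of the result set, modeled here
        # as a mapping of column name to Python type name.
        _, rows = self._streams[stream_uri]
        if not rows:
            return {}
        return {name: type(value).__name__ for name, value in rows[0].items()}

    def do_get(self, ticket):
        # DoGet: stream back the rows registered under the given ticket.
        for stored_ticket, rows in self._streams.values():
            if stored_ticket == ticket:
                yield from rows
                return
        raise KeyError("unknown ticket")


def run_sql_session(server, query):
    """Walk through the five steps of the example session."""
    if "sql" not in server.list_actions():               # step 1
        raise RuntimeError("server does not expose a 'sql' action")
    result = server.do_action("sql", query)              # steps 2 and 3
    schema = server.get_flight_info(result.stream_uri)   # step 4
    rows = list(server.do_get(result.ticket))            # step 5
    return schema, rows


if __name__ == "__main__":
    demo = ToyFlightServer(tables={"t": [{"id": 1, "name": "a"}]})
    print(run_sql_session(demo, "select * from t"))
```

In this sketch the refinement mentioned above would amount to adding a
schema field to SqlResult, so that step 4 (the extra GetFlightInfo
round trip) could be skipped.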