I think we could probably expose the oneof behavior without exposing the Protobuf functions. On the Any... hmm. I guess we could expose it as two fields: type and data. Then users could use it for whatever they like, but if people wanted to treat it as an Any, it would work. (Basically, a user could easily use Any with it, but they could also use any other mechanism.) At least in Java, the Any concepts are pretty simple/DIY. Are other language bindings less DIY?
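To make the two-field idea a bit more concrete, here is a rough Python sketch. Everything in it is hypothetical: `DataMessage`, `ControlMessage`, and `dispatch` are names made up for illustration and are not part of Flight or Protobuf. It only shows the shape of "either a data message, or a (type, data) pair that the default implementation ignores".

```python
from dataclasses import dataclass
from typing import Callable, Dict, Optional, Union


@dataclass(frozen=True)
class DataMessage:
    """Hypothetical stand-in for a FlightData payload (not the real class)."""
    body: bytes
    app_metadata: bytes = b""


@dataclass(frozen=True)
class ControlMessage:
    """Any-like control message exposed as two plain fields."""
    type: str    # application-chosen tag, e.g. a type URL
    data: bytes  # opaque payload; the transport never interprets it


# The "oneof": a message on the stream is exactly one of the two kinds.
ExchangeMessage = Union[DataMessage, ControlMessage]


def dispatch(
    message: ExchangeMessage,
    on_data: Callable[[DataMessage], object],
    control_handlers: Optional[Dict[str, Callable[[bytes], object]]] = None,
) -> object:
    """Route a message to the data handler or to a handler keyed by type."""
    if isinstance(message, DataMessage):
        return on_data(message)
    handler = (control_handlers or {}).get(message.type)
    if handler is None:
        # Default behavior: unrecognized control messages are a no-op.
        return None
    return handler(message.data)
```

Someone who wants Any semantics could put the type URL in `type` and the serialized message in `data`; someone who doesn't can put whatever they like in both.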
I'm *not* hardcore against the empty FlightData + metadata, but it just seemed a bit janky. Thinking about the control message/wrapper object thing, I wonder if we should redefine DoPut and DoGet to have the same property if we think it is a good idea...

On Wed, Oct 16, 2019 at 5:13 PM David Li <li.david...@gmail.com> wrote:
> I was definitely considering having control messages without data, and
> I thought that could be encoded by a FlightData with only app_metadata
> set. I think I understand your position now: FlightData should always
> carry (some) data (with optional metadata)?
>
> That makes sense to me, and is consistent with the documentation on
> FlightData in the Protobuf file. I was worried about having a
> redundant metadata field, but oneof prevents that from happening, and
> overall having a clear separation between data and control messages is
> cleaner.
>
> As for using Protobuf's Any: so far, we've refrained from exposing
> Protobuf by using bytes, would we want to change that now?
>
> Best,
> David
>
> On 10/16/19, Jacques Nadeau <jacq...@apache.org> wrote:
> > Hey David,
> >
> > RE: Async: I was trying to match the pattern we use for doget/doput
> > for async. Yes, more thinking java given java grpc's async always
> > pattern.
> >
> > On the comment around the FlightData, I think it is overloading the
> > message to use metadata for this. If I want to send a control message
> > independently of the data message, I would have to define something
> > like an empty flight data message that has custom metadata. Why not
> > support a container object with a oneof{FlightData, Any} in it instead
> > so users can add more data as desired. The default impl could be a
> > noop for the Any messages.
> >
> > On Tue, Oct 15, 2019 at 6:50 PM David Li <li.david...@gmail.com> wrote:
> >
> >> Hi Jacques,
> >>
> >> Thanks for the comments.
> >>
> >> - I do agree DoExchange is a better name!
> >> - FlightData already has metadata fields as a result of prior
> >>   proposals, so I don't think we need a new message to carry that kind
> >>   of information.
> >> - I like the suggestion of an async handler to handle incoming
> >>   messages as the fundamental API; it would actually be quite natural to
> >>   implement in Flight/Java. I will note that it's not possible in
> >>   C++/Python without spawning a thread, though. (In essence, gRPC-Java
> >>   is async-always and gRPC-C++ is sync-always.) There are experimental
> >>   C++ APIs that would let us do something similar to Java, but those are
> >>   only in relatively recent gRPC versions and are still under
> >>   development (contrary to the interceptor APIs which have been around
> >>   for quite a while).
> >>
> >> Thanks,
> >> David
> >>
> >> On 10/15/19, Jacques Nadeau <jacq...@apache.org> wrote:
> >> > I like it. Added some comments to the doc. Might be worth discussion
> >> > here depending on your thoughts.
> >> >
> >> > On Tue, Oct 15, 2019 at 7:11 AM David Li <li.david...@gmail.com> wrote:
> >> >
> >> >> Hey Ryan,
> >> >>
> >> >> Thanks for the comments.
> >> >>
> >> >> Concrete example: I've edited the doc to provide a Python strawman.
> >> >>
> >> >> Sync vs async: while I don't touch on it, you could interleave uploads
> >> >> and downloads if you were so inclined. Right now, synchronous APIs
> >> >> make this error-prone, e.g. if both client and server wait for each
> >> >> other due to an application logic bug. (gRPC doesn't give us the
> >> >> ability to have per-read timeouts, only an overall timeout.) As an
> >> >> example of this happening with DoPut, see ARROW-6063:
> >> >> https://issues.apache.org/jira/browse/ARROW-6063
> >> >>
> >> >> This is mostly tangential though; eventually we will want to design
> >> >> asynchronous APIs for Flight as a whole. A bidirectional stream like
> >> >> this (and like DoPut) just makes these pitfalls easier to run into.
> >> >>
> >> >> Using DoPut+DoGet: I discussed this in the proposal, but the main
> >> >> concern is that depending on how you deploy, two separate calls could
> >> >> get routed to different instances. Additionally, gRPC has some
> >> >> reconnection behaviors; if the server goes away in between the two
> >> >> calls, but it then restarts or there is another instance available,
> >> >> the client will happily reconnect to the new server without warning.
> >> >>
> >> >> Thanks,
> >> >> David
> >> >>
> >> >> On 10/15/19, Ryan Murray <rym...@dremio.com> wrote:
> >> >> > Hey David,
> >> >> >
> >> >> > I think this proposal makes a lot of sense. I like it and the
> >> >> > possibility of remote compute via arrow buffers. One thing that
> >> >> > would help me would be a concrete example of the API in a real life
> >> >> > use case. Also, what would the client experience be in terms of
> >> >> > sync vs async? Would the client block till the bidirectional call
> >> >> > returns, i.e. c = flight.vector_mult(a, b), or would the client
> >> >> > wait to be signaled that computation was done? If the latter, how
> >> >> > is that different from a DoPut then DoGet? I suppose that this
> >> >> > could be implemented without extending the RPC interface but
> >> >> > rather by a function/util?
> >> >> >
> >> >> > Best,
> >> >> >
> >> >> > Ryan
> >> >> >
> >> >> > On Sun, Oct 13, 2019 at 9:24 PM David Li <li.david...@gmail.com> wrote:
> >> >> >
> >> >> >> Hi all,
> >> >> >>
> >> >> >> We've been using Flight quite successfully so far, but we have
> >> >> >> identified a new use case on the horizon: being able to both send
> >> >> >> and retrieve Arrow data within a single RPC call.
> >> >> >> To that end, I've written up a proposal for a new RPC method:
> >> >> >>
> >> >> >> https://docs.google.com/document/d/1Hh-3Z0hK5PxyEYFxwVxp77jens3yAgC_cpp0TGW-dcw/edit?usp=sharing
> >> >> >>
> >> >> >> Please let me know if you can't view or comment on the document.
> >> >> >> I'd appreciate any feedback; I think this is a relatively
> >> >> >> straightforward addition - it is essentially "DoPutThenGet".
> >> >> >>
> >> >> >> This is a format change and would require a vote. I've decided to
> >> >> >> table the other format change I had proposed (on DoPut), as it
> >> >> >> doesn't functionally change Flight, just the interpretation of the
> >> >> >> semantics.
> >> >> >>
> >> >> >> Thanks,
> >> >> >> David
> >> >> >
> >> >> > --
> >> >> >
> >> >> > Ryan Murray | Principal Consulting Engineer
> >> >> >
> >> >> > +447540852009 | rym...@dremio.com
> >> >> >
> >> >> > <https://www.dremio.com/>
> >> >> > Check out our GitHub <https://www.github.com/dremio>, join our
> >> >> > community site <https://community.dremio.com/> & Download Dremio
> >> >> > <https://www.dremio.com/download>