I've updated the proposal.

On the subject of Protobuf Any vs bytes, and how to handle
errors/metadata, I still think using bytes is preferable:
- It doesn't require (conditionally) exposing or wrapping Protobuf types,
- We wouldn't be able to practically expose the Protobuf field to C++
users without causing build pains,
- We can't let Python users take advantage of the Protobuf field
without somehow being compatible with the Protobuf wheels (by linking
to the same version, and doing magic to turn the C++ Protobufs into
the Python ones),
- All our other application-defined fields are already bytes.

Applications that want structure can encode JSON or Protobuf Any into
the bytes field themselves, much as you can already do for Ticket,
commands in FlightDescriptors, and application metadata in
DoGet/DoPut. I don't think this is (much) less efficient than using
Any directly, since Any itself is a bytes field with a tag, and must
invoke the Protobuf deserializer again to read the actual message.

If we decide on using bytes, then I don't think it makes sense to
define a new message with a oneof either, since it would be redundant.

Thanks,
David

On 11/7/19, David Li <li.david...@gmail.com> wrote:
> I've been extremely backlogged, I will update the proposal when I get
> a chance and reply here when done.
>
> Best,
> David
>
> On 11/7/19, Wes McKinney <wesmck...@gmail.com> wrote:
>> Bumping this discussion since a couple of weeks have passed. It seems
>> there are still some questions here, could we summarize what are the
>> alternatives along with any public API implications so we can try to
>> render a decision?
>>
>> On Sat, Oct 26, 2019 at 7:19 PM David Li <li.david...@gmail.com> wrote:
>>>
>>> Hi Wes,
>>>
>>> Responses inline:
>>>
>>> On Sat, Oct 26, 2019, 13:46 Wes McKinney <wesmck...@gmail.com> wrote:
>>>
>>> > On Mon, Oct 21, 2019 at 7:40 PM David Li <li.david...@gmail.com>
>>> > wrote:
>>> > >
>>> > > The question is whether to repurpose the existing FlightData
>>> > > structure, and allow for the metadata field to be filled in and data
>>> > > fields to be blank (as a control message), or to wrap the FlightData
>>> > > structure in another structure that explicitly distinguishes between
>>> > > control and data messages.
>>> >
>>> > I'm not super against having metadata-only FlightData with empty body.
>>> > One question to consider is what changes (if any) would need to be
>>> > made to public APIs in either scenario.
>>> >
>>>
>>> We could leave DoGet/DoPut as-is for now, and allow empty data messages
>>> in
>>> the future. This would be a breaking change, but wouldn't change the
>>> wire
>>> format. I think the APIs could be changed backwards compatibly, though.
>>>
>>>
>>>
>>> > > The other question is how to handle the metadata fields. So far,
>>> > > we've
>>> > > used bytestring fields for application-defined data. This is
>>> > > workable
>>> > > if you want to use Protobuf to define the contents of those fields,
>>> > > but requires you to pack/unpack your Protobuf into/from the
>>> > > bytestring
>>> > > field. If we instead used the Protobuf Any field, a dynamically
>>> > > typed
>>> > > field, this would be more convenient, but then we'd be exposing
>>> > > Protobuf types. We could alternatively use a combination of a type
>>> > > field and a bytestring field, mimicking what the Protobuf Any type
>>> > > looks like on the wire. I'm not sure this is actually cleaner in any
>>> > > of the language APIs, though.
>>> >
>>> > Leaving the deserialization of the app metadata to the particular
>>> > Flight implementation seems on first principles like the most flexible
>>> > thing, if Any is used, does that mean the metadata _must_ be a
>>> > protobuf?
>>> >
>>>
>>>
>>> If Any is used, we could still expose a bytes-based API, but it would
>>> have
>>> some more wrapping. (We could put a ByteString in Any.) Then the
>>> question
>>> would just be how to expose this (would be easier in Java, harder in
>>> C++).
>>>
>>>
>>>
>>> > > David
>>> > >
>>> > > On 10/21/19, Antoine Pitrou <anto...@python.org> wrote:
>>> > > >
>>> > > > Can one of you explain what is being proposed in non-protobuf
>>> > > > terms?
>>> > > > Knowledge of protobuf shouldn't be required to use Flight.
>>> > > >
>>> > > > Regards
>>> > > >
>>> > > > Antoine.
>>> > > >
>>> > > >
>>> > > > Le 21/10/2019 à 15:46, David Li a écrit :
>>> > > >> Oneof doesn't actually change the wire encoding; it would just be
>>> > > >> application-level logic. (The official guide doesn't even mention
>>> > > >> it
>>> > > >> in the encoding docs; I found
>>> > > >>
>>> > https://stackoverflow.com/questions/52226409/how-protobuf-encodes-oneof-message-construct
>>> > > >> as well.)
>>> > > >>
>>> > > >> If I follow you, Jacques, then you are proposing essentially
>>> > > >> inlining
>>> > > >> the definition of Any, e.g.
>>> > > >>
>>> > > >> message FlightMessage {
>>> > > >>   oneof message {
>>> > > >>     FlightData data = 1;
>>> > > >>     FlightAny metadata = 2;
>>> > > >>   }
>>> > > >> }
>>> > > >>
>>> > > >> message FlightAny {
>>> > > >>   string type = 1;
>>> > > >>   bytes data = 2;
>>> > > >> }
>>> > > >>
>>> > > >> Is this correct?
>>> > > >>
>>> > > >> It might be nice to consider the wrapper message for DoGet/DoPut
>>> > > >> as
>>> > > >> well, but at that point, I'd rather we be consistent with all of
>>> > > >> them,
>>> > > >> rather than have one of the three methods do its own thing.
>>> > > >>
>>> > > >> Thanks,
>>> > > >> David
>>> > > >>
>>> > > >> On 10/20/19, Jacques Nadeau <jacq...@apache.org> wrote:
>>> > > >>> I think we could probably expose the oneof behavior without
>>> > > >>> exposing
>>> > the
>>> > > >>> protobuf functions. On the any... hmm. I guess we could expose
>>> > > >>> as
>>> > > >>> two
>>> > > >>> fields: type and data. Then users could use it for whatever but
>>> > > >>> if
>>> > > >>> people
>>> > > >>> wanted to treat it as any, it would work. (Basically a user
>>> > > >>> could
>>> > > >>> use
>>> > > >>> any
>>> > > >>> with it easily but they could also use any other mechanism). At
>>> > least in
>>> > > >>> java, the any concepts are pretty simple/diy. Are other language
>>> > > >>> bindings
>>> > > >>> less diy?
>>> > > >>>
>>> > > >>> I'm *not* hardcore against the empty FlightData + metadata but
>>> > > >>> it
>>> > just
>>> > > >>> seemed a bit janky.
>>> > > >>>
>>> > > >>> Thinking about the control message/wrapper object thing, I
>>> > > >>> wonder
>>> > > >>> if
>>> > we
>>> > > >>> should redefine DoPut and DoGet to have the same property if we
>>> > think it
>>> > > >>> is
>>> > > >>> a good idea...
>>> > > >>>
>>> > > >>> On Wed, Oct 16, 2019 at 5:13 PM David Li <li.david...@gmail.com>
>>> > wrote:
>>> > > >>>
>>> > > >>>> I was definitely considering having control messages without
>>> > > >>>> data,
>>> > and
>>> > > >>>> I thought that could be encoded by a FlightData with only
>>> > app_metadata
>>> > > >>>> set. I think I understand your position now: FlightData should
>>> > always
>>> > > >>>> carry (some) data (with optional metadata)?
>>> > > >>>>
>>> > > >>>> That makes sense to me, and is consistent with the
>>> > > >>>> documentation
>>> > > >>>> on
>>> > > >>>> FlightData in the Protobuf file. I was worried about having a
>>> > > >>>> redundant metadata field, but oneof prevents that from
>>> > > >>>> happening,
>>> > and
>>> > > >>>> overall having a clear separation between data and control
>>> > > >>>> messages
>>> > is
>>> > > >>>> cleaner.
>>> > > >>>>
>>> > > >>>> As for using Protobuf's Any: so far, we've refrained from
>>> > > >>>> exposing
>>> > > >>>> Protobuf by using bytes, would we want to change that now?
>>> > > >>>>
>>> > > >>>> Best,
>>> > > >>>> David
>>> > > >>>>
>>> > > >>>> On 10/16/19, Jacques Nadeau <jacq...@apache.org> wrote:
>>> > > >>>>> Hey David,
>>> > > >>>>>
>>> > > >>>>> RE: Async: I was trying to match the pattern we use for
>>> > > >>>>> doget/doput
>>> > > >>>>> for
>>> > > >>>>> async. Yes, more thinking java given java grpc's async always
>>> > pattern.
>>> > > >>>>>
>>> > > >>>>> On the comment around the FlightData, I think it is
>>> > > >>>>> overloading
>>> > > >>>>> the
>>> > > >>>> message
>>> > > >>>>> to use metadata for this. If I want to send a control message
>>> > > >>>> independently
>>> > > >>>>> of the data message, I would have to define something like an
>>> > > >>>>> empty
>>> > > >>>> flight
>>> > > >>>>> data message that has custom metadata. Why not support a
>>> > > >>>>> container
>>> > > >>>>> object
>>> > > >>>>> with a oneof{FlightData, Any} in it instead so users can add
>>> > > >>>>> more
>>> > data
>>> > > >>>>> as
>>> > > >>>>> desired. The default impl could be a noop for the Any
>>> > > >>>>> messages.
>>> > > >>>>>
>>> > > >>>>> On Tue, Oct 15, 2019 at 6:50 PM David Li
>>> > > >>>>> <li.david...@gmail.com>
>>> > > >>>>> wrote:
>>> > > >>>>>
>>> > > >>>>>> Hi Jacques,
>>> > > >>>>>>
>>> > > >>>>>> Thanks for the comments.
>>> > > >>>>>>
>>> > > >>>>>> - I do agree DoExchange is a better name!
>>> > > >>>>>> - FlightData already has metadata fields as a result of prior
>>> > > >>>>>> proposals, so I don't think we need a new message to carry
>>> > > >>>>>> that
>>> > kind
>>> > > >>>>>> of information.
>>> > > >>>>>> - I like the suggestion of an async handler to handle
>>> > > >>>>>> incoming
>>> > > >>>>>> messages as the fundamental API; it would actually be quite
>>> > natural
>>> > > >>>>>> to
>>> > > >>>>>> implement in Flight/Java. I will note that it's not possible
>>> > > >>>>>> in
>>> > > >>>>>> C++/Python without spawning a thread, though. (In essence,
>>> > gRPC-Java
>>> > > >>>>>> is async-always and gRPC-C++ is sync-always.) There are
>>> > experimental
>>> > > >>>>>> C++ APIs that would let us do something similar to Java, but
>>> > > >>>>>> those
>>> > > >>>>>> are
>>> > > >>>>>> only in relatively recent gRPC versions and are still under
>>> > > >>>>>> development (contrary to the interceptor APIs which have been
>>> > around
>>> > > >>>>>> for quite a while).
>>> > > >>>>>>
>>> > > >>>>>> Thanks,
>>> > > >>>>>> David
>>> > > >>>>>>
>>> > > >>>>>> On 10/15/19, Jacques Nadeau <jacq...@apache.org> wrote:
>>> > > >>>>>>> I like it. Added some comments to the doc. Might worth
>>> > > >>>>>>> discussion
>>> > > >>>>>>> here
>>> > > >>>>>>> depending on your thoughts.
>>> > > >>>>>>>
>>> > > >>>>>>> On Tue, Oct 15, 2019 at 7:11 AM David Li
>>> > > >>>>>>> <li.david...@gmail.com>
>>> > > >>>> wrote:
>>> > > >>>>>>>
>>> > > >>>>>>>> Hey Ryan,
>>> > > >>>>>>>>
>>> > > >>>>>>>> Thanks for the comments.
>>> > > >>>>>>>>
>>> > > >>>>>>>> Concrete example: I've edited the doc to provide a Python
>>> > strawman.
>>> > > >>>>>>>>
>>> > > >>>>>>>> Sync vs async: while I don't touch on it, you could
>>> > > >>>>>>>> interleave
>>> > > >>>> uploads
>>> > > >>>>>>>> and downloads if you were so inclined. Right now,
>>> > > >>>>>>>> synchronous
>>> > APIs
>>> > > >>>>>>>> make this error-prone, e.g. if both client and server wait
>>> > > >>>>>>>> for
>>> > each
>>> > > >>>>>>>> other due to an application logic bug. (gRPC doesn't give
>>> > > >>>>>>>> us
>>> > > >>>>>>>> the
>>> > > >>>>>>>> ability to have per-read timeouts, only an overall
>>> > > >>>>>>>> timeout.)
>>> > > >>>>>>>> As
>>> > an
>>> > > >>>>>>>> example of this happening with DoPut, see ARROW-6063:
>>> > > >>>>>>>> https://issues.apache.org/jira/browse/ARROW-6063
>>> > > >>>>>>>>
>>> > > >>>>>>>> This is mostly tangential though, eventually we will want
>>> > > >>>>>>>> to
>>> > design
>>> > > >>>>>>>> asynchronous APIs for Flight as a whole. A bidirectional
>>> > > >>>>>>>> stream
>>> > > >>>>>>>> like
>>> > > >>>>>>>> this (and like DoPut) just makes these pitfalls easier to
>>> > > >>>>>>>> run
>>> > into.
>>> > > >>>>>>>>
>>> > > >>>>>>>> Using DoPut+DoGet: I discussed this in the proposal, but
>>> > > >>>>>>>> the
>>> > main
>>> > > >>>>>>>> concern is that depending on how you deploy, two separate
>>> > > >>>>>>>> calls
>>> > > >>>>>>>> could
>>> > > >>>>>>>> get routed to different instances. Additionally, gRPC has
>>> > > >>>>>>>> some
>>> > > >>>>>>>> reconnection behaviors; if the server goes away in between
>>> > > >>>>>>>> the
>>> > two
>>> > > >>>>>>>> calls, but it then restarts or there is another instance
>>> > available,
>>> > > >>>>>>>> the client will happily reconnect to the new server without
>>> > > >>>>>>>> warning.
>>> > > >>>>>>>>
>>> > > >>>>>>>> Thanks,
>>> > > >>>>>>>> David
>>> > > >>>>>>>>
>>> > > >>>>>>>> On 10/15/19, Ryan Murray <rym...@dremio.com> wrote:
>>> > > >>>>>>>>> Hey David,
>>> > > >>>>>>>>>
>>> > > >>>>>>>>> I think this proposal makes a lot of sense. I like it and
>>> > > >>>>>>>>> the
>>> > > >>>>>>>>> possibility
>>> > > >>>>>>>>> of remote compute via arrow buffers. One thing that would
>>> > > >>>>>>>>> help
>>> > me
>>> > > >>>>>> would
>>> > > >>>>>>>> be
>>> > > >>>>>>>>> a concrete example of the API in a real life use case.
>>> > > >>>>>>>>> Also,
>>> > what
>>> > > >>>>>> would
>>> > > >>>>>>>> the
>>> > > >>>>>>>>> client experience be in terms of sync vs asyc? Would the
>>> > > >>>>>>>>> client
>>> > > >>>>>>>>> block
>>> > > >>>>>>>> till
>>> > > >>>>>>>>> the bidirectional call return ie c = flight.vector_mult(a,
>>> > > >>>>>>>>> b)
>>> > or
>>> > > >>>>>>>>> would
>>> > > >>>>>>>> the
>>> > > >>>>>>>>> client wait to be signaled that computation was done. If
>>> > > >>>>>>>>> the
>>> > > >>>>>>>>> later
>>> > > >>>>>>>>> how
>>> > > >>>>>>>>> is
>>> > > >>>>>>>>> that different from a DoPut then DoGet? I suppose that
>>> > > >>>>>>>>> this
>>> > could
>>> > > >>>> be
>>> > > >>>>>>>>> implemented without extending the RPC interface but rather
>>> > > >>>>>>>>> by a
>>> > > >>>>>>>>> function/util?
>>> > > >>>>>>>>>
>>> > > >>>>>>>>>
>>> > > >>>>>>>>> Best,
>>> > > >>>>>>>>>
>>> > > >>>>>>>>> Ryan
>>> > > >>>>>>>>>
>>> > > >>>>>>>>> On Sun, Oct 13, 2019 at 9:24 PM David Li <
>>> > li.david...@gmail.com>
>>> > > >>>>>> wrote:
>>> > > >>>>>>>>>
>>> > > >>>>>>>>>> Hi all,
>>> > > >>>>>>>>>>
>>> > > >>>>>>>>>> We've been using Flight quite successfully so far, but we
>>> > > >>>>>>>>>> have
>>> > > >>>>>>>>>> identified a new use case on the horizon: being able to
>>> > > >>>>>>>>>> both
>>> > > >>>>>>>>>> send
>>> > > >>>>>>>>>> and
>>> > > >>>>>>>>>> retrieve Arrow data within a single RPC call. To that
>>> > > >>>>>>>>>> end,
>>> > I've
>>> > > >>>>>>>>>> written up a proposal for a new RPC method:
>>> > > >>>>>>>>>>
>>> > > >>>>>>>>>>
>>> > > >>>>>>>>
>>> > > >>>>>>
>>> > > >>>>
>>> > https://docs.google.com/document/d/1Hh-3Z0hK5PxyEYFxwVxp77jens3yAgC_cpp0TGW-dcw/edit?usp=sharing
>>> > > >>>>>>>>>>
>>> > > >>>>>>>>>> Please let me know if you can't view or comment on the
>>> > document.
>>> > > >>>>>>>>>> I'd
>>> > > >>>>>>>>>> appreciate any feedback; I think this is a relatively
>>> > > >>>>>>>>>> straightforward
>>> > > >>>>>>>>>> addition - it is essentially "DoPutThenGet".
>>> > > >>>>>>>>>>
>>> > > >>>>>>>>>> This is a format change and would require a vote. I've
>>> > > >>>>>>>>>> decided
>>> > > >>>>>>>>>> to
>>> > > >>>>>>>>>> table the other format change I had proposed (on DoPut),
>>> > > >>>>>>>>>> as
>>> > > >>>>>>>>>> it
>>> > > >>>>>> doesn't
>>> > > >>>>>>>>>> functionally change Flight, just the interpretation of
>>> > > >>>>>>>>>> the
>>> > > >>>>>>>>>> semantics.
>>> > > >>>>>>>>>>
>>> > > >>>>>>>>>> Thanks,
>>> > > >>>>>>>>>> David
>>> > > >>>>>>>>>>
>>> > > >>>>>>>>>
>>> > > >>>>>>>>>
>>> > > >>>>>>>>> --
>>> > > >>>>>>>>>
>>> > > >>>>>>>>> Ryan Murray  | Principal Consulting Engineer
>>> > > >>>>>>>>>
>>> > > >>>>>>>>> +447540852009 | rym...@dremio.com
>>> > > >>>>>>>>>
>>> > > >>>>>>>>> <https://www.dremio.com/>
>>> > > >>>>>>>>> Check out our GitHub <https://www.github.com/dremio>, join
>>> > > >>>>>>>>> our
>>> > > >>>>>>>>> community
>>> > > >>>>>>>>> site <https://community.dremio.com/> & Download Dremio
>>> > > >>>>>>>>> <https://www.dremio.com/download>
>>> > > >>>>>>>>>
>>> > > >>>>>>>>
>>> > > >>>>>>>
>>> > > >>>>>>
>>> > > >>>>>
>>> > > >>>>
>>> > > >>>
>>> > > >
>>> >
>>
>

Reply via email to