On Tue, Feb 12, 2019 at 3:46 PM Antoine Pitrou <anto...@python.org> wrote:
>
>
> Le 12/02/2019 à 22:34, Wes McKinney a écrit :
> > On Tue, Feb 12, 2019 at 2:48 PM Antoine Pitrou <anto...@python.org> wrote:
> >>
> >>
> >> Hi David,
> >>
> >> I think allowing application-specific ancillary data to be sent in
> >> addition to Arrow data makes sense.
> >>
> >> (I'm also wondering whether the choice of gRPC is appropriate at all -
> >> the current C++ hacks around "zero-copy" are not pretty and they may not
> >> translate to other languages either)
> >>
> >
> > This is unrelated to the discussion of extending the Flight protocol,
> > but I'm not sure I would describe the serialization optimizations that
> > have been implemented as "hacks". gRPC exposes its message
> > serialization layer among other things to permit extensibility and to
> > not require the use of Protocol Buffers necessarily.
>
> One thing that surfaced is that the current implementation relies on C++
> undefined behaviour (the reinterpret_cast from pb::FlightData to the
> unrelated struct FlightData).  I don't know if there's a way to
> reimplement the optimization without that cast, but otherwise it's cause
> for worry, IMHO.

Is there a JIRA about this? I spent some time looking around gRPC's
C++ library (which is header-only) and AFAICT the only exposure of
the template parameter to any relevant part of the code is at the
SerializationTraits interface, so the two template types should be
internally isomorphic (but I am not a C++ language lawyer). There may
be a safer way to get the library to generate the code we are looking
for. Note that the initial C++ implementation was written over a short
period of a few days; my goal was to get something working and do more
research later.
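
To make the point about the traits interface concrete, here is a minimal
sketch of the general pattern. It is not gRPC and not the Flight
implementation: WireBuffer, FlightPayload, RoundTrip, and this
SerializationTraits template are hypothetical stand-ins, and the wire layout
is invented for illustration. The idea it shows is that generic framework
code only touches the message type through a traits specialization, so a
custom type can in principle be plugged in without casting between unrelated
types.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <string>
#include <vector>

// Hypothetical stand-in for a framework's wire buffer (NOT grpc::ByteBuffer).
using WireBuffer = std::vector<std::uint8_t>;

// Hypothetical customization point: the primary template is left undefined,
// and each message type provides its own specialization.
template <typename T>
struct SerializationTraits;

// Hypothetical application message: a descriptor string plus a payload body.
struct FlightPayload {
  std::string descriptor;
  std::vector<std::uint8_t> body;
};

template <>
struct SerializationTraits<FlightPayload> {
  // Invented layout: 4-byte descriptor length (host byte order),
  // then the descriptor bytes, then the body bytes.
  static void Serialize(const FlightPayload& msg, WireBuffer* out) {
    const std::uint32_t n = static_cast<std::uint32_t>(msg.descriptor.size());
    out->clear();
    out->resize(sizeof(n));
    std::memcpy(out->data(), &n, sizeof(n));
    out->insert(out->end(), msg.descriptor.begin(), msg.descriptor.end());
    out->insert(out->end(), msg.body.begin(), msg.body.end());
  }

  // Returns false if the buffer is too short to hold a valid message.
  static bool Deserialize(const WireBuffer& in, FlightPayload* msg) {
    std::uint32_t n = 0;
    if (in.size() < sizeof(n)) return false;
    std::memcpy(&n, in.data(), sizeof(n));
    if (in.size() - sizeof(n) < n) return false;
    const auto first = in.begin() + static_cast<std::ptrdiff_t>(sizeof(n));
    const auto split = first + static_cast<std::ptrdiff_t>(n);
    msg->descriptor.assign(first, split);
    msg->body.assign(split, in.end());
    return true;
  }
};

// Generic "framework" code: it sees T only through SerializationTraits<T>.
template <typename T>
T RoundTrip(const T& msg) {
  WireBuffer wire;
  SerializationTraits<T>::Serialize(msg, &wire);
  T out;
  const bool ok = SerializationTraits<T>::Deserialize(wire, &out);
  assert(ok);
  (void)ok;
  return out;
}
```

Calling RoundTrip on a FlightPayload should give back an equal descriptor and
body. Whether gRPC's generated service code can be instantiated directly with
a custom type in this way is exactly the open question above; the sketch only
shows the shape of the approach.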

>
> > The reason that we chose to use the Protobuf wire format for all
> > message types, including data, is that there is excellent
> > cross-language support for protobufs, and among production-ready RPC
> > frameworks, gRPC has the most robust language support, covering pretty
> > much all the languages we care about:
> > https://github.com/grpc/grpc#to-start-using-grpc. The only one missing
> > is Rust, and I reckon that will get rectified at some point (there is
> > already https://github.com/stepancheg/grpc-rust, maybe it will be
> > adopted into gRPC formally at some point). But to have C++, C#, Go,
> > Java, and Node officially supported out of the box is not nothing. I
> > think it would be unwise to go a different way unless you have some
> > compelling reason that gRPC / HTTP/2 is fundamentally flawed for this
> > intended use.
>
> Since our use case pretty much requires high-performance transmission
> with as few copies as possible (ideally, data should be directly sent
> from/received to Arrow buffers without any intermediate userspace
> copies), I think we should evaluate whether gRPC can allow us to achieve
> that (there are still copies currently, AFAICT), and at which cost.
>
> As a side note, the Flight C++ benchmark currently achieves a bit more
> than 2 GB/s here.  There may be ways to improve this number (does gRPC
> enable TLS by default? does it compress by default?)...
>

One design question as we work on this project is how one could open a
"side channel" of sorts for moving the dataset itself outside of gRPC
but still using the flexible command layer.

> Regards
>
> Antoine.
