On Tue, Jul 6, 2021 at 6:33 PM Micah Kornfield <emkornfi...@gmail.com> wrote:
>
> >
> > Right, I had wanted to focus the discussion on Flight as I think schema
> > evolution or multiplexing streams (more so the latter) is a property of the
> > transport and not the stream format itself. If we are leaning towards just
> > schema evolution then maybe it makes sense to discuss it for the IPC stream
> > format and leverage that in Flight. I'd be interested in what others think.
>
> I tend to agree, I think stream multiplexing is likely a transport level
> issue.  IMO I think schema evolution should be consistent with the IPC
> stream format  and flight.
>
>
> > Nate: it may be worth starting a separate discussion about more general
> > metadata in the IPC message. I'm not aware of why key-value metadata was
> > chosen/if opaque bytes were considered in the past.
>
>
> I think  this was an unfortunate design of the key value metadata in
> Schema.fbs, but I don't think I was around when this decision was made.

I agree that it's unfortunate that we did not use [ byte ] instead of
string for the value in the KeyValue metadata — I think this was more
of an oversight than a deliberate choice (e.g. it was not our intent
to require binary data to be base64-encoded — this is something that
we have to do when encoding binary data in Thrift KeyValue metadata
for Parquet, for example). Is the binary representation of [byte]
different from string?



> Side Question: Why isn't the IPC stream format a series of the flight
> > protobufs?
>
> In addition to what David said, protobufs can't be read directly from a
> memory-mapped file (they need decoding).  This was one of the design
> considerations of using flatbuffers and IPC Stream/File format.
>
> I was thinking Micah's comment is more that whatever we do, it should be
> > clearly specified and edge cases should be considered, especially if we
> > might want to 'backport' this into the stream format later.
>
>
> Yes, for dictionaries we just need to be careful to define semantics and
> ensure implementations are validating them with regards to dictionaries.
> There likely isn't any need to change current implementations though.
>
> On Mon, Jun 28, 2021 at 1:25 PM David Li <lidav...@apache.org> wrote:
>
> > Right, I had wanted to focus the discussion on Flight as I think schema
> > evolution or multiplexing streams (more so the latter) is a property of the
> > transport and not the stream format itself. If we are leaning towards just
> > schema evolution then maybe it makes sense to discuss it for the IPC stream
> > format and leverage that in Flight. I'd be interested in what others think.
> >
> > Especially if we are looking at multiplexing streams - I would wonder if
> > that's actually better served by making it easier to implement using the
> > Flight implementation as it stands (by managing concurrent RPC calls and/or
> > performing the union-of-structs encoding trick for you), instead of having
> > to change the protocol.
> >
> > Nate: it may be worth starting a separate discussion about more general
> > metadata in the IPC message. I'm not aware of why key-value metadata was
> > chosen/if opaque bytes were considered in the past. Off the top of my head
> > if it's for on-disk storage and fully application-defined it may make sense
> > to store as a separate file alongside the Arrow file (indexed by record
> > batch index) where you can take advantage of whatever format is most
> > suitable.
> >
> > -David
> >
> > On Sun, Jun 27, 2021, at 07:50, Gosh Arzumanyan wrote:
> > > Hi guys,
> > >
> > > 1. Regarding IPC vs Flight: in fact my initial suggestion was to add this
> > > feature starting from the IPC(I moved initial write up steps to the
> > bottom
> > > of the doc). Afterwards David suggested focusing on Flight and that's how
> > > we ended up with the protobufs change in the proposal. This being said I
> > do
> > > think that the place where this should be impemented is a good question
> > on
> > > its own. Maybe it makes sense to have this kind of a feature in IPC and
> > > somehow use it in Flight, maybe not.
> > > 2. The point about dictionaries deserves a dedicated section in the
> > > proposal. Nate and David brought it up and shared some insights. I'll try
> > > to aggregate them and we can continue the discussion form there.
> > >
> > > Cheers,
> > > Gosh
> > >
> > > On Sat., 26 Jun. 2021, 17:26 Nate Bauernfeind, <
> > natebauernfe...@deephaven.io>
> > > wrote:
> > >
> > > > >
> > > > > > > makes it more difficult to bring schema evolution back into the
> > > > > > > IPC Stream format (i.e. it would live only in flight)
> > > > > >
> > > > > > Gosh's proposal extends the flatbuffer structures not the
> > protobufs.
> > > > Can
> > > > > > you help me understand how difficult it would be to bring the
> > > > `schema_id`
> > > > > > approach to the IPC stream format?
> > > > >
> > > > > I thought we were talking solely about the Flight Protobuf
> > definitions -
> > > > > not the Flatbuffers (and the Google doc at least only talks about the
> > > > > Protobufs).
> > > > >
> > > >
> > > > I somehow missed that schema_id is being added to protobuf in the
> > document.
> > > > It feels to me that the schema_id is a property that would ideally only
> > > > apply to the RecordBatch. I better understand Micah's dictionary
> > concerns,
> > > > now, too.
> > > >
> > > > > Side Question: Why isn't the IPC stream format a series of the flight
> > > > > > protobufs? It's a real shame that there is no standard way to
> > > > > > capture/replay a stream with app_metadata. (Obviously ignoring the
> > > > > > annoyances around protobuf wrapping flatbuffers.)
> > > > >
> > > > > The IPC format was defined long before Flight, and Flight's
> > app_metadata
> > > > > was added after Flight's initial definition. Note an IPC message does
> > > > have
> > > > > a provision for key-value metadata, though I think APIs for that are
> > not
> > > > > fully exposed. (See ARROW-6940:
> > > > > https://issues.apache.org/jira/browse/ARROW-6940 and despite my
> > comments
> > > > > there perhaps we need to unify or at least consider how Flight's
> > > > > app_metadata relates to the IPC message custom_metadata. Also
> > perhaps see
> > > > > ARROW-1059.)
> > > > >
> > > >
> > > > KeyValue unfortunately is string to string. In flatbuffer strings are
> > only
> > > > UTF-8 or 7-bit ASCII. The app_metadata on the other hand is opaque
> > bytes.
> > > > The latter is a bit more useful.
> > > >
> > > > --
> > > >
> > >
> >

Reply via email to