Another Flatbuffers/Message.fbs project we should rekindle soon, in
addition to the schema evolution/replacement question which has been
raised with Flight, is that of sparse/compressed data (e.g. RLE). I
have a vacation plus some travel coming up, so I won't be able to devote
meaningful attention to this until the last part of August, but I would
like to help it move forward.


On Tue, Jul 27, 2021 at 1:40 PM David Li <lidav...@apache.org> wrote:
>
> Hey Nate,
>
> For the first two points, semantically I'm tempted to think of it more like 
> the ability to send a "bag of columns" according to some schema (and hence 
> columns could have differing lengths or even be absent). This could be a new 
> structure alongside a record batch, which is semantically like a "slice of a 
> table" (and hence rectangular and complete), instead of exposing existing 
> users of RecordBatch to rather different behavior.
>
> For #3, a different thread was discussing some of the points there - it 
> sounds like it may be possible to relax from map<string, string> to 
> map<string, binary>.
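A quick illustration of why relaxing to `map<string, binary>` matters: FlatBuffers `string` fields are meant to hold UTF-8 text, so arbitrary bytes cannot go into `map<string, string>` metadata as-is. This minimal Python sketch assumes base64 as the stand-in workaround (one common choice, not something the Arrow format prescribes); the helper names are hypothetical.

```python
import base64


def is_valid_utf8(value: bytes) -> bool:
    """True if the bytes could be stored directly in a UTF-8 string field."""
    try:
        value.decode("utf-8")
    except UnicodeDecodeError:
        return False
    return True


def to_string_value(value: bytes) -> str:
    """Workaround under map<string, string>: base64-encode binary values."""
    return base64.b64encode(value).decode("ascii")


def from_string_value(text: str) -> bytes:
    """Inverse of to_string_value."""
    return base64.b64decode(text)


# Arbitrary bytes; 0xFF never appears in valid UTF-8.
payload = bytes([0x00, 0xFF, 0xFE, 0x10])
encoded = to_string_value(payload)
```

The round trip works, but every binary value grows by roughly a third and both sides must agree on the encoding out of band; a binary value type would avoid both costs.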
>
> -David
>
> On Mon, Jul 26, 2021, at 11:01, Nate Bauernfeind wrote:
> > Wes suggested that maybe there are enough new ideas that it may make sense
> > to evolve past the existing structures rather than to bolt on new
> > functionality. I would like to learn what requirements exist should new
> > structures be adopted and, if applicable, would like to turn this into a
> > full POC proposal.
> >
> > These are the features that I feel are missing from the existing design:
> > - the ability to indicate that the columns are not consistent in length
> > (e.g. setting RecordBatch.length to -1 and giving the Arrow/Flight user the
> > true FieldNode lengths)
> > - the ability to skip top-level field nodes that have length 0 at a small
> > cost (e.g. via a bitset)
> > - the ability to embed binary payloads in the Message flatbuffer wrapper
> > (instead of only string payloads)
> > - the ability to use more than one schema concurrently (the most likely API
> > will look like how one identifies a dictionary; ideally, dictionaries could
> > be shared across field nodes in a schema or across schemas in the same
> > flight)
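A minimal sketch of the bitset idea from the second bullet, assuming a hypothetical encoding in which bit i (LSB-first, the same bit order Arrow uses for validity bitmaps) marks top-level field i as non-empty, and only non-empty fields get a FieldNode length. The function names are illustrative; nothing like this exists in Message.fbs today.

```python
from typing import List, Tuple


def encode_presence(lengths: List[int]) -> Tuple[bytes, List[int]]:
    """Pack a presence bitset plus the lengths of only the non-empty fields."""
    bitset = bytearray((len(lengths) + 7) // 8)
    present = []
    for i, n in enumerate(lengths):
        if n > 0:
            bitset[i // 8] |= 1 << (i % 8)  # LSB-first bit numbering
            present.append(n)
    return bytes(bitset), present


def decode_presence(num_fields: int, bitset: bytes, present: List[int]) -> List[int]:
    """Rebuild per-field lengths; fields whose bit is unset have length 0."""
    lengths = []
    it = iter(present)
    for i in range(num_fields):
        if bitset[i // 8] & (1 << (i % 8)):
            lengths.append(next(it))
        else:
            lengths.append(0)
    return lengths
```

For nine top-level fields with lengths `[5, 0, 0, 3, 0, 0, 0, 0, 2]`, the bitset is two bytes (`0x09`, `0x01`) and only three lengths are sent; decoding restores the original list. The overhead is one bit per top-level field, however many fields are empty.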
> >
> > What other features or improvements could/should be considered? Any
> > strong opinions against the ideas above? (Remember that a goal of mine is
> > to be able to send a RecordBatch containing only the rows that were
> > modified, intersected with only the field nodes that have changed
> > (including those with only inner node changes); thus the columns are a
> > subset of the full schema, and the length of each node is independent of
> > the others.)
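The goal described above can be sketched in Python, assuming (hypothetically) that columns are plain lists, rows are only modified in place (no inserts or deletes), and each value is compared as a whole, so a change anywhere inside a nested value marks that row as changed. `column_deltas` is an illustrative helper, not part of Arrow or Flight; it shows why each emitted column ends up with its own length.

```python
from typing import Any, Dict, List, Tuple


def column_deltas(
    old: Dict[str, List[Any]], new: Dict[str, List[Any]]
) -> Dict[str, Tuple[List[int], List[Any]]]:
    """For each column, collect (row_indices, new_values) for rows that changed.

    Columns with no changes are omitted entirely, so the result covers a
    subset of the schema and each column carries an independent length.
    """
    deltas = {}
    for name, new_col in new.items():
        old_col = old[name]
        if len(old_col) != len(new_col):
            raise ValueError(f"column {name!r}: sketch assumes in-place modifications only")
        rows = [i for i, (a, b) in enumerate(zip(old_col, new_col)) if a != b]
        if rows:
            deltas[name] = (rows, [new_col[i] for i in rows])
    return deltas
```

With `old = {"a": [1, 2, 3], "b": ["x", "y", "z"], "c": [0, 0, 0]}` and `new = {"a": [1, 5, 6], "b": ["x", "y", "w"], "c": [0, 0, 0]}`, column `a` carries two rows, `b` one, and `c` is omitted: a subset of the schema with independent lengths, which a rectangular RecordBatch cannot express without padding.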
> >
> > On Fri, Jul 9, 2021 at 9:26 AM Wes McKinney <wesmck...@gmail.com> wrote:
> > > It sounds like we may want to discuss some potential evolutions of the
> > > Arrow binary protocol (for example: new Message types). Certainly a
> > > can of worms, but rather than trying to bolt some new functionality
> > > onto the existing structures, it might be better to support the new
> > > use cases through some new structures which will be more clear cut
> > > from a forward compatibility standpoint.
> >
> > Nate
> >
