I'm still interested in the RLE-related effort, but I'm not sure how much
bandwidth I have available (which is why I haven't made more of an effort
there).

On Tue, Aug 3, 2021 at 6:00 PM Wes McKinney <wesmck...@gmail.com> wrote:

> Another Flatbuffers/Message.fbs project we should rekindle soon, in
> addition to the schema evolution/replacement question which has been
> raised with Flight, is that of sparse/compressed data (e.g. RLE). I
> have a vacation plus some travel coming up so won't be able to devote
> meaningful attention to this until the last part of August, but would
> like to help it move forward.
>
>
> On Tue, Jul 27, 2021 at 1:40 PM David Li <lidav...@apache.org> wrote:
> >
> > Hey Nate,
> >
> > For the first two points, semantically I'm tempted to think of it more
> > like the ability to send a "bag of columns" according to some schema (and
> > hence columns could have differing lengths or even be absent). This could
> > be a new structure alongside a record batch, which is semantically like a
> > "slice of a table" (and hence rectangular and complete), instead of
> > exposing existing users of RecordBatch to rather different behavior.
> >
> > For #3, a different thread was discussing some of the points there - it
> > sounds like it may be possible to relax from map<string, string> to
> > map<string, binary>.
> >
> > -David
> >
> > On Mon, Jul 26, 2021, at 11:01, Nate Bauernfeind wrote:
> > > Wes suggested that maybe there are enough new ideas that it may make
> > > sense to evolve past the existing structures rather than to bolt on new
> > > functionality. I would like to learn what requirements exist should new
> > > structures be adopted, and if applicable, would like to turn this into
> > > a full POC proposal.
> > >
> > > These are the features that I feel are missing from the existing
> > > design:
> > > - the ability to notify that the columns are not consistent in length
> > >   (e.g. setting RecordBatch.length to -1; and give the arrow/flight user
> > >   the true FieldNode lengths).
> > > - the ability to skip top-level field nodes that have length 0 at a
> > >   small cost (such as in a bitset)
> > > - the ability to embed binary payload in the Message flatbuffer wrapper
> > >   (instead of String payload only)
> > > - the ability to concurrently use more than one schema (the most likely
> > >   API will look like how one identifies a dictionary. ideally
> > >   dictionaries could be shared across field nodes in a schema or across
> > >   schemas in the same flight)
> > >
> > > What other features, or improvements, could/should be considered? Any
> > > strong opinions against the ideas above? (Remember that a goal of mine
> > > is to be able to send a RecordBatch of the rows that were modified,
> > > intersected only by the field nodes that have changed (including those
> > > with only inner node changes); thus the columns are a subset of the
> > > full schema, and the length of each node is independent of the others.)
> > >
> > > On Fri, Jul 9, 2021 at 9:26 AM Wes McKinney <wesmck...@gmail.com> wrote:
> > > > It sounds like we may want to discuss some potential evolutions of
> > > > the Arrow binary protocol (for example: new Message types). Certainly
> > > > a can of worms but rather than trying to bolt some new functionality
> > > > onto the existing structures, it might be better to support the new
> > > > use cases through some new structures which will be more clear cut
> > > > from a forward compatibility standpoint.
> > >
> > > Nate
> > >
> > > --
> > >
>
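
To make the "bag of columns" idea from the quoted thread a bit more concrete,
here is a rough Python sketch of a presence bitset plus per-column lengths. It
is only an illustration under my own assumptions: ColumnBag, encode_presence,
and decode_presence are hypothetical names, nothing here is an existing Arrow
or Flight structure, and a real design would live in the Flatbuffers metadata
rather than in Python objects.

```python
from dataclasses import dataclass
from typing import Dict, List, Sequence


@dataclass
class ColumnBag:
    """Hypothetical header for a 'bag of columns' (not an Arrow type)."""
    presence: int        # bit i set => top-level schema field i is included
    lengths: List[int]   # one length per included field, in schema order


def encode_presence(schema: Sequence[str], columns: Dict[str, list]) -> ColumnBag:
    """Record which fields are present and how long each one is."""
    presence = 0
    lengths: List[int] = []
    for i, name in enumerate(schema):
        values = columns.get(name)
        if values:  # absent and zero-length columns cost one cleared bit
            presence |= 1 << i
            lengths.append(len(values))
    return ColumnBag(presence, lengths)


def decode_presence(schema: Sequence[str], bag: ColumnBag) -> Dict[str, int]:
    """Recover the per-field lengths for the fields that were sent."""
    remaining = iter(bag.lengths)
    result: Dict[str, int] = {}
    for i, name in enumerate(schema):
        if (bag.presence >> i) & 1:
            result[name] = next(remaining)
    return result


schema = ["id", "price", "qty"]
bag = encode_presence(schema, {"id": [1, 2, 3], "qty": [7]})
print(bin(bag.presence), bag.lengths)
print(decode_presence(schema, bag))
```

In this sketch the columns may have different lengths, and a skipped
top-level field costs a single bit, which is the trade-off described in the
first two feature bullets.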
