Hey Nate,

For the first two points, I'm tempted to think of this as the ability to send 
a "bag of columns" conforming to some schema, where columns may differ in 
length or be absent entirely. That could be a new structure alongside the 
record batch, which is semantically a "slice of a table" (and hence 
rectangular and complete), rather than exposing existing users of 
RecordBatch to rather different behavior.
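
To make the distinction concrete, here is a rough sketch in plain Python. The 
names RectangularBatch and ColumnBag are made up for illustration and are not 
Arrow APIs; columns are ordinary lists rather than Arrow arrays, and the 
presence bitset is only one assumed way to encode "absent or empty" cheaply.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class RectangularBatch:
    """Hypothetical stand-in for a record batch: a complete, rectangular slice."""
    schema: List[str]
    columns: Dict[str, list]

    def __post_init__(self) -> None:
        # Every schema field must be present, and nothing else.
        if set(self.columns) != set(self.schema):
            raise ValueError("columns must match the schema exactly")
        # All columns must share a single length.
        if len({len(col) for col in self.columns.values()}) > 1:
            raise ValueError("columns must all have the same length")


@dataclass
class ColumnBag:
    """Hypothetical 'bag of columns': any subset of the schema, any lengths."""
    schema: List[str]
    columns: Dict[str, list]

    def __post_init__(self) -> None:
        # Columns may be missing, but must not fall outside the schema.
        unknown = [name for name in self.columns if name not in self.schema]
        if unknown:
            raise ValueError(f"columns not in schema: {unknown}")

    def lengths(self) -> Dict[str, int]:
        # Per-column lengths; an absent column counts as length 0.
        return {name: len(self.columns.get(name, [])) for name in self.schema}

    def presence_bitset(self) -> int:
        # Bit i is set when schema field i carries at least one value.
        bits = 0
        for i, name in enumerate(self.schema):
            if self.columns.get(name):
                bits |= 1 << i
        return bits


schema = ["id", "price", "qty"]
# "price" is absent and "qty" is shorter than "id": fine for a bag,
# but the same data would be rejected by RectangularBatch.
bag = ColumnBag(schema, {"id": [1, 2, 3], "qty": [7]})
```

Keeping the two as separate types means code that receives a RectangularBatch 
can keep assuming a single row count, while only code that opts into ColumnBag 
has to deal with per-column lengths.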

For #3, a different thread has been discussing some of the same points; it 
sounds like it may be possible to relax from map<string, string> to 
map<string, binary>. 
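
As a small illustration of what that relaxation would permit, here is a 
sketch using plain Python dicts as a stand-in for the metadata map. The helper 
as_binary_metadata and the keys "origin" and "row_offsets" are hypothetical, 
not part of any Arrow API.

```python
import struct
from typing import Dict


def as_binary_metadata(meta: Dict[str, str]) -> Dict[str, bytes]:
    # Existing string values carry over unchanged as UTF-8 bytes.
    return {key: value.encode("utf-8") for key, value in meta.items()}


legacy = {"origin": "ticker-feed"}
relaxed = as_binary_metadata(legacy)

# A packed binary payload (three little-endian int64 values) is not valid
# UTF-8, so it could not be stored as a string value without re-encoding it.
relaxed["row_offsets"] = struct.pack("<3q", 0, 128, -1)
```

Existing string metadata would still fit in the relaxed map, while arbitrary 
payloads would no longer need something like base64 to pass through.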

-David

On Mon, Jul 26, 2021, at 11:01, Nate Bauernfeind wrote:
> Wes suggested that maybe there are enough new ideas that it may make sense
> to evolve past the existing structures rather than to bolt on new
> functionality. I would like to learn what requirements exist should new
> structures be adopted and, if applicable, would like to turn this into a
> full POC proposal.
> 
> These are the features that I feel are missing from the existing design:
> - the ability to notify that the columns are not consistent in length (e.g.
> setting RecordBatch.length to -1; and give the arrow/flight user the true
> FieldNode lengths).
> - the ability to skip top-level field nodes that have length 0 at a small
> cost (such as in a bitset)
> - the ability to embed binary payload in the Message flatbuffer wrapper
> (instead of String payload only)
> - the ability to concurrently use more than one schema (the most likely API
> will look like how one identifies a dictionary; ideally, dictionaries could
> be shared across field nodes in a schema or across schemas in the same
> flight)
> 
> What other features or improvements could or should be considered? Any
> strong opinions against the ideas above? (Remember that a goal of mine is
> to be able to send a RecordBatch containing only the rows that were
> modified, restricted to the field nodes that have changed (including those
> with only inner-node changes); thus the columns are a subset of the full
> schema, and the length of each node is independent of the others.)
> 
> On Fri, Jul 9, 2021 at 9:26 AM Wes McKinney <wesmck...@gmail.com> wrote:
> > It sounds like we may want to discuss some potential evolutions of the
> > Arrow binary protocol (for example: new Message types). Certainly a
> > can of worms but rather than trying to bolt some new functionality
> > onto the existing structures, it might be better to support the new
> > use cases through some new structures which will be more clear cut
> > from a forward compatibility standpoint.
> 
> Nate
> 
> --
> 
