Hey Nate,

For the first two points, semantically I'm tempted to think of it more like the ability to send a "bag of columns" according to some schema (and hence columns could have differing lengths or even be absent). This could be a new structure alongside a record batch, which is semantically like a "slice of a table" (and hence rectangular and complete), instead of exposing existing users of RecordBatch to rather different behavior.

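
To make the distinction concrete, here is a minimal Python sketch of a "bag of columns" next to the rectangular case. `ColumnBag`, `lengths`, and `is_rectangular` are hypothetical names used only for illustration; none of this is an existing Arrow API or a proposed wire format.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class ColumnBag:
    """Hypothetical 'bag of columns': each column named by the schema may
    have its own length, or be absent entirely."""

    schema: List[str]                                   # field names, in schema order
    columns: Dict[str, list] = field(default_factory=dict)

    def __post_init__(self) -> None:
        # A bag may omit schema fields, but must not carry unknown ones.
        unknown = set(self.columns) - set(self.schema)
        if unknown:
            raise ValueError(f"columns not in schema: {sorted(unknown)}")

    def lengths(self) -> Dict[str, Optional[int]]:
        # None marks a field that is absent from this bag.
        return {
            name: len(self.columns[name]) if name in self.columns else None
            for name in self.schema
        }

    def is_rectangular(self) -> bool:
        # True only when every field is present and all lengths agree,
        # i.e. when the bag could also be treated as a "slice of a table".
        lens = list(self.lengths().values())
        return None not in lens and len(set(lens)) <= 1


bag = ColumnBag(schema=["id", "price", "qty"],
                columns={"id": [1, 2, 3], "price": [9.5]})
print(bag.lengths())          # per-field lengths; "qty" is absent
print(bag.is_rectangular())   # False: lengths differ and a field is missing
```

Keeping this as a separate type means anything that accepts a record batch can keep assuming rectangular, complete data, and only code that opts in has to handle ragged or missing columns.
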
For #3, a different thread was discussing some of the points there - it sounds like it may be possible to relax from map<string, string> to map<string, binary>.

-David

On Mon, Jul 26, 2021, at 11:01, Nate Bauernfeind wrote:
> Wes suggested that maybe there are enough new ideas that it may make sense
> to evolve-past the existing structures rather than to bolt-on new
> functionality. I would like to learn what requirements exist should new
> structures be adopted, and if applicable, would like to turn this into a
> full POC proposal.
>
> These are the features that I feel are missing from the existing design:
> - the ability to notify that the columns are not consistent in length (e.g.
> setting RecordBatch.length to -1; and give the arrow/flight user the true
> FieldNode lengths).
> - the ability to skip top-level field nodes that have length 0 at a small
> cost (such as in a bitset)
> - the ability to embed binary payload in the Message flatbuffer wrapper
> (instead of String payload only)
> - the ability to concurrently use more than one schema (the most likely API
> will look like how one identifies a dictionary. ideally dictionaries could
> be shared across field nodes in a schema or across schemas in the same
> flight)
>
> What other features, or improvements, could/should be considered? Any
> strong opinions against the ideas above? (Remember, that a goal of mine is
> to be able to send a RecordBatch of rows that were modified intersected
> only by the field-nodes that have changed (including those with only inner
> node changes); thus the columns are a subset of the full schema and that
> the length of each node is independent of the other).
>
> On Fri, Jul 9, 2021 at 9:26 AM Wes McKinney <wesmck...@gmail.com> wrote:
> > It sounds like we may want to discuss some potential evolutions of the
> > Arrow binary protocol (for example: new Message types). Certainly a
> > can of worms but rather than trying to bolt some new functionality
> > onto the existing structures, it might be better to support the new
> > use cases through some new structures which will be more clear cut
> > from a forward compatibility standpoint.
> > Nate
> > --
>