Just following up here - what's the status? It looks like there are some
unaddressed comments on the PR?

On Tue, Nov 23, 2021, at 13:54, Micah Kornfield wrote:
> Sorry, I just took a closer look and left some comments.  I think the one
> substantive issue is that the linked document talks about different-length
> columns in the Bag, and this isn't mentioned in the flatbuffers.
> Could you comment on or update the documentation in the flatbuffers accordingly?
>
> Thanks,
> Micah
>
> On Tue, Nov 23, 2021 at 10:41 AM David Li <lidav...@apache.org> wrote:
>
>> Thanks for putting that up.
>>
>> It doesn't look like there's been too much discussion here. If people
>> agree it's useful, maybe the next step is to draft an implementation in
>> Java or C++ for feedback? There was some discussion of the use cases in the
>> document; do we feel like we need to clarify those better?
>>
>> -David
>>
>> On Mon, Nov 8, 2021, at 16:46, Nate Bauernfeind wrote:
>> > I put the draft up here: https://github.com/apache/arrow/pull/11646
>> >
>> > Thanks.
>> >
>> > On Mon, Nov 8, 2021 at 1:57 PM David Li <lidav...@apache.org> wrote:
>> >
>> > > Hey Nate,
>> > >
>> > > Thanks for doing this! Would you be interested in putting that commit up
>> > > as a draft PR for discussion? I think we can discuss there.
>> > >
>> > > I'm not sure anyone is actively working on RLE or other encoding schemes
>> > > at the moment.
>> > >
>> > > -David
>> > >
>> > > On Mon, Nov 8, 2021, at 13:19, Nate Bauernfeind wrote:
>> > > > I've written up the ColumnBag proposal addressing items 1 and 2 on the
>> > > > list. I'm open to any and all feedback/suggestions.
>> > > >
>> > > > I'd be happy to add item 3 (binary metadata) to the proposed change set.
>> > > > Let me know if you want me to whip up the initial suggestion for that
>> > > > version (and whether or not to keep it separate from ColumnBag).
>> > > >
>> > > > Would RLE related efforts change the structure of RecordBatch or
>> > > > ColumnBag (if accepted)?
>> > > >
>> > > > Here is the brief history-discussion around why ColumnBag:
>> > > > https://docs.google.com/document/d/1jsmmqLTyJkU8fx0sUGIqd6yu72N4v9uHFsuGSgB_DfE/
>> > > >
>> > > > Here is a brief commit doctoring up the flatbuffer to support this
>> > > > version of the proposed change:
>> > > > https://github.com/nbauernfeind/arrow/tree/column_bag_demo_v1
>> > > >
>> > > > I don't know if it's better to comment in the document or bring comments
>> > > > back to the list. If it ends up being document heavy, then I'll summarize
>> > > > the main points back on the list.
>> > > >
>> > > > I think I'll get started on a Java impl just to learn more even if it
>> > > > ends up being extra work.
>> > > >
>> > > > Looking forward to your feedback,
>> > > > Nate
>> > > >
>> > > > On Mon, Aug 9, 2021 at 10:06 PM Micah Kornfield <emkornfi...@gmail.com>
>> > > > wrote:
>> > > >
>> > > > > I'm still interested in RLE related effort, but not sure about my
>> > > > > available bandwidth (which is why I haven't made more of an effort
>> > > > > there).
>> > > > >
>> > > > > On Tue, Aug 3, 2021 at 6:00 PM Wes McKinney <wesmck...@gmail.com> wrote:
>> > > > >
>> > > > > > Another Flatbuffers/Message.fbs project we should rekindle soon, in
>> > > > > > addition to the schema evolution/replacement question which has been
>> > > > > > raised with Flight, is that of sparse/compressed data (e.g. RLE). I
>> > > > > > have a vacation plus some travel coming up so won't be able to devote
>> > > > > > meaningful attention to this until the last part of August, but would
>> > > > > > like to help it move forward.
>> > > > > >
>> > > > > >
>> > > > > > On Tue, Jul 27, 2021 at 1:40 PM David Li <lidav...@apache.org> wrote:
>> > > > > > >
>> > > > > > > Hey Nate,
>> > > > > > >
>> > > > > > > For the first two points, semantically I'm tempted to think of it
>> > > > > > > more like the ability to send a "bag of columns" according to some
>> > > > > > > schema (and hence columns could have differing lengths or even be
>> > > > > > > absent). This could be a new structure alongside a record batch,
>> > > > > > > which is semantically like a "slice of a table" (and hence
>> > > > > > > rectangular and complete), instead of exposing existing users of
>> > > > > > > RecordBatch to rather different behavior.
>> > > > > > >
>> > > > > > > For #3, a different thread was discussing some of the points there -
>> > > > > > > it sounds like it may be possible to relax from map<string, string>
>> > > > > > > to map<string, binary>.
>> > > > > > >
>> > > > > > > -David
>> > > > > > >
>> > > > > > > On Mon, Jul 26, 2021, at 11:01, Nate Bauernfeind wrote:
>> > > > > > > > Wes suggested that maybe there are enough new ideas that it may make
>> > > > > > > > sense to evolve-past the existing structures rather than to bolt-on
>> > > > > > > > new functionality. I would like to learn what requirements exist
>> > > > > > > > should new structures be adopted, and if applicable, would like to
>> > > > > > > > turn this into a full POC proposal.
>> > > > > > > >
>> > > > > > > > These are the features that I feel are missing from the existing
>> > > > > > > > design:
>> > > > > > > > - the ability to notify that the columns are not consistent in
>> > > > > > > >   length (e.g. setting RecordBatch.length to -1; and give the
>> > > > > > > >   arrow/flight user the true FieldNode lengths).
>> > > > > > > > - the ability to skip top-level field nodes that have length 0 at a
>> > > > > > > >   small cost (such as in a bitset)
>> > > > > > > > - the ability to embed binary payload in the Message flatbuffer
>> > > > > > > >   wrapper (instead of String payload only)
>> > > > > > > > - the ability to concurrently use more than one schema (the most
>> > > > > > > >   likely API will look like how one identifies a dictionary. ideally
>> > > > > > > >   dictionaries could be shared across field nodes in a schema or
>> > > > > > > >   across schemas in the same flight)
>> > > > > > > >
>> > > > > > > > What other features, or improvements, could/should be considered? Any
>> > > > > > > > strong opinions against the ideas above? (Remember, that a goal of
>> > > > > > > > mine is to be able to send a RecordBatch of rows that were modified
>> > > > > > > > intersected only by the field-nodes that have changed (including
>> > > > > > > > those with only inner node changes); thus the columns are a subset of
>> > > > > > > > the full schema and that the length of each node is independent of
>> > > > > > > > the other).
>> > > > > > > >
>> > > > > > > > On Fri, Jul 9, 2021 at 9:26 AM Wes McKinney <wesmck...@gmail.com>
>> > > > > > > > wrote:
>> > > > > > > > > It sounds like we may want to discuss some potential evolutions of
>> > > > > > > > > the Arrow binary protocol (for example: new Message types).
>> > > > > > > > > Certainly a can of worms but rather than trying to bolt some new
>> > > > > > > > > functionality onto the existing structures, it might be better to
>> > > > > > > > > support the new use cases through some new structures which will be
>> > > > > > > > > more clear cut from a forward compatibility standpoint.
>> > > > > > > >
>> > > > > > > > Nate
>> > > > > > > >
>> > > > > > > > --
>> > > > > > > >
>> > > > > >
>> > > > >
>> > > >
>> > > >
>> > > > --
>> > > >
>> > >
>> >
>> >
>> > --
>> >
>>
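
For anyone skimming the thread: the "bag of columns" idea quoted above
(columns that follow a schema but may differ in length or be absent, with
empty top-level columns skipped cheaply via a bitset) could be sketched
roughly like this. It is plain Python for illustration only; it is not the
Arrow API and not the flatbuffer layout in the PR. ColumnBagSketch, encode,
decode, presence_bitset and the example schema are all made-up names.

```python
from dataclasses import dataclass
from typing import Dict, List, Sequence


@dataclass
class ColumnBagSketch:
    """Hypothetical stand-in for the proposed ColumnBag (not an Arrow type)."""
    presence_bitset: int   # bit i is set when schema field i carries data
    lengths: List[int]     # one length per present column, in schema order
    columns: List[list]    # values of the present columns, in schema order


def encode(schema: Sequence[str], data: Dict[str, list]) -> ColumnBagSketch:
    """Pack only the non-empty columns; each keeps its own length."""
    bitset = 0
    lengths: List[int] = []
    columns: List[list] = []
    for i, name in enumerate(schema):
        values = data.get(name, [])
        if not values:
            continue  # absent or empty column: costs nothing beyond a zero bit
        bitset |= 1 << i
        lengths.append(len(values))
        columns.append(list(values))
    return ColumnBagSketch(bitset, lengths, columns)


def decode(schema: Sequence[str], bag: ColumnBagSketch) -> Dict[str, list]:
    """Rebuild a name -> values mapping; skipped columns come back empty."""
    present = iter(bag.columns)
    out: Dict[str, list] = {}
    for i, name in enumerate(schema):
        out[name] = next(present) if (bag.presence_bitset >> i) & 1 else []
    return out


# Example: only "id" and "note" changed, and they have different lengths.
schema = ["id", "price", "note"]
bag = encode(schema, {"id": [1, 2, 3], "note": ["x"]})
restored = decode(schema, bag)
```

Unlike a RecordBatch, nothing here requires the columns to share a single
length, which is the distinction David draws between a "bag of columns" and
a "slice of a table".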
