Thanks for getting the discussion started, Micah!

I'm +1 on this change and also slightly prefer Option 1. As Antoine mentions,
there doesn't seem to be a clear benefit from Option 2, unless we also want to
support 8- or 16-bit indices in the future, which seems unlikely. So going
with Option 1 is OK, I think.

Best,
Philipp.

On Thu, Apr 11, 2019 at 7:06 AM Antoine Pitrou <anto...@python.org> wrote:

>
> On 11/04/2019 at 10:52, Micah Kornfield wrote:
> > ARROW-4810 [1] and ARROW-750 [2] discuss adding variants of the List,
> > String and Binary data types that use 64-bit offsets.
> >
> > Philipp started an implementation for the large list type [3] and I hacked
> > together a potentially viable Java implementation [4].
> >
> > I'd like to kick off the discussion for getting these types voted on.  I'm
> > coupling them together because I think there are design considerations for
> > how we evolve Schema.fbs.
> >
> > There are two proposed options:
> > 1.  The current PR proposal which adds a new type LargeList:
> >   // List with 64-bit offsets
> >   table LargeList {}
> >
> > 2.  As François suggested, it might be cleaner to parameterize List with
> > offset width.  I suppose something like:
> >
> > table List {
> >   // Only 32-bit and 64-bit offsets are supported.
> >   bitWidth: int = 32;
> > }
> >
> > I think Option 2 is cleaner and potentially better long-term, but I think
> > it breaks forward compatibility of the existing Arrow libraries.  If we
> > proceed with Option 2, I would advocate making the change to Schema.fbs
> > all at once for all types (assuming we think that 64-bit offsets are
> > desirable for all types), along with future compatibility checks to avoid
> > multiple releases where future compatibility is broken (by broken I mean
> > the inability to detect that an implementation is receiving data it can't
> > read).  What are people's thoughts on this?
>
> I think Option 1 is ok.  Making List / String / Binary parameterizable
> doesn't bring anything *concretely*, since the types will not be
> physically interchangeable.  The cost of breaking compatibility should
> be offset by a compelling benefit, which doesn't seem to exist here.
>
> Of course, implementations are free to refactor their internals to avoid
> code duplication (for example the C++ ListBuilder and LargeListBuilder
> classes could be instances of a BaseListBuilder<IndexType> generic type)...
>
> Regards
>
> Antoine.
>
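
Antoine's refactoring remark could look roughly like the sketch below. This is
only an illustration and not Arrow's actual builder code: BaseListBuilder here
is a hypothetical stand-alone template, child values are fixed to int64_t for
brevity, and the method names (Append, AppendValue, FinishOffsets) are
assumptions made for the example.

```cpp
#include <cassert>
#include <cstdint>
#include <limits>
#include <stdexcept>
#include <vector>

// Hypothetical sketch: one generic builder parameterized on the offset type.
// It only tracks offsets and child values; validity bitmaps are omitted.
template <typename OffsetType>
class BaseListBuilder {
 public:
  // Start a new list slot. Child values appended afterwards belong to it.
  void Append() {
    offsets_.push_back(static_cast<OffsetType>(values_.size()));
  }

  // Append one child value to the list slot opened by the last Append().
  void AppendValue(int64_t value) {
    const auto max_offset =
        static_cast<uint64_t>(std::numeric_limits<OffsetType>::max());
    if (values_.size() >= max_offset) {
      // A 32-bit builder hits this limit long before a 64-bit one does.
      throw std::overflow_error("child array too large for offset type");
    }
    values_.push_back(value);
  }

  // Return the offsets buffer: one start offset per list slot, plus the
  // final end offset, so the result has length() + 1 entries.
  std::vector<OffsetType> FinishOffsets() const {
    std::vector<OffsetType> out = offsets_;
    out.push_back(static_cast<OffsetType>(values_.size()));
    return out;
  }

  // Number of list slots appended so far.
  int64_t length() const { return static_cast<int64_t>(offsets_.size()); }

 private:
  std::vector<OffsetType> offsets_;
  std::vector<int64_t> values_;
};

// Two distinct concrete types sharing one implementation.
using ListBuilder = BaseListBuilder<int32_t>;
using LargeListBuilder = BaseListBuilder<int64_t>;
```

With aliases like these, ListBuilder and LargeListBuilder would share all of
their logic and differ only in offset width, while still being separate types.
That fits the point above that the two layouts are not physically
interchangeable even if the code behind them is shared.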
