Hi Wes,
I think we might be closer than we realize on the Java side to having the
functionality listed (I've added comments inline at the end on the
features you listed in the original e-mail).

My biggest concern is that I don't think there is a clear path forward for
sparse unions.  Getting compatibility for sparse unions would require more
invasive/breaking changes to the Java code base.  [1] is the last thread on
the issue.  I sadly have not had time to get back to this, and I probably
won't have time before the next release.

I would propose that if a feature has no implementation in any language, we
drop it from the specification.  The main feature that I think meets this
criterion is dictionary-of-dictionary columns (is this supported in C++?).

Thanks,
Micah


> * custom_metadata fields

Not sure about this one.

> * Extension Types

There is an implementation already in Java; it probably needs more work for
integration testing.

> * Large (64-bit offset) variable size types

There is an open PR for string/binary types.  LargeList is of more
questionable value until Java supports vectors/arrays with more than 2^31
elements.

> * Delta and Replacement Dictionaries

There is an implementation already in Java; it probably needs more work,
specifically for integration testing.

> * Unions

There is an implementation for dense unions (likely needs more work for
integration testing).

On Tue, Apr 21, 2020 at 11:26 AM Neal Richardson <
[email protected]> wrote:

> I'm all for making our next release be 1.0. Everything is about tradeoffs,
> and while I too would like to see a complete Java implementation, I think
> the costs of further delaying 1.0 outweigh the benefits of holding it
> indefinitely in hopes that there will be enough availability of Java
> developers to finish integration testing.
>
> Neal
>
> On Tue, Apr 21, 2020 at 10:55 AM Wes McKinney <[email protected]> wrote:
>
> > hi Bryan -- with the way that things are going, if we were to block
> > the 1.0.0 release on completing the Java work, it could be a very long
> > time to wait (long time = more than 6 months from now). I don't think
> > that's acceptable. The Versioning document was formally adopted last
> > August and so a year will have soon elapsed since we previously said
> > we wanted to have everything integration tested.
> >
> > With what I'm proposing, the primary things that would not be tested
> > (if there is no progress in Java) are:
> >
> > * custom_metadata fields
> > * Extension Types
> > * Large (64-bit offset) variable size types
> > * Delta and Replacement Dictionaries
> > * Unions
> >
> > These do not seem like huge sacrifices, or at least not ones that
> > compromise the stability of the columnar format. Of course, if some of
> > them are completed in the next 10-12 weeks, then that's great.
> >
> > - Wes
> >
> > On Tue, Apr 21, 2020 at 12:12 PM Bryan Cutler <[email protected]> wrote:
> > >
> > > > I really would like to see a 1.0.0 release with complete implementations
> > > > for C++ and Java. From my experience, that interoperability has been a
> > > > major selling point for the project. That being said, my time for
> > > > contributions has been pretty limited lately and I know that Java has been
> > > > lagging, so if the rest of the community would like to push forward with a
> > > > reduced scope, that is okay with me. I'll still continue to do what I can
> > > > on Java to fill in the gaps.
> > >
> > > Bryan
> > >
> > > > On Tue, Apr 21, 2020 at 8:47 AM Wes McKinney <[email protected]> wrote:
> > >
> > > > Hi all -- are there some opinions about this?
> > > >
> > > > Thanks
> > > >
> > > > > On Thu, Apr 16, 2020 at 5:30 PM Wes McKinney <[email protected]> wrote:
> > > > >
> > > > > hi folks,
> > > > >
> > > > > > Previously we had discussed a plan for making a 1.0.0 release based on
> > > > > > completeness of columnar format integration tests and making
> > > > > > forward/backward compatibility guarantees as formalized in
> > > > > >
> > > > > > https://github.com/apache/arrow/blob/master/docs/source/format/Versioning.rst
> > > > >
> > > > > > In particular, we wanted to demonstrate comprehensive Java/C++
> > > > > > interoperability.
> > > > >
> > > > > > As time has passed we have stalled out a bit on completing integration
> > > > > > tests for the "long tail" of data types and columnar format features.
> > > > > >
> > > > > > https://docs.google.com/spreadsheets/d/1Yu68rn2XMBpAArUfCOP9LC7uHb06CQrtqKE5vQ4bQx4/edit?usp=sharing
> > > > >
> > > > > > As such I wanted to propose a reduction in scope so that we can make a
> > > > > > 1.0.0 release sooner. The plan would be as follows:
> > > > >
> > > > > > * Endeavor to have integration tests implemented and working in at
> > > > > > least one reference implementation (likely to be the C++ library). It
> > > > > > seems important to verify that what's in Columnar.rst is able to be
> > > > > > unambiguously implemented.
> > > > > > * Indicate in Versioning.rst or another place in the documentation the
> > > > > > list of data types or advanced columnar format features (like
> > > > > > delta/replacement dictionaries) that are not yet fully integration
> > > > > > tested.
> > > > >
> > > > > > Some of the essential protocol stability details and all of the most
> > > > > > commonly used data types have been stable for a long time now,
> > > > > > particularly after the recent alignment change. The current list of
> > > > > > features that aren't being tested for cross-implementation
> > > > > > compatibility should not pose risk to downstream users.
> > > > >
> > > > > > Thoughts about this? The 1.0.0 release is an important milestone for
> > > > > > the project and will help build continued momentum in developer and
> > > > > > user community growth.
> > > > >
> > > > > Thanks
> > > > > Wes
> > > >
> >
>
