I think this makes sense, and it is a good clarification on process. It
might be a good idea to also give a preliminary vote on an existing backlog
of JIRAs so people don't waste time starting PRs if they won't be supported
(or would only be supported via a custom metadata type). I've included a
list below, based on a quick search through JIRA.

Issues with open PRs:
[ARROW-352] [Format] Interval(DAY_TIME) has no unit [1]
[ARROW-835] [Format] Add Timedelta type to describe time intervals [1]
[ARROW-4810] [Format][C++] Add "LargeList" type with 64-bit offsets [2]

And these (apologies if I missed the pull requests for them):
[ARROW-638] Add metadata for single and double precision complex numbers
[ARROW-2296] Add num_rows to file footer (file format)
[ARROW-4651] [Format] Flight Location should be more flexible than a (host, port) pair
[ARROW-300] [Format] Add buffer compression option to IPC file format
[ARROW-1790] [Format] Define logical data type that represents a "packed C struct" composed from other fixed-size primitive types
[ARROW-412] [Format] Handling of buffer padding in the IPC metadata
[ARROW-2152] [Format] UUID type
[ARROW-730] [Format] Define Flatbuffers metadata for random-access compressed block memory format
[ARROW-750] [Format] Add LargeBinary and LargeString types
[ARROW-2009] [Format] Support 32- and 64-bit decimals in IPC messages
[ARROW-1614] [C++] Add a tensor logical value type
[ARROW-3263] [R] Use R sentinel values for missingness in addition to bitmask

[1] https://github.com/apache/arrow/pull/3644
[2] https://github.com/apache/arrow/pull/3848


On Sun, Mar 17, 2019 at 6:07 PM Jacques Nadeau <jacq...@apache.org> wrote:

> >
> > How about "at least two native implementations" instead of
> > "Java and C++"? Now, we have multiple native
> > implementations:
> >
>
> I think we should have two complete implementations. I don't think having
> one feature in C# and Go and another in JavaScript and Rust does justice to
> the project goals. I think Java and C++ should always be complete. They are
> the first two implementations. I believe they are the most complete and
> broadly used/popular (C++ given Python & Pandas integration and Java via
> Spark & Dremio). This is a compromise between setting a high barrier for
> creation of new features and making sure that we have validated things
> across impls.
>
> > Are there specific changes to format/ that have been merged that you
> > are concerned about that you feel need to be discussed separately?
> > There have been some changes related to serializing tensor metadata
> > that are clearly marked as experimental, and they also do not interact
> > with the columnar format.
>
>
> There are several things we've introduced over time that suffered from this
> problem: alignment changes, dictionary encoding, union behavior, interval
> behavior, tensors, unsigned integers, etc., for which we've failed to make
> sure we have integration tests. I've meant to send this email for months,
> but a couple of recently proposed changes made me feel like we should
> discuss this further.
>
