I think this makes sense, and it is a good clarification of the process. It might also be a good idea to hold a preliminary vote on the existing backlog of JIRAs so people don't waste time starting PRs for features that won't be supported (or that will be supported only via a custom metadata type). I've included a list below from a quick search through JIRA.

Issues with open PRs:
[ARROW-352] [Format] Interval(DAY_TIME) has no unit [1]
[ARROW-835] [Format] Add Timedelta type to describe time intervals [1]
[ARROW-4810] [Format][C++] Add "LargeList" type with 64-bit offsets [2]

And these (apologies if I missed the pull requests for them):
[ARROW-638] Add metadata for single and double precision complex numbers
[ARROW-2296] Add num_rows to file footer (file format)
[ARROW-4651] [Format] Flight Location should be more flexible than a (host, port) pair
[ARROW-300] [Format] Add buffer compression option to IPC file format
[ARROW-1790] [Format] Define logical data type that represents a "packed C struct" composed from other fixed-size primitive types
[ARROW-412] [Format] Handling of buffer padding in the IPC metadata
[ARROW-2152] [Format] UUID type
[ARROW-730] [Format] Define Flatbuffers metadata for random-access compressed block memory format
[ARROW-750] [Format] Add LargeBinary and LargeString types
[ARROW-2009] [Format] Support 32- and 64-bit decimals in IPC messages
[ARROW-1614] [C++] Add a tensor logical value type
[ARROW-3263] [R] Use R sentinel values for missingness in addition to bitmask

[1] https://github.com/apache/arrow/pull/3644
[2] https://github.com/apache/arrow/pull/3848

On Sun, Mar 17, 2019 at 6:07 PM Jacques Nadeau <jacq...@apache.org> wrote:

> > How about "at least two native implementations" instead of
> > "Java and C++"? Now, we have multiple native
> > implementations:
> >
>
> I think we should have two complete implementations. I don't think having
> one feature in C# and Go and another in JavaScript and Rust does justice to
> the project goals. I think Java and C++ should always be complete. They are
> the first two implementations. I believe they are the most complete and
> broadly used/popular (C++ given Python & Pandas integration and Java via
> Spark & Dremio). This is a compromise between setting a high barrier for
> creation of new features and making sure that we have validated things
> across impls.
>
> > Are there specific changes to format/ that have been merged that you
> > are concerned about that you feel need to be discussed separately?
> > There have been some changes related to serializing tensor metadata
> > that are clearly marked as experimental, and they also do not interact
> > with the columnar format.
> >
>
> There are several things we've introduced over time that suffered this
> problem. Alignment changes, dictionary encoding, union behavior, interval
> behavior, tensors, unsigned integrations, etc that we've failed to make
> sure we have integration tests for. I've meant to send this email for
> months but saw a couple of recent proposed changes which made me feel like
> we should discuss further.
>