Top things on my list:

- Formalize Arrow RPC and/or REST
- Some reference transformation algorithms
- Prototype IPC

On Mon, Jul 24, 2017 at 9:47 AM, Wes McKinney <wesmck...@gmail.com> wrote:
> hi folks,
>
> In recent discussions, since the Arrow memory format and metadata have
> become reasonably stabilized, and we're more likely to add new data
> types than change existing ones, we may consider making a 1.0.0 to
> declare to the rest of the open source world that "Arrow is open for
> business" and can be relied upon in production applications (with
> some reasonable tolerance for library API changes from major release
> to major release). I hope we can all agree that forward and backward
> compatibility in the zero-copy wire format and metadata is the most
> essential thing.
>
> To that end, I'd like to collect ideas for what needs to be
> accomplished in the project before we'd be comfortable making a 1.0.0
> release. I think it would be a good show of project stability /
> production-readiness to do this (with the caveat that the APIs will
> continue to evolve).
>
> The main things on my end are hardening the memory format and
> integration tests for the remaining data types:
>
> - Decimals
>   - Lingering issues with 128-bit decimals
>   - Need integration tests
> - Fixed size list
>   - Java has implemented, but not C++. Need integration tests
> - Union
>   - Two kinds of unions, Java only implements one. Need integration tests
>
> On these, Decimals have the most work since the memory format needs to
> be specified. On Unions, we may decide to not implement the dense
> variant and focus on integration testing the sparse variant. I don't
> think this is going to be too much work, but it needs to get sorted
> out so we don't have incomplete or under-tested parts of the
> specification.
>
> There are some other things being discussed, like a Map logical type,
> but that (at least as currently proposed) won't require any disruptive
> modifications to the metadata.
>
> As for the metadata and memory format, we would use the Open/Closed
> principle to guide our efforts
> (https://en.wikipedia.org/wiki/Open/closed_principle). For example, it
> would be possible to add compression or encoding at the field level
> without disrupting earlier versions of the software that lack these
> features.
>
> In the event that we do need to change the metadata or memory format
> in the future (which would probably be an extreme circumstance), we
> have the option of increasing the MetadataVersion, which is one of the
> first tags accompanying Arrow messages
> (https://github.com/apache/arrow/blob/master/format/Schema.fbs#L22).
> So if you encounter a message that you do not support, you can raise
> an appropriate exception.
>
> There are some other things that would be nice to prototype or
> specify, like a REST protocol for exposing Arrow datasets in a
> client-server model (sending Arrow record batches via REST HTTP
> calls).
>
> Is there anything else that would need to go in before we move to a
> 1.x mainline for development? One idea: if we need to make any
> breaking changes, we would leap from 1.x to 2.0.0 and put the 1.x
> branches into maintenance mode.
>
> Thanks
> Wes
>
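
For illustration, the MetadataVersion check mentioned in the quoted message could look roughly like the sketch below. This is not the Arrow library API. The enum values, `SUPPORTED_VERSIONS`, `UnsupportedMetadataVersion`, and `check_metadata_version` are all hypothetical names made up for this example; the real enum is defined in format/Schema.fbs.

```python
from enum import IntEnum


class MetadataVersion(IntEnum):
    # Illustrative values only; the authoritative enum is in format/Schema.fbs.
    V1 = 0
    V2 = 1
    V3 = 2


# Hypothetical: the set of versions this particular reader can decode.
SUPPORTED_VERSIONS = frozenset({MetadataVersion.V3})


class UnsupportedMetadataVersion(Exception):
    """Raised when a message carries a metadata version we cannot read."""


def check_metadata_version(raw_version: int) -> MetadataVersion:
    """Validate the version tag read from the front of a message."""
    try:
        version = MetadataVersion(raw_version)
    except ValueError:
        # The tag is newer (or otherwise unknown) to this reader.
        raise UnsupportedMetadataVersion(
            f"unknown metadata version tag: {raw_version}"
        ) from None
    if version not in SUPPORTED_VERSIONS:
        # The tag is known, but this reader no longer (or does not yet) handle it.
        raise UnsupportedMetadataVersion(
            f"metadata version {version.name} is not supported by this reader"
        )
    return version
```

A reader would call `check_metadata_version` on the tag before touching the rest of the message, so an incompatible message fails early with a clear error instead of being misinterpreted.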