Re: [DISCUSS] The road from Arrow 0.5.0 to 1.0.0

Jacques Nadeau Mon, 24 Jul 2017 13:01:23 -0700

Top things on my list:

- Formalize Arrow RPC and/or REST
- Some reference transformation algorithms
- Prototype IPC


On Mon, Jul 24, 2017 at 9:47 AM, Wes McKinney <wesmck...@gmail.com> wrote:

> hi folks,
>
> As has come up in recent discussions, since the Arrow memory format and
> metadata have become reasonably stable, and we're more likely to add
> new data types than change existing ones, we may consider making a
> 1.0.0 release to declare to the rest of the open source world that
> "Arrow is open for business" and can be relied upon in production
> applications (with some reasonable tolerance for library API changes
> from major release to major release). I hope we can all agree that
> forward and backward compatibility in the zero-copy wire format and
> metadata is the most essential thing.
>
> To that end, I'd like to collect ideas for what needs to be
> accomplished in the project before we'd be comfortable making a 1.0.0
> release. I think it would be a good show of project stability /
> production-readiness to do this (with the caveat that the APIs will
> continue to evolve).
>
> The main things on my end are hardening the memory format and
> integration tests for the remaining data types:
>
> - Decimals
>     - Lingering issues with 128-bit decimals
>     - Need integration tests
> - Fixed size list
>     - Java has implemented it, but C++ has not. Need integration tests
> - Union
>     - Two kinds of unions; Java only implements one. Need integration tests
>
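The fixed size list type needs no offsets buffer: every slot holds exactly `list_size` child values, so slot `i` occupies child positions `i * list_size` through `(i + 1) * list_size - 1`, and null slots still take up space in the child array. Below is a minimal pure-Python sketch of that indexing, assuming plain Python lists in place of Arrow buffers and a list of booleans in place of the validity bitmap; the `FixedSizeList` class and its `slot` method are hypothetical names, not an Arrow API.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class FixedSizeList:
    """Hypothetical stand-in for a fixed size list array (not an Arrow API)."""
    list_size: int
    child: List[int]                       # flat child values for all slots
    validity: Optional[List[bool]] = None  # None means every slot is valid

    def __post_init__(self) -> None:
        # This sketch requires a positive list_size to avoid dividing by zero.
        if self.list_size <= 0:
            raise ValueError("list_size must be positive")
        # Every slot, null or not, owns exactly list_size child values.
        if len(self.child) % self.list_size != 0:
            raise ValueError("child length must be a multiple of list_size")

    def __len__(self) -> int:
        return len(self.child) // self.list_size

    def slot(self, i: int) -> Optional[List[int]]:
        if not 0 <= i < len(self):
            raise IndexError(i)
        if self.validity is not None and not self.validity[i]:
            return None
        start = i * self.list_size  # position is computed, no offsets buffer
        return self.child[start:start + self.list_size]


arr = FixedSizeList(list_size=3, child=[1, 2, 3, 4, 5, 6, 0, 0, 0],
                    validity=[True, True, False])
print(len(arr), arr.slot(1), arr.slot(2))  # 3 [4, 5, 6] None
```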
> Of these, Decimals need the most work, since their memory format still
> needs to be specified. On Unions, we may decide not to implement the
> dense variant and focus on integration testing the sparse variant. I
> don't think this is going to be too much work, but it needs to get
> sorted out so we don't have incomplete or under-tested parts of the
> specification.
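To make the sparse/dense distinction concrete: both variants carry a types buffer of 8-bit type ids, one per slot. In a sparse union every child array has the same length as the union, so slot `i` reads position `i` of the selected child; a dense union adds a 32-bit offsets buffer, and slot `i` reads position `offsets[i]` of a packed child. A minimal sketch in plain Python, assuming each type id is simply the index of its child (the format also allows a separate type-id-to-child mapping) and ignoring validity; the function names are hypothetical:

```python
from typing import Any, List, Sequence


def sparse_union_value(type_ids: Sequence[int],
                       children: List[Sequence[Any]], i: int) -> Any:
    # Sparse: every child is as long as the union, so slot i reads
    # position i of whichever child type_ids[i] selects.
    return children[type_ids[i]][i]


def dense_union_value(type_ids: Sequence[int], offsets: Sequence[int],
                      children: List[Sequence[Any]], i: int) -> Any:
    # Dense: children are packed, and offsets[i] says where in the
    # selected child slot i's value lives.
    return children[type_ids[i]][offsets[i]]


# The logical values [1, "a", 2, "b"]; child 0 holds ints, child 1 strings.
type_ids = [0, 1, 0, 1]

# Sparse: each child is padded to the full union length (None = placeholder).
sparse_children = [[1, None, 2, None], [None, "a", None, "b"]]

# Dense: each child holds only its own values; offsets index into them.
dense_children = [[1, 2], ["a", "b"]]
offsets = [0, 0, 1, 1]

print([sparse_union_value(type_ids, sparse_children, i) for i in range(4)])
print([dense_union_value(type_ids, offsets, dense_children, i)
       for i in range(4)])
# both lines print [1, 'a', 2, 'b']
```

The trade-off shows up in the child sizes: the sparse children carry a placeholder for every slot that selects a different child, while the dense encoding stores each value once at the cost of the offsets buffer.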
>
> There are some other things being discussed, like a Map logical type,
> but that (at least as currently proposed) won't require any disruptive
> modifications to the metadata.
>
> As far as the metadata and memory format are concerned, we would use
> the Open/Closed principle to guide our efforts
> (https://en.wikipedia.org/wiki/Open/closed_principle). For example, it
> would be possible to add compression or encoding at the field level
> without disrupting earlier versions of the software that lack these
> features.
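One way to picture the Open/Closed approach: a new feature enters the metadata as an optional attribute whose default reproduces the old behavior. Older readers skip attributes they do not know (FlatBuffers, which Arrow uses for its metadata, provides this for fields appended to a table), and newer readers fill in the default when the attribute is absent. A minimal sketch, assuming plain dicts in place of FlatBuffers tables and a hypothetical `compression` attribute; the reader functions are hypothetical too:

```python
from typing import Any, Dict


def old_reader_field(meta: Dict[str, Any]) -> Dict[str, Any]:
    # Written before the hypothetical "compression" attribute existed:
    # reads only the keys it knows and ignores anything else.
    return {"name": meta["name"], "type": meta["type"]}


def new_reader_field(meta: Dict[str, Any]) -> Dict[str, Any]:
    field = old_reader_field(meta)
    # The extension is optional, and its default ("none") means exactly
    # what the old format meant, so existing messages need no changes.
    field["compression"] = meta.get("compression", "none")
    return field


legacy = {"name": "x", "type": "int32"}
extended = {"name": "x", "type": "int32", "compression": "none"}

print(new_reader_field(legacy))    # compression filled in as 'none'
print(old_reader_field(extended))  # unknown key ignored
```

This only shows that adding the attribute breaks neither existing messages nor existing readers; a writer that actually enables such a feature for a field still needs a reader that understands it.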
>
> In the event that we do need to change the metadata or memory format
> in the future (which would probably be an extreme circumstance), we
> have the option of increasing the MetadataVersion, which is one of the
> first tags accompanying Arrow messages
> (https://github.com/apache/arrow/blob/master/format/Schema.fbs#L22).
> So if you encounter a message with a version you do not support, you
> can raise an appropriate exception.
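A reader-side version check along these lines might look like the following minimal Python sketch. The enum members and their values are illustrative (consult format/Schema.fbs for the authoritative `MetadataVersion` list), and `UnsupportedMetadataVersion`, `SUPPORTED`, and `check_metadata_version` are hypothetical names:

```python
from enum import IntEnum


class MetadataVersion(IntEnum):
    # Illustrative members; Schema.fbs is the authoritative definition.
    V1 = 0
    V2 = 1
    V3 = 2


class UnsupportedMetadataVersion(Exception):
    """Raised when a message uses a metadata version this reader can't decode."""


# Versions this hypothetical reader knows how to decode.
SUPPORTED = frozenset({MetadataVersion.V3})


def check_metadata_version(raw_version: int) -> MetadataVersion:
    """Validate the version tag before decoding the rest of a message."""
    try:
        version = MetadataVersion(raw_version)
    except ValueError:
        # Not in this reader's enum at all, e.g. written by a newer release.
        raise UnsupportedMetadataVersion(
            f"unknown metadata version {raw_version}"
        ) from None
    if version not in SUPPORTED:
        # Known to the enum, but not handled by this reader.
        raise UnsupportedMetadataVersion(f"{version.name} is not supported")
    return version


print(check_metadata_version(2).name)  # V3
try:
    check_metadata_version(7)
except UnsupportedMetadataVersion as exc:
    print("rejected:", exc)  # rejected: unknown metadata version 7
```

Checking the tag before decoding anything else lets an older reader fail with a clear message instead of misinterpreting a newer layout.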
>
> There are some other things that would be nice to prototype or
> specify, like a REST protocol for exposing Arrow datasets in a
> client-server model (sending Arrow record batches via REST HTTP
> calls).
>
> Is there anything else that would need to happen before we move to a
> 1.x mainline for development? One idea: if we need to make any breaking
> changes, we would leap from 1.x to 2.0.0 and throw the 1.x branches
> into maintenance mode.
>
> Thanks
> Wes
>
