Hi Simon,

There are several Arrow implementations developed in parallel:
https://arrow.apache.org/docs/status.html
The Python and R versions are built on Arrow C++; the others are
completely separate projects.
Arrow-rs and Arrow2 both refer to the Rust implementation. Arrow C++ is
not going to be replaced.

This thread is about a major rewrite of the Arrow-rs implementation,
called arrow2.
It is intended to be a more idiomatic and safer Rust implementation,
incorporating the experience gained during Arrow-rs development.

As the Arrow Rust community intends to keep a single Rust implementation,
the challenge is how to release such breaking API changes in a way that
lets downstream projects keep up with development and provide feedback.
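
To make the versioning problem concrete, here is a minimal sketch (my own
illustration, not code from Arrow-rs or arrow2) of how Cargo's default
caret requirements treat version numbers. The `Version` struct and the
`caret_compatible` function are hypothetical names, and pre-release tags
such as -beta are ignored for simplicity. It is relevant to the 0.X model
mentioned further down the thread: for a 0.x crate a minor bump already
counts as breaking, while for a crate at 5.x only a major bump does.

```rust
/// A bare MAJOR.MINOR.PATCH version. Pre-release tags are ignored in this
/// sketch. Field order matters: the derived ordering compares major, then
/// minor, then patch.
#[derive(Debug, Clone, Copy, PartialEq, Eq, PartialOrd, Ord)]
pub struct Version {
    pub major: u64,
    pub minor: u64,
    pub patch: u64,
}

impl Version {
    pub const fn new(major: u64, minor: u64, patch: u64) -> Self {
        Version { major, minor, patch }
    }
}

/// Returns true if `candidate` satisfies the caret requirement `^req`,
/// which is what Cargo uses when a dependency is written as a plain
/// version string.
pub fn caret_compatible(req: Version, candidate: Version) -> bool {
    // A candidate older than the requirement never matches.
    if candidate < req {
        return false;
    }
    if req.major > 0 {
        // ^5.0.0 allows any 5.y.z at or above the requirement.
        candidate.major == req.major
    } else if req.minor > 0 {
        // ^0.3.1 allows only 0.3.z at or above the requirement.
        candidate.major == 0 && candidate.minor == req.minor
    } else {
        // ^0.0.3 allows exactly 0.0.3.
        candidate.major == 0 && candidate.minor == 0 && candidate.patch == req.patch
    }
}
```

Under these rules a downstream crate that depends on 5.0.0 picks up 5.1.0
automatically but never 6.0.0, whereas one that depends on 0.3.1 picks up
0.3.9 but not 0.4.0. That is why a 0.x crate can ship breaking changes in
every minor release without surprising its dependents.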

Tl;dr: Arrow2 is intended to replace the Arrow-rs Rust implementation. It
has nothing to do with the C++ implementation (other than that they all
share the same concepts and implement the same standards and formats).

Best regards,
Adam Lippai

On Sun, Jul 18, 2021 at 7:01 PM Simon Perkins <simon.perk...@gmail.com>
wrote:

> I'm interested to hear what the relation between arrow2, arrow-rs and the
> main github apache/arrow is. Is the intention to replace the C++ codebase
> with a rust implementation?
>
> The reason I'm asking is that I'm adding complex number support in the C++
> codebase. It may instead be a better idea to do this in the Rust
> implementation if it is indeed replacing the C++ implementation.
>
>
>
> On Sat, Jul 17, 2021 at 1:59 PM Andrew Lamb <al...@influxdata.com> wrote:
>
> > What if we released "beta" [1] versions of arrow on cargo at whatever
> > pace was necessary? That way dependent crates could opt in to bleeding
> > edge functionality / APIs.
> >
> > There is tension between full technical freedom to change APIs and the
> > needs of downstream projects for a more stable API.
> >
> > Whatever its technical faults may be, projects that rely on arrow (such
> > as anything based on DataFusion, like my own) need to be supported as
> > they have made the bet on Rust Arrow. I don't think we can abandon
> > maintenance on the existing codebase until we have a successor ready.
> >
> > Andrew
> >
> > p.s. I personally very much like Adam's suggestion for "Arrow 6.0 in Oct
> > 2021 be based on arrow2" but that is predicated on wanting to have arrow2
> > widely used by downstreams at that point.
> >
> > [1] https://stackoverflow.com/questions/46373028/how-to-release-a-beta-version-of-a-crate-for-limited-public-testing
> >
> >
> > On Sat, Jul 17, 2021 at 5:56 AM Adam Lippai <a...@rigo.sk> wrote:
> >
> > > 5.0 is being released right now, which means from timing perspective
> > > this is the worst moment for arrow2, indeed. You'd need to wait the
> > > full 3 months. On the other hand does releasing a 6.0 beta based on
> > > arrow2 on Aug 1st, rc on Sept 1st and releasing the stable on Oct 1st
> > > sound like a bad plan?
> > >
> > > I don't think a 6.0-beta release would be confusing and dedicating
> > > most of the 5.0->6.0 cycle to this change doesn't sound excessive.
> > >
> > > I think this approach wouldn't result in extra work (backporting the
> > > important changes to 5.1,5.2 release). It only shows the magnitude of
> > > this change, the work would be done by you anyways, this would just
> > > make it clear this is a huge effort.
> > >
> > > Best regards,
> > > Adam Lippai
> > >
> > > On Sat, Jul 17, 2021, 11:31 Jorge Cardoso Leitão <jorgecarlei...@gmail.com>
> > > wrote:
> > >
> > > > Hi,
> > > >
> > > > Arrow2 and parquet2 have passed the IP clearance vote and are ready
> > > > to be merged to apache/* repos.
> > > >
> > > > My plan is to merge them and PR to both of them to the latest
> > > > updates on my own repo, so that I can temporarily (and hopefully
> > > > permanently) archive the versions of my account and move
> > > > development to apache/*.
> > > >
> > > > Most of the work happening in arrow-rs is backward compatible or
> > > > simple to deprecate. However, this situation is different in arrow2
> > > > and parquet2. A release cadence of a major every 3 months is
> > > > prohibitive at the pace that I am plowing through.
> > > >
> > > > The core API (types, alloc, buffer, bitmap, array, mutable array) is
> > > > imo stable and not prone to change much, but the non-core API
> > > > (namely IO and compute) is prone to change. Examples:
> > > >
> > > > * Add Scalar API to allow dynamic casting over the aggregate kernels
> > > > and parquet statistics
> > > > * move compute/ from the arrow crate into a separate crate
> > > > * move io/ from the arrow crate into a separate crate
> > > > * add option to select encoding based on DataType and field name
> > > > when writing to parquet
> > > >
> > > > (I will create issues for them in the experimental repos for proper
> > > > visibility and discussion).
> > > >
> > > > This situation is usually addressed via the 0.X model in semver 2
> > > > (in Python, FastAPI <https://fastapi.tiangolo.com/> is a prominent
> > > > example that uses it, and almost everything in Rust also uses it).
> > > > However, there are a couple of blockers in this context:
> > > >
> > > > 1. We do not allow releases of experimental repos to avoid confusion
> > > > over which is *the* official package.
> > > > 2. arrow-rs is at version 5, and some dependents like IOx/Influx
> > > > seem to prefer a slower release cadence of breaking changes.
> > > >
> > > > On the other hand, other parts of the community do not care about
> > > > this aspect. Polars for example, the fastest DataFrame in H2O
> > > > benchmarks, currently maintains an arrow2 branch that is faster and
> > > > safer than master [1], and will be releasing the Python binaries
> > > > from the arrow2 branch. We would like to release the Rust API also
> > > > based on arrow2, which requires it to be in Cargo.
> > > >
> > > > The best “hack” that I can come up with given the constraints above
> > > > is to release arrow2 and parquet2 on crates.io from my personal
> > > > account so that dependents can release to cargo while still making
> > > > it obvious that they are not the official release. However, this is
> > > > obviously not ideal.
> > > >
> > > > Any suggestions?
> > > >
> > > > [1] https://github.com/pola-rs/polars/pull/922
> > > >
> > > > Best,
> > > > Jorge
> > > >
> > >
> >
>
