Hi Simon,

There are several Arrow implementations developed in parallel: https://arrow.apache.org/docs/status.html

The Python and R versions are based on Arrow C++; the others are completely separate projects. Arrow-rs and Arrow2 both refer to the Rust implementation. Arrow C++ is not going to be replaced.

The thread is about a really big rewrite of the Arrow-rs implementation, now called arrow2. It will be a more idiomatic and safer Rust implementation, incorporating the experience gained during Arrow-rs development. As the Arrow Rust community intends to keep a single Rust implementation, it is quite a challenge to release such API breaks while allowing downstream projects to keep up with the development and provide feedback.

Tl;dr: Arrow2 replaces the Arrow-rs Rust implementation; it has nothing to do with the C++ implementation (other than that all of them share the same concepts and implement the same standards and formats).

Best regards,
Adam Lippai

On Sun, Jul 18, 2021 at 7:01 PM Simon Perkins <simon.perk...@gmail.com> wrote:

> I'm interested to hear what the relation between arrow2, arrow-rs and the main github apache/arrow is. Is the intention to replace the C++ codebase with a rust implementation?
>
> The reason I'm asking is that I'm adding complex number support in the C++ codebase. It may instead be a better idea to do this in the Rust implementation if it is indeed replacing the C++ implementation.
>
> On Sat, Jul 17, 2021 at 1:59 PM Andrew Lamb <al...@influxdata.com> wrote:
>
> > What if we released "beta" [1] versions of arrow on cargo at whatever pace was necessary? That way dependent crates could opt in to bleeding edge functionality / APIs.
> >
> > There is tension between full technical freedom to change APIs and the needs of downstream projects for a more stable API.
> >
> > Whatever its technical faults may be, projects that rely on arrow (such as anything based on DataFusion, like my own) need to be supported as they have made the bet on Rust Arrow. I don't think we can abandon maintenance on the existing codebase until we have a successor ready.
> >
> > Andrew
> >
> > p.s. I personally very much like Adam's suggestion for "Arrow 6.0 in Oct 2021 be based on arrow2" but that is predicated on wanting to have arrow2 widely used by downstreams at that point.
> >
> > [1] https://stackoverflow.com/questions/46373028/how-to-release-a-beta-version-of-a-crate-for-limited-public-testing
> >
> > On Sat, Jul 17, 2021 at 5:56 AM Adam Lippai <a...@rigo.sk> wrote:
> >
> > > 5.0 is being released right now, which means from timing perspective this is the worst moment for arrow2, indeed. You'd need to wait the full 3 months. On the other hand does releasing a 6.0 beta based on arrow2 on Aug 1st, rc on Sept 1st and releasing the stable on Oct 1st sound like a bad plan?
> > >
> > > I don't think a 6.0-beta release would be confusing and dedicating most of the 5.0->6.0 cycle to this change doesn't sound excessive.
> > >
> > > I think this approach wouldn't result in extra work (backporting the important changes to 5.1,5.2 release). It only shows the magnitude of this change, the work would be done by you anyways, this would just make it clear this is a huge effort.
> > >
> > > Best regards,
> > > Adam Lippai
> > >
> > > On Sat, Jul 17, 2021, 11:31 Jorge Cardoso Leitão <jorgecarlei...@gmail.com> wrote:
> > >
> > > > Hi,
> > > >
> > > > Arrow2 and parquet2 have passed the IP clearance vote and are ready to be merged to apache/* repos.
> > > >
> > > > My plan is to merge them and PR to both of them to the latest updates on my own repo, so that I can temporarily (and hopefully permanently) archive the versions of my account and move development to apache/*.
> > > >
> > > > Most of the work happening in arrow-rs is backward compatible or simple to deprecate. However, this situation is different in arrow2 and parquet2. A release cadence of a major every 3 months is prohibitive at the pace that I am plowing through.
> > > >
> > > > The core API (types, alloc, buffer, bitmap, array, mutable array) is imo stable and not prone to change much, but the non-core API (namely IO and compute) is prone to change. Examples:
> > > >
> > > > * Add Scalar API to allow dynamic casting over the aggregate kernels and parquet statistics
> > > > * move compute/ from the arrow crate into a separate crate
> > > > * move io/ from the arrow crate into a separate crate
> > > > * add option to select encoding based on DataType and field name when writing to parquet
> > > >
> > > > (I will create issues for them in the experimental repos for proper visibility and discussion).
> > > >
> > > > This situation is usually addressed via the 0.X model in semver 2 (in Python fastAPI <https://fastapi.tiangolo.com/> is a predominant example that uses it, and almost all in Rust also uses it). However, there are a couple of blockers in this context:
> > > >
> > > > 1. We do not allow releases of experimental repos to avoid confusion over which is *the* official package.
> > > > 2. arrow-rs is at version 5, and some dependencies like IOx/Influx seem to prefer a slower release cadence of breaking changes.
> > > >
> > > > On the other hand, other parts of the community do not care about this aspect. Polars for example, the fastest DataFrame in H2O benchmarks, currently maintains an arrow2 branch that is faster and safer than master [1], and will be releasing the Python binaries from the arrow2 branch. We would like to release the Rust API also based on arrow2, which requires it to be in Cargo.
> > > >
> > > > The best “hack” that I can come up with given the constraints above is to release arrow2 and parquet2 in cargo.io from my personal account so that dependents can release to cargo while still making it obvious that they are not the official release. However, this is obviously not ideal.
> > > >
> > > > Any suggestions?
> > > >
> > > > [1] https://github.com/pola-rs/polars/pull/922
> > > >
> > > > Best,
> > > > Jorge