I'm interested to hear what the relation is between arrow2, arrow-rs and the main GitHub repository apache/arrow. Is the intention to replace the C++ codebase with a Rust implementation?
The reason I'm asking is that I'm adding complex number support in the C++ codebase. It may instead be a better idea to do this in the Rust implementation if it is indeed replacing the C++ implementation.

On Sat, Jul 17, 2021 at 1:59 PM Andrew Lamb <al...@influxdata.com> wrote:

> What if we released "beta" [1] versions of arrow on cargo at whatever pace
> was necessary? That way dependent crates could opt in to bleeding edge
> functionality / APIs.
>
> There is tension between full technical freedom to change APIs and the
> needs of downstream projects for a more stable API.
>
> Whatever its technical faults may be, projects that rely on arrow (such as
> anything based on DataFusion, like my own) need to be supported as they
> have made the bet on Rust Arrow. I don't think we can abandon maintenance
> on the existing codebase until we have a successor ready.
>
> Andrew
>
> p.s. I personally very much like Adam's suggestion for "Arrow 6.0 in Oct
> 2021 be based on arrow2" but that is predicated on wanting to have arrow2
> widely used by downstreams at that point.
>
> [1]
> https://stackoverflow.com/questions/46373028/how-to-release-a-beta-version-of-a-crate-for-limited-public-testing
>
> On Sat, Jul 17, 2021 at 5:56 AM Adam Lippai <a...@rigo.sk> wrote:
>
> > 5.0 is being released right now, which means from timing perspective this
> > is the worst moment for arrow2, indeed. You'd need to wait the full 3
> > months. On the other hand does releasing a 6.0 beta based on arrow2 on Aug
> > 1st, rc on Sept 1st and releasing the stable on Oct 1st sound like a bad
> > plan?
> >
> > I don't think a 6.0-beta release would be confusing and dedicating most of
> > the 5.0->6.0 cycle to this change doesn't sound excessive.
> >
> > I think this approach wouldn't result in extra work (backporting the
> > important changes to 5.1,5.2 release). It only shows the magnitude of this
> > change, the work would be done by you anyways, this would just make it
> > clear this is a huge effort.
> >
> > Best regards,
> > Adam Lippai
> >
> > On Sat, Jul 17, 2021, 11:31 Jorge Cardoso Leitão <jorgecarlei...@gmail.com>
> > wrote:
> >
> > > Hi,
> > >
> > > Arrow2 and parquet2 have passed the IP clearance vote and are ready to be
> > > merged to apache/* repos.
> > >
> > > My plan is to merge them and PR to both of them to the latest updates on my
> > > own repo, so that I can temporarily (and hopefully permanently) archive the
> > > versions of my account and move development to apache/*.
> > >
> > > Most of the work happening in arrow-rs is backward compatible or simple to
> > > deprecate. However, this situation is different in arrow2 and parquet2. A
> > > release cadence of a major every 3 months is prohibitive at the pace that I
> > > am plowing through.
> > >
> > > The core API (types, alloc, buffer, bitmap, array, mutable array) is imo
> > > stable and not prone to change much, but the non-core API (namely IO and
> > > compute) is prone to change. Examples:
> > >
> > > * Add Scalar API to allow dynamic casting over the aggregate kernels and
> > >   parquet statistics
> > > * move compute/ from the arrow crate into a separate crate
> > > * move io/ from the arrow crate into a separate crate
> > > * add option to select encoding based on DataType and field name when
> > >   writing to parquet
> > >
> > > (I will create issues for them in the experimental repos for proper
> > > visibility and discussion).
> > >
> > > This situation is usually addressed via the 0.X model in semver 2 (in
> > > Python fastAPI <https://fastapi.tiangolo.com/> is a predominant example
> > > that uses it, and almost all in Rust also uses it). However, there are a
> > > couple of blockers in this context:
> > >
> > > 1. We do not allow releases of experimental repos to avoid confusion over
> > >    which is *the* official package.
> > > 2. arrow-rs is at version 5, and some dependencies like IOx/Influx seem to
> > >    prefer a slower release cadence of breaking changes.
> > >
> > > On the other hand, other parts of the community do not care about this
> > > aspect. Polars for example, the fastest DataFrame in H2O benchmarks,
> > > currently maintains an arrow2 branch that is faster and safer than master
> > > [1], and will be releasing the Python binaries from the arrow2 branch. We
> > > would like to release the Rust API also based on arrow2, which requires it
> > > to be in Cargo.
> > >
> > > The best “hack” that I can come up with given the constraints above is to
> > > release arrow2 and parquet2 in cargo.io from my personal account so that
> > > dependents can release to cargo while still making it obvious that they are
> > > not the official release. However, this is obviously not ideal.
> > >
> > > Any suggestions?
> > >
> > > [1] https://github.com/pola-rs/polars/pull/922
> > >
> > > Best,
> > > Jorge