What if we released "beta" [1] versions of arrow on cargo at whatever pace was necessary? That way dependent crates could opt in to bleeding edge functionality / APIs.
There is tension between full technical freedom to change APIs and the needs of downstream projects for a more stable API. Whatever its technical faults may be, projects that rely on arrow (such as anything based on DataFusion, like my own) need to be supported as they have made the bet on Rust Arrow. I don't think we can abandon maintenance on the existing codebase until we have a successor ready.

Andrew

p.s. I personally very much like Adam's suggestion for "Arrow 6.0 in Oct 2021 be based on arrow2" but that is predicated on wanting to have arrow2 widely used by downstreams at that point.

[1] https://stackoverflow.com/questions/46373028/how-to-release-a-beta-version-of-a-crate-for-limited-public-testing

On Sat, Jul 17, 2021 at 5:56 AM Adam Lippai <a...@rigo.sk> wrote:

> 5.0 is being released right now, which means from timing perspective this
> is the worst moment for arrow2, indeed. You'd need to wait the full 3
> months. On the other hand does releasing a 6.0 beta based on arrow2 on Aug
> 1st, rc on Sept 1st and releasing the stable on Oct 1st sound like a bad
> plan?
>
> I don't think a 6.0-beta release would be confusing and dedicating most of
> the 5.0->6.0 cycle to this change doesn't sound excessive.
>
> I think this approach wouldn't result in extra work (backporting the
> important changes to 5.1,5.2 release). It only shows the magnitude of this
> change, the work would be done by you anyways, this would just make it
> clear this is a huge effort.
>
> Best regards,
> Adam Lippai
>
> On Sat, Jul 17, 2021, 11:31 Jorge Cardoso Leitão <jorgecarlei...@gmail.com> wrote:
>
> > Hi,
> >
> > Arrow2 and parquet2 have passed the IP clearance vote and are ready to be
> > merged to apache/* repos.
> >
> > My plan is to merge them and PR to both of them to the latest updates on my
> > own repo, so that I can temporarily (and hopefully permanently) archive the
> > versions of my account and move development to apache/*.
> >
> > Most of the work happening in arrow-rs is backward compatible or simple to
> > deprecate. However, this situation is different in arrow2 and parquet2. A
> > release cadence of a major every 3 months is prohibitive at the pace that I
> > am plowing through.
> >
> > The core API (types, alloc, buffer, bitmap, array, mutable array) is imo
> > stable and not prone to change much, but the non-core API (namely IO and
> > compute) is prone to change. Examples:
> >
> > * Add Scalar API to allow dynamic casting over the aggregate kernels and
> >   parquet statistics
> > * move compute/ from the arrow crate into a separate crate
> > * move io/ from the arrow crate into a separate crate
> > * add option to select encoding based on DataType and field name when
> >   writing to parquet
> >
> > (I will create issues for them in the experimental repos for proper
> > visibility and discussion).
> >
> > This situation is usually addressed via the 0.X model in semver 2 (in
> > Python fastAPI <https://fastapi.tiangolo.com/> is a predominant example
> > that uses it, and almost all in Rust also uses it). However, there are a
> > couple of blockers in this context:
> >
> > 1. We do not allow releases of experimental repos to avoid confusion over
> >    which is *the* official package.
> > 2. arrow-rs is at version 5, and some dependencies like IOx/Influx seem to
> >    prefer a slower release cadence of breaking changes.
> >
> > On the other hand, other parts of the community do not care about this
> > aspect. Polars for example, the fastest DataFrame in H2O benchmarks,
> > currently maintains an arrow2 branch that is faster and safer than master
> > [1], and will be releasing the Python binaries from the arrow2 branch. We
> > would like to release the Rust API also based on arrow2, which requires it
> > to be in Cargo.
> >
> > The best “hack” that I can come up with given the constraints above is to
> > release arrow2 and parquet2 in cargo.io from my personal account so that
> > dependents can release to cargo while still making it obvious that they are
> > not the official release. However, this is obviously not ideal.
> >
> > Any suggestions?
> >
> > [1] https://github.com/pola-rs/polars/pull/922
> >
> > Best,
> > Jorge
> >
>