5.0 is being released right now, which means that from a timing perspective this is indeed the worst moment for arrow2: you'd need to wait the full 3 months. On the other hand, does releasing a 6.0 beta based on arrow2 on Aug 1st, an rc on Sept 1st, and the stable release on Oct 1st sound like a bad plan?
I don't think a 6.0-beta release would be confusing, and dedicating most of the 5.0->6.0 cycle to this change doesn't sound excessive.
I think this approach wouldn't result in extra work (backporting the important changes to the 5.1 and 5.2 releases). It only shows the magnitude of this change; the work would be done by you anyway, and this would just make it clear that this is a huge effort.

Best regards,
Adam Lippai

On Sat, Jul 17, 2021, 11:31 Jorge Cardoso Leitão <jorgecarlei...@gmail.com> wrote:

> Hi,
>
> Arrow2 and parquet2 have passed the IP clearance vote and are ready to be
> merged to apache/* repos.
>
> My plan is to merge them and PR to both of them to the latest updates on my
> own repo, so that I can temporarily (and hopefully permanently) archive the
> versions of my account and move development to apache/*.
>
> Most of the work happening in arrow-rs is backward compatible or simple to
> deprecate. However, this situation is different in arrow2 and parquet2. A
> release cadence of a major every 3 months is prohibitive at the pace that I
> am plowing through.
>
> The core API (types, alloc, buffer, bitmap, array, mutable array) is imo
> stable and not prone to change much, but the non-core API (namely IO and
> compute) is prone to change. Examples:
>
> * Add Scalar API to allow dynamic casting over the aggregate kernels and
> parquet statistics
> * move compute/ from the arrow crate into a separate crate
> * move io/ from the arrow crate into a separate crate
> * add option to select encoding based on DataType and field name when
> writing to parquet
>
> (I will create issues for them in the experimental repos for proper
> visibility and discussion).
>
> This situation is usually addressed via the 0.X model in semver 2 (in
> Python fastAPI <https://fastapi.tiangolo.com/> is a predominant example
> that uses it, and almost all in Rust also uses it). However, there are a
> couple of blockers in this context:
>
> 1. We do not allow releases of experimental repos to avoid confusion over
> which is *the* official package.
> 2. arrow-rs is at version 5, and some dependencies like IOx/Influx seem to
> prefer a slower release cadence of breaking changes.
>
> On the other hand, other parts of the community do not care about this
> aspect. Polars for example, the fastest DataFrame in H2O benchmarks,
> currently maintains an arrow2 branch that is faster and safer than master
> [1], and will be releasing the Python binaries from the arrow2 branch. We
> would like to release the Rust API also based on arrow2, which requires it
> to be in Cargo.
>
> The best “hack” that I can come up with given the constraints above is to
> release arrow2 and parquet2 in cargo.io from my personal account so that
> dependents can release to cargo while still making it obvious that they are
> not the official release. However, this is obviously not ideal.
>
> Any suggestions?
>
> [1] https://github.com/pola-rs/polars/pull/922
>
> Best,
> Jorge