Hi,

I meant to stop releasing "arrow" in crates.io and start releasing it as "arrow2" under a different versioning schema; like "psycopg" -> "psycopg2" in pypi and others that suffered from large architectural changes that required a different versioning that better represents the state of the new API.

> The only thing preventing that movement is for you to decide you are ready to release it to the wider audience and then let us help you do that.

Uhm? imo that is a bit misleading, but there you go: https://crates.io/crates/arrow2 and https://crates.io/crates/parquet2 : now they are available to the wider audience.

imo the disagreement here is over how we version arrow2. Since there is no consensus, I propose that we postpone this to a later point when the APIs are mature enough to be released under Arrow's stable versioning schema. Until then, I need them in crates.io to be able to gather feedback about the API, its usability, missing stuff, etc.

A bit of a bummer since I have been blocking releases to crates.io for the past 6 months and other Apache-related bureaucracies, but life goes on.

Best,
Jorge

On Mon, Jul 19, 2021 at 3:05 PM Andrew Lamb <al...@influxdata.com> wrote:

> > If we do indeed have the expectation of stability over its whole public surface,
>
> I certainly do not have this expectation between major releases. Who does?
>
> I believe it is a disservice to the overall community to release two API incompatible Rust implementations of Arrow to crates.io. It will
> 1. potentially confuse new users
> 2. split development effort
> 3. encourage writing more code that relies on the old API.
>
> The Rust Arrow community has been *more than supportive* of the changes you are proposing in arrow2 -- there is strong support for switching; The only thing preventing that movement is for you to decide you are ready to release it to the wider audience and then let us help you do that.
>
> Making major public API changes for additional benefit between arrow 5.0.0 and arrow 6.0.0 (or other future versions) is perfectly compatible with semantic versioning and other software projects.
>
> Andrew
>
> On Mon, Jul 19, 2021 at 2:08 AM Jorge Cardoso Leitão <jorgecarlei...@gmail.com> wrote:
>
> > Hi,
> >
> > > Whatever its technical faults may be, projects that rely on arrow (such as anything based on DataFusion, like my own) need to be supported as they have made the bet on Rust Arrow.
> >
> > 1.X versioning in Apache Arrow was never meant to represent stability of its individual libraries, but only the stability of the C++/Python and the spec. It is a misconception that the Rust implementation is stable and/or ready for production; its version is aligned with Apache Arrow general versioning simply for historical reasons. Requiring arrow2 to also be marked as stable is imo just dragging this onwards.
> >
> > As primary developer of arrow2 and a contributor of some of the major pieces of arrow-rs, I am saying that:
> >
> > * arrow-rs does not have a stable API: it requires large incompatible changes to even make it *safe*
> > * arrow2 does not have a stable API: it requires incompatible changes to improve UX, performance, and functionality
> > * using arrow2 core API results in faster, safer, and less error-prone code
> >
> > The main difference is that arrow-rs requires API changes to its core modules (buffer, bytes, etc), while arrow2 requires changes to its peripheral modules (compute and IO). This is why imo we can make arrow2 available: expected changes will only break a small surface of the public API which, while incompatible, are easy to address.
> >
> > Which is the gist of my proposal:
> >
> > - Arrow2 starts its release in crates.io as 0.1
> > - A major release (e.g. 0.16.2 -> 1.0.0):
> >   - must be voted
> >   - may be backward incompatible
> > - Minor releases (e.g. 0.16.1 -> 0.17.0):
> >   - must be voted
> >   - may be backward incompatible
> >   - may have new features
> > - Patch releases (e.g. 0.16.1 -> 0.16.2):
> >   - may be voted
> >   - must be backward compatible
> >   - may have new features
> > - Minor releases may have a maintenance period (e.g. 3+ months) over which we guarantee security patches and feature backports.
> > - Major releases have a maintenance period over which we guarantee security patches and feature backports according to semver 2.0.
> >
> > So that:
> >
> > - it aligns expectations wrt the current state of Rust's implementation
> > - it offers support to downstream dependencies that require longer-term stability
> > - it offers room for developers to improve its API, scrutinize security, etc.
> >
> > If we do indeed have an expectation of stability over its whole public surface, then I suggest that we keep arrow2 in the experimental repo as it is today.
> >
> > Btw, this is why some in the Rust community recommend using smaller crates: so that versioning is not bound to a large public API surface and can thus more easily be applied to smaller surfaces. There is of course a tradeoff with maintenance of CI and releases.
> >
> > Best,
> > Jorge
> >
> > On Sat, Jul 17, 2021 at 1:59 PM Andrew Lamb <al...@influxdata.com> wrote:
> >
> > > What if we released "beta" [1] versions of arrow on cargo at whatever pace was necessary? That way dependent crates could opt in to bleeding edge functionality / APIs.
> > >
> > > There is tension between full technical freedom to change APIs and the needs of downstream projects for a more stable API.
> > >
> > > Whatever its technical faults may be, projects that rely on arrow (such as anything based on DataFusion, like my own) need to be supported as they have made the bet on Rust Arrow. I don't think we can abandon maintenance on the existing codebase until we have a successor ready.
> > >
> > > Andrew
> > >
> > > p.s. I personally very much like Adam's suggestion for "Arrow 6.0 in Oct 2021 be based on arrow2" but that is predicated on wanting to have arrow2 widely used by downstreams at that point.
> > >
> > > [1] https://stackoverflow.com/questions/46373028/how-to-release-a-beta-version-of-a-crate-for-limited-public-testing
> > >
> > > On Sat, Jul 17, 2021 at 5:56 AM Adam Lippai <a...@rigo.sk> wrote:
> > >
> > > > 5.0 is being released right now, which means from a timing perspective this is the worst moment for arrow2, indeed. You'd need to wait the full 3 months. On the other hand, does releasing a 6.0 beta based on arrow2 on Aug 1st, rc on Sept 1st and releasing the stable on Oct 1st sound like a bad plan?
> > > >
> > > > I don't think a 6.0-beta release would be confusing and dedicating most of the 5.0->6.0 cycle to this change doesn't sound excessive.
> > > >
> > > > I think this approach wouldn't result in extra work (backporting the important changes to 5.1,5.2 release). It only shows the magnitude of this change, the work would be done by you anyways, this would just make it clear this is a huge effort.
> > > >
> > > > Best regards,
> > > > Adam Lippai
> > > >
> > > > On Sat, Jul 17, 2021, 11:31 Jorge Cardoso Leitão <jorgecarlei...@gmail.com> wrote:
> > > >
> > > > > Hi,
> > > > >
> > > > > Arrow2 and parquet2 have passed the IP clearance vote and are ready to be merged to apache/* repos.
> > > > >
> > > > > My plan is to merge them and then open PRs against both with the latest updates from my own repo, so that I can temporarily (and hopefully permanently) archive the versions of my account and move development to apache/*.
> > > > >
> > > > > Most of the work happening in arrow-rs is backward compatible or simple to deprecate.
> > > > > However, this situation is different in arrow2 and parquet2. A release cadence of a major every 3 months is prohibitive at the pace that I am plowing through.
> > > > >
> > > > > The core API (types, alloc, buffer, bitmap, array, mutable array) is imo stable and not prone to change much, but the non-core API (namely IO and compute) is prone to change. Examples:
> > > > >
> > > > > * Add Scalar API to allow dynamic casting over the aggregate kernels and parquet statistics
> > > > > * move compute/ from the arrow crate into a separate crate
> > > > > * move io/ from the arrow crate into a separate crate
> > > > > * add option to select encoding based on DataType and field name when writing to parquet
> > > > >
> > > > > (I will create issues for them in the experimental repos for proper visibility and discussion).
> > > > >
> > > > > This situation is usually addressed via the 0.X model in semver 2 (in Python, fastAPI <https://fastapi.tiangolo.com/> is a prominent example that uses it, and almost all crates in Rust also use it). However, there are a couple of blockers in this context:
> > > > >
> > > > > 1. We do not allow releases of experimental repos to avoid confusion over which is *the* official package.
> > > > > 2. arrow-rs is at version 5, and some dependencies like IOx/Influx seem to prefer a slower release cadence of breaking changes.
> > > > >
> > > > > On the other hand, other parts of the community do not care about this aspect. Polars for example, the fastest DataFrame in H2O benchmarks, currently maintains an arrow2 branch that is faster and safer than master [1], and will be releasing the Python binaries from the arrow2 branch. We would like to release the Rust API also based on arrow2, which requires it to be in Cargo.
> > > > >
> > > > > The best “hack” that I can come up with given the constraints above is to release arrow2 and parquet2 in crates.io from my personal account so that dependents can release to cargo while still making it obvious that they are not the official release. However, this is obviously not ideal.
> > > > >
> > > > > Any suggestions?
> > > > >
> > > > > [1] https://github.com/pola-rs/polars/pull/922
> > > > >
> > > > > Best,
> > > > > Jorge
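
For illustration only, the release rules proposed in the Jul 19 message of this thread (major, minor and patch releases, each with its own voting and compatibility rule) can be written down as a small lookup table. This is a rough sketch in Python and not part of any Arrow tooling: the names `ReleasePolicy`, `POLICIES`, `parse` and `release_kind` are hypothetical. It assumes the patch rule reads "must be backward compatible", and that major releases may add features, which the proposal does not state.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class ReleasePolicy:
    vote: str               # "must" or "may", as worded in the proposal
    may_break_api: bool     # are backward-incompatible changes allowed?
    may_add_features: bool  # are new features allowed?


# Transcribed from the proposal. may_add_features for "major" is an
# assumption: the proposal does not mention features for major releases.
POLICIES = {
    "major": ReleasePolicy(vote="must", may_break_api=True, may_add_features=True),
    "minor": ReleasePolicy(vote="must", may_break_api=True, may_add_features=True),
    "patch": ReleasePolicy(vote="may", may_break_api=False, may_add_features=True),
}


def parse(version: str) -> Tuple[int, int, int]:
    """Split 'X.Y.Z' into integers; pre-release tags are not handled."""
    major, minor, patch = (int(part) for part in version.split("."))
    return major, minor, patch


def release_kind(old: str, new: str) -> str:
    """Classify a version bump as 'major', 'minor' or 'patch'."""
    before, after = parse(old), parse(new)
    if after <= before:
        raise ValueError(f"{new} is not newer than {old}")
    if after[0] != before[0]:
        return "major"
    if after[1] != before[1]:
        return "minor"
    return "patch"


if __name__ == "__main__":
    # The three example bumps used in the proposal.
    for old, new in [("0.16.2", "1.0.0"), ("0.16.1", "0.17.0"), ("0.16.1", "0.16.2")]:
        kind = release_kind(old, new)
        print(old, "->", new, kind, POLICIES[kind])
```

The table makes the key property of the 0.X model visible: under this proposal a minor bump is allowed to break the API, so only patch bumps are safe for downstream crates to pick up without review.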