I'm interested to hear what the relation is between arrow2, arrow-rs and the
main GitHub repository, apache/arrow. Is the intention to replace the C++
codebase with a Rust implementation?

The reason I'm asking is that I'm adding complex number support in the C++
codebase. It may instead be a better idea to do this in the Rust
implementation if it is indeed replacing the C++ implementation.



On Sat, Jul 17, 2021 at 1:59 PM Andrew Lamb <al...@influxdata.com> wrote:

> What if we released "beta" [1] versions of arrow on cargo at whatever pace
> was necessary? That way dependent crates could opt in to bleeding edge
> functionality / APIs.
>
> There is tension between full technical freedom to change APIs and the
> needs of downstream projects for a more stable API.
>
> Whatever its technical faults may be, projects that rely on arrow (such as
> anything based on DataFusion, like my own) need to be supported as they
> have made the bet on Rust Arrow. I don't think we can abandon maintenance
> on the existing codebase until we have a successor ready.
>
> Andrew
>
> p.s. I personally very much like Adam's suggestion for "Arrow 6.0 in Oct
> 2021 be based on arrow2" but that is predicated on wanting to have arrow2
> widely used by downstreams at that point.
>
> [1] https://stackoverflow.com/questions/46373028/how-to-release-a-beta-version-of-a-crate-for-limited-public-testing
>
>
> On Sat, Jul 17, 2021 at 5:56 AM Adam Lippai <a...@rigo.sk> wrote:
>
> > 5.0 is being released right now, which means from a timing perspective
> > this is the worst moment for arrow2, indeed. You'd need to wait the full
> > 3 months. On the other hand, does releasing a 6.0 beta based on arrow2 on
> > Aug 1st, an rc on Sept 1st and the stable release on Oct 1st sound like a
> > bad plan?
> >
> > I don't think a 6.0-beta release would be confusing, and dedicating most
> > of the 5.0->6.0 cycle to this change doesn't sound excessive.
> >
> > I think this approach wouldn't result in extra work (backporting the
> > important changes to the 5.1 and 5.2 releases). It only shows the
> > magnitude of this change; the work would be done by you anyway, and this
> > would just make it clear that this is a huge effort.
> >
> > Best regards,
> > Adam Lippai
> >
> > On Sat, Jul 17, 2021, 11:31 Jorge Cardoso Leitão <jorgecarlei...@gmail.com>
> > wrote:
> >
> > > Hi,
> > >
> > > Arrow2 and parquet2 have passed the IP clearance vote and are ready to
> > > be merged to apache/* repos.
> > >
> > > My plan is to merge them and then open a PR against each with the
> > > latest updates from my own repo, so that I can temporarily (and
> > > hopefully permanently) archive the versions on my account and move
> > > development to apache/*.
> > >
> > > Most of the work happening in arrow-rs is backward compatible or simple
> > > to deprecate. However, the situation is different in arrow2 and
> > > parquet2. A release cadence of one major version every 3 months is
> > > prohibitive at the pace that I am plowing through.
> > >
> > > The core API (types, alloc, buffer, bitmap, array, mutable array) is
> > > imo stable and not prone to change much, but the non-core API (namely
> > > IO and compute) is prone to change. Examples:
> > >
> > > * Add Scalar API to allow dynamic casting over the aggregate kernels
> > > and parquet statistics
> > > * move compute/ from the arrow crate into a separate crate
> > > * move io/ from the arrow crate into a separate crate
> > > * add option to select encoding based on DataType and field name when
> > > writing to parquet
> > >
> > > (I will create issues for them in the experimental repos for proper
> > > visibility and discussion).
> > >
> > > This situation is usually addressed via the 0.X model in semver 2 (in
> > > Python, FastAPI <https://fastapi.tiangolo.com/> is a prominent example
> > > that uses it, and almost all of the Rust ecosystem also uses it).
> > > However, there are a couple of blockers in this context:
> > >
> > > 1. We do not allow releases of experimental repos to avoid confusion
> > > over which is *the* official package.
> > > 2. arrow-rs is at version 5, and some dependents like IOx/Influx seem
> > > to prefer a slower release cadence of breaking changes.
> > >
> > > On the other hand, other parts of the community do not care about this
> > > aspect. Polars, for example, the fastest DataFrame in H2O benchmarks,
> > > currently maintains an arrow2 branch that is faster and safer than
> > > master [1], and will be releasing the Python binaries from the arrow2
> > > branch. We would like to release the Rust API also based on arrow2,
> > > which requires it to be published on crates.io.
> > >
> > > The best “hack” that I can come up with given the constraints above is
> > > to release arrow2 and parquet2 on crates.io from my personal account so
> > > that dependents can release to crates.io while still making it obvious
> > > that they are not the official release. However, this is obviously not
> > > ideal.
> > >
> > > Any suggestions?
> > >
> > > [1] https://github.com/pola-rs/polars/pull/922
> > >
> > > Best,
> > > Jorge
> > >
> >
>
