Jorge,

> * in rust, run integration tests against the latest apache/master on every PR
>

I've started to familiarize myself with the archery integration framework
over the last few days. Could you clarify for the "archery novices" what
exactly ^ this line would mean? Does apache/master refer to the C++
implementation as the "reference implementation", so rust would test
against/integrate with it? Or is it the arrow JSON format that needs to be
consumed into valid arrow in-memory, then produce the same arrow JSON from
in-memory arrow (this seems to be the extent of the go integration tests at
least)?

Sorry if this is easily answerable from knowing archery better, but I'm still
in the learning/discovery phase of how exactly all the integration tests
are set up and run.

-Jacob


On Sat, Apr 10, 2021 at 1:03 AM Jorge Cardoso Leitão <
jorgecarlei...@gmail.com> wrote:

> Hi,
>
> Wrt integration tests, I agree that it is important to have a plan prior
> to this.
>
> What we have been doing in apache/arrow:
>
> 1. only release if integration tests pass against each other
> 2. release the signed tar with the latest of every implementation (i.e.
> master)
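Point 1 amounts to an all-pairs check. A minimal sketch, assuming archery's usual setup in which every implementation acts both as a producer and as a consumer. `run_pair` is a hypothetical hook for one producer/consumer run, not archery's actual API, and the implementation names are only examples.

```rust
/// Result of one producer -> consumer integration run.
#[derive(Debug, Clone, PartialEq)]
pub struct PairResult {
    pub producer: &'static str,
    pub consumer: &'static str,
    pub passed: bool,
}

/// Runs every ordered (producer, consumer) pair, including each implementation
/// paired with itself. `run_pair` is a hypothetical hook standing in for one
/// archery run: roughly, the producer writes Arrow IPC data from a JSON test
/// file and the consumer validates that data against the same JSON file.
pub fn run_matrix<F>(impls: &[&'static str], mut run_pair: F) -> Vec<PairResult>
where
    F: FnMut(&str, &str) -> bool,
{
    let mut results = Vec::with_capacity(impls.len() * impls.len());
    for &producer in impls {
        for &consumer in impls {
            let passed = run_pair(producer, consumer);
            results.push(PairResult { producer, consumer, passed });
        }
    }
    results
}

/// "Only release if integration tests pass against each other":
/// every pair in the matrix must have passed.
pub fn release_allowed(results: &[PairResult]) -> bool {
    results.iter().all(|r| r.passed)
}
```

Pairing each implementation with itself also checks that it can read what it writes; a single broken consumer fails one pair per producer and therefore blocks the release.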
>
> My suggestion for independent versioning:
>
> CI:
>
> * in rust, run integration tests against the latest apache/master on every
> PR
> * in apache/arrow, run integration tests against the latest released rust
> version
>
> Release mechanism:
>
> 1. an arrow crate can only be released if it passes integration tests
> against the current latest apache/arrow master
> 2. apache/arrow master can release if their integration tests pass against
> the latest released rust crate
>
> The common scenario is that the integration tests in apache/arrow against
> Rust pass, and thus
> apache/arrow would just need to bundle the latest rust release.
>
> If tests in apache/arrow fail, then some change in apache/arrow
> caused our latest release to stop integrating (since we integration-tested
> that version against master prior to our release).
> This implies that the current Rust release is out of spec and we thus must
> release a patch asap to correct for this (just like we would need to push a
> commit to apache/arrow asap).
> Once that patch is released, apache/arrow becomes green again and
> apache/arrow can bundle it into the signed Apache Arrow release.
>
> In the unlikely event that the latest release is unable to pass integration
> tests *and* despite the best efforts Rust is unable to release a patch in
> time, we *may* still bundle a previous release of the Rust crate, thereby not
> blocking the whole release (i.e. this allows us to fall back to a previous
> release without a mass revert on the apache/arrow repo).
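The release mechanism and fallback described above can be sketched as a selection rule on the apache/arrow side. This is a minimal sketch under stated assumptions: `Release` is a hypothetical plain `major.minor.patch` version, and `passes` is a hypothetical hook reporting whether a release passes integration tests against the apache/arrow commit being released.

```rust
/// A published Rust crate version (hypothetical model, no pre-release tags).
/// Deriving `Ord` compares `major`, then `minor`, then `patch`.
#[derive(Debug, Clone, Copy, PartialEq, Eq, PartialOrd, Ord)]
pub struct Release {
    pub major: u32,
    pub minor: u32,
    pub patch: u32,
}

/// Picks which Rust release apache/arrow bundles into its signed tar.
///
/// The newest release is preferred. If it fails integration tests and no
/// passing patch has been published, fall back to the newest older release
/// that still passes. `None` means no release passes, so the apache/arrow
/// release would be blocked.
pub fn release_to_bundle<F>(releases: &[Release], passes: F) -> Option<Release>
where
    F: Fn(&Release) -> bool,
{
    let mut sorted = releases.to_vec();
    sorted.sort();
    // Walk from newest to oldest and take the first release that passes.
    sorted.into_iter().rev().find(|r| passes(r))
}
```

A patch published in time is simply the newest release, so it is picked first without any special case; the fallback only applies when every newer release fails.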
>
> > * If Rust runs against the latest nightly of Arrow, then how will Rust
> > release without a new Arrow release?
>
> Not sure if this answers it, but Rust does not compile or link against any
> implementation, so there are no ABI contracts. Its "only" contract is the
> spec (in-memory, IPC, Flight, C data interface, etc).
>
> A related point is that when we release a Rust version, we can separately
> upload "integration test artifacts" (the same binaries that we currently use
> in our integration tests, or a docker image with them), which apache/arrow
> can use to run integration tests.
> This would allow our CI at apache/arrow to download these artifacts and run
> tests as usual via archery and the CLI, without having to compile them. This
> would alleviate some of the challenges around integration testing, where
> every implementation is currently built on every run, in sequence.
>
> If someone thinks that it is useful, I would be happy to open a JIRA on
> this and draft a Google Doc to work out a technical design.
>
> Best,
> Jorge
>
>
> On Sat, Apr 10, 2021 at 1:57 AM Weston Pace <weston.p...@gmail.com> wrote:
>
> > > I'm assuming the idea is that the existing integration tests will remain
> > > in apache/arrow. Will you also run the integration test suites on your
> > > rust repository CI checks?
> >
> > Furthermore, against what version will these tests run?
> >
> > * If Arrow runs against the latest release of Rust then it will lag
> > behind and issues may be detected later.
> > * If Arrow runs against the latest nightly of Rust then things will
> > get tricky at release time (all Arrow integration tests pass but Rust
> > isn't ready to cut a new release and Arrow tests fail against the
> > latest released Rust).
> >
> > Assuming Rust is also running integration tests against Arrow
> > (probably a good idea) you get a similar problem (this one might be
> > trickier given the relative frequencies)...
> >
> > * If Rust runs against the latest release of Arrow then it will lag
> > behind (several months).  There will be a "catching up" period after
> > Arrow releases.
> > * If Rust runs against the latest nightly of Arrow, then how will Rust
> > release without a new Arrow release?
> >
> > Note, these problems technically exist now with the concept that any
> > language can release a patch at any time.  Also, since Rust isn't
> > directly compiling against other Arrow libs and we are only talking
> > about interoperability it's probably not going to be too big of a
> > deal.  Still, worth giving some thought ahead of time.
> >
> > On Fri, Apr 9, 2021 at 1:11 PM Micah Kornfield <emkornfi...@gmail.com>
> > wrote:
> > >
> > > >
> > > > With this explanation do you still have a concern? There is no
> > > > suggestion of making releases that depend on GitHub hashes.
> > >
> > > No, I don't think so. IIUC you are saying the crates dependency does not
> > > imply the crate artifacts are published elsewhere. This sounds in line
> > > with policies to me. For some reason I thought the notion of crates
> > > implied publishing to Rust's package management system.
> > >
> > > On Fri, Apr 9, 2021 at 4:07 PM Andy Grove <andygrov...@gmail.com> wrote:
> > >
> > > > Hi Micah,
> > > >
> > > > During development, the Rust crates have local dependencies on each other
> > > > based on relative file system paths. At release time, we change these to
> > > > versioned dependencies before publishing, because it isn't possible to
> > > > publish a crate that depends on non-published crates.
> > > >
> > > > With the code in separate repositories, we would still need an equivalent
> > > > mechanism for DataFusion to use the Arrow code that is under development
> > > > but we would point to a GitHub hash rather than a relative path. We should
> > > > still update to use versioned dependencies when releasing.
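A minimal model of the dependency workflow Andy describes. The `Source` and `Dep` types and `publish_blockers` are hypothetical illustrations, not Cargo's API; the only rule they encode is the one stated above, that every dependency must point at a published version before the crate itself can be published. Repository URLs, hashes, and version numbers used with it are placeholders.

```rust
/// Where a dependency points (hypothetical model of a Cargo.toml entry).
#[derive(Debug, Clone, PartialEq)]
pub enum Source {
    /// Development today: a relative file-system path inside the monorepo.
    Path(String),
    /// Development after the split: a pinned commit in another GitHub repository.
    GitRev { repo: String, rev: String },
    /// Release time: a version published to crates.io.
    Version(String),
}

#[derive(Debug, Clone, PartialEq)]
pub struct Dep {
    pub name: String,
    pub source: Source,
}

/// Names of the dependencies that must be switched to `Source::Version`
/// before the crate can be published.
pub fn publish_blockers(deps: &[Dep]) -> Vec<&str> {
    deps.iter()
        .filter(|d| !matches!(d.source, Source::Version(_)))
        .map(|d| d.name.as_str())
        .collect()
}
```

This simplifies Cargo somewhat: a Cargo.toml entry may give both a path (or git) location and a version, in which case the local copy is used during development and the version is used when publishing.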
> > > >
> > > > I will revise the text in the document to better explain what this means.
> > > >
> > > > With this explanation do you still have a concern? There is no suggestion
> > > > of making releases that depend on GitHub hashes.
> > > >
> > > > Thanks,
> > > >
> > > > Andy.
> > > >
> > > >
> > > >
> > > > On Fri, Apr 9, 2021 at 4:57 PM Micah Kornfield <emkornfi...@gmail.com> wrote:
> > > >
> > > >> >
> > > >> > "Crates can depend on GitHub commit hashes between releases"
> > > >>
> > > >> This sounds like it might not align with ASF release policies [1].
> > > >>
> > > >> [1] https://www.apache.org/legal/release-policy.html#release-definition
> > > >>
> > > >> On Fri, Apr 9, 2021 at 1:34 PM Neal Richardson <neal.p.richard...@gmail.com> wrote:
> > > >>
> > > >> > Thanks, Andy. Two areas of concern I think we should have some answer for
> > > >> > before going forward with this (and I make no opinions as to what the
> > > >> > "right" answers are, just raising them for discussion):
> > > >> >
> > > >> > 1. Integration testing: what is our workflow for ensuring that our
> > > >> > implementations are integration tested, and what do we do when changes
> > > >> > (whether in apache/arrow or in apache/arrow-rs) introduce
> > > >> > regressions/failures? I'm assuming the idea is that the existing
> > > >> > integration tests will remain in apache/arrow. Will you also run the
> > > >> > integration test suites on your rust repository CI checks?
> > > >> > 2. Versioning: one rationale for our current policy of "everyone releases
> > > >> > together" is that you don't have to guess as much whether (for example)
> > > >> > Arrow Java 3.0 and Arrow Rust 3.0 are compatible and using the same format.
> > > >> > It's kind of a heuristic for what library versions were integration tested
> > > >> > with each other. It sounds like (but maybe I misunderstand) y'all are
> > > >> > looking to break from that. But if Arrow C++ goes to version 7.0 by the end
> > > >> > of the year and arrow-rs chooses to go to 15.4, or 3.12, or whatever, does
> > > >> > that create confusion or doubt that works against the Arrow goal of easy
> > > >> > interoperability?
> > > >> >
> > > >> > Neal
> > > >> >
> > > >> > On Fri, Apr 9, 2021 at 8:18 AM Andy Grove <andygrov...@gmail.com> wrote:
> > > >> >
> > > >> > > Following on from the email thread "Rust sync meeting" I would like to
> > > >> > > start a new discussion about moving the Rust components out to new GitHub
> > > >> > > repositories and using a new process for issues and release management.
> > > >> > >
> > > >> > > I have started a Google document [1] with details and to track the work
> > > >> > > required for this effort but I will summarize the key points of the
> > > >> > > proposal here:
> > > >> > >
> > > >> > >
> > > >> > >    - Move existing Rust code into two new repositories
> > > >> > >       - apache/arrow-rs
> > > >> > >          - Arrow + Parquet crates
> > > >> > >       - apache/datafusion
> > > >> > >          - DataFusion + Ballista crates (which are expected to merge to
> > > >> > >            some degree over time)
> > > >> > >          - TPC-H benchmarks
> > > >> > >       - Use GitHub issues for issue tracking
> > > >> > >    - Decouple release process
> > > >> > >       - Crates are released individually
> > > >> > >       - A vote on the source release of the released crate is held over
> > > >> > >         the mailing list as usual.
> > > >> > >       - Rust does not need to release a new version when the rest of
> > > >> > >         Arrow releases; we bundle our latest released crates to the
> > > >> > >         signed tar.
> > > >> > >       - Crates can depend on GitHub commit hashes between releases
> > > >> > >
> > > >> > > The Google document may be the best place to collaborate on the proposal
> > > >> > > but I can update the document based on any comments in this email thread
> > > >> > > as well.
> > > >> > >
> > > >> > > Note that I have excluded discussion about arrow2/parquet2 from this
> > > >> > > proposal and I believe we should discuss that separately as a follow-on
> > > >> > > discussion.
> > > >> > >
> > > >> > > I look forward to hearing opinions on this both from current Rust
> > > >> > > maintainers and contributors and also from the wider Arrow community.
> > > >> > >
> > > >> > > Thanks,
> > > >> > >
> > > >> > > Andy.
> > > >> > >
> > > >> > > [1] https://docs.google.com/document/d/1TyrUP8_UWXqk97a8Hvb1d0UYWigch0HAephIjW7soSI/edit?usp=sharing
> > > >> > >
> > > >> >
> > > >>
> > > >
> >
>
