Hi Krisztian,

On Mon, Apr 12, 2021 at 8:41 AM Krisztián Szűcs
<szucs.kriszt...@gmail.com> wrote:
>
> Hi,
>
> Based on the Google document, I see one actual problem and two actions
> that don't explicitly solve the real issue.
>
> # Issue: Decouple release process to enable independent releases
>
> This is something the whole project requires, not just the Rust
> implementation. Eventually every implementation should have its own
> release cycle, not blocked on the others.
> Even if we want to tackle this just for the Rust implementation in the
> first iteration, we need to figure out the right versioning scheme,
> voting process and integration testing. Note that we have already
> decided to decouple the source release process from the release of the
> binaries. We also need to figure out how to handle the source release
> itself once we provide different artifacts for different
> implementations.

I don't think it's necessary to figure out all this right now. For
now, this seems to be a Rust-specific issue.

> # Action 1: Maintain the Rust implementation in additional Apache repositories
>
> Pros:
> - presumably would make it easier to define the dependencies between
> the rust crates (though cargo should support multiple crates in a
> single repository)
> Cons:
> - inconsistent: if there is apache/arrow-rs why isn't there apache/arrow-js

I don't think this matters, nor will anyone actually care in practice.
If someone google searches for "Arrow Rust" or "Arrow JavaScript",
they'll find what they need.

> - decrease project visibility: having all of the implementations in a
> single repository makes other implementations trivial to find

As long as things are documented clearly in READMEs and the READMEs
have links to each other, I think this is okay.

> - introduces a lot of complexity for the CI/CD processes, and not just
> for the rust folks

Could you explain this? In apache/arrow, we're basically just deleting
stuff from our CI/CD configurations. In integration testing, we would
have an additional git checkout step that adds a pinned version of the
Rust implementation to the integration tests.

> - will make it harder to interface/link between different implementations
>
> This seems unreasonable to me.

It seems we won't be able to satisfy every requirement. Creating
interdependent projects will require more work, but we can create
tools to facilitate this if and when it becomes a concern.

>
> # Action 2: Use github for issue tracking
>
> Pros:
> - easier for new contributors
> - more flexible in certain ways
> Cons:
> - not the apache way of issue tracking

This isn't true — several other Apache projects use GitHub issues.

> - doubtful outcome for a large number of issues

This isn't a problem every one of us needs to solve. If the Rust
projects become disorganized in their issue tracking, we can bring it up
on the mailing list and discuss remedies then.

>
> I don't like JIRA either, but I can live with it, though I understand
> the frustration around it.
> Since GH issues vs. JIRA seems like a hot topic lately, we could try to
> experiment with a less radical change: enable GitHub issues for the
> whole project and sync them to JIRA (either by using an existing
> service or by developing a GitHub Action for it). We may end up
> preferring GitHub issues eventually.

This seems like a can of worms. I think that there is a cultural
expectation in the Rust community for individual crates to have their
own GitHub issue trackers, and this change allows for that.

I don't see a need to change our issue management in the rest of the
project. The C++ project and its dependents increasingly behave like
an "enterprise" project in their development culture, where the more
structured Jira approach is a good fit.

>
> All in all, I find this proposal way too invasive. It sounds more like
> starting a new project with its own governance rather than making
> releases more accessible to users.

I'm taking a laissez-faire attitude here: if the Rust developers want
to implement this change, I'm happy for them to go ahead and do it.
Since it is almost strictly subtractive to apache/arrow, it should not
create extra burdens for non-Rust developers.

In general, the nature of the conflict that we've been having is one
programming language forcing conformity on another. Just as we're
saying to let Rust adopt its cultural norms, there shouldn't be an
expectation that other parts of Apache Arrow should be conforming to
things from the Rust ecosystem.

Regarding governance: there are no governance changes.

* Committers and PMC members still have to be approved in the same way
* The Arrow PMC will vote on releases

In Parquet, for example, we have the parquet-mr and parquet-format
repositories, which release separately but share a common
committership and PMC.

>
> Thanks, Krisztian
>
>
> On Fri, Apr 9, 2021 at 5:18 PM Andy Grove <andygrov...@gmail.com> wrote:
> >
> > Following on from the email thread "Rust sync meeting" I would like to
> > start a new discussion about moving the Rust components out to new GitHub
> > repositories and using a new process for issues and release management.
> >
> > I have started a Google document [1] with details and to track the work
> > required for this effort but I will summarize the key points of the
> > proposal here:
> >
> >
> >    - Move existing Rust code into two new repositories
> >       - apache/arrow-rs
> >          - Arrow + Parquet crates
> >       - apache/datafusion
> >          - DataFusion + Ballista crates (which are expected to merge to some
> >            degree over time)
> >          - TPC-H benchmarks
> >       - Use GitHub issues for issue tracking
> >    - Decouple release process
> >       - Crates are released individually
> >       - A vote on the source release of the released crate is held over the
> >         mailing list as usual.
> >       - Rust does not need to release a new version when the rest of Arrow
> >         releases; we bundle our latest released crates to the signed tar.
> >       - Crates can depend on GitHub commit hashes between releases
> >
> > The Google document may be the best place to collaborate on the proposal
> > but I can update the document based on any comments in this email thread as
> > well.
> >
> > Note that I have excluded discussion about arrow2/parquet2 from this
> > proposal and I believe we should discuss that separately as a follow-on
> > discussion.
> >
> > I look forward to hearing opinions on this both from current Rust
> > maintainers and contributors and also from the wider Arrow community.
> >
> > Thanks,
> >
> > Andy.
> >
> > [1] https://docs.google.com/document/d/1TyrUP8_UWXqk97a8Hvb1d0UYWigch0HAephIjW7soSI/edit?usp=sharing
