Re: From Tribal Knowledge to Transparency: Enhancing and Documenting the LTS Maintenance & Cherry-Picking Process

Yunze Xu Sun, 11 Jun 2023 21:07:17 -0700

Hi Lari,

Generally I agree with your points. The cherry-picking criteria is
very ambiguous and not clear. I also found some unexpected
cherry-picks recently. For example, #16014 [1] actually breaks the
compatibility of determining whether to use a TLS service URL for a
built-in Pulsar client. Though it might be acceptable for a major
release IMO, it's unacceptable to be cherry-picked into release
branches without applying the comment here [2]. You can see KoP is
also affected by this commit and we need to handle the breaking change
by these two PRs ([3] and [4]).


The root cause is that there is no clear cherry-picking criteria so
that there is no strict review for adding the `release/x.y.z` labels.
The `release/x.y.z` labels are added very casually.

Based on the example above, could you explain more about the merge
strategy and how could it avoid such cases? I found this thread talks
much about the PROs and CONs but not much about the strategy itself.
>From my understanding, assuming the LTS version is 3.0 and the next
feature release version is 3.1, we need to:
1. Open PRs to branch-3.0 for bug fixes and merge it to branch-3.1?
2. Open PRs to branch-3.1 for new features?

So when releasing Pulsar 3.1.0, we don't need to determine which PRs
to cherry-pick?

[1] https://github.com/apache/pulsar/pull/16014
[2] https://github.com/apache/pulsar/pull/16014#discussion_r896163915
[3] https://github.com/streamnative/kop/pull/1848
[4] https://github.com/streamnative/kop/pull/1870

Thanks,
Yunze

On Fri, Jun 9, 2023 at 3:14 AM Michael Marshall <mmarsh...@apache.org> wrote:
>
> Thanks for this discussion, Lari. I agree with your points and will
> just make a few additions.
>
> @Lari - I wonder if there is a way we can show the strategy more
> definitively. Have you looked into what it would take to merge
> branch-3.0 to master? If that is too hard, we could pilot merging
> branch-3.1 into master once it is cut.
>
> > don't think we should propagate commits from the least recent stable
> > version; especially it's hard to tell
> > which one is the least recently maintained stable version (2.10? 2.9? or
> > even 2.8?).
>
> If we switch to a merge strategy, I think we would need a dual mode
> for some time where we only merge branches that are easy to merge and
> we cherry pick backwards as needed.
>
> Another detail that would make the merge strategy valuable is that it
> would make it much easier for us to cut frequent releases. It
> decreases the work on the release manager by removing the need to make
> a bunch of cherry picks. The work is pushed to committers and will
> have the benefit of some batching of commits.
>
> As it stands, releases take a very long time, which takes up valuable
> committer time and makes it harder to stick to release schedules.
>
> One interesting detail is that Pulsar's commit velocity has decreased.
> That might make it easier to test out a new strategy now.
>
> Thanks,
> Michael
>
>
> On Thu, Jun 8, 2023 at 5:34 AM tison <wander4...@gmail.com> wrote:
> >
> > Hi Lari,
> >
> > Thanks for starting this discussion! I did some cherry-picking recently and
> > encountered a few cases
> > that can be noted in your email. The most surprising process to me is that
> > we pick commit directly
> > without any regression tests (some is associated with PRs, but it's not the
> > normal case).
> >
> > Let me reply to this thread inline.
> >
> > > merge-based strategy
> >
> > I don't think we should propagate commits from the least recent stable
> > version; especially it's hard to tell
> > which one is the least recently maintained stable version (2.10? 2.9? or
> > even 2.8?).
> >
> > Flink maintains cherry-picking well by running regression tests before any
> > commits, and so do the cherry-picked
> > ones. I don't understand why cherry-picking commits without CI coverage is
> > possible since different branches can be
> > regarded as quite different codebases. Besides, Flink uses JIRA to manage
> > tickets, and it's clear to associate a ticket
> > with several fixed versions. Also, any PRs should be associated with a
> > ticket (Pulsar has >50% standalone PRs).
> >
> > However, given that 3.0 is an LTS version is possible to follow Netty's
> > strategy. And OpenJDK even creates different
> > repos for their LTS versions.
> >
> > > a test run encompassing a
> > > sensible quantity of cherry-picked commits
> >
> > I'd call this a manual roll up, where Rust use bors to do auto PR batch[1].
> > Given that we enable "Rebase and merge"
> > button on the main repo, it's possible that keep the atomic commit history
> > while rolling up PRs.
> >
> > [1] https://forge.rust-lang.org/release/rollups.html
> >
> > > what decision-making criteria does the RM use, and
> > > how do they manage quality assurance?
> >
> > I agree this is vague now. At least we can force a PR for cherry-picking
> > and ensure CI workflows run. Thus, we're
> > sure at least regression tests passed, and open the window for comments
> > (comment on commits lacks a lot of
> > visibility while it's possible). Furthermore, we don't require approval or
> > status checks for PR against maintained
> > branches.
> >
> > Although, I agree that it's majorly a transparency issue over a process
> > issue: Flink doesn't set these rules also;
> > a committer can also push a commit or merge a PR without any requirement.
> > But Flink committers agree that
> > PRs are required for any changes, and they generally actively ask for
> > reviews.
> >
> > Back to the criteria, at least only (security) fixes should be picked. This
> > is not easy because, from my observation:
> >
> > 1. Fixes can depend on upgrade dependency, while a valid dep version with
> > fixes can introduce unexpected changes.
> > 2. Some features are unstable but we do not mark them as beta so we'd like
> > to "fix" them in previous versions even it
> > means a significant rewrite.
> >
> > Java uses "--enable-preview" and "--add-modules jdk.incubator.xxx" to fence
> > unstable features and no fixes will be
> > backported.
> >
> > Best,
> > tison.

Re: From Tribal Knowledge to Transparency: Enhancing and Documenting the LTS Maintenance & Cherry-Picking Process

Reply via email to