Hi Lari, Generally I agree with your points. The cherry-picking criteria is very ambiguous and not clear. I also found some unexpected cherry-picks recently. For example, #16014 [1] actually breaks the compatibility of determining whether to use a TLS service URL for a built-in Pulsar client. Though it might be acceptable for a major release IMO, it's unacceptable to be cherry-picked into release branches without applying the comment here [2]. You can see KoP is also affected by this commit and we need to handle the breaking change by these two PRs ([3] and [4]).
The root cause is that there is no clear cherry-picking criteria so that there is no strict review for adding the `release/x.y.z` labels. The `release/x.y.z` labels are added very casually. Based on the example above, could you explain more about the merge strategy and how could it avoid such cases? I found this thread talks much about the PROs and CONs but not much about the strategy itself. >From my understanding, assuming the LTS version is 3.0 and the next feature release version is 3.1, we need to: 1. Open PRs to branch-3.0 for bug fixes and merge it to branch-3.1? 2. Open PRs to branch-3.1 for new features? So when releasing Pulsar 3.1.0, we don't need to determine which PRs to cherry-pick? [1] https://github.com/apache/pulsar/pull/16014 [2] https://github.com/apache/pulsar/pull/16014#discussion_r896163915 [3] https://github.com/streamnative/kop/pull/1848 [4] https://github.com/streamnative/kop/pull/1870 Thanks, Yunze On Fri, Jun 9, 2023 at 3:14 AM Michael Marshall <mmarsh...@apache.org> wrote: > > Thanks for this discussion, Lari. I agree with your points and will > just make a few additions. > > @Lari - I wonder if there is a way we can show the strategy more > definitively. Have you looked into what it would take to merge > branch-3.0 to master? If that is too hard, we could pilot merging > branch-3.1 into master once it is cut. > > > don't think we should propagate commits from the least recent stable > > version; especially it's hard to tell > > which one is the least recently maintained stable version (2.10? 2.9? or > > even 2.8?). > > If we switch to a merge strategy, I think we would need a dual mode > for some time where we only merge branches that are easy to merge and > we cherry pick backwards as needed. > > Another detail that would make the merge strategy valuable is that it > would make it much easier for us to cut frequent releases. It > decreases the work on the release manager by removing the need to make > a bunch of cherry picks. The work is pushed to committers and will > have the benefit of some batching of commits. > > As it stands, releases take a very long time, which takes up valuable > committer time and makes it harder to stick to release schedules. > > One interesting detail is that Pulsar's commit velocity has decreased. > That might make it easier to test out a new strategy now. > > Thanks, > Michael > > > On Thu, Jun 8, 2023 at 5:34 AM tison <wander4...@gmail.com> wrote: > > > > Hi Lari, > > > > Thanks for starting this discussion! I did some cherry-picking recently and > > encountered a few cases > > that can be noted in your email. The most surprising process to me is that > > we pick commit directly > > without any regression tests (some is associated with PRs, but it's not the > > normal case). > > > > Let me reply to this thread inline. > > > > > merge-based strategy > > > > I don't think we should propagate commits from the least recent stable > > version; especially it's hard to tell > > which one is the least recently maintained stable version (2.10? 2.9? or > > even 2.8?). > > > > Flink maintains cherry-picking well by running regression tests before any > > commits, and so do the cherry-picked > > ones. I don't understand why cherry-picking commits without CI coverage is > > possible since different branches can be > > regarded as quite different codebases. Besides, Flink uses JIRA to manage > > tickets, and it's clear to associate a ticket > > with several fixed versions. Also, any PRs should be associated with a > > ticket (Pulsar has >50% standalone PRs). > > > > However, given that 3.0 is an LTS version is possible to follow Netty's > > strategy. And OpenJDK even creates different > > repos for their LTS versions. > > > > > a test run encompassing a > > > sensible quantity of cherry-picked commits > > > > I'd call this a manual roll up, where Rust use bors to do auto PR batch[1]. > > Given that we enable "Rebase and merge" > > button on the main repo, it's possible that keep the atomic commit history > > while rolling up PRs. > > > > [1] https://forge.rust-lang.org/release/rollups.html > > > > > what decision-making criteria does the RM use, and > > > how do they manage quality assurance? > > > > I agree this is vague now. At least we can force a PR for cherry-picking > > and ensure CI workflows run. Thus, we're > > sure at least regression tests passed, and open the window for comments > > (comment on commits lacks a lot of > > visibility while it's possible). Furthermore, we don't require approval or > > status checks for PR against maintained > > branches. > > > > Although, I agree that it's majorly a transparency issue over a process > > issue: Flink doesn't set these rules also; > > a committer can also push a commit or merge a PR without any requirement. > > But Flink committers agree that > > PRs are required for any changes, and they generally actively ask for > > reviews. > > > > Back to the criteria, at least only (security) fixes should be picked. This > > is not easy because, from my observation: > > > > 1. Fixes can depend on upgrade dependency, while a valid dep version with > > fixes can introduce unexpected changes. > > 2. Some features are unstable but we do not mark them as beta so we'd like > > to "fix" them in previous versions even it > > means a significant rewrite. > > > > Java uses "--enable-preview" and "--add-modules jdk.incubator.xxx" to fence > > unstable features and no fixes will be > > backported. > > > > Best, > > tison.