Dear Pulsar Community,

As we prepare for new releases in our maintenance branches, we have once
again encountered issues with our cherry-picking process. Some of our
maintenance branches are currently broken or were recently broken,
containing compilation errors or failing tests. Many have encountered
these issues, as we have seen new PRs come in to address the
problems. The compilation problems are already being addressed by
Heesung (release manager for 3.0.3) and myself. We aim to resolve these
issues as soon as possible. Please join the #dev channel on the Apache
Pulsar Slack to collaborate in real time and get updates.

The cherry-picking process has always been problematic and lacks clear
documentation in Apache Pulsar. This often leads to our maintenance
branches breaking, especially as we approach release dates and begin
cherry-picking fixes. This recurring issue has been the subject of
multiple discussions over the years. The "feature freeze" in the release
process does not mitigate the key problem with the cherry-picking
approach.

Furthermore, the process relies mostly on tribal knowledge. I have
previously expressed my concerns about this on the mailing list in this
thread:
https://lists.apache.org/thread/69mwjso51kzkrv5xgdmw04d9wngbg8br

Many problems with cherry-picking arise because cherry-picks occur in
the wrong order, or because dependent changes are not picked. Some
dependent changes shouldn't be picked at all: by the time a bug fix
lands in the master branch, master may already contain changes for new
features that shouldn't be applied to maintenance branches. In those
cases a backport of the fix is needed. The original developer of the PR
might not be available to do this, and the release can be significantly
delayed if delivering the backport takes time.

When cherry-picking and backporting are delegated to other developers,
it can lead, in addition to delays, to coordination problems and to
commits being picked in an order that results in even more merge
conflicts. Thankfully, this isn't usually too painful, but it does
happen once in a while.
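
To illustrate the ordering problem, here is a minimal Python sketch. It
assumes you already have the master history as a list of commit ids,
newest first (the order `git log` prints them), and a set of commits
scheduled for the release. The function name and the short commit ids
are hypothetical and only for illustration; this is not the tool from
the release documentation.

```python
from typing import Iterable, List


def order_for_cherry_pick(master_log_newest_first: List[str],
                          wanted: Iterable[str]) -> List[str]:
    """Return the wanted commits oldest-first, i.e. in the order
    in which they landed on master."""
    wanted_set = set(wanted)
    missing = wanted_set - set(master_log_newest_first)
    if missing:
        # A commit that is not on master cannot be ordered against the rest.
        raise ValueError(f"not found on master: {sorted(missing)}")
    # The log is newest-first; reverse it so the oldest commit is picked first.
    return [sha for sha in reversed(master_log_newest_first)
            if sha in wanted_set]


# Hypothetical history: "a1" is the oldest commit, "d4" the newest.
master_log = ["d4", "c3", "b2", "a1"]
print(order_for_cherry_pick(master_log, ["c3", "a1"]))
```

Picking in this order does not remove the need for backports when a fix
depends on feature changes, but it avoids the conflicts that come purely
from applying commits out of sequence.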

A few days ago, I began working on improving the documentation of the
current process. I have added a section where I share some thoughts and
a tool to prevent future problems. You can find the document here:
https://pulsar.apache.org/contribute/release-process/#cherry-picking-changes-scheduled-for-the-release
However, this does not fully describe the current process and will only
help to some extent.

The added section should help prevent cherry-picking in the wrong order,
but it still has many gaps. Many developers do not have proper merge
conflict resolution tools configured. Without proper 3-way diff
visualization and merge tools, it's very difficult to resolve many of
the merge conflicts without making mistakes. This also requires a deep
understanding of the module where the conflicts occur.

After we have made the next set of maintenance releases, I plan to
propose an alternative to the cherry-picking process that will address
the main issues that the Apache Pulsar project has been struggling with
every time we do releases.

The alternative would be to designate the LTS branch as the default
branch, make bug fixes primarily in the LTS branch, merge fixes forward
to newer branches, and cherry-pick to any older branches. This approach
is common in many projects and leverages what Git does well: handling
development across multiple branches. It ensures that our LTS branch is
always in a releasable state. The branch would also become the most
stable version of Pulsar, since bug fix PRs would target the LTS branch
directly and be continuously validated there by our CI. Stability was
the original goal of PIP-175, where the LTS concept was introduced to
Pulsar.
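
As a rough sketch of how a single fix would travel under this
alternative, the following Python snippet splits a list of branches into
the branch the fix targets, the newer branches it is merged into, and
the older branches it is cherry-picked to. The function name and the
branch names are assumptions for illustration only; the actual proposal
may differ in its details.

```python
from typing import Dict, List


def plan_fix_propagation(branches_oldest_first: List[str],
                         lts: str) -> Dict[str, List[str]]:
    """Describe where a bug fix goes when the LTS branch is the default."""
    if lts not in branches_oldest_first:
        raise ValueError(f"unknown LTS branch: {lts}")
    i = branches_oldest_first.index(lts)
    return {
        # The bug fix PR is opened against the LTS branch.
        "target": [lts],
        # Newer branches receive the fix by merging the LTS branch forward.
        "merge_forward": branches_oldest_first[i + 1:],
        # Older branches still need an explicit cherry-pick.
        "cherry_pick": branches_oldest_first[:i],
    }


# Assumed example branch names, oldest first.
branches = ["branch-2.11", "branch-3.0", "branch-3.1", "master"]
print(plan_fix_propagation(branches, "branch-3.0"))
```

With the LTS branch as the target, cherry-picking is only needed for
older branches, while newer branches pick up fixes through merges.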

I hope that our community will be open to changing the maintenance
strategy to relieve the pain that we have to deal with each time we make
releases. Sometimes, this "cherry-picking vs. merging branches"
discussion turns into a pointless "tabs vs. spaces" type of debate where
personal preferences are emphasized. I hope that we can avoid that and
acknowledge that releasing Apache Pulsar LTS with this cherry-picking
process is a pain, and that we must fix it to make progress as a
development community.

-Lari
