Re: From Tribal Knowledge to Transparency: Enhancing and Documenting the LTS Maintenance & Cherry-Picking Process

Lari Hotari Fri, 02 Jun 2023 02:26:33 -0700

Shortly after sending my previous email, I recalled that our
cherry-picking process has been a topic of prior discussion. I'd like to
draw your attention to a relevant mailing list thread initiated by
Michael Marshall back in December 2021 titled, 
"[DISCUSS] Add definition to our cherry-picking process":
https://lists.apache.org/thread/zqdqz4jd641vszkj3mzdn6zc3yt56rsk


This thread contains numerous insightful comments.

Furthermore, I discovered Yong Zhang's "[DISCUSS] Introduce a
cherry-pick command for cherry-picking PRs automatically" from October
2020: https://lists.apache.org/thread/49dj5j9yjjqpzjssoc3mqh0yxss5z041

And "[DISCUSS] Propose More Formal Policy for Security Patches and EOL
of Versions" by Michael Marshall in May 2021:
https://lists.apache.org/thread/2bgznyt9fxnosymprot4wyfd01mv0m58

In addition to the "[DISCUSS] Updating our Pulsar Release Plan"
discussion initiated by Michael Marshall in March 2022:
https://lists.apache.org/thread/wkm1slrg341kbq7m83nms97df28kl4of

As well as Yunze Xu's "[DISCUSS] Improvements on the release process" in
September 2022:
https://lists.apache.org/thread/zmwf5mozjqq164fk2r4m2jzv6s1kxyxk

And finally, "[Discuss] Add a phase to process pending PRs before code
freeze" also by Yunze Xu in April 2023:
https://lists.apache.org/thread/p8vgfsg2wfzsnmnwmcnj9xtz54nq45xb

For the sake of completeness, I would like to mention that our current
release policy and process are documented in the links I shared in my
previous email (https://pulsar.apache.org/contribute/release-policy/ and
https://pulsar.apache.org/contribute/release-process/).

A review of our archives and PIPs reveals that Michael, Penghui, Matteo,
Yunze Xu, and Tison (Zili Chen) have been instrumental in enhancing our
process and its documentation, effectively bringing it to its current
documented state. I want to express my gratitude to all of you for your
diligent contributions. I also extend my appreciation to everyone else
who has contributed to this area in the past.

I would like to extend a special thank you to Michael Marshall, who has
been unwavering in his pursuit of refining the release process for
Apache Pulsar. His transparent approach to fostering enhancements within
our community is highly commendable. His impact is well-documented in
our mailing list archives and the Pulsar community meeting minutes [1].

In Apache projects, mailing lists serve as a critical avenue for
community engagement. The phrase "If it's not on the mailing list, it
didn't happen" underscores the importance placed on the transparency,
inclusivity, and openness within the community. These discussions are
integral for tracking project decisions and changes over time. The
accessibility of the mailing lists ensures that every community member
has an opportunity to contribute to and learn from the collective
knowledge.

Let's maintain this momentum and continue enhancing our process! Your
contributions are welcome!

-Lari

[1] - Link to meeting minutes available at
https://github.com/apache/pulsar/wiki/Community-Meeting

On 2023/06/02 08:24:45 Lari Hotari wrote:
> Dear Apache Pulsar Committers,
> 
> I wish to address a few pressing concerns that emerged while I was
> working on cherry-picking PR #20461 [1]. This PR was aimed at upgrading
> Jetty from 9.4.48.v20220622 to 9.4.51.v20230217 to address the CVEs
> (CVE-2023-26048 and CVE-2023-26049). I discovered that Jetty had already
> been upgraded in the maintenance branches through four separate PRs
> (#20162, #20226, #20227, and #20228), all titled "[improve][build]
> Upgrade dependencies to reduce CVE" [2].
> 
> 1. The newly adopted process of combining multiple dependency updates
>    into a single PR, while omitting changes to the master branch, has
>    not been discussed on the mailing list.
> 2. Our current process, which is based on cherry-picking, should
>    maintain traceability across maintenance branches to discern whether
>    a change made to the master branch is available in the maintenance
>    branches. The bundled PRs break this traceability.
> 3. It is advised that each dependency (or group of related dependencies)
>    should be upgraded in its own PR, rather than upgrading multiple
>    unrelated dependencies in a single PR.
> 4. We should aim for all changes to be first made to the master branch
>    and then cherry-picked to other branches to prevent the maintenance
>    branches from diverging from the master branch.
> 5. The compilation of release notes becomes challenging when PRs aren't
>    atomic.
> 6. Similarly, detecting regressions can be problematic when PRs aren't
>    atomic.
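
A minimal sketch of the traceability check described in point 2. It assumes that squash-merged commits end their subject with "(#<PR number>)" and that cherry-picks keep the subject unchanged. The commit subjects below are hypothetical; only the PR numbers #20461 and #20162 come from this thread, and the function names are illustrative.

```python
import re

# Assumption: the commit subject ends with "(#<PR number>)" and a
# cherry-picked commit keeps the same subject as the master commit.
PR_SUFFIX = re.compile(r"\(#(\d+)\)\s*$")


def pr_numbers(subjects):
    """Return the set of PR numbers found at the end of commit subjects."""
    found = set()
    for subject in subjects:
        match = PR_SUFFIX.search(subject)
        if match:
            found.add(int(match.group(1)))
    return found


def missing_backports(master_subjects, branch_subjects):
    """Return master PR numbers with no matching commit on the branch."""
    return sorted(pr_numbers(master_subjects) - pr_numbers(branch_subjects))


# Hypothetical subjects; real input could come from
# `git log --format=%s <branch>`.
master = [
    "Upgrade Jetty (#20461)",
    "Example fix (#11111)",
]
maintenance = [
    "Upgrade dependencies to reduce CVE (#20162)",
    "Example fix (#11111)",
]

print(missing_backports(master, maintenance))  # prints [20461]
```

In this sketch #20461 is reported as missing from the maintenance branch even though a bundled PR already upgraded the dependency there, which is the kind of lost traceability that points 2 and 4 describe.
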
> 
> Now, I want to clarify that I'm not entirely supportive of the
> cherry-picking process as it currently stands. I personally believe that
> a merge-based strategy could be more effective. This strategy would
> entail initially making changes to the oldest maintenance branch where a
> feature (or a dependency, as in this instance) exists. Subsequently, we
> would propagate all changes in a maintenance branch forward towards the
> master branch using git merges, effectively managing and resolving any
> merge conflicts that might arise along the way. Features wouldn't be
> added to maintenance branches. This strategy is employed in several open
> source projects, such as Grails [3] and Micronaut [4].
> 
> Indeed, there might be exceptions, and for such instances,
> cherry-picking would still be a tool within our strategy. The principal
> advantage of this proposed approach is that it allocates adequate focus
> on the maintenance branch, thereby curbing the instability typically
> experienced with our intermediate maintenance versions.
> 
> Additionally, the merge-based approach addresses the issue with CI
> pipelines. If the PR is made to the maintenance branch, it ensures the
> changes integrate well and all tests pass in the maintenance version,
> enhancing stability. I understand the counter-argument that this could
> confuse our contributors if they have to make the PR against the
> maintenance branch. However, this could be mitigated by guidance from
> the PR reviewer and adding further information in the contribution guide
> and PR template. There are also more radical solutions, such as making
> the main maintenance branch the default branch, like the "4.1" branch in
> Netty.
> 
> The merge strategy also helps ensure that the LTS maintenance branch
> is always in a releasable state. Currently, it takes a significant
> amount of time to "stabilize" the branch before releasing. This is a
> counterproductive pattern and a waste of time that we must address and
> improve.
> 
> There seem to be inherent obstacles in our existing process, evidenced
> by the recent adoption of bundled PR types that circumvent our
> cherry-picking process. Ordinarily, we insist on creating atomic PRs to
> the master branch prior to initiating cherry-picking and backporting. I
> would be keen to hear about the issues others have encountered with the
> cherry-picking process. Identifying these pain points is the first step
> towards refining and optimizing our process.
> 
> With Pulsar's recent transition to a new Long-Term Support (LTS) release
> strategy, the stability of the LTS release has emerged as a vital
> concern. Our current cherry-picking process, which has sometimes led to
> insufficient integration testing within the maintenance branches, has
> been proven ineffective at maintaining the requisite stability. If we do
> not revisit our maintenance processes, the new LTS release strategy could
> encounter the same instability issues. Thus, in order to fully reap the
> benefits of the LTS release strategy, we must prioritize the improvement
> of our maintenance processes.
> 
> In the existing procedure, the task of cherry-picking individual commits
> can become quite tedious, especially when it necessitates crafting a new
> PR for each cherry-picked commit. One possible solution to this
> inefficiency may be to enhance the coordination of cherry-picking. Under
> such a system, the committer could instigate a test run encompassing a
> sensible quantity of cherry-picked commits, thereby circumventing the
> need for separate PRs for each cherry-picked item. Furthermore, the
> implementation of a nightly build for all maintenance branches, set to
> execute if any changes have transpired since the last run, could be
> advantageous. By employing this approach, we can consistently maintain
> our branches in an optimal and release-ready state.
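
The "run only if something changed" condition for such a nightly build could be sketched roughly as follows. This is an illustration, not an existing Pulsar CI feature: the branch names, commit ids, and the function name are hypothetical, and it assumes the head commit of each branch at the last nightly run has been recorded somewhere.

```python
def branches_to_build(current_heads, last_built_heads):
    """Return branches whose head commit differs from the last built one.

    Both arguments map a branch name to a commit id. A branch that has
    never been built (no recorded commit) is always included.
    """
    return sorted(
        branch
        for branch, head in current_heads.items()
        if last_built_heads.get(branch) != head
    )


# Hypothetical state: one branch unchanged, one changed, one never built.
current = {"branch-a": "1111111", "branch-b": "2222222", "branch-c": "3333333"}
last_built = {"branch-a": "1111111", "branch-b": "0000000"}

print(branches_to_build(current, last_built))  # prints ['branch-b', 'branch-c']
```
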
> 
> A significant deficiency in our current cherry-picking process is its
> status as tribal knowledge, without a clearly documented description in
> place. While we do possess a release process guide [5], it does not
> adequately elaborate on the procedure. Similarly, our release policy [6]
> does not delve into the specifics of this process either. This lack of
> comprehensive documentation leaves a significant knowledge gap in our
> workflow.
> 
> Our existing documentation [6] on the cherry-picking process states,
> "Generally, one committer shall volunteer as the release manager (RM) for
> a specific release. For feature releases and LTS releases, the last 3
> weeks of the release cycle will be marked as a code-freeze period. The
> RM will branch off from master, and the RM is also responsible for
> selecting the changes that will be cherry-picked in the release branch."
> 
> Unfortunately, this description falls short of the actual process. As it
> stands, we frequently cherry-pick commits as soon as the master branch
> PR has been merged. The description says that the Release Manager (RM)
> is responsible for selecting the changes, which isn't even the usual case.
> This practice is opaque and problematic. This situation prompts several
> crucial questions — what decision-making criteria does the RM use, and
> how do they manage quality assurance? It's currently the case that we
> need a substantial amount of time to prepare a maintenance branch for
> release, which clearly underscores that our current process requires
> significant enhancement.
> 
> Moreover, while the recent implementation of the Long-Term Support (LTS)
> strategy is a significant step, it doesn't appear to have brought about
> a radical shift in our approach. Aside from committing to maintain a
> specific version for a longer duration, our operational methodology
> hasn't undergone substantial enhancements. To truly honor our commitment
> to long-term support, it's incumbent upon us to reform our processes,
> making them more efficient, reliable, and effective. Merely increasing
> the responsibilities of the Release Manager isn't the solution.
> 
> An enterprise IT professional might suggest the introduction of a Change
> Advisory Board (CAB). However, such a measure doesn't necessarily
> address the core issue at hand. As the book "Accelerate: The Science of
> Lean Software and DevOps" [7] describes, approval by an external body
> (such as a manager or CAB), contrary to common belief, often does not
> result in higher levels of stability and can actually slow down the
> development process. We need to seek strategies that not only preserve
> stability but also promote agility and efficiency in our workflows.
> 
> Thank you for your attention, and I look forward to hearing your
> thoughts on these matters. Meanwhile, I kindly request that we stick to
> our established cherry-picking process until a collective decision is
> made on a potential alternative. This implies discontinuing the current
> practice of bundling multiple changes in PRs to maintenance branches.
> 
> Moreover, I earnestly hope for widespread involvement in refining this
> process. Specifically, I look forward to significant participation from
> the Apache Pulsar committers and PMC members in this pivotal discussion.
> Your collective insights and contributions will be important in
> effecting the much-needed improvements. 
> 
> In addition to discussions, there will also be a need for substantial
> effort. We must document the process thoroughly and continuously improve
> it as we gather more feedback during its progress.
> 
> I'm looking forward to an active discussion and concrete contributions
> as PRs to our release policy & process documentation! Sharing the tribal
> knowledge is also welcome if you don't feel like contributing directly
> to documentation. ;)
> 
> -Lari
> 
> [1] - https://github.com/apache/pulsar/pull/20461
> [2] - https://github.com/apache/pulsar/pulls?q=is%3Apr+%22Upgrade+dependencies+to+reduce+CVE%22+is%3Aclosed
> [3] - https://github.com/grails/grails-core
> [4] - https://github.com/micronaut-projects/micronaut-core
> [5] - https://pulsar.apache.org/contribute/release-process/
> [6] - https://pulsar.apache.org/contribute/release-policy/
> [7] - https://itrevolution.com/book/accelerate/
> 
> Appendix: 
> Quote from "Accelerate: The Science of Lean Software and DevOps" [7]
> related to change approval by an external body (such as a manager or
> Change Advisory Board):
> 
> "We investigated further the case of approval by an external body to see
> if this practice correlated with stability. We found that external
> approvals were negatively correlated with lead time, deployment
> frequency, and restore time, and had no correlation with change fail
> rate. In short, approval by an external body (such as a manager or CAB)
> simply doesn’t work to increase the stability of production systems,
> measured by the time to restore service and change fail rate. However,
> it certainly slows things down. It is, in fact, worse than having no
> change approval process at all.
> 
> Our recommendation based on these results is to use a lightweight change
> approval process based on peer review, such as pair programming or
> intrateam code review, combined with a deployment pipeline to detect and
> reject bad changes. This process can be used for all kinds of changes,
> including code, infrastructure, and database changes."
>
