https://github.com/apache/pulsar/issues/15966

--------

## Motivation

In PIP-47
(https://github.com/apache/pulsar/wiki/PIP-47:-Time-Based-Release-Plan),
we adopted a time-based release plan. This was the first attempt at
establishing a new principle for how releases should be scheduled.

The two main benefits of this approach have been:

 1. Clarity for users and developers on when to expect a release
 2. Breaking the hard link between features and releases: a
particular feature is included in a release if it is completed in
time. Otherwise, it is moved to the next release.

The motivation for the current proposal is to extend the existing
process to address the issues that we have seen and that were left out
of the scope of PIP-47.

## Summary of existing issues in the process

### Short maintenance cycles for releases

Since we follow a 3-month release cycle, we should end up with 4
releases per year, though in practice it is closer to 3.

Maintaining many old releases, including backporting bug fixes and
security patches, has a high cost. In general, we actively support the
last 3 minor releases while developing the next one. For example, we
support 2.8, 2.9, and 2.10 while 2.11 is under development.

As a result, a user adopting a particular release is forced to
upgrade within less than a year to stay on a supported release. This
timeframe is too short for many users: it imposes frequent forced
upgrades that they are not prepared for in terms of available time and
required effort.

### Live Upgrade/Downgrade compatibility path

In Pulsar, we guarantee that users have a way to do live upgrades and
downgrades with zero downtime.

This is very powerful because it gives them the freedom to upgrade to
a new release with the assurance of being able to roll back to the
previous release in case any functional or performance regressions are
encountered.

Today, this compatibility is guaranteed across consecutive minor
versions. For example, `2.7 -> 2.8 -> 2.7` works as a live upgrade and
downgrade.

What is not guaranteed is "skipping" releases. For example, `2.7 -> 2.9`
may or may not work. In that case, an intermediate upgrade is
required: `2.7 -> 2.8 -> 2.9`.

There are several reasons why a "skip" upgrade might not work:
  1. An upgrade of some dependency (e.g., ZooKeeper) to a version that
     is not compatible with the older one.
  2. Adoption of a new metadata format or data format on disk.
     Every time we introduce an incompatible format change (beyond a
     regular Protobuf field addition), we do it in two steps:
       - In a new release, we introduce the new feature/format,
         disabled by default. The new release can read both the old and
         new formats, but it keeps writing the old format by default.
       - In a subsequent release, we change the default to the new format.
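The two-step rollout can be modeled as a small compatibility check. This is a minimal Python sketch, assuming hypothetical format names `v1` (old) and `v2` (new) introduced across the example releases above; the `Release` type and `live_roundtrip_safe` helper are illustrative, not Pulsar code, and only the on-disk format is modeled (dependency upgrades are ignored).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Release:
    """Hypothetical model of how a release handles on-disk formats."""
    name: str
    readable: frozenset  # formats this release can read
    writes: str          # format this release writes by default


# Hypothetical formats: "v1" (old) and "v2" (new), rolled out in two steps.
R27 = Release("2.7", frozenset({"v1"}), "v1")        # before the change
R28 = Release("2.8", frozenset({"v1", "v2"}), "v1")  # step 1: reads both, writes old
R29 = Release("2.9", frozenset({"v1", "v2"}), "v2")  # step 2: writes new by default


def live_roundtrip_safe(old: Release, new: Release) -> bool:
    """old -> new -> old is safe (format-wise) only if each release can
    read whatever the other one writes, since both may touch the same
    data during the rollout and after a rollback."""
    return old.writes in new.readable and new.writes in old.readable


for a, b in [(R27, R28), (R28, R29), (R27, R29)]:
    print(f"{a.name} -> {b.name} -> {a.name}:", live_roundtrip_safe(a, b))
```

In this model the skip case fails because 2.7 cannot read the `v2` data written by 2.9, while each single step keeps both sides readable.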

Note that this consideration is separate from the compatibility
between clients and brokers, where we ***never*** break compatibility.
The oldest available Pulsar client can still talk to the newest Pulsar
broker and, vice versa, a new client works with an older broker
(although the new features will not be available).

### Releases getting delayed

Another problem we have been experiencing is that release cycles have
been stretching considerably. Part of the reason is that we reach the
end of the release window, prepare a release candidate, and then take a
long time to fix all the issues found at the last minute.

We need to ensure that we have a date set in stone to deliver the
release to users.

## Proposal

The proposal to address the above issues is composed of 2 parts.

### 1. Establish Long Term Support releases

We need to provide a way for users to quickly understand the expected
lifecycle timeline of a given release and for that timeline to be long
enough not to be a constant update mandate.

At the same time, we need to ensure that maintainers do not spend all
their time maintaining a long list of old releases.

For that, we can use the established concept of "Long Term Support" (LTS) releases.

We will perform LTS releases at a fixed cadence every 18 months, and
we will keep doing regular feature releases every 3 months as we're
currently doing.

LTS releases will be identified by a `.0` version. For example:
 * `3.0` -> LTS
 * `3.1` -> regular release
 * `3.2` -> regular release
 * `4.0` -> LTS
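Under this scheme, the release type can be read directly from the version string. A minimal sketch, assuming `MAJOR.MINOR` or `MAJOR.MINOR.PATCH` versions and assuming that patch releases (e.g., a hypothetical `3.0.5`) inherit the type of the line they belong to; `release_type` is an illustrative helper, not part of any Pulsar tooling.

```python
def release_type(version: str) -> str:
    """Classify a version under the proposed scheme: a minor version of 0
    marks an LTS line, any other minor version a feature line."""
    parts = version.split(".")
    if len(parts) < 2 or not all(p.isdigit() for p in parts):
        raise ValueError(f"unexpected version format: {version!r}")
    minor = int(parts[1])
    return "LTS" if minor == 0 else "feature"


for v in ["3.0", "3.1", "3.2", "4.0", "3.0.5"]:
    print(v, "->", release_type(v))
```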

The major version bump will not carry any special meaning in terms of
"big features" included in the release or breaking API changes.
Instead, it will simply signal the type of the release.

#### Compatibility between releases

Live upgrade/downgrade will be guaranteed between one LTS release and
the next one.

For example:

 * `3.0 -> 4.0 -> 3.0` : OK
 * `3.2 -> 4.0 -> 3.2` : OK
 * `3.2 -> 4.4 -> 3.2` : OK
 * `3.2 -> 5.0` : Not OK
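One reading of these examples is that a live upgrade/downgrade round trip is guaranteed whenever the two releases are at most one major version apart. The sketch below encodes that interpretation; it is inferred from the four examples rather than stated by the proposal, which phrases the guarantee in terms of one LTS and the next, and `live_upgrade_guaranteed` is a hypothetical helper.

```python
def major(version: str) -> int:
    """Return the major component of "MAJOR.MINOR[.PATCH]"."""
    return int(version.split(".")[0])


def live_upgrade_guaranteed(src: str, dst: str) -> bool:
    """Assumed rule: src -> dst -> src is guaranteed when the major
    versions differ by at most one (at most one LTS boundary crossed)."""
    return abs(major(src) - major(dst)) <= 1


for src, dst in [("3.0", "4.0"), ("3.2", "4.0"), ("3.2", "4.4"), ("3.2", "5.0")]:
    ok = live_upgrade_guaranteed(src, dst)
    print(f"{src} -> {dst} -> {src}:", "OK" if ok else "Not OK")
```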

#### Release support expectation

We will publish clear guidelines on the Pulsar website regarding the
expected timeline for which each release is supported and when the new
feature and LTS releases will be available.

The support model will be:

 * LTS
   * Released every 18 months
   * Support for 24 months
   * Security patches for 36 months
 * Feature releases
   * Released every 3 months
   * Support for 6 months
   * Security patches for 6 months

This can be translated into:
   * We support the last 2 LTS releases and the last 2 feature releases
   * Security patches are provided for the past 3 LTS releases and 2
feature releases
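The support model can be expressed as windows measured from each release date. A minimal sketch, assuming the windows start on the release day and are counted in whole months from an arbitrary month 0; `ReleaseWindow` and its methods are hypothetical.

```python
from dataclasses import dataclass

# Months of full support and of security patches, per the proposed model.
POLICY = {
    "LTS": {"support": 24, "security": 36},
    "feature": {"support": 6, "security": 6},
}


@dataclass(frozen=True)
class ReleaseWindow:
    kind: str      # "LTS" or "feature"
    released: int  # release month, counted from an arbitrary month 0

    def support_ends(self) -> int:
        return self.released + POLICY[self.kind]["support"]

    def security_ends(self) -> int:
        return self.released + POLICY[self.kind]["security"]

    def status(self, month: int) -> str:
        """Support status of this release at the given month."""
        if month < self.released:
            return "unreleased"
        if month < self.support_ends():
            return "supported"
        if month < self.security_ends():
            return "security patches only"
        return "end of life"


lts = ReleaseWindow("LTS", released=0)
feature = ReleaseWindow("feature", released=3)
print([lts.status(m) for m in (12, 30, 36)])
print([feature.status(m) for m in (6, 9)])
```

Because feature releases have equal support and security windows, they go straight from "supported" to "end of life", while an LTS release spends its last 12 months receiving security patches only.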

Users are therefore encouraged to stay on an LTS release until they
are ready to move to the next LTS, unless they need some of the
features included in the latest feature releases.

### 2. Introduce a code-freeze period in the release cycle

To address the problem with delayed release cycles, we are introducing
a code freeze period that will give us time to stabilize the release
code while not blocking new changes from being merged into master for
the subsequent version.

This code freeze will apply only to LTS and feature releases, not to
patch releases.

In a 3-month release cycle, the last 3 weeks will be a code freeze
period. The release manager will create a release branch from master
and will be responsible for selecting the changes to be cherry-picked
into the release branch.

From the start of the code freeze, to minimize the risk of delaying the
release, only fixes for regressions in behavior compared to a previous
release should be allowed. Occasional exceptions will be possible after
closer scrutiny of the change.
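The cherry-pick rule after the code freeze amounts to a simple gate. A minimal sketch with a hypothetical `Change` type and hypothetical example change titles; in practice the decision rests with the release manager's judgment, which this does not replace.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Change:
    title: str
    fixes_regression: bool            # restores behavior from a previous release
    exception_approved: bool = False  # accepted after closer scrutiny


def may_cherry_pick_after_freeze(change: Change) -> bool:
    """Once the code freeze has started, only regression fixes or
    explicitly approved exceptions go into the release branch."""
    return change.fixes_regression or change.exception_approved


candidates = [
    Change("Fix consumer regression seen since the previous release", True),
    Change("New admin API endpoint", False),
    Change("Low-risk fix reviewed as an exception", False, exception_approved=True),
]
for c in candidates:
    print(c.title, "->", may_cherry_pick_after_freeze(c))
```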

At the moment of the code freeze, the release manager will also
prepare a release candidate in the same way we are doing today.
Committers, contributors, and users will test this RC to detect issues
as early as possible.

A formal PMC vote will not be required at this stage (though any
objections should be raised as soon as possible).

After 1 week, if there have been any changes, the release manager will
provide a new RC that the community will test again.

After 1 more week, if there have been further changes, a third RC will
be prepared and submitted to the PMC for a vote. If no issues are found
in an earlier RC, the vote will be held on that RC instead.

The last week will be used for the voting process and for updating
the Pulsar website and publishing the blog post announcing the release,
which should (hopefully) happen on the scheduled day.
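Counting back from the scheduled release day gives the milestone dates for the last three weeks of the cycle. A minimal sketch, assuming each step falls exactly on a week boundary; the release date below is a hypothetical example, and `release_schedule` is an illustrative helper.

```python
from datetime import date, timedelta


def release_schedule(release_day: date) -> dict[str, date]:
    """Milestones of the 3-week code-freeze window ending on release_day."""
    freeze = release_day - timedelta(weeks=3)
    return {
        # Release branch cut from master; RC1 goes out for community testing.
        "code_freeze_rc1": freeze,
        # A new RC only if changes were cherry-picked since RC1.
        "rc2": freeze + timedelta(weeks=1),
        # A third RC if there were further changes, otherwise an earlier RC;
        # the PMC vote runs during the final week.
        "rc3_vote": freeze + timedelta(weeks=2),
        "release": release_day,
    }


for milestone, day in release_schedule(date(2023, 3, 31)).items():
    print(f"{milestone:>16}: {day.isoformat()}")
```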




--
Matteo Merli
<matteo.me...@gmail.com>
