I propose cutting the first 2.7.2 RC on April 12th.

Is this a good trade-off?

Enrico



On Thu, Apr 1, 2021, 22:56 Michael Marshall <mikemars...@gmail.com>
wrote:

> After discussing this issue in the Pulsar community meeting today, I
> realized that I might not have described the issue well enough. The
> fundamental problem is that brokers create 4 metrics for every cursor and
> there is no way to disable them. The change to make these metrics optional
> has already been merged into branch-2.7.
>
> I decided to build a custom broker image that included the fix, so I am no
> longer blocked by this issue. However, I do think other users will be
> impacted by it if they have many topic subscriptions.
>
> Thanks,
> Michael
>
>
>
> On Thu, Apr 1, 2021 at 12:09 AM Michael Marshall <mikemars...@gmail.com>
> wrote:
>
> > Thank you for your offer, Enrico.
> >
> > > If you are now blocked, then other users will be blocked as well.
> >
> > I looked into this more today, and I believe that I am blocked on using
> > 2.7.1. I configured prometheus's server side filtering for the high
> > cardinality metrics, but my prometheus instance is still getting OOMKilled
> > due to the collective size of the metrics payload returned by my brokers.
> > In my use case, I encountered problems with around 40k topics each with a
> > single subscription. For reference, I ran the same load against
> > 2.7.0 brokers and had no issues with my prometheus instance.
> >
> > Sijie,
> >
> > Thanks for your reply.
> >
> > > The bugfix releases are usually made monthly based on demand. We can
> > > probably wait 1~2 weeks to see if there are any other fixes to include
> > > before cutting a 2.7.2 release. Does that make sense?
> >
> > Are there known bug fixes that you are looking to get merged in the next 1
> > or 2 weeks?
> >
> > I agree with the general timeline of doing bug fix releases monthly based
> > on demand. I also think there should be room for extraordinary
> > circumstances where we should release early to fix an issue that impacts
> > many users. Given Pulsar's advertised ability to handle up to a million
> > topics, I think this is such a situation. Let me know what you think.
> >
> > Thanks,
> > Michael
> >
> > On Wed, Mar 31, 2021 at 6:31 PM Sijie Guo <guosi...@gmail.com> wrote:
> >
> >> Michael,
> >>
> >> The bugfix releases are usually made monthly based on demand. We can
> >> probably wait 1~2 weeks to see if there are any other fixes to include
> >> before cutting a 2.7.2 release. Does that make sense?
> >>
> >> Thanks,
> >> Sijie
> >>
> >> On Tue, Mar 30, 2021 at 9:55 PM Michael Marshall <mikemars...@gmail.com>
> >> wrote:
> >>
> >> > Hi All,
> >> >
> >> > I propose and request that we release version 2.7.2 to fix a regression
> >> > introduced in 2.7.1.
> >> >
> >> > Pulsar 2.7.1 introduced cursor level metrics without including the ability
> >> > to disable them (https://github.com/apache/pulsar/pull/9618). I recently
> >> > discovered the metrics when I created a Pulsar 2.7.1 cluster, created
> >> > thousands of topics and subscriptions, and then started to have problems
> >> > with my prometheus instance because of an influx of metrics. The fix to
> >> > make these metrics optional and disabled by default has already been merged
> >> > to the "branch-2.7" branch (https://github.com/apache/pulsar/pull/9814).
> >> >
> >> > Given the cardinality of the metrics produced for every cursor and the fact
> >> > that Pulsar is supposed to handle many topics and subscriptions with ease,
> >> > I consider the creation of too many metrics a regression, and I think it is
> >> > important to release a new, latest version.
> >> >
> >> > Further, 2.7.1 included several important bug fixes (e.g. one to fix tiered
> >> > storage to AWS S3), so I would prefer to move forward instead of back to
> >> > 2.7.0.
> >> >
> >> > What do others think about cutting a 2.7.2 release now? Do others agree
> >> > that creating metrics for every cursor should be considered a regression?
> >> > If not, does the community have a helpful guide to determine what should be
> >> > considered a regression?
> >> >
> >> > Before writing this email, I consulted PIP 47, Pulsar's time based release
> >> > plan
> >> > (https://github.com/apache/pulsar/wiki/PIP-47%3A-Time-Based-Release-Plan).
> >> > The PIP mentions that there will be bug fix releases for the last 4
> >> > releases, but it doesn't mention a cadence.
> >> >
> >> > Tangentially, I am wondering why the 2.7.1 release wasn't held up to
> >> > include this configuration fix. PR 9814 was submitted before the 2.7.1 tag
> >> > was created and was merged just 2 days after the tag's creation. What are
> >> > the criteria for holding up a release?
> >> >
> >> > Thanks for considering my request, and thanks for any feedback you can
> >> > provide.
> >> >
> >> > Best,
> >> > Michael Marshall
> >> >
> >>
> >
>
