I propose cutting the first 2.7.2 RC on April 12th. Is this a good trade-off?
Enrico

On Thu, 1 Apr 2021, 22:56 Michael Marshall <mikemars...@gmail.com> wrote:

> After discussing this issue in the Pulsar community meeting today, I
> realized that I might not have described the issue well enough. The
> fundamental problem is that brokers create 4 metrics for every cursor and
> there is no way to disable them. The change to make these metrics optional
> has already been merged into the branch-2.7.
>
> I decided to build a custom broker image that included the fix, so I am no
> longer blocked by this issue. However, I do think other users will be
> impacted by it if they have many topic subscriptions.
>
> Thanks,
> Michael
>
> On Thu, Apr 1, 2021 at 12:09 AM Michael Marshall <mikemars...@gmail.com>
> wrote:
>
> > Thank you for your offer, Enrico.
> >
> > > If you are now blocked, then other users will be blocked as well.
> >
> > I looked into this more today, and I believe that I am blocked on using
> > 2.7.1. I configured prometheus's server side filtering for the high
> > cardinality metrics, but my prometheus instance is still getting OOMKilled
> > due to the collective size of the metrics payload returned by my brokers.
> > In my use case, I encountered problems with around 40k topics each with a
> > single subscription. For reference, I ran the same load against
> > 2.7.0 brokers and had no issues with my prometheus instance.
> >
> > Sijie,
> >
> > Thanks for your reply.
> >
> > > The bugfix releases are usually made monthly based on demand. We can
> > > probably wait 1~2 weeks to see if there are any other fixes to include
> > > before cutting a 2.7.2 release. Does that make sense?
> >
> > Are there known bug fixes that you are looking to get merged in the next 1
> > or 2 weeks?
> >
> > I agree with the general timeline of doing bug fix releases monthly based
> > on demand. I also think there should be room for extraordinary
> > circumstances where we should release early to fix an issue that impacts
> > many users.
> > Given Pulsar's advertised ability to handle up to a million topics, I
> > think this is such a situation. Let me know what you think.
> >
> > Thanks,
> > Michael
> >
> > On Wed, Mar 31, 2021 at 6:31 PM Sijie Guo <guosi...@gmail.com> wrote:
> >
> >> Michael,
> >>
> >> The bugfix releases are usually made monthly based on demand. We can
> >> probably wait 1~2 weeks to see if there are any other fixes to include
> >> before cutting a 2.7.2 release. Does that make sense?
> >>
> >> Thanks,
> >> Sijie
> >>
> >> On Tue, Mar 30, 2021 at 9:55 PM Michael Marshall <mikemars...@gmail.com>
> >> wrote:
> >>
> >> > Hi All,
> >> >
> >> > I propose and request that we release version 2.7.2 to fix a regression
> >> > introduced in 2.7.1.
> >> >
> >> > Pulsar 2.7.1 introduced cursor level metrics without including the ability
> >> > to disable them (https://github.com/apache/pulsar/pull/9618). I recently
> >> > discovered the metrics when I created a Pulsar 2.7.1 cluster, created
> >> > thousands of topics and subscriptions, and then started to have problems
> >> > with my prometheus instance because of an influx of metrics. The fix to
> >> > make these metrics optional and disabled by default has already been merged
> >> > to the "branch-2.7" branch (https://github.com/apache/pulsar/pull/9814).
> >> >
> >> > Given the cardinality of the metrics produced for every cursor and the fact
> >> > that Pulsar is supposed to handle many topics and subscriptions with ease,
> >> > I consider the creation of too many metrics a regression, and I think it is
> >> > important to release a new, latest version.
> >> >
> >> > Further, 2.7.1 included several important bug fixes (e.g. one to fix tiered
> >> > storage to AWS S3), so I would prefer to move forward instead of back to
> >> > 2.7.0.
> >> >
> >> > What do others think about cutting a 2.7.2 release now?
> >> > Do others agree
> >> > that creating metrics for every cursor should be considered a regression?
> >> > If not, does the community have a helpful guide to determine what should be
> >> > considered a regression?
> >> >
> >> > Before writing this email, I consulted PIP 47, Pulsar's time based release
> >> > plan.
> >> > (https://github.com/apache/pulsar/wiki/PIP-47%3A-Time-Based-Release-Plan).
> >> > The PIP mentions that there will be bug fix releases for the last 4
> >> > releases, but it doesn't mention a cadence.
> >> >
> >> > Tangentially, I am wondering why the 2.7.1 release wasn't held up to
> >> > include this configuration fix. PR 9814 was submitted before the 2.7.1 tag
> >> > was created and was merged just 2 days after the tag's creation. What are
> >> > the criteria for holding up a release?
> >> >
> >> > Thanks for considering my request, and thanks for any feedback you can
> >> > provide.
> >> >
> >> > Best,
> >> > Michael Marshall