Re: [DISCUSS] Apache Pulsar 2.10.0 release

Michael Marshall Wed, 16 Feb 2022 14:37:32 -0800

When we discussed the code freeze in the community meeting on 2/3, I
was under the impression that it was a new development to our existing
release process. I subsequently learned it was already defined in
PIP 47. Even if we haven't been following this part of PIP 47, what
is the value in waiting until 2.11 to follow our already defined process?
While I agree it is helpful to provide guidance on when a version will ship,
I think it is more important to give the community time to test a release,
even if that means we're a little late on our release schedule. So far,
we haven't even created a branch to begin testing.


Note also that Sijie suggested using a feature freeze early on in this thread.

The 2.9.0 release is relevant here. It had 4 release candidates over 4
weeks and the final result was broken. That indicates to me that tagging
an RC early does not guarantee an early release and that our current
process isn't optimal and likely needs adjustments. I do not think we
should wait to address these issues. I propose we start following
PIP 47's guidance on code freeze and release stabilization periods.
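
To make the proposed schedule concrete, here is a minimal sketch of the date arithmetic. It assumes the intervals cited from PIP 47 in the quoted message below: a branch cut one month before the release date, then a two-week bug-fix period before the first release candidate. The class name, method names, and target date are hypothetical and are not part of PIP 47 or any Pulsar tooling.

```java
import java.time.LocalDate;
import java.time.temporal.ChronoUnit;

/** Illustrative date arithmetic only; not an official Pulsar release tool. */
public class ReleaseTimeline {

    // Assumption from the thread: the release branch is cut one month
    // before the planned release date.
    public static LocalDate branchCut(LocalDate releaseDate) {
        return releaseDate.minusMonths(1);
    }

    // Assumption from the thread: a two-week bug-fix period follows the
    // branch cut, after which the first release candidate is produced.
    public static LocalDate firstReleaseCandidate(LocalDate branchCutDate) {
        return branchCutDate.plusWeeks(2);
    }

    // Days the community has to test the first RC before the target date.
    public static long testingDays(LocalDate releaseDate) {
        LocalDate rc = firstReleaseCandidate(branchCut(releaseDate));
        return ChronoUnit.DAYS.between(rc, releaseDate);
    }

    public static void main(String[] args) {
        LocalDate target = LocalDate.of(2022, 3, 16); // hypothetical target date
        LocalDate cut = branchCut(target);
        System.out.println("branch cut:      " + cut);
        System.out.println("first RC:        " + firstReleaseCandidate(cut));
        System.out.println("days to test RC: " + testingDays(target));
    }
}
```

With the hypothetical target of 2022-03-16, this gives a branch cut on 2022-02-16 and a first release candidate on 2022-03-02, leaving 14 days to test it.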

> I don't think that changes the picture here. There are *always* last
> minute issues being discovered, and there is a call to be made on a
> case by case. The feature freeze will reduce the likelihood of
> introducing *more* issues by getting it from the master branch, but
> won't change a comma from issues that were already there.

I thought you wanted to implement a code/feature freeze to allow for
more release stabilization. Can you clarify what you mean here?

Thanks,
Michael

On Wed, Feb 16, 2022 at 2:42 PM Matteo Merli <matteo.me...@gmail.com> wrote:
>
> Michael, as we discussed in the last weekly meeting (though not yet
> formalized), since we have never really done a feature freeze on the
> branch during past releases, we should start from the next release,
> to give developers a decent preview of what to expect in terms of
> dates.
>
> > While some may feel "behind" in getting out the 2.10 release, our
> > priority must be to give the community time to verify the stability of
> > the release.
>
> I don't think that changes the picture here. There are *always* last
> minute issues being discovered, and there is a call to be made on a
> case by case. The feature freeze will reduce the likelihood of
> introducing *more* issues by getting it from the master branch, but
> won't change a comma from issues that were already there.
>
>
>
>
> --
> Matteo Merli
> <matteo.me...@gmail.com>
>
> On Wed, Feb 16, 2022 at 10:47 AM Michael Marshall <mmarsh...@apache.org> wrote:
> >
> > > I will build the release and start the vote before next Monday (GMT+8)
> >
> > Penghui, is your current plan to create branch-2.10, create the
> > release artifacts, and start a vote on them all within a few days of
> > each other?
> >
> > > I'm doing my best to follow PIP 47, but when I see a potentially
> > > breaking change, I need to confirm it.
> > > After all the potentially breaking changes have been confirmed and
> > > fixed, I will start the vote thread.
> >
> > I think we should review our current release plan before we move
> > forward as proposed above. PIP 47 explicitly says that a month before
> > the release date, the release manager will cut branches [0]. We don't
> > yet have a `branch-2.10`. PIP 47 also defines a period of time for a
> > feature freeze and then a code freeze. We have not yet had either.
> >
> > I propose we create branch-2.10 now and simultaneously announce that
> > we are past the feature freeze period. Then, we can start the 2 week
> > period for bug fixes that precedes the code freeze, as PIP 47
> > prescribes. Then, in two weeks, we can produce the first release
> > candidate (also in PIP 47).
> >
> > While some may feel "behind" in getting out the 2.10 release, our
> > priority must be to give the community time to verify the stability of
> > the release.
> >
> > Thanks,
> > Michael
> >
> > [0] https://github.com/apache/pulsar/wiki/PIP-47%3A-Time-Based-Release-Plan
> >
> >
> >
> >
> > On Wed, Feb 16, 2022 at 9:09 AM PengHui Li <peng...@apache.org> wrote:
> > >
> > > Hi all
> > >
> > > Just put an update here.
> > >
> > > Two PRs, https://github.com/apache/pulsar/pull/13376 and
> > > https://github.com/apache/pulsar/pull/13341, still need final
> > > verification. You are also very welcome to verify these two changes
> > > in your own environments and use cases.
> > >
> > > I will build the release and start the vote before next Monday (GMT+8)
> > >
> > > Regards
> > > Penghui
> > >
> > > On Wed, Feb 16, 2022 at 10:22 PM PengHui Li <peng...@apache.org> wrote:
> > >
> > > > Hi Lari,
> > > >
> > > > > > So finally, I understand that "the problem" is that all HTTP server
> > > > > > threads are blocked and this makes the Pulsar Admin API unavailable.
> > > > > >
> > > > > > To support the blocking servlet API, Jetty uses a default thread pool
> > > > > > that can grow to up to 200 threads (
> > > > > > https://github.com/eclipse/jetty.project/blob/4a0c91c0be53805e3fcffdcdcc9587d5301863db/jetty-util/src/main/java/org/eclipse/jetty/util/thread/ExecutorThreadPool.java#L57).
> > > > > > However this default of 200 maximum threads is not used in Pulsar.
> > > > > >
> > > > > > Regarding the "make async" changes, It is an optimization to migrate
> > > > > > from the blocking servlet api to the asynchronous servlet api. This
> > > > > > work isn't urgent since we can simply mitigate the HTTP server threads
> > > > > > getting blocked by setting "numHttpServerThreads=200" in broker.conf.
> > > > > > "the problem" will be resolved immediately without risks of
> > > > > > regressions that are involved in making the sync -> async changes.
> > > >
> > > > > Yes, this is the problem. But I am against using 200 threads as the
> > > > > default maximum for web server threads. It won't work for brokers
> > > > > without that much memory, and it would lead to more serious problems:
> > > > > the service quality of the messaging API would degrade due to JVM GC,
> > > > > or the broker could even run out of memory.
> > > >
> > > > > Yes, it isn't urgent. That is why I said it's not a blocker for the
> > > > > 2.10 release, and none of the PRs have been cherry-picked to branch-2.x.
> > > > > This is an optimization for Pulsar: the current implementation does not
> > > > > use the Jetty async API well, and we should fix that. We should reduce
> > > > > the code smells, and using the async API is also more efficient because
> > > > > it avoids opening so many Jetty threads.
> > > > > Do you have any concerns about making these methods purely async?
> > > >
> > > > > > Penghui, would you mind adding a GitHub issue for the problem where
> > > > > > all HTTP threads get blocked and the Pulsar Admin API stops responding?
> > > >
> > > > > See https://github.com/apache/pulsar/issues/4756; the attachment on
> > > > > that issue is a good example.
> > > >
> > > > Regards,
> > > > Penghui
> > > >
> > > >
> > > > On Wed, Feb 16, 2022 at 9:04 PM Lari Hotari <lhot...@apache.org> wrote:
> > > >
> > > > >> I created PR https://github.com/apache/pulsar/pull/14320 to set
> > > > >> numHttpServerThreads=200.
> > > > >> Please review.
> > > >>
> > > >> On 2022/02/16 12:39:34 Lari Hotari wrote:
> > > >> > On 2022/02/16 00:58:20 PengHui Li wrote:
> > > > >> > > Which is a sync method. Ultimately this could lead to all the
> > > > >> > > pulsar-web threads being blocked. We'd better not introduce
> > > > >> > > blocking calls if we use AsyncResponse.
> > > > >> > >
> > > > >> > > > What issue did you see? Please share more context. Thanks for the
> > > > >> > > > patience.
> > > > >> > >
> > > > >> > > It happened much earlier.
> > > > >> > >
> > > > >> > > Here is the issue: https://github.com/apache/pulsar/issues/4756
> > > > >> > > And here is a related fix:
> > > > >> > > https://github.com/apache/pulsar/pull/10619
> > > >> >
> > > > >> > Penghui, Thank you for the patience, and thanks for sharing more
> > > > >> > context. I happened to send a reply before reading your message, so
> > > > >> > please bear with me.
> > > > >> >
> > > > >> > So finally, I understand that "the problem" is that all HTTP server
> > > > >> > threads are blocked and this makes the Pulsar Admin API unavailable.
> > > > >> >
> > > > >> > To support the blocking servlet API, Jetty uses a default thread pool
> > > > >> > that can grow to up to 200 threads (
> > > > >> > https://github.com/eclipse/jetty.project/blob/4a0c91c0be53805e3fcffdcdcc9587d5301863db/jetty-util/src/main/java/org/eclipse/jetty/util/thread/ExecutorThreadPool.java#L57).
> > > > >> > However this default of 200 maximum threads is not used in Pulsar.
> > > >> >
> > > > >> > The problem is that Pulsar uses a low value that assumes asynchronous
> > > > >> > API usage:
> > > > >> > https://github.com/apache/pulsar/blob/5c3ddc26d6e07eb0473b11b5ecc8318c1efe414b/pulsar-broker-common/src/main/java/org/apache/pulsar/broker/ServiceConfiguration.java#L201-L204
> > > > >> > Pulsar should be using a high value (for example 200) as long as
> > > > >> > there are blocking calls in Admin APIs.
> > > > >> >
> > > > >> > The mitigation to the issue of all HTTP server threads getting
> > > > >> > blocked is setting "numHttpServerThreads=200" in broker.conf.
> > > >> >
> > > > >> > Regarding the "make async" changes, It is an optimization to migrate
> > > > >> > from the blocking servlet api to the asynchronous servlet api. This
> > > > >> > work isn't urgent since we can simply mitigate the HTTP server threads
> > > > >> > getting blocked by setting "numHttpServerThreads=200" in broker.conf.
> > > > >> > "the problem" will be resolved immediately without risks of
> > > > >> > regressions that are involved in making the sync -> async changes.
> > > > >> >
> > > > >> > Penghui, would you mind adding a GitHub issue for the problem where
> > > > >> > all HTTP threads get blocked and the Pulsar Admin API stops responding?
> > > >> >
> > > > >> > I can follow up with a PR which updates the default for
> > > > >> > numHttpServerThreads to 200. This is a maximum value and Jetty starts
> > > > >> > with 8 threads. We can agree on the default value to use in the PR.
> > > > >> >
> > > > >> > Thank you for the great collaboration on sharing the context and
> > > > >> > describing the problem patiently.
> > > >> >
> > > >> > BR,
> > > >> >
> > > >> > -Lari
> > > >> >
> > > >>
> > > >
