Thanks for creating the branch, Penghui.

> Yes, but I think that the code freeze is only meaningful if it’s
> communicated in advance.

Given that our community members who focus on testing are otherwise
about to prepare for a quick 3 day round of testing, I don't believe
they would object to a last minute change that gives them more time
for testing. In your view, who needs this advanced communication to
make our code freeze meaningful?

Thanks,
Michael

On Wed, Feb 16, 2022 at 8:07 PM PengHui Li <peng...@apache.org> wrote:
>
> Hi all,
>
> Put an update here, I have created branch-2.10[1]
>
> [1]https://github.com/apache/pulsar/tree/branch-2.10
>
> On Thu, Feb 17, 2022 at 9:23 AM PengHui Li <peng...@apache.org> wrote:
>
> > Hi Micheal
> >
> > > Penghui, is your current plan to create branch-2.10, create the
> > release artifacts, and start a vote on them all within a few days of
> > each other?
> >
> > Yes, I will create branch-2.10 today.
> >
> > For starting the vote, we need to confirm these 2 PRs[1] will not introduce
> > breaking changes. Very grateful if someone can also help verify them.
> >
> >
> > [1] https://github.com/apache/pulsar/pull/13376,
> > https://github.com/apache/pulsar/pull/13341
> >
> > Thanks,
> > Penghui
> >
> > On Thu, Feb 17, 2022 at 8:59 AM Matteo Merli <matteo.me...@gmail.com>
> > wrote:
> >
> >> Yes, but I think that the code freeze is only meaningful if it’s
> >> communicated in advance.
> >>
> >> The fact that it was included in the original PIP but never followed in
> >> the
> >> practice means it would be a last minute change.
> >>
> >> On Wed, Feb 16, 2022 at 2:37 PM Michael Marshall <mmarsh...@apache.org>
> >> wrote:
> >>
> >> > When we discussed the code freeze in the community meeting on 2/3, I
> >> > was under the impression that it was a new development to our existing
> >> > release process. I subsequently learned it was already defined in
> >> > PIP 47. Even if we haven't been following this part of PIP 47, what
> >> > is the value in waiting until 2.11 to follow our already defined
> >> process?
> >> > While I agree it is helpful to provide guidance on when a version will
> >> > ship,
> >> > I think it is more important to give the community time to test a
> >> release,
> >> > even if that means we're a little late on our release schedule. So far,
> >> > we haven't even created a branch to begin testing.
> >> >
> >> > Note also that Sijie suggested using a feature freeze early on in this
> >> > thread.
> >> >
> >> > The 2.9.0 release is relevant here. It had 4 release candidates over 4
> >> > weeks and the final result was broken. That indicates to me that tagging
> >> > an RC early does not guarantee an early release and that our current
> >> > process isn't optimal and likely needs adjustments. I do not think we
> >> > should wait to address these issues. I propose we start following
> >> > PIP 47's guidance on code freeze and release stabilization periods.
> >> >
> >> > > I don't think that changes the picture here. There are *always* last
> >> > > minute issues being discovered, and there is a call to be made on a
> >> > > case by case. The feature freeze will reduce the likelihood of
> >> > > introducing *more* issues by getting it from the master branch, but
> >> > > won't change a comma from issues that were already there.
> >> >
> >> > I thought you wanted to implement a code/feature freeze to allow for
> >> > more release stabilization. Can you clarify what you mean here?
> >> >
> >> > Thanks,
> >> > Michael
> >> >
> >> >
> >> >
> >> >
> >> >
> >> >
> >> >
> >> > On Wed, Feb 16, 2022 at 2:42 PM Matteo Merli <matteo.me...@gmail.com>
> >> > wrote:
> >> > >
> >> > > Michael, as we chatted in last weekly meeting (though not yet
> >> > > formalized), since we have never really done a feature freeze on the
> >> > > branch during paste releases, we should start from the next release,
> >> > > to give a decent preview of what to expect to developers in terms of
> >> > > dates.
> >> > >
> >> > > > While some may feel "behind" in getting out the 2.10 release, our
> >> > > > priority must be to give the community time to verify the stability
> >> of
> >> > > > the release.
> >> > >
> >> > > I don't think that changes the picture here. There are *always* last
> >> > > minute issues being discovered, and there is a call to be made on a
> >> > > case by case. The feature freeze will reduce the likelihood of
> >> > > introducing *more* issues by getting it from the master branch, but
> >> > > won't change a comma from issues that were already there.
> >> > >
> >> > >
> >> > >
> >> > >
> >> > > --
> >> > > Matteo Merli
> >> > > <matteo.me...@gmail.com>
> >> > >
> >> > > On Wed, Feb 16, 2022 at 10:47 AM Michael Marshall <
> >> mmarsh...@apache.org>
> >> > wrote:
> >> > > >
> >> > > > > I will build the release and start the vote before next
> >> Monday(GMT+8)
> >> > > >
> >> > > > Penghui, is your current plan to create branch-2.10, create the
> >> > > > release artifacts, and start a vote on them all within a few days of
> >> > > > each other?
> >> > > >
> >> > > > > I'm doing my best to follow PIP 47, but when seeing a potential
> >> break
> >> > > > > change, I need to confirm it.
> >> > > > > After all the potential break changes have been confirmed and
> >> fixed,
> >> > I will
> >> > > > > start the vote thread.
> >> > > >
> >> > > > I think we should review our current release plan before we move
> >> > > > forward as proposed above. PIP 47 explicitly says that a month
> >> before
> >> > > > the release date, the release manager will cut branches [0]. We
> >> don't
> >> > > > yet have a `branch-2.10`. PIP 47 also defines a period of time for a
> >> > > > feature freeze and then a code freeze. We have not yet had either.
> >> > > >
> >> > > > I propose we create branch-2.10 now and simultaneously announce that
> >> > > > we are past the feature freeze period. Then, we can start the 2 week
> >> > > > period for bug fixes that precedes the code freeze, as PIP 47
> >> > > > prescribes. Then, in two weeks, we can produce the first release
> >> > > > candidate (also in PIP 47).
> >> > > >
> >> > > > While some may feel "behind" in getting out the 2.10 release, our
> >> > > > priority must be to give the community time to verify the stability
> >> of
> >> > > > the release.
> >> > > >
> >> > > > Thanks,
> >> > > > Michael
> >> > > >
> >> > > > [0]
> >> > https://github.com/apache/pulsar/wiki/PIP-47%3A-Time-Based-Release-Plan
> >> > > >
> >> > > >
> >> > > >
> >> > > >
> >> > > > On Wed, Feb 16, 2022 at 9:09 AM PengHui Li <peng...@apache.org>
> >> wrote:
> >> > > > >
> >> > > > > Hi all
> >> > > > >
> >> > > > > Just put an update here.
> >> > > > >
> >> > > > > We have 2 PRs[1] https://github.com/apache/pulsar/pull/13376 and
> >> > > > > https://github.com/apache/pulsar/pull/13341
> >> > > > > need to do the final verification, and you are also very welcome
> >> to
> >> > verify
> >> > > > > these 2 changes in your environment, cases.
> >> > > > >
> >> > > > > I will build the release and start the vote before next
> >> Monday(GMT+8)
> >> > > > >
> >> > > > > Regards
> >> > > > > Penghui
> >> > > > >
> >> > > > > On Wed, Feb 16, 2022 at 10:22 PM PengHui Li <peng...@apache.org>
> >> > wrote:
> >> > > > >
> >> > > > > > Hi lari,
> >> > > > > >
> >> > > > > > > So finally, I understand that "the problem" is that all HTTP
> >> > server
> >> > > > > > threads are blocked and this makes the Pulsar Admin API
> >> > unavailable.
> >> > > > > >
> >> > > > > > To support the blocking servlet API, Jetty uses a default thread
> >> > pool that
> >> > > > > > can grow to up to 200 threads (
> >> > > > > >
> >> >
> >> https://github.com/eclipse/jetty.project/blob/4a0c91c0be53805e3fcffdcdcc9587d5301863db/jetty-util/src/main/java/org/eclipse/jetty/util/thread/ExecutorThreadPool.java#L57
> >> > )
> >> > > > > > .
> >> > > > > > However this default of 200 maximum threads is not used in
> >> Pulsar.
> >> > > > > >
> >> > > > > >  Regarding the "make async" changes, It is an optimization to
> >> > migrate from
> >> > > > > > the blocking servlet api to the asynchronous servlet api. This
> >> > work isn't
> >> > > > > > urgent since we can simply mitigate the HTTP server threads
> >> > getting blocked
> >> > > > > > by setting "numHttpServerThreads=200" in broker.conf. "the
> >> > problem" will be
> >> > > > > > resolved immediately without risks of regressions that are
> >> > involved in
> >> > > > > > making the sync -> async changes.
> >> > > > > >
> >> > > > > > Yes, this is the problem. But I am against using 200 threads as
> >> > the max
> >> > > > > > web server thread by default,
> >> > > > > > it can't work for cases that the broker without that much
> >> memory,
> >> > it will
> >> > > > > > lead to more serious problems
> >> > > > > > that the service quality of messaging API gets worse due to the
> >> JVM
> >> > > > > > GC, even memory overflow.
> >> > > > > >
> >> > > > > > Yes, it isn't urgent. So I said it's not a blocker for the 2.10
> >> > release,
> >> > > > > > and all the PRs are not cherry-picked to branch-2.x
> >> > > > > > This is an optimization for pulsar, the current implementation
> >> > does not
> >> > > > > > use jetty async API well, we should fix it,
> >> > > > > > we should reduce the code with bad smells, and using async API
> >> is
> >> > also
> >> > > > > > a more efficient way without opening such jetty threads.
> >> > > > > > Do you have any concerns about the way the modification becomes
> >> > purely
> >> > > > > > async?
> >> > > > > >
> >> > > > > > > Penghui, would you mind adding a GitHub issue for the problem
> >> > where all
> >> > > > > > HTTP threads get blocked and the Pulsar Admin API stops
> >> responding?
> >> > > > > >
> >> > > > > > https://github.com/apache/pulsar/issues/4756 the attachment
> >> from
> >> > the
> >> > > > > > issue is a good example
> >> > > > > >
> >> > > > > > Regards,
> >> > > > > > Penghui
> >> > > > > >
> >> > > > > >
> >> > > > > > On Wed, Feb 16, 2022 at 9:04 PM Lari Hotari <lhot...@apache.org
> >> >
> >> > wrote:
> >> > > > > >
> >> > > > > >> I created PR https://github.com/apache/pulsar/pull/14320 to
> >> set
> >> > > > > >> numHttpServerThreads=200 .
> >> > > > > >> Please review
> >> > > > > >>
> >> > > > > >> On 2022/02/16 12:39:34 Lari Hotari wrote:
> >> > > > > >> > On 2022/02/16 00:58:20 PengHui Li wrote:
> >> > > > > >> > > Which is a sync method. Ultimately this could lead to all
> >> the
> >> > > > > >> pulsar-web
> >> > > > > >> > > thread
> >> > > > > >> > > blocked. we'd better not introduce blocking calls if we use
> >> > > > > >> AsyncResponse.
> >> > > > > >> > >
> >> > > > > >> > > > What issue did you see? Please share more context. Thanks
> >> > for the
> >> > > > > >> > > patience.
> >> > > > > >> > >
> >> > > > > >> > > It happened very earlier
> >> > > > > >> > >
> >> > > > > >> > > Here is the issue
> >> > https://github.com/apache/pulsar/issues/4756
> >> > > > > >> > > And here is also a related fix
> >> > > > > >> https://github.com/apache/pulsar/pull/10619
> >> > > > > >> >
> >> > > > > >> > Penghui, Thank you for the patience, and thanks for sharing
> >> more
> >> > > > > >> context. I happened to send a reply before reading your
> >> message,
> >> > so please
> >> > > > > >> bear with me.
> >> > > > > >> >
> >> > > > > >> > So finally, I understand that "the problem" is that all HTTP
> >> > server
> >> > > > > >> threads are blocked and this makes the Pulsar Admin API
> >> > unavailable.
> >> > > > > >> >
> >> > > > > >> > To support the blocking servlet API, Jetty uses a default
> >> > thread pool
> >> > > > > >> that can grow to up to 200 threads (
> >> > > > > >>
> >> >
> >> https://github.com/eclipse/jetty.project/blob/4a0c91c0be53805e3fcffdcdcc9587d5301863db/jetty-util/src/main/java/org/eclipse/jetty/util/thread/ExecutorThreadPool.java#L57
> >> > )
> >> > > > > >> .
> >> > > > > >> > However this default of 200 maximum threads is not used in
> >> > Pulsar.
> >> > > > > >> >
> >> > > > > >> > The problem is that Pulsar uses a low value that assumes
> >> > asynchronous
> >> > > > > >> API usage:
> >> > > > > >> >
> >> > > > > >>
> >> >
> >> https://github.com/apache/pulsar/blob/5c3ddc26d6e07eb0473b11b5ecc8318c1efe414b/pulsar-broker-common/src/main/java/org/apache/pulsar/broker/ServiceConfiguration.java#L201-L204
> >> > > > > >> > Pulsar should be using a high value (for example 200) as long
> >> > as there
> >> > > > > >> are blocking calls in Admin APIs.
> >> > > > > >> >
> >> > > > > >> > The mitigation to the issue of all HTTP server threads
> >> getting
> >> > blocked
> >> > > > > >> is setting "numHttpServerThreads=200" in broker.conf.
> >> > > > > >> >
> >> > > > > >> > Regarding the "make async" changes, It is an optimization to
> >> > migrate
> >> > > > > >> from the blocking servlet api to the asynchronous servlet api.
> >> > This work
> >> > > > > >> isn't urgent since we can simply mitigate the HTTP server
> >> threads
> >> > getting
> >> > > > > >> blocked by setting "numHttpServerThreads=200" in broker.conf.
> >> > "the problem"
> >> > > > > >> will be resolved immediately without risks of regressions that
> >> > are involved
> >> > > > > >> in making the sync -> async changes.
> >> > > > > >> >
> >> > > > > >> > Penghui, would you mind adding a GitHub issue for the problem
> >> > where all
> >> > > > > >> HTTP threads get blocked and the Pulsar Admin API stops
> >> > responding?
> >> > > > > >> >
> >> > > > > >> > I can follow up with a PR which updates the default for
> >> > > > > >> numHttpServerThreads to 200. This is a maximum value and Jetty
> >> > starts with
> >> > > > > >> 8 threads. We can agree on the default value to use in the PR.
> >> > > > > >> >
> >> > > > > >> > Thank you for the great collaboration on sharing the context
> >> and
> >> > > > > >> describing the problem patiently.
> >> > > > > >> >
> >> > > > > >> > BR,
> >> > > > > >> >
> >> > > > > >> > -Lari
> >> > > > > >> >
> >> > > > > >>
> >> > > > > >
> >> >
> >> --
> >> --
> >> Matteo Merli
> >> <matteo.me...@gmail.com>
> >>
> >

Reply via email to