Hi all, Put an update here, I have created branch-2.10[1]
[1]https://github.com/apache/pulsar/tree/branch-2.10 On Thu, Feb 17, 2022 at 9:23 AM PengHui Li <peng...@apache.org> wrote: > Hi Micheal > > > Penghui, is your current plan to create branch-2.10, create the > release artifacts, and start a vote on them all within a few days of > each other? > > Yes, I will create branch-2.10 today. > > For starting the vote, we need to confirm these 2 PRs[1] will not introduce > breaking changes. Very grateful if someone can also help verify them. > > > [1] https://github.com/apache/pulsar/pull/13376, > https://github.com/apache/pulsar/pull/13341 > > Thanks, > Penghui > > On Thu, Feb 17, 2022 at 8:59 AM Matteo Merli <matteo.me...@gmail.com> > wrote: > >> Yes, but I think that the code freeze is only meaningful if it’s >> communicated in advance. >> >> The fact that it was included in the original PIP but never followed in >> the >> practice means it would be a last minute change. >> >> On Wed, Feb 16, 2022 at 2:37 PM Michael Marshall <mmarsh...@apache.org> >> wrote: >> >> > When we discussed the code freeze in the community meeting on 2/3, I >> > was under the impression that it was a new development to our existing >> > release process. I subsequently learned it was already defined in >> > PIP 47. Even if we haven't been following this part of PIP 47, what >> > is the value in waiting until 2.11 to follow our already defined >> process? >> > While I agree it is helpful to provide guidance on when a version will >> > ship, >> > I think it is more important to give the community time to test a >> release, >> > even if that means we're a little late on our release schedule. So far, >> > we haven't even created a branch to begin testing. >> > >> > Note also that Sijie suggested using a feature freeze early on in this >> > thread. >> > >> > The 2.9.0 release is relevant here. It had 4 release candidates over 4 >> > weeks and the final result was broken. That indicates to me that tagging >> > an RC early does not guarantee an early release and that our current >> > process isn't optimal and likely needs adjustments. I do not think we >> > should wait to address these issues. I propose we start following >> > PIP 47's guidance on code freeze and release stabilization periods. >> > >> > > I don't think that changes the picture here. There are *always* last >> > > minute issues being discovered, and there is a call to be made on a >> > > case by case. The feature freeze will reduce the likelihood of >> > > introducing *more* issues by getting it from the master branch, but >> > > won't change a comma from issues that were already there. >> > >> > I thought you wanted to implement a code/feature freeze to allow for >> > more release stabilization. Can you clarify what you mean here? >> > >> > Thanks, >> > Michael >> > >> > >> > >> > >> > >> > >> > >> > On Wed, Feb 16, 2022 at 2:42 PM Matteo Merli <matteo.me...@gmail.com> >> > wrote: >> > > >> > > Michael, as we chatted in last weekly meeting (though not yet >> > > formalized), since we have never really done a feature freeze on the >> > > branch during paste releases, we should start from the next release, >> > > to give a decent preview of what to expect to developers in terms of >> > > dates. >> > > >> > > > While some may feel "behind" in getting out the 2.10 release, our >> > > > priority must be to give the community time to verify the stability >> of >> > > > the release. >> > > >> > > I don't think that changes the picture here. There are *always* last >> > > minute issues being discovered, and there is a call to be made on a >> > > case by case. The feature freeze will reduce the likelihood of >> > > introducing *more* issues by getting it from the master branch, but >> > > won't change a comma from issues that were already there. >> > > >> > > >> > > >> > > >> > > -- >> > > Matteo Merli >> > > <matteo.me...@gmail.com> >> > > >> > > On Wed, Feb 16, 2022 at 10:47 AM Michael Marshall < >> mmarsh...@apache.org> >> > wrote: >> > > > >> > > > > I will build the release and start the vote before next >> Monday(GMT+8) >> > > > >> > > > Penghui, is your current plan to create branch-2.10, create the >> > > > release artifacts, and start a vote on them all within a few days of >> > > > each other? >> > > > >> > > > > I'm doing my best to follow PIP 47, but when seeing a potential >> break >> > > > > change, I need to confirm it. >> > > > > After all the potential break changes have been confirmed and >> fixed, >> > I will >> > > > > start the vote thread. >> > > > >> > > > I think we should review our current release plan before we move >> > > > forward as proposed above. PIP 47 explicitly says that a month >> before >> > > > the release date, the release manager will cut branches [0]. We >> don't >> > > > yet have a `branch-2.10`. PIP 47 also defines a period of time for a >> > > > feature freeze and then a code freeze. We have not yet had either. >> > > > >> > > > I propose we create branch-2.10 now and simultaneously announce that >> > > > we are past the feature freeze period. Then, we can start the 2 week >> > > > period for bug fixes that precedes the code freeze, as PIP 47 >> > > > prescribes. Then, in two weeks, we can produce the first release >> > > > candidate (also in PIP 47). >> > > > >> > > > While some may feel "behind" in getting out the 2.10 release, our >> > > > priority must be to give the community time to verify the stability >> of >> > > > the release. >> > > > >> > > > Thanks, >> > > > Michael >> > > > >> > > > [0] >> > https://github.com/apache/pulsar/wiki/PIP-47%3A-Time-Based-Release-Plan >> > > > >> > > > >> > > > >> > > > >> > > > On Wed, Feb 16, 2022 at 9:09 AM PengHui Li <peng...@apache.org> >> wrote: >> > > > > >> > > > > Hi all >> > > > > >> > > > > Just put an update here. >> > > > > >> > > > > We have 2 PRs[1] https://github.com/apache/pulsar/pull/13376 and >> > > > > https://github.com/apache/pulsar/pull/13341 >> > > > > need to do the final verification, and you are also very welcome >> to >> > verify >> > > > > these 2 changes in your environment, cases. >> > > > > >> > > > > I will build the release and start the vote before next >> Monday(GMT+8) >> > > > > >> > > > > Regards >> > > > > Penghui >> > > > > >> > > > > On Wed, Feb 16, 2022 at 10:22 PM PengHui Li <peng...@apache.org> >> > wrote: >> > > > > >> > > > > > Hi lari, >> > > > > > >> > > > > > > So finally, I understand that "the problem" is that all HTTP >> > server >> > > > > > threads are blocked and this makes the Pulsar Admin API >> > unavailable. >> > > > > > >> > > > > > To support the blocking servlet API, Jetty uses a default thread >> > pool that >> > > > > > can grow to up to 200 threads ( >> > > > > > >> > >> https://github.com/eclipse/jetty.project/blob/4a0c91c0be53805e3fcffdcdcc9587d5301863db/jetty-util/src/main/java/org/eclipse/jetty/util/thread/ExecutorThreadPool.java#L57 >> > ) >> > > > > > . >> > > > > > However this default of 200 maximum threads is not used in >> Pulsar. >> > > > > > >> > > > > > Regarding the "make async" changes, It is an optimization to >> > migrate from >> > > > > > the blocking servlet api to the asynchronous servlet api. This >> > work isn't >> > > > > > urgent since we can simply mitigate the HTTP server threads >> > getting blocked >> > > > > > by setting "numHttpServerThreads=200" in broker.conf. "the >> > problem" will be >> > > > > > resolved immediately without risks of regressions that are >> > involved in >> > > > > > making the sync -> async changes. >> > > > > > >> > > > > > Yes, this is the problem. But I am against using 200 threads as >> > the max >> > > > > > web server thread by default, >> > > > > > it can't work for cases that the broker without that much >> memory, >> > it will >> > > > > > lead to more serious problems >> > > > > > that the service quality of messaging API gets worse due to the >> JVM >> > > > > > GC, even memory overflow. >> > > > > > >> > > > > > Yes, it isn't urgent. So I said it's not a blocker for the 2.10 >> > release, >> > > > > > and all the PRs are not cherry-picked to branch-2.x >> > > > > > This is an optimization for pulsar, the current implementation >> > does not >> > > > > > use jetty async API well, we should fix it, >> > > > > > we should reduce the code with bad smells, and using async API >> is >> > also >> > > > > > a more efficient way without opening such jetty threads. >> > > > > > Do you have any concerns about the way the modification becomes >> > purely >> > > > > > async? >> > > > > > >> > > > > > > Penghui, would you mind adding a GitHub issue for the problem >> > where all >> > > > > > HTTP threads get blocked and the Pulsar Admin API stops >> responding? >> > > > > > >> > > > > > https://github.com/apache/pulsar/issues/4756 the attachment >> from >> > the >> > > > > > issue is a good example >> > > > > > >> > > > > > Regards, >> > > > > > Penghui >> > > > > > >> > > > > > >> > > > > > On Wed, Feb 16, 2022 at 9:04 PM Lari Hotari <lhot...@apache.org >> > >> > wrote: >> > > > > > >> > > > > >> I created PR https://github.com/apache/pulsar/pull/14320 to >> set >> > > > > >> numHttpServerThreads=200 . >> > > > > >> Please review >> > > > > >> >> > > > > >> On 2022/02/16 12:39:34 Lari Hotari wrote: >> > > > > >> > On 2022/02/16 00:58:20 PengHui Li wrote: >> > > > > >> > > Which is a sync method. Ultimately this could lead to all >> the >> > > > > >> pulsar-web >> > > > > >> > > thread >> > > > > >> > > blocked. we'd better not introduce blocking calls if we use >> > > > > >> AsyncResponse. >> > > > > >> > > >> > > > > >> > > > What issue did you see? Please share more context. Thanks >> > for the >> > > > > >> > > patience. >> > > > > >> > > >> > > > > >> > > It happened very earlier >> > > > > >> > > >> > > > > >> > > Here is the issue >> > https://github.com/apache/pulsar/issues/4756 >> > > > > >> > > And here is also a related fix >> > > > > >> https://github.com/apache/pulsar/pull/10619 >> > > > > >> > >> > > > > >> > Penghui, Thank you for the patience, and thanks for sharing >> more >> > > > > >> context. I happened to send a reply before reading your >> message, >> > so please >> > > > > >> bear with me. >> > > > > >> > >> > > > > >> > So finally, I understand that "the problem" is that all HTTP >> > server >> > > > > >> threads are blocked and this makes the Pulsar Admin API >> > unavailable. >> > > > > >> > >> > > > > >> > To support the blocking servlet API, Jetty uses a default >> > thread pool >> > > > > >> that can grow to up to 200 threads ( >> > > > > >> >> > >> https://github.com/eclipse/jetty.project/blob/4a0c91c0be53805e3fcffdcdcc9587d5301863db/jetty-util/src/main/java/org/eclipse/jetty/util/thread/ExecutorThreadPool.java#L57 >> > ) >> > > > > >> . >> > > > > >> > However this default of 200 maximum threads is not used in >> > Pulsar. >> > > > > >> > >> > > > > >> > The problem is that Pulsar uses a low value that assumes >> > asynchronous >> > > > > >> API usage: >> > > > > >> > >> > > > > >> >> > >> https://github.com/apache/pulsar/blob/5c3ddc26d6e07eb0473b11b5ecc8318c1efe414b/pulsar-broker-common/src/main/java/org/apache/pulsar/broker/ServiceConfiguration.java#L201-L204 >> > > > > >> > Pulsar should be using a high value (for example 200) as long >> > as there >> > > > > >> are blocking calls in Admin APIs. >> > > > > >> > >> > > > > >> > The mitigation to the issue of all HTTP server threads >> getting >> > blocked >> > > > > >> is setting "numHttpServerThreads=200" in broker.conf. >> > > > > >> > >> > > > > >> > Regarding the "make async" changes, It is an optimization to >> > migrate >> > > > > >> from the blocking servlet api to the asynchronous servlet api. >> > This work >> > > > > >> isn't urgent since we can simply mitigate the HTTP server >> threads >> > getting >> > > > > >> blocked by setting "numHttpServerThreads=200" in broker.conf. >> > "the problem" >> > > > > >> will be resolved immediately without risks of regressions that >> > are involved >> > > > > >> in making the sync -> async changes. >> > > > > >> > >> > > > > >> > Penghui, would you mind adding a GitHub issue for the problem >> > where all >> > > > > >> HTTP threads get blocked and the Pulsar Admin API stops >> > responding? >> > > > > >> > >> > > > > >> > I can follow up with a PR which updates the default for >> > > > > >> numHttpServerThreads to 200. This is a maximum value and Jetty >> > starts with >> > > > > >> 8 threads. We can agree on the default value to use in the PR. >> > > > > >> > >> > > > > >> > Thank you for the great collaboration on sharing the context >> and >> > > > > >> describing the problem patiently. >> > > > > >> > >> > > > > >> > BR, >> > > > > >> > >> > > > > >> > -Lari >> > > > > >> > >> > > > > >> >> > > > > > >> > >> -- >> -- >> Matteo Merli >> <matteo.me...@gmail.com> >> >