Re: PIP 112: Generate Release Notes Automatically

2021-12-16 Thread Yu
Hi all,

This is a follow-up to the last email.

Previously, we used Markdown files to create issue templates [1].

For the doc_request issue template, I've switched to a custom issue
form by adding a YAML form definition file, which is more intuitive and
easier to use.

Feel free to comment on this PR [2], thanks.

[1] https://github.com/apache/pulsar/tree/master/.github/ISSUE_TEMPLATE
[2] https://github.com/apache/pulsar/pull/13359

On Tue, Dec 14, 2021 at 9:09 PM Yu Liu  wrote:

> Spot on.
> This also reminds me that we can create custom issue forms by adding YAML
> form definition files [1], which is more user-friendly and easy to maintain.
>
> [1]
> https://docs.github.com/en/communities/using-templates-to-encourage-useful-issues-and-pull-requests/syntax-for-issue-forms#about-yaml-syntax-for-issue-forms
>
>
> On Tue, Dec 14, 2021 at 2:25 AM Michael Marshall 
> wrote:
>
>> +1  Yu, thank you for putting together this thorough document. This is
>> a great initiative.
>>
>> I think it might help to review and possibly update the PR template as
>> part of this PIP. For example, the current template does not prompt
>> authors whether the PR should be mentioned in release notes. Such a
>> prompt could help committers determine the right labels for a PR.
>>
>> Thanks,
>> Michael
>>
>> On Mon, Dec 13, 2021 at 4:56 AM Li Li 
>> wrote:
>> >
>> > +1
>> >
>> > Good idea. I think I can take part in this PIP once I finish upgrading
>> > the Pulsar website.
>> >
>> > Thanks,
>> > LiLi
>> >
>> > > On Dec 13, 2021, at 4:18 PM, Yu  wrote:
>> > >
>> > > Hi Pulsarers,
>> > >
>> > > As we know[1], there are some issues in the current Pulsar release
>> notes
>> > > (RN), for example:
>> > >
>> > > - For Pulsar users
>> > > They cannot capture the highlights quickly since the RN is a raw dump
>> of
>> > > PRs.
>> > >
>> > > - For Pulsar release managers (RM)
>> > > They feel overwhelmed by the **manual** workload of generating RN
>> since it
>> > > is created based on git commit messages, while many people do not
>> provide
>> > > clear and meaningful info.
>> > > It’s time-consuming to clear up all info especially for a major
>> release
>> > > with lots of PRs.
>> > >
>> > > If RN is regarded as an afterthought and finished as a last-minute
>> task, it
>> > > is likely not written well.
>> > > Instead of rushing, treating RN as a part of development not only
>> reduces
>> > > RM's workload and makes communication more coordinated,
>> > > but also allows more time for us to choose the most valuable
>> highlights
>> > > shown to users.
>> > > Consequently, the process of the current workflow should be improved.
>> > >
>> > > Therefore, I propose the PIP 112: Generate Release Notes
>> Automatically [2]
>> > > and add some initial thoughts and research there.
>> > > It is only a draft but I would like to invite you to join us to bring
>> > > another major change to Pulsar. I believe this would bring many
>> benefits to
>> > > all of us, thanks!
>> > >
>> > > [1] https://lists.apache.org/thread/dl3jb9p3zvlc6ntlkpmxf1m8dw5dcd8z
>> > > [2]
>> > >
>> https://github.com/apache/pulsar/wiki/PIP-112%3A-Generate-Release-Notes-Automatically
>> >
>>
>


[VOTE] Apache Pulsar 2.9.1 candidate 2

2021-12-16 Thread Enrico Olivelli
This is the second release candidate for Apache Pulsar, version 2.9.1.

The first release candidate was aborted without starting a VOTE because we
had to pick up high priority dependency upgrades.

It fixes the following issues:
https://github.com/apache/pulsar/pulls?q=is%3Apr++label%3Arelease%2F2.9.1+

*** Please download, test and vote on this release. This vote will stay open
for at least 72 hours ***

Note that we are voting upon the source (tag), binaries are provided for
convenience.

Source and binary files:
https://dist.apache.org/repos/dist/dev/pulsar/pulsar-2.9.1-candidate-2/

SHA-512 checksums:

5ca7d2c6a8ac51413214796481095bbde50b5bda95d8b8f2467989931b29c75e679aabcfebd82e9e3e90dd1644c580214e0a05eca8652a500f042c84cb21becd
 apache-pulsar-2.9.1-bin.tar.gz
34a1e22fb0ff2e69e7e880a9432526990610113cf89d93c953dff82cc443510dcf724eaa0e1fade82464f9bf5443655bd23bcf2064e312c4a9da70bb4c9937ba
 apache-pulsar-2.9.1-src.tar.gz

Maven staging repo:
https://repository.apache.org/content/repositories/orgapachepulsar-1110

The tag to be voted upon:
v2.9.1-candidate-2 (f52ac045f41acbb6c31da21a3463df3cfbe8f1b4)
https://github.com/apache/pulsar/releases/tag/v2.9.1-candidate-2

Link to the release notes:
https://github.com/apache/pulsar/pull/13357

Pulsar's KEYS file containing PGP keys we use to sign the release:
https://dist.apache.org/repos/dist/dev/pulsar/KEYS

Please download the source package, and follow the README to build
and run the Pulsar standalone service.


Enrico Olivelli
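
For voters who want to script the checksum step, the comparison can be sketched in Java as below. This is only an illustration, not part of the release process: the class name `ChecksumCheck` and its methods are made up here, and the expected digest is meant to be copied from the list above. Verifying the signatures against the KEYS file is a separate step that this sketch does not cover.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

// Hypothetical helper: names are illustrative, not part of any Pulsar tooling.
final class ChecksumCheck {

    // Streams the data through SHA-512 and returns the lowercase hex digest.
    static String sha512Hex(InputStream in) throws IOException {
        MessageDigest md;
        try {
            md = MessageDigest.getInstance("SHA-512");
        } catch (NoSuchAlgorithmException e) {
            // Not expected on a standard JDK; treat it as a broken runtime.
            throw new IllegalStateException(e);
        }
        byte[] buf = new byte[8192];
        int n;
        while ((n = in.read(buf)) != -1) {
            md.update(buf, 0, n);
        }
        StringBuilder hex = new StringBuilder();
        for (byte b : md.digest()) {
            hex.append(String.format("%02x", b));
        }
        return hex.toString();
    }

    // Compares a file's digest with the expected value,
    // ignoring case and surrounding whitespace.
    static boolean matches(Path file, String expectedHex) throws IOException {
        try (InputStream in = Files.newInputStream(file)) {
            return sha512Hex(in).equalsIgnoreCase(expectedHex.trim());
        }
    }
}
```

Calling `ChecksumCheck.matches(...)` with the path of the downloaded `apache-pulsar-2.9.1-src.tar.gz` and the digest listed for it should return true for an intact download.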


Re: [VOTE] Apache Pulsar 2.9.1 candidate 2

2021-12-16 Thread Nicolò Boschi
+1 (non binding)

Checks:
- Checksum and signatures
- Apache Rat check passes
- OWASP check passes (I created this PR to fix a false positive:
https://github.com/apache/pulsar/pull/13364)
- Compile from source with JDK 11
- Build Docker image from source
- Run Pulsar standalone and produce/consume from CLI
- Verified the presence of the Log4j 2.16.0 jar in the Docker image and tarball

Il giorno gio 16 dic 2021 alle ore 14:25 Enrico Olivelli <
eolive...@gmail.com> ha scritto:

> This is the second release candidate for Apache Pulsar, version 2.9.1.
>
> The first release candidate was aborted without starting a VOTE because we
> had to pick up high priority dependency upgrades.
>
> It fixes the following issues:
> https://github.com/apache/pulsar/pulls?q=is%3Apr++label%3Arelease%2F2.9.1+
>
> *** Please download, test and vote on this release. This vote will stay
> open
> for at least 72 hours ***
>
> Note that we are voting upon the source (tag), binaries are provided for
> convenience.
>
> Source and binary files:
> https://dist.apache.org/repos/dist/dev/pulsar/pulsar-2.9.1-candidate-2/
>
> SHA-512 checksums:
>
>
> 5ca7d2c6a8ac51413214796481095bbde50b5bda95d8b8f2467989931b29c75e679aabcfebd82e9e3e90dd1644c580214e0a05eca8652a500f042c84cb21becd
>  apache-pulsar-2.9.1-bin.tar.gz
>
> 34a1e22fb0ff2e69e7e880a9432526990610113cf89d93c953dff82cc443510dcf724eaa0e1fade82464f9bf5443655bd23bcf2064e312c4a9da70bb4c9937ba
>  apache-pulsar-2.9.1-src.tar.gz
>
> Maven staging repo:
> https://repository.apache.org/content/repositories/orgapachepulsar-1110
>
> The tag to be voted upon:
> v2.9.1-candidate-2 (f52ac045f41acbb6c31da21a3463df3cfbe8f1b4)
> https://github.com/apache/pulsar/releases/tag/v2.9.1-candidate-2
>
> Link to the release notes:
> https://github.com/apache/pulsar/pull/13357
>
> Pulsar's KEYS file containing PGP keys we use to sign the release:
> https://dist.apache.org/repos/dist/dev/pulsar/KEYS
>
> Please download the source package, and follow the README to build
> and run the Pulsar standalone service.
>
>
> Enrico Olivelli
>


-- 
Nicolò Boschi


Re: [DISCUSS] How to handle stale PRs

2021-12-16 Thread Dave Fisher
I just saw another project - https://github.com/openmessaging/benchmark uses 
probot-stale https://github.com/probot/stale

This looks like it has all the features needed to close both stale issues and 
PRs. It allows labels to be used to prevent closure of certain issues and PRs.

Here is their configuration: 
https://github.com/openmessaging/benchmark/blob/master/.github/stale.yml

This bot is allowed in GitHub.com/apache/ where 11 repositories are currently 
using it. When we are ready we will simply create an INFRA JIRA.
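
To make the rule concrete, the decision such a bot applies can be sketched roughly as follows. This is not probot-stale's code or configuration format; the class `StaleRule`, the 60-day threshold, and the `keep-open` label used in the usage note are all assumptions picked for illustration.

```java
import java.time.Duration;
import java.time.Instant;
import java.util.Set;

// Hypothetical model of a stale check; not taken from probot-stale.
final class StaleRule {
    private final Duration maxIdle;
    private final Set<String> exemptLabels;

    StaleRule(Duration maxIdle, Set<String> exemptLabels) {
        this.maxIdle = maxIdle;
        this.exemptLabels = exemptLabels;
    }

    // An item is stale when it has been idle longer than maxIdle
    // and carries none of the exempt labels.
    boolean isStale(Instant lastUpdated, Set<String> labels, Instant now) {
        for (String label : labels) {
            if (exemptLabels.contains(label)) {
                return false;
            }
        }
        return Duration.between(lastUpdated, now).compareTo(maxIdle) > 0;
    }
}
```

With `new StaleRule(Duration.ofDays(60), ...)` and `keep-open` as the exempt label, a PR idle for 90 days would be flagged unless it carries `keep-open`.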

> On Dec 15, 2021, at 4:15 PM, Dave Fisher  wrote:
> 
> 
> 
>> On Dec 15, 2021, at 4:06 PM, Matteo Merli  wrote:
>> 
>>> Is #3267 Support set publish time on broker side one of those very valuable 
>>> ideas that was later rejected, likely for performance reasons?
>> 
>> No, this was one that was superseded by other changes.
> 
> Then I’ll close it.
> 
>> 
>>> One problem with the current state is that PRs and even higher level ideas
>>> have a shelf life.  Declaring PR bankruptcy does in fact solve this problem.
>> 
>> I don't believe that is true in all cases and I absolutely don't
>> believe that it is not possible to keep up with the PRs, when the
>> reviewing workload is well balanced.
> 
> 
>> 
>> I'm seeing a lot of opinions here, but at the end of the day the
>> people doing the hard work of reviewing are always the same few ones.
> 
> (1) These are opinions about how to do the work. If you want someone to JFDI 
> it then I’m happy to start closing and labeling as I suggested.

I started closing PRs with a new label - status/stale

https://github.com/apache/pulsar/issues?q=label%3Astatus%2Fstale+is%3Aclosed


> 
> (2) There is a kind of deference being shown to those individuals based on 
> who the contributor selects for review. I wish there was a way for a 
> contributor to ask the dev list for a review.

I plan to research how we might modify how reviews are requested. I think that 
can be in another thread.

> 
> 
>> 
>>> Once we have guidance, I am happy to add it to the Committer Guide on
>> the wiki [0].
>> 
>> Michael, I agree 100% with that. We should write clear guidelines to
>> describe when it makes sense to close, leave for the record, call for
>> "help" to continue working on and so on. That will help committers and
>> contributors.
>> 
>>> Matteo, your comment raises an additional question for me. What are
>>> Apache's rules for completing someone else's contribution? If someone
>>> opens a PR to fix a bug, but it is incomplete and they become
>>> unresponsive, how can we move their contribution forward? These are
>>> the PRs we don't want to close.
>> 
>> I don't think there is any problem in completing someone else's PR,
>> provided that:
>> * The original author is non-responsive or has no time to work on it
>> at the moment (otherwise it would be kind of rude).
>> * We give the right credit to the original author (github has good
>> support for multiple authorship of a commit)
>> 
>> Continuing with a PR is not very different from merging the WIP and
>> fixing it later in a second commit, from a legal perspective.
>> 
>> IANAL, though *AFAIU*, a contributor who opens a PR is already
>> assigning the IP to the ASF. A committer will merge that code (after
>> due diligence that it doesn't contain inadmissible code), but the code
>> is already "donated to the ASF" at the moment of the PR.
> 
> +1.
> 
> Regards,
> Dave
> 
>> 
>> 
>> --
>> Matteo Merli
>> 
>> 
>> 
>> On Wed, Dec 15, 2021 at 3:14 PM Chris Herzog  
>> wrote:
>>> 
>>> It isn't even an issue related to OSS - every long lived project suffers
>>> from this same issue.  Whether it's a long lingering defect report or a fix
>>> that never got integrated in a timely manner, time wounds all heels.
>>> 
>>> Careful considered review is perfection which can't be hit; if it could be
>>> done, the situation would never have occurred in the first place.  Having a
>>> time-to-live is pragmatic, not perfect, but pragmatic.
>>> 
>>> As Jonathan mentioned, if ideas or changes linger too long, they often are
>>> superseded or replaced with more applicable alternatives or might not have
>>> been that important in the first place.  It's a shame because each
>>> languishing PR represents some amount of work from someone (sometimes a
>>> non-trivial amount) but there really isn't a more practical alternative IMO.
>>> 
>>> 
>>> 
>>> On Wed, Dec 15, 2021 at 5:05 PM Jonathan Ellis  wrote:
>>> 
>>>> One problem with the current state is that PRs and even higher level ideas
>>>> have a shelf life.  Declaring PR bankruptcy does in fact solve this
>>>> problem.
>>>>
>>>> The other problem is that from a new contributor's perspective it's
>>>> impossible to tell which issues are relevant and which are clutter that we
>>>> haven't gotten around to closing out.
>>>>
>>>> For this, declaring PR bankruptcy isn't as good as somehow having the
>>>> capacity to review and respond to everything, but it's still better than
>>>> the

Re: [VOTE] Apache Pulsar 2.9.1 candidate 2

2021-12-16 Thread Enrico Olivelli
I have pushed the Docker images to my personal Docker Hub account:

eolivelli/pulsar:2.9.1rc2
eolivelli/pulsar-all:2.9.1rc2

Enrico

Il Gio 16 Dic 2021, 15:57 Nicolò Boschi  ha scritto:

> +1 (non binding)
>
> Checks:
> - Checksum and signatures
> - Apache Rat check passes
> - OWASP check passes (I created this PR to fix a false positive
> https://github.com/apache/pulsar/pull/13364)
> - Compile from source w JDK11
> - Build docker image from source
> - Run Pulsar standalone and produce-consume from CLI
> - verified the presence of Log4j 2.16.0 jar in docker and tarball
>
> Il giorno gio 16 dic 2021 alle ore 14:25 Enrico Olivelli <
> eolive...@gmail.com> ha scritto:
>
> > This is the second release candidate for Apache Pulsar, version 2.9.1.
> >
> > The first release candidate was aborted without starting a VOTE because
> we
> > had to pick up high priority dependency upgrades.
> >
> > It fixes the following issues:
> >
> https://github.com/apache/pulsar/pulls?q=is%3Apr++label%3Arelease%2F2.9.1+
> >
> > *** Please download, test and vote on this release. This vote will stay
> > open
> > for at least 72 hours ***
> >
> > Note that we are voting upon the source (tag), binaries are provided for
> > convenience.
> >
> > Source and binary files:
> > https://dist.apache.org/repos/dist/dev/pulsar/pulsar-2.9.1-candidate-2/
> >
> > SHA-512 checksums:
> >
> >
> >
> 5ca7d2c6a8ac51413214796481095bbde50b5bda95d8b8f2467989931b29c75e679aabcfebd82e9e3e90dd1644c580214e0a05eca8652a500f042c84cb21becd
> >  apache-pulsar-2.9.1-bin.tar.gz
> >
> >
> 34a1e22fb0ff2e69e7e880a9432526990610113cf89d93c953dff82cc443510dcf724eaa0e1fade82464f9bf5443655bd23bcf2064e312c4a9da70bb4c9937ba
> >  apache-pulsar-2.9.1-src.tar.gz
> >
> > Maven staging repo:
> > https://repository.apache.org/content/repositories/orgapachepulsar-1110
> >
> > The tag to be voted upon:
> > v2.9.1-candidate-2 (f52ac045f41acbb6c31da21a3463df3cfbe8f1b4)
> > https://github.com/apache/pulsar/releases/tag/v2.9.1-candidate-2
> >
> > Link to the release notes:
> > https://github.com/apache/pulsar/pull/13357
> >
> > Pulsar's KEYS file containing PGP keys we use to sign the release:
> > https://dist.apache.org/repos/dist/dev/pulsar/KEYS
> >
> > Please download the source package, and follow the README to build
> > and run the Pulsar standalone service.
> >
> >
> > Enrico Olivelli
> >
>
>
> --
> Nicolò Boschi
>


Re: [VOTE] Apache Pulsar 2.9.1 candidate 2

2021-12-16 Thread Matteo Merli
+1

Checked:
 * Signatures
 * Bin distribution:
 - NOTICE, README, LICENSE
 - Start standalone service and producer/consumer test
 * Src distribution:
 - NOTICE,  README, LICENSE
 - Compile and unit tests
 - Start standalone service
 * Checked staging maven repository artifacts
 * Checked docker images


Matteo

--
Matteo Merli




On Thu, Dec 16, 2021 at 12:53 PM Enrico Olivelli  wrote:
>
> I have pushed the Docker images to my personal Docker Hub account:
>
> eolivelli/pulsar:2.9.1rc2
> eolivelli/pulsar-all:2.9.1rc2
>
> Enrico
>
> Il Gio 16 Dic 2021, 15:57 Nicolò Boschi  ha scritto:
>
> > +1 (non binding)
> >
> > Checks:
> > - Checksum and signatures
> > - Apache Rat check passes
> > - OWASP check passes (I created this PR to fix a false positive
> > https://github.com/apache/pulsar/pull/13364)
> > - Compile from source w JDK11
> > - Build docker image from source
> > - Run Pulsar standalone and produce-consume from CLI
> > - verified the presence of Log4j 2.16.0 jar in docker and tarball
> >
> > Il giorno gio 16 dic 2021 alle ore 14:25 Enrico Olivelli <
> > eolive...@gmail.com> ha scritto:
> >
> > > This is the second release candidate for Apache Pulsar, version 2.9.1.
> > >
> > > The first release candidate was aborted without starting a VOTE because
> > we
> > > had to pick up high priority dependency upgrades.
> > >
> > > It fixes the following issues:
> > >
> > https://github.com/apache/pulsar/pulls?q=is%3Apr++label%3Arelease%2F2.9.1+
> > >
> > > *** Please download, test and vote on this release. This vote will stay
> > > open
> > > for at least 72 hours ***
> > >
> > > Note that we are voting upon the source (tag), binaries are provided for
> > > convenience.
> > >
> > > Source and binary files:
> > > https://dist.apache.org/repos/dist/dev/pulsar/pulsar-2.9.1-candidate-2/
> > >
> > > SHA-512 checksums:
> > >
> > >
> > >
> > 5ca7d2c6a8ac51413214796481095bbde50b5bda95d8b8f2467989931b29c75e679aabcfebd82e9e3e90dd1644c580214e0a05eca8652a500f042c84cb21becd
> > >  apache-pulsar-2.9.1-bin.tar.gz
> > >
> > >
> > 34a1e22fb0ff2e69e7e880a9432526990610113cf89d93c953dff82cc443510dcf724eaa0e1fade82464f9bf5443655bd23bcf2064e312c4a9da70bb4c9937ba
> > >  apache-pulsar-2.9.1-src.tar.gz
> > >
> > > Maven staging repo:
> > > https://repository.apache.org/content/repositories/orgapachepulsar-1110
> > >
> > > The tag to be voted upon:
> > > v2.9.1-candidate-2 (f52ac045f41acbb6c31da21a3463df3cfbe8f1b4)
> > > https://github.com/apache/pulsar/releases/tag/v2.9.1-candidate-2
> > >
> > > Link to the release notes:
> > > https://github.com/apache/pulsar/pull/13357
> > >
> > > Pulsar's KEYS file containing PGP keys we use to sign the release:
> > > https://dist.apache.org/repos/dist/dev/pulsar/KEYS
> > >
> > > Please download the source package, and follow the README to build
> > > and run the Pulsar standalone service.
> > >
> > >
> > > Enrico Olivelli
> > >
> >
> >
> > --
> > Nicolò Boschi
> >


Re: [DISCUSS] Release Pulsar 2.7.4

2021-12-16 Thread guo jiwei
Hi,
We have fixed several issues, such as the ZookeeperCache NPE and the namespace
listing exception, and skipped some flaky tests (verified locally). The CI now
passes.
The skipped flaky tests are tracked here:
https://github.com/apache/pulsar/issues/13299
We will now start the vote for releasing 2.7.4.

Regards
Jiwei Guo (Tboy)


On Tue, Dec 14, 2021 at 11:58 AM PengHui Li  wrote:

> Thanks for the update, I will move it to 2.7.5
>
> Thanks,
> Penghui
>
> On Tue, Dec 14, 2021 at 9:47 AM Matteo Merli 
> wrote:
>
> > Let's take https://github.com/apache/pulsar/pull/12484 out of the
> > picture since it's failing the tests.
> >
> >
> > --
> > Matteo Merli
> > 
> >
> > On Sun, Dec 12, 2021 at 11:06 PM PengHui Li  wrote:
> > >
> > > Yes,
> > >
> > > https://github.com/apache/pulsar/pull/13215 has been cherry-picked, so we
> > > can close it.
> > > https://github.com/apache/pulsar/pull/12484 is blocked by the test.
> > >
> > > Penghui
> > >
> > > On Mon, Dec 13, 2021 at 2:35 PM Dave Fisher 
> > wrote:
> > >
> > > > I see 2 PRs still open at
> > > >
> >
> https://github.com/apache/pulsar/pulls?q=is%3Aopen+is%3Apr+label%3Arelease%2F2.7.4
> > > >
> > > > Sent from my iPhone
> > > >
> > > > > On Dec 12, 2021, at 8:22 PM, guo jiwei 
> wrote:
> > > > >
> > > > > I have pushed out some fixes in
> > > > https://github.com/apache/pulsar/pull/13243
> > > > > After the tests get passed, I will send out the RC-1 VOTE for 2.7.4
> > > > >
> > > > > Regards
> > > > > Jiwei Guo (Tboy)
> > > > >
> > > > >
> > > > >> On Sun, Dec 12, 2021 at 3:11 PM PengHui Li 
> > wrote:
> > > > >>
> > > > >> Just put an update here. We have done the PR cherry-picking
> > > > >>
> > > > >> https://github.com/apache/pulsar/commits/branch-2.7
> > > > >>
> > > > >> And most of the integration tests are fixed due to the docker
> image
> > > > issue
> > > > >> or the testcontainer issue, now some integration tests get passed,
> > but
> > > > some
> > > > >> are not.
> > > > >> And there are some failed tests, maybe a flaky test, we need to
> > ensure
> > > > it's
> > > > >> not a regression.
> > > > >>
> > > > >> We are continuing on the test part.
> > > > >>
> > > > >> Penghui
> > > > >>
> > > > >>
> > > > >>
> > > > >>> On Sat, Dec 11, 2021 at 5:36 PM PengHui Li 
> > wrote:
> > > > >>>
> > > > >>> Hi Michael,
> > > > >>>
> > > > >>> +1,
> > > > >>>
> > > > >>> Thanks for the great work.
> > > > >>> We will continue on the PR cherry-picking and the release process
> > to
> > > > make
> > > > >>> sure the urgent release can be done ASAP.
> > > > >>>
> > > > >>> Penghui
> > > > >>>
> > > > >>> On Sat, Dec 11, 2021 at 3:42 PM Michael Marshall <
> > mmarsh...@apache.org
> > > > >
> > > > >>> wrote:
> > > > >>>
> > > >  Given the log4j CVE, we should work to release 2.7.4.
> > > > 
> > > >  I started preparing the release today by cherry-picking merged
> PRs
> > > >  that have the `release/2.7.4` label but have not yet been
> > > >  cherry-picked to `branch-2.7` [0]. There are still 37 PRs that
> > have
> > > >  not been cherry picked. I think it will take too long to cherry
> > pick
> > > >  all of these commits, as many have conflicts, and we should
> > prioritize
> > > >  releasing 2.7.4. The main commits that we should get
> cherry-picked
> > > >  before creating the git tag are any labeled with
> > `component/security`.
> > > >  There are only a few remaining commits to cherry pick. Please
> let
> > me
> > > >  know if you think any other commits ought to be cherry-picked.
> > > > 
> > > >  The earliest I'll be able to build the release is Monday. If we
> > need
> > > >  to start sooner, perhaps someone else will be available to
> manage
> > this
> > > >  urgent release.
> > > > 
> > > >  Thanks,
> > > >  Michael
> > > > 
> > > >  [0] -
> > > > 
> > > > >>
> > > >
> >
> https://github.com/apache/pulsar/pulls?page=2&q=label%3Arelease%2F2.7.4+sort%3Acreated-asc+is%3Apr+-label%3Acherry-picked%2Fbranch-2.7
> > > >  [1] -
> > > > 
> > > > >>
> > > >
> >
> https://github.com/apache/pulsar/pulls?q=label%3Arelease%2F2.7.4+sort%3Acreated-asc+is%3Apr+-label%3Acherry-picked%2Fbranch-2.7+label%3Acomponent%2Fsecurity
> > > > 
> > > > 
> > > >  On Thu, Dec 9, 2021 at 4:03 PM Neng Lu 
> wrote:
> > > > >
> > > > > +1
> > > > >
> > > > > On 2021/12/09 15:29:55 Michael Marshall wrote:
> > > > >> Hello Pulsar Community,
> > > > >>
> > > > >> I'd like to propose that we release 2.7.4. We have merged
> > several
> > > > >> important fixes since we released 2.7.3 in August.
> > > > >>
> > > > >> I am happy to volunteer to be the release manager.
> > > > >>
> > > > >> Here [0] you can find the list of 36 commits cherry-picked to
> > > > >> branch-2.7 since 2.7.3 release. It looks like there are more
> PRs
> > > > >> labeled with `release/2.7.4` than commits cherry-picked, so I
> > will
> > > > >> need to work on cherry-picking those befo

Re: [DISCUSSION] PIP-117: Change Pulsar standalone defaults

2021-12-16 Thread Sijie Guo
+1

On Tue, Dec 14, 2021 at 9:18 AM Matteo Merli  wrote:

> https://github.com/apache/pulsar/issues/13302
>
> Copying here for quoting convenience
> 
>
>
>
>
> ## Motivation
>
> Pulsar standalone is the "Pulsar in a box" version of a Pulsar cluster, where
> all the components are started within the context of a single JVM process.
>
> Users are using the standalone as a way to get quickly started with Pulsar or
> in all the cases where it makes sense to have a single node deployment.
>
> Right now, the standalone is starting by default with many components, several
> of which are quite complex, since they are designed to be deployed in a
> distributed fashion.
>
> ## Goal
>
> Simplify the components of Pulsar standalone to achieve:
>
>  1. Reduce complexity
>  2. Reduce startup time
>  3. Reduce memory and CPU footprint of running standalone
>
> ## Proposed changes
>
> The proposal here is to change some of the default implementations that are
> used for the Pulsar standalone.
>
>  1. **Metadata Store implementation** -->
>   Change from ZooKeeper to RocksDB
>
>  2. **Pulsar functions package backend** -->
>   Change from using DistributedLog to using the local filesystem, storing
>   the jars directly in the data folder instead of uploading them into BK.
>
>  3. **Pulsar functions state store implementation** -->
>   Change the state store to a MetadataStore-based backend,
>   using the RocksDB implementation.
>
>  4. **Table Service** -->
>   Do not start BK table service by default
>
> ## Compatibility considerations
>
> In order to avoid compatibility issues where users have existing Pulsar
> standalone services that they want to upgrade without conflicts, we will
> follow the principle of keeping the old defaults where there is existing
> data on the disk.
>
> We will add a file, serving the purpose as a flag, in the `data/standalone`
> directory, for example `new-2.10-defaults`.
>
> If the file is present, or if the data directory is completely missing, we
> will adopt the new set of default configuration settings.
>
> If the file is not there, we will continue to use existing defaults and we
> will not break the upgrade operation.
>
>
>
>
>
> --
> Matteo Merli
> 
>
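
The compatibility rule quoted above (new defaults when the flag file is present or the data directory is missing, old defaults otherwise) can be sketched as a small check. This is a reading of the proposal, not the actual Pulsar code: the class `StandaloneDefaults` and the method name are hypothetical, and the flag file name is the example given in the PIP text.

```java
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical sketch of the upgrade check; not the actual Pulsar code.
final class StandaloneDefaults {
    // Flag file name taken from the example in the proposal.
    static final String FLAG_FILE = "new-2.10-defaults";

    // New defaults apply when the data directory is missing (fresh install)
    // or when the flag file is present; otherwise keep the old defaults.
    static boolean useNewDefaults(Path dataDir) {
        if (!Files.exists(dataDir)) {
            return true;
        }
        return Files.exists(dataDir.resolve(FLAG_FILE));
    }
}
```

Under this reading, an existing `data/standalone` directory without the flag file keeps the ZooKeeper-based defaults, so an in-place upgrade does not switch metadata stores.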


Re: [DISCUSSION] PIP-118: Do not restart brokers when ZooKeeper session expires

2021-12-16 Thread Sijie Guo
+1

On Tue, Dec 14, 2021 at 10:03 AM Matteo Merli  wrote:

> https://github.com/apache/pulsar/issues/13304
>
>
> Pasted below for quoting convenience.
>
> ---
>
>
> ## Motivation
>
> After all the work done for PIP-45, which was included in the 2.8 and 2.9
> releases, Pulsar gained the concept of re-acquirable resource locks and
> leader election.
>
> Another important change was to avoid doing any deferrable metadata operation
> when we know that we are not currently connected to the metadata service.
>
> Finally, that made it possible to stabilize, in 2.9, the configuration setting
> that allows brokers to continue operating in a safe mode when the session with
> ZooKeeper expires.
>
> The way it works is that, when we lose a ZooKeeper session, the data plane
> will continue to work undisturbed, relying on the BookKeeper fencing to avoid
> any inconsistencies.
>
> New topics are not able to get started, but existing topics will see no
> impact.
>
> The original intention for shutting down the brokers was to ensure that we
> would automatically go back to a consistent state, with respect to which
> resources are "owned" in ZooKeeper by a given broker.
>
> With the re-acquirable resource locks, that problem was solved and thoroughly
> tested to be robust.
>
> ## Proposed changes
>
> In 2.10 release, for the setting:
>
> ```properties
> # There are two policies to apply when a broker metadata session expires:
> # "shutdown" or "reconnect".
> # With "shutdown", the broker will be restarted.
> # With "reconnect", the broker will keep serving the topics, while
> # attempting to recreate a new session.
> zookeeperSessionExpiredPolicy=shutdown
> ```
>
> Change its default value to `reconnect`.
>
>
> --
> Matteo Merli
> 
>
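
As a rough illustration of what the default change means for configuration handling, the sketch below reads the setting and falls back to the proposed 2.10 default when the key is absent. The class `SessionPolicy`, its enum, and its methods are hypothetical names for this example only; the broker's real configuration classes are not shown in this thread.

```java
import java.util.Locale;
import java.util.Properties;

// Hypothetical sketch: the enum and method names are illustrative only.
final class SessionPolicy {
    enum Policy { SHUTDOWN, RECONNECT }

    // Reads zookeeperSessionExpiredPolicy, falling back to the proposed
    // 2.10 default ("reconnect") when the key is absent.
    static Policy fromConfig(Properties conf) {
        String value = conf.getProperty("zookeeperSessionExpiredPolicy", "reconnect");
        return Policy.valueOf(value.trim().toUpperCase(Locale.ROOT));
    }

    // Whether the broker keeps serving existing topics after session expiry.
    static boolean keepsServing(Policy policy) {
        return policy == Policy.RECONNECT;
    }
}
```

Operators who rely on the restart behavior would presumably set `zookeeperSessionExpiredPolicy=shutdown` explicitly once the default changes.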


Re: [DISCUSSION] PIP-119: Enable consistent hashing by default on KeyShared dispatcher

2021-12-16 Thread Sijie Guo
+1

On Tue, Dec 14, 2021 at 10:15 AM Matteo Merli  wrote:

> Pasted below for quoting convenience.
>
>
> 
>
> ## Motivation
>
> The consistent hashing implementation to uniformly assign keys to consumers
> in the context of a KeyShared subscription was introduced in
> https://github.com/apache/pulsar/pull/6791, which was released in Pulsar
> 2.6.0.
>
> While consistent hashing can use slightly more memory in certain cases, it is
> more suitable as a general default implementation, as it leads to a fairer
> distribution of keys across consumers and avoids corner cases that depend
> on the sequence of addition/removal of consumers.
>
> ## Proposed changes
>
> In 2.10 release, for the setting:
>
> ```properties
> # On KeyShared subscriptions, with default AUTO_SPLIT mode, use
> # splitting ranges or consistent hashing to reassign keys to new consumers
> subscriptionKeySharedUseConsistentHashing=false
> ```
>
> Change its default value to `true`.
>
> The `AUTO_SPLIT` mode will not be removed nor deprecated. Users will still
> be able to use the old implementation.
>
>
>
> --
> Matteo Merli
> 
>
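
For readers unfamiliar with the technique, here is a minimal consistent-hash ring in Java. It is a generic sketch, not Pulsar's dispatcher: the class `HashRing`, the number of points per consumer, and the use of CRC32 as the hash function are assumptions made to keep the example dependency-free.

```java
import java.nio.charset.StandardCharsets;
import java.util.Map;
import java.util.TreeMap;
import java.util.zip.CRC32;

// Illustrative hash ring; not Pulsar's dispatcher implementation.
final class HashRing {
    private final TreeMap<Long, String> ring = new TreeMap<>();
    private final int pointsPerConsumer;

    HashRing(int pointsPerConsumer) {
        this.pointsPerConsumer = pointsPerConsumer;
    }

    // CRC32 is used only to keep the sketch dependency-free.
    private static long hash(String s) {
        CRC32 crc = new CRC32();
        crc.update(s.getBytes(StandardCharsets.UTF_8));
        return crc.getValue();
    }

    // Places several points per consumer on the ring to even out the load.
    void addConsumer(String consumer) {
        for (int i = 0; i < pointsPerConsumer; i++) {
            ring.put(hash(consumer + "#" + i), consumer);
        }
    }

    // Removes only this consumer's points; other consumers' points stay put.
    void removeConsumer(String consumer) {
        ring.values().removeIf(consumer::equals);
    }

    // A key goes to the first point at or after its hash, wrapping around.
    String select(String key) {
        if (ring.isEmpty()) {
            return null;
        }
        Map.Entry<Long, String> e = ring.ceilingEntry(hash(key));
        return (e != null ? e : ring.firstEntry()).getValue();
    }
}
```

The property that matters for KeyShared is visible in `removeConsumer`: when a consumer leaves, only the keys it owned are reassigned, regardless of the order in which consumers joined or left.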


Re: [DISCUSSION] PIP-120: Enable client memory limit by default

2021-12-16 Thread Sijie Guo
+1

On Tue, Dec 14, 2021 at 11:20 AM Matteo Merli  wrote:

> https://github.com/apache/pulsar/issues/13306
>
>
> Pasted below for quoting convenience.
>
>
> 
>
> ## Motivation
>
> In Pulsar 2.8, we have introduced a setting to control the amount of memory
> used by a client instance.
>
> ```java
> interface ClientBuilder {
>     ClientBuilder memoryLimit(long memoryLimit, SizeUnit unit);
> }
> ```
>
> By default, in 2.8 and 2.9 this setting is set to 0, meaning no limit is
> being enforced.
>
> I think it's a good time for 2.10 to enable this setting by default and,
> correspondingly, to disable by default the producer queue size limit.
>
> This will greatly simplify the configuration that a producer application
> has to come up with when publishing to many topics/partitions or when
> messages are bigger than expected.
>
> ## Proposed changes
>
> In 2.10 release, for the `ClientBuilder`, change
>   * `memoryLimit`: 0 -> 64 MB
>
> For the `ProducerBuilder`, change
>   * `maxPendingMessages`: 1000 -> 0
>
> 64 MB is picked because it's a small enough memory size that will guarantee
> a very high producer throughput, irrespective of the individual message
> size.
>
>
>
> --
> Matteo Merli
> 
>
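
The difference between a count-based and a byte-based limit can be illustrated with a small budget tracker. With the current default of 1000 pending messages, a producer sending 1 MB messages could in principle hold about 1 GB in flight, whereas a byte budget caps memory regardless of message size. The class below is a sketch under that reading; `MemoryBudget` and its methods are hypothetical and are not the Pulsar client's internal implementation.

```java
import java.util.concurrent.atomic.AtomicLong;

// Illustrative byte-budget tracker; not the Pulsar client's internal class.
final class MemoryBudget {
    private final long limitBytes; // 0 means "no limit", as in the proposal
    private final AtomicLong usedBytes = new AtomicLong();

    MemoryBudget(long limitBytes) {
        this.limitBytes = limitBytes;
    }

    // Reserves space for a pending message; returns false when the
    // reservation would exceed the budget.
    boolean tryReserve(long size) {
        while (true) {
            long current = usedBytes.get();
            long next = current + size;
            if (limitBytes > 0 && next > limitBytes) {
                return false;
            }
            if (usedBytes.compareAndSet(current, next)) {
                return true;
            }
        }
    }

    // Returns the space once the message is no longer pending.
    void release(long size) {
        usedBytes.addAndGet(-size);
    }

    long used() {
        return usedBytes.get();
    }
}
```

A single budget shared by all producers of a client is what lets the per-producer queue size limit be dropped: the cap applies to total bytes, however many topics or partitions are involved.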


Re: [DISCUSSION] PIP-120: Enable client memory limit by default

2021-12-16 Thread 陳智弘
+1

Sijie Guo  於 2021年12月17日 週五 12:38 寫道:

> +1
>
> On Tue, Dec 14, 2021 at 11:20 AM Matteo Merli  wrote:
>
> > https://github.com/apache/pulsar/issues/13306
> >
> >
> > Pasted below for quoting convenience.
> >
> >
> > 
> >
> > ## Motivation
> >
> > In Pulsar 2.8, we have introduced a setting to control the amount of
> memory
> > used by a client instance.
> >
> > ```java
> > interface ClientBuilder {
> > ClientBuilder memoryLimit(long memoryLimit, SizeUnit unit);
> > }
> > ```
> >
> > By default, in 2.8 and 2.9 this setting is set to 0, meaning no limit is
> > being
> > enforced.
> >
> > I think it's a good time for 2.10 to enable this setting by default and,
> > correspondingly, to disable by default the producer queue size limit.
> >
> > This will simplify a lot the configuration that a producer application
> will
> > have to come up with, when publishing with many topic/partitions or
> > when messages
> > are bigger than expected.
> >
> > ## Proposed changes
> >
> > In 2.10 release, for the `ClientBuilder`, change
> >   * `memoryLimit`: 0 -> 64 MB
> >
> > For the `ProducerBuilder`, changes
> >   * `maxPendingMessages`: 1000 -> 0
> >
> > 64MB is picked because it's a small enough memory size that will
> guarantee
> > a very high producer throughput, irrespective of the individual messages
> > size.
> >
> >
> >
> > --
> > Matteo Merli
> > 
> >
>


Re: [DISCUSSION] PIP-118: Do not restart brokers when ZooKeeper session expires

2021-12-16 Thread Enrico Olivelli
+1

Enrico

Il Ven 17 Dic 2021, 05:36 Sijie Guo  ha scritto:

> +1
>
> On Tue, Dec 14, 2021 at 10:03 AM Matteo Merli  wrote:
>
> > https://github.com/apache/pulsar/issues/13304
> >
> >
> > Pasted below for quoting convenience.
> >
> > ---
> >
> >
> > ## Motivation
> >
> > After all the work done for PIP-45, which was already included in the 2.8
> > and 2.9 releases, we now have the concept of re-acquirable resource locks
> > and leader election.
> >
> > Another important change was to avoid doing any deferrable metadata
> > operation when we know that we are not currently connected to the
> > metadata service.
> >
> > Finally, that made it possible to stabilize in 2.9 the configuration
> > setting that allows brokers to continue operating in a safe mode when the
> > session with ZooKeeper expires.
> >
> > The way it works is that, when we lose a ZooKeeper session, the data plane
> > will continue to work undisturbed, relying on BookKeeper fencing to avoid
> > any inconsistencies.
> >
> > New topics are not able to get started, but existing topics will see no
> > impact.
> >
> > The original intention for shutting down the brokers was to ensure that we
> > would automatically go back to a consistent state, with respect to which
> > resources are "owned" in ZooKeeper by a given broker.
> >
> > With the re-acquirable resource locks, that problem was solved and
> > thoroughly tested to be robust.
> >
> > ## Proposed changes
> >
> > In the 2.10 release, for the setting:
> >
> > ```properties
> > # There are two policies to apply when a broker metadata session expires:
> > # "shutdown" or "reconnect".
> > # With "shutdown", the broker will be restarted.
> > # With "reconnect", the broker will keep serving the topics, while
> > # attempting to recreate a new session.
> > zookeeperSessionExpiredPolicy=shutdown
> > ```
> >
> > Change its default value to `reconnect`.
> >
> >
> > --
> > Matteo Merli
> > 
> >
>
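
The difference between the two policies quoted above can be sketched in a few lines of Java. This is only an illustration of the behavior the PIP text describes (existing topics keep being served, new topics cannot start until a session is back); `SessionExpiredPolicy` and `BrokerSketch` are hypothetical names invented for this sketch and are not Pulsar classes.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical model of the two policies; none of these names are Pulsar classes.
enum SessionExpiredPolicy { SHUTDOWN, RECONNECT }

class BrokerSketch {
    private final SessionExpiredPolicy policy;
    private boolean running = true;
    private boolean sessionActive = true;
    final List<String> events = new ArrayList<>();

    BrokerSketch(SessionExpiredPolicy policy) {
        this.policy = policy;
    }

    /** Called when the metadata (ZooKeeper) session expires. */
    void onSessionExpired() {
        sessionActive = false;
        if (policy == SessionExpiredPolicy.SHUTDOWN) {
            running = false;             // the broker stops and is restarted
            events.add("shutdown");
        } else {
            events.add("reconnecting");  // keep serving topics that are already loaded
        }
    }

    /** Called once a new session exists; resource locks would be re-acquired here. */
    void onSessionReestablished() {
        if (running) {
            sessionActive = true;
            events.add("locks-reacquired");
        }
    }

    boolean canServeExistingTopics() {
        return running;
    }

    boolean canLoadNewTopics() {
        return running && sessionActive;
    }
}
```

With `RECONNECT`, `canServeExistingTopics()` stays true across the expiry while `canLoadNewTopics()` is false until `onSessionReestablished()` runs; with `SHUTDOWN`, both become false.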


[PR] Pulsar non root docker image

2021-12-16 Thread Michael Marshall
Hi Pulsar Community,

I opened a PR to make our pulsar and pulsar-all docker images non-root
and OpenShift-compliant [0]. As some may remember, we had issues with
these changes before due to a lack of testing. I plan to test thoroughly
before we merge this PR, and it'd be great to have others test too. I
published a build of my PR [1]. I also have an issue [2] tracking this
work.

Please take a look. I hope to make our 2.10 release a non-root release!

Thanks,
Michael

[0] https://github.com/apache/pulsar/pull/13376
[1] michaelmarshall/pulsar:2.10.0-SNAPSHOT
[2] https://github.com/apache/pulsar/issues/11269


Re: [DISCUSSION] PIP-120: Enable client memory limit by default

2021-12-16 Thread mattison chao
+1

On Fri, 17 Dec 2021 at 13:56, 陳智弘  wrote:

> +1
>
> On Fri, Dec 17, 2021 at 12:38 PM Sijie Guo  wrote:
>
> > +1
> >
> > On Tue, Dec 14, 2021 at 11:20 AM Matteo Merli  wrote:
> >
> > > https://github.com/apache/pulsar/issues/13306
> > >
> > >
> > > Pasted below for quoting convenience.
> > >
> > >
> > > 
> > >
> > > ## Motivation
> > >
> > > In Pulsar 2.8, we introduced a setting to control the amount of memory
> > > used by a client instance.
> > >
> > > ```java
> > > interface ClientBuilder {
> > >     ClientBuilder memoryLimit(long memoryLimit, SizeUnit unit);
> > > }
> > > ```
> > >
> > > By default, in 2.8 and 2.9 this setting is set to 0, meaning no limit is
> > > being enforced.
> > >
> > > I think it's a good time for 2.10 to enable this setting by default and,
> > > correspondingly, to disable by default the producer queue size limit.
> > >
> > > This will greatly simplify the configuration that a producer application
> > > has to come up with when publishing to many topics/partitions or when
> > > messages are bigger than expected.
> > >
> > > ## Proposed changes
> > >
> > > In the 2.10 release, for the `ClientBuilder`, change
> > >   * `memoryLimit`: 0 -> 64 MB
> > >
> > > For the `ProducerBuilder`, change
> > >   * `maxPendingMessages`: 1000 -> 0
> > >
> > > 64 MB is picked because it is a small enough memory size that still
> > > guarantees very high producer throughput, irrespective of the individual
> > > message size.
> > >
> > >
> > >
> > > --
> > > Matteo Merli
> > > 
> > >
> >
>
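
To make the trade-off in the quoted proposal concrete: a message-count cap of 1000 bounds a producer's queue at roughly 1 MB for 1 KB messages but roughly 1 GB for 1 MB messages, whereas a byte budget bounds memory regardless of message size. Below is a minimal sketch of such a client-wide byte budget. It is not Pulsar's implementation; `ClientMemoryBudget` and its methods are hypothetical names, and the only behavior taken from the thread is that a limit of 0 means "no limit".

```java
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical illustration only; this is not Pulsar's implementation.
class ClientMemoryBudget {
    private final long limitBytes;   // 0 means "no limit", as in the 2.8/2.9 default
    private final AtomicLong usedBytes = new AtomicLong();

    ClientMemoryBudget(long limitBytes) {
        this.limitBytes = limitBytes;
    }

    /** Try to reserve memory for one pending message; false means the caller must wait or fail. */
    boolean tryReserve(long messageBytes) {
        while (true) {
            long current = usedBytes.get();
            long next = current + messageBytes;
            if (limitBytes > 0 && next > limitBytes) {
                return false;
            }
            // Retry if another thread changed the counter in the meantime.
            if (usedBytes.compareAndSet(current, next)) {
                return true;
            }
        }
    }

    /** Give the memory back once the message is no longer pending. */
    void release(long messageBytes) {
        usedBytes.addAndGet(-messageBytes);
    }

    long usedBytes() {
        return usedBytes.get();
    }
}
```

With a 64 MB budget shared by all producers of a client, 1 MB messages are admitted until 64 are pending, and 1 KB messages until 65,536 are pending; in both cases the memory held stays at or below 64 MB.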


Re: [VOTE] Apache Pulsar 2.9.1 candidate 2

2021-12-16 Thread PengHui Li
Checked:

- Build from the src
- Check signatures
- Follow the validation process

But when I tried to verify Pulsar SQL, I got the following exception:

```
2021-12-17T14:58:18.958+0800 ERROR remote-task-callback-3 io.prestosql.execution.StageStateMachine Stage 20211217_065818_1_cahiv.1 failed
com.google.common.util.concurrent.UncheckedExecutionException: java.nio.BufferUnderflowException
    at com.google.common.cache.LocalCache$Segment.get(LocalCache.java:2051)
    at com.google.common.cache.LocalCache.get(LocalCache.java:3951)
    at com.google.common.cache.LocalCache.getOrLoad(LocalCache.java:3974)
    at com.google.common.cache.LocalCache$LocalLoadingCache.get(LocalCache.java:4935)
    at org.apache.pulsar.sql.presto.PulsarSqlSchemaInfoProvider.getSchemaByVersion(PulsarSqlSchemaInfoProvider.java:76)
    at org.apache.pulsar.sql.presto.PulsarRecordCursor.advanceNextPosition(PulsarRecordCursor.java:485)
    at io.prestosql.spi.connector.RecordPageSource.getNextPage(RecordPageSource.java:90)
    at io.prestosql.operator.TableScanOperator.getOutput(TableScanOperator.java:302)
    at io.prestosql.operator.Driver.processInternal(Driver.java:379)
    at io.prestosql.operator.Driver.lambda$processFor$8(Driver.java:283)
    at io.prestosql.operator.Driver.tryWithLock(Driver.java:675)
    at io.prestosql.operator.Driver.processFor(Driver.java:276)
    at io.prestosql.execution.SqlTaskExecution$DriverSplitRunner.processFor(SqlTaskExecution.java:1075)
    at io.prestosql.execution.executor.PrioritizedSplitRunner.process(PrioritizedSplitRunner.java:163)
    at io.prestosql.execution.executor.TaskExecutor$TaskRunner.run(TaskExecutor.java:484)
    at io.prestosql.$gen.Presto_332__testversion20211217_065757_2.run(Unknown Source)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
    at java.lang.Thread.run(Thread.java:748)
Caused by: java.nio.BufferUnderflowException
    at java.nio.Buffer.nextGetIndex(Buffer.java:532)
    at java.nio.HeapByteBuffer.getLong(HeapByteBuffer.java:417)
    at org.apache.pulsar.sql.presto.PulsarSqlSchemaInfoProvider.loadSchema(PulsarSqlSchemaInfoProvider.java:106)
    at org.apache.pulsar.sql.presto.PulsarSqlSchemaInfoProvider.access$000(PulsarSqlSchemaInfoProvider.java:49)
    at org.apache.pulsar.sql.presto.PulsarSqlSchemaInfoProvider$1.load(PulsarSqlSchemaInfoProvider.java:61)
    at org.apache.pulsar.sql.presto.PulsarSqlSchemaInfoProvider$1.load(PulsarSqlSchemaInfoProvider.java:58)
    at com.google.common.cache.LocalCache$LoadingValueReference.loadFuture(LocalCache.java:3529)
    at com.google.common.cache.LocalCache$Segment.loadSync(LocalCache.java:2278)
    at com.google.common.cache.LocalCache$Segment.lockedGetOrLoad(LocalCache.java:2155)
    at com.google.common.cache.LocalCache$Segment.get(LocalCache.java:2045)
    ... 18 more
```

An existing issue tracks this: https://github.com/apache/pulsar/issues/12284.
My test steps are very simple:

1. Start the Presto worker: `bin/pulsar sql-worker run`
2. Produce some messages: `bin/pulsar-client produce -m "hello" -n 10 test_wordcount_src`
3. Query the data from the topic: `select * from pulsar."public/default"."test_wordcount_src";`

I am not able to query the produced data, and I get errors in the Pulsar SQL worker.
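
For readers of the trace: the `Caused by` section shows `ByteBuffer.getLong` underflowing inside `loadSchema`, which happens whenever fewer than 8 bytes remain in the buffer. The standalone sketch below reproduces only that JDK behavior. It is not the Pulsar SQL code; `SchemaVersionSketch` and its methods are hypothetical, and the idea that the schema version bytes were empty or too short in this test is an assumption, not something the trace proves.

```java
import java.nio.BufferUnderflowException;
import java.nio.ByteBuffer;

// Standalone illustration; these helpers are hypothetical, not Pulsar SQL code.
class SchemaVersionSketch {
    /** Unchecked read: throws BufferUnderflowException when fewer than 8 bytes are present. */
    static long decodeVersionUnchecked(byte[] versionBytes) {
        return ByteBuffer.wrap(versionBytes).getLong();
    }

    /** Guarded read: returns -1 when the bytes cannot hold a long. */
    static long decodeVersion(byte[] versionBytes) {
        if (versionBytes == null || versionBytes.length < Long.BYTES) {
            return -1L;
        }
        return ByteBuffer.wrap(versionBytes).getLong();
    }

    public static void main(String[] args) {
        try {
            decodeVersionUnchecked(new byte[0]);
        } catch (BufferUnderflowException e) {
            System.out.println("unchecked read failed: " + e);
        }
        System.out.println("guarded read: " + decodeVersion(new byte[0]));
    }
}
```

A length check of this kind before decoding is one possible way to avoid the exception; whether it is the right fix for the connector depends on why the bytes are short in the first place.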

Penghui

On Fri, Dec 17, 2021 at 5:33 AM Matteo Merli  wrote:

> +1
>
> Checked:
>  * Signatures
>  * Bin distribution:
>  - NOTICE, README, LICENSE
>  - Start standalone service and producer/consumer test
>  * Src distribution:
>  - NOTICE,  README, LICENSE
>  - Compile and unit tests
>  - Start standalone service
>  * Checked staging maven repository artifacts
>  * Checked docker images
>
>
> Matteo
>
> --
> Matteo Merli
> 
>
>
>
> On Thu, Dec 16, 2021 at 12:53 PM Enrico Olivelli 
> wrote:
> >
> > I have pushed the docker images to my personal Docker Hub account
> >
> > eolivelli/pulsar:2.9.1rc2
> > eolivelli/pulsar-all:2.9.1rc2
> >
> > Enrico
> >
> > On Thu, Dec 16, 2021 at 3:57 PM Nicolò Boschi  wrote:
> >
> > > +1 (non binding)
> > >
> > > Checks:
> > > - Checksum and signatures
> > > - Apache Rat check passes
> > > - OWASP check passes (I created this PR to fix a false positive:
> > > https://github.com/apache/pulsar/pull/13364)
> > > - Compile from source with JDK 11
> > > - Build docker image from source
> > > - Run Pulsar standalone and produce-consume from CLI
> > > - Verified the presence of the Log4j 2.16.0 jar in docker and tarball
> > >
> > > On Thu, Dec 16, 2021 at 2:25 PM Enrico Olivelli <eolive...@gmail.com> wrote:
> > >
> > > > This is the second release candidate for Apache Pulsar, version 2.9.1.
> > > >
> > > > The first release candidate was aborted without starting a VOTE because
> > > > we had to pick up high-priority dependency upgrades.
> > > >
> > > > It fixes the following issues:
> > > > https://github.com/apache/pulsar/pulls?q=is%3Apr++label%3Arelease%2F2.9.1+
> > > >
> > > > *** Please download, test and vote on this release

Re: Dropping Presto SQL in 2.9.0 - status ?

2021-12-16 Thread Lari Hotari
Hi Marvin,

Great work on the Trino PR! It's been a lot of work to get it to match the
Trino code conventions.

I hope we could drop Presto & Pulsar SQL from the apache/pulsar code
repository as planned in PIP-62[1], "PIP 62: Move connectors, adapters and
Pulsar Presto to separate repositories", which was created in April
2020. Let's work together to complete this effort.

Is there anything that others could help with to complete the Trino PR
https://github.com/trinodb/trino/pull/8020 ?

BR,

Lari

[1] https://github.com/apache/pulsar/wiki/PIP-62%3A-Move-connectors%2C-adapters-and-Pulsar-Presto-to-separate-repositories


On Wed, Nov 17, 2021 at 3:40 PM Zhengxin Cai  wrote:

> Hi there,
> I think the PR is still open (https://github.com/trinodb/trino/pull/8020);
> I will try to push it forward.
> But even after the PR is merged, I think we might still want to keep a
> copy of the connector in the Pulsar repo and push changes to the Trino
> repo periodically, as this will allow much faster bug fixes and feature
> iteration.
> Best,
> Marvin
>
> On Wed, Nov 17, 2021 at 2:19 PM Lari Hotari  wrote:
>
> > Dear Pulsar community members,
> >
> > PIP-62 [1], "PIP 62: Move connectors, adapters and Pulsar Presto to
> > separate repositories", was created in April 2020. The repositories for
> > pulsar-connectors, pulsar-adapters and pulsar-sql were created about a
> > year ago [2].
> >
> > What is the current roadmap for completing PIP-62 and moving
> > pulsar-connectors and pulsar-sql out of the apache/pulsar repository?
> >
> > BR,
> >
> > Lari
> >
> > [1] https://github.com/apache/pulsar/wiki/PIP-62%3A-Move-connectors%2C-adapters-and-Pulsar-Presto-to-separate-repositories
> > [2] https://lists.apache.org/thread.html/r9e6ec742e2896da1f0ce7d4adc7cb84fc6db6dbf797732ccdd50fb86%40%3Cdev.pulsar.apache.org%3E
> >
> > Other email threads:
> > * [Discuss] Don't include presto/trino in the normal Pulsar distribution -
> >   https://lists.apache.org/thread/jn96tct54mn0tvdot62vdslrvs38fm6d
> > * Updates on Presto connector for PIP-62 -
> >   https://lists.apache.org/thread/f9n6sc2mrboq5sxhjbr7gvdl8vqp9fpk
> >
> > On Tue, Nov 2, 2021 at 3:59 PM Nicolò Boschi  wrote:
> >
> > > Resurrecting this thread.
> > >
> > > 2.9 is almost released and it hasn't been merged yet.
> > >
> > > Extending the discussion to other connectors, it looks like there has
> > > been no progress on PIP-62.
> > > My concern is that many of the Pulsar IO connector dependencies we are
> > > running are obsolete, with several security reports.
> > >
> > > I see there are interesting comments in the issue
> > > (https://github.com/apache/pulsar/issues/10219), and Sijie exported the
> > > pulsar-io dir to https://github.com/apache/pulsar-connectors, but it's
> > > outdated.
> > >
> > > From my point of view, we have to:
> > > - re-import all the connectors' source code at its newest version
> > >   (including integration tests)
> > > - add periodic CI jobs for connectors to run against master, 2.9-latest,
> > >   2.8-latest and 2.7-latest to detect breaking changes
> > > - define a release cycle/management for connectors (we should improve the
> > >   PIP doc). IMO it's not clear whether each connector will have its own
> > >   release versions and how we'll handle that (git tags, artifact
> > >   deployment, ...)
> > > - update the Pulsar release script in order to get the connector
> > >   artifacts (retrieving the .nar or building it from source?)
> > > - update docs
> > > - remove the pulsar-io dir from the Pulsar repo
> > >
> > > This is the perfect time to schedule this work for 2.10.
> > >
> > > What is missing? What is the current situation? Is there a roadblock I
> > > haven't seen?
> > > I think it's better to have a separate discussion for Presto, since it
> > > will end up differently.
> > >
> > >
> > > On Sat, Aug 14, 2021 at 3:21 PM Enrico Olivelli <eolive...@gmail.com> wrote:
> > >
> > > > Sijie
> > > >
> > > > On Fri, Aug 13, 2021 at 10:00 PM Sijie Guo  wrote:
> > > >
> > > > > You can follow the progress at
> > > > > https://github.com/trinodb/trino/pull/8020.
> > > > >
> > > >
> > > > Thanks for the pointer
> > > >
> > > > >
> > > > > The original code doesn't conform to TrinoDB's standard. Marvin is
> > > > > actively following up on that.
> > > > >
> > > > > Our goal is still to get this completed as part of the 2.9 release.
> > > > >
> > > >
> > > > Wonderful
> > > >
> > > > Thanks
> > > > Enrico
> > > >
> > > > >
> > > > > - Sijie
> > > > >
> > > > > On Fri, Aug 13, 2021 at 2:04 AM Enrico Olivelli <eolive...@gmail.com> wrote:
> > > > > >
> > > > > > Hello,
> > > > > > How is the Presto work going?
> > > > > > IIRC the plan was to remove it from the Pulsar code base and let it
> > > > > > be hosted at Trino.
> > > > > >
> > > > > > If this is not going to happen within the 2.9.0 release timeline
> > > > > > (September?) I would prefer to upgrade to "Trino".
> > > > > > Probably we will have a downside problem that recent versions of
> > > > > > Presto/Trino