Re: [DISCUSS] Improvements on the release process

2022-09-06 Thread Haiting Jiang
Great work, Yunze!

There is a lot of work to do before the current release process starts, so maybe
we should also include these steps:

1. Start a discussion on the mailing list about the release. We can provide a
template that gives clearer information about open PRs and PRs to be
cherry-picked to the release branch.

2. Handle all the open PRs and PRs to be cherry-picked.

2.1 Revisit each PR to decide whether it should be ported to the release
branch. The release label may not be correct.

2.2 Cherry-pick merged PRs to the release branch. If there are too many
conflicts, we can ask the PR author to open a new PR against the release
branch. If a PR can be cherry-picked directly, we should also check the
CI status of the branch after we push the commits.

2.3 It would be better to have a clear time window to wait before we
postpone a PR to the next release.


Thanks,
Haiting

On Tue, Sep 6, 2022 at 2:24 PM Yunze Xu 
wrote:

> Hi Yu,
>
> Thanks for your reminder!
>
> Thanks,
> Yunze
>
>
>
>
> > On Sep 6, 2022, at 11:56, Yu  wrote:
> >
> > Hi Yunze,
> >
> > Thanks for updating the workflow!
> >
> > ~~
> >
> > Hi all,
> >
> > For the release process, we've updated the doc-related workflow [1] since
> > we changed the doc maintenance strategy [2].
> >
> > TL;DR
> > Breaking change:
> > For release managers: from 2.8.x, you do not need to generate
> > an independent doc set for each release.
> >
> > ~~
> >
> > [1] Workflow changes:
> >
> > - For doc contributors:
> >
> https://docs.google.com/document/d/1-1uJyd1k9_h56xiiVRVOnrLcCnTmg9n7SrHhNVNEEi4/edit#bookmark=id.q5m2r5pimdi6
> >
> > - For release managers:
> >
> https://github.com/apache/pulsar/pull/17130/files#diff-f3115b8be648c3dc440594799619e7ce4408a34efab13b9d57a902030b62562c
> >
> > [2] https://github.com/apache/pulsar/issues/16637
> >
> > ~~
> >
> > Feel free to comment, thank you!
> >
> > Yu
> >
> > On Tue, Sep 6, 2022 at 10:57 AM Yunze Xu 
> > wrote:
> >
> >> Hi all,
> >>
> >> I've been working on the 2.8.4 release recently. When I followed the
> >> release process, I found that many steps are outdated, so I frequently
> >> had to turn to previous release managers for help. Since the release
> >> process is now in the codebase [1], I opened a PR with some
> >> improvements to it. [2]
> >>
> >> PTAL especially if you have been a release manager before.
> >>
> >> [1] https://lists.apache.org/thread/mv1to079cznkxdldrpoq5518l2ozl5kr
> >> [2] https://github.com/apache/pulsar/pull/17470
> >>
> >> Thanks,
> >> Yunze
> >>
> >>
> >>
> >>
> >>
>
>


[DISCUSS] Pulsar Node.js Client Release 1.7.0

2022-09-06 Thread Yuto Furuta
Hello everyone,

I'd like to propose releasing Pulsar Node.js Client 1.7.0.

Currently, we have 14 commits [0], including bug fixes, security fixes, and
performance improvements.
There are also open PRs [1].

If you have any important fixes or any questions, please reply to this email
and we will evaluate whether to include them in 1.7.0.

[0]
https://github.com/apache/pulsar-client-node/pulls?q=is%3Amerged+is%3Apr+label%3Arelease%2Fv1.7.0

[1]
https://github.com/apache/pulsar-client-node/pulls?q=is%3Aopen


Best Regards,

Yuto Furuta
Yahoo Japan Corp.
E-mail: yfur...@yahoo-corp.jp


[GitHub] [pulsar-test-infra] nodece commented on pull request #69: Refactor docbot and add tests

2022-09-06 Thread GitBox


nodece commented on PR #69:
URL: https://github.com/apache/pulsar-test-infra/pull/69#issuecomment-1237888481

   I can split this PR, since it is complex for reviewers.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: dev-unsubscr...@pulsar.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [DISCUSS] Call to improve Pulsar contributor's experience

2022-09-06 Thread Lari Hotari
Hi Yu, 

> For these special doc PRs, can we set them to run only build (compile)
> tests and skip other code tests?

If code is updated, the build & tests shouldn't be skipped in the current 
solution.

If we'd like to make the build faster and skip unnecessary tests, we would need 
a more advanced build solution such as Gradle with build caching for sharing 
results between CI builds. [1]

-Lari

[1] Gradle Build cache use cases - Share results between CI builds: 
https://docs.gradle.org/current/userguide/build_cache_use_cases.html#share_results_between_ci_builds


On 2022/09/02 09:02:54 Yu wrote:
> Thank you Lari!
> 
> Besides updating .md docs, we also need to update various API docs [1]. These
> docs are generated automatically from code (annotations in code files).
> 
> For these special doc PRs, can we set them to run only build (compile)
> tests and skip other code tests?
> 
> ~~
> 
> [1]
> 
> Admin API docs:
> - pulsar-admin: update
> https://github.com/apache/pulsar/tree/master/pulsar-client-tools/src/main/java/org/apache/pulsar/admin/cli
> - REST API: update
> https://github.com/apache/pulsar/tree/master/pulsar-broker/src/main/java/org/apache/pulsar
> - Java admin API: update
> https://github.com/apache/pulsar/tree/master/pulsar-client-admin-api/src/main/java/org/apache/pulsar/client/admin
> 
> 
> Client API docs:
> - Java: update
> https://github.com/apache/pulsar/tree/master/pulsar-client-api/src/main/java/org/apache/pulsar/client/api
> - CPP: update
> https://github.com/apache/pulsar/tree/master/pulsar-client-cpp/include/pulsar
> - Python: have no idea
> 
> ~~
> 
> Thank you!
> 
> Yu
> 
> 
> On Fri, Sep 2, 2022 at 1:17 AM Lari Hotari  wrote:
> 
> > On 2022/09/01 08:36:11 Yu wrote:
> > > # 1
> > > For pure doc PRs (which only update .md files), do they run the same
> > > tests as code PRs?
> > > If so, can we set them to run only doc-related tests and skip code tests
> > > (since those fail easily)?
> > > In this way, docs can be iterated on faster.
> >
> > The solution is already in place where the CI pipeline for docs is
> > expedited.
> > Some technical details about the solution: All build steps in the GitHub
> > Actions workflow build jobs are skipped for PRs that include changes only
> > to docs. The reason why the workflows and build jobs aren't completely
> > skipped is that we use the "required checks" feature and it is necessary to
> > run all required checks also for PRs with only doc changes.
> >
> > >
> > > 
> > >
> > > # 2
> > > Does it make sense to add instructions for tests to the Pulsar
> > Contribution
> > > Guide?
> > >
> > > For example,
> > >
> > > * For users:
> > > - How to resolve test issues (common test failure reasons and solutions)
> > > - Whom to ask for help if users are blocked and cannot resolve problems
> > > themselves
> > > - How to report test bugs
> > >
> > > * For developers:
> > > - How do tests work? (mechanism, Apache rules, etc)
> > > - How can I add/update tests? (quotas [1], limitations, notes, etc)
> >
> > Good suggestions.
> >
> > In general, I hope we find better ways to listen to the voice of our
> > contributors. What is their contribution experience? How did they feel
> > about it?
> >
> > Perhaps we could decide to conduct surveys? GitHub discussions has support
> > for polls [1] so that is one option as a technical solution for asking for
> > feedback in a way where there would be a low barrier to respond.
> >
> > -Lari
> >
> > [1] https://github.blog/changelog/2022-04-12-discussions-polls/
> >
> 


Zookeeper exception handler in Pulsar

2022-09-06 Thread Yong Zhang
Hi all,

I saw that in the Pulsar metadata handler, we retry the operation when
ZooKeeper throws a connection loss exception. But the operation may still fail
after the retry.

For example, we update the ledgers map in memory after successfully
updating the LedgerInfo in ZooKeeper. If the ZooKeeper update operation
executes successfully on the server but the client sees a connection loss,
the client retries the operation, and the callback may then receive
a BadVersion exception. At that point, the in-memory ledgers list
differs from the one stored on the ZooKeeper server, which may cause other
issues on the broker.

We need to do some work on the metadata store and the managed ledger to keep
them consistent. But that would mean changing most of the metadata store
callbacks to handle this case.
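
To make the scenario concrete, here is a minimal, self-contained sketch in plain Java. It does not use the ZooKeeper or Pulsar metadata store APIs: `VersionedStore`, `BadVersionException`, and `putWithLostResponse` are hypothetical names for illustration only. It simulates a versioned write that lands on the server while the client sees a connection loss, and shows one possible way to handle the BadVersion on retry: re-read and check whether the stored value is the one we intended to write. Note the assumption that no other writer can produce the same bytes at the same version; a real fix would need a stronger check.

```java
import java.util.Arrays;

// Hypothetical in-memory stand-in for a versioned metadata store.
final class VersionedStore {
    // Unchecked here only to keep the sketch short.
    static final class BadVersionException extends RuntimeException {}

    private byte[] value = new byte[0];
    private long version = 0;

    // Compare-and-set: succeeds only if expectedVersion matches the stored version.
    synchronized long put(byte[] newValue, long expectedVersion) {
        if (expectedVersion != version) {
            throw new BadVersionException();
        }
        value = newValue.clone();
        return ++version;
    }

    synchronized byte[] get() {
        return value.clone();
    }

    synchronized long version() {
        return version;
    }
}

final class RetryAfterConnectionLoss {
    // Simulates a put whose first attempt is applied on the server but whose
    // response never reaches the client, followed by a retry.
    static long putWithLostResponse(VersionedStore store, byte[] intended, long expectedVersion) {
        store.put(intended, expectedVersion); // applied on the server, response "lost"
        try {
            return store.put(intended, expectedVersion); // retry with the stale version
        } catch (VersionedStore.BadVersionException e) {
            // The retry failed. If the version advanced by exactly one and the stored
            // value is what we intended, assume our first attempt succeeded.
            if (store.version() == expectedVersion + 1
                    && Arrays.equals(store.get(), intended)) {
                return store.version();
            }
            throw e; // a genuine conflict with another writer
        }
    }
}
```

With this kind of check, the caller can still update its in-memory state after the retry instead of treating the BadVersion as a failure.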

I'd like to hear your ideas. WDYT?

Regards,
Yong


Re: [DISCUSS] User-friendly acknowledgeCumulative API on a partitioned topic or multi-topics

2022-09-06 Thread Shivji Kumar Jha
++ Tarun

Hi Yunze,

We would love to have this.

```java
// the key is the partitioned topic name like my-topic-partition-0
void acknowledgeCumulative(Map<String, MessageId> topicToMessageId);
```
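
For illustration, the bookkeeping that such a map-based API asks of callers can be sketched without the Pulsar client at all. In this minimal sketch, `LatestPerPartition`, `onReceive`, and `snapshotForCumulativeAck` are hypothetical names, and a plain `long` sequence number stands in for `MessageId`. It only shows the idea of remembering the latest received message per partition.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper: tracks the latest received message per partitioned topic name.
final class LatestPerPartition {
    private final Map<String, Long> latest = new LinkedHashMap<>();

    // Record a received message; keep the highest sequence seen for each partition.
    void onReceive(String partitionTopic, long messageSeq) {
        latest.merge(partitionTopic, messageSeq, Long::max);
    }

    // Copy of the current state: the map a caller would pass to a map-based
    // cumulative acknowledgement.
    Map<String, Long> snapshotForCumulativeAck() {
        return new LinkedHashMap<>(latest);
    }
}
```

If a consumer receives P0-M0, P1-M0, P0-M1, P1-M1, P0-M2, P1-M2 and records each one, the snapshot holds an entry for both partitions, so neither partition is left unacknowledged.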

If you are busy with other things, do you mind Tarun taking this up ? Happy
to have you as a reviewer.

Regards,
Shivji Kumar Jha
http://www.shivjijha.com/
+91 8884075512


On Sun, 4 Sept 2022 at 21:25, Yunze Xu  wrote:

> I have been busy with other things recently, so there is no further update.
> But I found there are already two methods to acknowledge multiple messages
> in the Java client.
>
> ```java
> void acknowledge(Messages<?> messages) throws PulsarClientException;
>
> void acknowledge(List<MessageId> messageIdList) throws
> PulsarClientException;
> ```
>
> And here is the issue to track the catch up:
>
> https://github.com/apache/pulsar/issues/17428
>
> Yunze
>
>
>
>
> > On Sep 4, 2022, at 22:37, Asaf Mesika  wrote:
> >
> > What eventually happened with this idea?
> >
> > On Fri, Jul 29, 2022 at 8:02 AM PengHui Li 
> wrote:
> >
> >> +1
> >>
> >> Penghui
> >> On Jul 28, 2022, 20:14 +0800, lordcheng10 <1572139...@qq.com.invalid>,
> >> wrote:
> >>> Nice feature!
> >>>
> >>>
> >>>
> >>>
> >>> -- Original --
> >>> From: "Yunze Xu"
> >>> Date: Friday, July 15, 2022, 6:04 PM
> >>> To: "dev"
> >>> Subject: [DISCUSS] User-friendly acknowledgeCumulative API on a
> >> partitioned topic or multi-topics
> >>>
> >>>
> >>>
> >>> Hi all,
> >>>
> >>> A long time ago I opened a PR to support cumulative acknowledgement
> >>> for the C++ client, but whether a partitioned consumer should
> >>> acknowledge a message ID cumulatively is controversial.
> >>>
> >>> See https://github.com/apache/pulsar/pull/6796 for discussion.
> >>>
> >>> Currently, the Java client acknowledges the specific partition of the
> >>> message ID, while the C++ client just fails when calling
> >>> `acknowledgeCumulative` on a partitioned topic. However, even if the
> >>> Java client doesn't fail, it's not user friendly.
> >>>
> >>> Assuming users call `acknowledgeCumulative` periodically, there is a
> >>> chance that some messages of a specific partition are never
> >>> passed to the method.
> >>>
> >>> For example, a consumer received:
> >>>
> >>> P0-M0, P1-M0, P0-M1, P1-M1, P0-M2, P1-M2...
> >>>
> >>> And the user acknowledged every two messages, i.e.
> >>>
> >>> P0-M0, P0-M1, P0-M2
> >>>
> >>> As a result, partition 1 is never acknowledged.
> >>>
> >>> Users must maintain their own `Map<String, MessageId>` when using a
> >>> partitioned topic or multi-topics consumer with the existing
> >>> `acknowledgeCumulative` API.
> >>>
> >>> Should we make it more friendly for users? For example, we can make
> >>> `acknowledgeCumulative` accept the map to remind users to maintain
> >>> the map from topic name to message ID:
> >>>
> >>> ```java
> >>> // the key is the partitioned topic name like my-topic-partition-0
> >>> void acknowledgeCumulative(Map<String, MessageId> topicToMessageId);
> >>> ```
> >>>
> >>> For those who don't want to maintain the map by themselves, maybe we
> >>> can provide a simpler API like:
> >>>
> >>> ```java
> >>> // acknowledge all latest received messages
> >>> void acknowledgeCumulative();
> >>> ```
> >>>
> >>> and provide an option to enable this behavior.
> >>>
> >>> Do you have any suggestions on this idea? I will prepare a proposal if
> >>> there is no disagreement.
> >>>
> >>> Thanks,
> >>> Yunze
> >>
>
>


Re: [DISCUSS] Call to improve Pulsar contributor's experience

2022-09-06 Thread Christophe Bornet
Hi Lari,

Thanks for launching this discussion. There's a lot to do to improve
contributor experience, esp for newcomers I think.
One of the pain points is obviously the CI flakiness and I believe we are
all aware of that.
Related to that, we have the pulsarbot to relaunch the tests that failed
but:
* when you're new to Pulsar, you don't even know it exists
* when you know it exists, you never (I never) remember what the command
is. So you try to find an issue somewhere that has it in its comments and
you copy/paste. But the comment has the wrong syntax and you copy/paste
"/pulsar-bot rerun-failure-checks" (notice the -). And you don't understand
why the CI isn't relaunched. You feel miserable and want to rage quit.
* the command only works when the workflow is

Some things that could be done:
* Describe the pulsarbot commands somewhere to make it visible and a ref to
copy/paste. In the README, but also in the PR template bc that's something
you read carefully on your first PR.
* Indicate that you can use saved replies in GitHub. A very nice tip I got from
Nicolo (thanks!).
* Have the pulsarbot comment that it received the command and will relaunch
the CI
* Accept the /pulsar-bot command, as it's a common typo.
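
As a rough illustration of the last suggestion, a tolerant command parser could look like the sketch below. This is not the actual pulsarbot implementation: `BotCommandParser` and `parse` are hypothetical names, and the sketch assumes that `/pulsarbot` is the intended spelling, with `/pulsar-bot` accepted as the common typo.

```java
import java.util.Locale;
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical parser that accepts "/pulsarbot <command>" and the typo "/pulsar-bot <command>".
final class BotCommandParser {
    // The optional hyphen ("-?") is what makes the typo acceptable.
    private static final Pattern COMMAND =
            Pattern.compile("\\s*/pulsar-?bot\\s+(\\S+)\\s*", Pattern.CASE_INSENSITIVE);

    // Returns the command name in lower case, or empty if the line is not a bot command.
    static Optional<String> parse(String commentLine) {
        Matcher m = COMMAND.matcher(commentLine);
        return m.matches()
                ? Optional.of(m.group(1).toLowerCase(Locale.ROOT))
                : Optional.empty();
    }
}
```

Both `/pulsarbot rerun-failure-checks` and `/pulsar-bot rerun-failure-checks` then resolve to the same command, while ordinary comments are ignored.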

Best regards.

Christophe

Le jeu. 1 sept. 2022 à 10:11, Lari Hotari  a écrit :

> Hi,
>
> I think that we need to improve the experience for contributors.
> It's currently a big challenge to get a PR to the state where tests pass,
> mainly because of the large number of flaky tests and frequent congestion
> in Pulsar CI. We don't tell contributors this in the PR template [1]
> or the contributors guide [2], and finding it out without anyone
> explaining could be a frustrating experience.
>
> Let's improve our contributor experience. "The hard part isn't solving the
> problems, it's identifying the right problems to solve." [3]
>
> Would someone like to share their Pulsar contribution experience and what
> they think needs improvement? What was painful?
>
> -Lari
>
> [1]
> https://raw.githubusercontent.com/apache/pulsar/master/.github/PULL_REQUEST_TEMPLATE.md
> [2] https://pulsar.apache.org/contributing/
> [3] https://www.youtube.com/watch?v=qqaOpSJKdWc ,
> https://leanpub.com/ideaflow . Janelle Arty Starr: "Idea Flow: How to
> Measure the PAIN in Software Development"
>


Re: Pulsar CI congested, master branch build broken

2022-09-06 Thread Lari Hotari
Pulsar CI continues to be congested, and the build queue [1] is very long at
the moment. There are 147 build jobs in the queue and 16 jobs in progress.

I would strongly advise everyone to use "personal CI" to mitigate the long
delay in CI feedback. You can simply open a PR to your own personal
fork of apache/pulsar to run the builds in your "personal CI". There are more
details in the previous emails in this thread.

-Lari

[1] - build queue: https://github.com/apache/pulsar/actions?query=is%3Aqueued

On 2022/08/30 12:39:19 Lari Hotari wrote:
> Pulsar CI continues to be congested, and the build queue is long.
> 
> I would strongly advise everyone to use "personal CI" to mitigate the long
> delay in CI feedback. You can simply open a PR to your own
> personal fork of apache/pulsar to run the builds in your "personal CI".
> There are more details in the previous email in this thread.
> 
> Some updates:
> 
> There has been a discussion with Gavin McDonald from ASF infra on the-asf 
> slack about getting usage reports from GitHub to support the investigation. 
> Slack thread is the same one mentioned in the previous email, 
> https://the-asf.slack.com/archives/CBX4TSBQ8/p1661512133913279 . Gavin 
> already requested the usage report in GitHub UI, but it produced invalid 
> results.
> 
> I made a change to mitigate a source of additional GitHub Actions overhead. 
> In the past, each cherry-picked commit to a maintenance branch of Pulsar has 
> triggered a lot of workflow runs. 
> 
> The solution for cancelling duplicate builds automatically is to add this 
> definition to the workflow definition:
> concurrency:
>   group: ${{ github.workflow }}-${{ github.ref }}
>   cancel-in-progress: true
> 
> I added this to all maintenance branch GitHub Actions workflows:
> 
> branch-2.10 change:
> https://github.com/apache/pulsar/commit/5d2c9851f4f4d70bfe74b1e683a41c5a040a6ca7
> branch-2.9 change:
> https://github.com/apache/pulsar/commit/3ea124924fecf636cc105de75c62b3a99050847b
> branch-2.8 change:
> https://github.com/apache/pulsar/commit/48187bb5d95e581f8322a019b61d986e18a31e54
> branch-2.7:
> https://github.com/apache/pulsar/commit/744b62c99344724eacdbe97c881311869d67f630
> 
> branch-2.11 already contains the necessary config for cancelling duplicate 
> builds.
> 
> The benefit of the above change is that when multiple commits are 
> cherry-picked to a branch at once, only the build of the last commit will get 
> run eventually. The builds for the intermediate commits will get cancelled. 
> Obviously there's a tradeoff here that we don't get the information if one of 
> the earlier commits breaks the build. It's the cost that we need to pay. 
> Nevertheless, our build is so flaky that it's hard to determine whether a
> failed build result is only caused by a flaky test or whether it's an
> actual failure. Because of this we don't lose anything by cancelling builds. 
> It's more important to save build resources. In the maintenance branches for 
> 2.10 and older, the average total build time consumed is around 20 hours 
> which is a lot.
> 
> At this time, the overhead of maintenance branch builds doesn't seem to be 
> the source of the problems. There must be some other issue which is possibly 
> related to exceeding a usage quota. Hopefully we get the CI slowness issue 
> solved asap.
> 
> BR,
> 
> Lari
> 
> 
> On 2022/08/26 12:00:20 Lari Hotari wrote:
> > Hi,
> > 
> > GitHub Actions builds have been piling up in the build queue in the last 
> > few days.
> > I posted on bui...@apache.org 
> > https://lists.apache.org/thread/6lbqr0f6mqt9s8ggollp5kj2nv7rlo9s and 
> > created INFRA ticket https://issues.apache.org/jira/browse/INFRA-23633 
> > about this issue.
> > There's also a thread on the-asf slack, 
> > https://the-asf.slack.com/archives/CBX4TSBQ8/p1661512133913279 . 
> > 
> > It seems that our build queue is finally getting picked up, but it would be 
> > great to see if we hit quota and whether that is the cause of pauses. 
> > 
> > Another issue is that the master branch broke after merging 2 conflicting 
> > PRs. 
> > The fix is in https://github.com/apache/pulsar/pull/17300 . 
> > 
> > Merging PRs will be slow until we have these 2 problems solved and existing 
> > PRs rebased over the changes. Let's prioritize merging #17300 before 
> > pushing more changes.
> > 
> > I'd like to point out that a good way to get build feedback before sending 
> > a PR, is to run builds on your personal GitHub Actions CI. The benefit of 
> > this is that it doesn't consume the shared quota and builds usually start 
> > instantly.
> > There are instructions in the contributors guide about this. 
> > https://pulsar.apache.org/contributing/#ci-testing-in-your-fork
> > You simply open PRs to your own fork of apache/pulsar to run builds on your 
> > personal GitHub Actions CI.
> > 
> > BR,
> > 
> > Lari
> > 
> > 
> > 
> > 
> > 
> > 
> > 
> > 
> 


Re: Pulsar CI congested, master branch build broken

2022-09-06 Thread Lari Hotari
I asked for an update on the Apache org GitHub Actions usage stats from Gavin 
McDonald on the-asf slack in this thread: 
https://the-asf.slack.com/archives/CBX4TSBQ8/p1662464113873539?thread_ts=1661512133.913279&cid=CBX4TSBQ8
 .

I hope we get this issue resolved since it delays PR processing a lot.

-Lari

On 2022/09/06 11:16:07 Lari Hotari wrote:
> Pulsar CI continues to be congested, and the build queue [1] is very long at
> the moment. There are 147 build jobs in the queue and 16 jobs in progress.
>
> I would strongly advise everyone to use "personal CI" to mitigate the long
> delay in CI feedback. You can simply open a PR to your own
> personal fork of apache/pulsar to run the builds in your "personal CI".
> There are more details in the previous emails in this thread.
> 
> -Lari
> 
> [1] - build queue: https://github.com/apache/pulsar/actions?query=is%3Aqueued
> 

Re: Pulsar CI congested, master branch build broken

2022-09-06 Thread Lari Hotari
The Apache Infra ticket is https://issues.apache.org/jira/browse/INFRA-23633 . 

-Lari

On 2022/09/06 11:36:46 Lari Hotari wrote:
> I asked for an update on the Apache org GitHub Actions usage stats from Gavin 
> McDonald on the-asf slack in this thread: 
> https://the-asf.slack.com/archives/CBX4TSBQ8/p1662464113873539?thread_ts=1661512133.913279&cid=CBX4TSBQ8
>  .
> 
> I hope we get this issue resolved since it delays PR processing a lot.
> 
> -Lari
> 

[ANNOUNCE] Apache Pulsar 2.7.5 released

2022-09-06 Thread Haiting Jiang
The Apache Pulsar team is proud to announce Apache Pulsar version 2.7.5.

Pulsar is a highly scalable, low latency messaging platform running on
commodity hardware. It provides simple pub-sub semantics over topics,
guaranteed at-least-once delivery of messages, automatic cursor management for
subscribers, and cross-datacenter replication.

For Pulsar release details and downloads, visit:

https://pulsar.apache.org/download

Release Notes are at:
https://pulsar.apache.org/release-notes

We would like to thank the contributors that made the release possible.

Regards,

The Pulsar Team


Re: Pulsar CI congested, master branch build broken

2022-09-06 Thread Dave Fisher
We are going to need to take action to fix our problems. See
https://issues.apache.org/jira/browse/INFRA-23633?focusedCommentId=17600749&page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#comment-17600749

Jarek has done a lot of GitHub Actions work with Apache Airflow, and his
suggestions might be helpful. One of his suggestions was Apache Yetus. I think
he means using the Maven plugins -
https://yetus.apache.org/documentation/0.14.0/yetus-maven-plugin/


> On Sep 6, 2022, at 4:48 AM, Lari Hotari  wrote:
> 
> The Apache Infra ticket is https://issues.apache.org/jira/browse/INFRA-23633 
> . 
> 
> -Lari
> 
> On 2022/09/06 11:36:46 Lari Hotari wrote:
>> I asked for an update on the Apache org GitHub Actions usage stats from 
>> Gavin McDonald on the-asf slack in this thread: 
>> https://the-asf.slack.com/archives/CBX4TSBQ8/p1662464113873539?thread_ts=1661512133.913279&cid=CBX4TSBQ8
>>  .
>> 
>> I hope we get this issue resolved since it delays PR processing a lot.
>> 
>> -Lari
>> 
>> On 2022/09/06 11:16:07 Lari Hotari wrote:
>>> Pulsar CI continues to be congested, and the build queue [1] is very long 
>>> at the moment. There are 147 build jobs in the queue and 16 jobs in 
>>> progress at the moment.
>>> 
>>> I would strongly advice everyone to use "personal CI" to mitigate the issue 
>>> of the long delay of CI feedback. You can simply open a PR to your own 
>>> personal fork of apache/pulsar to run the builds in your "personal CI". 
>>> There's more details in the previous emails in this thread.
>>> 
>>> -Lari
>>> 
>>> [1] - build queue: 
>>> https://github.com/apache/pulsar/actions?query=is%3Aqueued
>>> 
>>> On 2022/08/30 12:39:19 Lari Hotari wrote:
 Pulsar CI continues to be congested, and the build queue is long.
 
 I would strongly advise everyone to use "personal CI" to mitigate the 
 issue of the long delay of CI feedback. You can simply open a PR to your 
 own personal fork of apache/pulsar to run the builds in your "personal 
 CI". There are more details in the previous email in this thread.
 
 Some updates:
 
 There has been a discussion with Gavin McDonald from ASF infra on the-asf 
 slack about getting usage reports from GitHub to support the 
 investigation. Slack thread is the same one mentioned in the previous 
 email, https://the-asf.slack.com/archives/CBX4TSBQ8/p1661512133913279 . 
 Gavin already requested the usage report in GitHub UI, but it produced 
 invalid results.
 
 I made a change to mitigate a source of additional GitHub Actions 
 overhead. 
 In the past, each cherry-picked commit to a maintenance branch of Pulsar 
 has triggered a lot of workflow runs. 
 
 The solution for cancelling duplicate builds automatically is to add this 
 definition to the workflow definition:
 concurrency:
  group: ${{ github.workflow }}-${{ github.ref }}
  cancel-in-progress: true
 
 I added this to all maintenance branch GitHub Actions workflows:
 
 branch-2.10 change:
 https://github.com/apache/pulsar/commit/5d2c9851f4f4d70bfe74b1e683a41c5a040a6ca7
 branch-2.9 change:
 https://github.com/apache/pulsar/commit/3ea124924fecf636cc105de75c62b3a99050847b
 branch-2.8 change:
 https://github.com/apache/pulsar/commit/48187bb5d95e581f8322a019b61d986e18a31e54
 branch-2.7:
 https://github.com/apache/pulsar/commit/744b62c99344724eacdbe97c881311869d67f630
 
 branch-2.11 already contains the necessary config for cancelling duplicate 
 builds.
 
 The benefit of the above change is that when multiple commits are 
 cherry-picked to a branch at once, only the build of the last commit will 
 get run eventually. The builds for the intermediate commits will get 
 cancelled. Obviously there's a tradeoff here that we don't get the 
 information if one of the earlier commits breaks the build. It's the cost 
 that we need to pay. Nevertheless our build is so flaky that it's hard to 
 determine whether a failed build result is only caused by a flaky test 
 or whether it's an actual failure. Because of this we don't lose anything 
 by cancelling builds. It's more important to save build resources. In the 
 maintenance branches for 2.10 and older, the average total build time 
 consumed is around 20 hours which is a lot.
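
The cancellation behaviour described above can be sketched as a toy model in Python. This is not GitHub's scheduler: the `Run` and `Scheduler` classes and the `simulate` helper are hypothetical names used only for illustration, and the model assumes every commit is pushed before any earlier build finishes. It only mirrors the idea that runs sharing the group `${{ github.workflow }}-${{ github.ref }}` cancel their unfinished predecessors.

```python
from dataclasses import dataclass, field


@dataclass
class Run:
    workflow: str
    ref: str
    sha: str
    status: str = "queued"  # queued | cancelled | completed

    @property
    def group(self) -> str:
        # Mirrors: group: ${{ github.workflow }}-${{ github.ref }}
        return f"{self.workflow}-{self.ref}"


@dataclass
class Scheduler:
    """Toy model of cancel-in-progress (hypothetical, not GitHub's code)."""

    runs: list = field(default_factory=list)

    def push(self, workflow: str, ref: str, sha: str) -> Run:
        new_run = Run(workflow, ref, sha)
        # A new run cancels every unfinished run in the same group.
        for run in self.runs:
            if run.group == new_run.group and run.status == "queued":
                run.status = "cancelled"
        self.runs.append(new_run)
        return new_run

    def finish_all(self) -> None:
        # Whatever survived cancellation eventually gets built.
        for run in self.runs:
            if run.status == "queued":
                run.status = "completed"


def simulate(commits, workflow="CI", ref="refs/heads/branch-2.10"):
    """Push several cherry-picked commits to one branch in quick succession."""
    scheduler = Scheduler()
    for sha in commits:
        scheduler.push(workflow, ref, sha)
    scheduler.finish_all()
    return {run.sha: run.status for run in scheduler.runs}
```

In this model, cherry-picking three commits to one branch leaves only the last build to complete, while pushes to different branches land in different groups and do not cancel each other.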
 
 At this time, the overhead of maintenance branch builds doesn't seem to be 
 the source of the problems. There must be some other issue which is 
 possibly related to exceeding a usage quota. Hopefully we get the CI 
 slowness issue solved asap.
 
 BR,
 
 Lari
 
 
 On 2022/08/26 12:00:20 Lari Hotari wrote:
> Hi,
> 
> GitHub Actions builds have been piling up in the build queue in the last 
> few days.
> I posted on bui...@apache.org 
> https://lists.apache.org/thread/6lbqr0f6mqt9s8ggollp5kj2nv7rlo9s a

[GitHub] [pulsar-test-infra] lhotari merged pull request #69: Refactor docbot and add tests

2022-09-06 Thread GitBox


lhotari merged PR #69:
URL: https://github.com/apache/pulsar-test-infra/pull/69


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: dev-unsubscr...@pulsar.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[GitHub] [pulsar-test-infra] lhotari closed issue #59: [docbot] Cannot to label for PR correctly

2022-09-06 Thread GitBox


lhotari closed issue #59: [docbot] Cannot to label for PR correctly
URL: https://github.com/apache/pulsar-test-infra/issues/59





[GitHub] [pulsar] momo-jun added a comment to the discussion: [Design] Pulsar All Releases Page

2022-09-06 Thread GitBox


GitHub user momo-jun added a comment to the discussion: [Design] Pulsar All 
Releases Page

Cross-post the 
[comment](https://github.com/apache/pulsar/issues/16938#issuecomment-1214783189)
 here since the Versions page will be deprecated, and the explanation for 
versioning change can be added to this new page.

GitHub link: 
https://github.com/apache/pulsar/discussions/17310#discussioncomment-3580826


This is an automatically sent email for dev@pulsar.apache.org.
To unsubscribe, please send an email to: dev-unsubscr...@pulsar.apache.org



Re: [DISCUSSION] PiP196 TransactionBuffer Multiple-snapshots

2022-09-06 Thread 丛搏
Hi Xiangying

I think this is a very good optimization. It solves the problem users
hit when they have a lot of aborted transactions.

Thanks!
Bo

On Tue, Aug 16, 2022 at 11:33 AM Yubiao Feng wrote:
>
> Hi Xiangying
>
> >> Can the sequence id generation strategy be added to the proposal?
>
> > I think it's an implementation detail that shouldn't be exposed to the user
> at all.
>
> OK.
>
> Thanks
> Yubiao Feng
>
> On Mon, Aug 15, 2022 at 8:11 PM Xiangying Meng  wrote:
>
> > Hi, yubiao,
> > I think it's an implementation detail that shouldn't be exposed to the user
> > at all.
> > Yours sincerely,
> > Xiangying Meng
> >
> > On Mon, Aug 15, 2022 at 8:02 PM Yubiao Feng
> >  wrote:
> >
> > > Hi Xiangying
> > >
> > > Thank you for your reply. Sorry, I have one more question:
> > >
> > > > If these operations fail at operation 2, the old snapshots will be
> > > > covered by the new large snapshot during compaction, because they
> > > > have the same sequence ID.
> > >
> > > Can the sequence id generation strategy be added to the doc?
> > >
> > > On Mon, Aug 15, 2022 at 6:35 PM Xiangying Meng 
> > > wrote:
> > >
> > > > Hi, yubiao,
> > > > First of all, thanks for the attention and questions. As for your
> > > > three questions:
> > > > 1.
> > > >  > Does the merge take place in memory or in BK?
> > > > The snapshots will be merged in BK. For specific details, see the
> > > > detailed instructions in the *### Merge snapshot* section.
> > > > 2.
> > > > > How do we ensure the atomicity of the two writes, I suggest adding a
> > > > > check
> > > > We do not guarantee their atomicity. The position of the snapshot is
> > > > generally unchanged, so the previous index is also valid. If the index
> > > > write fails after a snapshot is written, the final result is that the
> > > > snapshot write fails this time. There will be no worse results, and
> > > > no dirty data will be introduced by compaction.
> > > > 3.
> > > > >Clean up unused aborts data
> > > > Snapshot cleanup can be found in *take snapshot # How*.
> > > > The cleanup of the index is done automatically by the compactor. I
> > > > will add it to *### Snapshot index topic*.
> > > >
> > > > yours sincerely,
> > > > Xiangying Meng
> > > >
> > > >
> > > >
> > > >
> > > > On Mon, Aug 15, 2022 at 3:56 PM Yubiao Feng
> > > >  wrote:
> > > >
> > > > > Hi Xiangying
> > > > >
> > > > > I think Multiple-snapshots for TB is a good idea. And I have these
> > > > > questions:
> > > > >
> > > > >
> > > > > > > The number of the transactions in a snapshot can be configured, and
> > > > > > > we hope it is small, then we can merge the small snapshots into a
> > > > > > > large snapshot when it reaches a configured number.
> > > > >
> > > > > Does the merge take place in memory or in BK?
> > > > >
> > > > > > - If we merge small snapshots in memory, can we just use a large
> > > > > >   snapshot?
> > > > > > - If we merge small snapshots in BK, how do we do it?
> > > > >
> > > > >
> > > > >
> > > > > > The index is written after each multiple-snapshot is written.
> > > > >
> > > > > Snapshot and index are stored in different topics, right?
> > > > >
> > > > > > How do we ensure the atomicity of the two writes? I suggest adding a
> > > > > > check mechanism so that a snapshot not recorded in the index is
> > > > > > treated as invalid.
> > > > >
> > > > >
> > > > >
> > > > > >  Clean up unused aborts data
> > > > >
> > > > > > Now, this section only has instructions for clearing snapshots.
> > > > > I think we should add this: how to delete/override the index data.
> > > > >
> > > > > Thanks
> > > > > Yubiao Feng
> > > > >
> > > > > On Thu, Aug 4, 2022 at 10:27 AM Xiangying Meng
> > > > > wrote:
> > > > >
> > > > > > Hi, Pulsar community,
> > > > > > > I'd like to start a discussion about transaction multiple-snapshot.
> > > > > > > In order to get rid of the capacity limitation of the BookKeeper
> > > > > > > entry, we plan to use multiple snapshots. More details can be found here
> > > > > > .
> > > > > >
> > > > > > Yours sincerely,
> > > > > > Xiangying Meng
> > > > > >
> > > > >
> > > >
> > >
> >
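
The non-atomic "snapshot first, index second" write discussed in this thread can be sketched in Python as follows. This is only an in-memory illustration under stated assumptions: `SnapshotStore`, `write_segment`, `recover_aborted_txns` and the `fail_index_write` switch are hypothetical names, not the proposal's API, and plain dicts and lists stand in for the snapshot topic and the index topic. It shows why a failed index write leaves the earlier index valid: readers only follow positions recorded in the index, so an unindexed snapshot is simply ignored.

```python
class IndexWriteError(Exception):
    """Raised when the second write (the index update) fails."""


class SnapshotStore:
    """Toy in-memory stand-in for the snapshot topic and the index topic."""

    def __init__(self):
        self.segments = {}        # position -> aborted transaction ids
        self.index = []           # positions that readers are allowed to trust
        self._next_position = 0

    def write_segment(self, aborted_txn_ids, fail_index_write=False):
        # Write 1: persist the snapshot segment itself.
        position = self._next_position
        self._next_position += 1
        self.segments[position] = list(aborted_txn_ids)

        # Write 2: record the segment in the index. The two writes are not
        # atomic, so this step may fail after write 1 has succeeded.
        if fail_index_write:
            raise IndexWriteError(f"index update failed for position {position}")
        self.index.append(position)
        return position

    def recover_aborted_txns(self):
        # Recovery follows the index only; unindexed segments are ignored.
        aborted = []
        for position in self.index:
            aborted.extend(self.segments[position])
        return aborted
```

With this model, a segment whose index write failed still sits in `segments` as an orphan, but recovery returns only the transactions from indexed segments, which matches the statement that the snapshot write as a whole is treated as failed.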


Re: [DISCUSS] Remove timestamp from Prometheus metrics

2022-09-06 Thread Michael Marshall
Merged. Depending on whether [0] will be cherry picked to release
branches, I will cherry pick [1] or [2] to all active release
branches.

Thanks,
Michael

[0] https://github.com/apache/pulsar/pull/15558
[1] https://github.com/apache/pulsar/pull/17419
[2] 
https://github.com/apache/pulsar/commit/b5cb02deb06760a2b6fe7b6c221e08acfabdf830

On Thu, Sep 1, 2022 at 11:05 PM Michael Marshall  wrote:
>
> Hi Pulsar Community,
>
> Recently, we noticed in certain Grafana metrics from the broker that
> it appeared a topic had metrics reported by two different brokers at
> the same time.
>
> It turns out that the root of the problem is a concept called
> "staleness" in prometheus and it is directly related to the fact that
> we export timestamps with our metrics.
>
> As such, I wrote a PR to remove these timestamps [0]. In it, I propose
> that we remove the timestamps and cherry pick this fix to all active
> branches of Pulsar. The PR has more detail, so please see it if you're
> interested.
>
> If removing these timestamps will break your use case, please let me
> know. By my reading, we do not qualify as an application that needs to
> report timestamps. Additionally, I tried to make it configurable, but
> many of these classes are static, so it would be non-trivial to make
> the behavior configurable.
>
> Thanks,
> Michael
>
> [0] https://github.com/apache/pulsar/pull/17419
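
To illustrate what "exporting timestamps with our metrics" means, here is a minimal Python sketch of a sample line in the Prometheus text exposition format, with and without the optional trailing timestamp (milliseconds since the epoch). The `format_sample` helper is hypothetical and the metric name and labels are illustrative only; this is not the broker's exporter code. As far as the Prometheus documentation describes it, series scraped with explicit timestamps do not get staleness markers when they disappear, so an old series can linger for the lookback window after a topic moves to another broker.

```python
def format_sample(name, labels, value, timestamp_ms=None):
    """Render one sample line in the Prometheus text exposition format."""
    label_text = ",".join(f'{key}="{val}"' for key, val in sorted(labels.items()))
    line = f"{name}{{{label_text}}} {value}"
    if timestamp_ms is not None:
        # Optional field: the behaviour the thread proposes to remove.
        line += f" {timestamp_ms}"
    return line


# Illustrative values only.
labels = {"cluster": "c", "topic": "t"}
with_timestamp = format_sample("pulsar_rate_in", labels, 1.5, 1662000000000)
without_timestamp = format_sample("pulsar_rate_in", labels, 1.5)
```

Omitting the last field lets Prometheus assign the scrape time itself and apply its normal staleness handling.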


Re: [DISCUSS] ARM Support for Pulsar 2.11 Docker Image

2022-09-06 Thread Michael Marshall
I see there was some recent activity on
https://github.com/apache/pulsar/issues/12944, but I don't know of any
other progress.

I won't be able to work on this in the near future, but I'll be happy
to help review any PRs related to it.

Thanks,
Michael

On Mon, Sep 5, 2022 at 8:05 AM Asaf Mesika  wrote:
>
> What ended up happening with this?
>
> On Tue, Aug 9, 2022 at 11:11 AM Alexander Preuss
>  wrote:
>
> > Hi Michael,
> >
> > Thank you for bringing up this topic.
> > I was just running into an issue that prevented me from using the standard
> > Pulsar image in Testcontainers and found this discussion.
> >
> > In my opinion, refactoring the docker builds to allow us to use the ASF
> > infra is a great idea.
> > I'm also looping in Kay, as she might be able to provide more insights.
> >
> > Best,
> > Alex
> >
> > On 2022/07/09 07:18:31 Michael Marshall wrote:
> > > Hi Pulsar Community,
> > >
> > > I would like to see the 2.11 docker image ship with support to run on
> > > ARM architecture. The issue asking for this feature [0] has had a lot
> > > of traction.
> > >
> > > The Bookkeeper 4.15 upgrade was the last blocker, and since we
> > > upgraded to BK 4.15 in May, we should be able to upgrade the docker
> > > build to make it a multi-arch build.
> > >
> > > kezhenxu94 opened a PR [1] to upgrade our build process to include a
> > > multi-arch docker image build, but he is unable to finish the PR and
> > > has asked for someone else to pick up the work.
> > >
> > > Before we continue the work, does anyone have strong opinions on how
> > > we should update our docker image build? Dave indicated on a separate
> > > thread that we should revisit where the docker images are hosted, and
> > > Enrico indicated on the PR [2] that we might want to consider
> > > automating our docker image build so that the ASF Infra Docker hub bot
> > > builds our images. Once we have consensus on these topics, it should
> > > be straightforward to update the docker build process for the
> > > multi-arch build.
> > >
> > > In my opinion, we need to support a manual build option to be used by
> > > the integration tests (and probably by some users building modified
> > > versions of Pulsar). I also think it could be very convenient to have
> > > our image built by the ASF bot and hosted in the apache docker hub
> > > repo.
> > >
> > > Let me know what you think.
> > >
> > > Thanks,
> > > Michael
> > >
> > > [0] https://github.com/apache/pulsar/issues/12944
> > > [1] https://github.com/apache/pulsar/pull/14005
> > > [2]
> > https://github.com/apache/pulsar/pull/14005#pullrequestreview-913331330
> > >
> >