[GitHub] [pulsar-test-infra] Anonymitaet commented on pull request #70: Fix sync and remove labels

2022-09-08 Thread GitBox


Anonymitaet commented on PR #70:
URL: https://github.com/apache/pulsar-test-infra/pull/70#issuecomment-1240329162

   @maxsxu 
   could you please review this PR from a technical perspective? Thank you!


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: dev-unsubscr...@pulsar.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: Pulsar CI congested, master branch build broken

2022-09-08 Thread Lari Hotari
If my assumption of a GitHub usage metrics bug in the GitHub Actions build 
job queue fairness algorithm is correct, what would help is running the flaky 
unit test group outside of the Pulsar CI workflow. In that case, the impact of 
the usage metrics would be limited.

The example of https://github.com/apache/pulsar/actions/runs/3003787409/usage 
shows this flaw, as explained in the previous email: the total reported usage 
in that report is 1d 1h 40m 21s, while the actual usage is about 1/3 of that.
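
For reference, here is a minimal sketch of pulling the reported usage for a run
programmatically via the GitHub REST "get workflow run usage" endpoint. This is
an editorial illustration, not part of the original email; it assumes a
GITHUB_TOKEN environment variable, and the billable section it reports may not
map one-to-one to what the usage page shows.

```
package main

import (
	"encoding/json"
	"fmt"
	"net/http"
	"os"
)

func main() {
	// Run ID 3003787409 is the example cited in this thread.
	req, _ := http.NewRequest("GET",
		"https://api.github.com/repos/apache/pulsar/actions/runs/3003787409/timing", nil)
	req.Header.Set("Authorization", "Bearer "+os.Getenv("GITHUB_TOKEN"))
	req.Header.Set("Accept", "application/vnd.github+json")

	resp, err := http.DefaultClient.Do(req)
	if err != nil {
		panic(err)
	}
	defer resp.Body.Close()

	var timing struct {
		RunDurationMS int64 `json:"run_duration_ms"`
		Billable      map[string]struct {
			TotalMS int64 `json:"total_ms"`
		} `json:"billable"`
	}
	if err := json.NewDecoder(resp.Body).Decode(&timing); err != nil {
		panic(err)
	}

	// Compare per-runner reported usage against the wall-clock duration.
	for runner, b := range timing.Billable {
		fmt.Printf("%s reported: %d min\n", runner, b.TotalMS/60000)
	}
	fmt.Printf("run wall clock: %d min\n", timing.RunDurationMS/60000)
}
```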

When we move the most commonly failing job out of the Pulsar CI workflow, the 
impact of the possible usage metrics bug should be much smaller. I hope GitHub 
support responds to my issue and queries about this bug. It might take up to 7 
days to get a reply, and longer for technical questions. In the meantime we 
need a solution for getting past this CI slowness.

-Lari



On 2022/09/08 06:34:42 Lari Hotari wrote:
> My current assumption about the CI slowness problem is that the usage metrics 
> for Apache Pulsar builds are computed incorrectly on the GitHub side, and that 
> this results in apache/pulsar builds getting throttled. This assumption might be 
> wrong, but it's the best guess at the moment.
> 
> The fact that supports this assumption is that when re-running failed jobs in 
> a workflow, the execution times for previously successful jobs get counted as 
> if they had all run:
> Here's an example: 
> https://github.com/apache/pulsar/actions/runs/3003787409/usage
> The reported total usage is about 3x the actual usage.
> 
> My assumption is that the "fairness algorithm" that GitHub uses 
> to provide all Apache projects about the same amount of GitHub Actions 
> resources takes this flawed usage as the basis of its decisions and 
> decides to throttle apache/pulsar builds.
> 
> The reason why we are getting hit by this now is that there is a high number 
> of flaky test failures that cause almost every build to fail and we have been 
> re-running a lot of builds.
> 
> The other fact to support the theory of flawed usage metrics used in the 
> fairness algorithm is that other Apache projects aren't reporting issues 
> about GitHub Actions slowness. This is mentioned in Jarek Potiuk's comments 
> on INFRA-23633 [1]:
> > Unlike the case 2 years ago, the problem is not affecting all projects. In 
> > Apache Airflow we do not see any particular slow-down with Public Runners 
> > at this moment (just checked - everything is "as usual"). So I'd say it is 
> > something specific to Pulsar, not to "ASF" as a whole.
> 
> There are also other comments from Jarek about the GitHub "fairness 
> algorithm" (comment [2], other comment [3])
> > But I believe the current problem is different - it might be (looking at 
> > your jobs) simply a bug 
> > in GA that you hit or indeed your demands are simply too high. 
> 
> I have opened tickets (2 tickets: 2 days ago and yesterday) to 
> support.github.com and there hasn't been any response to either ticket. It 
> might take up to 7 days to get a response. We cannot rely on GitHub Support 
> resolving this issue.
> 
> I propose that we go ahead with the previously suggested action plan
> > One possible way forward:
> > 1. Cancel all existing builds in_progress or queued
> > 2. Edit .asf.yaml and drop the "required checks" requirement for merging 
> > PRs.
> > 3. Wait for build to run for .asf.yaml change, merge it
> > 4. Disable all workflows
> > 5. Process specific PRs manually to improve the situation.
> >- Make GHA workflow improvements such as 
> > https://github.com/apache/pulsar/pull/17491 and 
> > https://github.com/apache/pulsar/pull/17490
> >- Quarantine all very flaky tests so that everyone doesn't waste time 
> > with those. It should be possible to merge a PR even when a quarantined 
> > test fails.
> > 6. Rebase PRs (or close and re-open) that would be processed next so that 
> > changes are picked up
> > 7. Enable workflows
> > 8. Start processing PRs with checks to see if things are handled in a 
> > better way.
> > 9. When things are stable, enable required checks again in .asf.yaml, in 
> > the meantime be careful about merging PRs
> > 10. Fix quarantined flaky tests
> 
> To clarify, steps 1-6 would optimally be done in 1 day, and we would stop 
> processing ordinary PRs during this time. We would only handle PRs that fix 
> the CI situation during this exceptional period.
> 
> -Lari
> 
> Links to Jarek's comments:
> [1] 
> https://issues.apache.org/jira/browse/INFRA-23633?focusedCommentId=17600749&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-17600749
> [2] 
> https://issues.apache.org/jira/browse/INFRA-23633?focusedCommentId=17600893&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-17600893
> [3] 
> https://issues.apache.org/jira/browse/INFRA-23633?focusedCommentId=17600893&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#commen

[GitHub] [pulsar-client-node] rkaw92 commented on issue #230: await flush() does not actually flush

2022-09-08 Thread GitBox


rkaw92 commented on issue #230:
URL: 
https://github.com/apache/pulsar-client-node/issues/230#issuecomment-1240333920

   I currently rely on back-off, but it's not optimal. As for filling the 
queue, I'd think that issuing exactly as many `send()` operations as should fit 
in the buffer (exactly equal to maxPendingMessages) and then flushing should 
get rid of them all, so that the new pending messages = 0.
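
   For illustration, here is that fill-then-flush experiment sketched with the
Go client (pulsar-client-go) instead of the Node client under discussion; an
editorial sketch with a placeholder URL and topic, not code from this issue:

```
package main

import (
	"context"
	"log"

	"github.com/apache/pulsar-client-go/pulsar"
)

func main() {
	client, err := pulsar.NewClient(pulsar.ClientOptions{URL: "pulsar://localhost:6650"})
	if err != nil {
		log.Fatal(err)
	}
	defer client.Close()

	producer, err := client.CreateProducer(pulsar.ProducerOptions{
		Topic:                   "persistent://public/default/flush-test",
		MaxPendingMessages:      10000,
		DisableBlockIfQueueFull: true, // fail fast instead of blocking when the queue is full
	})
	if err != nil {
		log.Fatal(err)
	}
	defer producer.Close()

	// Fill the queue with exactly MaxPendingMessages async sends...
	for i := 0; i < 10000; i++ {
		producer.SendAsync(context.Background(), &pulsar.ProducerMessage{
			Payload: []byte("payload"),
		}, func(_ pulsar.MessageID, _ *pulsar.ProducerMessage, err error) {
			if err != nil {
				log.Printf("send failed: %v", err)
			}
		})
	}

	// ...then flush; afterwards the pending count should be back to 0.
	if err := producer.Flush(); err != nil {
		log.Fatal(err)
	}
}
```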
   
   The code is running the latest master version from this repository. As for 
the C++ counterpart, it's 2.9.1 from a .deb.
   
   I'll try to debug the C++ library - clearly, there's something going on with 
flush unreliability. I'll also check if the behavior is the same between the 
normal Producer and the PartitionedProducer.
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: dev-unsubscr...@pulsar.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[GitHub] [pulsar-test-infra] tisonkun commented on pull request #70: Fix sync and remove labels

2022-09-08 Thread GitBox


tisonkun commented on PR #70:
URL: https://github.com/apache/pulsar-test-infra/pull/70#issuecomment-1240336055

   Can I summarize the issue on the current master as: the `doc`-related 
labels can be more than they should be?
   
   That is, there is no comment-mangling regression as in 
https://github.com/apache/pulsar-test-infra/issues/47, and no case where the 
box is checked but the bot still complains about doc-label-missing.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: dev-unsubscr...@pulsar.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[GitHub] [pulsar-test-infra] tisonkun commented on pull request #70: Fix sync and remove labels

2022-09-08 Thread GitBox


tisonkun commented on PR #70:
URL: https://github.com/apache/pulsar-test-infra/pull/70#issuecomment-1240338681

   I think we should try to decide which input is the source. Right now,
   
   1. A manual label change by a committer can trigger a PR description 
modification.
   2. A change in the PR description can update the labels.
   
   I suppose we should just use the PR description checkboxes as the source; 
this would also resolve @michaeljmarshall's #47, which asks that the bot not 
update the PR description.
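
   As a concrete sketch of that direction (an editorial illustration; the
checkbox format shown is an assumption, and the real docbot template may
differ):

```
package main

import (
	"fmt"
	"regexp"
)

// Matches checkbox lines such as "- [x] doc-required".
var checkboxRe = regexp.MustCompile(`(?m)^- \[([ xX])\] (doc[a-z-]*)`)

// expectedLabels maps each doc label found in the PR body to whether its box is checked.
func expectedLabels(prBody string) map[string]bool {
	labels := map[string]bool{}
	for _, m := range checkboxRe.FindAllStringSubmatch(prBody, -1) {
		labels[m[2]] = m[1] != " "
	}
	return labels
}

func main() {
	body := "- [x] doc-not-needed\n- [ ] doc-required\n"
	fmt.Println(expectedLabels(body)) // map[doc-not-needed:true doc-required:false]
}
```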


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: dev-unsubscr...@pulsar.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[GitHub] [pulsar-test-infra] maxsxu commented on a diff in pull request #70: Fix sync and remove labels

2022-09-08 Thread GitBox


maxsxu commented on code in PR #70:
URL: https://github.com/apache/pulsar-test-infra/pull/70#discussion_r965611984


##
docbot/action.go:
##
@@ -47,11 +48,17 @@ func NewActionWithClient(ctx context.Context, ac 
*ActionConfig, client *github.C
}
 }
 
+const openedActionType = "opened"
+const editedActionType = "edited"
+const unlabeledActionType = "unlabeled"
+const labeledActionType = "labeled"
+
 func (a *Action) Run(prNumber int, actionType string) error {
a.prNumber = prNumber
+   a.actionType = actionType
 
switch actionType {
-   case "opened", "edited", "labeled", "unlabeled":
+   case openedActionType, editedActionType, labeledActionType, 
unlabeledActionType:

Review Comment:
   Suggest handling the labeled and unlabeled events in a separate case:
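
   For instance, one possible shape (illustrative only, since the suggested
block itself was not included in this digest, and `syncLabels` is a
hypothetical helper):

```
switch actionType {
case openedActionType, editedActionType:
	// Sync labels from the checkboxes in the PR description.
	return a.checkLabels()
case labeledActionType, unlabeledActionType:
	// Handle manual label changes in their own case, e.g. undo edits
	// that contradict the checked boxes (hypothetical helper).
	return a.syncLabels()
}
```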



##
docbot/action.go:
##
@@ -47,11 +48,17 @@ func NewActionWithClient(ctx context.Context, ac 
*ActionConfig, client *github.C
}
 }
 
+const openedActionType = "opened"
+const editedActionType = "edited"
+const unlabeledActionType = "unlabeled"
+const labeledActionType = "labeled"

Review Comment:
   Suggest using one const group like:
   ```
   const (
       openedActionType    = "opened"
       editedActionType    = "edited"
       unlabeledActionType = "unlabeled"
       labeledActionType   = "labeled"
   )
   ```



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: dev-unsubscr...@pulsar.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: Pulsar CI congested, master branch build broken

2022-09-08 Thread Nicolò Boschi
Thanks Lari for the detailed explanation. This is kind of an emergency
situation and I believe your plan is the way to go now.

I already prepared a pull for moving the flaky suite out of the Pulsar CI
workflow: https://github.com/nicoloboschi/pulsar/pull/8
I can take care of the execution of the plan.

> 1. Cancel all existing builds in_progress or queued

I have a script locally that uses GHA to check and cancel the pending runs.
We can extend it to all the queued builds (will share it soon).
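
For illustration, a minimal sketch of that cancellation step against the
GitHub REST API (an editorial sketch, not the actual script; it assumes a
GITHUB_TOKEN environment variable with permission to cancel runs):

```
package main

import (
	"encoding/json"
	"fmt"
	"net/http"
	"os"
)

const repo = "apache/pulsar"

func main() {
	token := os.Getenv("GITHUB_TOKEN")
	for _, status := range []string{"queued", "in_progress"} {
		url := fmt.Sprintf("https://api.github.com/repos/%s/actions/runs?status=%s&per_page=100",
			repo, status)
		req, _ := http.NewRequest("GET", url, nil)
		req.Header.Set("Authorization", "Bearer "+token)
		resp, err := http.DefaultClient.Do(req)
		if err != nil {
			panic(err)
		}
		var page struct {
			WorkflowRuns []struct {
				ID int64 `json:"id"`
			} `json:"workflow_runs"`
		}
		json.NewDecoder(resp.Body).Decode(&page)
		resp.Body.Close()

		// POST .../actions/runs/{id}/cancel cancels a single run.
		for _, run := range page.WorkflowRuns {
			cancelURL := fmt.Sprintf("https://api.github.com/repos/%s/actions/runs/%d/cancel",
				repo, run.ID)
			creq, _ := http.NewRequest("POST", cancelURL, nil)
			creq.Header.Set("Authorization", "Bearer "+token)
			if cresp, err := http.DefaultClient.Do(creq); err == nil {
				cresp.Body.Close()
			}
		}
	}
}
```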

> 2. Edit .asf.yaml and drop the "required checks" requirement for merging
PRs.
> 3. Wait for build to run for .asf.yaml change, merge it

After the pull is out, we'll need to cancel all other workflows that
contributors may inadvertently have triggered.

> 4. Disable all workflows
> 5. Process specific PRs manually to improve the situation.
>- Make GHA workflow improvements such as
https://github.com/apache/pulsar/pull/17491 and
https://github.com/apache/pulsar/pull/17490
>- Quarantine all very flaky tests so that everyone doesn't waste time
with those. It should be possible to merge a PR even when a quarantined
test fails.

In this step we will merge this:
https://github.com/nicoloboschi/pulsar/pull/8

I also want to add to the list this improvement, which reduces runner usage in
case of doc-only or cpp-only changes:
https://github.com/nicoloboschi/pulsar/pull/7


> 6. Rebase PRs (or close and re-open) that would be processed next so that
changes are picked up

It's better to leave this task to the author of the pull, in order not to
create too much load at the same time.

> 7. Enable workflows
> 8. Start processing PRs with checks to see if things are handled in a
better way.
> 9. When things are stable, enable required checks again in .asf.yaml, in
the meantime be careful about merging PRs
> 10. Fix quarantined flaky tests


Nicolò Boschi



[GitHub] [pulsar-test-infra] Anonymitaet commented on pull request #70: Fix sync and remove labels

2022-09-08 Thread GitBox


Anonymitaet commented on PR #70:
URL: https://github.com/apache/pulsar-test-infra/pull/70#issuecomment-1240351605

   @tisonkun thanks!
   
   "One source" (only modifying the PR description triggers the bot) is an 
ideal situation, but we cannot force people not to update labels from the 
**Labels** icon, because it's convenient for committers. That's why we want to 
sync the label info from the **Labels** icon to the label info in the PR 
description.
   
   https://user-images.githubusercontent.com/50226895/189065192-d93f3e7f-3c93-4215-a888-80c6a684f0b1.png
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: dev-unsubscr...@pulsar.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[GitHub] [pulsar-client-node] rkaw92 commented on issue #230: await flush() does not actually flush

2022-09-08 Thread GitBox


rkaw92 commented on issue #230:
URL: 
https://github.com/apache/pulsar-client-node/issues/230#issuecomment-1240355308

   Aha!
   
   
https://github.com/apache/pulsar-client-node/blob/master/src/ProducerConfig.cc#L103
   
   Setting `maxPendingMessagesAcrossPartitions` actually calls 
`pulsar_producer_configuration_set_max_pending_messages`, not 
`pulsar_producer_configuration_set_max_pending_messages_across_partitions`.
   
   So the default stays the same at 50k: 
https://github.com/apache/pulsar/blob/bff34000385f6faf6dbff4385d0dc562602ac623/pulsar-client-cpp/lib/ProducerConfigurationImpl.h#L36
   
   And indeed, I'm only able to reproduce this error on a partitioned topic. 
Let's see:
   * I set maxPendingMessages to 10000 (ten thousand) in code
   * I set maxPendingMessagesAcrossPartitions to some really high value like 1 
million
   * I'm writing to a topic with 16 partitions
   
   So, by my logic, the maximum number I can actually fit between `flush()` 
calls is 50000 (**the default from C++**), divided by the partition count: 
50000 / 16 = 3125. This is regardless of my settings. Let's see if this is true!
   
   ```
   node perf/perf_producer.js -m 3125 -i 10 -u pulsar://mycluster.url --topic 
persistent://rkaw-test/benchmark/perf_producer
   # this completes perfectly
   ```
   
   and...
   ```
   node perf/perf_producer.js -m 3126 -i 10 -u pulsar://mycluster.url --topic 
persistent://rkaw-test/benchmark/perf_producer
   [...]
   [Error: Failed to send message: ProducerQueueIsFull]
   ```
   
   So yeah, it's the configuration not making it to the actual client. I think 
flush() works, after all. Some caveats:
   * It only works with `batchEnabled: true` - flushing seems to have no effect 
if you disable batches (by design?), so you can still get ProducerQueueIsFull 
in non-batch mode
   * This problem only affects publishing to a partitioned topic
   * I'm not sure why the Producer is erroring out so easily, i.e. if you 
exceed the *real* maxPendingMessages by 1. I'd expect the messages to go to 
different partitions - maybe batching is causing them to bunch up in one 
underlying Producer for 1 partition.
   
   I'll try a local fix that actually propagates 
`pulsar_producer_configuration_set_max_pending_messages_across_partitions` and 
see if that alleviates the problem.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: dev-unsubscr...@pulsar.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[GitHub] [pulsar-client-node] rkaw92 opened a new pull request, #231: Respect maxPendingMessagesAcrossPartitions

2022-09-08 Thread GitBox


rkaw92 opened a new pull request, #231:
URL: https://github.com/apache/pulsar-client-node/pull/231

   Fixes #230 
   
   This change should help avoid `ProducerQueueIsFull` errors when producing to 
a partitioned topic with batchingEnabled.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: dev-unsubscr...@pulsar.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: Pulsar CI congested, master branch build broken

2022-09-08 Thread Nicolò Boschi
This is the pull for step 2. https://github.com/apache/pulsar/pull/17539

This is the script I'm going to use to cancel pending workflows.
https://github.com/nicoloboschi/pulsar-validation-tool/blob/master/pulsar-scripts/pulsar-gha/cancel-workflows.js

I'm going to run the script in a few minutes.

I announced on Slack what is happening:
https://apache-pulsar.slack.com/archives/C5ZSVEN4E/p1662624668695339?thread_ts=1662463042.016709&cid=C5ZSVEN4E

>we’re going to execute the plan described in the ML. So any queued actions
will be cancelled. In order to validate your pull it is suggested to run
the actions in your own Pulsar fork. Please don’t re-run failed jobs or
push any other commits to avoid triggering new actions


Nicolò Boschi



Re: Pulsar CI congested, master branch build broken

2022-09-08 Thread Lari Hotari
Thank you Nicolo.
There's lazy consensus, let's go forward with the action plan.

-Lari


[GitHub] [pulsar-test-infra] nodece commented on pull request #70: Fix remove labels

2022-09-08 Thread GitBox


nodece commented on PR #70:
URL: https://github.com/apache/pulsar-test-infra/pull/70#issuecomment-1240448306

   We have discussed this offline and decided to use the PR description as the 
only input. Thanks all.
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: dev-unsubscr...@pulsar.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[GitHub] [pulsar-test-infra] tisonkun commented on a diff in pull request #70: Fix remove labels

2022-09-08 Thread GitBox


tisonkun commented on code in PR #70:
URL: https://github.com/apache/pulsar-test-infra/pull/70#discussion_r965721600


##
docbot/action.go:
##
@@ -47,11 +47,16 @@ func NewActionWithClient(ctx context.Context, ac 
*ActionConfig, client *github.C
}
 }
 
+const (
+   openedActionType = "opened"
+   editedActionType = "edited"
+)
+
 func (a *Action) Run(prNumber int, actionType string) error {
a.prNumber = prNumber
 
switch actionType {
-   case "opened", "edited", "labeled", "unlabeled":
+   case openedActionType, editedActionType:

Review Comment:
   I think we should handle the labeled/unlabeled actions here to undo 
unintended edits?
   
   For example, if only the "doc-not-needed" box is checked and a committer 
adds the "doc" label, we should remove it.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: dev-unsubscr...@pulsar.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[GitHub] [pulsar-test-infra] nodece commented on a diff in pull request #70: Fix remove labels

2022-09-08 Thread GitBox


nodece commented on code in PR #70:
URL: https://github.com/apache/pulsar-test-infra/pull/70#discussion_r965725119


##
docbot/action.go:
##
@@ -47,11 +47,16 @@ func NewActionWithClient(ctx context.Context, ac 
*ActionConfig, client *github.C
}
 }
 
+const (
+   openedActionType = "opened"
+   editedActionType = "edited"
+)
+
 func (a *Action) Run(prNumber int, actionType string) error {
a.prNumber = prNumber
 
switch actionType {
-   case "opened", "edited", "labeled", "unlabeled":
+   case openedActionType, editedActionType:

Review Comment:
   Good catch!



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: dev-unsubscr...@pulsar.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[GitHub] [pulsar-test-infra] nodece commented on pull request #70: Fix remove labels

2022-09-08 Thread GitBox


nodece commented on PR #70:
URL: https://github.com/apache/pulsar-test-infra/pull/70#issuecomment-1240491889

   To publish to the Apache repository, we need PMC operations.
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: dev-unsubscr...@pulsar.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[GitHub] [pulsar-manager] DimitryVlasenko opened a new issue, #487: Set password returns 404

2022-09-08 Thread GitBox


DimitryVlasenko opened a new issue, #487:
URL: https://github.com/apache/pulsar-manager/issues/487

   **Pulsar manager was deployed using official helm chart** 
   
   **when trying to set an admin password as per** 
https://github.com/apache/pulsar-manager#access-pulsar-manager
   
   root@pulsar-pulsar-manager-6b54664667-wfc2g:/pulsar-manager/pulsar-manager# 
curl -H "X-XSRF-TOKEN: $CSRF_TOKEN" -H "Cookie: XSRF-TOKEN=$CSRF_TOKEN;" -H 'Content-Type: application/json' -X PUT http://127.0.0.1:5432/pulsar-manager/users/superuser -d '{"name": "admin", "password": "apachepulsar", "description": "test", "email": "usern...@test.org"}'
   curl: (7) Failed to connect to 127.0.0.1 port 5432: Connection refused
   root@pulsar-pulsar-manager-6b54664667-wfc2g:/pulsar-manager/pulsar-manager# 
curl -H "X-XSRF-TOKEN: $CSRF_TOKEN" -H "Cookie: XSRF-TOKEN=$CSRF_TOKEN;" -H 'Content-Type: application/json' -X PUT http://127.0.0.1:9527/pulsar-manager/users/superuser -d '{"name": "admin", "password": "apachepulsar", "description": "test", "email": "usern...@test.org"}'
   
   **_executed inside the container returns_** 
   
   404 Not Found (nginx/1.18.0)
   
   as well as 
   
   root@pulsar-pulsar-manager-6b54664667-wfc2g:/pulsar-manager/pulsar-manager# 
curl http://127.0.0.1:9527/pulsar-manager/users/superuser
   
   404 Not Found (nginx/1.18.0)
   
   **But the service itself is up and running** 
   
   root@pulsar-pulsar-manager-6b54664667-wfc2g:/pulsar-manager/pulsar-manager# 
curl http://127.0.0.1:9527
   (returns the Pulsar Admin UI page; minified HTML/JS output omitted)

Re: [VOTE] PIP-196 Segmented transaction buffer snapshot

2022-09-08 Thread Xiangying Meng
Hi, community
This proposal has some updates. The latest version of the proposal can be
found here.
Feel free to comment on this doc.
Sincerely,
Xiangying

On Wed, Sep 7, 2022 at 4:55 PM Xiangying Meng  wrote:

> Hi, community
> I'd like to start a vote for PIP-196: Segmented transaction
> buffer snapshot.
> And the discussion can be found here.
>
> Sincerely,
> Xiangying
>
>


[GitHub] [pulsar] jinxiaoyi created a discussion: pulsar-daemon start standalone

2022-09-08 Thread GitBox


GitHub user jinxiaoyi created a discussion: pulsar-daemon start standalone

When I start Pulsar 2.10.1 using `./pulsar-daemon start standalone`, port 3181
is already in use. How can I fix it? May I change the port?

GitHub link: https://github.com/apache/pulsar/discussions/17550


This is an automatically sent email for dev@pulsar.apache.org.
To unsubscribe, please send an email to: dev-unsubscr...@pulsar.apache.org



[GitHub] [pulsar-test-infra] maxsxu commented on a diff in pull request #70: Fix remove labels

2022-09-08 Thread GitBox


maxsxu commented on code in PR #70:
URL: https://github.com/apache/pulsar-test-infra/pull/70#discussion_r966069645


##
docbot/action.go:
##
@@ -130,10 +137,13 @@ func (a *Action) checkLabels() error {
for label, checked := range expectedLabelsMap {
_, found := prLabels[label]
if found {
-   continue
-   }
-   if checked {
-   labelsToAdd[label] = struct{}{}
+   if !checked {
+   labelsToRemove[label] = struct{}{}
+   }
+   } else {
+   if checked {
+   labelsToAdd[label] = struct{}{}
+   }

Review Comment:
   Suggest making code more compact:
   ```
   if _, found := prLabels[label]; found && !checked {
       labelsToRemove[label] = struct{}{}
   } else if !found && checked {
       labelsToAdd[label] = struct{}{}
   }
   ```



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: dev-unsubscr...@pulsar.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[GitHub] [pulsar-test-infra] nodece merged pull request #70: Fix remove labels

2022-09-08 Thread GitBox


nodece merged PR #70:
URL: https://github.com/apache/pulsar-test-infra/pull/70


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: dev-unsubscr...@pulsar.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[GitHub] [pulsar-test-infra] tisonkun commented on pull request #70: Fix remove labels

2022-09-08 Thread GitBox


tisonkun commented on PR #70:
URL: https://github.com/apache/pulsar-test-infra/pull/70#issuecomment-1240869571

   I noticed that this PR doesn't have a diff to remove PR edit logic. Will 
take a closer look now.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: dev-unsubscr...@pulsar.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[GitHub] [pulsar-test-infra] tisonkun commented on issue #47: Remove Doc Bot Logic To Update PR Descriptions

2022-09-08 Thread GitBox


tisonkun commented on issue #47:
URL: 
https://github.com/apache/pulsar-test-infra/issues/47#issuecomment-1240874814

   After #69 & #70 this issue has been resolved. We no longer edit the PR 
description in docbot.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: dev-unsubscr...@pulsar.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[GitHub] [pulsar-test-infra] nodece commented on pull request #70: Fix remove labels

2022-09-08 Thread GitBox


nodece commented on PR #70:
URL: https://github.com/apache/pulsar-test-infra/pull/70#issuecomment-1240875788

   The `addAndCleanupHelpComment()` method is safe; it doesn't add the help 
comment more than once.
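
   For readers outside the repo, the idempotent pattern here is roughly "find
the existing bot comment by a marker and edit it, otherwise create it". A
sketch using go-github (the marker and function name are editorial inventions,
not docbot's actual code; assumes `context`, `strings`, and
`github.com/google/go-github/github` imports):

```
// Hypothetical sketch of an idempotent help comment (not docbot's actual code).
const helpMarker = "<!-- docbot-help -->"

func upsertHelpComment(ctx context.Context, client *github.Client,
	owner, repo string, prNumber int, text string) error {
	body := helpMarker + "\n" + text
	comments, _, err := client.Issues.ListComments(ctx, owner, repo, prNumber, nil)
	if err != nil {
		return err
	}
	// Edit the existing marked comment in place instead of adding another one.
	for _, c := range comments {
		if strings.Contains(c.GetBody(), helpMarker) {
			c.Body = github.String(body)
			_, _, err = client.Issues.EditComment(ctx, owner, repo, c.GetID(), c)
			return err
		}
	}
	_, _, err = client.Issues.CreateComment(ctx, owner, repo, prNumber,
		&github.IssueComment{Body: github.String(body)})
	return err
}
```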


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: dev-unsubscr...@pulsar.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[GitHub] [pulsar-test-infra] nodece commented on issue #47: Remove Doc Bot Logic To Update PR Descriptions

2022-09-08 Thread GitBox


nodece commented on issue #47:
URL: 
https://github.com/apache/pulsar-test-infra/issues/47#issuecomment-1240879014

   Fixed by #69.
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: dev-unsubscr...@pulsar.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[GitHub] [pulsar-test-infra] nodece closed issue #47: Remove Doc Bot Logic To Update PR Descriptions

2022-09-08 Thread GitBox


nodece closed issue #47: Remove Doc Bot Logic To Update PR Descriptions
URL: https://github.com/apache/pulsar-test-infra/issues/47


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: dev-unsubscr...@pulsar.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [DISCUSS] Consumer reconnection causes repeated consumption messages

2022-09-08 Thread 丛搏
Hi Haiting

When using cumulative ack, we can save the maximum received MessageId
on the consumer client side to filter the message duplication caused
by reconnection. If the consumer client process restarts, the maximum
received MessageId will no longer exist in the consumer client. This
requires the user to be responsible for the received messages; if
the user wants to re-consume the received messages, they need to call
`void redeliverUnacknowledgedMessages()` and then clear the maximum
received MessageId from the consumer client.
Thanks,
Bo

Haiting Jiang wrote on Thu, Sep 8, 2022 at 14:51:
>
> From the user's perspective, I think we should always avoid delivering
> repeated messages.
> But can the broker tell if the reconnection is caused by topic
> unloading or consumer client process restarting?
> For the latter case, the message should be redelivered, it's the whole
> point of user explicit acking.
>
> Thanks,
> Haiting
>
> On Thu, Sep 8, 2022 at 10:56 AM 丛搏  wrote:
> >
> > Hello, Pulsar community:
> >
> >
> > Now the consumer does not filter messages that have already been
> > consumed. After consumer reconnection, the broker will dispatch
> > messages to the consumer from the markDeletePosition. In Failover and
> > Exclusive subscription type, all messages in a topic will be
> > dispatched to the same consumer. Let's look at the following example:
> >
> > ```
> > @Test
> > public void testConsumerReconnectionRepeatedConsumeMessage() throws Exception {
> >     final String topic = "testConsumerReconnectionRepeatedConsumeMessage";
> >     @Cleanup
> >     Producer<String> producer = pulsarClient.newProducer(Schema.STRING)
> >             .topic(topic).sendTimeout(0, TimeUnit.SECONDS)
> >             .enableBatching(false).create();
> >     @Cleanup
> >     Consumer<String> consumer = pulsarClient.newConsumer(Schema.STRING)
> >             .subscriptionType(SubscriptionType.Exclusive)
> >             .topic(topic).subscriptionName("test").subscribe();
> >
> >     // send 5 messages
> >     for (int i = 0; i < 5; i++) {
> >         producer.send("test" + i);
> >     }
> >
> >     // the consumer receives 5 messages
> >     for (int i = 0; i < 5; i++) {
> >         consumer.receive();
> >     }
> >
> >     admin.topics().unload(topic);
> >
> >     // the consumer can receive the same 5 messages again
> >     Message<String> message = null;
> >     for (int i = 0; i < 5; i++) {
> >         message = consumer.receive();
> >     }
> >     consumer.acknowledgeCumulativeAsync(message.getMessageId());
> > }
> > ```
> >
> > Through the above example, the consumer repeatedly consumes the 5
> > messages sent by the producer and acks through cumulative ack. If a
> > cumulative ack is sent only once per 1,000 or 10,000 messages, consumer
> > reconnection can cause a lot of repeated consumption. Although this does
> > not affect at-least-once semantics, it causes a lot of useless overhead.
> >
> > Most importantly, it destroys the exactly-once semantics of pulsar
> > transactions.
> >
> > I want to discuss whether we should fix normal and transaction
> > cumulative acks in the same way: prevent repeated consumption of
> > messages due to consumer reconnection, and filter messages that users
> > have already received through `consumer.receive()`. Or should we only
> > guarantee exactly-once semantics, i.e. only guarantee that a cumulative
> > ack used with a transaction will not receive the same messages?
> >
> >
> > Please leave your opinion, thanks! :)
> >
> >
> >
> > Thanks,
> >
> > Bo


Re: Pulsar CI congested, master branch build broken

2022-09-08 Thread Nicolò Boschi
Dear community,

The plan has been executed.
The summary of our actions is:
1. We cancelled all pending jobs (queue and in-progress)
2. We removed the required checks to be able to merge improvements on the
CI workflow
3. We merged a couple of improvements:
   1. worked around the possible bug triggered by job retries. Now
broker flaky tests are in a dedicated workflow
   2. moved known flaky tests to the flaky suite
   3. optimized the runner consumption for docs-only and cpp-only pulls
4. We reactivated the required checks.


Now it's possible to get back to normal life.
1. You must rebase your branch onto the latest master (there's a button for
you in the UI), or alternatively you can close/reopen the pull to trigger the
checks
2. You can merge a pull request if you want
3. You will find a new job in the Checks section called "Pulsar CI / Pulsar
CI checks completed" that indicates the Pulsar CI successfully passed

There's a slight chance that the CI will be stuck again in the next few
days, but we will keep it monitored.

Thanks Lari for the nice work!

Regards,
Nicolò Boschi



[GitHub] [pulsar] gitfy added a comment to the discussion: Pulsar functions : python zip with deps wheel

2022-09-08 Thread GitBox


GitHub user gitfy added a comment to the discussion: Pulsar functions : python 
zip with deps wheel

I found the solution for the above issue.
The expectation is to have the zip file include all the wheel files for the 
dependent libraries. Internally, Pulsar runs the command to install the 
dependencies using the options
```
--no-index --find-links
```
which means the deps are not downloaded from the public repo; rather, it uses 
the local wheels to install them.
Once that's packaged, the deployment succeeds.

It would be nice if the documentation were improved to state these technical 
expectations.
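
For concreteness, a packaging sketch along those lines (the file names and
requirements file are placeholders, not from this discussion):

```
# download wheels for all dependencies next to the function code
pip download -r requirements.txt -d deps/
# verify they install offline, the way Pulsar will
pip install --no-index --find-links deps/ -r requirements.txt
# zip the function together with the wheels
zip -r function.zip src/ deps/
```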


GitHub link: 
https://github.com/apache/pulsar/discussions/17532#discussioncomment-3599901


This is an automatically sent email for dev@pulsar.apache.org.
To unsubscribe, please send an email to: dev-unsubscr...@pulsar.apache.org



Re: [VOTE] PIP-196 Segmented transaction buffer snapshot

2022-09-08 Thread 丛搏
Hi, Xiangying
+1(non-binding)

This PIP overall LGTM! It solves the problem of snapshots not being
able to scale.
We can make some optimizations later:
1. Merge transaction snapshot segments to reduce the number of
segments and the index's size.
2. Load snapshot segments into memory as the transaction buffer (TB) requires,
reducing memory overhead.

Thanks!
Bo



Re: Pulsar CI congested, master branch build broken

2022-09-08 Thread tison
Thank you, Lari and Nicolò!
Best,
tison.



Re: Pulsar CI congested, master branch build broken

2022-09-08 Thread Haiting Jiang
Great work. Thank you, Lari and Nicolò.

BR,
Haiting

On Fri, Sep 9, 2022 at 9:36 AM tison  wrote:
>
> Thank you, Lari and Nicolò!
> Best,
> tison.
>
>
> Nicolò Boschi  于2022年9月9日周五 02:41写道:
>
> > Dear community,
> >
> > The plan has been executed.
> > The summary of our actions is:
> > 1. We cancelled all pending jobs (queue and in-progress)
> > 2. We removed the required checks to be able to merge improvements to the
> > CI workflow
> > 3. We merged a couple of improvements:
> >    1. worked around the possible bug triggered by job retries. Now
> > broker flaky tests run in a dedicated workflow
> >    2. moved known flaky tests to the flaky suite
> >    3. optimized the runner consumption for docs-only and cpp-only pulls
> > 4. We reactivated the required checks.
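
For reference, steps 2 and 4 are small edits to the branch-protection section of `.asf.yaml`. A minimal sketch of the re-enabled state, assuming the new "Pulsar CI / Pulsar CI checks completed" job is the required context (the actual contexts in apache/pulsar's `.asf.yaml` may differ):

```yaml
# Sketch only: the exact file in apache/pulsar may list more contexts.
github:
  protected_branches:
    master:
      required_status_checks:
        # names of the checks that must pass before a PR can be merged
        contexts:
          - Pulsar CI / Pulsar CI checks completed
```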
> >
> >
> > Now it's possible to come back to normal life.
> > 1. You must rebase your branch onto the latest master (there's a button for
> > you in the UI), or alternatively you can close/reopen the pull to trigger the
> > checks
> > 2. You can merge a pull request if you want
> > 3. You will find a new job in the Checks section called "Pulsar CI / Pulsar
> > CI checks completed" that indicates the Pulsar CI successfully passed
> >
> > There's a slight chance that the CI will be stuck again in the next few
> > days, but we will keep monitoring it.
> >
> > Thanks Lari for the nice work!
> >
> > Regards,
> > Nicolò Boschi
> >
> >
> > Il giorno gio 8 set 2022 alle ore 10:55 Lari Hotari 
> > ha
> > scritto:
> >
> > > Thank you Nicolo.
> > > There's lazy consensus, let's go forward with the action plan.
> > >
> > > -Lari
> > >
> > > On 2022/09/08 08:16:05 Nicolò Boschi wrote:
> > > > This is the pull for step 2.
> > https://github.com/apache/pulsar/pull/17539
> > > >
> > > > This is the script I'm going to use to cancel pending workflows.
> > > >
> > >
> > https://github.com/nicoloboschi/pulsar-validation-tool/blob/master/pulsar-scripts/pulsar-gha/cancel-workflows.js
> > > >
> > > > I'm going to run the script in minutes.
> > > >
> > > > I posted on Slack about what is happening:
> > > >
> > >
> > https://apache-pulsar.slack.com/archives/C5ZSVEN4E/p1662624668695339?thread_ts=1662463042.016709&cid=C5ZSVEN4E
> > > >
> > > > >we’re going to execute the plan described in the ML. So any queued
> > > actions
> > > > will be cancelled. In order to validate your pull it is suggested to
> > run
> > > > the actions in your own Pulsar fork. Please don’t re-run failed jobs or
> > > > push any other commits to avoid triggering new actions
> > > >
> > > >
> > > > Nicolò Boschi
> > > >
> > > >
> > > > Il giorno gio 8 set 2022 alle ore 09:42 Nicolò Boschi <
> > > boschi1...@gmail.com>
> > > > ha scritto:
> > > >
> > > > > Thanks Lari for the detailed explanation. This is kind of an
> > emergency
> > > > > situation and I believe your plan is the way to go now.
> > > > >
> > > > > I already prepared a pull for moving the flaky suite out of the
> > Pulsar
> > > CI
> > > > > workflow: https://github.com/nicoloboschi/pulsar/pull/8
> > > > > I can take care of the execution of the plan.
> > > > >
> > > > > > 1. Cancel all existing builds in_progress or queued
> > > > >
> > > > > I have a script locally that uses GHA to check and cancel the pending
> > > > > runs. We can extend it to all the queued builds (will share it soon).
> > > > >
> > > > > > 2. Edit .asf.yaml and drop the "required checks" requirement for
> > > merging
> > > > > PRs.
> > > > > > 3. Wait for build to run for .asf.yaml change, merge it
> > > > >
> > > > > After the pull is out, we'll need to cancel all other workflows that
> > > > > contributors may inadvertently have triggered.
> > > > >
> > > > > > 4. Disable all workflows
> > > > > > 5. Process specific PRs manually to improve the situation.
> > > > > >- Make GHA workflow improvements such as
> > > > > https://github.com/apache/pulsar/pull/17491 and
> > > > > https://github.com/apache/pulsar/pull/17490
> > > > > >- Quarantine all very flaky tests so that everyone doesn't waste
> > > time
> > > > > with those. It should be possible to merge a PR even when a
> > quarantined
> > > > > test fails.
> > > > >
> > > > > in this step we will merge this
> > > > > https://github.com/nicoloboschi/pulsar/pull/8
> > > > >
> > > > > I want to add to the list this improvement to reduce runners usage in
> > > case
> > > > > of doc or cpp changes.
> > > > > https://github.com/nicoloboschi/pulsar/pull/7
> > > > >
> > > > >
> > > > > > 6. Rebase PRs (or close and re-open) that would be processed next
> > so
> > > > > that changes are picked up
> > > > >
> > > > > It's better to leave this task to the author of the pull in order to
> > > not
> > > > > create too much load at the same time
> > > > >
> > > > > > 7. Enable workflows
> > > > > > 8. Start processing PRs with checks to see if things are handled
> > in a
> > > > > better way.
> > > > > > 9. When things are stable, enable required checks again in
> > > .asf.yaml, in
> > > > > the meantime be careful ab

Re: Pulsar CI congested, master branch build broken

2022-09-08 Thread Michael Marshall
Fantastic, thank you Lari and Nicolò!

- Michael

On Thu, Sep 8, 2022 at 9:03 PM Haiting Jiang  wrote:
>
> Great work. Thank you, Lari and Nicolò.
>
> BR,
> Haiting
>
> On Fri, Sep 9, 2022 at 9:36 AM tison  wrote:
> >
> > Thank you, Lari and Nicolò!
> > Best,
> > tison.
> >
> >
> > Nicolò Boschi  于2022年9月9日周五 02:41写道:
> >
> > > Dear community,
> > >
> > > The plan has been executed.
> > > The summary of our actions is:
> > > 1. We cancelled all pending jobs (queue and in-progress)
> > > 2. We removed the required checks to be able to merge improvements to the
> > > CI workflow
> > > 3. We merged a couple of improvements:
> > >    1. worked around the possible bug triggered by job retries. Now
> > > broker flaky tests run in a dedicated workflow
> > >    2. moved known flaky tests to the flaky suite
> > >    3. optimized the runner consumption for docs-only and cpp-only pulls
> > > 4. We reactivated the required checks.
> > >
> > >
> > > Now it's possible to come back to normal life.
> > > 1. You must rebase your branch onto the latest master (there's a button for
> > > you in the UI), or alternatively you can close/reopen the pull to trigger the
> > > checks
> > > 2. You can merge a pull request if you want
> > > 3. You will find a new job in the Checks section called "Pulsar CI / 
> > > Pulsar
> > > CI checks completed" that indicates the Pulsar CI successfully passed
> > >
> > > There's a slight chance that the CI will be stuck again in the next few
> > > days, but we will keep monitoring it.
> > >
> > > Thanks Lari for the nice work!
> > >
> > > Regards,
> > > Nicolò Boschi
> > >
> > >
> > > Il giorno gio 8 set 2022 alle ore 10:55 Lari Hotari 
> > > ha
> > > scritto:
> > >
> > > > Thank you Nicolo.
> > > > There's lazy consensus, let's go forward with the action plan.
> > > >
> > > > -Lari
> > > >
> > > > On 2022/09/08 08:16:05 Nicolò Boschi wrote:
> > > > > This is the pull for step 2.
> > > https://github.com/apache/pulsar/pull/17539
> > > > >
> > > > > This is the script I'm going to use to cancel pending workflows.
> > > > >
> > > >
> > > https://github.com/nicoloboschi/pulsar-validation-tool/blob/master/pulsar-scripts/pulsar-gha/cancel-workflows.js
> > > > >
> > > > > I'm going to run the script in minutes.
> > > > >
> > > > > I posted on Slack about what is happening:
> > > > >
> > > >
> > > https://apache-pulsar.slack.com/archives/C5ZSVEN4E/p1662624668695339?thread_ts=1662463042.016709&cid=C5ZSVEN4E
> > > > >
> > > > > >we’re going to execute the plan described in the ML. So any queued
> > > > actions
> > > > > will be cancelled. In order to validate your pull it is suggested to
> > > run
> > > > > the actions in your own Pulsar fork. Please don’t re-run failed jobs 
> > > > > or
> > > > > push any other commits to avoid triggering new actions
> > > > >
> > > > >
> > > > > Nicolò Boschi
> > > > >
> > > > >
> > > > > Il giorno gio 8 set 2022 alle ore 09:42 Nicolò Boschi <
> > > > boschi1...@gmail.com>
> > > > > ha scritto:
> > > > >
> > > > > > Thanks Lari for the detailed explanation. This is kind of an
> > > emergency
> > > > > > situation and I believe your plan is the way to go now.
> > > > > >
> > > > > > I already prepared a pull for moving the flaky suite out of the
> > > Pulsar
> > > > CI
> > > > > > workflow: https://github.com/nicoloboschi/pulsar/pull/8
> > > > > > I can take care of the execution of the plan.
> > > > > >
> > > > > > > 1. Cancel all existing builds in_progress or queued
> > > > > >
> > > > > > I have a script locally that uses GHA to check and cancel the 
> > > > > > pending
> > > > > > runs. We can extend it to all the queued builds (will share it 
> > > > > > soon).
> > > > > >
> > > > > > > 2. Edit .asf.yaml and drop the "required checks" requirement for
> > > > merging
> > > > > > PRs.
> > > > > > > 3. Wait for build to run for .asf.yaml change, merge it
> > > > > >
> > > > > > After the pull is out, we'll need to cancel all other workflows that
> > > > > > contributors may inadvertently have triggered.
> > > > > >
> > > > > > > 4. Disable all workflows
> > > > > > > 5. Process specific PRs manually to improve the situation.
> > > > > > >- Make GHA workflow improvements such as
> > > > > > https://github.com/apache/pulsar/pull/17491 and
> > > > > > https://github.com/apache/pulsar/pull/17490
> > > > > > >- Quarantine all very flaky tests so that everyone doesn't 
> > > > > > > waste
> > > > time
> > > > > > with those. It should be possible to merge a PR even when a
> > > quarantined
> > > > > > test fails.
> > > > > >
> > > > > > in this step we will merge this
> > > > > > https://github.com/nicoloboschi/pulsar/pull/8
> > > > > >
> > > > > > I want to add to the list this improvement to reduce runners usage 
> > > > > > in
> > > > case
> > > > > > of doc or cpp changes.
> > > > > > https://github.com/nicoloboschi/pulsar/pull/7
> > > > > >
> > > > > >
> > > > > > > 6. Rebase PRs (or close and re-open) that would be processed next
> > > so
> > > > > > that changes

Re: [VOTE] PIP-196 Segmented transaction buffer snapshot

2022-09-08 Thread PengHui Li
+1(binding)

 I have done the review on the Google doc.
Please also update the GitHub issue (PIP).

Thanks,
Penghui

On Fri, Sep 9, 2022 at 9:31 AM 丛搏  wrote:

> Hi, Xiangying
> +1(non-binding)
>
> This PIP overall LGTM! It solves the problem of snapshots not being
> able to scale.
> We can make some optimizations later:
> 1. Merge transaction snapshot segments to reduce the number of
> segments and the index's size.
> 2. Load snapshot segments into memory only as the transaction buffer (TB)
> requires them, reducing memory overhead.
>
> Thanks!
> Bo
>
> Xiangying Meng  于2022年9月8日周四 22:17写道:
> >
> > Hi, community
> > This proposal has some updates. The latest version of the proposal can be
> > found here
> > <
> https://docs.google.com/document/d/1hBk2nGcj0Os-ULi2q404gCoxIsPleGY8p5H1hqWD5kI/edit#
> >
> > .
> > Feel free to comment on this doc.
> > Sincerely,
> > Xiangying
> >
> > On Wed, Sep 7, 2022 at 4:55 PM Xiangying Meng 
> wrote:
> >
> > > Hi, community
> > > I'd like to start a vote for the PIP-196
> > > Segmented transaction buffer snapshot.
> > > And the discussion can be found here.
> > >
> > > Sincerely,
> > > Xiangying
> > >
> > >
>


Re: [DISCUSS] Consumer reconnection causes repeated consumption messages

2022-09-08 Thread Haiting Jiang
Hi Bo,

Overall it makes sense to me.
It is basically the same as the broker-side deduplication mechanism for
producing messages, which uses `sequenceId`.
In your case, messageId is used for deduplication. It should work as
long as the received messageId increases monotonically.
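
For illustration, a minimal sketch of such a client-side filter (the `DedupReceiver` wrapper and its `lastReceived` field are names invented for this example, not part of the Pulsar client API):

```java
import org.apache.pulsar.client.api.Consumer;
import org.apache.pulsar.client.api.Message;
import org.apache.pulsar.client.api.MessageId;
import org.apache.pulsar.client.api.PulsarClientException;

public class DedupReceiver {
    // highest MessageId already handed to the application
    private MessageId lastReceived = MessageId.earliest;

    /** Receive, dropping anything at or below the last delivered position. */
    public Message<String> receiveDeduped(Consumer<String> consumer)
            throws PulsarClientException {
        while (true) {
            Message<String> msg = consumer.receive();
            // valid as long as ids from a single partition increase monotonically
            if (msg.getMessageId().compareTo(lastReceived) > 0) {
                lastReceived = msg.getMessageId();
                return msg;
            }
            // duplicate delivered after a reconnection: skip it
        }
    }
}
```

Note this only covers a single partition, which matches the Failover and Exclusive cases discussed here; resetting the cursor (discussed just below) would also require clearing `lastReceived`.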

So we should be careful of any operations that would reset the cursor.
For example, if the user resets the cursor with the admin client. We
need more detail info on this matter.
And I am not sure if there are other operations that would reset the
cursor implicitly.

Thanks,
Haiting

On Thu, Sep 8, 2022 at 11:36 PM 丛搏  wrote:
>
> Hi Haiting
>
> When using cumulative ack, we can save the maximum received MessageId
> on the consumer client side to filter out the message duplication caused
> by reconnection. If the consumer client process restarts, the maximum
> received MessageId will no longer exist in the consumer client. This
> requires the user to be responsible for the received messages; if
> the user wants to re-consume the received messages, they need to call
> `void redeliverUnacknowledgedMessages()` and then clear the maximum
> received MessageId from the consumer client.
>
> Thanks,
> Bo
>
> Haiting Jiang  于2022年9月8日周四 14:51写道:
> >
> > From the user's perspective, I think we should always avoid delivering
> > repeated messages.
> > But can the broker tell if the reconnection is caused by topic
> > unloading or consumer client process restarting?
> > For the latter case, the message should be redelivered, it's the whole
> > point of user explicit acking.
> >
> > Thanks,
> > Haiting
> >
> > On Thu, Sep 8, 2022 at 10:56 AM 丛搏  wrote:
> > >
> > > Hello, Pulsar community:
> > >
> > >
> > > Now the consumer does not filter messages that have already been
> > > consumed. After consumer reconnection, the broker will dispatch
> > > messages to the consumer from the markDeletePosition. In Failover and
> > > Exclusive subscription types, all messages in a topic will be
> > > dispatched to the same consumer. Let's look at the following example:
> > >
> > > ```
> > >
> > > @Test
> > >
> > > public void testConsumerReconnectionRepeatedConsumeMessage()
> > > throws Exception {
> > >
> > > final String topic = 
> > > "testConsumerReconnectionRepeatedConsumeMessage";
> > >
> > > @Cleanup
> > >
> > > Producer<String> producer =
> > > pulsarClient.newProducer(Schema.STRING)
> > >
> > > .topic(topic).sendTimeout(0,
> > > TimeUnit.SECONDS).enableBatching(false).create();
> > >
> > > @Cleanup
> > >
> > > Consumer<String> consumer =
> > > pulsarClient.newConsumer(Schema.STRING).subscriptionType(SubscriptionType.Exclusive)
> > >
> > > .topic(topic).subscriptionName("test").subscribe();
> > >
> > >
> > > // send 5 message
> > >
> > > for (int i = 0; i < 5; i++) {
> > >
> > > producer.send("test" + i);
> > >
> > > }
> > >
> > >
> > > // consumer receive 5 messages
> > >
> > > for (int i = 0; i < 5; i++) {
> > >
> > > consumer.receive();
> > >
> > > }
> > >
> > >
> > > admin.topics().unload(topic);
> > >
> > >
> > > // consumer receive also can receive 5 messages
> > >
> > > Message<String> message = null;
> > >
> > > for (int i = 0; i < 5; i++) {
> > >
> > > message = consumer.receive();
> > >
> > > }
> > >
> > > consumer.acknowledgeCumulativeAsync(message.getMessageId());
> > >
> > > }
> > >
> > > ```
> > >
> > > Through the above example, the consumer repeatedly consumes the 5
> > > messages sent by the producer and acks through cumulative ack. If we
> > > cumulatively ack only once per 1,000 messages, there will be a lot of
> > > repeated consumption that may be caused by consumer reconnection.
> > > Although it does not affect at-least-once semantics, it will
> > > cause a lot of useless overhead.
> > >
> > >
> > > Most importantly, it breaks the exactly-once semantics of Pulsar
> > > transactions.
> > >
> > >
> > > I want to discuss whether we should fix normal and transactional
> > > cumulative acks in the same way: prevent repeated consumption of
> > > messages due to consumer reconnection, and filter out messages that users
> > > have already received through `consumer.receive()`. Or do we only guarantee
> > > exactly-once semantics, i.e., only guarantee that a transaction will not
> > > receive the same messages when using cumulative ack with the transaction?
> > >
> > >
> > > Please leave your opinion, thanks! :)
> > >
> > >
> > >
> > > Thanks,
> > >
> > > Bo


Re: Enable "Update branch" Github button in pull requests

2022-09-08 Thread Hang Chen
Hi Nicolo,
 Thanks for your proposal. After Pulsar enabled the `update
branch` button, I found a lot of PRs blocked on unit tests
because the `update branch` action re-runs the whole unit test suite. The
`update branch` action keeps CI resources busy and wastes a lot
of them if contributors or committers click the `update branch`
button many times for each PR. I think we need to add some
restrictions on the `update branch` button. Do you have any ideas?

Thanks,
Hang
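
One lightweight restriction that could be considered (sketched here as an illustration, not something already in the repo) is GitHub Actions' built-in concurrency control, which makes a re-triggered run cancel the in-progress run for the same pull request:

```yaml
# In each workflow file: a new run for the same pull request cancels
# the one still in progress instead of piling up in the queue.
concurrency:
  group: ${{ github.workflow }}-${{ github.event.pull_request.number || github.ref }}
  cancel-in-progress: true
```

With this in every workflow, repeated clicks of the button would supersede runs rather than stack them.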

Qiang Huang  于2022年8月24日周三 19:50写道:
>
> +1. It can help a lot.
>
> Zixuan Liu  于2022年8月22日周一 10:38写道:
>
> > +1
> >
> > Best,
> > Zixuan
> >
> > On 2022/07/12 16:06:43 Nicolò Boschi wrote:
> > > Hi all,
> > >
> > > I'd like to propose to enable the Github button "update branch" in the
> > pull
> > > requests.
> > >
> > > The main reason is that it helps when you want to rebase your pull to the
> > > current master.
> > > For instance today a fix for a failing job in the CI has been committed
> > and
> > > you were forced to close/reopen the pull or manual rebase your branch.
> > With
> > > this button is much easier and it will retrigger the CI as well.
> > >
> > > This is the guide to enable it:
> > >
> > https://docs.github.com/en/pull-requests/collaborating-with-pull-requests/proposing-changes-to-your-work-with-pull-requests/keeping-your-pull-request-in-sync-with-the-base-branch
> > >
> > > This is an example where I activated the button in my own Pulsar fork:
> > > https://github.com/nicoloboschi/pulsar/pull/4
> > >
> > > Thanks,
> > > Nicolò Boschi
> > >
>
>
>
> --
> BR,
> Qiang Huang


Re: [DISCUSS] Consumer reconnection causes repeated consumption messages

2022-09-08 Thread Michael Marshall
I agree that reducing unnecessary duplicates is a good goal.

For the topic unloading case, it might help to think about how we can
improve the protocol. The current handover is very blunt. A new
solution could aim to decrease duplicate message delivery while
also decreasing the time during which the topic is unavailable.

This thread probably captures a good requirement to keep in mind when
thinking about ways to improve the topic handover logic, which has been
discussed as a potential load balancing enhancement or a 3.0 enhancement.

> So we should be careful of any operations that would reset the cursor.
> For example, if the user resets the cursor with the admin client. We
> need more detail info on this matter.

This is a great point! In this case, it seems important to redeliver
these messages. (For those unfamiliar, I just confirmed that the
broker disconnects consumers when a cursor is reset.)

Thanks,
Michael

On Thu, Sep 8, 2022 at 9:33 PM Haiting Jiang  wrote:
>
> Hi Bo,
>
> Overall it makes sense to me.
> It is basically the same as the broker-side deduplication mechanism for
> producing messages, which uses `sequenceId`.
> In your case, messageId is used for deduplication. It should work as
> long as the received messageId increases monotonically.
>
> So we should be careful of any operations that would reset the cursor.
> For example, if the user resets the cursor with the admin client. We
> need more detail info on this matter.
> And I am not sure if there are other operations that would reset the
> cursor implicitly.
>
> Thanks,
> Haiting
>
> On Thu, Sep 8, 2022 at 11:36 PM 丛搏  wrote:
> >
> > Hi Haiting
> >
> > When using cumulative ack, we can save the maximum received MessageId
> > on the consumer client side to filter out the message duplication caused
> > by reconnection. If the consumer client process restarts, the maximum
> > received MessageId will no longer exist in the consumer client. This
> > requires the user to be responsible for the received messages; if
> > the user wants to re-consume the received messages, they need to call
> > `void redeliverUnacknowledgedMessages()` and then clear the maximum
> > received MessageId from the consumer client.
> >
> > Thanks,
> > Bo
> >
> > Haiting Jiang  于2022年9月8日周四 14:51写道:
> > >
> > > From the user's perspective, I think we should always avoid delivering
> > > repeated messages.
> > > But can the broker tell if the reconnection is caused by topic
> > > unloading or consumer client process restarting?
> > > For the latter case, the message should be redelivered, it's the whole
> > > point of user explicit acking.
> > >
> > > Thanks,
> > > Haiting
> > >
> > > On Thu, Sep 8, 2022 at 10:56 AM 丛搏  wrote:
> > > >
> > > > Hello, Pulsar community:
> > > >
> > > >
> > > > Now the consumer does not filter messages that have already been
> > > > consumed. After consumer reconnection, the broker will dispatch
> > > > messages to the consumer from the markDeletePosition. In Failover and
> > > > Exclusive subscription types, all messages in a topic will be
> > > > dispatched to the same consumer. Let's look at the following example:
> > > >
> > > > ```
> > > >
> > > > @Test
> > > >
> > > > public void testConsumerReconnectionRepeatedConsumeMessage()
> > > > throws Exception {
> > > >
> > > > final String topic = 
> > > > "testConsumerReconnectionRepeatedConsumeMessage";
> > > >
> > > > @Cleanup
> > > >
> > > > Producer<String> producer =
> > > > pulsarClient.newProducer(Schema.STRING)
> > > >
> > > > .topic(topic).sendTimeout(0,
> > > > TimeUnit.SECONDS).enableBatching(false).create();
> > > >
> > > > @Cleanup
> > > >
> > > > Consumer<String> consumer =
> > > > pulsarClient.newConsumer(Schema.STRING).subscriptionType(SubscriptionType.Exclusive)
> > > >
> > > > .topic(topic).subscriptionName("test").subscribe();
> > > >
> > > >
> > > > // send 5 message
> > > >
> > > > for (int i = 0; i < 5; i++) {
> > > >
> > > > producer.send("test" + i);
> > > >
> > > > }
> > > >
> > > >
> > > > // consumer receive 5 messages
> > > >
> > > > for (int i = 0; i < 5; i++) {
> > > >
> > > > consumer.receive();
> > > >
> > > > }
> > > >
> > > >
> > > > admin.topics().unload(topic);
> > > >
> > > >
> > > > // consumer receive also can receive 5 messages
> > > >
> > > > Message<String> message = null;
> > > >
> > > > for (int i = 0; i < 5; i++) {
> > > >
> > > > message = consumer.receive();
> > > >
> > > > }
> > > >
> > > > consumer.acknowledgeCumulativeAsync(message.getMessageId());
> > > >
> > > > }
> > > >
> > > > ```
> > > >
> > > > Through the above example, the consumer repeatedly consumes the 5
> > > > messages sent by the producer and acks through cumulative ack. If we
> > > > cumulatively ack only once per 1,000 messages, there will be a lot of
> > > > repeated consumption that may be caused by consumer reco

[GitHub] [pulsar] michaeljmarshall added a comment to the discussion: pulsar-daemon start standalone

2022-09-08 Thread GitBox


GitHub user michaeljmarshall added a comment to the discussion: pulsar-daemon 
start standalone

This is the bookkeeper port. You can update it in the `conf/standalone.conf` by 
setting the following to a different port:

```conf
bookiePort=3181
```

GitHub link: 
https://github.com/apache/pulsar/discussions/17550#discussioncomment-3603331


This is an automatically sent email for dev@pulsar.apache.org.
To unsubscribe, please send an email to: dev-unsubscr...@pulsar.apache.org



Re: Enable "Update branch" Github button in pull requests

2022-09-08 Thread Michael Marshall
That is interesting feedback, Hang. I wasn't familiar with the button
when it appeared (even though I had read this email thread), and I
definitely hit the "update branch" button several times thinking I was
resolving conflicts.

Perhaps this new button contributed to the CI congestion we saw recently?

Thanks,
Michael

On Thu, Sep 8, 2022 at 10:56 PM Hang Chen  wrote:
>
> Hi Nicolo,
>  Thanks for your proposal. After Pulsar enabled the `update
> branch` button, I found a lot of PRs blocked on unit tests
> because the `update branch` action re-runs the whole unit test suite. The
> `update branch` action keeps CI resources busy and wastes a lot
> of them if contributors or committers click the `update branch`
> button many times for each PR. I think we need to add some
> restrictions on the `update branch` button. Do you have any ideas?
>
> Thanks,
> Hang
>
> Qiang Huang  于2022年8月24日周三 19:50写道:
> >
> > +1. It can help a lot.
> >
> > Zixuan Liu  于2022年8月22日周一 10:38写道:
> >
> > > +1
> > >
> > > Best,
> > > Zixuan
> > >
> > > On 2022/07/12 16:06:43 Nicolò Boschi wrote:
> > > > Hi all,
> > > >
> > > > I'd like to propose to enable the Github button "update branch" in the
> > > pull
> > > > requests.
> > > >
> > > > The main reason is that it helps when you want to rebase your pull to 
> > > > the
> > > > current master.
> > > > For instance today a fix for a failing job in the CI has been committed
> > > and
> > > > you were forced to close/reopen the pull or manual rebase your branch.
> > > With
> > > > this button is much easier and it will retrigger the CI as well.
> > > >
> > > > This is the guide to enable it:
> > > >
> > > https://docs.github.com/en/pull-requests/collaborating-with-pull-requests/proposing-changes-to-your-work-with-pull-requests/keeping-your-pull-request-in-sync-with-the-base-branch
> > > >
> > > > This is an example where I activated the button in my own Pulsar fork:
> > > > https://github.com/nicoloboschi/pulsar/pull/4
> > > >
> > > > Thanks,
> > > > Nicolò Boschi
> > > >
> >
> >
> >
> > --
> > BR,
> > Qiang Huang


Re: Enable "Update branch" Github button in pull requests

2022-09-08 Thread Lari Hotari
Good points Hang. Yes, let's remove the "update branch" button since it seems 
to lead to unnecessary load for CI.

-Lari

On 2022/09/09 03:55:57 Hang Chen wrote:
> Hi Nicolo,
>  Thanks for your proposal. After Pulsar enabled the `update
> branch` button, I found a lot of PRs blocked on unit tests
> because the `update branch` action re-runs the whole unit test suite. The
> `update branch` action keeps CI resources busy and wastes a lot
> of them if contributors or committers click the `update branch`
> button many times for each PR. I think we need to add some
> restrictions on the `update branch` button. Do you have any ideas?
> 
> Thanks,
> Hang
> 
> Qiang Huang  于2022年8月24日周三 19:50写道:
> >
> > +1. It can help a lot.
> >
> > Zixuan Liu  于2022年8月22日周一 10:38写道:
> >
> > > +1
> > >
> > > Best,
> > > Zixuan
> > >
> > > On 2022/07/12 16:06:43 Nicolò Boschi wrote:
> > > > Hi all,
> > > >
> > > > I'd like to propose to enable the Github button "update branch" in the
> > > pull
> > > > requests.
> > > >
> > > > The main reason is that it helps when you want to rebase your pull to 
> > > > the
> > > > current master.
> > > > For instance, today a fix for a failing job in the CI was committed
> > > and
> > > > you were forced to close/reopen the pull or manually rebase your branch.
> > > With
> > > > this button it is much easier, and it will retrigger the CI as well.
> > > >
> > > > This is the guide to enable it:
> > > >
> > > https://docs.github.com/en/pull-requests/collaborating-with-pull-requests/proposing-changes-to-your-work-with-pull-requests/keeping-your-pull-request-in-sync-with-the-base-branch
> > > >
> > > > This is an example where I activated the button in my own Pulsar fork:
> > > > https://github.com/nicoloboschi/pulsar/pull/4
> > > >
> > > > Thanks,
> > > > Nicolò Boschi
> > > >
> >
> >
> >
> > --
> > BR,
> > Qiang Huang
> 


[GitHub] [pulsar] michaeljmarshall added a comment to the discussion: How do I convert a None-partitioned topic to Partitioned topic

2022-09-08 Thread GitBox


GitHub user michaeljmarshall added a comment to the discussion: How do I 
convert a None-partitioned topic to Partitioned topic

There is currently no way to convert a non-partitioned topic to a partitioned 
topic. If you want to keep the same topic name and there is no data 
in the current topic (or you're willing to lose the current data), you can 
delete and re-create the topic, as sketched below.
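
A minimal sketch of that flow with the Java admin client (the service URL and the partition count are placeholders, and the delete is destructive):

```java
import org.apache.pulsar.client.admin.PulsarAdmin;

public class RecreateAsPartitioned {
    public static void main(String[] args) throws Exception {
        String topic = "persistent://public/default/my-topic";
        try (PulsarAdmin admin = PulsarAdmin.builder()
                .serviceHttpUrl("http://localhost:8080") // placeholder URL
                .build()) {
            // WARNING: drops whatever data the non-partitioned topic holds
            admin.topics().delete(topic);
            // re-create under the same name, now with 4 partitions (arbitrary)
            admin.topics().createPartitionedTopic(topic, 4);
        }
    }
}
```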

GitHub link: 
https://github.com/apache/pulsar/discussions/17469#discussioncomment-3603519


This is an automatically sent email for dev@pulsar.apache.org.
To unsubscribe, please send an email to: dev-unsubscr...@pulsar.apache.org



Re: [DISCUSS] User-friendly acknowledgeCumulative API on a partitioned topic or multi-topics

2022-09-08 Thread Shivji Kumar Jha
Tarun is a colleague at Nutanix who is very eager to contribute his first patch to
an Apache project :-)

Regards,
Shivji Kumar Jha
http://www.shivjijha.com/
+91 8884075512


On Wed, 7 Sept 2022 at 17:05, Yunze Xu  wrote:

> Sure. I’m glad to see that. Just a little confused about who Tarun is.
>
> Thanks,
> Yunze
>
>
>
>
> > On Sep 6, 2022, at 17:40, Shivji Kumar Jha  wrote:
> >
> > ++ Tarun
> >
> > Hi Yunze,
> >
> > We would love to have this.
> >
> > ```java
> > // the key is the partitioned topic name like my-topic-partition-0
> > void acknowledgeCumulative(Map<String, MessageId> topicToMessageId);
> > ```
> >
> > If you are busy with other things, do you mind Tarun taking this up ?
> Happy
> > to have you as a reviewer.
> >
> > Regards,
> > Shivji Kumar Jha
> > http://www.shivjijha.com/
> > +91 8884075512
> >
> >
> > On Sun, 4 Sept 2022 at 21:25, Yunze Xu 
> wrote:
> >
> >> I am busy on other things recently so there is no further update. But
> >> I found there is already two methods to acknowledge multiple messages
> >> in Java client.
> >>
> >> ```java
> >>    void acknowledge(Messages<?> messages) throws PulsarClientException;
> >>
> >>    void acknowledge(List<MessageId> messageIdList) throws
> >> PulsarClientException;
> >> ```
> >>
> >> And here is the issue to track the catch up:
> >>
> >> https://github.com/apache/pulsar/issues/17428
> >>
> >> Yunze
> >>
> >>
> >>
> >>
> >>> On Sep 4, 2022, at 22:37, Asaf Mesika  wrote:
> >>>
> >>> What eventually happened with this idea?
> >>>
> >>> On Fri, Jul 29, 2022 at 8:02 AM PengHui Li 
> >> wrote:
> >>>
>  +1
> 
>  Penghui
>  On Jul 28, 2022, 20:14 +0800, lordcheng10 <1572139...@qq.com.invalid
> >,
>  wrote:
> > Nice feature!
> >
> >
> >
> >
> > -- Original --
> > From: "Yunze Xu"
> > Date: Fri, Jul 15, 2022, 6:04 PM
> > To: "dev"
> > Subject: [DISCUSS] User-friendly acknowledgeCumulative API on a
>  partitioned topic or multi-topics
> >
> >
> >
> > Hi all,
> >
> > Long days ago I opened a PR to support cumulative acknowledgement
> > for C++ client, but it's controversial about whether should a
> > partitioned consumer acknowledge a message ID cumulatively.
> >
> > See https://github.com/apache/pulsar/pull/6796 for discussion.
> >
> > Currently, the Java client acknowledges the specific partition of the
> > message ID, while the C++ client just fails when calling
> > `acknowledgeCumulative` on a partitioned topic. However, even if the
> > Java client doesn't fail, it's not user friendly.
> >
> > Assuming users called `acknowledgeCumulative` periodically, there is
> a
> > chance that some messages of a specific partition have never been
> > passed to the method.
> >
> > For example, a consumer received:
> >
> > P0-M0, P1-M0, P0-M1, P1-M1, P0-M2, P1-M2...
> >
> > And the user acknowledged every two messages, i.e.
> >
> > P0-M0, P0-M1, P0-M2
> >
> > Eventually, partition 1 has never been acknowledged.
> >
> > Users must maintain their own `Map<String, MessageId>` to use it on a
> > partitioned topic or multi-topics consumer with the existing
> > `acknowledgeCumulative` API.
> >
> > Should we make it more friendly for users? For example, we can make
> > `acknowledgeCumulative` accept the map to remind users to maintain
> > the map from topic name to message ID:
> >
> > ```java
> > // the key is the partitioned topic name like my-topic-partition-0
> > void acknowledgeCumulative(Map<String, MessageId> topicToMessageId);
> > ```
> >
> > For those who don't want to maintain the map by themselves, maybe we
> > can provide a simpler API like:
> >
> > ```java
> > // acknowlegde all latest received messages
> > void acknowledgeCumulative();
> > ```
> >
> > and provide an option to enable this behavior.
> >
> > Do you have any suggestion on this idea? I will prepare a proposal if
> > there is no disagreement.
> >
> > Thanks,
> > Yunze
> 
> >>
> >>
>
>
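
To make the bookkeeping concrete, here is a minimal sketch of what users must maintain themselves with the existing API (the `PerPartitionCumulativeAcker` helper and its method names are invented for this example):

```java
import java.util.HashMap;
import java.util.Map;
import org.apache.pulsar.client.api.Consumer;
import org.apache.pulsar.client.api.Message;
import org.apache.pulsar.client.api.MessageId;
import org.apache.pulsar.client.api.PulsarClientException;

public class PerPartitionCumulativeAcker {
    // latest received id per partition, e.g. "my-topic-partition-0" -> id
    private final Map<String, MessageId> topicToMessageId = new HashMap<>();

    /** Remember the latest message seen on its concrete partition. */
    public void record(Message<?> msg) {
        topicToMessageId.put(msg.getTopicName(), msg.getMessageId());
    }

    /** Cumulatively ack every partition seen so far, then reset. */
    public void flush(Consumer<?> consumer) throws PulsarClientException {
        for (MessageId id : topicToMessageId.values()) {
            consumer.acknowledgeCumulative(id);
        }
        topicToMessageId.clear();
    }
}
```

The proposed `acknowledgeCumulative(Map<String, MessageId>)` API would fold this bookkeeping into the client itself.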


Re: Enable "Update branch" Github button in pull requests

2022-09-08 Thread Nicolò Boschi
That's a good observation. We don't have data, so we can't be sure about
that, but I think it could be a possible cause of the recent congestion.

I opened an INFRA ticket - https://issues.apache.org/jira/browse/INFRA-23679
(.asf.yaml file doesn't support this option)
I will come back to this thread when it's done

 Regards,
Nicolò Boschi


Il giorno ven 9 set 2022 alle ore 06:27 Lari Hotari  ha
scritto:

> Good points Hang. Yes, let's remove the "update branch" button since it
> seems to lead to unnecessary load for CI.
>
> -Lari
>
> On 2022/09/09 03:55:57 Hang Chen wrote:
> > Hi Nicolo,
> >  Thanks for your proposal. After Pulsar enabled the `update
> > branch` button, I found a lot of PRs blocked on unit tests
> > because the `update branch` action re-runs the whole unit test suite. The
> > `update branch` action keeps CI resources busy and wastes a lot
> > of them if contributors or committers click the `update branch`
> > button many times for each PR. I think we need to add some
> > restrictions on the `update branch` button. Do you have any ideas?
> >
> > Thanks,
> > Hang
> >
> > Qiang Huang  于2022年8月24日周三 19:50写道:
> > >
> > > +1. It can help a lot.
> > >
> > > Zixuan Liu  于2022年8月22日周一 10:38写道:
> > >
> > > > +1
> > > >
> > > > Best,
> > > > Zixuan
> > > >
> > > > On 2022/07/12 16:06:43 Nicolò Boschi wrote:
> > > > > Hi all,
> > > > >
> > > > > I'd like to propose to enable the Github button "update branch" in
> the
> > > > pull
> > > > > requests.
> > > > >
> > > > > The main reason is that it helps when you want to rebase your pull
> to the
> > > > > current master.
> > > > > For instance, today a fix for a failing job in the CI was
> committed
> > > > and
> > > > > you were forced to close/reopen the pull or manually rebase your
> branch.
> > > > With
> > > > > this button it is much easier, and it will retrigger the CI as well.
> > > > >
> > > > > This is the guide to enable it:
> > > > >
> > > >
> https://docs.github.com/en/pull-requests/collaborating-with-pull-requests/proposing-changes-to-your-work-with-pull-requests/keeping-your-pull-request-in-sync-with-the-base-branch
> > > > >
> > > > > This is an example where I activated the button in my own Pulsar
> fork:
> > > > > https://github.com/nicoloboschi/pulsar/pull/4
> > > > >
> > > > > Thanks,
> > > > > Nicolò Boschi
> > > > >
> > >
> > >
> > >
> > > --
> > > BR,
> > > Qiang Huang
> >
>