[GitHub] [pulsar] shubham-Shole4ever created a discussion: Support for long running message consumer

2022-11-14 Thread GitBox


GitHub user shubham-Shole4ever created a discussion: Support for long running 
message consumer

**Is your feature request related to a problem? Please describe.**
The ackTimeout is set at the consumer level and applies to all the messages that 
consumer handles. We have a case where the consumption of a message takes an 
unpredictable amount of time, ranging from 10 minutes to a couple of hours. We 
also don't want to set the ackTimeout to the maximum possible value (which could 
be half a day or more).
Can we have a feature where the consumer sends a signal back to the broker, 
indicating that it has not failed but is currently working on the received 
message, and the broker extends the ackTimeout for that message?

**Describe the solution you'd like**
A functionality which allows the consumer to notify the broker that it is still 
working on the received message. The broker, on receiving this signal, can 
extend (effectively refresh) the ackTimeout for that particular message.

**Describe alternatives you've considered**
Currently, there is no way to modify the ackTimeout for a particular message. 
The ackTimeout is set at the consumer level and cannot be modified for any 
message.


GitHub link: https://github.com/apache/pulsar/discussions/18456


This is an automatically sent email for dev@pulsar.apache.org.
To unsubscribe, please send an email to: dev-unsubscr...@pulsar.apache.org



[GitHub] [pulsar] shubham-Shole4ever added a comment to the discussion: Support for long running message consumer

2022-11-14 Thread GitBox


GitHub user shubham-Shole4ever added a comment to the discussion: Support for 
long running message consumer

@codelipenghui I had a look at negative acknowledgement. It will still not work 
if my ackTimeout is set to 10 minutes and the message I am consuming takes 30 
minutes, for example. The broker will resurface the message after 10 minutes, 
even though one of the consumers is still working on it.
I want to avoid this scenario. My proposal is to have something like a 
"working(messageId)" method on the consumer, which tells the broker not to time 
out (and resurface) the message, but rather to extend/refresh the ackTimeout set 
for the concerned messageId.

GitHub link: 
https://github.com/apache/pulsar/discussions/18456#discussioncomment-4133175


This is an automatically sent email for dev@pulsar.apache.org.
To unsubscribe, please send an email to: dev-unsubscr...@pulsar.apache.org



[GitHub] [pulsar] codelipenghui added a comment to the discussion: Support for long running message consumer

2022-11-14 Thread GitBox


GitHub user codelipenghui added a comment to the discussion: Support for long 
running message consumer

Please take a look at this document which may help you.  
http://pulsar.apache.org/docs/en/concepts-messaging/#negative-acknowledgement

GitHub link: 
https://github.com/apache/pulsar/discussions/18456#discussioncomment-4133174


This is an automatically sent email for dev@pulsar.apache.org.
To unsubscribe, please send an email to: dev-unsubscr...@pulsar.apache.org



[GitHub] [pulsar] codelipenghui added a comment to the discussion: Support for long running message consumer

2022-11-14 Thread GitBox


GitHub user codelipenghui added a comment to the discussion: Support for long 
running message consumer

@shubham-Shole4ever 
You can disable the ack timeout and just use ack/negative ack. A negative ack 
explicitly tells the broker that processing failed, and the broker then 
redelivers the message; while a message is still in progress there is no need to 
ack or negative ack it.
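
A minimal sketch of this approach with the Java client (the topic, subscription 
name, and the `process()` call are placeholders): leave the ack timeout unset so 
it stays disabled, acknowledge on success, and negatively acknowledge only on 
failure.

```java
// Types are from org.apache.pulsar.client.api; 'client' is an existing PulsarClient.
Consumer<byte[]> consumer = client.newConsumer()
        .topic("jobs")                                 // placeholder topic
        .subscriptionName("workers")                   // placeholder subscription
        .subscriptionType(SubscriptionType.Shared)
        // no .ackTimeout(...) call, so the ack timeout stays disabled
        .negativeAckRedeliveryDelay(1, TimeUnit.MINUTES)
        .subscribe();

Message<byte[]> msg = consumer.receive();
try {
    process(msg);                                      // placeholder: may run for minutes or hours
    consumer.acknowledge(msg);                         // done: message leaves the backlog
} catch (Exception e) {
    consumer.negativeAcknowledge(msg);                 // failed: ask the broker to redeliver
}
```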

GitHub link: 
https://github.com/apache/pulsar/discussions/18456#discussioncomment-4133177


This is an automatically sent email for dev@pulsar.apache.org.
To unsubscribe, please send an email to: dev-unsubscr...@pulsar.apache.org



[GitHub] [pulsar] shubham-Shole4ever added a comment to the discussion: Support for long running message consumer

2022-11-14 Thread GitBox


GitHub user shubham-Shole4ever added a comment to the discussion: Support for 
long running message consumer

@codelipenghui 
But if my application crashes while it is processing the message, it will never 
be able to ack/negative ack that message. As a result, the message would never 
be retried. This is exactly why I cannot drop the ackTimeout either.

GitHub link: 
https://github.com/apache/pulsar/discussions/18456#discussioncomment-4133178


This is an automatically sent email for dev@pulsar.apache.org.
To unsubscribe, please send an email to: dev-unsubscr...@pulsar.apache.org



[GitHub] [pulsar] merlimat added a comment to the discussion: Support for long running message consumer

2022-11-14 Thread GitBox


GitHub user merlimat added a comment to the discussion: Support for long 
running message consumer

@shubham-Shole4ever When a consumer crashes, or the TCP connection is broken, 
the messages that were delivered to this consumer and not acked will be replayed 
to another available consumer (in the case of shared subscriptions) or the next 
time the consumer reconnects.

You don't need ack timeout for that.

GitHub link: 
https://github.com/apache/pulsar/discussions/18456#discussioncomment-4133179


This is an automatically sent email for dev@pulsar.apache.org.
To unsubscribe, please send an email to: dev-unsubscr...@pulsar.apache.org



[GitHub] [pulsar] shubham-Shole4ever added a comment to the discussion: Support for long running message consumer

2022-11-14 Thread GitBox


GitHub user shubham-Shole4ever added a comment to the discussion: Support for 
long running message consumer

@sijie I can work with the workaround suggested by @codelipenghui and @merlimat 
for the time being. However, as mentioned, the solution will not work if I also 
need the ackTimeout.
I would request the community to consider a feature that handles such cases in 
the future.

Thanks @codelipenghui and @merlimat for all the help. :)

GitHub link: 
https://github.com/apache/pulsar/discussions/18456#discussioncomment-4133181


This is an automatically sent email for dev@pulsar.apache.org.
To unsubscribe, please send an email to: dev-unsubscr...@pulsar.apache.org



[GitHub] [pulsar] sijie added a comment to the discussion: Support for long running message consumer

2022-11-14 Thread GitBox


GitHub user sijie added a comment to the discussion: Support for long running 
message consumer

@shubham-Shole4ever does Matteo's comment make sense to you?

GitHub link: 
https://github.com/apache/pulsar/discussions/18456#discussioncomment-4133180


This is an automatically sent email for dev@pulsar.apache.org.
To unsubscribe, please send an email to: dev-unsubscr...@pulsar.apache.org



[GitHub] [pulsar] benbro added a comment to the discussion: Support for long running message consumer

2022-11-14 Thread GitBox


GitHub user benbro added a comment to the discussion: Support for long running 
message consumer

There is still no good solution for retrying long running jobs.

GitHub link: 
https://github.com/apache/pulsar/discussions/18456#discussioncomment-4133184


This is an automatically sent email for dev@pulsar.apache.org.
To unsubscribe, please send an email to: dev-unsubscr...@pulsar.apache.org



[GitHub] [pulsar] harissecic added a comment to the discussion: Support for long running message consumer

2022-11-14 Thread GitBox


GitHub user harissecic added a comment to the discussion: Support for long 
running message consumer

> There is still no good solution for retrying long running jobs.

I think there are plenty of _good enough_ workarounds, but I agree there should 
be an optimal one for long-running consumers out of the box. Just to list a few:
1. Using message properties on consumers with `reconsumeLater` - I'm not sure 
which version first supports this feature, but adding properties to the message 
like `processing=true` and later `isDone=true` would require just a little extra 
code to check these properties before even trying to consume the message. If 
done is set to true, simply ack the message and move on to the next one.
2. Using readers with a similar approach, where message metadata/properties are 
read. In some cases consumers are not needed and using a reader is a bit 
simpler, but in others we really do want the consumer - so not really a 
workaround in the context of this case.
3. Combining the DLQ with `negAck` and later processing the DLQ with extra 
custom code to check whether something was already done (see the sketch after 
this list). Setting max redelivery to 1 would make a message go directly to the 
DLQ on the next retry after a timeout. This would of course require a local 
concurrent cache where you keep processing IDs in runtime memory and check them 
on message arrival, so you can simply negAck a message if it's still processing. 
This way, after processing the actual message, the consumer can trigger 
"removing" the message from the DLQ. This would support both ackTimeout and 
manually handled timeouts.
4. Trying to cache everything in a DB or similar and looking up messageIds, 
processing start times, allowed timeouts, ... Upon receiving a message, check 
this list and determine whether the message is still being processed, or whether 
it failed and this was a consumer restart.
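
A minimal consumer-side sketch of the DLQ wiring from option 3, assuming the 
Java client and placeholder topic/subscription names (the in-memory processing 
cache and the DLQ post-processing are left out):

```java
// Types are from org.apache.pulsar.client.api; 'client' is an existing PulsarClient.
Consumer<byte[]> consumer = client.newConsumer()
        .topic("jobs")                                 // placeholder topic
        .subscriptionName("workers")                   // placeholder subscription
        .subscriptionType(SubscriptionType.Shared)     // the DLQ needs a Shared/Key_Shared subscription
        .ackTimeout(10, TimeUnit.MINUTES)
        .deadLetterPolicy(DeadLetterPolicy.builder()
                .maxRedeliverCount(1)                  // one redelivery, then the message lands in the DLQ
                .deadLetterTopic("jobs-DLQ")           // placeholder DLQ topic
                .build())
        .subscribe();

// On each receive, check the local "still processing" cache described above:
// negativeAcknowledge() redelivered messages that are still in flight so they
// move towards the DLQ, and acknowledge() once the long-running job completes.
```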

I assume some variant of option 3 would be good to have out of the box. Best of 
course would be something like an LRQ (long-running queue, for lack of 
creativity on my side): on an ackTimeout retry the consumer has the option to 
reply to the broker with 'still processing', the broker moves the message to 
this queue, and Pulsar tracks the connection - if the TCP connection dies, the 
messages are pushed back to the normal queue and retried; if it is alive, the 
consumer tells the broker when the message should be removed. Using the DLQ for 
this is also possible, but it conflates messages that were retried too many 
times with the ones the consumer knows simply take too long.

GitHub link: 
https://github.com/apache/pulsar/discussions/18456#discussioncomment-4133185


This is an automatically sent email for dev@pulsar.apache.org.
To unsubscribe, please send an email to: dev-unsubscr...@pulsar.apache.org



[GitHub] [pulsar] tisonkun added a comment to the discussion: Support for long running message consumer

2022-11-14 Thread GitBox


GitHub user tisonkun added a comment to the discussion: Support for long 
running message consumer

Closed as answered.

GitHub link: 
https://github.com/apache/pulsar/discussions/18456#discussioncomment-4133183


This is an automatically sent email for dev@pulsar.apache.org.
To unsubscribe, please send an email to: dev-unsubscr...@pulsar.apache.org



[GitHub] [pulsar] harissecic added a comment to the discussion: Support for long running message consumer

2022-11-14 Thread GitBox


GitHub user harissecic added a comment to the discussion: Support for long 
running message consumer

Stumbled upon this while looking for another answer, but in case it helps 
anyone I'll leave a comment. For such cases it's possible to combine the DLQ 
with ackTimeout. The default max-redelivery value I use is 3. Although I don't 
use auto-ack, I guess it will still work the same way: if a message times out 3 
times (in this case) it automatically goes to the Dead Letter Queue, which 
prevents it from looping endlessly between services.

My example is that I'm building a module for a framework. In that case I don't 
set an ackTimeout but let users set it if they want to; however, I do set 3 
retries before the DLQ by default. The reason was a personal experience where a 
test message looped endlessly between the shared consumers, I got error logs all 
the time, and I couldn't figure it out. Then I realised the message was simply 
getting a negativeAck from each consumer and being redelivered over and over - 
the funny thing is it was a malformed JSON message, so the consumers were doomed 
to crash (validation failed for a previously nullable field in Kotlin that I had 
moved to non-null). When I set the DLQ max redelivery to 3, I had some messages 
fail and then get re-read due to the ack timeout. But by combining the DLQ, 
ackTimeout, and shared consumers, I think you can set the timeout pretty low if 
processing the data takes less time and you manually ack as soon as it's done.

GitHub link: 
https://github.com/apache/pulsar/discussions/18456#discussioncomment-4133182


This is an automatically sent email for dev@pulsar.apache.org.
To unsubscribe, please send an email to: dev-unsubscr...@pulsar.apache.org



[GitHub] [pulsar] tisonkun added a comment to the discussion: Support for long running message consumer

2022-11-14 Thread GitBox


GitHub user tisonkun added a comment to the discussion: Support for long 
running message consumer

I'm moving this discussion to the Discussions forum since it's an open-ended 
discussion instead of an actionable task :)

GitHub link: 
https://github.com/apache/pulsar/discussions/18456#discussioncomment-4133186


This is an automatically sent email for dev@pulsar.apache.org.
To unsubscribe, please send an email to: dev-unsubscr...@pulsar.apache.org



Re: [DISCUSS] PIP-221: Make TableView support read the non-persistent topic

2022-11-14 Thread Enrico Olivelli
On Mon, Nov 14, 2022 at 05:57 Kai Wang wrote:
>
> Hi, pulsar-dev community,
>
> Since the non-persistent topic support doesn't require API changes. I have 
> pushed a PR to implement it, which has already been merged.
>
> See: https://github.com/apache/pulsar/pull/18375

Perfect

Thanks
Enrico

>
> And this PIP title has been changed to `Make TableView support TTL`.
>
> PIP link: https://github.com/apache/pulsar/issues/18229
>
> Thanks,
> Kai
>
> On 2022/11/04 02:28:41 Kai Wang wrote:
> > Hi, pulsar-dev community,
> >
> > I’ve opened a PIP to discuss : PIP-221: Make TableView support read the 
> > non-persistent topic.
> >
> > PIP link: https://github.com/apache/pulsar/issues/18229
> >
> > Thanks,
> > Kai
> >


Re: [ANNOUNCE] New Committer: Zili Chen

2022-11-14 Thread Zixuan Liu
Congrats! Zili


Best,
Zixuan

Qiang Huang wrote on Mon, Nov 14, 2022 at 14:31:

> Congratulations!
>
> Zike Yang wrote on Mon, Nov 14, 2022 at 13:45:
>
> > Hi, tison,
> >
> > Congratulations and welcome!
> >
> > BR,
> > Zike Yang
> >
> > On Mon, Nov 14, 2022 at 12:58 PM Kai Wang 
> > wrote:
> > >
> > > Congratulations! tison
> > >
> > > Thanks,
> > > Kai
> > > On Nov 10, 2022 at 8:16 AM +0800, dev@pulsar.apache.org, wrote:
> > > >
> > > > Congratulations! tison
> >
>
>
> --
> BR,
> Qiang Huang
>


Re: [ANNOUNCE] New Committer: Lin Chen

2022-11-14 Thread Zixuan Liu
Congrats! Lin

Best,
Zixuan

Enrico Olivelli wrote on Mon, Nov 14, 2022 at 15:06:

> Congratulations
>
> Enrico
>
> On Mon, Nov 14, 2022 at 07:30 Qiang Huang wrote:
>
> > Congratulations!
> >
> > houxiaoyu wrote on Mon, Nov 14, 2022 at 13:40:
> >
> > > Congrats!
> > >
> > > Best,
> > > Xiaoyu Hou
> > >
> > > PengHui Li wrote on Mon, Nov 14, 2022 at 12:28:
> > >
> > > > The Project Management Committee (PMC) for Apache Pulsar has invited
> > > > Lin Chen (https://github.com/lordcheng10)
> > > > to become a committer and we are pleased to announce that he has
> > > accepted.
> > > >
> > > > Being a committer enables easier contribution to the
> > > > project since there is no need to go via the patch
> > > > submission process. This should enable better productivity.
> > > >
> > > > Welcome and congratulations, Lin Chen!
> > > >
> > > > Please join us in congratulating and welcoming Lin Chen onboard!
> > > >
> > > > Best Regards,
> > > > Penghui on behalf of the Pulsar PMC
> > > >
> > >
> >
> >
> > --
> > BR,
> > Qiang Huang
> >
>


Re: Releasing current master as Pulsar 2.11.0 ?

2022-11-14 Thread guo jiwei
I found that several PRs could not be cherry-picked to 2.11 today.
I agree to cut the new branch based on the master and turn off the
new/unstable features in branch-2.11.



Regards
Tboy


On Fri, Nov 4, 2022 at 1:00 PM Dave Fisher  wrote:

> Inline
>
> Sent from my iPhone
>
> > On Nov 3, 2022, at 6:55 AM, Enrico Olivelli  wrote:
> >
> > PengHui,
> >
> >> On Tue, Nov 1, 2022 at 07:51 PengHui Li wrote:
> >>
> >>> As it is, we already need to discuss EOL for 2.7 and 2.8.
> >>
> >> Agree. We should clarify this one.
> >> I think we can stop to provide new releases for 2.7
> >> and only security or critical bugs for 2.8 (one more official release)
> >>
> >> https://github.com/apache/pulsar/issues/15966 will make the
> >> release strategy clear.
> >>
> >> LTS -> 36 months (24 + 12)
> >> Feature release -> 6 months (3+3)
> >>
> >> Thanks,
> >> Penghui
> >>
> >> On Tue, Nov 1, 2022 at 2:15 PM Michael Marshall 
> >> wrote:
> >>
> >>> I am concerned that we have too many active release branches, and
> planning
> >>> to follow 2.11.0 with 3.0.0 soon after feels like it will make that
> problem
> >>> worse. As it is, we already need to discuss EOL for 2.7 and 2.8.
> >>>
> >>> Thanks,
> >>> Michael
> >>>
>  On Mon, Oct 31, 2022 at 7:55 PM PengHui Li 
> wrote:
> >>>
>  Releasing from the master branch will bring more uncertainty, no?
>  We have fixed many regressions that were introduced to branch-2.11.
>  If we cut a new branch-2.11 based on the master branch. Maybe new
>  regressions
>  will happen again. This may make us wait another month to have a
> 2.11.0
>  release.
> >
> > I am not sure.
> > I don't know if anyone is actively testing the 2.11 branch more than
> > the master branch.
> > On my side the (automated) testing that I do with my colleagues on
> > branch-2.11 is basically the same as for the master branch.
> >
> > I believe that if we want to cut a 2.11 release that is not branched
> > again from the master branch
> > we really must start the release as soon as BK 4.15.3 is released
>
> I understand that BookKeeper issues have been what's blocking 2.11
> >
> > Many people contributed features to the master branch that cannot be
> > shipped with 2.11 because
> > they are considered "breaking changes".
> > But 2.11 was supposed to be released in August, more than 3 months ago.
>
> I think we can recognize that our past history has been that there are
> often 3 or 4 RCs for our 2.x.0 releases.
>
> Maybe we should be cherry picking some PRs on master to 2.11 before we
> start the process? It may or may not save an RC but it will give us time to
> be realistic about a reasonable cadence from 2.10.x to 2.11.x to 2.12.x …
> it’s hard to support many versions at once. The CVE announced today took
> months to be included in all of our current releases from 2.7.5 to 2.10.2.
> Separation of C++ and Pulsar client releases from Pulsar releases helps
> here, but it may not with the next security issue.
>
> Regards,
> Dave
> >
> >
> > Enrico
> >
> >
> 
>  IMO, we can start Pulsar 3.0 (follow
>  https://github.com/apache/pulsar/issues/15966)
>  after 2.11.0 is released instead of waiting for 3 more months.
> 
>  For https://github.com/apache/bookkeeper/issues/3466
>  I don't think it's a blocker for the Pulsar release for now.
>  Yes, it is worth investigating more. We also tried a chaos test for
> that
>  case.
>  We haven't reproduced the problem on Pulsar.
> 
>  Now, we are just waiting for the new BookKeeper release 4.15.3 since
> >>> 4.15.2
>  has regressions [1]
> 
>  [1] https://github.com/apache/bookkeeper/pull/3523
> 
>  Thanks,
>  Penghui
> 
>  On Tue, Nov 1, 2022 at 3:10 AM Michael Marshall  >
>  wrote:
> 
> > I have not followed the branch-2.11 work closely, but I think it
> makes
> > sense to re-create branch-2.11 from the current master.
> >
> > We created branch-2.11 almost 3 months ago. Re-creating the branch
> > will prevent unnecessary delay on new features added over the past 3
> > months.
> >
> > If we follow through with this proposal, we will need to clean up PR
> > tags and milestones to prevent confusion.
> >
> > Thanks,
> > Michael
> >
> > On Mon, Oct 31, 2022 at 3:31 AM Enrico Olivelli  >
> > wrote:
> >>
> >> Hello Pulsar fellows,
> >>
> >> I think that too much time passed since we wanted to cut 2.11.
> >>
> >> The branch-2.11 contains some code used by no one.
> >>
> >> In the meantime many features went into master branch,
> >>
> >> I don't think that it is worth it to cut a release from branch-2.11
> >> and start with something that is already stale.
> >>
> >> I propose to drop branch-2.11 and create a new branch out of the
> >> current master branch and start the period of hardening before
> >>> cutting
> >> the release.
> >>

[GitHub] [pulsar] kitekite2020 created a discussion: Active-active and no data loss. Is it possible?

2022-11-14 Thread GitBox


GitHub user kitekite2020 created a discussion: Active-active and no data loss. 
Is it possible?

**Is your feature request related to a problem? Please describe.**
A clear and concise description of what the problem is. Ex. I'm always 
frustrated when [...]

**Describe the solution you'd like**
A clear and concise description of what you want to happen.

**Describe alternatives you've considered**
A clear and concise description of any alternative solutions or features you've 
considered.

**Additional context**
Add any other context or screenshots about the feature request here.


GitHub link: https://github.com/apache/pulsar/discussions/18457


This is an automatically sent email for dev@pulsar.apache.org.
To unsubscribe, please send an email to: dev-unsubscr...@pulsar.apache.org



[GitHub] [pulsar] codelipenghui added a comment to the discussion: Active-active and no data loss. Is it possible?

2022-11-14 Thread GitBox


GitHub user codelipenghui added a comment to the discussion: Active-active and 
no data loss. Is it possible?

> We are thinking of geo replication. But since geo replication is 
> asynchronous, we are worried about message loss. Every single message cannot 
> be lost.

Geo-replication is asynchronous, but it does not result in message loss. The 
replication state is stored in a cursor (ledger backend), so even if the broker 
crashes, the replication state is not lost and the new broker can resume the 
replication task.
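
For context, a minimal sketch of enabling geo-replication on a namespace with 
the Java admin client; the admin URL, namespace, and cluster names are 
placeholders:

```java
import org.apache.pulsar.client.admin.PulsarAdmin;
import java.util.Set;

PulsarAdmin admin = PulsarAdmin.builder()
        .serviceHttpUrl("http://localhost:8080")       // placeholder admin URL
        .build();

// Messages published to topics in this namespace are asynchronously copied to
// every listed cluster; the per-cluster replication position is kept in a
// durable cursor on the source topic, so a broker crash does not lose it.
admin.namespaces().setNamespaceReplicationClusters(
        "my-tenant/my-namespace", Set.of("us-east", "us-west"));

admin.close();
```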

GitHub link: 
https://github.com/apache/pulsar/discussions/18457#discussioncomment-4133536


This is an automatically sent email for dev@pulsar.apache.org.
To unsubscribe, please send an email to: dev-unsubscr...@pulsar.apache.org



[GitHub] [pulsar] kitekite2020 added a comment to the discussion: Active-active and no data loss. Is it possible?

2022-11-14 Thread GitBox


GitHub user kitekite2020 added a comment to the discussion: Active-active and 
no data loss. Is it possible?

We would like to have Pulsar clusters in two data centers, running 
active-active, and we would like to replicate messages from one site to the 
other, i.e. a subscriber at one site will receive messages from publishers at 
both sites, with no message loss. If one site is down, the other site can 
continue to work.
We are thinking of geo-replication, but since geo-replication is asynchronous, 
we are worried about message loss. Not a single message can be lost.
We are then considering synchronous replication at the BookKeeper level across 
the two sites. The first difficulty is that there are no examples of how to do 
this without a global ZooKeeper. If we use local clusters to manage each site, 
we need a global configuration store to manage the two sites' information, which 
becomes a single point of failure. If we use a global ZooKeeper, one ZooKeeper 
for both sites, then because a ZooKeeper cluster has to have 2n+1 nodes, if the 
site with n+1 nodes fails, the other site also stops working.
Is my understanding right? Is there a possible solution with Pulsar for us? 
Thanks.

GitHub link: 
https://github.com/apache/pulsar/discussions/18457#discussioncomment-4133535


This is an automatically sent email for dev@pulsar.apache.org.
To unsubscribe, please send an email to: dev-unsubscr...@pulsar.apache.org



[GitHub] [pulsar] kitekite2020 added a comment to the discussion: Active-active and no data loss. Is it possible?

2022-11-14 Thread GitBox


GitHub user kitekite2020 added a comment to the discussion: Active-active and 
no data loss. Is it possible?

Is the message replication guaranteed?

> The replication state is stored in a cursor(ledger backend)

Does this mean it is stored in a bookie? What if this bookie crashes?

GitHub link: 
https://github.com/apache/pulsar/discussions/18457#discussioncomment-4133537


This is an automatically sent email for dev@pulsar.apache.org.
To unsubscribe, please send an email to: dev-unsubscr...@pulsar.apache.org



[GitHub] [pulsar] kitekite2020 added a comment to the discussion: Active-active and no data loss. Is it possible?

2022-11-14 Thread GitBox


GitHub user kitekite2020 added a comment to the discussion: Active-active and 
no data loss. Is it possible?

> Yes, it is stored in bookies(data is replicated in the certain bookies). So 
> you can increase the data replicas to avoid data loss.

Thank you. So I understand from you that there is a mechanism to trace or 
manage which messages have been replicated and to make sure all messages are 
replicated. That is great.
Is there a mechanism to trace which messages have been replicated and received 
by the target cluster/site?
I am also trying, if possible, to use Grafana and Prometheus to validate whether 
there is message loss. Any advice here?

GitHub link: 
https://github.com/apache/pulsar/discussions/18457#discussioncomment-4133539


This is an automatically sent email for dev@pulsar.apache.org.
To unsubscribe, please send an email to: dev-unsubscr...@pulsar.apache.org



[GitHub] [pulsar] codelipenghui added a comment to the discussion: Active-active and no data loss. Is it possible?

2022-11-14 Thread GitBox


GitHub user codelipenghui added a comment to the discussion: Active-active and 
no data loss. Is it possible?

Yes, the metrics related to replication are described at 
https://pulsar.apache.org/docs/en/reference-metrics/#replication-metrics-1, and 
https://github.com/streamnative/apache-pulsar-grafana-dashboard already contains 
message-replication-related dashboards. You can check it out and try it.

GitHub link: 
https://github.com/apache/pulsar/discussions/18457#discussioncomment-4133540


This is an automatically sent email for dev@pulsar.apache.org.
To unsubscribe, please send an email to: dev-unsubscr...@pulsar.apache.org



[GitHub] [pulsar] codelipenghui added a comment to the discussion: Active-active and no data loss. Is it possible?

2022-11-14 Thread GitBox


GitHub user codelipenghui added a comment to the discussion: Active-active and 
no data loss. Is it possible?

Yes, it is stored in bookies (the data is replicated across a set of bookies), 
so you can increase the number of data replicas to avoid data loss.

GitHub link: 
https://github.com/apache/pulsar/discussions/18457#discussioncomment-4133538


This is an automatically sent email for dev@pulsar.apache.org.
To unsubscribe, please send an email to: dev-unsubscr...@pulsar.apache.org



[GitHub] [pulsar] kitekite2020 added a comment to the discussion: Active-active and no data loss. Is it possible?

2022-11-14 Thread GitBox


GitHub user kitekite2020 added a comment to the discussion: Active-active and 
no data loss. Is it possible?

I would like to clarify geo-replication message delivery. Is it more like TCP, 
where every message gets acknowledged once it is successfully replicated to the 
other site, or more like UDP, fire and forget?

GitHub link: 
https://github.com/apache/pulsar/discussions/18457#discussioncomment-4133542


This is an automatically sent email for dev@pulsar.apache.org.
To unsubscribe, please send an email to: dev-unsubscr...@pulsar.apache.org



[GitHub] [pulsar] kitekite2020 added a comment to the discussion: Active-active and no data loss. Is it possible?

2022-11-14 Thread GitBox


GitHub user kitekite2020 added a comment to the discussion: Active-active and 
no data loss. Is it possible?

> Yes, the metrics related to the replication is described at 
> https://pulsar.apache.org/docs/en/reference-metrics/#replication-metrics-1, 
> and https://github.com/streamnative/apache-pulsar-grafana-dashboard already 
> contains message replication related dashboards. You can checkout and try it.

Thanks. Was trying this dashboard. Had some problems and will figure out why.




GitHub link: 
https://github.com/apache/pulsar/discussions/18457#discussioncomment-4133541


This is an automatically sent email for dev@pulsar.apache.org.
To unsubscribe, please send an email to: dev-unsubscr...@pulsar.apache.org



[GitHub] [pulsar] baynes created a discussion: Messages lost with new topic and regex subscription

2022-11-14 Thread GitBox


GitHub user baynes created a discussion: Messages lost with new topic and regex 
subscription

**Describe the bug**
When a new topic is detected by a regex subscription, it takes time before the 
subscription's cursor is set up for that topic. Since the cursor is set to the 
end of the topic, at least one message is lost, and as this can take 40 seconds, 
one could lose 40 seconds of data.


**To Reproduce**

If I set up a consumer with a regex subscription, for example:

`/opt/pulsar/bin/pulsar-client consume --regex '.*' -s all -n 0`

I then send a message on a **NEW** topic that matches the regex.

`/opt/pulsar//bin/pulsar-client produce addtopic -m 'm1'`

The consumer detects the new topic and sets up a subscription to it. This can 
take 30-40 seconds. However, it does not see the message (or any other messages 
sent before the subscription is set up).

Once it is set up, sending more data to the topic will be picked up by the 
consumer.

`/opt/pulsar//bin/pulsar-client produce addtopic -m 'm2'`

The consumer will display the message 'm2'.

So though it works from now on, potentially the first 40 seconds of data have 
been lost.

**Expected behavior**
All messages sent to the new topic should be seen by the consumer.

**Screenshots**
N/A

**Desktop (please complete the following information):**
 Centos 7
Pulsar 2.5.0, 2.5.1

**Additional context**

The initial message(s) are on the topic; one can see them with a reader. So a 
solution would be for the cursor of the new topic's subscription to be created 
pointing to the start of the topic rather than the usual end in this case.


GitHub link: https://github.com/apache/pulsar/discussions/18458


This is an automatically sent email for dev@pulsar.apache.org.
To unsubscribe, please send an email to: dev-unsubscr...@pulsar.apache.org



[GitHub] [pulsar] baynes added a comment to the discussion: Messages lost with new topic and regex subscription

2022-11-14 Thread GitBox


GitHub user baynes added a comment to the discussion: Messages lost with new 
topic and regex subscription

> 
> 
> Might be some overlap with #6531

Further thoughts.
I think that issue is more to do with topics created before the function is 
created/started - and the consumer's initial_position option is enough to handle 
it, so long as you can pass it through. This issue is about topics created after 
the function/consumer is created/started - where one wants it to behave as if 
the cursor setup on the new topic were instantaneous at topic creation.


GitHub link: 
https://github.com/apache/pulsar/discussions/18458#discussioncomment-4133558


This is an automatically sent email for dev@pulsar.apache.org.
To unsubscribe, please send an email to: dev-unsubscr...@pulsar.apache.org



[GitHub] [pulsar] sijie added a comment to the discussion: Messages lost with new topic and regex subscription

2022-11-14 Thread GitBox


GitHub user sijie added a comment to the discussion: Messages lost with new 
topic and regex subscription

@baynes You can specify SubscriptionInitialPosition.earliest when you create 
the regex subscription.
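
A minimal sketch of that suggestion with the Java client; the topic pattern and 
subscription name are placeholders:

```java
// Types are from org.apache.pulsar.client.api; 'client' is an existing PulsarClient.
Consumer<byte[]> consumer = client.newConsumer()
        .topicsPattern(Pattern.compile("persistent://public/default/.*"))  // placeholder pattern
        .subscriptionName("all")
        // When the subscription is created on a newly discovered topic, start the
        // cursor at the earliest available message instead of the end of the topic.
        .subscriptionInitialPosition(SubscriptionInitialPosition.Earliest)
        .subscribe();
```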

GitHub link: 
https://github.com/apache/pulsar/discussions/18458#discussioncomment-4133559


This is an automatically sent email for dev@pulsar.apache.org.
To unsubscribe, please send an email to: dev-unsubscr...@pulsar.apache.org



[GitHub] [pulsar] baynes added a comment to the discussion: Messages lost with new topic and regex subscription

2022-11-14 Thread GitBox


GitHub user baynes added a comment to the discussion: Messages lost with new 
topic and regex subscription

Might be some overlap with #6531

GitHub link: 
https://github.com/apache/pulsar/discussions/18458#discussioncomment-4133557


This is an automatically sent email for dev@pulsar.apache.org.
To unsubscribe, please send an email to: dev-unsubscr...@pulsar.apache.org



[GitHub] [pulsar] BewareMyPower added a comment to the discussion: Messages lost with new topic and regex subscription

2022-11-14 Thread GitBox


GitHub user BewareMyPower added a comment to the discussion: Messages lost with 
new topic and regex subscription

The same applies to a partitioned consumer. IMO, when a consumer finds new 
topics/partitions, the subscription initial position **should be changed to 
earliest** no matter what the original initial position is.

Usually consumers use the latest initial position to discard outdated messages. 
However, suppose the number of partitions is dynamically increased while some 
producers and consumers are already serving this partitioned topic. If the 
producers discover the new partitions before the consumers do, then from the 
consumer's point of view the messages published there before it starts consuming 
**shouldn't be considered outdated**.

What do you think of this change? @sijie 

GitHub link: 
https://github.com/apache/pulsar/discussions/18458#discussioncomment-4133562


This is an automatically sent email for dev@pulsar.apache.org.
To unsubscribe, please send an email to: dev-unsubscr...@pulsar.apache.org



[GitHub] [pulsar] sijie added a comment to the discussion: Messages lost with new topic and regex subscription

2022-11-14 Thread GitBox


GitHub user sijie added a comment to the discussion: Messages lost with new 
topic and regex subscription

@baynes noted.

GitHub link: 
https://github.com/apache/pulsar/discussions/18458#discussioncomment-4133561


This is an automatically sent email for dev@pulsar.apache.org.
To unsubscribe, please send an email to: dev-unsubscr...@pulsar.apache.org



[GitHub] [pulsar] baynes added a comment to the discussion: Messages lost with new topic and regex subscription

2022-11-14 Thread GitBox


GitHub user baynes added a comment to the discussion: Messages lost with new 
topic and regex subscription

I guess that would do it. It changes the behavior for topics created while the 
client is down or before it is started for the first time. It is probably 
useful to read all the messages on those when the client comes up - but there 
could be cases where it is undesirable when the client starts for the first 
time.

If we do go that way then #6531 must be fixed, as we are actually using 
functions.

GitHub link: 
https://github.com/apache/pulsar/discussions/18458#discussioncomment-4133560


This is an automatically sent email for dev@pulsar.apache.org.
To unsubscribe, please send an email to: dev-unsubscr...@pulsar.apache.org



[GitHub] [pulsar] vitosans added a comment to the discussion: Messages lost with new topic and regex subscription

2022-11-14 Thread GitBox


GitHub user vitosans added a comment to the discussion: Messages lost with new 
topic and regex subscription

@sijie - What is the official position on this? Is it suggested to use 
earliest? I see it has gone stale and has not been updated for two years. We 
are running into this issue, which is counter-intuitive to how a queue should 
work. 

GitHub link: 
https://github.com/apache/pulsar/discussions/18458#discussioncomment-4133563


This is an automatically sent email for dev@pulsar.apache.org.
To unsubscribe, please send an email to: dev-unsubscr...@pulsar.apache.org



[GitHub] [pulsar] bigbang489 created a discussion: [Pulsar Client] Is there a way to specify a schema version instead of uploading SchemaInfo to broker?

2022-11-14 Thread GitBox


GitHub user bigbang489 created a discussion: [Pulsar Client] Is there a way to 
specify a schema version instead of uploading SchemaInfo to broker?

**Is your enhancement request related to a problem? Please describe.**
When a consumer/producer connects to the broker, it has to upload the schema 
info (including the JSON schema) to the broker so that schema compatibility can 
be checked. If the JSON schema is large, this becomes inefficient. I know it is 
only uploaded when the consumer/producer connects to the broker, but in my 
scenario the client needs to connect and close the connection every time it 
consumes/produces a message. Would it be better if we just specified the schema 
version instead of the whole SchemaInfo?

**Describe the solution you'd like**
If the schema version has already been registered, the client would just need 
to specify the schema version of the messages it is going to consume/produce.


GitHub link: https://github.com/apache/pulsar/discussions/18459


This is an automatically sent email for dev@pulsar.apache.org.
To unsubscribe, please send an email to: dev-unsubscr...@pulsar.apache.org



[GitHub] [pulsar] sijie added a comment to the discussion: [Pulsar Client] Is there a way to specify a schema version instead of uploading SchemaInfo to broker?

2022-11-14 Thread GitBox


GitHub user sijie added a comment to the discussion: [Pulsar Client] Is there a 
way to specify a schema version instead of uploading SchemaInfo to broker?

> Is it better if we just specify the schema version instead of whole 
> SchemaInfo?

We send the schema definition when a client connects to a broker, so the broker 
can verify the compatibility of the schema. If we only sent the schema version, 
there would be no way for us to verify that the client is using the right 
schema, because a client could just provide a random schema version number and 
produce data with a different schema.
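
For reference, a minimal sketch of that connect-time exchange with the Java 
client, assuming a placeholder POJO and topic: the full schema derived from the 
class is uploaded when the producer connects, and the broker checks it against 
the topic's compatibility rules.

```java
// Types are from org.apache.pulsar.client.api; 'client' is an existing PulsarClient.
public class OrderEvent {              // placeholder POJO; its fields define the JSON schema
    public String orderId;
    public double amount;
}

Producer<OrderEvent> producer = client.newProducer(Schema.JSON(OrderEvent.class))
        .topic("orders")               // placeholder topic
        .create();                     // the SchemaInfo is sent to the broker at this point

producer.send(new OrderEvent());
```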

> but in my scenario, the client needs to connect and close connection 
> everytime it consume/produce message. 

Can you explain a bit more about your use case? I would like to understand why 
you need to connect and close the connection every time you consume or produce 
messages. This is an anti-pattern for using Pulsar.


GitHub link: 
https://github.com/apache/pulsar/discussions/18459#discussioncomment-4133617


This is an automatically sent email for dev@pulsar.apache.org.
To unsubscribe, please send an email to: dev-unsubscr...@pulsar.apache.org



[GitHub] [pulsar] bigbang489 added a comment to the discussion: [Pulsar Client] Is there a way to specify a schema version instead of uploading SchemaInfo to broker?

2022-11-14 Thread GitBox


GitHub user bigbang489 added a comment to the discussion: [Pulsar Client] Is 
there a way to specify a schema version instead of uploading SchemaInfo to 
broker?

We are building an adapter to integrate Pulsar into the SAP Integration system. 
The sender adapter consumes messages from a topic, and the receiver adapter 
produces messages to a topic. For the receiver adapter, there is no certain time 
when it will be called; if it's called once a month, keeping a connection open 
for a long time just to send one message a month is not a good idea. That is why 
we decided to close and re-connect to the broker every time.
By the way, is there any way to tell the Pulsar client (one that has producers 
only) that it should close the connection if there is no message to publish 
after a certain amount of time, and re-connect to the brokers when it needs to 
send a message again?
Regarding the following:

> If we only send the schema version, there is no way for us to verify if the 
> client is using the right schema. Because a client can just provide a random 
> schema number and produce the data with a different schema.

There is no guarantee that the encoded data sent to brokers uses the correct 
schema anyway, in case clients use their own custom Schema implementation. I 
think skipping the SchemaInfo verification would make sense if the client itself 
can assure the correctness of the schema version. This would give not only 
faster connection establishment, but also the ability to send messages with 
different schema versions without creating a new producer for each version.

GitHub link: 
https://github.com/apache/pulsar/discussions/18459#discussioncomment-4133618


This is an automatically sent email for dev@pulsar.apache.org.
To unsubscribe, please send an email to: dev-unsubscr...@pulsar.apache.org



Re: Data quality problem

2022-11-14 Thread Elliot West
While we can get caught up in the specifics of exactly how JSON Schema is
supported in the Kafka ecosystem, it is ultimately possible if desired, and
is common, even if not part of open-source Apache Kafka.

Devin's assertion is that JSON Schema compliant payload validation
and schema evolution are not currently supportable in the Pulsar ecosystem
and that perhaps they should be.

Elliot.


On Fri, 11 Nov 2022 at 14:56, Elliot West 
wrote:

> Hey Devin,
>
> *"Kafka conforms to the JSON Schema specification"*
> Only when using Confluent's Schema Registry.
>
> *"if a producer makes a change or omission, such as in a value used for
> tracking, it might not surface until way down the line"*
> So let me understand this: although the producer has a schema, it does not
> use it for validation of JSON (as would implicitly occur for Avro)? Is this
> correct?
>
> I agree that robust support for schema, certainly at the edges, is a
> cornerstone for a data system. I also agree that it would be better to
> adopt existing standards rather than implement them in a bespoke manner.
>
> I'd be interested to hear your thoughts on concrete improvements that you
> believe would be necessary - for example:
>
> * Producer validation of JSON occurs using "JSON Schema"
> * Evolutions of JSON Schema conform to ...
> * Users can declare topic schema using a JSON Schema document
> * Users can query topic schema and have a JSON schema document returned to
> them
>
> Thanks,
>
> Elliot.
>
>
>
>
>
>
> On Thu, 10 Nov 2022 at 16:51, Devin Bost  wrote:
>
>> One of the areas where Kafka has an advantage over Pulsar is around data
>> quality. Kafka conforms to the JSON Schema specification, which enables
>> integration with any technology that conforms to the standard, such as for
>> data validation, discoverability, lineage, versioning, etc.
>> Pulsar's implementation is non-compliant with the standard, and producers
>> and consumers have no built-in way in Pulsar to validate that values in
>> their messages match expectations. As a consequence, if a producer makes a
>> change or omission, such as in a value used for tracking, it might not
>> surface until way down the line, and then it can be very difficult to
>> track
>> down the source of the problem, which kills the agility of teams
>> responsible for maintaining apps using Pulsar. It's also bad PR because
>> then incidents are associated with Pulsar, even though the business might
>> not understand that the data problem wasn't necessarily caused by Pulsar.
>>
>> What's the right way for us to address this problem?
>>
>> --
>> Devin Bost
>> Sent from mobile
>> Cell: 801-400-4602
>>
>
>
> --
>
> Elliot West
>
> Senior Platform Engineer
>
> elliot.w...@streamnative.io
>
> streamnative.io
>
> 
> 
> 
>


-- 

Elliot West

Senior Platform Engineer

elliot.w...@streamnative.io

streamnative.io






[GitHub] [pulsar] codelipenghui added a comment to the discussion: Question: copy or move topics between Pulsar clusters

2022-11-14 Thread GitBox


GitHub user codelipenghui added a comment to the discussion: Question: copy or 
move topics between Pulsar clusters

The issue had no activity for 30 days, mark with Stale label.

GitHub link: 
https://github.com/apache/pulsar/discussions/18461#discussioncomment-4136189


This is an automatically sent email for dev@pulsar.apache.org.
To unsubscribe, please send an email to: dev-unsubscr...@pulsar.apache.org



[GitHub] [pulsar] oversearch created a discussion: Question: copy or move topics between Pulsar clusters

2022-11-14 Thread GitBox


GitHub user oversearch created a discussion: Question: copy or move topics 
between Pulsar clusters

**Is your feature request related to a problem? Please describe.**
I'm wondering if it's currently possible to copy or move a topic that exists in 
one Pulsar cluster to another independent cluster?  

For example, say I have a topic with a long retention time (perhaps backed by 
tiered storage in S3) associated with some entity in my business logic that 
exists specifically in one AWS region in a multi-region setup, with a cluster 
per region.  Let's say that management of that entity needs to move to a 
different region, and thus I'd need to move that topic and its producers and 
consumers to another region to maintain locality.  

Is there an official / easy way to do this?

**Describe the solution you'd like**

Ideally there would be an admin command to safely copy a complete topic as-is 
from one cluster to another, with metadata and consumer positions preserved 
(assuming that an identical namespace existed in the destination cluster).  But 
a documented workflow would satisfy my needs.

**Describe alternatives you've considered**

I could of course simply open a reader to the source topic and a producer on 
the destination, and then stream the messages over.  This has two major 
shortcomings though: 1) Some of the metadata will presumably change, such as 
the message IDs and the "publish timestamp", which I don't think I could force 
to be preserved. 2) If a regex consumer on the destination region picks up the 
newly created topic shortly after it's created, it will presumably start 
receiving the copied messages as if they're brand new, which is undesirable.  
While it's possible to manually expire messages with the admin API, that would 
require a great deal of coordination if the process is happening live, and I'm 
not sure that it could be guaranteed that none of the copied messages would be 
delivered to existing consumers on the destination.

I also considered the possibility that if the topic existed in its own 
namespace, I might be able to set up cross-region replication and have it 
replicated to the destination.  But I'm not sure if messages already marked 
"received" would be replicated in this situation?  The docs don't seem to be 
clear on this, but now that I'm re-reading them, the answer seems to be "no"...

Beyond that I was thinking about manipulating the Bookkeeper ledger files 
directly, but that seems to be getting way into the weeds...

Any advice you could provide would be helpful, thank you!


GitHub link: https://github.com/apache/pulsar/discussions/18461


This is an automatically sent email for dev@pulsar.apache.org.
To unsubscribe, please send an email to: dev-unsubscr...@pulsar.apache.org



Re: Request a site ID and tracking code for Apache Pulsar

2022-11-14 Thread Martijn Visser
Hi Tison,

We currently don't offer account creation outside of the Privacy group.
However, feel free to send me your requirements for the segment and I'll
create it for you :)

Thanks,

Martijn

On Thu, Nov 10, 2022 at 7:07 AM tison  wrote:

> Hi Martijn,
>
> Thanks for your help! We successfully integrate Matomo to the Pulsar
> website now.
>
> I'd like to know how to create an account to add segments or further
> analysis board. It seems an Apache account is not carried to Matomo account
> system.
>
> Best,
> tison.
>
>
> tison wrote on Mon, Nov 7, 2022 at 16:49:
>
>>
>> Hi Martijn,
>>
>> Thank you!
>> Best,
>> tison.
>>
>>
>>> Martijn Visser wrote on Mon, Nov 7, 2022 at 16:44:
>>
>>> Hi Tison,
>>>
>>> It most certainly is but I haven't had the time yet to reply yet. My
>>> apologies!
>>>
>>> Here's the tracking code for Pulsar. The code will need to be integrated
>>> in
>>> any page you want to track by adding it before the  tag. You can
>>> find the results at https://analytics.apache.org
>>>
>>> <!-- Matomo -->
>>> <script>
>>>   var _paq = window._paq = window._paq || [];
>>>   /* tracker methods like "setCustomDimension" should be called before
>>>   "trackPageView" */
>>>   /* We explicitly disable cookie tracking to avoid privacy issues */
>>>   _paq.push(['disableCookies']);
>>>   _paq.push(['trackPageView']);
>>>   _paq.push(['enableLinkTracking']);
>>>   (function() {
>>>     var u="https://analytics.apache.org/";
>>>     _paq.push(['setTrackerUrl', u+'matomo.php']);
>>>     _paq.push(['setSiteId', '32']);
>>>     var d=document, g=d.createElement('script'), s=d.getElementsByTagName('script')[0];
>>>     g.async=true; g.src=u+'matomo.js'; s.parentNode.insertBefore(g,s);
>>>   })();
>>> </script>
>>> <!-- End Matomo -->
>>>
>>> Best regards,
>>>
>>> Martijn
>>>
>>> On Mon, Nov 7, 2022 at 09:17 tison wrote:
>>>
>>> > Hi,
>>> >
>>> > Is here the correct place to send this request to?
>>> >
>>> > Best,
>>> > tison.
>>> >
>>> >
>>> > Dave Fisher wrote on Wed, Nov 2, 2022 at 00:36:
>>> >
>>> > > - privacy.
>>> > >
>>> > > THANK YOU!
>>> > >
>>> > > Sent from my iPhone
>>> > >
>>> > > > On Nov 1, 2022, at 9:21 AM, tison  wrote:
>>> > > >
>>> > > > Hi Privacy Team,
>>> > > >
>>> > > > As proposed in https://github.com/apache/pulsar/issues/15664, the
>>> > Apache
>>> > > > Pulsar community is actively migrating from Google Analytics to the
>>> > > Matomo
>>> > > > solution.
>>> > > >
>>> > > > Reading from https://privacy.apache.org/matomo/, I send this
>>> email to
>>> > > > request a site ID and tracking code for Apache Pulsar. I think this
>>> > > setting
>>> > > > is public and the motivation is provided.
>>> > > >
>>> > > > dev@pulsar.a.o in cc. Please correct me if more prerequisites are
>>> > > needed.
>>> > > >
>>> > > > Best,
>>> > > > tison.
>>> > >
>>> > >
>>> >
>>>
>>


[GitHub] [pulsar] hwittenborn added a comment to the discussion: deb packages

2022-11-14 Thread GitBox


GitHub user hwittenborn added a comment to the discussion: deb packages

Would [makedeb](https://makedeb.org) be of any interest for this? It's a 
project I personally lead, but I think it could greatly reduce the maintenance 
cost of having a Debian package.

GitHub link: 
https://github.com/apache/pulsar/discussions/18464#discussioncomment-4137545


This is an automatically sent email for dev@pulsar.apache.org.
To unsubscribe, please send an email to: dev-unsubscr...@pulsar.apache.org



[GitHub] [pulsar] github-actions[bot] added a comment to the discussion: deb packages

2022-11-14 Thread GitBox


GitHub user github-actions[bot] added a comment to the discussion: deb packages

The issue had no activity for 30 days, mark with Stale label.

GitHub link: 
https://github.com/apache/pulsar/discussions/18464#discussioncomment-4137546


This is an automatically sent email for dev@pulsar.apache.org.
To unsubscribe, please send an email to: dev-unsubscr...@pulsar.apache.org



[GitHub] [pulsar] pablopla added a comment to the discussion: deb packages

2022-11-14 Thread GitBox


GitHub user pablopla added a comment to the discussion: deb packages

@hwittenborn makedeb could be great.

Why is this issue stale?

GitHub link: 
https://github.com/apache/pulsar/discussions/18464#discussioncomment-4137547


This is an automatically sent email for dev@pulsar.apache.org.
To unsubscribe, please send an email to: dev-unsubscr...@pulsar.apache.org



[GitHub] [pulsar] github-actions[bot] added a comment to the discussion: deb packages

2022-11-14 Thread GitBox


GitHub user github-actions[bot] added a comment to the discussion: deb packages

The issue had no activity for 30 days, mark with Stale label.

GitHub link: 
https://github.com/apache/pulsar/discussions/18464#discussioncomment-4137550


This is an automatically sent email for dev@pulsar.apache.org.
To unsubscribe, please send an email to: dev-unsubscr...@pulsar.apache.org



[GitHub] [pulsar] hpvd added a comment to the discussion: deb packages

2022-11-14 Thread GitBox


GitHub user hpvd added a comment to the discussion: deb packages

+1 for this!

GitHub link: 
https://github.com/apache/pulsar/discussions/18464#discussioncomment-4137549


This is an automatically sent email for dev@pulsar.apache.org.
To unsubscribe, please send an email to: dev-unsubscr...@pulsar.apache.org



[GitHub] [pulsar] github-actions[bot] added a comment to the discussion: deb packages

2022-11-14 Thread GitBox


GitHub user github-actions[bot] added a comment to the discussion: deb packages

The issue had no activity for 30 days, mark with Stale label.

GitHub link: 
https://github.com/apache/pulsar/discussions/18464#discussioncomment-4137548


This is an automatically sent email for dev@pulsar.apache.org.
To unsubscribe, please send an email to: dev-unsubscr...@pulsar.apache.org



Re: [DISCUSS] PIP-221: Make TableView support read the non-persistent topic

2022-11-14 Thread Michael Marshall
> And this PIP title has been changed to `Make TableView support TTL`.

What time is used to compute expiration? Is it the publish time or the
receive time? Also, are there cases that will reset a key's timer?

Thanks,
Michael

On Mon, Nov 14, 2022 at 2:40 AM Enrico Olivelli  wrote:
>
> On Mon, Nov 14, 2022 at 05:57 Kai Wang wrote:
> >
> > Hi, pulsar-dev community,
> >
> > Since the non-persistent topic support doesn't require API changes. I have 
> > pushed a PR to implement it, which has already been merged.
> >
> > See: https://github.com/apache/pulsar/pull/18375
>
> Perfect
>
> Thanks
> Enrico
>
> >
> > And this PIP title has been changed to `Make TableView support TTL`.
> >
> > PIP link: https://github.com/apache/pulsar/issues/18229
> >
> > Thanks,
> > Kai
> >
> > On 2022/11/04 02:28:41 Kai Wang wrote:
> > > Hi, pulsar-dev community,
> > >
> > > I’ve opened a PIP to discuss : PIP-221: Make TableView support read the 
> > > non-persistent topic.
> > >
> > > PIP link: https://github.com/apache/pulsar/issues/18229
> > >
> > > Thanks,
> > > Kai
> > >


Re: [DISCUSS] PIP-221: Make TableView support read the non-persistent topic

2022-11-14 Thread Joe F
I am not sure about the semantics of TableView on a non-persistent topic.

 Exactly how does this work?

 What happens if the client crashes?  What is the base state for the table?

What exactly can I expect as a user from this?

Joe

On Sun, Nov 13, 2022 at 8:57 PM Kai Wang  wrote:

> Hi, pulsar-dev community,
>
> Since the non-persistent topic support doesn't require API changes. I have
> pushed a PR to implement it, which has already been merged.
>
> See: https://github.com/apache/pulsar/pull/18375
>
> And this PIP title has been changed to `Make TableView support TTL`.
>
> PIP link: https://github.com/apache/pulsar/issues/18229
>
> Thanks,
> Kai
>
> On 2022/11/04 02:28:41 Kai Wang wrote:
> > Hi, pulsar-dev community,
> >
> > I’ve opened a PIP to discuss : PIP-221: Make TableView support read the
> non-persistent topic.
> >
> > PIP link: https://github.com/apache/pulsar/issues/18229
> >
> > Thanks,
> > Kai
> >
>


Re: [Vote] PIP-215: Configurable TopicCompactionStrategy for StrategicTwoPhaseCompactor and TableView

2022-11-14 Thread PengHui Li
+1

Thanks,
Penghui

On Wed, Nov 9, 2022 at 12:52 AM Heesung Sohn
 wrote:

> Dear Pulsar Community,
>
> Please review and vote on this PIP.
>
> PIP link: https://github.com/apache/pulsar/issues/18099
>
> Thank you,
> -Heesung
>


Re: [DISCUSS] PIP-221: Make TableView support read the non-persistent topic

2022-11-14 Thread Kai Wang
Hi Michael,

> What time is used to compute expiration? Is it the publish time or the
> receive time?
This TTL will be based on the message publish time. We can also make it 
configurable if users have this demand.

> Also, are there cases that will reset a key's timer?
If some keys need to reset the timer, users can publish a new message with the 
old key, since we are using the publish time as the expiration time.
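
For example, a minimal sketch with the Java client (topic, key, and value are 
placeholders); republishing under the same key updates the entry, and because 
expiration is based on the publish time, this effectively resets that key's TTL:

    Producer<String> producer = client.newProducer(Schema.STRING)
            .topic("table-topic")          // placeholder topic
            .create();

    // The TableView keeps the latest value per key; this new publish time
    // becomes the basis for the key's expiration.
    producer.newMessage().key("my-key").value("new-value").send();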

Thanks,
Kai

On 2022/11/14 16:43:06 Michael Marshall wrote:
> > And this PIP title has been changed to `Make TableView support TTL`.
> 
> What time is used to compute expiration? Is it the publish time or the
> receive time? Also, are there cases that will reset a key's timer?
> 
> Thanks,
> Michael
> 
> On Mon, Nov 14, 2022 at 2:40 AM Enrico Olivelli  wrote:
> >
> > On Mon, Nov 14, 2022 at 05:57 Kai Wang wrote:
> > >
> > > Hi, pulsar-dev community,
> > >
> > > Since the non-persistent topic support doesn't require API changes. I 
> > > have pushed a PR to implement it, which has already been merged.
> > >
> > > See: https://github.com/apache/pulsar/pull/18375
> >
> > Perfect
> >
> > Thanks
> > Enrico
> >
> > >
> > > And this PIP title has been changed to `Make TableView support TTL`.
> > >
> > > PIP link: https://github.com/apache/pulsar/issues/18229
> > >
> > > Thanks,
> > > Kai
> > >
> > > On 2022/11/04 02:28:41 Kai Wang wrote:
> > > > Hi, pulsar-dev community,
> > > >
> > > > I’ve opened a PIP to discuss : PIP-221: Make TableView support read the 
> > > > non-persistent topic.
> > > >
> > > > PIP link: https://github.com/apache/pulsar/issues/18229
> > > >
> > > > Thanks,
> > > > Kai
> > > >
> 


[Discuss] Deprecate Index-based Publisher Stat Aggregation in Topics Partitioned-Stats

2022-11-14 Thread Heesung Sohn
Dear Pulsar Community,

We recently found a bug in `pulsar-admin topics partitioned-stats api` that
could incur a memory burst, high GC time, or OOM.

For this issue, I proposed a fix by deprecating the
aggregatePublisherStatsByProducerName config and always aggregating the
publishers' stats by publisher name instead of by list index
(aggregatePublisherStatsByProducerName=false, the current default).


   -  The index-based aggregation is inherently wrong in a highly
   concurrent producer environment (where the order and size of the publisher
   stat list are not guaranteed to be the same). The publisher stats need to
   be aggregated by a unique key, preferably the producer
   name (aggregatePublisherStatsByProducerName=true).


However, this fix will break compatibility for some old clients, since the way
Pulsar generates the producer name has changed over time, as described here.

As I replied here, although it is not desirable, I think we could be lenient
about this change in the stats API response (assuming this publishers' stat
struct is used by human admins only for ad-hoc checks).

Are we OK with this non-backward-compatible fix for some of the old
clients? Or, do you have any other suggestions?

One idea for a long-term fix could be:
When there are thousands of producers (or consumers) for a (partitioned) topic,
it is expensive to aggregate each publisher's (or subscription's) stats on the
fly across the brokers. Alternatively, for the next major version, I think we
could define dedicated producers (and subscriptions) APIs like the ones below
and drop the publishers and subscriptions structs from the topics
(partitioned-)stats responses.

pulsar-admin publishers list my-topic --last-pagination-key xyz
pulsar-admin publishers stats my-producer

# similarly for subscriptions

Regards,
Heesung


Re: [DISCUSS] PIP-221: Make TableView support read the non-persistent topic

2022-11-14 Thread Kai Wang
Hi Joe,

> I am not sure about the semantics of TableView on a non-persistent topic.
> What happens if the client crashes?  What is the base state for the table?

If users use a non-persistent topic as the TableView topic and the client 
crashes, the TableView's data will be lost.

The current use case is to use a non-persistent topic to store the load data 
used by the new load manager. It doesn't require strong consistency guarantees 
or persistence.
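
For reference, a minimal sketch of building such a view with the Java client 
(the topic name is a placeholder); on a non-persistent topic the view is rebuilt 
only from messages received after the client connects:

    TableView<byte[]> loadData = client.newTableViewBuilder(Schema.BYTES)
            .topic("non-persistent://public/default/broker-load-data")  // placeholder
            .create();

    // Latest value per key, built from messages seen since this client connected.
    loadData.forEach((broker, value) -> System.out.println(broker));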


Thanks,
Kai

On 2022/11/14 23:03:13 Joe F wrote:
> I am not sure about the semantics of TableView on a non-persistent topic.
> 
>  Exactly how does this work?
> 
>  What happens if the client crashes?  What is the base state for the table?
> 
> What exactly can I expect as a user from this?
> 
> Joe
> 
> On Sun, Nov 13, 2022 at 8:57 PM Kai Wang  wrote:
> 
> > Hi, pulsar-dev community,
> >
> > Since the non-persistent topic support doesn't require API changes. I have
> > pushed a PR to implement it, which has already been merged.
> >
> > See: https://github.com/apache/pulsar/pull/18375
> >
> > And this PIP title has been changed to `Make TableView support TTL`.
> >
> > PIP link: https://github.com/apache/pulsar/issues/18229
> >
> > Thanks,
> > Kai
> >
> > On 2022/11/04 02:28:41 Kai Wang wrote:
> > > Hi, pulsar-dev community,
> > >
> > > I’ve opened a PIP to discuss : PIP-221: Make TableView support read the
> > non-persistent topic.
> > >
> > > PIP link: https://github.com/apache/pulsar/issues/18229
> > >
> > > Thanks,
> > > Kai
> > >
> >
> 


[GitHub] [pulsar] startjava added a comment to the discussion: what is "滚动策略循环"?

2022-11-14 Thread GitBox


GitHub user startjava added a comment to the discussion: what is "滚动策略循环"?

https://pulsar.apache.org/docs/next/concepts-architecture-overview/#managed-ledgers

A ledger can be deleted when all cursors have consumed the messages it 
contains. This allows for periodic rollover of ledgers.

“periodic rollover of ledgers”

What is rollover of ledgers? How does the rollover happen?

If ledgers are not deleted, does writing start again from the beginning?





GitHub link: 
https://github.com/apache/pulsar/discussions/18471#discussioncomment-4143453


This is an automatically sent email for dev@pulsar.apache.org.
To unsubscribe, please send an email to: dev-unsubscr...@pulsar.apache.org