Re: [DISCUSS] PIP-381: Handle large PositionInfo state

2024-10-02 Thread Heesung Sohn
> we might be able to solve storing > 10M-100M unack messages but the question is if a broker really has that many unack messages then will broker run with such huge memory pressure and will it really serve large scale usecases for which Pulsar was built? I am sure, it might be useful for small use

Re: [DISCUSS] PIP-381: Handle large PositionInfo state

2024-10-02 Thread Rajan Dhabalia
>> Is this the correct understanding of how PR 9292 efficiently stores individual acks? Yes, that's correct. It's like serializing a Position object to a bit which could be the smallest serializable size we could achieve among any ser/des approaches. In the past, there were two fundamental challe
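The preview describes PR 9292 as reducing each acknowledged Position to a single bit. Here is a minimal sketch of that idea. It assumes a Position is a (ledgerId, entryId) pair and that each ledger's acks are kept in a java.util.BitSet, where bit i means "entry i is acked". The class and method names are hypothetical, and this is not the code from PR 9292:

```java
import java.util.BitSet;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical sketch, not the PR 9292 implementation: one bit per acked entry, grouped by ledger.
final class AckBitSetSketch {
    // ledgerId -> bitset; bit i is set when entry i of that ledger has been acked
    private final Map<Long, BitSet> ackedByLedger = new TreeMap<>();

    void ack(long ledgerId, long entryId) {
        // BitSet is int-indexed, so this sketch assumes entry ids fit in an int
        ackedByLedger.computeIfAbsent(ledgerId, k -> new BitSet()).set(Math.toIntExact(entryId));
    }

    boolean isAcked(long ledgerId, long entryId) {
        BitSet bits = ackedByLedger.get(ledgerId);
        return bits != null && bits.get(Math.toIntExact(entryId));
    }

    // Persistable form: ledgerId -> packed 64-bit words (64 entries per long)
    Map<Long, long[]> serialize() {
        Map<Long, long[]> out = new TreeMap<>();
        ackedByLedger.forEach((ledger, bits) -> out.put(ledger, bits.toLongArray()));
        return out;
    }

    static AckBitSetSketch deserialize(Map<Long, long[]> words) {
        AckBitSetSketch sketch = new AckBitSetSketch();
        words.forEach((ledger, w) -> sketch.ackedByLedger.put(ledger, BitSet.valueOf(w)));
        return sketch;
    }
}
```

Consider acking every other entry from 0 to 9,998 in one ledger. That is 5,000 isolated acks, which would form 5,000 separate ranges. The bitset serializes them to 157 longs (1,256 bytes). The trade-off is that the size grows with the span of entry ids covered, not with the number of holes.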

Re: [DISCUSS] PIP-381: Handle large PositionInfo state

2024-10-01 Thread Lari Hotari
On 2024/09/27 19:18:03 Rajan Dhabalia wrote: > Well, again PR#9292 already has an agreement to merge earlier as well and > it was reviewed as well but it just was blocked for no reason. and recently > it was acknowledged that PR#9292 fulfills the purpose of most of the > usecases and we should move

Re: [DISCUSS] PIP-381: Handle large PositionInfo state

2024-09-27 Thread Heesung Sohn
>> I'd hope to see these used for storing individual acks in Pulsar, but it >> seems to be too late to handle that when there are already other >> implementations that are sufficient for solving the problem. I think we can make PositionInfo ser/dser code and its interface generic, so that Pulsar

Re: [DISCUSS] PIP-381: Handle large PositionInfo state

2024-09-27 Thread Rajan Dhabalia
Hi, >> It's too bad that the PR reviews stalled in the past and the required improvements did not get included in the project. It's frustrating when this happens, especially for those who have put their time and effort into contributing a valuable feature such as the one you contributed in PR 9292

Re: [DISCUSS] PIP-381: Handle large PositionInfo state

2024-09-27 Thread Lari Hotari
Hi Rajan, It's too bad that the PR reviews stalled in the past and the required improvements did not get included in the project. It's frustrating when this happens, especially for those who have put their time and effort into contributing a valuable feature such as the one you contributed in P

Re: [DISCUSS] PIP-381: Handle large PositionInfo state

2024-09-26 Thread Lari Hotari
On 2024/09/25 21:59:53 Heesung Sohn wrote: > Hi team, > Can we use a space-efficient probabilistic data structure, such as > bloom filter to store the acked msg ids, if re-delivering acked msgs > are not so strict? Talking about algorithms, there are also efficient ways optimized for storing inte
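Lari's preview is cut off before it names the structures he had in mind. The following is therefore only a generic illustration of one compact integer encoding, not necessarily the one he referred to. It applies delta encoding to sorted entry ids and then writes each delta as a variable-length integer (7 bits per byte). The class name is hypothetical, and the input is assumed to be sorted ascending and non-negative:

```java
import java.io.ByteArrayOutputStream;
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: delta + varint encoding of sorted, non-negative entry ids.
final class DeltaVarintSketch {
    static byte[] encode(long[] sortedIds) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        long prev = 0;
        for (long id : sortedIds) {
            long delta = id - prev;   // small when ids are dense
            prev = id;
            // 7 payload bits per byte; the high bit means "more bytes follow"
            while ((delta & ~0x7FL) != 0) {
                out.write((int) ((delta & 0x7F) | 0x80));
                delta >>>= 7;
            }
            out.write((int) delta);
        }
        return out.toByteArray();
    }

    static long[] decode(byte[] bytes) {
        List<Long> ids = new ArrayList<>();
        long prev = 0;
        int i = 0;
        while (i < bytes.length) {
            long delta = 0;
            int shift = 0;
            int b;
            do {
                b = bytes[i++] & 0xFF;
                delta |= (long) (b & 0x7F) << shift;
                shift += 7;
            } while ((b & 0x80) != 0);
            prev += delta;            // undo the delta step
            ids.add(prev);
        }
        return ids.stream().mapToLong(Long::longValue).toArray();
    }
}
```

With dense ids, the deltas are small and most ids cost one byte. For example, the 5,000 ids 0, 2, ..., 9,998 encode to 5,000 bytes, compared with 40,000 bytes as raw longs. Unlike a Bloom filter, this encoding is exact, so it never causes an acked message to be redelivered.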

Re: [DISCUSS] PIP-381: Handle large PositionInfo state

2024-09-26 Thread Rajan Dhabalia
Hi Enrico, I have rebased https://github.com/apache/pulsar/pull/9292 PR again after you helped to dismiss the blocker. Can you please help to review it again as you were part of the PR reviewer earlier. I think below two PRs were the most critical one to scale the unack message path because earli

Re: [DISCUSS] PIP-381: Handle large PositionInfo state

2024-09-25 Thread Rajan Dhabalia
+1 binding. >> Can we use a space-efficient probabilistic data structure, such as bloom filter to store the acked msg ids Here, in the problem statement it tries to dump and retrieve data for topic recovery, so we need a compute efficient solution for faster recovery with disk storage to store la

Re: [DISCUSS] PIP-381: Handle large PositionInfo state

2024-09-25 Thread Heesung Sohn
Sorry, we probably need to store "not acked" msg ids in the bloom filter. For all msg_Id from the min_acked, - if found in "not-acked msg id set"(possibly non-acked) -> re-send(msgs possibly duplicated). - if not found in "not-acked msg id set"(definitely acked) -> do not send. On Wed, Sep 25, 2
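Here is a minimal sketch of the scheme described in this message. It makes three assumptions: entry ids come from a single ledger, "min_acked" means the cursor's mark-delete position, and the Bloom filter is hand-rolled (SplitMix64 mixing plus double hashing). The class name and sizing parameters are hypothetical. A Bloom filter has no false negatives, so an unacked entry is never skipped. A false positive only causes a duplicate redelivery:

```java
import java.util.ArrayList;
import java.util.BitSet;
import java.util.List;

// Hypothetical sketch: a Bloom filter over NOT-acked entry ids, consulted on recovery.
final class UnackedBloomSketch {
    private final BitSet bits;
    private final int numBits;
    private final int numHashes;

    UnackedBloomSketch(int numBits, int numHashes) {
        this.bits = new BitSet(numBits);
        this.numBits = numBits;
        this.numHashes = numHashes;
    }

    // SplitMix64 finalizer: a cheap, well-mixed 64-bit hash
    private static long mix(long z) {
        z = (z ^ (z >>> 30)) * 0xBF58476D1CE4E5B9L;
        z = (z ^ (z >>> 27)) * 0x94D049BB133111EBL;
        return z ^ (z >>> 31);
    }

    // i-th probe position via double hashing: h1 + i * h2 (mod numBits)
    private int index(long id, int i) {
        long h1 = mix(id);
        long h2 = mix(h1) | 1L;
        return (int) Math.floorMod(h1 + i * h2, (long) numBits);
    }

    void addUnacked(long entryId) {
        for (int i = 0; i < numHashes; i++) bits.set(index(entryId, i));
    }

    boolean mightBeUnacked(long entryId) {
        for (int i = 0; i < numHashes; i++) {
            if (!bits.get(index(entryId, i))) return false; // definitely acked: do not send
        }
        return true; // possibly not acked: re-send (may be a duplicate)
    }

    // Recovery: entries after the mark-delete position that must be redelivered
    List<Long> toRedeliver(long markDeleteEntryId, long lastEntryId) {
        List<Long> out = new ArrayList<>();
        for (long e = markDeleteEntryId + 1; e <= lastEntryId; e++) {
            if (mightBeUnacked(e)) out.add(e);
        }
        return out;
    }
}
```

Note that recovery still visits every entry id between the mark-delete position and the last entry. This relates to the compute-efficiency concern for recovery raised in Rajan's 2024-09-25 reply.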

Re: [DISCUSS] PIP-381: Handle large PositionInfo state

2024-09-25 Thread Heesung Sohn
Hi team, Can we use a space-efficient probabilistic data structure, such as bloom filter to store the acked msg ids, if re-delivering acked msgs are not so strict? On Tue, Sep 24, 2024 at 5:28 PM Andrey Yegorov wrote: > > Penghui, > > Thank you for the thorough and detailed response. > > I have

Re: [DISCUSS] PIP-381: Handle large PositionInfo state

2024-09-24 Thread Andrey Yegorov
Penghui, Thank you for the thorough and detailed response. I have added the feature toggle per Lari's comment on the implementation PR. Regarding the compatibility, if the user has 10MB cursor data somehow (in our testing large serialized position info resulted in a single large entry > 1MB that

Re: [DISCUSS] PIP-381: Handle large PositionInfo state

2024-09-24 Thread PengHui Li
Thanks for driving the proposal. I would like to share the related context that happened many years ago - https://lists.apache.org/thread/y0r9kk0968ydpxtf16x6ql3x6kwy7dc1 - https://lists.apache.org/thread/hfv18cg0yckt5cqd0fc66rp7tth036kf We have two major approaches: 1. Minimize the persistent

Re: [DISCUSS] PIP-381: Handle large PositionInfo state

2024-09-24 Thread Lari Hotari
On Tue, 24 Sept 2024 at 05:01, Rajan Dhabalia wrote: > However, there are multiple other PRs related to key-shared sub, stats, > cursor performance, and other PRs are still blocked by others and people > just block it because they think they don't have this usecase. It's so > unfortunate that peop

Re: [DISCUSS] PIP-381: Handle large PositionInfo state

2024-09-24 Thread Lari Hotari
On Mon, 23 Sept 2024 at 20:37, Andrey Yegorov wrote: > @Lari: I think we can simply use managedLedgerMaxUnackedRangesToPersist as > a limiter, as the state is already persisted with the cursor. > Adding another config to allow single entry/multi entry storage of the > state feels like unnecessary
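The broker.conf description of managedLedgerMaxUnackedRangesToPersist explains how the limit behaves. Once the maximum number of acknowledged ranges is reached, further state is tracked only in memory, and those messages are redelivered after a crash. Here is a minimal sketch of such a limiter. It assumes the ranges are closed [start, end] entry-id intervals sorted by start, and that the lowest ranges are the ones persisted. The class and method names are hypothetical, and this is not Pulsar's actual implementation:

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of a "max ranges to persist" limiter.
final class RangeLimiterSketch {
    // Closed interval [start, end] of individually acked entry ids
    static final class AckRange {
        final long start;
        final long end;

        AckRange(long start, long end) {
            this.start = start;
            this.end = end;
        }
    }

    // The prefix of ranges written with the cursor; maxRangesToPersist is assumed non-negative
    static List<AckRange> toPersist(List<AckRange> ranges, int maxRangesToPersist) {
        return new ArrayList<>(ranges.subList(0, Math.min(ranges.size(), maxRangesToPersist)));
    }

    // Acked entries kept only in memory, i.e. redelivered if the broker crashes
    static long inMemoryOnlyAcks(List<AckRange> ranges, int maxRangesToPersist) {
        long count = 0;
        for (int i = maxRangesToPersist; i < ranges.size(); i++) {
            AckRange r = ranges.get(i);
            count += r.end - r.start + 1;
        }
        return count;
    }
}
```

Raising the limit reduces redelivery after a crash, but it also increases the size of the persisted state, which is the tension this PIP addresses.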

Re: [DISCUSS] PIP-381: Handle large PositionInfo state

2024-09-23 Thread Rajan Dhabalia
>> I am sorry I haven't followed up and I am not able to spend much time. I don't want to block your proposal Rajan. I totally understand and I am sure it was not intentional by you to block this PR. However, there are multiple other PRs related to key-shared sub, stats, cursor performance, and ot

Re: [DISCUSS] PIP-381: Handle large PositionInfo state

2024-09-23 Thread Andrey Yegorov
vote thread: https://lists.apache.org/thread/q31fx0rox9tdt34xsmo1ol1l76q8vk99 On Mon, Sep 23, 2024 at 10:37 AM Andrey Yegorov wrote: > Thank you all for the feedback. > > My take from this is the feature is needed and the general consensus is to > proceed with it. > I'll start a vote thread. > >

Re: [DISCUSS] PIP-381: Handle large PositionInfo state

2024-09-23 Thread Andrey Yegorov
Thank you all for the feedback. My take from this is the feature is needed and the general consensus is to proceed with it. I'll start a vote thread. Compression of the state (already used if enabled) and a more compact serialization format (as in Rajan's PR) alone are partial solutions that move

Re: [DISCUSS] PIP-381: Handle large PositionInfo state

2024-09-22 Thread Lari Hotari
Thanks for driving this, Andrey. This proposal is needed and very useful. One detail that should be addressed is the fact that there's an earlier PIP which wasn't fully implemented. It's "PIP 81: Split the individual acknowledgments into multiple entries." https://github.com/apache/pulsar/wiki/P
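PIP 81 proposes splitting the individual acknowledgments into multiple entries. Here is a minimal sketch of the splitting step only. It assumes the state has already been serialized to a byte array and that each chunk must stay under a hypothetical maxChunkBytes limit. The names are hypothetical, and this is not the PIP 81 or PIP-381 implementation:

```java
import java.io.ByteArrayOutputStream;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch: store one large serialized state as several bounded-size entries.
final class ChunkedStateSketch {
    // Split the serialized state into chunks of at most maxChunkBytes each
    static List<byte[]> split(byte[] state, int maxChunkBytes) {
        if (maxChunkBytes <= 0) throw new IllegalArgumentException("maxChunkBytes must be > 0");
        List<byte[]> chunks = new ArrayList<>();
        for (int off = 0; off < state.length; off += maxChunkBytes) {
            int end = (int) Math.min((long) state.length, (long) off + maxChunkBytes);
            chunks.add(Arrays.copyOfRange(state, off, end));
        }
        return chunks;
    }

    // Reassemble chunks read back in the order they were written
    static byte[] join(List<byte[]> chunks) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        for (byte[] c : chunks) out.write(c, 0, c.length);
        return out.toByteArray();
    }
}
```

A real implementation also needs a way for recovery to distinguish a complete set of chunks from a partially written one, for example a chunk count or a final marker entry. This sketch leaves that out.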

Re: [DISCUSS] PIP-381: Handle large PositionInfo state

2024-09-21 Thread Enrico Olivelli
On Sat, 21 Sep 2024, 01:51 Rajan Dhabalia wrote: > Hi Andrey, > > Thanks for submitting the PR as we have been facing this issue for a long > time now and we also have PR which solves this issue in a simple and a > fundamental way with proven perf results as well. > > PR: https://github.com/a

Re: [DISCUSS] PIP-381: Handle large PositionInfo state

2024-09-20 Thread Rajan Dhabalia
Hi Andrey, Thanks for submitting the PR as we have been facing this issue for a long time now and we also have PR which solves this issue in a simple and a fundamental way with proven perf results as well. PR: https://github.com/apache/pulsar/pull/9292 But again I am not sure some folks blocked

[DISCUSS] PIP-381: Handle large PositionInfo state

2024-09-20 Thread Andrey Yegorov
Hello, I created a PIP for handling large PositionInfo state (large number of unacked ranges in cursor.) PIP PR: https://github.com/apache/pulsar/pull/23328 Proposed implementation: https://github.com/apache/pulsar/pull/22799 Relevant excerpts from PIP: --- Background knowledge
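A large PositionInfo state comes from out-of-order acknowledgments. The acked entries above the mark-delete position are stored as ranges, and every unacked hole splits a range. Here is a minimal sketch of that collapse, assuming entry ids within one ledger. The class name is hypothetical:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.SortedSet;

// Hypothetical sketch: collapse individually acked entry ids into closed [start, end] ranges.
final class AckRangesSketch {
    static List<long[]> toRanges(SortedSet<Long> ackedIds) {
        List<long[]> ranges = new ArrayList<>();
        long[] current = null;
        for (long id : ackedIds) {
            if (current != null && id == current[1] + 1) {
                current[1] = id;                  // contiguous ack: extend the current range
            } else {
                current = new long[] {id, id};    // first id, or a hole precedes it: new range
                ranges.add(current);
            }
        }
        return ranges;
    }
}
```

Acking every other entry from 0 to 9,998 produces 5,000 single-entry ranges. The number of ranges, and with it the serialized state, therefore grows with the number of holes rather than the number of messages.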