> we might be able to solve storing > 10M-100M unack messages but the question is: if a broker really has that many unack messages, will the broker run under such huge memory pressure, and will it really serve the large-scale use cases for which Pulsar was built? I am sure it might be useful for small use cases and clients can use it if needed, but it might not be useful for most use cases.
Could we use a shared AckedMessageRanges LRU Cache across many topics?
It seems inefficient to pre-load all ranges.

Thanks,
Heesung

On Wed, Oct 2, 2024 at 11:36 AM Rajan Dhabalia <rdhaba...@apache.org> wrote:
>
> >> Is this the correct understanding of how PR 9292 efficiently stores
> >> individual acks?
>
> Yes, that's correct. It's like serializing a Position object to a bit,
> which could be the smallest serializable size we could achieve among any
> ser/des approaches.
>
> In the past, there were two fundamental challenges we were facing to
> serve a large set of unack messages: (a) memory pressure and large GC
> pauses due to the large number of Position objects, and (b) serializing
> such a number of objects to store in a bookie ledger for topic recovery.
>
> (a) was handled by #3819, which replaces a Position object with a bit and
> so allows brokers to run with a large number of unack messages for a
> topic. But it also comes with a certain limit for large-scale
> multi-tenant systems, where a broker is serving a large number of topics
> and serving several million unack messages per topic can create memory
> pressure on the broker. Therefore, even if we solve (b) to store billions
> of unack messages for topic recovery, the broker might not run with
> stability beyond several million unack messages.
> So, we don't have to solve (b) to store more than 1M-10M unack messages,
> because keeping > 10M unack messages can impact broker stability in a
> large-scale multi-tenant env. #9292 solves this acceptable range of unack
> messages, with which we can also run the broker with stability.
>
> Talking about PIP-381, we might be able to solve storing > 10M-100M unack
> messages but the question is: if a broker really has that many unack
> messages, will the broker run under such huge memory pressure, and will
> it really serve the large-scale use cases for which Pulsar was built?
> I am sure it might be useful for small use cases and clients can use it
> if needed, but it might not be useful for most use cases.
>
> Thanks,
> Rajan
>
>
> On Tue, Oct 1, 2024 at 11:42 PM Lari Hotari <lhot...@apache.org> wrote:
>
> > On 2024/09/27 19:18:03 Rajan Dhabalia wrote:
> > > Well, again, PR#9292 already had an agreement to merge earlier, and
> > > it was reviewed as well, but it was just blocked for no reason. And
> > > recently it was acknowledged that PR#9292 fulfills the purpose of
> > > most of the use cases and that we should move forward with the
> > > approach. That was the reason I again spent time rebasing, as the
> > > rebasing efforts were already mentioned in another thread, and
> > > therefore we made it ready to make sure we can use this feature for
> > > most of the use cases. I don't see any concern or issue that should
> > > stop us from moving forward now with PR#9292, unless anyone comes
> > > with a personal reason. PR#9292 is straightforward, and it would be
> > > great if reviewers can review it again so we can make progress and
> > > make this feature available soon.
> >
> > I'm late to reviewing PR 9292, and I reviewed it after it was merged.
> > The change in the merged PR 9292 is very useful if I understand it
> > correctly.
> > Thanks for the great work, Rajan.
> >
> > Since individual acknowledgments get encoded as a long[] array, it will
> > compress the information significantly. A single long entry in the
> > array will hold 64 bits and therefore 64 individual acknowledgments. In
> > theory, 1 million (1024 * 1024) individual bits can be held in 128kB of
> > memory (1024kB / 8) since every bit will encode one acknowledgment.
> >
> > Is this the correct understanding of how PR 9292 efficiently stores
> > individual acks?
> >
> > -Lari
> >
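
For readers following the arithmetic in Lari's quoted message, here is a minimal Java sketch of the general idea of packing one acknowledgment per bit into a long[] array. It is only an illustration of the technique under stated assumptions, not the code in PR 9292 or #3819: the class name AckBitmapSketch and the helpers allocate, ack, isAcked and payloadBytes are hypothetical, and entry ids are assumed to be small non-negative offsets within a single range.

```java
import java.util.BitSet;

// Hypothetical illustration only; not Pulsar's actual implementation.
public class AckBitmapSketch {

    // Allocate enough 64-bit words to hold one bit per entry.
    static long[] allocate(int entries) {
        return new long[(entries + 63) >>> 6];
    }

    // Mark the entry at offset entryId as individually acknowledged.
    static void ack(long[] words, int entryId) {
        words[entryId >>> 6] |= 1L << (entryId & 63);
    }

    // Check whether the entry at offset entryId has been acknowledged.
    static boolean isAcked(long[] words, int entryId) {
        return (words[entryId >>> 6] & (1L << (entryId & 63))) != 0;
    }

    // Raw payload size of the array, ignoring JVM object headers.
    static long payloadBytes(long[] words) {
        return (long) words.length * Long.BYTES;
    }

    public static void main(String[] args) {
        int entries = 1024 * 1024;          // the "1 million" from the thread
        long[] words = allocate(entries);

        ack(words, 0);
        ack(words, 64);
        ack(words, entries - 1);

        System.out.println("words:       " + words.length);
        System.out.println("payload KiB: " + payloadBytes(words) / 1024);
        System.out.println("acked(64):   " + isAcked(words, 64));
        System.out.println("acked(65):   " + isAcked(words, 65));

        // The same long[] can be viewed through java.util.BitSet,
        // which uses the identical word and bit layout.
        BitSet view = BitSet.valueOf(words);
        System.out.println("acked count: " + view.cardinality());
    }
}
```

With 1024 * 1024 entries the array holds 16384 longs, i.e. 128 KiB of payload, which matches the figure in the quoted message. The per-topic cost is what Rajan's concern is about: the same array multiplied across many topics on one broker.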