Hi Girish, Thank you for your feedback and for raising an important concern about the potential memory impact of the draining hashes hashmap, especially for large numbers of hashes. I'm glad to inform you that the memory impact related to the draining hashes state is not significant.
On the broker side, a consumer already maintains the "pendingAck" state: https://github.com/apache/pulsar/blob/4f96146f13b136644a4eb0cf4ec36699e0431929/pulsar-broker/src/main/java/org/apache/pulsar/broker/service/Consumer.java#L357-L374 There's no need to add tracking for PIP-379 at the message level since that tracking already exists and contains the hashes. For the "draining hashes," the state is only needed when hash assignments change, as long as there are pending acks for a particular draining hash. The "draining hash" data structure is very lightweight: - It's an entry in a map where the key is the hash and the value contains a reference counter and the consumer ID. - Estimated memory consumption is roughly 80 bytes per entry (key: 16+16 bytes, value: 16+16+16 bytes). Memory usage estimates: - Worst case (all 64k hashes draining for a subscription): about 5MB - Practical case (less than 1000 hashes draining): less than 80 kilobytes - For 10,000 draining hashes: about 800 kB Importantly, PIP-379 will actually reduce memory consumption compared to the PIP-282 solution. PIP-282 introduced new runtime state ("individually sent positions"), which PIP-379 will remove. To summarize, the memory footprint of this feature is expected to be minimal and an improvement over the current solution when the "individually sent position" is removed. Please let me know if you have any further questions or concerns. -Lari On 2024/09/16 07:16:04 Girish Sharma wrote: > This is a good proposal and solves a big blocker that we have in using > key_shared for a few certain use cases. My concern is on the memory impact > of maintaining the draining hashes hashmap, especially when the hashes are > in the order of 10,000s or more. > Will there be checks or constraints to limit the memory footprint? > > Regards > > On Mon, Sep 16, 2024 at 12:30 PM Lari Hotari <lhot...@apache.org> wrote: > > > Hi PengHui, > > > > Thank you for your suggestion. I really appreciate your input on this, and > > I understand that without context, your proposed approach would indeed be > > the way forward. > > > > In this particular case, though, there are compelling reasons why PIP-379 > > suggests replacing the "recently joined consumers" approach with "draining > > hashes" in Pulsar sooner rather than later. > > > > Unlike PIP-192 and PIP-195, where we were improving already functional > > implementations, here we're looking at addressing what appears to be a core > > functionality issue. > > > > An additional reason is that PIP-282 is already implemented in master > > branch and it adds significant runtime state complexity compared to the > > previous solution. PIP-282 didn't have a way to opt-out either and it is > > currently included in master branch. > > > > I'm more than happy to discuss the process aspects in more detail if you > > have any concerns about the proposed approach from that perspective. > > > > I'd really value your further feedback on the content and proposed design > > of PIP-379. Your insights would be incredibly helpful as we work on > > refining this solution. > > > > Looking forward to your thoughts! > > > > -Lari > > > > On 2024/09/15 23:52:48 PengHui Li wrote: > > > Hi Lari, > > > > > > I recommend creating a new implementation rather than directly replacing > > > the existing one. > > > This approach aligns with how we’ve handled several proposals in the past > > > and allows us to maintain stability while introducing improvements > > > > > > - PIP-192: New Pulsar Broker Load Balancer > > > - PIP-195: New bucket based delayed message tracker > > > > > > Once the new implementation proves to be stable, we can switch the > > default > > > implementation of the Key_Shared subscription to the new ‘draining > > hashes’ > > > solution. > > > > > > On Sat, Sep 14, 2024 at 8:40 AM Enrico Olivelli <eolive...@gmail.com> > > wrote: > > > > > > > Awesome proposal, no questions from my side > > > > > > > > +1 > > > > > > > > Enrico > > > > > > > > Il giorno sab 14 set 2024 alle ore 16:21 Lari Hotari < > > lhot...@apache.org> > > > > ha scritto: > > > > > > > > > Dear Pulsar Community, > > > > > > > > > > I'd like to propose a new improvement for Pulsar's Key_Shared > > > > > subscription mode, outlined in PIP-379. This proposal aims to address > > > > > several issues with the current implementation and introduce a more > > > > > efficient mechanism for managing message ordering. > > > > > > > > > > Problem: > > > > > The current Key_Shared implementation faces challenges including: > > > > > 1. Complex management of "recently joined consumers" > > > > > 2. Incomplete fulfillment of ordering guarantees > > > > > 3. Unnecessary message blocking > > > > > 4. Poor observability > > > > > > > > > > PIP-379 introduces a "draining hashes" concept to efficiently manage > > > > > message ordering by tracking affected hashes when consumer > > assignments > > > > > change. The high-level solution is drafted in the PIP document. > > > > > > > > > > Benefits: > > > > > 1. Improved message ordering guarantees > > > > > 2. Reduced unnecessary message blocking > > > > > 3. Better scalability and performance > > > > > 4. Enhanced observability > > > > > > > > > > This proposal would replace the existing "recently joined consumers" > > > > > mechanism, addressing its limitations while providing a more robust > > > > > solution. > > > > > > > > > > The full proposal can be found at: > > > > > https://github.com/apache/pulsar/pull/23309 > > > > > The direct link to the rendered version of the markdown file is: > > > > > https://github.com/lhotari/pulsar/blob/lh-pip-379/pip/pip-379.md > > > > > > > > > > I welcome your feedback and discussion on this proposal. Please share > > > > > your thoughts, concerns, or suggestions. > > > > > > > > > > -Lari > > > > > > > > > > > > > > > > > -- > Girish Sharma >