Hello Lari, Thanks for writing this proposal. The proposed solution looks good.
Best regards, Apurva Telang On Mon, Sep 16, 2024 at 4:23 AM Lari Hotari <lhot...@apache.org> wrote: > Hi Girish, > > Thank you for your feedback and for raising an important concern about the > potential memory impact of the draining hashes hashmap, especially for > large numbers of hashes. I'm glad to inform you that the memory impact > related to the draining hashes state is not significant. > > On the broker side, a consumer already maintains the "pendingAck" state: > > https://github.com/apache/pulsar/blob/4f96146f13b136644a4eb0cf4ec36699e0431929/pulsar-broker/src/main/java/org/apache/pulsar/broker/service/Consumer.java#L357-L374 > There's no need to add tracking for PIP-379 at the message level since > that tracking already exists and contains the hashes. > > For the "draining hashes," the state is only needed when hash assignments > change, as long as there are pending acks for a particular draining hash. > > The "draining hash" data structure is very lightweight: > - It's an entry in a map where the key is the hash and the value contains > a reference counter and the consumer ID. > - Estimated memory consumption is roughly 80 bytes per entry (key: 16+16 > bytes, value: 16+16+16 bytes). > > Memory usage estimates: > - Worst case (all 64k hashes draining for a subscription): about 5MB > - Practical case (less than 1000 hashes draining): less than 80 kilobytes > - For 10,000 draining hashes: about 800 kB > > Importantly, PIP-379 will actually reduce memory consumption compared to > the PIP-282 solution. PIP-282 introduced new runtime state ("individually > sent positions"), which PIP-379 will remove. To summarize, the memory > footprint of this feature is expected to be minimal and an improvement over > the current solution when the "individually sent position" is removed. > > Please let me know if you have any further questions or concerns. > > -Lari > > On 2024/09/16 07:16:04 Girish Sharma wrote: > > This is a good proposal and solves a big blocker that we have in using > > key_shared for a few certain use cases. My concern is on the memory > impact > > of maintaining the draining hashes hashmap, especially when the hashes > are > > in the order of 10,000s or more. > > Will there be checks or constraints to limit the memory footprint? > > > > Regards > > > > On Mon, Sep 16, 2024 at 12:30 PM Lari Hotari <lhot...@apache.org> wrote: > > > > > Hi PengHui, > > > > > > Thank you for your suggestion. I really appreciate your input on this, > and > > > I understand that without context, your proposed approach would indeed > be > > > the way forward. > > > > > > In this particular case, though, there are compelling reasons why > PIP-379 > > > suggests replacing the "recently joined consumers" approach with > "draining > > > hashes" in Pulsar sooner rather than later. > > > > > > Unlike PIP-192 and PIP-195, where we were improving already functional > > > implementations, here we're looking at addressing what appears to be a > core > > > functionality issue. > > > > > > An additional reason is that PIP-282 is already implemented in master > > > branch and it adds significant runtime state complexity compared to the > > > previous solution. PIP-282 didn't have a way to opt-out either and it > is > > > currently included in master branch. > > > > > > I'm more than happy to discuss the process aspects in more detail if > you > > > have any concerns about the proposed approach from that perspective. > > > > > > I'd really value your further feedback on the content and proposed > design > > > of PIP-379. Your insights would be incredibly helpful as we work on > > > refining this solution. > > > > > > Looking forward to your thoughts! > > > > > > -Lari > > > > > > On 2024/09/15 23:52:48 PengHui Li wrote: > > > > Hi Lari, > > > > > > > > I recommend creating a new implementation rather than directly > replacing > > > > the existing one. > > > > This approach aligns with how we’ve handled several proposals in the > past > > > > and allows us to maintain stability while introducing improvements > > > > > > > > - PIP-192: New Pulsar Broker Load Balancer > > > > - PIP-195: New bucket based delayed message tracker > > > > > > > > Once the new implementation proves to be stable, we can switch the > > > default > > > > implementation of the Key_Shared subscription to the new ‘draining > > > hashes’ > > > > solution. > > > > > > > > On Sat, Sep 14, 2024 at 8:40 AM Enrico Olivelli <eolive...@gmail.com > > > > > wrote: > > > > > > > > > Awesome proposal, no questions from my side > > > > > > > > > > +1 > > > > > > > > > > Enrico > > > > > > > > > > Il giorno sab 14 set 2024 alle ore 16:21 Lari Hotari < > > > lhot...@apache.org> > > > > > ha scritto: > > > > > > > > > > > Dear Pulsar Community, > > > > > > > > > > > > I'd like to propose a new improvement for Pulsar's Key_Shared > > > > > > subscription mode, outlined in PIP-379. This proposal aims to > address > > > > > > several issues with the current implementation and introduce a > more > > > > > > efficient mechanism for managing message ordering. > > > > > > > > > > > > Problem: > > > > > > The current Key_Shared implementation faces challenges including: > > > > > > 1. Complex management of "recently joined consumers" > > > > > > 2. Incomplete fulfillment of ordering guarantees > > > > > > 3. Unnecessary message blocking > > > > > > 4. Poor observability > > > > > > > > > > > > PIP-379 introduces a "draining hashes" concept to efficiently > manage > > > > > > message ordering by tracking affected hashes when consumer > > > assignments > > > > > > change. The high-level solution is drafted in the PIP document. > > > > > > > > > > > > Benefits: > > > > > > 1. Improved message ordering guarantees > > > > > > 2. Reduced unnecessary message blocking > > > > > > 3. Better scalability and performance > > > > > > 4. Enhanced observability > > > > > > > > > > > > This proposal would replace the existing "recently joined > consumers" > > > > > > mechanism, addressing its limitations while providing a more > robust > > > > > > solution. > > > > > > > > > > > > The full proposal can be found at: > > > > > > https://github.com/apache/pulsar/pull/23309 > > > > > > The direct link to the rendered version of the markdown file is: > > > > > > https://github.com/lhotari/pulsar/blob/lh-pip-379/pip/pip-379.md > > > > > > > > > > > > I welcome your feedback and discussion on this proposal. Please > share > > > > > > your thoughts, concerns, or suggestions. > > > > > > > > > > > > -Lari > > > > > > > > > > > > > > > > > > > > > > > > -- > > Girish Sharma > > > -- Best regards, Apurva Telang.