Hello Lari,

Thanks for writing this proposal. The proposed solution looks good.

Best regards,
Apurva Telang

On Mon, Sep 16, 2024 at 4:23 AM Lari Hotari <lhot...@apache.org> wrote:

> Hi Girish,
>
> Thank you for your feedback and for raising an important concern about the
> potential memory impact of the draining hashes hashmap, especially for
> large numbers of hashes. I'm glad to inform you that the memory impact
> related to the draining hashes state is not significant.
>
> On the broker side, a consumer already maintains the "pendingAck" state:
>
> https://github.com/apache/pulsar/blob/4f96146f13b136644a4eb0cf4ec36699e0431929/pulsar-broker/src/main/java/org/apache/pulsar/broker/service/Consumer.java#L357-L374
> There's no need to add tracking for PIP-379 at the message level since
> that tracking already exists and contains the hashes.
>
> The "draining hashes" state is only needed after hash assignments change,
> and only for as long as there are pending acks for a particular draining
> hash.
>
> The "draining hash" data structure is very lightweight:
> - It's an entry in a map where the key is the hash and the value contains
> a reference counter and the consumer ID.
> - Estimated memory consumption is roughly 80 bytes per entry (key: 16+16
> bytes, value: 16+16+16 bytes).
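
A minimal sketch of the structure described above, assuming a plain HashMap keyed by the hash. The class, method, and field names are hypothetical and are not taken from the Pulsar code base or the PIP document; the actual implementation may differ.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch: names are illustrative, not Pulsar's actual classes.
final class DrainingHashesSketch {
    // Value stored per draining hash: the consumer that still has pending
    // acks for the hash, and a counter of how many pending acks remain.
    static final class Entry {
        final long consumerId;
        int refCount;

        Entry(long consumerId) {
            this.consumerId = consumerId;
        }
    }

    private final Map<Integer, Entry> entries = new HashMap<>();

    // Record one pending-ack message for a hash whose assignment has changed.
    void addPendingAck(int hash, long consumerId) {
        entries.computeIfAbsent(hash, h -> new Entry(consumerId)).refCount++;
    }

    // Record an acknowledgement; the entry is dropped once no pending acks
    // remain, so the map only holds hashes that are still draining.
    void ackReceived(int hash) {
        Entry entry = entries.get(hash);
        if (entry != null && --entry.refCount == 0) {
            entries.remove(hash);
        }
    }

    boolean isDraining(int hash) {
        return entries.containsKey(hash);
    }

    int size() {
        return entries.size();
    }
}
```

In this sketch an entry exists only while its reference counter is above zero, which is why the map stays small unless many hashes are draining at the same time.
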
>
> Memory usage estimates:
> - Worst case (all 64k hashes draining for a subscription): about 5 MB
> - Practical case (less than 1,000 hashes draining): less than 80 kB
> - For 10,000 draining hashes: about 800 kB
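
The arithmetic behind these estimates can be checked with a small sketch. It assumes the per-entry figure quoted above (key: 16+16 bytes, value: 16+16+16 bytes) and reads "64k" as 65,536 hashes; the class and method names are hypothetical.

```java
// Hypothetical helper that reproduces the estimates quoted in the message.
final class DrainingHashMemoryEstimate {
    // Per-entry estimate from the message: key 16+16 bytes, value 16+16+16 bytes.
    static final long BYTES_PER_ENTRY = (16 + 16) + (16 + 16 + 16);

    // Estimated bytes used by the given number of draining hash entries.
    static long estimateBytes(long drainingHashes) {
        return drainingHashes * BYTES_PER_ENTRY;
    }
}
```

Under these assumptions, estimateBytes(65_536) gives 5,242,880 bytes (about 5 MB), estimateBytes(1_000) gives 80,000 bytes, and estimateBytes(10_000) gives 800,000 bytes.
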
>
> Importantly, PIP-379 will actually reduce memory consumption compared to
> the PIP-282 solution. PIP-282 introduced new runtime state ("individually
> sent positions"), which PIP-379 will remove. To summarize, the memory
> footprint of this feature is expected to be minimal, and it will be an
> improvement over the current solution once the "individually sent
> positions" state is removed.
>
> Please let me know if you have any further questions or concerns.
>
> -Lari
>
> On 2024/09/16 07:16:04 Girish Sharma wrote:
> > This is a good proposal and solves a big blocker that we have in using
> > Key_Shared for certain use cases. My concern is the memory impact of
> > maintaining the draining hashes hashmap, especially when the number of
> > hashes is on the order of tens of thousands or more.
> > Will there be checks or constraints to limit the memory footprint?
> >
> > Regards
> >
> > On Mon, Sep 16, 2024 at 12:30 PM Lari Hotari <lhot...@apache.org> wrote:
> >
> > > Hi PengHui,
> > >
> > > Thank you for your suggestion. I really appreciate your input on this,
> > > and I understand that without context, your proposed approach would
> > > indeed be the way forward.
> > >
> > > In this particular case, though, there are compelling reasons why
> > > PIP-379 suggests replacing the "recently joined consumers" approach
> > > with "draining hashes" in Pulsar sooner rather than later.
> > >
> > > Unlike PIP-192 and PIP-195, where we were improving already functional
> > > implementations, here we're looking at addressing what appears to be a
> > > core functionality issue.
> > >
> > > An additional reason is that PIP-282 is already implemented in the
> > > master branch, and it adds significant runtime state complexity
> > > compared to the previous solution. PIP-282 didn't provide a way to opt
> > > out either.
> > >
> > > I'm more than happy to discuss the process aspects in more detail if
> > > you have any concerns about the proposed approach from that perspective.
> > >
> > > I'd really value your further feedback on the content and proposed
> > > design of PIP-379. Your insights would be incredibly helpful as we work
> > > on refining this solution.
> > >
> > > Looking forward to your thoughts!
> > >
> > > -Lari
> > >
> > > On 2024/09/15 23:52:48 PengHui Li wrote:
> > > > Hi Lari,
> > > >
> > > > I recommend creating a new implementation rather than directly
> > > > replacing the existing one.
> > > > This approach aligns with how we’ve handled several proposals in the
> > > > past and allows us to maintain stability while introducing
> > > > improvements:
> > > >
> > > > - PIP-192: New Pulsar Broker Load Balancer
> > > > - PIP-195: New bucket based delayed message tracker
> > > >
> > > > Once the new implementation proves to be stable, we can switch the
> > > > default implementation of the Key_Shared subscription to the new
> > > > ‘draining hashes’ solution.
> > > >
> > > > On Sat, Sep 14, 2024 at 8:40 AM Enrico Olivelli <eolive...@gmail.com>
> > > > wrote:
> > > >
> > > > > Awesome proposal, no questions from my side
> > > > >
> > > > > +1
> > > > >
> > > > > Enrico
> > > > >
> > > > > On Sat, Sep 14, 2024 at 16:21 Lari Hotari <lhot...@apache.org>
> > > > > wrote:
> > > > >
> > > > > > Dear Pulsar Community,
> > > > > >
> > > > > > I'd like to propose a new improvement for Pulsar's Key_Shared
> > > > > > subscription mode, outlined in PIP-379. This proposal aims to
> > > > > > address several issues with the current implementation and
> > > > > > introduce a more efficient mechanism for managing message ordering.
> > > > > >
> > > > > > Problem:
> > > > > > The current Key_Shared implementation faces challenges including:
> > > > > > 1. Complex management of "recently joined consumers"
> > > > > > 2. Incomplete fulfillment of ordering guarantees
> > > > > > 3. Unnecessary message blocking
> > > > > > 4. Poor observability
> > > > > >
> > > > > > PIP-379 introduces a "draining hashes" concept to efficiently
> > > > > > manage message ordering by tracking affected hashes when consumer
> > > > > > assignments change. The high-level solution is drafted in the PIP
> > > > > > document.
> > > > > >
> > > > > > Benefits:
> > > > > > 1. Improved message ordering guarantees
> > > > > > 2. Reduced unnecessary message blocking
> > > > > > 3. Better scalability and performance
> > > > > > 4. Enhanced observability
> > > > > >
> > > > > > This proposal would replace the existing "recently joined
> > > > > > consumers" mechanism, addressing its limitations while providing a
> > > > > > more robust solution.
> > > > > >
> > > > > > The full proposal can be found at:
> > > > > > https://github.com/apache/pulsar/pull/23309
> > > > > > The direct link to the rendered version of the markdown file is:
> > > > > > https://github.com/lhotari/pulsar/blob/lh-pip-379/pip/pip-379.md
> > > > > >
> > > > > > I welcome your feedback and discussion on this proposal. Please
> > > > > > share your thoughts, concerns, or suggestions.
> > > > > >
> > > > > > -Lari
> > > > > >
> > > > >
> > > >
> > >
> >
> >
> > --
> > Girish Sharma
> >
>

