Hi Rajan,
Thanks for the review!
It looks like the root cause of "stuck" keyword is the nature of
Key_Shared subscription which must guarantee messages ordering. I
understand it is really hard to achieve such guarantees without having
separate physical partitions but this feature is very essential for many
use cases. I also understand that the solution I've provided is more a
hack than a solid solution. I was trying to touch as little code as I
can and, unfortunately, I see no other options to avoid stuck dispatches
in the current implementation approach.
I also want to highlight that the current implementation leads entire
system to hang due to a single message corruption in case consumers
disconnects. This breaks one of a big Pulsar advantage: independent keys
processing.
Do you have plans for Key_Shared dispatcher rework? I would be happy to
contribute to such a project if I can help somehow.
Best Regards,
Nikolai
On 2023/12/01 06:34:59 Rajan Dhabalia wrote:
> Hi Nikolai,
>
> Thanks for bringing this to the attention. I have seen multiple PIP
and bug
> fixes for the Key-Shared subscription and I feel Key_Shared subscription
> dispatcher is developed with multiple fundamental issues which
required to
> introduce scheduled-task to unblock stuck reads, introducing stuck
delivery
> flag, and now we are seeing at least 2-3 PIP to add few more hacks to
> control such stuck dispatch. After reviewing code, it seems
>
PersistentStickyKeyDispatcherMultipleConsumers::getRestrictedMaxEntriesForConsumer(..)
> function can still return 0 dispatch messages with multiple different
edge
> cases and fundamental dispatch is not clean at all. This is the only
> dispatcher introduced after open sourcing Pulsar and unfortunately it was
> not developed with a cleaner approach. I understand the pain of such
stuck
> dispatch and even the current release is having a bugs where dispatch is
> getting stuck and many production systems are struggling with it and you
> must be suggesting this PIP to get work around for this stuck issue.
But I
> don't think this is the right approach to follow.
> I think the correct approach is to get rid of this "stuck" keyword
from the
> dispatcher because having "stuck" concept in dispatcher means it has a
> known flow and bug which we are not solving but we are adding more
debt to
> perform the work around. We never had any such concept in other
dispatchers
> which was introduced when Pulsar was introduced originally and it just
> works with clear semantics and permit flow mechanism , and we should
follow
> the same practice for this subscription.
>
> Thanks,
> Rajan
>
> On Thu, Nov 30, 2023 at 9:18 AM Nikolai <ta...@gmail.com> wrote:
>
> > Hello!
> >
> > I submitted a new PIP to add a configuration option which allows to
skip
> > blocking recently joined consumers for Key_Shared subscriptions
> >
> > It introduces additional memory consumption, so it will be disabled by
> > default. It would fix issues like
> > https://github.com/apache/pulsar/issues/21199
> >
> > Link to the PIP: https://github.com/apache/pulsar/pull/21615
> >
> > PR with the PIP implementation:
> > https://github.com/apache/pulsar/pull/21579
> >
> >
> > Best Regards,
> >
> > Nikolai
> >
> >
>