Hello Asaf, thank you for taking a look at this. I will have a formal PIP sometime by the end of March. I am trying to close on the rate limiting PIPs first.
On Sun, Feb 18, 2024 at 3:47 PM Asaf Mesika <asaf.mes...@gmail.com> wrote:

> Hey Girish,
>
> First, I say that I *love* this proposal and, in general, those types of
> proposals.
> This is what strides Pulsar towards being an even more next-generation
> messaging system.
>
> I read and have a few questions and brainstorming ideas popping into my
> mind:
>
> 1. The current design basically says: Let’s have a read-only toggle (flag)
> for each partition. When I decrease the partitions from, say, 2 to 1, then
> if the partitions were “billing-0” and “billing-1”, now “billing-1” will be
> marked read-only, and eventually, the client will only produce messages to
> “billing-0”. After 1 hour, I can scale it back to 2 partitions, and now the
> “billing-1” will be toggled back to read-only=false.
>

This is true. But it's probably only an extension of a problem that already exists today: if you scale up a topic with 3-day retention from 2 to 3 partitions and start a new subscription from the beginning, you will see a drastic time difference between the messages of the older partitions and the newer ones.

>
> * I know you stated that ordered consumption is out of scope. The thing I
> fear here is that even for shared subscriptions, in which order doesn’t
> matter, it still feels a bit weird that when you consume from the
> beginning, you can suddenly consume messages that are 1 hour apart from
> each other, one after another. Something like:
>
> P0 | t1 | t3 | t7 | t10| t11| t13| t17|
>    +----+----+----+----+----+----+----+
> P1 | t2 | t4 | t6 | t9 | t12| t14| t16|
>    +----+----+----+----+----+----+----+
> P2 |    |    | t5 | t8 |    |    | t15|
>    |    |    |    |    |    |    |    |
> ---+----+----+----+----+----+----+----+
>                        ^         ^
>                        RO        URO
>
>
> t5 - you scaled to 3 partitions.
> “RO” is when you change from 3 partitions to 2
> “URO” is when you change back to 3 partitions.
>
> When you consume this partitioned topic from the beginning, you will
> consume t15 mixed with t6 and t7, which can be hours apart.
>

Even if the messages are hours apart, they are still confined to the ordering guarantees of a topic, i.e., order is maintained within a partition :)

>
> I understand this can happen today if you only add a partition and read
> from the beginning.
>

Exactly! Maybe there is a need to solve this, maybe not, as even Kafka has similar behavior, although I am unaware of whether they are discussing doing something about it.

> 2. If we keep ordered consumption out of scope, how do we keep the users
> from doing “wrong” things, like using failover type subscriptions on
> partitioned topics that have decreased their partitions? Topic and its
> partition count is a detached “entity” from its consumption type.
>
>

This would be a very easy proposal: do a live check in the topic update command. If there are exclusive/failover subscriptions attached to the topic, we prevent the update. We should actually do this today as well, since the same issue exists when the partition count is increased.

>
> I’m curious if you thought of implementing it following the pattern we have
> today for BK. When an ensemble changes, it simply adds the new ensemble to
> a list of ensembles, so you follow a chain of servers when you read from a
> ledger. You read from (b1,b2,b3) and then switch to (b1, b3, b5).
>
> What if a partitioned topic is exactly that? It is a chain of lists. Each
> list contains the topics (partitions).
> Something like:
> (billing-0-100, billing-1-101), (billing-0-102, billing-1-103,
> billing-2-104), (billing-0-105, billing-1-106)
>
> It’s only a direction - just wondering if something like that has been
> considered.
>

I believe this will be a very drastic change. I haven't looked in this direction, but this will touch almost every aspect of the broker - from dedupe, to transactions and beyond. I think almost all of the broker-level features rely on the fact that a partition will always be owned by a single topic at any given time.
This will lead to an active partition for a single partition across brokers.

--
Girish Sharma
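
For reference, the live check on the topic update command described above could look roughly like the following. This is a minimal sketch only: `Subscription`, `blocking_subscriptions` and `check_partition_update` are hypothetical names invented for illustration, not existing Pulsar APIs, and the check only considers the two subscription types named in this thread (exclusive and failover).

```python
from dataclasses import dataclass
from enum import Enum


class SubscriptionType(Enum):
    EXCLUSIVE = "Exclusive"
    SHARED = "Shared"
    FAILOVER = "Failover"
    KEY_SHARED = "Key_Shared"


# Assumption for this sketch: only the types named in the thread block an update.
ORDER_SENSITIVE = {SubscriptionType.EXCLUSIVE, SubscriptionType.FAILOVER}


@dataclass(frozen=True)
class Subscription:
    name: str
    sub_type: SubscriptionType


def blocking_subscriptions(subscriptions):
    """Return the sorted names of subscriptions that should block an update."""
    return sorted(s.name for s in subscriptions if s.sub_type in ORDER_SENSITIVE)


def check_partition_update(current, requested, subscriptions):
    """Hypothetical pre-check run before changing a topic's partition count.

    Applies to increases as well as decreases, since the thread notes that
    the same issue exists in both directions. Raises ValueError on rejection.
    """
    if requested == current:
        return  # nothing changes, nothing to check
    blockers = blocking_subscriptions(subscriptions)
    if blockers:
        raise ValueError(
            f"cannot change partitions from {current} to {requested}: "
            f"exclusive/failover subscriptions attached: {', '.join(blockers)}"
        )
```

With a failover subscription attached, `check_partition_update(2, 1, subs)` raises, while a topic that only has shared subscriptions passes the check.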