Hello Lari, thanks for the comments. Replies inline.
On Wed, Jan 24, 2024 at 7:36 PM Lari Hotari <lhot...@apache.org> wrote:

> Hi Girish,
>
> Very useful proposal.
>
> Would it be possible to enable comments on the Google Doc? It's pretty
> hard to comment on the doc since copying is also disabled.
>

I've enabled them now. Thank you for going through the doc.

> In the scope definition 4.2,
> "The initial scope is to target unordered consumption flows. Even in
> the current world, there are challenges with normal partition scale up
> for ordered consumption based topics, so keeping the partition scale
> down out of scope for that as well."
>
> If we don't care about ordered consumption and re-keying, I guess the
> feature isn't very hard to implement.
> Pulsar already contains the topic termination feature which will let
> consumers consume messages while publishers cannot publish more
> messages. This is the "read-only topic" feature that could be used as
> one of the building blocks for implementing the decrease of the
> partition count for a topic.
>

Yes, a terminated topic is already very close to the read-only topic,
barring the grace period and possibly the ability to un-terminate a
topic. I will merge my read-only topic proposal into the existing
terminate API/feature.

>
> For the final design, it would be great to have a design for ordered
> consumption flows. It might not be trivial to design it. I happened to
> be at a local Kafka meetup a few months ago and this particular
> challenge was discussed in the context of Kafka and how painful it is
> to handle manually and what problems could happen in production when
> large scale streaming applications assume that a specific key is
> contained in a specific partition.
>
> There's a similar challenge also when the number of partitions is
> increased so this problem isn't specific to decreasing partitions.
> In ordered consumption flows, there is most likely an ordering key and
> a specific key is assigned to a specific partition. If the partition
> count changes, there would have to be some rekeying/reassignment that
> happens.
>

I agree that this is an existing problem in both Kafka and Pulsar for
partition count scale up (and for scale down in Kafka via re-mapping).
For that reason, I've kept it out of scope. What I will ensure, though,
is that adding partition scale down does not increase the complexity or
difficulty of providing seamless partition count changes for ordered
consumption in the future.

--
Girish Sharma