Hi Colin, The first thing that comes to mind is AlwaysStickyPartitioner, but maybe I could think about it a bit more and come up with something better.
The only reason why I hesitate about the RoundRobin part is that it doesn't really distinguish that this partitioner will have this behavior on keyed values. I know this is the case for the RoundRobinPartitioner, but unless one is familiar with the code, it might not be obvious on first glance. I can definitely include this in the KIP once we get a name settled. Thanks, Justine On Fri, Jul 12, 2019 at 11:45 AM Colin McCabe <cmcc...@apache.org> wrote: > On Fri, Jul 12, 2019, at 09:02, Justine Olshan wrote: > > Hello all, > > > > Jun, thanks for taking a look at my KIP! We were also concerned about > > batches containing a single record so we kept this in mind for the > > implementation. The decision to switch the sticky partition actually > > involves returning from the record accumulator and assigning the new > > partition before the new batch is created. That way all of the records > will > > go to this new partition's batch. If you would like to get a better look > at > > how this works, please check out the PR: > > https://github.com/apache/kafka/pull/6997/files. The most important > lines > > are in the append method of the RecordAccumulator and doSend in > > KafkaProducer. > > Thanks for the explanation. > > > > > Colin, I think this makes sense to me except for the name > > StickyRoundRobinPartitioner seems to not really explain the behavior of > > what would be implemented. Perhaps a name indicating the sticky behavior > is > > always used, or that it will be used on keys is more descriptive. Calling > > it "RoundRobin" seems a bit misleading to me. > > Hmm, what name would you propose here? > > Keep in mind we don't have to implement the new configurable partitioner > in the initial PR :) > > best, > Colin > > > > > Thanks again for reviewing, > > Justine > > > > On Thu, Jul 11, 2019 at 6:07 PM Jun Rao <j...@confluent.io> wrote: > > > > > Hi, Justine, > > > > > > Thanks for the KIP. Nice writeup and great results. Just one comment. > > > > > > 100. To add a record to the accumulator, the producer needs to know the > > > partition id. The decision of whether the record can be added to the > > > current batch is only made after the accumulator.append() call. So, > when a > > > batch is full, it seems that the KIP will try to append the next > record to > > > the same partition, which will trigger the creation of a new batch > with a > > > single record. After that, new records will be routed to a new > partition. > > > If the producer doesn't come back to the first partition in time, the > > > producer will send a single record batch. In the worse case, it can be > that > > > every other batch has only a single record. Is this correct? If so, > could > > > we avoid that? > > > > > > Jun > > > > > > On Thu, Jul 11, 2019 at 5:23 PM Colin McCabe <cmcc...@apache.org> > wrote: > > > > > > > Hi Justine, > > > > > > > > I agree that we shouldn't change RoundRobinPartitioner, since its > > > behavior > > > > is already specified. > > > > > > > > However, we could add a new, separate StickyRoundRobinPartitioner > class > > > to > > > > KIP-480 which just implemented the sticky behavior regardless of > whether > > > > the key was null. That seems pretty easy to add (and it wouldn't > have to > > > > implemented right away in the first PR, of course.) It would be an > > > option > > > > for people who wanted to configure this behavior. > > > > > > > > best, > > > > Colin > > > > > > > > > > > > On Wed, Jul 10, 2019, at 08:48, Justine Olshan wrote: > > > > > Hi M, > > > > > > > > > > I'm a little confused by what you mean by extending the behavior > on to > > > > the > > > > > RoundRobinPartitioner. > > > > > The sticky partitioner plans to remove the round-robin behavior > from > > > > > records with no keys. Instead of sending them to each partition in > > > order, > > > > > it sends them all to the same partition until the batch is sent. > > > > > I don't think you can have both round-robin and sticky partition > > > > behavior. > > > > > > > > > > Thank you, > > > > > Justine Olshan > > > > > > > > > > On Wed, Jul 10, 2019 at 1:54 AM M. Manna <manme...@gmail.com> > wrote: > > > > > > > > > > > Thanks for the comments Colin. > > > > > > > > > > > > My only concern is that this KIP is addressing a good feature and > > > > having > > > > > > that extended to RoundRobinPartitioner means 1 less KIP in the > > > future. > > > > > > > > > > > > Would it be appropriate to extend the support to > > > RoundRobinPartitioner > > > > too? > > > > > > > > > > > > Thanks, > > > > > > > > > > > > On Tue, 9 Jul 2019 at 17:24, Colin McCabe <cmcc...@apache.org> > > > wrote: > > > > > > > > > > > > > Hi M, > > > > > > > > > > > > > > The RoundRobinPartitioner added by KIP-369 doesn't interact > with > > > this > > > > > > > KIP. If you configure your producer to use > RoundRobinPartitioner, > > > > then > > > > > > the > > > > > > > DefaultPartitioner will not be used. And the "sticky" > behavior is > > > > > > > implemented only in the DefaultPartitioner. > > > > > > > > > > > > > > regards, > > > > > > > Colin > > > > > > > > > > > > > > > > > > > > > On Tue, Jul 9, 2019, at 05:12, M. Manna wrote: > > > > > > > > Hello Justine, > > > > > > > > > > > > > > > > I have one item I wanted to discuss. > > > > > > > > > > > > > > > > We are currently in review stage for KAFKA-3333 where we can > > > choose > > > > > > > always > > > > > > > > RoundRobin regardless of null/usable key. > > > > > > > > > > > > > > > > If I understood this KIP motivation correctly, you are still > > > > honouring > > > > > > > how > > > > > > > > the hashing of key works for DefaultPartitioner. Would you > say > > > that > > > > > > > having > > > > > > > > an always "Round-Robin" partitioning with "Sticky" assignment > > > > > > (efficient > > > > > > > > batching of records for a partition) doesn't deviate from > your > > > > original > > > > > > > > intention? > > > > > > > > > > > > > > > > Thanks, > > > > > > > > > > > > > > > > On Tue, 9 Jul 2019 at 01:00, Justine Olshan < > > > jols...@confluent.io> > > > > > > > wrote: > > > > > > > > > > > > > > > > > Hello all, > > > > > > > > > > > > > > > > > > If there are no more comments or concerns, I would like to > > > start > > > > the > > > > > > > vote > > > > > > > > > on this tomorrow afternoon. > > > > > > > > > > > > > > > > > > However, if there are still topics to discuss, feel free to > > > bring > > > > > > them > > > > > > > up > > > > > > > > > now. > > > > > > > > > > > > > > > > > > Thank you, > > > > > > > > > Justine > > > > > > > > > > > > > > > > > > On Tue, Jul 2, 2019 at 4:25 PM Justine Olshan < > > > > jols...@confluent.io> > > > > > > > > > wrote: > > > > > > > > > > > > > > > > > > > Hello again, > > > > > > > > > > > > > > > > > > > > Another update to the interface has been made to the KIP. > > > > > > > > > > Please let me know if you have any feedback! > > > > > > > > > > > > > > > > > > > > Thank you, > > > > > > > > > > Justine > > > > > > > > > > > > > > > > > > > > On Fri, Jun 28, 2019 at 2:52 PM Justine Olshan < > > > > > > jols...@confluent.io > > > > > > > > > > > > > > > > > > wrote: > > > > > > > > > > > > > > > > > > > >> Hi all, > > > > > > > > > >> I made some changes to the KIP. > > > > > > > > > >> The idea is to clean up the code, make behavior more > > > explicit, > > > > > > > provide > > > > > > > > > >> more flexibility, and to keep default behavior the same. > > > > > > > > > >> > > > > > > > > > >> Now we will change the partition in onNewBatch, and > specify > > > > the > > > > > > > > > >> conditions for this function call (non-keyed values, no > > > > explicit > > > > > > > > > >> partitions) in willCallOnNewBatch. > > > > > > > > > >> This clears up some of the issues with the > implementation. > > > I'm > > > > > > > happy to > > > > > > > > > >> hear further opinions and discuss this change! > > > > > > > > > >> > > > > > > > > > >> Thank you, > > > > > > > > > >> Justine > > > > > > > > > >> > > > > > > > > > >> On Thu, Jun 27, 2019 at 2:53 PM Colin McCabe < > > > > cmcc...@apache.org> > > > > > > > > > wrote: > > > > > > > > > >> > > > > > > > > > >>> On Thu, Jun 27, 2019, at 01:31, Ismael Juma wrote: > > > > > > > > > >>> > Thanks for the KIP Justine. It looks pretty good. A > few > > > > > > comments: > > > > > > > > > >>> > > > > > > > > > > >>> > 1. Should we favor partitions that are not under > > > > replicated? > > > > > > > This is > > > > > > > > > >>> > something that Netflix did too. > > > > > > > > > >>> > > > > > > > > > >>> This seems like it could lead to cascading failures, > right? > > > > If a > > > > > > > > > >>> partition becomes under-replicated because there is too > > > much > > > > > > > traffic, > > > > > > > > > the > > > > > > > > > >>> producer stops sending to it, which puts even more > load on > > > > the > > > > > > > > > remaining > > > > > > > > > >>> partitions, which are even more likely to fail then, > etc. > > > It > > > > > > also > > > > > > > will > > > > > > > > > >>> create unbalanced load patterns on the consumers. > > > > > > > > > >>> > > > > > > > > > >>> > > > > > > > > > > >>> > 2. If there's no measurable performance difference, I > > > agree > > > > > > with > > > > > > > > > >>> Stanislav > > > > > > > > > >>> > that Optional would be better than Integer. > > > > > > > > > >>> > > > > > > > > > > >>> > 3. We should include the javadoc for the newly > introduced > > > > > > method > > > > > > > that > > > > > > > > > >>> > specifies it and its parameters. In particular, it > would > > > > good > > > > > > to > > > > > > > > > >>> specify if > > > > > > > > > >>> > it gets called when an explicit partition id has been > > > > provided. > > > > > > > > > >>> > > > > > > > > > >>> Agreed. > > > > > > > > > >>> > > > > > > > > > >>> best, > > > > > > > > > >>> Colin > > > > > > > > > >>> > > > > > > > > > >>> > > > > > > > > > > >>> > Ismael > > > > > > > > > >>> > > > > > > > > > > >>> > On Mon, Jun 24, 2019, 2:04 PM Justine Olshan < > > > > > > > jols...@confluent.io> > > > > > > > > > >>> wrote: > > > > > > > > > >>> > > > > > > > > > > >>> > > Hello, > > > > > > > > > >>> > > This is the discussion thread for KIP-480: Sticky > > > > > > Partitioner. > > > > > > > > > >>> > > > > > > > > > > > >>> > > > > > > > > > > > >>> > > > > > > > > > > > >>> > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > https://cwiki.apache.org/confluence/display/KAFKA/KIP-480%3A+Sticky+Partitioner > > > > > > > > > >>> > > > > > > > > > > > >>> > > Thank you, > > > > > > > > > >>> > > Justine Olshan > > > > > > > > > >>> > > > > > > > > > > > >>> > > > > > > > > > > >>> > > > > > > > > > >> > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > >