Hello all, Jun, thanks for taking a look at my KIP! We were also concerned about batches containing a single record so we kept this in mind for the implementation. The decision to switch the sticky partition actually involves returning from the record accumulator and assigning the new partition before the new batch is created. That way all of the records will go to this new partition's batch. If you would like to get a better look at how this works, please check out the PR: https://github.com/apache/kafka/pull/6997/files. The most important lines are in the append method of the RecordAccumulator and doSend in KafkaProducer.
Colin, I think this makes sense to me except for the name StickyRoundRobinPartitioner seems to not really explain the behavior of what would be implemented. Perhaps a name indicating the sticky behavior is always used, or that it will be used on keys is more descriptive. Calling it "RoundRobin" seems a bit misleading to me. Thanks again for reviewing, Justine On Thu, Jul 11, 2019 at 6:07 PM Jun Rao <j...@confluent.io> wrote: > Hi, Justine, > > Thanks for the KIP. Nice writeup and great results. Just one comment. > > 100. To add a record to the accumulator, the producer needs to know the > partition id. The decision of whether the record can be added to the > current batch is only made after the accumulator.append() call. So, when a > batch is full, it seems that the KIP will try to append the next record to > the same partition, which will trigger the creation of a new batch with a > single record. After that, new records will be routed to a new partition. > If the producer doesn't come back to the first partition in time, the > producer will send a single record batch. In the worse case, it can be that > every other batch has only a single record. Is this correct? If so, could > we avoid that? > > Jun > > On Thu, Jul 11, 2019 at 5:23 PM Colin McCabe <cmcc...@apache.org> wrote: > > > Hi Justine, > > > > I agree that we shouldn't change RoundRobinPartitioner, since its > behavior > > is already specified. > > > > However, we could add a new, separate StickyRoundRobinPartitioner class > to > > KIP-480 which just implemented the sticky behavior regardless of whether > > the key was null. That seems pretty easy to add (and it wouldn't have to > > implemented right away in the first PR, of course.) It would be an > option > > for people who wanted to configure this behavior. > > > > best, > > Colin > > > > > > On Wed, Jul 10, 2019, at 08:48, Justine Olshan wrote: > > > Hi M, > > > > > > I'm a little confused by what you mean by extending the behavior on to > > the > > > RoundRobinPartitioner. > > > The sticky partitioner plans to remove the round-robin behavior from > > > records with no keys. Instead of sending them to each partition in > order, > > > it sends them all to the same partition until the batch is sent. > > > I don't think you can have both round-robin and sticky partition > > behavior. > > > > > > Thank you, > > > Justine Olshan > > > > > > On Wed, Jul 10, 2019 at 1:54 AM M. Manna <manme...@gmail.com> wrote: > > > > > > > Thanks for the comments Colin. > > > > > > > > My only concern is that this KIP is addressing a good feature and > > having > > > > that extended to RoundRobinPartitioner means 1 less KIP in the > future. > > > > > > > > Would it be appropriate to extend the support to > RoundRobinPartitioner > > too? > > > > > > > > Thanks, > > > > > > > > On Tue, 9 Jul 2019 at 17:24, Colin McCabe <cmcc...@apache.org> > wrote: > > > > > > > > > Hi M, > > > > > > > > > > The RoundRobinPartitioner added by KIP-369 doesn't interact with > this > > > > > KIP. If you configure your producer to use RoundRobinPartitioner, > > then > > > > the > > > > > DefaultPartitioner will not be used. And the "sticky" behavior is > > > > > implemented only in the DefaultPartitioner. > > > > > > > > > > regards, > > > > > Colin > > > > > > > > > > > > > > > On Tue, Jul 9, 2019, at 05:12, M. Manna wrote: > > > > > > Hello Justine, > > > > > > > > > > > > I have one item I wanted to discuss. > > > > > > > > > > > > We are currently in review stage for KAFKA-3333 where we can > choose > > > > > always > > > > > > RoundRobin regardless of null/usable key. > > > > > > > > > > > > If I understood this KIP motivation correctly, you are still > > honouring > > > > > how > > > > > > the hashing of key works for DefaultPartitioner. Would you say > that > > > > > having > > > > > > an always "Round-Robin" partitioning with "Sticky" assignment > > > > (efficient > > > > > > batching of records for a partition) doesn't deviate from your > > original > > > > > > intention? > > > > > > > > > > > > Thanks, > > > > > > > > > > > > On Tue, 9 Jul 2019 at 01:00, Justine Olshan < > jols...@confluent.io> > > > > > wrote: > > > > > > > > > > > > > Hello all, > > > > > > > > > > > > > > If there are no more comments or concerns, I would like to > start > > the > > > > > vote > > > > > > > on this tomorrow afternoon. > > > > > > > > > > > > > > However, if there are still topics to discuss, feel free to > bring > > > > them > > > > > up > > > > > > > now. > > > > > > > > > > > > > > Thank you, > > > > > > > Justine > > > > > > > > > > > > > > On Tue, Jul 2, 2019 at 4:25 PM Justine Olshan < > > jols...@confluent.io> > > > > > > > wrote: > > > > > > > > > > > > > > > Hello again, > > > > > > > > > > > > > > > > Another update to the interface has been made to the KIP. > > > > > > > > Please let me know if you have any feedback! > > > > > > > > > > > > > > > > Thank you, > > > > > > > > Justine > > > > > > > > > > > > > > > > On Fri, Jun 28, 2019 at 2:52 PM Justine Olshan < > > > > jols...@confluent.io > > > > > > > > > > > > > > wrote: > > > > > > > > > > > > > > > >> Hi all, > > > > > > > >> I made some changes to the KIP. > > > > > > > >> The idea is to clean up the code, make behavior more > explicit, > > > > > provide > > > > > > > >> more flexibility, and to keep default behavior the same. > > > > > > > >> > > > > > > > >> Now we will change the partition in onNewBatch, and specify > > the > > > > > > > >> conditions for this function call (non-keyed values, no > > explicit > > > > > > > >> partitions) in willCallOnNewBatch. > > > > > > > >> This clears up some of the issues with the implementation. > I'm > > > > > happy to > > > > > > > >> hear further opinions and discuss this change! > > > > > > > >> > > > > > > > >> Thank you, > > > > > > > >> Justine > > > > > > > >> > > > > > > > >> On Thu, Jun 27, 2019 at 2:53 PM Colin McCabe < > > cmcc...@apache.org> > > > > > > > wrote: > > > > > > > >> > > > > > > > >>> On Thu, Jun 27, 2019, at 01:31, Ismael Juma wrote: > > > > > > > >>> > Thanks for the KIP Justine. It looks pretty good. A few > > > > comments: > > > > > > > >>> > > > > > > > > >>> > 1. Should we favor partitions that are not under > > replicated? > > > > > This is > > > > > > > >>> > something that Netflix did too. > > > > > > > >>> > > > > > > > >>> This seems like it could lead to cascading failures, right? > > If a > > > > > > > >>> partition becomes under-replicated because there is too > much > > > > > traffic, > > > > > > > the > > > > > > > >>> producer stops sending to it, which puts even more load on > > the > > > > > > > remaining > > > > > > > >>> partitions, which are even more likely to fail then, etc. > It > > > > also > > > > > will > > > > > > > >>> create unbalanced load patterns on the consumers. > > > > > > > >>> > > > > > > > >>> > > > > > > > > >>> > 2. If there's no measurable performance difference, I > agree > > > > with > > > > > > > >>> Stanislav > > > > > > > >>> > that Optional would be better than Integer. > > > > > > > >>> > > > > > > > > >>> > 3. We should include the javadoc for the newly introduced > > > > method > > > > > that > > > > > > > >>> > specifies it and its parameters. In particular, it would > > good > > > > to > > > > > > > >>> specify if > > > > > > > >>> > it gets called when an explicit partition id has been > > provided. > > > > > > > >>> > > > > > > > >>> Agreed. > > > > > > > >>> > > > > > > > >>> best, > > > > > > > >>> Colin > > > > > > > >>> > > > > > > > >>> > > > > > > > > >>> > Ismael > > > > > > > >>> > > > > > > > > >>> > On Mon, Jun 24, 2019, 2:04 PM Justine Olshan < > > > > > jols...@confluent.io> > > > > > > > >>> wrote: > > > > > > > >>> > > > > > > > > >>> > > Hello, > > > > > > > >>> > > This is the discussion thread for KIP-480: Sticky > > > > Partitioner. > > > > > > > >>> > > > > > > > > > >>> > > > > > > > > > >>> > > > > > > > > > >>> > > > > > > > > > > > > > > > > > > > https://cwiki.apache.org/confluence/display/KAFKA/KIP-480%3A+Sticky+Partitioner > > > > > > > >>> > > > > > > > > > >>> > > Thank you, > > > > > > > >>> > > Justine Olshan > > > > > > > >>> > > > > > > > > > >>> > > > > > > > > >>> > > > > > > > >> > > > > > > > > > > > > > > > > > > > > > > > > > > > >