Mingtao, We have also noticed this issue and are trying to fix it in the new producer:
KAFKA-1586 <https://issues.apache.org/jira/browse/KAFKA-1586> Guozhang On Tue, Aug 12, 2014 at 9:41 AM, Mingtao Zhang <mail2ming...@gmail.com> wrote: > Great! I am in the 10 min category for sure. I do see there is NO > partition key provided in our code. > > I feel it's too much 'customization' when Kafka provides a 'randomness' > default partition strategy while have another layer doing the 10 min tricky > to optimize socket stuff. > > Anyway, thank you so much for the help! > > Mingtao Sent from iPhone > > > On Aug 12, 2014, at 12:22 PM, Guozhang Wang <wangg...@gmail.com> wrote: > > > > I see your question now. You may want to read this FAQ: > > > > > https://cwiki.apache.org/confluence/display/KAFKA/FAQ#FAQ-Whyisdatanotevenlydistributedamongpartitionswhenapartitioningkeyisnotspecified > > ? > > > > > > On Tue, Aug 12, 2014 at 8:11 AM, Mingtao Zhang <mail2ming...@gmail.com> > > wrote: > > > >> Hi Guozhang, > >> > >> I think what I am looking for is the real 'randomness' when producer > write > >> to the partitions. Based on my log, through a long time period, only one > >> partition got the write, while the other side, only one consumer is > active. > >> In my case the consumer is slow, so when it comes back for the next > >> message, the whole pipeline is slowed down. > >> > >> The 'round robin' works for my case. Is it Email a good thread to > follow? > >> > >> > http://mail-archives.apache.org/mod_mbox/kafka-users/201312.mbox/%3CCANZY1Phu8_QZqHCw-ivGp=mxmzpze9+2vjxrlewzzebjor0...@mail.gmail.com%3E > >> > >> Best Regards, > >> Mingtao > >> > >> > >> On Tue, Aug 12, 2014 at 7:55 AM, Mingtao Zhang <mail2ming...@gmail.com> > >> wrote: > >> > >>> Hi Guozhang, > >>> > >>> Thank you! > >>> > >>> Could I say the consumer 'take turns to consume' is resulted by the > >>> correspond partition got the 'message write'? > >>> > >>> The problem I am facing is my 'enrichment' (getting more data based on > >> raw > >>> data) consumer took too much time to complete one message consumption. > To > >>> explore more parallel, could I say my only choice is 'decouple consumer > >>> consumption with enrichment'? > >>> > >>> Mingtao Sent from iPhone > >>> > >>>> On Aug 12, 2014, at 1:10 AM, Guozhang Wang <wangg...@gmail.com> > wrote: > >>>> > >>>> Hello Mingtao, > >>>> > >>>> The partition will not be re-assigned to other consumers unless the > >>> current > >>>> consumer fails, so the behavior you described will not be expected. > >>>> > >>>> Guozhang > >>>> > >>>> > >>>> On Mon, Aug 11, 2014 at 6:27 PM, Mingtao Zhang < > mail2ming...@gmail.com > >>> > >>>> wrote: > >>>> > >>>>> Hi Guozhang, > >>>>> > >>>>> I do have another Email talking about Partitions per topic. I paste > it > >>>>> within this Email. > >>>>> > >>>>> I am expecting those consumers will work concurrently. The behavior I > >>>>> observed here is consumer thread-1 will work a while, then thread-3 > >> will > >>>>> work, then thread-0 ..., is it normal? > >>>>> > >>>>> version is 2.2.0. > >>>>> > >>>>> Best Regards, > >>>>> Mingtao > >>>>> > >>>>>> On Wed, Jul 23, 2014 at 7:57 PM, Guozhang Wang <wangg...@gmail.com> > >>> wrote: > >>>>>> > >>>>>> num.partitions is only used as a default value when the createTopic > >>>>> command > >>>>>> does not specify the num.partitions or it is automatically created. > >> In > >>>>> your > >>>>>> case since you always use its value in the createTopic you will > >> always > >>>>> can > >>>>>> one partition. Try change your code to sth. like: > >>>>>> > >>>>>> String[] args = new String[]{ > >>>>>> "--zookeeper", config.getString("zookeeper"), > >>>>>> "--topic", config.getString("topic"), > >>>>>> "--replica", config.getString("replicas"), > >>>>>> "--partition", "8" > >>>>>> }; > >>>>>> > >>>>>> CreateTopicCommand.main(args); > >>>>>> > >>>>>> > >>>>>> > >>>>>> On Wed, Jul 23, 2014 at 4:38 PM, Mingtao Zhang < > >> mail2ming...@gmail.com > >>>> > >>>>>> wrote: > >>>>>> > >>>>>>> Hi All, > >>>>>>> > >>>>>>> In kafka.properties, I put (forgot to change): > >>>>>>> > >>>>>>> num.partitions=1 > >>>>>>> > >>>>>>> While I create topics programatically: > >>>>>>> > >>>>>>> String[] args = new String[]{ > >>>>>>> "--zookeeper", config.getString("zookeeper"), > >>>>>>> "--topic", config.getString("topic"), > >>>>>>> "--replica", config.getString("replicas"), > >>>>>>> "--partition", config.getString("partitions") > >>>>>>> }; > >>>>>>> > >>>>>>> CreateTopicCommand.main(args); > >>>>>>> > >>>>>>> The performance engineer told me only one consumer thread is > >> actively > >>>>>>> working even I have 4 consumer threads started (could see when > >>>>> debugging > >>>>>> or > >>>>>>> in thread dump); and 4 partitions configured from the args. > >>>>>>> > >>>>>>> It seems that num.partitions is still controlling the parallelism. > >> Do > >>> I > >>>>>>> need to change this num.partitions accordingly? Could I remove it? > >>> What > >>>>>> is > >>>>>>> I have different parallel requirement for different topic? > >>>>>>> > >>>>>>> Thank you in advance! > >>>>>>> > >>>>>>> Best Regards, > >>>>>>> Mingtao > >>>>> > >>>>> > >>>>>> On Mon, Aug 11, 2014 at 7:37 PM, Guozhang Wang <wangg...@gmail.com> > >>> wrote: > >>>>>> > >>>>>> Mingtao, > >>>>>> > >>>>>> How many partitions of the consumed topic has? Basically the data is > >>>>>> distributed per-partition, and hence if the number of consumers is > >>> larger > >>>>>> than the number of partitions, some consumers will not get any data. > >>>>>> > >>>>>> Guozhang > >>>>>> > >>>>>> > >>>>>> On Mon, Aug 11, 2014 at 3:29 PM, Mingtao Zhang < > >> mail2ming...@gmail.com > >>>> > >>>>>> wrote: > >>>>>> > >>>>>>> Is it anyhow related to the issue? > >>>>>>> > >>>>>>> WARN No previously checkpointed highwatermark value found for topic > >>> RAW > >>>>>>> partition 0. Returning 0 as the highwatermark > >>>>>>> (kafka.server.HighwaterMarkCheckpoint) > >>>>>>> > >>>>>>> Mingtao > >>>>>> > >>>>>> > >>>>>> > >>>>>> -- > >>>>>> -- Guozhang > >>>> > >>>> > >>>> > >>>> -- > >>>> -- Guozhang > > > > > > > > -- > > -- Guozhang > -- -- Guozhang