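I see your question now. You may want to read this FAQ entry, which explains why data is not evenly distributed among partitions when no partitioning key is specified: https://cwiki.apache.org/confluence/display/KAFKA/FAQ#FAQ-Whyisdatanotevenlydistributedamongpartitionswhenapartitioningkeyisnotspecified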
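
In short, that entry says that in 0.8 a producer sending messages without a key picks a random partition and sticks to it for some time (10 minutes by default) before switching, so one partition receiving all the writes for a long period is consistent with it. The remedies it suggests are reducing the metadata refresh interval or specifying a message key together with a custom partitioner. As a rough sketch only, the round-robin selection you mention could look like the plain Java below. RoundRobinChooser is a hypothetical name and not part of Kafka; it would still need to be adapted to the producer's partitioner interface for your Kafka version.

```java
import java.util.concurrent.atomic.AtomicInteger;

/**
 * Hypothetical helper (NOT a Kafka class): hands out partition ids in
 * round-robin order. Safe to share between producer threads.
 */
final class RoundRobinChooser {
    private final AtomicInteger counter = new AtomicInteger(0);

    /** Returns the next partition id in the range [0, numPartitions). */
    int next(int numPartitions) {
        if (numPartitions <= 0) {
            throw new IllegalArgumentException("numPartitions must be positive");
        }
        // Clear the sign bit so the result stays non-negative after the
        // counter overflows Integer.MAX_VALUE.
        int n = counter.getAndIncrement() & Integer.MAX_VALUE;
        return n % numPartitions;
    }

    public static void main(String[] args) {
        RoundRobinChooser chooser = new RoundRobinChooser();
        // With 4 partitions, successive messages cycle through 0, 1, 2, 3.
        for (int i = 0; i < 8; i++) {
            System.out.println("message " + i + " -> partition " + chooser.next(4));
        }
    }
}
```

Note that spreading writes evenly only keeps all consumer threads busy; if the enrichment step itself is slow, each partition is still processed at the speed of its single consumer.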

On Tue, Aug 12, 2014 at 8:11 AM, Mingtao Zhang <mail2ming...@gmail.com> wrote:

> Hi Guozhang,
>
> I think what I am looking for is the real 'randomness' when producer write
> to the partitions. Based on my log, through a long time period, only one
> partition got the write, while the other side, only one consumer is active.
> In my case the consumer is slow, so when it comes back for the next
> message, the whole pipeline is slowed down.
>
> The 'round robin' works for my case. Is it Email a good thread to follow?
>
> http://mail-archives.apache.org/mod_mbox/kafka-users/201312.mbox/%3CCANZY1Phu8_QZqHCw-ivGp=mxmzpze9+2vjxrlewzzebjor0...@mail.gmail.com%3E
>
> Best Regards,
> Mingtao
>
>
> On Tue, Aug 12, 2014 at 7:55 AM, Mingtao Zhang <mail2ming...@gmail.com> wrote:
>
> > Hi Guozhang,
> >
> > Thank you!
> >
> > Could I say the consumer 'take turns to consume' is resulted by the
> > correspond partition got the 'message write'?
> >
> > The problem I am facing is my 'enrichment' (getting more data based on
> > raw data) consumer took too much time to complete one message
> > consumption. To explore more parallel, could I say my only choice is
> > 'decouple consumer consumption with enrichment'?
> >
> > Mingtao Sent from iPhone
> >
> > > On Aug 12, 2014, at 1:10 AM, Guozhang Wang <wangg...@gmail.com> wrote:
> > >
> > > Hello Mingtao,
> > >
> > > The partition will not be re-assigned to other consumers unless the
> > > current consumer fails, so the behavior you described will not be
> > > expected.
> > >
> > > Guozhang
> > >
> > >
> > > On Mon, Aug 11, 2014 at 6:27 PM, Mingtao Zhang <mail2ming...@gmail.com> wrote:
> > >
> > >> Hi Guozhang,
> > >>
> > >> I do have another Email talking about Partitions per topic. I paste it
> > >> within this Email.
> > >>
> > >> I am expecting those consumers will work concurrently. The behavior I
> > >> observed here is consumer thread-1 will work a while, then thread-3
> > >> will work, then thread-0 ..., is it normal?
> > >>
> > >> version is 2.2.0.
> > >>
> > >> Best Regards,
> > >> Mingtao
> > >>
> > >>> On Wed, Jul 23, 2014 at 7:57 PM, Guozhang Wang <wangg...@gmail.com> wrote:
> > >>>
> > >>> num.partitions is only used as a default value when the createTopic
> > >>> command does not specify the num.partitions or it is automatically
> > >>> created. In your case since you always use its value in the
> > >>> createTopic you will always can one partition. Try change your code
> > >>> to sth. like:
> > >>>
> > >>> String[] args = new String[]{
> > >>>     "--zookeeper", config.getString("zookeeper"),
> > >>>     "--topic", config.getString("topic"),
> > >>>     "--replica", config.getString("replicas"),
> > >>>     "--partition", "8"
> > >>> };
> > >>>
> > >>> CreateTopicCommand.main(args);
> > >>>
> > >>>
> > >>>
> > >>> On Wed, Jul 23, 2014 at 4:38 PM, Mingtao Zhang <mail2ming...@gmail.com> wrote:
> > >>>
> > >>>> Hi All,
> > >>>>
> > >>>> In kafka.properties, I put (forgot to change):
> > >>>>
> > >>>> num.partitions=1
> > >>>>
> > >>>> While I create topics programatically:
> > >>>>
> > >>>> String[] args = new String[]{
> > >>>>     "--zookeeper", config.getString("zookeeper"),
> > >>>>     "--topic", config.getString("topic"),
> > >>>>     "--replica", config.getString("replicas"),
> > >>>>     "--partition", config.getString("partitions")
> > >>>> };
> > >>>>
> > >>>> CreateTopicCommand.main(args);
> > >>>>
> > >>>> The performance engineer told me only one consumer thread is
> > >>>> actively working even I have 4 consumer threads started (could see
> > >>>> when debugging or in thread dump); and 4 partitions configured from
> > >>>> the args.
> > >>>>
> > >>>> It seems that num.partitions is still controlling the parallelism.
> > >>>> Do I need to change this num.partitions accordingly? Could I remove
> > >>>> it? What is I have different parallel requirement for different
> > >>>> topic?
> > >>>>
> > >>>> Thank you in advance!
> > >>>>
> > >>>> Best Regards,
> > >>>> Mingtao
> > >>
> > >>
> > >>> On Mon, Aug 11, 2014 at 7:37 PM, Guozhang Wang <wangg...@gmail.com> wrote:
> > >>>
> > >>> Mingtao,
> > >>>
> > >>> How many partitions of the consumed topic has? Basically the data is
> > >>> distributed per-partition, and hence if the number of consumers is
> > >>> larger than the number of partitions, some consumers will not get
> > >>> any data.
> > >>>
> > >>> Guozhang
> > >>>
> > >>>
> > >>> On Mon, Aug 11, 2014 at 3:29 PM, Mingtao Zhang <mail2ming...@gmail.com> wrote:
> > >>>
> > >>>> Is it anyhow related to the issue?
> > >>>>
> > >>>> WARN No previously checkpointed highwatermark value found for topic
> > >>>> RAW partition 0. Returning 0 as the highwatermark
> > >>>> (kafka.server.HighwaterMarkCheckpoint)
> > >>>>
> > >>>> Mingtao
> > >>>
> > >>>
> > >>>
> > >>> --
> > >>> -- Guozhang
> > >
> > >
> > >
> > > --
> > > -- Guozhang
> >
>

--
-- Guozhang