Great! I am in the 10 min category for sure. I do see there is NO partition key provided in our code.
I feel it's too much 'customization' when Kafka provides a 'randomness' default partition strategy while have another layer doing the 10 min tricky to optimize socket stuff. Anyway, thank you so much for the help! Mingtao Sent from iPhone > On Aug 12, 2014, at 12:22 PM, Guozhang Wang <wangg...@gmail.com> wrote: > > I see your question now. You may want to read this FAQ: > > https://cwiki.apache.org/confluence/display/KAFKA/FAQ#FAQ-Whyisdatanotevenlydistributedamongpartitionswhenapartitioningkeyisnotspecified > ? > > > On Tue, Aug 12, 2014 at 8:11 AM, Mingtao Zhang <mail2ming...@gmail.com> > wrote: > >> Hi Guozhang, >> >> I think what I am looking for is the real 'randomness' when producer write >> to the partitions. Based on my log, through a long time period, only one >> partition got the write, while the other side, only one consumer is active. >> In my case the consumer is slow, so when it comes back for the next >> message, the whole pipeline is slowed down. >> >> The 'round robin' works for my case. Is it Email a good thread to follow? >> >> http://mail-archives.apache.org/mod_mbox/kafka-users/201312.mbox/%3CCANZY1Phu8_QZqHCw-ivGp=mxmzpze9+2vjxrlewzzebjor0...@mail.gmail.com%3E >> >> Best Regards, >> Mingtao >> >> >> On Tue, Aug 12, 2014 at 7:55 AM, Mingtao Zhang <mail2ming...@gmail.com> >> wrote: >> >>> Hi Guozhang, >>> >>> Thank you! >>> >>> Could I say the consumer 'take turns to consume' is resulted by the >>> correspond partition got the 'message write'? >>> >>> The problem I am facing is my 'enrichment' (getting more data based on >> raw >>> data) consumer took too much time to complete one message consumption. To >>> explore more parallel, could I say my only choice is 'decouple consumer >>> consumption with enrichment'? >>> >>> Mingtao Sent from iPhone >>> >>>> On Aug 12, 2014, at 1:10 AM, Guozhang Wang <wangg...@gmail.com> wrote: >>>> >>>> Hello Mingtao, >>>> >>>> The partition will not be re-assigned to other consumers unless the >>> current >>>> consumer fails, so the behavior you described will not be expected. >>>> >>>> Guozhang >>>> >>>> >>>> On Mon, Aug 11, 2014 at 6:27 PM, Mingtao Zhang <mail2ming...@gmail.com >>> >>>> wrote: >>>> >>>>> Hi Guozhang, >>>>> >>>>> I do have another Email talking about Partitions per topic. I paste it >>>>> within this Email. >>>>> >>>>> I am expecting those consumers will work concurrently. The behavior I >>>>> observed here is consumer thread-1 will work a while, then thread-3 >> will >>>>> work, then thread-0 ..., is it normal? >>>>> >>>>> version is 2.2.0. >>>>> >>>>> Best Regards, >>>>> Mingtao >>>>> >>>>>> On Wed, Jul 23, 2014 at 7:57 PM, Guozhang Wang <wangg...@gmail.com> >>> wrote: >>>>>> >>>>>> num.partitions is only used as a default value when the createTopic >>>>> command >>>>>> does not specify the num.partitions or it is automatically created. >> In >>>>> your >>>>>> case since you always use its value in the createTopic you will >> always >>>>> can >>>>>> one partition. Try change your code to sth. like: >>>>>> >>>>>> String[] args = new String[]{ >>>>>> "--zookeeper", config.getString("zookeeper"), >>>>>> "--topic", config.getString("topic"), >>>>>> "--replica", config.getString("replicas"), >>>>>> "--partition", "8" >>>>>> }; >>>>>> >>>>>> CreateTopicCommand.main(args); >>>>>> >>>>>> >>>>>> >>>>>> On Wed, Jul 23, 2014 at 4:38 PM, Mingtao Zhang < >> mail2ming...@gmail.com >>>> >>>>>> wrote: >>>>>> >>>>>>> Hi All, >>>>>>> >>>>>>> In kafka.properties, I put (forgot to change): >>>>>>> >>>>>>> num.partitions=1 >>>>>>> >>>>>>> While I create topics programatically: >>>>>>> >>>>>>> String[] args = new String[]{ >>>>>>> "--zookeeper", config.getString("zookeeper"), >>>>>>> "--topic", config.getString("topic"), >>>>>>> "--replica", config.getString("replicas"), >>>>>>> "--partition", config.getString("partitions") >>>>>>> }; >>>>>>> >>>>>>> CreateTopicCommand.main(args); >>>>>>> >>>>>>> The performance engineer told me only one consumer thread is >> actively >>>>>>> working even I have 4 consumer threads started (could see when >>>>> debugging >>>>>> or >>>>>>> in thread dump); and 4 partitions configured from the args. >>>>>>> >>>>>>> It seems that num.partitions is still controlling the parallelism. >> Do >>> I >>>>>>> need to change this num.partitions accordingly? Could I remove it? >>> What >>>>>> is >>>>>>> I have different parallel requirement for different topic? >>>>>>> >>>>>>> Thank you in advance! >>>>>>> >>>>>>> Best Regards, >>>>>>> Mingtao >>>>> >>>>> >>>>>> On Mon, Aug 11, 2014 at 7:37 PM, Guozhang Wang <wangg...@gmail.com> >>> wrote: >>>>>> >>>>>> Mingtao, >>>>>> >>>>>> How many partitions of the consumed topic has? Basically the data is >>>>>> distributed per-partition, and hence if the number of consumers is >>> larger >>>>>> than the number of partitions, some consumers will not get any data. >>>>>> >>>>>> Guozhang >>>>>> >>>>>> >>>>>> On Mon, Aug 11, 2014 at 3:29 PM, Mingtao Zhang < >> mail2ming...@gmail.com >>>> >>>>>> wrote: >>>>>> >>>>>>> Is it anyhow related to the issue? >>>>>>> >>>>>>> WARN No previously checkpointed highwatermark value found for topic >>> RAW >>>>>>> partition 0. Returning 0 as the highwatermark >>>>>>> (kafka.server.HighwaterMarkCheckpoint) >>>>>>> >>>>>>> Mingtao >>>>>> >>>>>> >>>>>> >>>>>> -- >>>>>> -- Guozhang >>>> >>>> >>>> >>>> -- >>>> -- Guozhang > > > > -- > -- Guozhang