Great! I am in the 10 min category for sure. I do see there is NO partition key 
provided in our code.

I feel it's too much 'customization' when Kafka provides a 'randomness' default 
partition strategy while have another layer doing the 10 min tricky to optimize 
socket stuff.

Anyway, thank you so much for the help!

Mingtao Sent from iPhone

> On Aug 12, 2014, at 12:22 PM, Guozhang Wang <wangg...@gmail.com> wrote:
> 
> I see your question now. You may want to read this FAQ:
> 
> https://cwiki.apache.org/confluence/display/KAFKA/FAQ#FAQ-Whyisdatanotevenlydistributedamongpartitionswhenapartitioningkeyisnotspecified
> ?
> 
> 
> On Tue, Aug 12, 2014 at 8:11 AM, Mingtao Zhang <mail2ming...@gmail.com>
> wrote:
> 
>> Hi Guozhang,
>> 
>> I think what I am looking for is the real 'randomness' when producer write
>> to the partitions. Based on my log, through a long time period, only one
>> partition got the write, while the other side, only one consumer is active.
>> In my case the consumer is slow, so when it comes back for the next
>> message, the whole pipeline is slowed down.
>> 
>> The 'round robin' works for my case. Is it Email a good thread to follow?
>> 
>> http://mail-archives.apache.org/mod_mbox/kafka-users/201312.mbox/%3CCANZY1Phu8_QZqHCw-ivGp=mxmzpze9+2vjxrlewzzebjor0...@mail.gmail.com%3E
>> 
>> Best Regards,
>> Mingtao
>> 
>> 
>> On Tue, Aug 12, 2014 at 7:55 AM, Mingtao Zhang <mail2ming...@gmail.com>
>> wrote:
>> 
>>> Hi Guozhang,
>>> 
>>> Thank you!
>>> 
>>> Could I say the consumer 'take turns to consume' is resulted by the
>>> correspond partition got the 'message write'?
>>> 
>>> The problem I am facing is my 'enrichment' (getting more data based on
>> raw
>>> data) consumer took too much time to complete one message consumption. To
>>> explore more parallel, could I say my only choice is 'decouple consumer
>>> consumption with enrichment'?
>>> 
>>> Mingtao Sent from iPhone
>>> 
>>>> On Aug 12, 2014, at 1:10 AM, Guozhang Wang <wangg...@gmail.com> wrote:
>>>> 
>>>> Hello Mingtao,
>>>> 
>>>> The partition will not be re-assigned to other consumers unless the
>>> current
>>>> consumer fails, so the behavior you described will not be expected.
>>>> 
>>>> Guozhang
>>>> 
>>>> 
>>>> On Mon, Aug 11, 2014 at 6:27 PM, Mingtao Zhang <mail2ming...@gmail.com
>>> 
>>>> wrote:
>>>> 
>>>>> Hi Guozhang,
>>>>> 
>>>>> I do have another Email talking about Partitions per topic. I paste it
>>>>> within this Email.
>>>>> 
>>>>> I am expecting those consumers will work concurrently. The behavior I
>>>>> observed here is consumer thread-1 will work a while, then thread-3
>> will
>>>>> work, then thread-0 ..., is it normal?
>>>>> 
>>>>> version is 2.2.0.
>>>>> 
>>>>> Best Regards,
>>>>> Mingtao
>>>>> 
>>>>>> On Wed, Jul 23, 2014 at 7:57 PM, Guozhang Wang <wangg...@gmail.com>
>>> wrote:
>>>>>> 
>>>>>> num.partitions is only used as a default value when the createTopic
>>>>> command
>>>>>> does not specify the num.partitions or it is automatically created.
>> In
>>>>> your
>>>>>> case since you always use its value in the createTopic you will
>> always
>>>>> can
>>>>>> one partition. Try change your code to sth. like:
>>>>>> 
>>>>>>       String[] args = new String[]{
>>>>>>           "--zookeeper", config.getString("zookeeper"),
>>>>>>           "--topic", config.getString("topic"),
>>>>>>           "--replica", config.getString("replicas"),
>>>>>>           "--partition", "8"
>>>>>>       };
>>>>>> 
>>>>>>       CreateTopicCommand.main(args);
>>>>>> 
>>>>>> 
>>>>>> 
>>>>>> On Wed, Jul 23, 2014 at 4:38 PM, Mingtao Zhang <
>> mail2ming...@gmail.com
>>>> 
>>>>>> wrote:
>>>>>> 
>>>>>>> Hi All,
>>>>>>> 
>>>>>>> In kafka.properties, I put (forgot to change):
>>>>>>> 
>>>>>>> num.partitions=1
>>>>>>> 
>>>>>>> While I create topics programatically:
>>>>>>> 
>>>>>>>       String[] args = new String[]{
>>>>>>>           "--zookeeper", config.getString("zookeeper"),
>>>>>>>           "--topic", config.getString("topic"),
>>>>>>>           "--replica", config.getString("replicas"),
>>>>>>>           "--partition", config.getString("partitions")
>>>>>>>       };
>>>>>>> 
>>>>>>>       CreateTopicCommand.main(args);
>>>>>>> 
>>>>>>> The performance engineer told me only one consumer thread is
>> actively
>>>>>>> working even I have 4 consumer threads started (could see when
>>>>> debugging
>>>>>> or
>>>>>>> in thread dump); and 4 partitions configured from the args.
>>>>>>> 
>>>>>>> It seems that num.partitions is still controlling the parallelism.
>> Do
>>> I
>>>>>>> need to change this num.partitions accordingly? Could I remove it?
>>> What
>>>>>> is
>>>>>>> I have different parallel requirement for different topic?
>>>>>>> 
>>>>>>> Thank you in advance!
>>>>>>> 
>>>>>>> Best Regards,
>>>>>>> Mingtao
>>>>> 
>>>>> 
>>>>>> On Mon, Aug 11, 2014 at 7:37 PM, Guozhang Wang <wangg...@gmail.com>
>>> wrote:
>>>>>> 
>>>>>> Mingtao,
>>>>>> 
>>>>>> How many partitions of the consumed topic has? Basically the data is
>>>>>> distributed per-partition, and hence if the number of consumers is
>>> larger
>>>>>> than the number of partitions, some consumers will not get any data.
>>>>>> 
>>>>>> Guozhang
>>>>>> 
>>>>>> 
>>>>>> On Mon, Aug 11, 2014 at 3:29 PM, Mingtao Zhang <
>> mail2ming...@gmail.com
>>>> 
>>>>>> wrote:
>>>>>> 
>>>>>>> Is it anyhow related to the issue?
>>>>>>> 
>>>>>>> WARN No previously checkpointed highwatermark value found for topic
>>> RAW
>>>>>>> partition 0. Returning 0 as the highwatermark
>>>>>>> (kafka.server.HighwaterMarkCheckpoint)
>>>>>>> 
>>>>>>> Mingtao
>>>>>> 
>>>>>> 
>>>>>> 
>>>>>> --
>>>>>> -- Guozhang
>>>> 
>>>> 
>>>> 
>>>> --
>>>> -- Guozhang
> 
> 
> 
> -- 
> -- Guozhang

Reply via email to