Just noticed there are only two partitions per topic. Regardless of how large 
parallelism set. Only two of those will get partition assigned at most.

Sent from my iPhone

> On Jan 6, 2017, at 02:40, Chakravarthy varaga <chakravarth...@gmail.com> 
> wrote:
> 
> Hi All,
> 
>     Any updates on this?
> 
> Best Regards
> CVP
> 
>> On Thu, Jan 5, 2017 at 1:21 PM, Chakravarthy varaga 
>> <chakravarth...@gmail.com> wrote:
>> 
>> Hi All,
>> 
>> I have a job as attached.
>> 
>> I have a 16 Core blade running RHEL 7. The taskmanager default number of 
>> slots is set to 1. The source is a kafka stream and each of the 2 
>> sources(topic) have 2 partitions each.
>> What I notice is that when I deploy a job to run with #parallelism=2 the 
>> total processing time doubles the time it took when the same job was 
>> deployed with #parallelism=1. It linearly increases with the parallelism.
>> 
>> Since the numberof slots is set to 1 per TM, I would assume that the job 
>> would be processed in parallel in 2 different TMs and that each consumer in 
>> each TM is connected to 1 partition of the topic. This therefore should have 
>> kept the overall processing time the same or less !!!
>> 
>> The co-flatmap connects the 2 streams & uses ValueState (checkpointed in 
>> FS). I think this is distributed among the TMs. My understanding is that the 
>> search of values state could be costly between TMs.  Do you sense something 
>> wrong here?
>> 
>> Best Regards
>> CVP
>> 
>> 
>> 
>> 
> 

Reply via email to