Hi guys, I understand that you are extremely busy, but any pointers here would be highly appreciated, so that I can proceed towards concluding the activity.

Best Regards
CVP

On Mon, Jan 9, 2017 at 11:43 AM, Chakravarthy varaga <chakravarth...@gmail.com> wrote:

> Anything that I could check or collect for you for investigation ?
>
> On Sat, Jan 7, 2017 at 1:35 PM, Chakravarthy varaga <chakravarth...@gmail.com> wrote:
>
>> Hi Stephen
>>
>> . Kafka version is: 0.9.0.1 the connector is flinkconsumer09
>> . The flatmap n coflatmap are connected by keyBy
>> . No data is broadcasted and the data is not exploded based on the
>> parallelism
>>
>> Cvp
>>
>> On 6 Jan 2017 20:16, "Stephan Ewen" <se...@apache.org> wrote:
>>
>>> Hi!
>>>
>>> You are right, parallelism 2 should be faster than parallelism 1 ;-) As
>>> ChenQin pointed out, having only 2 Kafka Partitions may prevent further
>>> scaleout.
>>>
>>> Few things to check:
>>> - How are you connecting the FlatMap and CoFlatMap? Default, keyBy,
>>> broadcast?
>>> - Broadcast for example would multiply the data based on parallelism,
>>> can lead to slowdown when saturating the network.
>>> - Are you using the standard Kafka Source (which Kafka version)?
>>> - Is there any part in the program that multiplies data/effort with
>>> higher parallelism (does the FlatMap explode data based on parallelism)?
>>>
>>> Stephan
>>>
>>>
>>> On Fri, Jan 6, 2017 at 7:27 PM, Chen Qin <qinnc...@gmail.com> wrote:
>>>
>>>> Just noticed there are only two partitions per topic. Regardless of how
>>>> large parallelism set. Only two of those will get partition assigned at
>>>> most.
>>>>
>>>> Sent from my iPhone
>>>>
>>>> On Jan 6, 2017, at 02:40, Chakravarthy varaga <chakravarth...@gmail.com>
>>>> wrote:
>>>>
>>>> Hi All,
>>>>
>>>> Any updates on this?
>>>>
>>>> Best Regards
>>>> CVP
>>>>
>>>> On Thu, Jan 5, 2017 at 1:21 PM, Chakravarthy varaga <
>>>> chakravarth...@gmail.com> wrote:
>>>>
>>>>>
>>>>> Hi All,
>>>>>
>>>>> I have a job as attached.
>>>>>
>>>>> I have a 16 Core blade running RHEL 7. The taskmanager default number
>>>>> of slots is set to 1. The source is a kafka stream and each of the 2
>>>>> sources(topic) have 2 partitions each.
>>>>>
>>>>>
>>>>> *What I notice is that when I deploy a job to run with #parallelism=2
>>>>> the total processing time doubles the time it took when the same job was
>>>>> deployed with #parallelism=1. It linearly increases with the parallelism.*
>>>>> Since the number of slots is set to 1 per TM, I would assume that the
>>>>> job would be processed in parallel in 2 different TMs and that each
>>>>> consumer in each TM is connected to 1 partition of the topic. This
>>>>> therefore should have kept the overall processing time the same or less
>>>>> !!!
>>>>>
>>>>> The co-flatmap connects the 2 streams & uses ValueState (checkpointed
>>>>> in FS). I think this is distributed among the TMs. My understanding is
>>>>> that
>>>>> the search of values state could be costly between TMs. Do you sense
>>>>> something wrong here?
>>>>>
>>>>> Best Regards
>>>>> CVP
>>>>>
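
As a rough illustration of two points raised in this thread (only two partitions per topic, and broadcast multiplying data with the parallelism), here is a minimal Python sketch. It is not Flink code. The function names are hypothetical, and the modulo assignment of partitions to source subtasks is an assumption made for illustration, not a statement about how the Kafka 0.9 connector actually distributes partitions.

```python
def assign_partitions(num_partitions, parallelism):
    """Map each source subtask to the partitions it reads.

    Assumption: partition p goes to subtask p % parallelism.
    """
    assignment = {subtask: [] for subtask in range(parallelism)}
    for partition in range(num_partitions):
        assignment[partition % parallelism].append(partition)
    return assignment


def idle_subtasks(num_partitions, parallelism):
    """Return the source subtasks that receive no partition at all."""
    assignment = assign_partitions(num_partitions, parallelism)
    return [subtask for subtask, parts in assignment.items() if not parts]


def records_shipped(num_records, parallelism, broadcast):
    """Count records sent to the downstream operator.

    keyBy routes each record to exactly one subtask; broadcast sends
    one copy of each record to every subtask.
    """
    return num_records * parallelism if broadcast else num_records


if __name__ == "__main__":
    # Two partitions per topic, as described in the original question.
    for parallelism in (1, 2, 4):
        print(parallelism,
              assign_partitions(2, parallelism),
              idle_subtasks(2, parallelism))
```

Under this model, going from parallelism 2 to 4 with 2 partitions only adds source subtasks that have nothing to read, so ingestion cannot scale past 2. The model does not explain the slowdown at parallelism 2 reported above, since the streams are connected by keyBy and nothing is broadcast; that question remains open in the thread.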