Hi, The re-balance actually distributes it to all the task managers, and now all TM's are getting utilized, You were right , I am seeing two boxes(Tasks) now.
I have one question regarding the task slots : For the source the parallelism is set to 56, now when we see on the UI and click on source sub-task , I see 56 entries , out of which only two are getting the data from Kafka (this may be because I have two kafka partitions) The 56 entries that I am seeing for a sub-task on UI are the total task slots of all TM's, right ? If yes, only two slots are getting utilized, how do I ensure enough task slots are getting utilized at the source ? I have 7 task managers (8 cores per TM), so if only 1 core each of two task manager is performing the consume operation, wouldn't it hamper the performance. Even if two Task managers are utilized , all 16 slots should have been used , right ? For the other sub-task, for all 56 entries I am seeing bytes received. (this may be because of applying rebalance after the source) P.S: I am reading over million records from Kafka , so need to utilize enough resources [Performance is the key here]. Regards, Vinay Patil On Mon, Jul 4, 2016 at 8:55 PM, Vinay Patil <vinay18.pa...@gmail.com> wrote: > Thanks a lot guys, this helps to understand better > > Regards, > Vinay Patil > > On Mon, Jul 4, 2016 at 8:43 PM, Stephan Ewen <se...@apache.org> wrote: > >> Just to be sure: Each *subtask* has one thread - so for each task, there >> are as many parallel threads (distributed across nodes) as your >> parallelism >> indicates. >> >> For most cases, having long chains and then a higher parallelism is a good >> choice. >> Cases where individual functions (MapFunction, etc) do something very CPU >> intensive are cases where you may want to not chain them, so they get a >> separate thread. >> >> If you see all tasks in one box in the UI, it probably means you have only >> "Filter" and "Map" as a function? In that case it is fine to have just one >> box (=Task) in the UI. The box still has parallelism via subtasks. >> >> If you insert a "rebalance()" between the Kafka Source and the >> Map/Filter/etc it makes sure that the data distribution in the >> Map/Filter/etc operators has best utilization independent of how the data >> was partitioned in Kafka. >> You should then also see two boxes in the UI - one for the Kafka Source, >> one for the actual processing. >> >> >> >> >> >> >> On Mon, Jul 4, 2016 at 5:00 PM, Aljoscha Krettek <aljos...@apache.org> >> wrote: >> >> > Hi, >> > chaining is useful to minimize communication overhead. But in your case >> you >> > might benefit more from having good cluster utilization. There seems to >> be >> > a tradeoff. Maybe you can run some easy tests to see how it behaves for >> > you. >> > >> > Cheers, >> > Aljoscha >> > >> > On Mon, 4 Jul 2016 at 16:28 Vinay Patil <vinay18.pa...@gmail.com> >> wrote: >> > >> > > Thanks, >> > > >> > > so is operator chaining useful in terms of utilizing the resources or >> we >> > > should keep the chaining to minimal use, say 3-4 operators and disable >> > > chaining ? >> > > I am worried because I am seeing all the operators in one box on flink >> > UI. >> > > >> > > >> > > Regards, >> > > Vinay Patil >> > > >> > > On Mon, Jul 4, 2016 at 7:13 PM, Aljoscha Krettek <aljos...@apache.org >> > >> > > wrote: >> > > >> > > > Hi, >> > > > this is true, yes. If the number of Kafka partitions is less than >> the >> > > > parallelism then some of the sources might not be utilized. If you >> > > insert a >> > > > rebalance after the sources you should be able to utilize all the >> > > > downstream operations equally. >> > > > >> > > > Cheers, >> > > > Aljoscha >> > > > >> > > > On Mon, 4 Jul 2016 at 11:13 Vinay Patil <vinay18.pa...@gmail.com> >> > wrote: >> > > > >> > > > > Just an update, the task will be executed by multiple threads , my >> > bad >> > > I >> > > > > asked the wrong way. >> > > > > Can you please clarify other things. >> > > > > >> > > > > Out of 8 node only 3 of them are getting utilized, reading the >> data >> > > from >> > > > > Kafka , does it mean that the Kafka partitions are set to less >> > number ? >> > > > > >> > > > > What if we use rescale or rebalance since it evenly distributes , >> > would >> > > > > that ensure maximum use of resources ? >> > > > > >> > > > > Regards, >> > > > > Vinay Patil >> > > > > >> > > > > On Fri, Jul 1, 2016 at 11:09 PM, Vinay Patil < >> > vinay18.pa...@gmail.com> >> > > > > wrote: >> > > > > >> > > > > > Hi, >> > > > > > >> > > > > > According to the documentation : >> > > > > > *"**Each task is executed by one thread ,**Chaining operators >> > > together >> > > > > > into tasks is a useful optimization: it reduces the overhead of >> > > > > > thread-to-thread handover and buffering, and increases overall >> > > > throughput >> > > > > > while decreasing latency"* >> > > > > > So does it mean that the single box (refer below mails) >> represent >> > it >> > > as >> > > > > a *single >> > > > > > task* and the task will be executed by single thread only ? >> > > > > > >> > > > > > I am having 8 node cluster (parallelism set to 56), so what is >> the >> > > > > correct >> > > > > > way to achieve maximum CPU utilization and parallelism ? Does >> > > complete >> > > > > > stream chaining into a single box achieve maximum parallelism ? >> > > > > > >> > > > > > The data we are processing is huge volume of data (60,000 >> records >> > per >> > > > > > second), so wanted to be sure what we can correct to achieve >> better >> > > > > > results. >> > > > > > >> > > > > > Regards, >> > > > > > Vinay Patil >> > > > > > >> > > > > > >> > > > > > On Fri, Jul 1, 2016 at 9:23 PM, Aljoscha Krettek < >> > > aljos...@apache.org> >> > > > > > wrote: >> > > > > > >> > > > > >> Hi, >> > > > > >> yes, the window operator is stateful, which means that it will >> > pick >> > > up >> > > > > >> where it left in case of a failure and restore. >> > > > > >> >> > > > > >> You're right about the graph, chained operators are shown as >> one >> > > box. >> > > > > >> >> > > > > >> Cheers, >> > > > > >> Aljoscha >> > > > > >> >> > > > > >> On Fri, 1 Jul 2016 at 04:52 Vinay Patil < >> vinay18.pa...@gmail.com> >> > > > > wrote: >> > > > > >> >> > > > > >> > Hi, >> > > > > >> > >> > > > > >> > Just watched the video on Robust Stream Processing . >> > > > > >> > So when we say Window is a stateful operator , does it mean >> that >> > > > even >> > > > > if >> > > > > >> > the task manager doing the window operation fails, will it >> pick >> > > up >> > > > > from >> > > > > >> > the state left earlier when it comes up ? (Have not read >> more on >> > > > state >> > > > > >> for >> > > > > >> > now) >> > > > > >> > >> > > > > >> > >> > > > > >> > Also in one of our project when we deploy on cluster and >> check >> > the >> > > > Job >> > > > > >> > Graph , everything is shown in one box , why this happens ? >> Is >> > it >> > > > > >> because >> > > > > >> > of chaining of streams ? >> > > > > >> > So the box here represent the function flow, right ? >> > > > > >> > >> > > > > >> > >> > > > > >> > >> > > > > >> > Regards, >> > > > > >> > Vinay Patil >> > > > > >> > >> > > > > >> >> > > > > > >> > > > > >> > > > >> > > >> > >> > >