Just to be sure: Each *subtask* has one thread - so for each task, there are as many parallel threads (distributed across nodes) as your parallelism indicates.
For most cases, having long chains and then a higher parallelism is a good choice. Cases where individual functions (MapFunction, etc) do something very CPU intensive are cases where you may want to not chain them, so they get a separate thread. If you see all tasks in one box in the UI, it probably means you have only "Filter" and "Map" as a function? In that case it is fine to have just one box (=Task) in the UI. The box still has parallelism via subtasks. If you insert a "rebalance()" between the Kafka Source and the Map/Filter/etc it makes sure that the data distribution in the Map/Filter/etc operators has best utilization independent of how the data was partitioned in Kafka. You should then also see two boxes in the UI - one for the Kafka Source, one for the actual processing. On Mon, Jul 4, 2016 at 5:00 PM, Aljoscha Krettek <aljos...@apache.org> wrote: > Hi, > chaining is useful to minimize communication overhead. But in your case you > might benefit more from having good cluster utilization. There seems to be > a tradeoff. Maybe you can run some easy tests to see how it behaves for > you. > > Cheers, > Aljoscha > > On Mon, 4 Jul 2016 at 16:28 Vinay Patil <vinay18.pa...@gmail.com> wrote: > > > Thanks, > > > > so is operator chaining useful in terms of utilizing the resources or we > > should keep the chaining to minimal use, say 3-4 operators and disable > > chaining ? > > I am worried because I am seeing all the operators in one box on flink > UI. > > > > > > Regards, > > Vinay Patil > > > > On Mon, Jul 4, 2016 at 7:13 PM, Aljoscha Krettek <aljos...@apache.org> > > wrote: > > > > > Hi, > > > this is true, yes. If the number of Kafka partitions is less than the > > > parallelism then some of the sources might not be utilized. If you > > insert a > > > rebalance after the sources you should be able to utilize all the > > > downstream operations equally. > > > > > > Cheers, > > > Aljoscha > > > > > > On Mon, 4 Jul 2016 at 11:13 Vinay Patil <vinay18.pa...@gmail.com> > wrote: > > > > > > > Just an update, the task will be executed by multiple threads , my > bad > > I > > > > asked the wrong way. > > > > Can you please clarify other things. > > > > > > > > Out of 8 node only 3 of them are getting utilized, reading the data > > from > > > > Kafka , does it mean that the Kafka partitions are set to less > number ? > > > > > > > > What if we use rescale or rebalance since it evenly distributes , > would > > > > that ensure maximum use of resources ? > > > > > > > > Regards, > > > > Vinay Patil > > > > > > > > On Fri, Jul 1, 2016 at 11:09 PM, Vinay Patil < > vinay18.pa...@gmail.com> > > > > wrote: > > > > > > > > > Hi, > > > > > > > > > > According to the documentation : > > > > > *"**Each task is executed by one thread ,**Chaining operators > > together > > > > > into tasks is a useful optimization: it reduces the overhead of > > > > > thread-to-thread handover and buffering, and increases overall > > > throughput > > > > > while decreasing latency"* > > > > > So does it mean that the single box (refer below mails) represent > it > > as > > > > a *single > > > > > task* and the task will be executed by single thread only ? > > > > > > > > > > I am having 8 node cluster (parallelism set to 56), so what is the > > > > correct > > > > > way to achieve maximum CPU utilization and parallelism ? Does > > complete > > > > > stream chaining into a single box achieve maximum parallelism ? > > > > > > > > > > The data we are processing is huge volume of data (60,000 records > per > > > > > second), so wanted to be sure what we can correct to achieve better > > > > > results. > > > > > > > > > > Regards, > > > > > Vinay Patil > > > > > > > > > > > > > > > On Fri, Jul 1, 2016 at 9:23 PM, Aljoscha Krettek < > > aljos...@apache.org> > > > > > wrote: > > > > > > > > > >> Hi, > > > > >> yes, the window operator is stateful, which means that it will > pick > > up > > > > >> where it left in case of a failure and restore. > > > > >> > > > > >> You're right about the graph, chained operators are shown as one > > box. > > > > >> > > > > >> Cheers, > > > > >> Aljoscha > > > > >> > > > > >> On Fri, 1 Jul 2016 at 04:52 Vinay Patil <vinay18.pa...@gmail.com> > > > > wrote: > > > > >> > > > > >> > Hi, > > > > >> > > > > > >> > Just watched the video on Robust Stream Processing . > > > > >> > So when we say Window is a stateful operator , does it mean that > > > even > > > > if > > > > >> > the task manager doing the window operation fails, will it pick > > up > > > > from > > > > >> > the state left earlier when it comes up ? (Have not read more on > > > state > > > > >> for > > > > >> > now) > > > > >> > > > > > >> > > > > > >> > Also in one of our project when we deploy on cluster and check > the > > > Job > > > > >> > Graph , everything is shown in one box , why this happens ? Is > it > > > > >> because > > > > >> > of chaining of streams ? > > > > >> > So the box here represent the function flow, right ? > > > > >> > > > > > >> > > > > > >> > > > > > >> > Regards, > > > > >> > Vinay Patil > > > > >> > > > > > >> > > > > > > > > > > > > > > >