Thanks a lot, guys, this helps me understand it better.

Regards,
Vinay Patil

On Mon, Jul 4, 2016 at 8:43 PM, Stephan Ewen <se...@apache.org> wrote:

> Just to be sure: Each *subtask* has one thread - so for each task, there
> are as many parallel threads (distributed across nodes) as your parallelism
> indicates.
>
> For most cases, having long chains and then a higher parallelism is a good
> choice.
> Cases where individual functions (MapFunction, etc.) do something very CPU
> intensive are cases where you may not want to chain them, so they get a
> separate thread.
>
> If you see all tasks in one box in the UI, it probably means you have only
> "Filter" and "Map" as functions? In that case it is fine to have just one
> box (=Task) in the UI. The box still has parallelism via subtasks.
>
> If you insert a "rebalance()" between the Kafka Source and the
> Map/Filter/etc. operators, it makes sure that the data is distributed evenly
> across those operators, independent of how the data was partitioned in
> Kafka.
> You should then also see two boxes in the UI - one for the Kafka Source,
> one for the actual processing.
>
> On Mon, Jul 4, 2016 at 5:00 PM, Aljoscha Krettek <aljos...@apache.org>
> wrote:
>
> > Hi,
> > chaining is useful to minimize communication overhead. But in your case you
> > might benefit more from having good cluster utilization. There seems to be
> > a tradeoff. Maybe you can run some easy tests to see how it behaves for
> > you.
> >
> > Cheers,
> > Aljoscha
> >
> > On Mon, 4 Jul 2016 at 16:28 Vinay Patil <vinay18.pa...@gmail.com> wrote:
> >
> > > Thanks,
> > >
> > > So is operator chaining useful in terms of resource utilization, or
> > > should we keep chaining to a minimum, say 3-4 operators, and disable it
> > > otherwise?
> > > I am worried because I am seeing all the operators in one box in the
> > > Flink UI.
> > >
> > >
> > > Regards,
> > > Vinay Patil
> > >
> > > On Mon, Jul 4, 2016 at 7:13 PM, Aljoscha Krettek <aljos...@apache.org>
> > > wrote:
> > >
> > > > Hi,
> > > > this is true, yes. If the number of Kafka partitions is less than the
> > > > parallelism then some of the sources might not be utilized. If you
> > > > insert a rebalance after the sources you should be able to utilize all
> > > > the downstream operations equally.
> > > >
> > > > Cheers,
> > > > Aljoscha
> > > >
> > > > On Mon, 4 Jul 2016 at 11:13 Vinay Patil <vinay18.pa...@gmail.com>
> > > > wrote:
> > > >
> > > > > Just an update: the task will be executed by multiple threads. My
> > > > > bad, I asked the wrong way.
> > > > > Can you please clarify the other points?
> > > > >
> > > > > Out of 8 nodes, only 3 are being utilized for reading the data from
> > > > > Kafka. Does that mean the number of Kafka partitions is set too low?
> > > > >
> > > > > What if we use rescale or rebalance, since they distribute the data
> > > > > evenly? Would that ensure maximum use of resources?
> > > > >
> > > > > Regards,
> > > > > Vinay Patil
> > > > >
> > > > > On Fri, Jul 1, 2016 at 11:09 PM, Vinay Patil <vinay18.pa...@gmail.com>
> > > > > wrote:
> > > > >
> > > > > > Hi,
> > > > > >
> > > > > > According to the documentation:
> > > > > > "Each task is executed by one thread. Chaining operators together
> > > > > > into tasks is a useful optimization: it reduces the overhead of
> > > > > > thread-to-thread handover and buffering, and increases overall
> > > > > > throughput while decreasing latency."
> > > > > > So does this mean that the single box (see the mails below)
> > > > > > represents a *single task*, and that the task will be executed by a
> > > > > > single thread only?
> > > > > >
> > > > > > I have an 8-node cluster (parallelism set to 56), so what is the
> > > > > > correct way to achieve maximum CPU utilization and parallelism? Does
> > > > > > chaining the complete stream into a single box achieve maximum
> > > > > > parallelism?
> > > > > >
> > > > > > We are processing a high volume of data (60,000 records per
> > > > > > second), so I wanted to be sure what we can change to achieve
> > > > > > better results.
> > > > > >
> > > > > > Regards,
> > > > > > Vinay Patil
> > > > > >
> > > > > >
> > > > > > On Fri, Jul 1, 2016 at 9:23 PM, Aljoscha Krettek <aljos...@apache.org>
> > > > > > wrote:
> > > > > >
> > > > > >> Hi,
> > > > > > >> yes, the window operator is stateful, which means that it will
> > > > > > >> pick up where it left off after a failure and restore.
> > > > > > >>
> > > > > > >> You're right about the graph, chained operators are shown as one
> > > > > > >> box.
> > > > > >>
> > > > > >> Cheers,
> > > > > >> Aljoscha
> > > > > >>
> > > > > > >> On Fri, 1 Jul 2016 at 04:52 Vinay Patil <vinay18.pa...@gmail.com>
> > > > > > >> wrote:
> > > > > >>
> > > > > >> > Hi,
> > > > > >> >
> > > > > > >> > I just watched the video on Robust Stream Processing.
> > > > > > >> > So when we say Window is a stateful operator, does it mean
> > > > > > >> > that even if the task manager doing the window operation fails,
> > > > > > >> > it will pick up from the state it left earlier when it comes
> > > > > > >> > back up? (I have not read more about state yet.)
> > > > > >> >
> > > > > >> >
> > > > > > >> > Also, in one of our projects, when we deploy on the cluster and
> > > > > > >> > check the Job Graph, everything is shown in one box. Why does
> > > > > > >> > this happen? Is it because of chaining of streams?
> > > > > > >> > So the box here represents the function flow, right?
> > > > > >> >
> > > > > >> >
> > > > > >> >
> > > > > >> > Regards,
> > > > > >> > Vinay Patil
> > > > > >> >
> > > > > >>
> > > > > >
> > > > >
> > > >
> > >
> >
>
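
The thread model discussed above (one thread per subtask, and as many subtasks per task as the parallelism) can be made concrete with a little arithmetic. The following is a minimal sketch in plain Java, not Flink code: the class and method names are hypothetical, the two-task layout assumes the chain is broken once after the Kafka source, and any auxiliary threads a real runtime may start are ignored.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical model, not Flink code: counts subtask threads for a job layout. */
final class SubtaskThreads {

    /** Each task (one box in the UI) runs `parallelism` subtasks, one thread each. */
    static int totalThreads(Map<String, Integer> parallelismPerTask) {
        int total = 0;
        for (int parallelism : parallelismPerTask.values()) {
            total += parallelism;
        }
        return total;
    }

    public static void main(String[] args) {
        int parallelism = 56; // value mentioned in the thread

        // Everything chained: one box in the UI, still 56 parallel subtasks.
        Map<String, Integer> chained = new LinkedHashMap<>();
        chained.put("Source -> Filter -> Map", parallelism);

        // Assumed layout: chain broken after the source, so two boxes in the UI.
        Map<String, Integer> split = new LinkedHashMap<>();
        split.put("Source", parallelism);
        split.put("Filter -> Map", parallelism);

        System.out.println("chained: " + totalThreads(chained) + " subtask threads");
        System.out.println("split:   " + totalThreads(split) + " subtask threads");
    }
}
```

Under these assumptions a single box does not mean a single thread: the fully chained job still runs 56 subtask threads, and breaking the chain once doubles that to 112 at the cost of a handover between the two tasks.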

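
To illustrate why a rebalance after the source helps when there are fewer Kafka partitions than parallel subtasks, here is a rough simulation in plain Java. It is only a sketch and does not use the Flink API: the class and method names are hypothetical, the partition count of 3 and the even split of 60,000 records are assumptions, and the round-robin loop is a simplified model of what rebalance() does.

```java
import java.util.Arrays;

/** Hypothetical model, not Flink code: forward vs. round-robin distribution. */
final class RebalanceSketch {

    /** Chained/forward case: source subtask i hands records only to downstream subtask i. */
    static long[] forward(long[] recordsPerSource) {
        return recordsPerSource.clone();
    }

    /** Round-robin case: every source subtask cycles through all downstream subtasks. */
    static long[] roundRobin(long[] recordsPerSource) {
        int n = recordsPerSource.length;
        long[] received = new long[n];
        for (int source = 0; source < n; source++) {
            for (long r = 0; r < recordsPerSource[source]; r++) {
                // Offset by the source index so sources start at different targets.
                int target = (int) ((source + r) % n);
                received[target]++;
            }
        }
        return received;
    }

    /** Number of downstream subtasks that received at least one record. */
    static long busySubtasks(long[] received) {
        return Arrays.stream(received).filter(count -> count > 0).count();
    }

    public static void main(String[] args) {
        int parallelism = 56;    // value mentioned in the thread
        int kafkaPartitions = 3; // assumption: fewer partitions than subtasks

        long[] recordsPerSource = new long[parallelism];
        for (int i = 0; i < kafkaPartitions; i++) {
            recordsPerSource[i] = 20_000; // assumed even split of 60,000 records
        }

        System.out.println("busy without rebalance: "
                + busySubtasks(forward(recordsPerSource)));
        System.out.println("busy with rebalance:    "
                + busySubtasks(roundRobin(recordsPerSource)));
    }
}
```

In this model only 3 of the 56 downstream subtasks receive data without the rebalance, while all 56 receive data with it. The source subtasks without a partition stay idle in both cases, which matches the remark that some of the sources might not be utilized.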