Hi,
unfortunately, reading from one Kafka partition cannot be split among
several parallel instances of the Kafka source. So if you have only 2
partitions, your reading parallelism is limited to that. You are right that
this can lead to bad performance and underutilization. The only solution I
see right now is to have more partitions in Kafka so that more readers can
read in parallel.
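The limitation above can be illustrated with a minimal sketch in plain Python (not actual Flink code); the helper name and the exact round-robin scheme are illustrative assumptions, but the key property holds: each partition is read by exactly one source subtask, so with 2 partitions at most 2 subtasks ever receive data, no matter how high the parallelism is.

```python
def assign_partitions(num_partitions, parallelism):
    # Hypothetical static round-robin assignment of Kafka partitions to
    # parallel source subtasks: each partition goes to exactly one subtask,
    # so at most `num_partitions` subtasks can ever receive data.
    assignment = {subtask: [] for subtask in range(parallelism)}
    for partition in range(num_partitions):
        assignment[partition % parallelism].append(partition)
    return assignment

# 2 partitions, source parallelism 56: only subtasks 0 and 1 read anything.
assignment = assign_partitions(num_partitions=2, parallelism=56)
busy = [s for s, parts in assignment.items() if parts]
idle = len(assignment) - len(busy)
print(busy, idle)  # [0, 1] 54
```

Adding partitions (or lowering the source parallelism) is the only way to change this assignment; a rebalance() after the source only helps the operators downstream of it.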

+Robert Adding Robert directly because he might have something more to say
about this.

Cheers,
Aljoscha

On Tue, 5 Jul 2016 at 15:48 Vinay Patil <vinay18.pa...@gmail.com> wrote:

> Hi,
>
> The rebalance actually distributes the data to all the task managers, and
> now all TMs are getting utilized. You were right, I am seeing two
> boxes (Tasks) now.
>
> I have one question regarding the task slots :
>
> For the source the parallelism is set to 56. Now when we look at the UI and
> click on the source sub-task, I see 56 entries, out of which only two are
> getting data from Kafka (this may be because I have two Kafka
> partitions).
>
> The 56 entries that I am seeing for a sub-task on the UI are the total task
> slots of all TMs, right?
>
> If yes, only two slots are getting utilized; how do I ensure enough task
> slots are utilized at the source? I have 7 task managers (8 cores
> per TM), so if only 1 core each of two task managers is performing the
> consume operation, wouldn't it hamper the performance?
>
> Even if two task managers are utilized, all 16 slots should have been used,
> right?
>
> For the other sub-task, I am seeing bytes received for all 56 entries
> (this may be because rebalance is applied after the source).
>
> P.S.: I am reading over a million records from Kafka, so I need to utilize
> enough resources [performance is the key here].
>
>
> Regards,
> Vinay Patil
>
> On Mon, Jul 4, 2016 at 8:55 PM, Vinay Patil <vinay18.pa...@gmail.com>
> wrote:
>
> > Thanks a lot guys, this helps to understand better
> >
> > Regards,
> > Vinay Patil
> >
> > On Mon, Jul 4, 2016 at 8:43 PM, Stephan Ewen <se...@apache.org> wrote:
> >
> >> Just to be sure: Each *subtask* has one thread - so for each task, there
> >> are as many parallel threads (distributed across nodes) as your
> >> parallelism
> >> indicates.
> >>
> >> For most cases, having long chains and then a higher parallelism is a
> >> good choice.
> >> Cases where individual functions (MapFunction, etc.) do something very
> >> CPU-intensive are cases where you may want to not chain them, so they
> >> get a separate thread.
> >>
> >> If you see all tasks in one box in the UI, it probably means you have
> >> only "Filter" and "Map" as functions? In that case it is fine to have
> >> just one box (=Task) in the UI. The box still has parallelism via
> >> subtasks.
> >>
> >> If you insert a "rebalance()" between the Kafka Source and the
> >> Map/Filter/etc it makes sure that the data distribution in the
> >> Map/Filter/etc operators has best utilization independent of how the
> >> data was partitioned in Kafka.
> >> You should then also see two boxes in the UI - one for the Kafka Source,
> >> one for the actual processing.
> >>
> >>
> >> On Mon, Jul 4, 2016 at 5:00 PM, Aljoscha Krettek <aljos...@apache.org>
> >> wrote:
> >>
> >> > Hi,
> >> > chaining is useful to minimize communication overhead. But in your
> >> > case you might benefit more from having good cluster utilization.
> >> > There seems to be a tradeoff. Maybe you can run some easy tests to see
> >> > how it behaves for you.
> >> >
> >> > Cheers,
> >> > Aljoscha
> >> >
> >> > > On Mon, 4 Jul 2016 at 16:28 Vinay Patil <vinay18.pa...@gmail.com>
> >> > > wrote:
> >> >
> >> > > Thanks,
> >> > >
> >> > > so is operator chaining useful in terms of utilizing the resources,
> >> > > or should we keep the chaining to minimal use, say 3-4 operators,
> >> > > and disable chaining?
> >> > > I am worried because I am seeing all the operators in one box on the
> >> > > Flink UI.
> >> > >
> >> > >
> >> > > Regards,
> >> > > Vinay Patil
> >> > >
> >> > > On Mon, Jul 4, 2016 at 7:13 PM, Aljoscha Krettek <aljos...@apache.org>
> >> > > wrote:
> >> > >
> >> > > > Hi,
> >> > > > this is true, yes. If the number of Kafka partitions is less than
> >> > > > the parallelism then some of the sources might not be utilized. If
> >> > > > you insert a rebalance after the sources you should be able to
> >> > > > utilize all the downstream operations equally.
> >> > > >
> >> > > > Cheers,
> >> > > > Aljoscha
> >> > > >
> >> > > > On Mon, 4 Jul 2016 at 11:13 Vinay Patil <vinay18.pa...@gmail.com>
> >> > > > wrote:
> >> > > >
> >> > > > > Just an update: the task will be executed by multiple threads; my
> >> > > > > bad, I asked the wrong way.
> >> > > > > Can you please clarify the other things?
> >> > > > >
> >> > > > > Out of 8 nodes, only 3 of them are getting utilized for reading
> >> > > > > the data from Kafka; does it mean that the number of Kafka
> >> > > > > partitions is set too low?
> >> > > > >
> >> > > > > What if we use rescale or rebalance, since it distributes evenly;
> >> > > > > would that ensure maximum use of resources?
> >> > > > >
> >> > > > > Regards,
> >> > > > > Vinay Patil
> >> > > > >
> >> > > > > On Fri, Jul 1, 2016 at 11:09 PM, Vinay Patil
> >> > > > > <vinay18.pa...@gmail.com> wrote:
> >> > > > >
> >> > > > > > Hi,
> >> > > > > >
> >> > > > > > According to the documentation:
> >> > > > > > *"Each task is executed by one thread. Chaining operators
> >> > > > > > together into tasks is a useful optimization: it reduces the
> >> > > > > > overhead of thread-to-thread handover and buffering, and
> >> > > > > > increases overall throughput while decreasing latency."*
> >> > > > > > So does it mean that the single box (refer to the mails below)
> >> > > > > > represents a *single task*, and that the task will be executed
> >> > > > > > by a single thread only?
> >> > > > > >
> >> > > > > > I have an 8-node cluster (parallelism set to 56), so what is
> >> > > > > > the correct way to achieve maximum CPU utilization and
> >> > > > > > parallelism? Does chaining the complete stream into a single
> >> > > > > > box achieve maximum parallelism?
> >> > > > > >
> >> > > > > > The data we are processing is a huge volume (60,000 records per
> >> > > > > > second), so I wanted to be sure what we can correct to achieve
> >> > > > > > better results.
> >> > > > > >
> >> > > > > > Regards,
> >> > > > > > Vinay Patil
> >> > > > > >
> >> > > > > >
> >> > > > > > On Fri, Jul 1, 2016 at 9:23 PM, Aljoscha Krettek
> >> > > > > > <aljos...@apache.org> wrote:
> >> > > > > >
> >> > > > > >> Hi,
> >> > > > > >> yes, the window operator is stateful, which means that it will
> >> > > > > >> pick up where it left off in case of a failure and restore.
> >> > > > > >>
> >> > > > > >> You're right about the graph, chained operators are shown as
> >> > > > > >> one box.
> >> > > > > >>
> >> > > > > >> Cheers,
> >> > > > > >> Aljoscha
> >> > > > > >>
> >> > > > > >> On Fri, 1 Jul 2016 at 04:52 Vinay Patil
> >> > > > > >> <vinay18.pa...@gmail.com> wrote:
> >> > > > > >>
> >> > > > > >> > Hi,
> >> > > > > >> >
> >> > > > > >> > I just watched the video on Robust Stream Processing.
> >> > > > > >> > So when we say the Window is a stateful operator, does it
> >> > > > > >> > mean that even if the task manager doing the window operation
> >> > > > > >> > fails, it will pick up from the state left earlier when it
> >> > > > > >> > comes back up? (I have not read more on state for now.)
> >> > > > > >> >
> >> > > > > >> >
> >> > > > > >> > Also, in one of our projects, when we deploy on the cluster
> >> > > > > >> > and check the Job Graph, everything is shown in one box; why
> >> > > > > >> > does this happen? Is it because of the chaining of streams?
> >> > > > > >> > So the box here represents the function flow, right?
> >> > > > > >> >
> >> > > > > >> >
> >> > > > > >> >
> >> > > > > >> > Regards,
> >> > > > > >> > Vinay Patil
> >> > > > > >> >
> >> > > > > >>
> >> > > > > >
> >> > > > >
> >> > > >
> >> > >
> >> >
> >>
> >
> >
>
