Re: Understanding parallelism in Storm

Navin Ipe Mon, 16 May 2016 09:25:58 -0700

Err...guys....I appreciate the ongoing discussion, but the original
question remains unanswered. The one I've asked at the very beginning of
this conversation. Some help would be appreciated.
Referring to the code I posted and as per Nathan's answer, you say that int
*BoltParallelism* actually represents the tasks which are the number of
instances of Bolts/Spouts? And BoltTaskParallelism is the number of
executors (OS threads)?
If that's the case, then execute() will get called only after the previous
execute() call of a Bolt has completed. And nextTuple() will get called
only after the previous nextTuple() of a Spout has completed. That's a bit
reassuring, since now one does not have to cater to multithreading within a
Spout/Bolt.
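
To check my understanding, here is a minimal sketch of that model in plain Java. It is a toy based on Matthias's description below, not Storm's API or internals: ExecutorSketch, Task and run are made-up names, and the round-robin assignment of tasks to executors is my own assumption. Each task has its own input queue, and each executor is a single thread that works through the buffered tuples of its own tasks one at a time.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentLinkedQueue;

class ExecutorSketch {
    /** Hypothetical stand-in for one bolt instance (a "task"). */
    static class Task {
        // Each task buffers its own incoming tuples.
        final ConcurrentLinkedQueue<String> inputQueue = new ConcurrentLinkedQueue<>();
        // Names of every thread that ever called execute() on this task.
        final Set<String> threadsSeen = ConcurrentHashMap.newKeySet();
        // Plain list: only the owning executor thread writes to it.
        final List<String> processed = new ArrayList<>();

        void execute(String tuple) {
            threadsSeen.add(Thread.currentThread().getName());
            processed.add(tuple);
        }
    }

    /** One executor = one thread that drains its own tasks, one tuple at a time. */
    static Thread executor(String name, List<Task> ownTasks) {
        return new Thread(() -> {
            for (Task task : ownTasks) {
                String tuple;
                while ((tuple = task.inputQueue.poll()) != null) {
                    task.execute(tuple); // the next call starts only after this one returns
                }
            }
        }, name);
    }

    static List<Task> run(int numTasks, int numExecutors, int tuplesPerTask) {
        List<Task> tasks = new ArrayList<>();
        for (int t = 0; t < numTasks; t++) {
            Task task = new Task();
            // Tuples wait in the task's queue until its executor picks the task up.
            for (int n = 0; n < tuplesPerTask; n++) {
                task.inputQueue.add("tuple-" + t + "-" + n);
            }
            tasks.add(task);
        }

        // Assumption: tasks are spread over executors round-robin.
        List<List<Task>> assignment = new ArrayList<>();
        for (int e = 0; e < numExecutors; e++) {
            assignment.add(new ArrayList<>());
        }
        for (int t = 0; t < numTasks; t++) {
            assignment.get(t % numExecutors).add(tasks.get(t));
        }

        List<Thread> threads = new ArrayList<>();
        for (int e = 0; e < numExecutors; e++) {
            Thread thread = executor("executor-" + e, assignment.get(e));
            threads.add(thread);
            thread.start();
        }
        try {
            for (Thread thread : threads) {
                thread.join(); // join() makes the executors' writes visible here
            }
        } catch (InterruptedException ex) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting for executors", ex);
        }
        return tasks;
    }

    public static void main(String[] args) {
        // 4 tasks shared by 2 executor threads, 5 buffered tuples per task.
        for (Task task : run(4, 2, 5)) {
            System.out.println(task.threadsSeen + " processed " + task.processed.size() + " tuples");
        }
    }
}
```

In this sketch every task only ever sees one thread, so execute() never has to guard its own state against concurrent calls, which is the property I was hoping for.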



On Mon, May 16, 2016 at 7:07 PM, Matthias J. Sax <[email protected]> wrote:

> Hi,
>
>
> So this is not correct:
> > and
> > the Bolt creates a task for processing each incoming Tuple.
>
> Storm creates exactly *BoltTaskParallelism* tasks and assigns incoming
> messages to tasks (according to the connection pattern in use -- shuffle,
> fieldsGrouping, etc.).
>
> Furthermore:
>
> > If there
> > are not enough tasks, then the excess Tuples are made to wait in a
> > queue of the executor.
>
> No. There is no such thing as "not enough tasks". Each task has its own
> input queue/buffer and tuples are stored there.
>
> The executor threads process one or multiple tasks. Thus, if a task is
> currently "on hold", new tuples are just added to the task's input
> queue. If an executor picks up one of its tasks for processing, the
> buffered tuples of the task are processed.
>
>
> -Matthias
>
> On 05/16/2016 09:07 AM, Adrien Carreira wrote:
> > +1
> >
> > 2016-05-16 6:40 GMT+02:00 Navin Ipe <[email protected]
> > <mailto:[email protected]>>:
> >
> >     Hi,
> >
> >     I've seen the explanations
> >     <http://www.michael-noll.com/blog/2012/10/16/understanding-the-parallelism-of-a-storm-topology/>,
> >     but none of them explain it in terms of what I see in the code. This
> >     is what I understood:
> >
> >     int BoltParallelism = 3;
> >     int BoltTaskParallelism = 2;
> >     builder.setBolt("bolt1", new BoltA(), BoltParallelism)
> >                     .setNumTasks(BoltTaskParallelism);
> >
> >     BoltParallelism creates 3 instances of BoltA. These are the
> >     executors.
> >     BoltTaskParallelism allows Tuples to come into BoltA very fast, and
> >     the Bolt creates a task for processing each incoming Tuple. If there
> >     are not enough tasks, then the excess Tuples are made to wait in a
> >     queue of the executor.
> >
> >     Strange thing is that the explanation says the tasks are run in a
> >     single thread, so obviously I misunderstood something. Could you
> >     help me understand it?
> >
> >     --
> >     Regards,
> >     Navin
> >
> >
>
>


-- 
Regards,
Navin
