I would reread Michael Noll's blog post. In my opinion it's pretty clear. With regards to BoltParallelism and BoltTaskParallelism, you have it backwards: your BoltTaskParallelism is the # of tasks and BoltParallelism is the # of executors. This is made clear in the section "Configuring the parallelism of a topology" of the blog post.
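As a rough illustration of that mapping (this is plain Java, not Storm's actual scheduler code; the class and method names below are hypothetical), a sketch of how a fixed set of tasks is spread round-robin across a smaller number of executors:

```java
import java.util.ArrayList;
import java.util.List;

public class TaskAssignment {
    // In Storm terms (assumed mapping, per the blog post):
    //   setBolt("bolt1", new BoltA(), numExecutors)  -> parallelism hint = executors (threads)
    //   .setNumTasks(numTasks)                       -> tasks (bolt instances)
    static List<List<Integer>> assign(int numExecutors, int numTasks) {
        List<List<Integer>> executors = new ArrayList<>();
        for (int e = 0; e < numExecutors; e++) {
            executors.add(new ArrayList<>());
        }
        for (int t = 0; t < numTasks; t++) {
            // round-robin task ids onto executors
            executors.get(t % numExecutors).add(t);
        }
        return executors;
    }

    public static void main(String[] args) {
        // 2 executors serving 4 tasks: each executor thread owns 2 task instances
        System.out.println(assign(2, 4)); // [[0, 2], [1, 3]]
    }
}
```

The point of the sketch is only that executors and tasks are separate knobs: the executor count fixes how many threads run, while the task count fixes how many bolt instances those threads share.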
On Mon, May 16, 2016 at 12:24 PM, Navin Ipe <[email protected]> wrote:

> Err... guys... I appreciate the ongoing discussion, but the original
> question remains unanswered -- the one I asked at the very beginning of
> this conversation. Some help would be appreciated.
> Referring to the code I posted and as per Nathan's answer, you say that
> int *BoltParallelism* actually represents the tasks, which are the number
> of instances of Bolts/Spouts? And BoltTaskParallelism is the number of
> executors (OS threads)?
> If that's the case, then execute() will get called only after the previous
> execute() call of a Bolt has completed. And nextTuple() will get called
> only after the previous nextTuple() of a Spout has completed. That's a bit
> reassuring, since now one does not have to cater to multithreading within
> a Spout/Bolt.
>
>
> On Mon, May 16, 2016 at 7:07 PM, Matthias J. Sax <[email protected]> wrote:
>
>> Hi,
>>
>> So this is not correct:
>> > and the Bolt creates a task for processing each incoming Tuple.
>>
>> Storm creates exactly *BoltTaskParallelism* tasks and assigns incoming
>> messages to tasks (according to the connection pattern used -- shuffle,
>> fieldsGrouping, etc.).
>>
>> Furthermore:
>> > If there are not enough tasks, then the excess Tuples are made to
>> > wait in a queue of the executor.
>>
>> No. There is no such thing as "not enough tasks". Each task has its own
>> input queue/buffer, and tuples are stored there.
>>
>> The executor threads process one or multiple tasks. Thus, if a task is
>> currently "on hold", new tuples are just added to the task's input
>> queue. If an executor picks up one of its tasks for processing, the
>> buffered tuples of the task are processed.
>>
>> -Matthias
>>
>> On 05/16/2016 09:07 AM, Adrien Carreira wrote:
>> > +1
>> >
>> > 2016-05-16 6:40 GMT+02:00 Navin Ipe <[email protected]>:
>> >
>> >     Hi,
>> >
>> >     I've seen the explanations
>> >     <http://www.michael-noll.com/blog/2012/10/16/understanding-the-parallelism-of-a-storm-topology/>,
>> >     but none of them explain it in terms of what I see in the code.
>> >     This is what I understood:
>> >
>> >     int BoltParallelism = 3;
>> >     int BoltTaskParallelism = 2;
>> >     builder.setBolt("bolt1", new BoltA(), *BoltParallelism*)
>> >            .setNumTasks(*BoltTaskParallelism*)
>> >
>> >     BoltParallelism creates 3 instances of BoltA. These are the
>> >     executors. BoltTaskParallelism allows Tuples to come into BoltA
>> >     very fast, and the Bolt creates a task for processing each
>> >     incoming Tuple. If there are not enough tasks, then the excess
>> >     Tuples are made to wait in a queue of the executor.
>> >
>> >     Strange thing is that the explanation says the tasks are run in a
>> >     single thread, so obviously I misunderstood something. Could you
>> >     help me understand it?
>> >
>> >     --
>> >     Regards,
>> >     Navin

--
Regards,
Navin
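The task/queue model Matthias describes in the quoted mail can be sketched in plain Java (this is a conceptual simulation, not Storm internals; `ExecutorSim`, `Task`, and `run` are made-up names): each task owns its own input queue, tuples are routed into those queues even while a task is "on hold", and a single executor thread then drains its tasks one at a time, so execute() calls within a task are never concurrent.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.List;
import java.util.Queue;

public class ExecutorSim {
    // One "task": its own input buffer plus the bolt-instance state it owns.
    static class Task {
        final Queue<String> input = new ArrayDeque<>();
        final List<String> processed = new ArrayList<>();
        void execute(String tuple) { processed.add(tuple); } // never called concurrently
    }

    static List<Task> run(String[] tuples, int numTasks) {
        List<Task> tasks = new ArrayList<>();
        for (int t = 0; t < numTasks; t++) {
            tasks.add(new Task());
        }

        // Incoming tuples are routed to a task's queue (here: simple modulo
        // "grouping"), buffering there even while the task is not running.
        for (int i = 0; i < tuples.length; i++) {
            tasks.get(i % numTasks).input.add(tuples[i]);
        }

        // A single executor thread picks up each of its tasks in turn and
        // drains the buffered tuples -- all execute() calls are sequential.
        for (Task task : tasks) {
            String tuple;
            while ((tuple = task.input.poll()) != null) {
                task.execute(tuple);
            }
        }
        return tasks;
    }

    public static void main(String[] args) {
        List<Task> tasks = run(new String[] {"t0", "t1", "t2", "t3"}, 2);
        System.out.println(tasks.get(0).processed); // [t0, t2]
        System.out.println(tasks.get(1).processed); // [t1, t3]
    }
}
```

Note there is no "not enough tasks" condition anywhere: every tuple lands in some task's queue, and backlog simply accumulates there until the executor serving that task gets to it.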
