I would reread Michael Noll's blog post. In my opinion it's pretty clear. With regards to BoltParallelism and BoltTaskParallelism, you have it backwards: your BoltTaskParallelism is the # of tasks and BoltParallelism is the # of executors. This is made clear in the section "Configuring the parallelism of a topology" of the blog post.
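As a rough illustration of that mapping (this is plain Java, not Storm's actual scheduler code; the class and method names below are hypothetical), a sketch of how a fixed set of tasks is spread round-robin across a smaller number of executors:

```java
import java.util.ArrayList;
import java.util.List;

public class TaskAssignment {
    // In Storm terms (assumed mapping, per the blog post):
    //   setBolt("bolt1", new BoltA(), numExecutors)  -> parallelism hint = executors (threads)
    //   .setNumTasks(numTasks)                       -> tasks (bolt instances)
    static List<List<Integer>> assign(int numExecutors, int numTasks) {
        List<List<Integer>> executors = new ArrayList<>();
        for (int e = 0; e < numExecutors; e++) {
            executors.add(new ArrayList<>());
        }
        for (int t = 0; t < numTasks; t++) {
            // round-robin task ids onto executors
            executors.get(t % numExecutors).add(t);
        }
        return executors;
    }

    public static void main(String[] args) {
        // 2 executors serving 4 tasks: each executor thread owns 2 task instances
        System.out.println(assign(2, 4)); // [[0, 2], [1, 3]]
    }
}
```

The point of the sketch is only that executors and tasks are separate knobs: the executor count fixes how many threads run, while the task count fixes how many bolt instances those threads share.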
On Mon, May 16, 2016 at 12:24 PM, Navin Ipe <[email protected]> wrote:

> Err... guys... I appreciate the ongoing discussion, but the original
> question remains unanswered -- the one I asked at the very beginning of
> this conversation. Some help would be appreciated.
> Referring to the code I posted and as per Nathan's answer, you say that
> int *BoltParallelism* actually represents the tasks, which are the number
> of instances of Bolts/Spouts? And BoltTaskParallelism is the number of
> executors (OS threads)?
> If that's the case, then execute() will get called only after the previous
> execute() call of a Bolt has completed. And nextTuple() will get called
> only after the previous nextTuple() of a Spout has completed. That's a bit
> reassuring, since now one does not have to cater to multithreading within
> a Spout/Bolt.
>
>
> On Mon, May 16, 2016 at 7:07 PM, Matthias J. Sax <[email protected]> wrote:
>
>> Hi,
>>
>> So this is not correct:
>> > and the Bolt creates a task for processing each incoming Tuple.
>>
>> Storm creates exactly *BoltTaskParallelism* tasks and assigns incoming
>> messages to tasks (according to the connection pattern used -- shuffle,
>> fieldsGrouping, etc.).
>>
>> Furthermore:
>> > If there are not enough tasks, then the excess Tuples are made to
>> > wait in a queue of the executor.
>>
>> No. There is no such thing as "not enough tasks". Each task has its own
>> input queue/buffer, and tuples are stored there.
>>
>> The executor threads process one or multiple tasks. Thus, if a task is
>> currently "on hold", new tuples are just added to the task's input
>> queue. If an executor picks up one of its tasks for processing, the
>> buffered tuples of the task are processed.
>>
>> -Matthias
>>
>> On 05/16/2016 09:07 AM, Adrien Carreira wrote:
>> > +1
>> >
>> > 2016-05-16 6:40 GMT+02:00 Navin Ipe <[email protected]>:
>> >
>> >     Hi,
>> >
>> >     I've seen the explanations
>> >     <http://www.michael-noll.com/blog/2012/10/16/understanding-the-parallelism-of-a-storm-topology/>,
>> >     but none of them explain it in terms of what I see in the code.
>> >     This is what I understood:
>> >
>> >     int BoltParallelism = 3;
>> >     int BoltTaskParallelism = 2;
>> >     builder.setBolt("bolt1", new BoltA(), *BoltParallelism*)
>> >            .setNumTasks(*BoltTaskParallelism*)
>> >
>> >     BoltParallelism creates 3 instances of BoltA. These are the
>> >     executors. BoltTaskParallelism allows Tuples to come into BoltA
>> >     very fast, and the Bolt creates a task for processing each
>> >     incoming Tuple. If there are not enough tasks, then the excess
>> >     Tuples are made to wait in a queue of the executor.
>> >
>> >     Strange thing is that the explanation says the tasks are run in a
>> >     single thread, so obviously I misunderstood something. Could you
>> >     help me understand it?
>> >
>> >     --
>> >     Regards,
>> >     Navin

--
Regards,
Navin
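The task/queue model Matthias describes in the quoted mail can be sketched in plain Java (this is a conceptual simulation, not Storm internals; `ExecutorSim`, `Task`, and `run` are made-up names): each task owns its own input queue, tuples are routed into those queues even while a task is "on hold", and a single executor thread then drains its tasks one at a time, so execute() calls within a task are never concurrent.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.List;
import java.util.Queue;

public class ExecutorSim {
    // One "task": its own input buffer plus the bolt-instance state it owns.
    static class Task {
        final Queue<String> input = new ArrayDeque<>();
        final List<String> processed = new ArrayList<>();
        void execute(String tuple) { processed.add(tuple); } // never called concurrently
    }

    static List<Task> run(String[] tuples, int numTasks) {
        List<Task> tasks = new ArrayList<>();
        for (int t = 0; t < numTasks; t++) {
            tasks.add(new Task());
        }

        // Incoming tuples are routed to a task's queue (here: simple modulo
        // "grouping"), buffering there even while the task is not running.
        for (int i = 0; i < tuples.length; i++) {
            tasks.get(i % numTasks).input.add(tuples[i]);
        }

        // A single executor thread picks up each of its tasks in turn and
        // drains the buffered tuples -- all execute() calls are sequential.
        for (Task task : tasks) {
            String tuple;
            while ((tuple = task.input.poll()) != null) {
                task.execute(tuple);
            }
        }
        return tasks;
    }

    public static void main(String[] args) {
        List<Task> tasks = run(new String[] {"t0", "t1", "t2", "t3"}, 2);
        System.out.println(tasks.get(0).processed); // [t0, t2]
        System.out.println(tasks.get(1).processed); // [t1, t3]
    }
}
```

Note there is no "not enough tasks" condition anywhere: every tuple lands in some task's queue, and backlog simply accumulates there until the executor serving that task gets to it.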
