Thanks for your answers, guys. There is just one thing that I don't understand.
My topology is a web crawler (using storm-crawler). I have a fetcher bolt that spawns multiple threads to fetch URLs, and a bolt that parses the HTML.

I've set the parallelism_hint to 4 for the fetcher bolt.

First test: 1 worker in the supervisor => I get a speed of 200 URLs/s on the fetcher bolt and a 10 ms execution time on the parser.
Second test: 4 workers in the supervisor => I get a speed of 400 URLs/s and a 20 ms execution time on the parser.

What could explain the difference? The parallelism hint is the same in both cases, so I don't understand why the results differ.

Also, I've read that multiple workers on the same machine use the network to communicate...

2016-05-16 14:33 GMT+02:00 Nathan Leung <[email protected]>:

> The number of tasks is the number of spout objects that get created, that
> each have their own distinct sets of tuples that are emitted, need to be
> acked, etc. The number of executors is the number of OS threads
> (potentially across more than 1 machine) that get created to service these
> spout objects. Usually there is 1 executor for each task, but you may want
> to create more tasks than executors if you think you will want to rebalance
> in the future.
>
> On Mon, May 16, 2016 at 7:45 AM, Serega Sheypak <[email protected]>
> wrote:
>
>> but when I set BoltParallelism, I noticed that there were 3 instances of
>> BoltA created even though I did not specify any setNumTasks.
>>
>>
>> https://storm.apache.org/releases/1.0.0/Understanding-the-parallelism-of-a-Storm-topology.html
>>
>> *By default, the number of tasks is set to be the same as the number of
>> executors, i.e. Storm will run one task per thread.*
>>
>> Did I understand your concern? Does quote explain storm behaviour?
>>
>> 2016-05-16 10:06 GMT+02:00 Navin Ipe <[email protected]>:
>>
>>> @Matthew: That's the exact same article I've linked to in my first post
>>> (on the word "explanation"). I couldn't understand it, which is why I asked
>>> here.
>>> Funny thing is, this answer on Stackoverflow
>>> <http://stackoverflow.com/questions/27964271/storm-dynamically-increasing-the-executors>
>>> explains tasks like as though they are the BoltParallelism (from my code),
>>> but when I set BoltParallelism, I noticed that there were 3 instances of
>>> BoltA created even though I did not specify any setNumTasks.
>>>
>>> On Mon, May 16, 2016 at 1:18 PM, Matthew Lowe <[email protected]>
>>> wrote:
>>>
>>>> I think this is a great article to read:
>>>>
>>>> http://www.michael-noll.com/blog/2012/10/16/understanding-the-parallelism-of-a-storm-topology/
>>>>
>>>> Best Regards
>>>> Matthew Lowe
>>>>
>>>> On 16 May 2016, at 09:07, Adrien Carreira <[email protected]> wrote:
>>>>
>>>> +1
>>>>
>>>> 2016-05-16 6:40 GMT+02:00 Navin Ipe <[email protected]>:
>>>>
>>>>> Hi,
>>>>>
>>>>> I've seen the explanations
>>>>> <http://www.michael-noll.com/blog/2012/10/16/understanding-the-parallelism-of-a-storm-topology/>,
>>>>> but none of them explain it in terms of what I see in the code. This is
>>>>> what I understood:
>>>>>
>>>>> int BoltParallelism = 3;
>>>>> int BoltTaskParallelism = 2;
>>>>> builder.setBolt("bolt1", new BoltA(), BoltParallelism)
>>>>>        .setNumTasks(BoltTaskParallelism);
>>>>>
>>>>> BoltParallelism creates 3 instances of BoltA. These are the executors.
>>>>> BoltTaskParallelism allows Tuples to come into BoltA very fast, and
>>>>> the Bolt creates a task for processing each incoming Tuple. If there are
>>>>> not enough tasks, then the excess Tuples are made to wait in a queue of
>>>>> the executor.
>>>>>
>>>>> Strange thing is that the explanation says the tasks are run in a
>>>>> single thread, so obviously I misunderstood something. Could you help me
>>>>> understand it?
>>>>>
>>>>> --
>>>>> Regards,
>>>>> Navin
>>>>>
>>>>
>>>>
>>>
>>>
>>> --
>>> Regards,
>>> Navin
>>>
>>
>>
>
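
To make the task/executor relationship quoted above concrete, here is a minimal sketch in plain Java. It does not call Storm at all: the class TaskLayoutSketch and its methods are hypothetical names of mine, the round-robin placement is only for illustration (not necessarily how Storm orders tasks), and capping the thread count at the task count is my assumption, based on the linked article's rule that the number of threads cannot exceed the number of tasks.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration only; none of this is Storm API.
public class TaskLayoutSketch {

    // Per the quoted docs: when setNumTasks is not called,
    // the number of tasks equals the number of executors.
    public static int effectiveTasks(int parallelismHint, Integer numTasks) {
        return numTasks == null ? parallelismHint : numTasks;
    }

    // Returns, for each executor (thread), the list of task ids it serves.
    public static List<List<Integer>> assign(int executors, int tasks) {
        // Assumption: there are never more executors than tasks.
        int threads = Math.min(executors, tasks);
        List<List<Integer>> layout = new ArrayList<>();
        for (int e = 0; e < threads; e++) {
            layout.add(new ArrayList<>());
        }
        // Round-robin placement, chosen here purely for illustration.
        for (int t = 0; t < tasks; t++) {
            layout.get(t % threads).add(t);
        }
        return layout;
    }

    public static void main(String[] args) {
        // Hint 3, no setNumTasks: three bolt instances, one per thread.
        System.out.println(assign(3, effectiveTasks(3, null)));
        // Hint 2, setNumTasks(4): each thread serves two bolt instances in turn.
        System.out.println(assign(2, effectiveTasks(2, 4)));
    }
}
```

Under these assumptions the first call gives one task per executor, which matches the 3 instances of BoltA seen without setNumTasks, and the second gives two tasks per executor, which is the "more tasks than executors" setup mentioned as useful for a later rebalance.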
