Re: Understanding parallelism in Storm

Adrien Carreira Mon, 16 May 2016 05:47:05 -0700

Thanks for your answers, guys.

There's just one thing that I don't understand.


My topology is a web crawler (using storm-crawler). I have a fetcher bolt
that spawns multiple threads to fetch URLs, and a bolt to parse the HTML.

I've set the parallelism_hint to 4 for the fetcher bolt.
First test: 1 worker in the supervisor => 200 URLs/s on the
fetcher bolt and 10 ms execution time on the parser.
Second test: 4 workers in the supervisor => 400 URLs/s and
20 ms execution time on the parser.

What could explain the difference? I have the same parallelism hint, so I
don't understand why there is a difference between the two cases.

Also, I've read that multiple workers on the same machine use the network to
communicate...



2016-05-16 14:33 GMT+02:00 Nathan Leung <[email protected]>:

> The number of tasks is the number of spout objects that get created, that
> each have their own distinct sets of tuples that are emitted, need to be
> acked, etc.  The number of executors is the number of OS threads
> (potentially across more than 1 machine) that get created to service these
> spout objects.  Usually there is 1 executor for each task, but you may want
> to create more tasks than executors if you think you will want to rebalance
> in the future.
>
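A minimal plain-Java sketch of the task/executor relationship described above. It assumes (1) the number of tasks is fixed when the topology is submitted, (2) the number of executors can be changed later but never exceeds the number of tasks, and (3) task ids are split into roughly equal contiguous ranges. `ExecutorModel` and `assign` are hypothetical names; this is not Storm's scheduler code, and Storm may order the ranges differently.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical model of how a component's fixed set of tasks is spread over
// its executors (threads). Not Storm's implementation.
final class ExecutorModel {

    // Splits task ids 0..numTasks-1 into at most numExecutors contiguous groups.
    // Each inner list is the set of tasks one executor thread would service.
    static List<List<Integer>> assign(int numTasks, int numExecutors) {
        if (numTasks < 1 || numExecutors < 1) {
            throw new IllegalArgumentException("tasks and executors must be >= 1");
        }
        // An executor with no task would have nothing to run, so cap the count.
        int executors = Math.min(numExecutors, numTasks);
        int base = numTasks / executors;   // minimum tasks per executor
        int extra = numTasks % executors;  // this many executors get one more
        List<List<Integer>> result = new ArrayList<>();
        int next = 0;
        for (int e = 0; e < executors; e++) {
            int size = base + (e < extra ? 1 : 0);
            List<Integer> group = new ArrayList<>();
            for (int i = 0; i < size; i++) {
                group.add(next++);
            }
            result.add(group);
        }
        return result;
    }

    public static void main(String[] args) {
        // 4 tasks created up front, initially served by 2 executors...
        System.out.println(assign(4, 2)); // [[0, 1], [2, 3]]
        // ...and after a rebalance to 4 executors, one task per thread.
        System.out.println(assign(4, 4)); // [[0], [1], [2], [3]]
        // Asking for 6 executors still yields only 4: there are only 4 tasks.
        System.out.println(assign(4, 6).size()); // 4
    }
}
```

On a real cluster the executor count would be changed with `storm rebalance <topology-name> -e <component>=<n>`, while the task count set by `setNumTasks` stays the same. That is why creating more tasks than executors up front leaves room to scale out later.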
> On Mon, May 16, 2016 at 7:45 AM, Serega Sheypak <[email protected]>
> wrote:
>
>> but when I set BoltParallelism, I noticed that there were 3 instances of
>> BoltA created even though I did not specify any setNumTasks.
>>
>>
>> https://storm.apache.org/releases/1.0.0/Understanding-the-parallelism-of-a-Storm-topology.html
>>
>>  *By default, the number of tasks is set to be the same as the number of
>> executors, i.e. Storm will run one task per thread.*
>>
>> Did I understand your concern? Does the quote explain Storm's behaviour?
>>
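The defaulting rule in the quote can be written down as a tiny model. `ParallelismModel` is a hypothetical class, not part of Storm's API; it assumes only the two documented rules that the task count defaults to the parallelism hint when `setNumTasks` is not called, and that a component never gets more executor threads than tasks.

```java
// Hypothetical stand-in for the two numbers Storm keeps per component; a model
// of the documented defaults, not Storm's TopologyBuilder.
final class ParallelismModel {
    final int parallelismHint; // third argument of setBolt/setSpout
    final Integer numTasks;    // value passed to setNumTasks, or null if never called

    ParallelismModel(int parallelismHint, Integer numTasks) {
        this.parallelismHint = parallelismHint;
        this.numTasks = numTasks;
    }

    // Tasks (bolt/spout object instances) default to the parallelism hint.
    int tasks() {
        return numTasks != null ? numTasks : parallelismHint;
    }

    // Executors (threads) follow the hint, but never outnumber the tasks.
    int executors() {
        return Math.min(parallelismHint, tasks());
    }

    public static void main(String[] args) {
        ParallelismModel noSetNumTasks = new ParallelismModel(3, null);
        System.out.println(noSetNumTasks.tasks() + " tasks, "
                + noSetNumTasks.executors() + " executors"); // 3 tasks, 3 executors
        ParallelismModel moreTasks = new ParallelismModel(2, 4);
        System.out.println(moreTasks.tasks() + " tasks, "
                + moreTasks.executors() + " executors"); // 4 tasks, 2 executors
    }
}
```

With `setBolt("bolt1", new BoltA(), 3)` and no `setNumTasks`, the model gives 3 tasks, which matches the 3 `BoltA` instances observed in this thread.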
>> 2016-05-16 10:06 GMT+02:00 Navin Ipe <[email protected]>:
>>
>>> @Matthew: That's the exact same article I've linked to in my first post
>>> (on the word "explanation"). I couldn't understand it, which is why I asked
>>> here.
>>> Funny thing is, this answer on Stack Overflow
>>> <http://stackoverflow.com/questions/27964271/storm-dynamically-increasing-the-executors>
>>> explains tasks as though they are the BoltParallelism (from my code),
>>> but when I set BoltParallelism, I noticed that there were 3 instances of
>>> BoltA created even though I did not specify any setNumTasks.
>>>
>>> On Mon, May 16, 2016 at 1:18 PM, Matthew Lowe <[email protected]>
>>> wrote:
>>>
>>>> I think this is a great article to read:
>>>>
>>>> http://www.michael-noll.com/blog/2012/10/16/understanding-the-parallelism-of-a-storm-topology/
>>>>
>>>> Best Regards
>>>> Matthew Lowe
>>>>
>>>> On 16 May 2016, at 09:07, Adrien Carreira <[email protected]> wrote:
>>>>
>>>> +1
>>>>
>>>> 2016-05-16 6:40 GMT+02:00 Navin Ipe <[email protected]>:
>>>>
>>>>> Hi,
>>>>>
>>>>> I've seen the explanations
>>>>> <http://www.michael-noll.com/blog/2012/10/16/understanding-the-parallelism-of-a-storm-topology/>,
>>>>> but none of them explain it in terms of what I see in the code. This is
>>>>> what I understood:
>>>>>
>>>>> int BoltParallelism = 3;
>>>>> int BoltTaskParallelism = 2;
>>>>> builder.setBolt("bolt1", new BoltA(), BoltParallelism)
>>>>>                 .setNumTasks(BoltTaskParallelism);
>>>>>
>>>>> BoltParallelism creates 3 instances of BoltA. These are the executors.
>>>>> BoltTaskParallelism allows Tuples to come into BoltA very fast, and
>>>>> the Bolt creates a task for processing each incoming Tuple. If there are
>>>>> not enough tasks, then the excess Tuples are made to wait in a queue of
>>>>> the executor.
>>>>>
>>>>> The strange thing is that the explanation says the tasks are run in a
>>>>> single thread, so obviously I misunderstood something. Could you help me
>>>>> understand it?
>>>>>
>>>>> --
>>>>> Regards,
>>>>> Navin
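The point about several tasks sharing one thread can be illustrated with a simplified plain-Java model of a single executor. Everything here (`ExecutorThreadModel`, `CountingBolt`, `Tuple`) is hypothetical and far simpler than Storm's internals; it assumes only that an executor is one thread with an incoming queue, and that its tasks are bolt objects created once and then called one tuple at a time. In this model no task is created per tuple, and tuples that arrive faster than the thread can process them simply wait in the queue.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

// Hypothetical model of one executor: a single thread that owns several task
// objects and processes queued tuples one at a time. Not Storm's real classes.
final class ExecutorThreadModel {

    // Stand-in for a bolt instance; each task is one such object.
    static final class CountingBolt {
        final int taskId;
        final List<String> seen = new ArrayList<>();

        CountingBolt(int taskId) { this.taskId = taskId; }

        void execute(String value) {
            // Runs on the executor thread; a task has no thread of its own.
            seen.add(value + "@" + Thread.currentThread().getName());
        }
    }

    // A queued tuple addressed to one task of this executor.
    static final class Tuple {
        final int targetTask;
        final String value;

        Tuple(int targetTask, String value) {
            this.targetTask = targetTask;
            this.value = value;
        }
    }

    static List<CountingBolt> run(List<Tuple> incoming, int numTasks) {
        List<CountingBolt> tasks = new ArrayList<>();
        for (int i = 0; i < numTasks; i++) {
            tasks.add(new CountingBolt(i)); // created once, not per tuple
        }
        // The executor's incoming queue; waiting tuples sit here.
        BlockingQueue<Tuple> queue = new LinkedBlockingQueue<>(incoming);
        Thread executor = new Thread(() -> {
            Tuple t;
            // One thread drains the queue and dispatches to the target task.
            while ((t = queue.poll()) != null) {
                tasks.get(t.targetTask).execute(t.value);
            }
        }, "executor-0");
        executor.start();
        try {
            executor.join(); // also makes the tasks' state visible to the caller
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting for executor", e);
        }
        return tasks;
    }

    public static void main(String[] args) {
        List<Tuple> in = List.of(new Tuple(0, "a"), new Tuple(1, "b"), new Tuple(0, "c"));
        for (CountingBolt b : run(in, 2)) {
            System.out.println("task " + b.taskId + " -> " + b.seen);
        }
    }
}
```

Running `main` should show the tuples of both tasks stamped with the same thread name, `executor-0`.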
>>>>>
>>>>
>>>>
>>>
>>>
>>> --
>>> Regards,
>>> Navin
>>>
>>
>>
>
