Re: Understanding parallelism in Storm

Nathan Leung Mon, 16 May 2016 05:48:08 -0700

You will have to profile to find the difference.  As to multiple workers on
the same machine using the network to communicate, that is somewhat
correct.  They will use the networking stack, but the data should never hit
the physical network.


On Mon, May 16, 2016 at 8:45 AM, Adrien Carreira <[email protected]>
wrote:

> Thanks for your answers, guys.
>
> Just one thing that I don't understand.
>
> My topology is a web crawler (using storm-crawler). I have a fetcher bolt
> that spawns multiple threads to fetch URLs.
> And a bolt to parse the html
>
> I've set the parallelism_hint to 4 for the fetcher bolt.
> First test: 1 worker in the supervisor => I get a speed of 200 URLs/s on
> the fetcher bolt and 10 ms execution time on the parser.
> Second test: 4 workers in the supervisor => I get a speed of 400 URLs/s and
> 20 ms execution time on the parser.
>
> What could explain the difference? I have the same parallelism hint, so I
> don't understand why there is a difference between the two cases.
>
> Also, I've read that multiple workers on the same machine use the network to
> communicate...
>
>
>
> 2016-05-16 14:33 GMT+02:00 Nathan Leung <[email protected]>:
>
>> The number of tasks is the number of spout or bolt objects that get created;
>> each has its own distinct set of tuples that are emitted, need to be
>> acked, etc.  The number of executors is the number of OS threads
>> (potentially across more than 1 machine) that get created to service these
>> objects.  Usually there is 1 executor for each task, but you may want
>> to create more tasks than executors if you think you will want to rebalance
>> in the future.
>>
>> On Mon, May 16, 2016 at 7:45 AM, Serega Sheypak <[email protected]> wrote:
>>
>>> but when I set BoltParallelism, I noticed that there were 3 instances of
>>> BoltA created even though I did not specify any setNumTasks.
>>>
>>>
>>> https://storm.apache.org/releases/1.0.0/Understanding-the-parallelism-of-a-Storm-topology.html
>>>
>>>  *By default, the number of tasks is set to be the same as the number
>>> of executors, i.e. Storm will run one task per thread.*
>>>
>>> Did I understand your concern? Does the quote explain Storm's behaviour?
>>>
>>> 2016-05-16 10:06 GMT+02:00 Navin Ipe <[email protected]>:
>>>
>>>> @Matthew: That's the exact same article I've linked to in my first post
>>>> (on the word "explanation"). I couldn't understand it, which is why I asked
>>>> here.
>>>> Funny thing is, this answer on Stackoverflow
>>>> <http://stackoverflow.com/questions/27964271/storm-dynamically-increasing-the-executors>
>>>> explains tasks as though they are the BoltParallelism (from my code),
>>>> but when I set BoltParallelism, I noticed that there were 3 instances of
>>>> BoltA created even though I did not specify any setNumTasks.
>>>>
>>>> On Mon, May 16, 2016 at 1:18 PM, Matthew Lowe <[email protected]>
>>>> wrote:
>>>>
>>>>> I think this is a great article to read:
>>>>>
>>>>> http://www.michael-noll.com/blog/2012/10/16/understanding-the-parallelism-of-a-storm-topology/
>>>>>
>>>>> Best Regards
>>>>> Matthew Lowe
>>>>>
>>>>> On 16 May 2016, at 09:07, Adrien Carreira <[email protected]>
>>>>> wrote:
>>>>>
>>>>> +1
>>>>>
>>>>> 2016-05-16 6:40 GMT+02:00 Navin Ipe <[email protected]>:
>>>>>
>>>>>> Hi,
>>>>>>
>>>>>> I've seen the explanations
>>>>>> <http://www.michael-noll.com/blog/2012/10/16/understanding-the-parallelism-of-a-storm-topology/>,
>>>>>> but none of them explain it in terms of what I see in the code. This is
>>>>>> what I understood:
>>>>>>
>>>>>> int BoltParallelism = 3;
>>>>>> int BoltTaskParallelism = 2;
>>>>>> builder.setBolt("bolt1", new BoltA(), BoltParallelism)
>>>>>>                 .setNumTasks(BoltTaskParallelism);
>>>>>>
>>>>>> BoltParallelism creates 3 instances of BoltA. These are the executors.
>>>>>> BoltTaskParallelism allows Tuples to come into BoltA very fast, and
>>>>>> the Bolt creates a task for processing each incoming Tuple. If there are
>>>>>> not enough tasks, then the excess Tuples are made to wait in a queue of
>>>>>> the executor.
>>>>>>
>>>>>> Strange thing is that the explanation says the tasks are run in a
>>>>>> single thread, so obviously I misunderstood something. Could you help me
>>>>>> understand it?
>>>>>>
>>>>>> --
>>>>>> Regards,
>>>>>> Navin
>>>>>>
>>>>>
>>>>>
>>>>
>>>>
>>>> --
>>>> Regards,
>>>> Navin
>>>>
>>>
>>>
>>
>
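
To make the executor/task relationship discussed in the quoted messages concrete, here is a minimal sketch in plain Java. It is not Storm code and does not call any Storm API. The class name ParallelismSketch and the method tasksPerExecutor are hypothetical names made up for this illustration. It assumes that Storm never runs more executors than tasks and that tasks are spread across executors as evenly as possible; check both assumptions against the Storm version you run.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration only: models how tasks (bolt/spout instances)
// might be grouped onto executors (threads). Not part of the Storm API.
public class ParallelismSketch {

    // parallelismHint: the third argument passed to setBolt(...)
    // numTasks: the value passed to setNumTasks(...); when setNumTasks is
    //           not called, the quoted documentation says it defaults to
    //           the number of executors
    static List<Integer> tasksPerExecutor(int parallelismHint, int numTasks) {
        // Assumption: the number of executors is capped at the number of tasks.
        int executors = Math.min(parallelismHint, numTasks);
        List<Integer> counts = new ArrayList<>();
        for (int i = 0; i < executors; i++) {
            // Assumption: an even spread; the first (numTasks % executors)
            // executors each take one extra task.
            int extra = (i < numTasks % executors) ? 1 : 0;
            counts.add(numTasks / executors + extra);
        }
        return counts;
    }

    public static void main(String[] args) {
        // Hint 3, no setNumTasks: 3 tasks, one per executor.
        System.out.println(tasksPerExecutor(3, 3));
        // Hint 3, setNumTasks(2), as in the snippet from the first post.
        System.out.println(tasksPerExecutor(3, 2));
        // Hint 2, setNumTasks(6): extra tasks leave room to rebalance later.
        System.out.println(tasksPerExecutor(2, 6));
    }
}
```

Under these assumptions, each executor thread works through its own tasks one after another, which is why the article describes tasks as running in a single thread: adding tasks without adding executors does not add concurrency.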
