This is the core function:
https://github.com/apache/storm/blob/master/storm-core/src/jvm/org/apache/storm/grouping/LoadAwareShuffleGrouping.java
public List<Integer> chooseTasks(int taskId, List<Object> values,
LoadMapping load) {
   if ((lastUpdate + 1000) < System.currentTimeMillis()) {
     int local_total = 0;
     for (int i = 0; i < targets.length; i++) {
       int val = (int)(101 - (load.get(targets[i]) * 100));
       loads[i] = val;
       local_total += val;
     }
     total = local_total;
     lastUpdate = System.currentTimeMillis();
     }
     int selected = random.nextInt(total);
     int sum = 0;
     for (int i = 0; i < targets.length; i++) {
       sum += loads[i];
       if (selected < sum) {
         return rets[i];
       }
    }
    return rets[rets.length-1];
}

My question: If we have to select the target task with min/max load, why
are using a random value from the total load and comparing all target tasks
to it?

Best regards,
Yamini Joshi

On Tue, Apr 26, 2016 at 11:04 AM, Rodrigo Valladares <
[email protected]> wrote:

> I was doing some research on this topic. I checked the code a while back
> and from what I understood  this grouping balances the tuples according to
> the current queue of each executor.
>
> For example Bolt A have 4 executors 2 with a queue at 50% capacity and 2
> at 40% capacity. The distribution would be proportional to "left capacity
> on the queue" (1 - capacity) so for executor 1 and 2 it would be:
> 0.5/(0.5*2 +0.6*2) .
>
> I think it was built taking heterogeneous clusters in consideration. In
> cases where some machines are faster than others shuffle grouping perform
> very poorly since it does not take into consideration that executors have
> different latencies. You might expect that slower machine will built a
> queue faster on its executors so this balancing would address this
> limitation. I did not develop or tested this so I can't assure you it will
> work like this, but I think this the idea behind it.
>
> 2016-04-26 10:46 GMT-05:00 Tom Brown <[email protected]>:
>
>> Hi Yamini,
>>
>> Your question piqued my interest, so I decided to see what I could figure
>> out. After reading this JIRA
>> https://issues.apache.org/jira/browse/STORM-162, I am still confused.
>> Hopefully someone else on the list will be able to help us.
>>
>> Is LoadAwareShuffleGrouping a way of changing a basic shuffle operation
>> (e.g. randomized round robin) to include downstream load as an input? Or is
>> it an extension of a fields grouping that could allow the same tuple to be
>> sent to different downstream tasks (A or B) depending on which has the
>> higher load?
>>
>> --Tom
>>
>> On Tue, Apr 26, 2016 at 8:53 AM, Yamini Joshi <[email protected]>
>> wrote:
>>
>>> Hello everyone!
>>>
>>> I am new to Storm and I was looking into different stream grouping when
>>> I came across LoadAwareShuffleGrouping. Can someone tell me what it is
>>> exactly? Has it been included in the latest storm build? All my google
>>> searches on this point to JIRA tickets.
>>>
>>> Any help is appreciated.
>>>
>>> Thank you.
>>>
>>>
>>>
>>>
>>
>
>
> --
> Rodrigo Valladares Cotta
> Master's Student, Computer Science
> University of Nebraska-Lincoln
>

Reply via email to