Hi!

With shuffle grouping, you cannot assume that particular executors will
continuously receive the tuples that point to a hotspot region, because
tuples are distributed randomly.
The worst case is that the hotspot-region tuples all go to a few specific
executors. Since tuples are shuffled this can happen occasionally, but it
will not happen all the time.
If you use fields grouping on the rowkey, this issue can arise. In that
case, partial key grouping, which is planned to be introduced in 0.10.0,
may help you (or not).
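
To make the difference concrete, here is a rough simulation. It is plain
Python, not Storm code, and everything in it is an assumption for
illustration: the executor count, the "hot-row" key, the CRC32-based
hashing, and the simplified routing rules (random pick for shuffle, one
hash for fields grouping, the less loaded of two hashes for partial key
grouping). It only counts how many hot-row tuples each executor would see.

```python
import random
from collections import Counter
from zlib import crc32

NUM_EXECUTORS = 8  # assumed parallelism of the HBase bolt


def make_stream(n, hot_fraction, rng):
    """Build n row keys; roughly hot_fraction of them are the key 'hot-row'."""
    return [
        "hot-row" if rng.random() < hot_fraction else f"row-{rng.randrange(10_000)}"
        for _ in range(n)
    ]


def shuffle_grouping(keys, rng):
    """Simplified shuffle: pick a random executor, ignoring the key."""
    return [rng.randrange(NUM_EXECUTORS) for _ in keys]


def fields_grouping(keys):
    """Simplified fields grouping: the same key always hashes to the same executor."""
    return [crc32(k.encode()) % NUM_EXECUTORS for k in keys]


def partial_key_grouping(keys):
    """Simplified partial key grouping: two candidate executors per key,
    each tuple goes to whichever candidate has received fewer tuples so far."""
    load = [0] * NUM_EXECUTORS
    targets = []
    for k in keys:
        a = crc32(k.encode()) % NUM_EXECUTORS
        b = crc32(b"salt:" + k.encode()) % NUM_EXECUTORS
        target = a if load[a] <= load[b] else b
        load[target] += 1
        targets.append(target)
    return targets


def hot_tuples_per_executor(keys, targets):
    """Count how many 'hot-row' tuples landed on each executor."""
    counts = Counter(t for k, t in zip(keys, targets) if k == "hot-row")
    return [counts.get(i, 0) for i in range(NUM_EXECUTORS)]


if __name__ == "__main__":
    rng = random.Random(42)
    keys = make_stream(20_000, 0.2, rng)
    print("shuffle    :", hot_tuples_per_executor(keys, shuffle_grouping(keys, rng)))
    print("fields     :", hot_tuples_per_executor(keys, fields_grouping(keys)))
    print("partial key:", hot_tuples_per_executor(keys, partial_key_grouping(keys)))
```

In this model the shuffle row spreads the hot tuples over all executors,
the fields row puts all of them on a single executor, and the partial key
row splits them over at most two.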

Hope this helps.

Best,
Jungtaek Lim (HeartSaVioR)




2015-06-14 1:38 GMT+09:00 Banias H <[email protected]>:

> I have a topology in which one of the bolts reads from HBase. That bolt is
> set up to have one task per executor, and it gets tuples via shuffle
> grouping, so every executor of the bolt will have the same number of tuples.
>
> The problem is that some executors take longer than others (because
> of hot-spotting in HBase region servers). For example, 5% of the executors
> have a latency of 100ms while the remaining 95% have around 25ms. Now, with
> the guarantee of an equal number of tuples per executor, my understanding
> is that the 95% of fast executors will have to wait, which brings down the
> throughput. Please correct me if I misunderstand.
>
> Ideally, I would love to have a load-balance-biased shuffle grouping so
> that the 95% of the fast executors would get more tuples.
>
> Is this something I can implement by leveraging other existing groupings
> or patterns? In an earlier post entitled "long running bolts", Mike Thomsen
> and Nathan Leung discussed a nice idea of taking long running tuples
> elsewhere (paraphrase below):
>
> "*create tasks for the bolt using the same class but different name in
> the topology... route long running bolts (without acks) to the separate
> instances and they will not affect your normal processing*"
>
> Since my long running executors are not taking that long (just around
> 100ms), this may not be worth the effort to take the tuples elsewhere.
>
> I would appreciate any comment and suggestions. Many thanks.
>
> BH
>



-- 
Name : 임 정택
Blog : http://www.heartsavior.net / http://dev.heartsavior.net
Twitter : http://twitter.com/heartsavior
LinkedIn : http://www.linkedin.com/in/heartsavior
