Hi,

I got a chance to re-run this query today and it does auto-reduce the new
CUSTOM_EDGE join as well.

However, it does so too early, before it has gathered enough information
about both sides of the join.

TPC-DS Query 79 at 1 TB scale generates a CUSTOM_EDGE between the ms alias
and the customer table.

They are 13.7 million and 12 million rows respectively; the implementation
is not causing any trouble right now because the two sides are roughly
equal.

This might be a feature I disable for now, since the ShuffleVertexManager
has no idea which side is destined for the hashtable build side and might
squeeze the auto-reducer tighter than it should, causing a possible OOM.
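For context, the auto-reduce decision boils down to "extrapolate the total
shuffle size from the source tasks completed so far, then shrink the
reducer count to hit a target bytes-per-reducer". A minimal sketch of that
logic, with illustrative names and defaults (`desired_bytes_per_task`,
`min_fraction` are assumptions here, not Tez's actual API or config keys):

```python
import math

def auto_reduce_parallelism(observed_output_bytes, completed_source_tasks,
                            total_source_tasks, base_parallelism,
                            desired_bytes_per_task=256 * 2**20,
                            min_fraction=0.25):
    """Illustrative ShuffleVertexManager-style auto-reduce decision.

    Extrapolates the total shuffle size from the tasks completed so far,
    then shrinks the reducer count so each reducer receives roughly
    desired_bytes_per_task. Names and defaults are assumptions for this
    sketch, not Tez's real interface.
    """
    if completed_source_tasks < min_fraction * total_source_tasks:
        # Defer scheduling: not enough information yet.
        return base_parallelism
    # Extrapolate the observed bytes to the full source vertex.
    expected_total = (observed_output_bytes * total_source_tasks
                      / completed_source_tasks)
    new_parallelism = max(1, math.ceil(expected_total / desired_bytes_per_task))
    # Only ever reduce parallelism, never increase it.
    return min(base_parallelism, new_parallelism)
```

For example, observing 512 MiB from 50 of 100 source tasks extrapolates to
1 GiB total, collapsing 1009 reducers down to 4. The OOM worry above
follows directly: if this shrink is applied to the hashtable build side of
a hash join, each reducer's hashtable grows by the same factor.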

Plus, the size the system sees is actually the shuffle size, so it is
post-compression and varies with the data's compressibility (for instance,
clickstreams compress really well due to the large number of common
strings).

A pre-requisite for the last problem would be to fix
<https://issues.apache.org/jira/browse/TEZ-2962>,
which tracks pre-compression sizes.
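To make the compression problem concrete: because the manager only sees
post-compression bytes, two datasets of the same logical size can get very
different reducer counts. A hypothetical sketch (the 8x and 2x ratios are
made-up for illustration, not measured):

```python
import math

def reducers_for(shuffle_bytes, desired_bytes_per_task=256 * 2**20):
    """Reducer count derived from the observed (post-compression) size."""
    return max(1, math.ceil(shuffle_bytes / desired_bytes_per_task))

# Hypothetical: 100 GiB of logically identical intermediate data.
uncompressed = 100 * 2**30

# Assumed ratios: clickstream-like data ~8x, generic data ~2x.
clickstream_shuffle = uncompressed // 8
generic_shuffle = uncompressed // 2

# Same logical volume, very different decisions:
#   reducers_for(uncompressed)        -> 400
#   reducers_for(generic_shuffle)     -> 200
#   reducers_for(clickstream_shuffle) -> 50
```

So the highly compressible dataset gets 8x fewer reducers for the same
uncompressed row volume, which is exactly the tighter-than-intended squeeze
described above.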

I'll have to think a bit more about the other part of the problem.

Cheers,
Gopal

On 7/6/16, 12:52 PM, "Gopal Vijayaraghavan" <go...@hortonworks.com on
behalf of gop...@apache.org> wrote:

>
>> I tried running the shuffle hash join with auto reducer parallelism
>> again. But, it didn't seem to take effect. With merge join and auto
>> reduce parallelism on, the number of reducers drops from 1009 to 337,
>> but I didn't see that change in the case of the shuffle hash join.
>> Should I be doing something more?
>
>In your runtime, can you cross-check the YARN logs for
>
>"Defer scheduling tasks"
>
>and 
>
>"Reduce auto parallelism for vertex"
>
>
>The kick-off points for the two join algorithms are slightly different.
>
>The merge-join uses stats from both sides (since until all tasks on both
>sides are complete, no join ops can be performed).
>
>The dynamic hash-join, on the other hand, uses stats from only the
>hashtable side, since it can start producing rows as soon as the
>hashtable side is complete and at least 1 task from the big-table side
>has completed.
>
>There's also a scenario where the small table is too small even when it
>is entirely complete, at which point the system just goes ahead with the
>reduce tasks without any reduction (which is faster than waiting for the
>big table to hit slow-start before deciding).
>
>Cheers,
>Gopal

