Thanks for the response, Andrew.
*1. The approach*
The approach I mentioned will not introduce any new skew, so it should only
worsen performance if the user was relying on the shuffle to fix skew
they had before.
The user can address this by either not introducing their own skewed
partition in
“Thoughts on this approach?”
Just to warn you, this is a hazardous optimization without cardinality
information. Removing columns from the hash exchange reduces entropy,
potentially resulting in skew. Also keep in mind that if you reduce the number
of columns on one side of the join, you need to do
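
To make the entropy point concrete, here is a minimal sketch in plain Python (not Spark code). The partition count, the column names "a" and "b", and the helper functions are all hypothetical, chosen only for illustration. It hashes the same toy rows once on both columns and once on "a" alone. Because "a" has only two distinct values, the reduced key can occupy at most two partitions, however many are available.

```python
import zlib
from collections import Counter

NUM_PARTITIONS = 8  # hypothetical partition count, for illustration only


def partition_of(key, num_partitions=NUM_PARTITIONS):
    """Map a tuple of column values to a partition using a stable hash."""
    return zlib.crc32(repr(key).encode("utf-8")) % num_partitions


def partition_sizes(rows, key_columns):
    """Count rows per partition when hashing only on key_columns."""
    return Counter(
        partition_of(tuple(row[c] for c in key_columns)) for row in rows
    )


# Toy data: column "a" has 2 distinct values, column "b" has 100.
rows = [{"a": a, "b": b} for a in range(2) for b in range(100)]

full = partition_sizes(rows, ["a", "b"])  # hash on both columns
reduced = partition_sizes(rows, ["a"])    # hash on "a" only

print("partitions used, hash(a, b):", len(full))
print("partitions used, hash(a):   ", len(reduced))
print("largest partition, hash(a, b):", max(full.values()))
print("largest partition, hash(a):   ", max(reduced.values()))
```

Without knowing how many distinct values the remaining columns have, there is no way to tell in advance how concentrated the rows will become, which is why cardinality information matters here.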
I am going to review this carefully today. Thanks for the work!
Li
On Wed, Jan 1, 2020 at 10:34 PM Hyukjin Kwon wrote:
> Thanks for the comments, Maciej - I am addressing them.
> Adding Li Jin too.
>
> I plan to proceed with this late this week or early next week to make it on
> time before code freeze.