Hi Xiao,
Performance-wise, without the manual tuning the query cannot finish; with the
tuning, it finishes in minutes on TPC-H 100G data.
I have created https://issues.apache.org/jira/browse/SPARK-11704 and
https://issues.apache.org/jira/browse/SPARK-11705 for these two issues,
Hi, Zhan,
That sounds really interesting! Please @-mention me when you submit the PR. If
possible, please also post the performance difference.
Thanks,
Xiao Li
2015-11-11 14:45 GMT-08:00 Zhan Zhang:
> Hi Folks,
>
> I did some performance measurement based on TPC-H recently, and want to
> bring up
Hi Folks,
I took some performance measurements based on TPC-H recently, and want to bring
up two performance issues I observed. Both are related to cartesian joins.
1. CartesianProduct implementation.
Currently CartesianProduct relies on RDD.cartesian, in which the computation is
realized as foll