[
https://issues.apache.org/jira/browse/SPARK-17375?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Reynold Xin closed SPARK-17375.
-------------------------------
Resolution: Duplicate
Marking this as duplicate of SPARK-17626
> Star Join Optimization
> ----------------------
>
> Key: SPARK-17375
> URL: https://issues.apache.org/jira/browse/SPARK-17375
> Project: Spark
> Issue Type: Improvement
> Components: SQL
> Reporter: Yan
>
> The star schema is the simplest style of data mart schema and is the approach
> often seen in BI/Decision Support systems. Star Join is a popular SQL query
> pattern that joins one or (a few) fact tables with a few dimension tables in
> star schemas. Star Join Query Optimizations aim to optimize the performance
> and use of resource for the star joins.
> Currently the existing Spark SQL optimization works on broadcasting the
> usually small (after filtering and projection) dimension tables to avoid
> costly shuffling of fact table and the "reduce" operations based on the join
> keys.
> This improvement proposal tries to further improve the broadcast star joins
> in the two areas:
> 1) avoid materialization of the intermediate rows that otherwise could
> eventually not make to the final result row set after further joined with
> other dimensions that are more restricting;
> 2) avoid the performance variations among different join orders. This could
> also have been largely achieved by cost analysis and heuristics and selecting
> a reasonably optimal join order. But we are here trying to achieve similar
> improvement without relying on such info.
> A preliminary test against a small TPCDS 1GB data set indicates between
> 5%-40% improvement (with codegen disabled on both tests) vs. the multiple
> broadcast joins on one Query (Q27) that inner joins 4 dimension table with
> one fact table. The large variation (5%-40%) is due to the different join
> ordering of the 4 broadcast joins. Tests using larger data sets and other
> TPCDS queries are yet to be performed.
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]