[ 
https://issues.apache.org/jira/browse/SPARK-17791?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15620517#comment-15620517
 ] 

Ioana Delaney commented on SPARK-17791:
---------------------------------------

[~ron8hu] Thank you for your comment. I agree that the algorithm will 
have to evolve as CBO introduces new features such as cardinality, predicate 
selectivity, and ultimately cost-based planning itself. The current 
proposal is conservative in choosing a star plan and can be made even more 
conservative. I can look at what CBO implements today for the number of 
distinct values and base table cardinality, as suggested by [~wangzhenhua]. A 
check for pseudo RI (referential integrity) using these two estimates can 
easily be incorporated into our current star-schema detection. 
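
As a rough illustration of such a pseudo RI check, the sketch below treats a 
join column as a candidate primary key when its estimated number of distinct 
values is close to the table's row count. The `ColumnStats` type, the function 
name, and the 0.95 threshold are hypothetical; they are not taken from Spark 
or from the attached design document.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ColumnStats:
    """Hypothetical per-column estimates, as a CBO might supply them."""
    distinct_count: int  # estimated number of distinct values (NDV)
    row_count: int       # estimated base table cardinality


def looks_like_primary_key(stats: ColumnStats, min_ratio: float = 0.95) -> bool:
    """Return True when the column is nearly unique in its table.

    A column whose NDV is close to the row count behaves like a primary key,
    so a join on it can be treated as a pseudo RI (fact-to-dimension) join.
    The threshold is an assumed tuning knob, not a value from the proposal.
    """
    if stats.row_count <= 0:
        # No usable statistics: stay conservative and reject the column.
        return False
    return stats.distinct_count / stats.row_count >= min_ratio
```

With this sketch, a dimension key with 1000 distinct values over 1000 rows 
passes the check, while a column with 10 distinct values over 1000 rows does 
not.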

The algorithm is also disabled by default. We can keep it disabled until we 
have a tighter integration with CBO. But there are advantages to merging the 
code before CBO is completely implemented. From an implementation point of 
view, this will allow us to deliver our work incrementally. Then, given its 
good performance results, the feature can be enabled on demand for warehouse 
workloads that can take advantage of star join planning.

Thank you.


> Join reordering using star schema detection
> -------------------------------------------
>
>                 Key: SPARK-17791
>                 URL: https://issues.apache.org/jira/browse/SPARK-17791
>             Project: Spark
>          Issue Type: Sub-task
>          Components: SQL
>    Affects Versions: 2.1.0
>            Reporter: Ioana Delaney
>            Assignee: Ioana Delaney
>            Priority: Critical
>         Attachments: StarJoinReordering1005.doc
>
>
> This JIRA is a sub-task of SPARK-17626.
> The objective is to provide a consistent performance improvement for star 
> schema queries. A star schema consists of one or more fact tables referencing 
> a number of dimension tables. In general, queries against a star schema are 
> expected to run fast because of the established RI constraints among the 
> tables. This design proposes a join reordering based on natural, generally 
> accepted heuristics for star schema queries:
> * Finds the star join with the largest fact table and places it on the 
> driving arm of the left-deep join. This plan avoids large tables on the 
> inner side, and thus favors hash joins. 
> * Applies the most selective dimensions early in the plan to reduce the 
> amount of data flow.
> The design description is included in the attached document.
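
The two heuristics in the description above can be sketched roughly as 
follows. This is a simplified illustration, not the proposed implementation: 
it assumes that all input tables already belong to a single star join, and the 
`Table` type, its `selectivity` field, and `reorder_star_join` are 
hypothetical names.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Table:
    """Hypothetical table descriptor carrying planner estimates."""
    name: str
    row_count: int            # estimated base table cardinality
    selectivity: float = 1.0  # assumed fraction of rows kept by local predicates


def reorder_star_join(tables: List[Table]) -> List[str]:
    """Return a left-deep join order for one star join.

    Heuristic 1: the largest table is taken as the fact table and placed
    first, on the driving arm of the left-deep join.
    Heuristic 2: dimensions follow, most selective (lowest fraction of
    surviving rows) first, to reduce data flow early in the plan.
    """
    if not tables:
        return []
    fact = max(tables, key=lambda t: t.row_count)
    dimensions = [t for t in tables if t is not fact]
    dimensions.sort(key=lambda t: t.selectivity)
    return [fact.name] + [d.name for d in dimensions]


# Usage with made-up estimates:
example = [
    Table("store", 100, 1.0),
    Table("store_sales", 1_000_000),
    Table("item", 10_000, 0.5),
    Table("date_dim", 70_000, 0.1),
]
print(reorder_star_join(example))
```

In this made-up example the fact table `store_sales` drives the join, followed 
by `date_dim`, `item`, and `store` in order of decreasing selectivity.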



