[ https://issues.apache.org/jira/browse/SPARK-49881?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Asif updated SPARK-49881:
-------------------------
    Target Version/s: 4.1
   Affects Version/s: 4.1

> SPIP : Improving analyzer performance by skipping DeduplicateRelations rule conditionally
> -----------------------------------------------------------------------------------------
>
>                 Key: SPARK-49881
>                 URL: https://issues.apache.org/jira/browse/SPARK-49881
>             Project: Spark
>          Issue Type: Improvement
>          Components: SQL
>    Affects Versions: 4.0.0, 3.5.3, 4.1
>            Reporter: Asif
>            Priority: Major
>
> In many cases, the DeduplicateRelations rule, though essential, has been observed
> to increase query analysis time significantly, especially for large query plans.
> In many situations, however, we can guarantee that no duplicate relations are
> present and can therefore skip the rule.
> Those situations are:
> 1) When DataFrame APIs such as select/filter, which operate on a single existing
> DataFrame, are used.
>
> Also, if we store the MultiInstanceRelations of a given plan in its QueryExecution,
> we can use that information when creating new DataFrames in cases where duplicate
> relations are possible (e.g. join, union, intersect).
> If the two Datasets being unioned, intersected, joined, etc. have no
> MultiInstanceRelation in common, it should be safe to assume that no duplicate
> relations can arise, thereby allowing the Dedup rule to be skipped.
>
> At least, that is the idea.
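The skip condition described above can be illustrated with a small toy model. This is a minimal sketch, not Spark code: `PlanInfo`, `scan`, `unary_op` and `binary_op` are hypothetical names, relations are identified by plain strings, and each input DataFrame is assumed to be already analyzed, so duplicates can only come from a relation that appears on both sides of a binary operator.

```python
from dataclasses import dataclass
from typing import FrozenSet


@dataclass(frozen=True)
class PlanInfo:
    """Hypothetical per-plan bookkeeping, standing in for what the proposal
    suggests storing in QueryExecution: the MultiInstanceRelation leaves of
    the plan, plus whether DeduplicateRelations would have to run."""
    relations: FrozenSet[str]
    needs_dedup: bool = False


def scan(relation: str) -> PlanInfo:
    # A new DataFrame over one relation contains a single MultiInstanceRelation.
    return PlanInfo(frozenset({relation}))


def unary_op(child: PlanInfo) -> PlanInfo:
    # Case 1: select/filter on one already-analyzed DataFrame adds no relation,
    # so no duplicate can appear and the rule can be skipped.
    return PlanInfo(child.relations, needs_dedup=False)


def binary_op(left: PlanInfo, right: PlanInfo) -> PlanInfo:
    # Case 2: join/union/intersect. Assuming both inputs are already analyzed,
    # duplicates can only arise from a relation present on both sides.
    shared = left.relations & right.relations
    return PlanInfo(left.relations | right.relations, needs_dedup=bool(shared))


orders = scan("orders")
customers = scan("customers")

filtered = unary_op(orders)               # select/filter: rule skipped
joined = binary_op(filtered, customers)   # disjoint sides: rule skipped
self_join = binary_op(joined, orders)     # "orders" on both sides: rule runs

print(filtered.needs_dedup, joined.needs_dedup, self_join.needs_dedup)
# False False True
```

In this sketch the decision is a set intersection over leaf relations rather than a pass over the whole plan, and it stays conservative: any overlap falls back to running DeduplicateRelations.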