Re: [PR] fix: incorrect NATURAL/USING JOIN schema [datafusion]

via GitHub Sun, 12 Jan 2025 23:09:38 -0800


jonahgao commented on code in PR #14102:
URL: https://github.com/apache/datafusion/pull/14102#discussion_r1912749696



##########
datafusion/expr/src/utils.rs:
##########
@@ -705,27 +711,20 @@ pub fn exprlist_to_fields<'a>(
         .map(|e| match e {
             Expr::Wildcard { qualifier, options } => match qualifier {

Review Comment:
   Although we have moved wildcard expansion into the analyzer
(https://github.com/apache/datafusion/pull/11681), wildcards are still expanded
when computing plan schemas (in
[exprlist_to_fields](https://github.com/apache/datafusion/blob/f9cc3325cdb5891b7566a6f3503c1f7ac6ad51e0/datafusion/expr/src/utils.rs#L696)
and
[exprlist_len](https://github.com/apache/datafusion/blob/f9cc3325cdb5891b7566a6f3503c1f7ac6ad51e0/datafusion/expr/src/utils.rs#L797C8-L797C20)).
I wonder if expanding wildcards before computing schemas would be simpler;
at the very least, it would avoid redundant work.
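To illustrate the suggestion, here is a minimal sketch built on a toy `Expr` enum. It is hypothetical: it is not DataFusion's `Expr`, and `expand_wildcards` / `exprlist_to_field_names` are illustrative names, not existing APIs. Once wildcards have been expanded a single time, computing the output fields is a plain 1:1 mapping, and the expression count is just the vector length. Nothing in the schema path would then need to re-expand wildcards the way `exprlist_to_fields` and `exprlist_len` currently do.

```rust
/// Toy expression type (hypothetical; not DataFusion's `Expr`).
#[derive(Debug, Clone, PartialEq)]
enum Expr {
    Column(String),
    Wildcard,
}

/// Replace every `Wildcard` with the input's columns. This runs once, up front.
fn expand_wildcards(exprs: &[Expr], input_columns: &[String]) -> Vec<Expr> {
    exprs
        .iter()
        .flat_map(|e| match e {
            // A wildcard becomes one column expression per input column.
            Expr::Wildcard => input_columns
                .iter()
                .cloned()
                .map(Expr::Column)
                .collect::<Vec<_>>(),
            // Every other expression passes through unchanged.
            other => vec![other.clone()],
        })
        .collect()
}

/// After expansion, computing output field names is a simple 1:1 mapping.
fn exprlist_to_field_names(exprs: &[Expr]) -> Vec<String> {
    exprs
        .iter()
        .map(|e| match e {
            Expr::Column(name) => name.clone(),
            Expr::Wildcard => unreachable!("wildcards must be expanded first"),
        })
        .collect()
}

fn main() {
    let input = vec!["a".to_string(), "b".to_string()];
    // SELECT b, * FROM t(a, b)
    let exprs = vec![Expr::Column("b".into()), Expr::Wildcard];
    let expanded = expand_wildcards(&exprs, &input);
    // The number of output expressions is now simply the vector length.
    assert_eq!(expanded.len(), 3);
    println!("{:?}", exprlist_to_field_names(&expanded)); // ["b", "a", "b"]
}
```

The design point is only that expansion happens in one place. Anything downstream, such as field lookup or counting, can then assume a wildcard-free expression list.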



##########
datafusion/expr/src/utils.rs:
##########
@@ -379,14 +379,12 @@ fn get_exprs_except_skipped(
     }
 }
 
-/// Resolves an `Expr::Wildcard` to a collection of `Expr::Column`'s.
-pub fn expand_wildcard(
-    schema: &DFSchema,
-    plan: &LogicalPlan,
-    wildcard_options: Option<&WildcardOptions>,
-) -> Result<Vec<Expr>> {
+/// For each column specified in the USING JOIN condition, the JOIN plan outputs it twice
+/// (once for each join side), but an unqualified wildcard should include it only once.
+/// This function returns the columns that should be excluded.
+fn exclude_using_columns(plan: &LogicalPlan) -> Result<HashSet<Column>> {

Review Comment:
   Although we have moved wildcard expansion into the analyzer (#11681), wildcards
are still expanded when computing plan schemas (in
[exprlist_to_fields](https://github.com/apache/datafusion/blob/f9cc3325cdb5891b7566a6f3503c1f7ac6ad51e0/datafusion/expr/src/utils.rs#L696)
and
[exprlist_len](https://github.com/apache/datafusion/blob/f9cc3325cdb5891b7566a6f3503c1f7ac6ad51e0/datafusion/expr/src/utils.rs#L797C8-L797C20)).
I wonder if expanding wildcards before computing schemas would be simpler;
at the very least, it would avoid redundant work.
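For reference, here is a minimal sketch of what the doc comment on `exclude_using_columns` in the hunk above describes. It uses a hypothetical `QualifiedColumn` struct and works on a flat list of join output columns instead of DataFusion's `Column` and `LogicalPlan`. `exclude_using_columns_toy` and `expand_unqualified_wildcard` are illustrative names only. The sketch assumes that the first occurrence of each USING column in output order (the left side) is the one kept. That choice belongs to this sketch and is not necessarily what the PR does.

```rust
use std::collections::HashSet;

/// Hypothetical qualified column (relation, name); not DataFusion's `Column`.
#[derive(Debug, Clone, PartialEq, Eq, Hash)]
struct QualifiedColumn {
    relation: String,
    name: String,
}

impl QualifiedColumn {
    fn new(relation: &str, name: &str) -> Self {
        Self { relation: relation.to_string(), name: name.to_string() }
    }
}

/// Toy counterpart of `exclude_using_columns`: the join emits each USING column
/// once per side, so every occurrence after the first is marked as excluded.
fn exclude_using_columns_toy(
    join_output: &[QualifiedColumn],
    using: &[&str],
) -> HashSet<QualifiedColumn> {
    let mut excluded = HashSet::new();
    for name in using {
        // All output columns with this USING name, in output order.
        let mut matches = join_output.iter().filter(|c| c.name == *name);
        // Assumption: the first (left-side) occurrence is the one kept.
        let _kept = matches.next();
        excluded.extend(matches.cloned());
    }
    excluded
}

/// Unqualified `*` over the join: all output columns minus the excluded duplicates.
fn expand_unqualified_wildcard(
    join_output: &[QualifiedColumn],
    using: &[&str],
) -> Vec<QualifiedColumn> {
    let excluded = exclude_using_columns_toy(join_output, using);
    join_output.iter().filter(|c| !excluded.contains(*c)).cloned().collect()
}

fn main() {
    // t1(id, a) JOIN t2(id, b) USING (id)
    let out = vec![
        QualifiedColumn::new("t1", "id"),
        QualifiedColumn::new("t1", "a"),
        QualifiedColumn::new("t2", "id"),
        QualifiedColumn::new("t2", "b"),
    ];
    let cols = expand_unqualified_wildcard(&out, &["id"]);
    let names: Vec<String> = cols
        .iter()
        .map(|c| format!("{}.{}", c.relation, c.name))
        .collect();
    println!("{:?}", names); // ["t1.id", "t1.a", "t2.b"]
}
```

A qualified wildcard such as `t2.*` would bypass this exclusion and still return `t2.id`. Only the unqualified form deduplicates.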



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

