Blizzara commented on code in PR #17299:
URL: https://github.com/apache/datafusion/pull/17299#discussion_r2323707179


##########
datafusion/substrait/src/logical_plan/consumer/rel/project_rel.rs:
##########
@@ -62,7 +62,17 @@ pub async fn from_project_rel(
                 // to transform it into a column reference
                 window_exprs.insert(e.clone());
             }
-            explicit_exprs.push(name_tracker.get_uniquely_named_expr(e)?);
+            // Since substrait removes aliases, we need to assign literals with a UUID alias to avoid
+            // ambiguous names when the same literal is used before and after a join.
+            // The name tracker will ensure that two literals in the same project would have
+            // unique names but, it does not ensure that if a literal column exists in a previous
+            // project say before a join that it is deduplicated with respect to those columns.

Review Comment:
   I think the problem is that the NameTracker takes qualifiers into account, but the "ambiguous schema" check does not. So if the input to the Project already has a `table1.NULL` column and the Project adds a `NULL` column (from `lit(NULL)`), the NameTracker doesn't rename the newly added column, and we end up with both `table1.NULL` and `NULL`, which fails the ambiguity check.
   
   My recommendation would be to make the NameTracker more robust instead, so that it ignores qualifiers, at least when there is also an unqualified column with the same name. UUID-aliasing literals looks like it should work for this specific case, but I can imagine other cases where the clash involves non-literal columns (though I can't come up with an example right now).
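
To make the mismatch concrete, here is a minimal, self-contained sketch. It does not use the real DataFusion types: `Col`, `is_ambiguous`, `unique_name`, and `project` are hypothetical stand-ins, and the `__temp__N` suffix is only illustrative. It also ignores qualifiers unconditionally, which is a stricter variant of what is suggested above.

```rust
use std::collections::HashSet;

/// Hypothetical stand-in for a column: optional table qualifier plus a name.
#[derive(Clone, Debug, PartialEq)]
struct Col {
    qualifier: Option<String>,
    name: String,
}

impl Col {
    fn new(qualifier: Option<&str>, name: &str) -> Self {
        Col {
            qualifier: qualifier.map(str::to_string),
            name: name.to_string(),
        }
    }

    /// "table1.NULL" for a qualified column, "NULL" for an unqualified one.
    fn qualified_name(&self) -> String {
        match &self.qualifier {
            Some(q) => format!("{}.{}", q, self.name),
            None => self.name.clone(),
        }
    }
}

/// Simplified model of the ambiguity check: two columns with the same bare
/// name clash if either one is unqualified or both share a qualifier.
fn is_ambiguous(cols: &[Col]) -> bool {
    cols.iter().enumerate().any(|(i, a)| {
        cols.iter().skip(i + 1).any(|b| {
            a.name == b.name
                && (a.qualifier.is_none()
                    || b.qualifier.is_none()
                    || a.qualifier == b.qualifier)
        })
    })
}

/// Simplified model of a name tracker: rename `col` until its key is unseen.
/// With `ignore_qualifier == false` the key is the qualified name;
/// with `true` the key is the bare name.
fn unique_name(seen: &mut HashSet<String>, col: &Col, ignore_qualifier: bool) -> Col {
    let key = |c: &Col| {
        if ignore_qualifier {
            c.name.clone()
        } else {
            c.qualified_name()
        }
    };
    let mut candidate = col.clone();
    let mut n = 0;
    // `insert` returns false when the key is already present.
    while !seen.insert(key(&candidate)) {
        candidate.name = format!("{}__temp__{}", col.name, n);
        n += 1;
    }
    candidate
}

/// Pass the input columns and then the newly added columns through one tracker.
fn project(input: &[Col], added: &[Col], ignore_qualifier: bool) -> Vec<Col> {
    let mut seen = HashSet::new();
    input
        .iter()
        .chain(added.iter())
        .map(|c| unique_name(&mut seen, c, ignore_qualifier))
        .collect()
}
```

In this model, projecting `table1.NULL` plus a new unqualified `NULL` with `ignore_qualifier = false` keeps both names and `is_ambiguous` reports a clash. With `ignore_qualifier = true` the new column is renamed and the clash goes away, with no special handling for literals.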
   
   (Also hey 👋  @xanderbailey!)



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

