rkrishn7 commented on issue #13510: URL: https://github.com/apache/datafusion/issues/13510#issuecomment-2626258439
Hello! I've dug into this issue a bit. The problem seems to arise because the table name for each table scan in the plan defaults to `UNNAMED_TABLE` (`"?table?"`). As a result, the column references in the "on" portion of the join node are ambiguous.

In fact, if we disambiguate the references, either by changing column names or by qualifying at least one table in the example with an alias, the query works because type coercion is able to determine that a cast is necessary. Without this, the data type lookup for the columns during type coercion returns the type of the first matching field in the schema (the left-hand side), so it appears that no cast is necessary.

Other join types (e.g. inner) fail earlier in planning with a `DuplicateQualifiedField` error because they compute the _joined_ schema and perform duplicate name checks as a result.

I'm thinking a good path forward here may be to perform this validation for all join types, so that we fail earlier in planning. Would love some feedback/thoughts! Thanks!

(FYI @alamb I tested #13370 rebased off the latest main and it does not fix the issue. But this is expected based on the description above! 😅 )
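To make the first-match behavior described above concrete, here is a minimal, self-contained sketch. It is a toy model, not DataFusion's API: the `Field` struct, the `first_match`, `is_ambiguous`, and `joined_schema` helpers, the `id` column name, and the `Int32`/`Int64` types are all hypothetical and chosen only for illustration.

```rust
// Toy model only: every type and function here is hypothetical and is
// NOT part of DataFusion. It only illustrates first-match lookup.

#[derive(Debug, Clone, Copy, PartialEq)]
enum DataType {
    Int32,
    Int64,
}

#[derive(Debug)]
struct Field {
    qualifier: &'static str,
    name: &'static str,
    data_type: DataType,
}

/// Return the data type of the FIRST field matching `qualifier.name`.
fn first_match(schema: &[Field], qualifier: &str, name: &str) -> Option<DataType> {
    schema
        .iter()
        .find(|f| f.qualifier == qualifier && f.name == name)
        .map(|f| f.data_type)
}

/// A reference is ambiguous when more than one field matches it.
fn is_ambiguous(schema: &[Field], qualifier: &str, name: &str) -> bool {
    schema
        .iter()
        .filter(|f| f.qualifier == qualifier && f.name == name)
        .count()
        > 1
}

/// Left fields followed by right fields, with the given qualifiers.
/// Assumption: left `id` is Int32 and right `id` is Int64.
fn joined_schema(left: &'static str, right: &'static str) -> Vec<Field> {
    vec![
        Field { qualifier: left, name: "id", data_type: DataType::Int32 },
        Field { qualifier: right, name: "id", data_type: DataType::Int64 },
    ]
}

fn main() {
    // Both scans use the default name, so the right-hand reference
    // resolves to the left-hand field and the type mismatch is hidden.
    let unnamed = joined_schema("?table?", "?table?");
    println!("{:?}", first_match(&unnamed, "?table?", "id"));
    println!("ambiguous: {}", is_ambiguous(&unnamed, "?table?", "id"));

    // Aliasing one side makes the reference unique, so the real type
    // of the right-hand column is visible and a cast can be planned.
    let aliased = joined_schema("?table?", "r");
    println!("{:?}", first_match(&aliased, "r", "id"));
    println!("ambiguous: {}", is_ambiguous(&aliased, "r", "id"));
}
```

In this model, a duplicate check like `is_ambiguous` run up front for every join type would reject the first schema before type coercion ever sees it, which is the kind of early validation suggested above.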