alamb commented on PR #15135: URL: https://github.com/apache/datafusion/pull/15135#issuecomment-2719004026
> There seem to have been quite a few cracks at this particular situation:
>
> * [Join two tables with the same schema, then union throws `SchemaError(DuplicateUnqualifiedField )` #5410](https://github.com/apache/datafusion/issues/5410)
> * [Strip table qualifiers from schema in `UNION ALL` for unparser #11082](https://github.com/apache/datafusion/pull/11082)
> * [feat: Implement UNION ALL BY NAME #14538](https://github.com/apache/datafusion/pull/14538)
> * [Strip table qualifiers from schema in `UNION ALL` #10707](https://github.com/apache/datafusion/pull/10707)
>
> The trouble seems to stem from the decision to maintain `TableReference` qualifiers in unions for disambiguation. This makes it very difficult to know when these qualifiers should or shouldn't be passed through at different points in the optimizer. Unless there are well-specified semantics around this that I'm missing, I think it would be better to go back to the assumption that qualifiers never persist through a union. We can continue to support field disambiguation by generating unique column names, as is done in DuckDB ([deduplicating-identifiers](https://duckdb.org/docs/stable/sql/dialect/keywords_and_identifiers.html#deduplicating-identifiers)).

Maybe @jonahgao can offer an opinion on this suggestion.
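
For context, the idea in the quoted comment (drop qualifiers at the union boundary and disambiguate by generating unique column names) could look roughly like the sketch below. This is only an illustration: `dedup_column_names` is a hypothetical helper, not part of DataFusion's API, and the `name_N` suffix scheme is an assumption loosely modeled on the linked DuckDB page rather than a copy of its exact rules.

```rust
use std::collections::{HashMap, HashSet};

/// Hypothetical helper (not a DataFusion API): given the unqualified column
/// names of a union output, return names that are unique within the schema.
/// The first occurrence of a name is kept; later occurrences get a numeric
/// suffix (`name_1`, `name_2`, ...). The suffix format is an assumption.
fn dedup_column_names(names: &[&str]) -> Vec<String> {
    // Every name already emitted, including generated ones.
    let mut used: HashSet<String> = HashSet::new();
    // Next suffix to try for each original name.
    let mut next_suffix: HashMap<&str, usize> = HashMap::new();
    let mut out = Vec::with_capacity(names.len());

    for &name in names {
        // First occurrence: keep the name unchanged.
        if used.insert(name.to_string()) {
            out.push(name.to_string());
            continue;
        }
        // Duplicate: try increasing suffixes until one is free, so a
        // generated name never collides with a user-provided one.
        let counter = next_suffix.entry(name).or_insert(1);
        loop {
            let n = *counter;
            *counter += 1;
            let candidate = format!("{name}_{n}");
            if used.insert(candidate.clone()) {
                out.push(candidate);
                break;
            }
        }
    }
    out
}
```

With something like this, a union over two inputs that both expose `id` and `name` would produce an unqualified output schema, and any duplicates within that schema would be renamed instead of relying on `TableReference` qualifiers to tell them apart.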