gruuya commented on PR #20819:
URL: https://github.com/apache/datafusion/pull/20819#issuecomment-4088922846
> This approach doesn't handle queries like:
>
> SELECT a, b FROM t1 UNION ALL SELECT count(*), count(*) FROM t2;
Added a fix (and tests) for this now.
> I wonder if it would be cleaner to arrange not to check for duplicate
> column names in set operation queries that aren't the left-most query? Since
> the column names for such queries are discarded anyway, it seems a bit
> laborious to first rewrite them to ensure they are unique, and then check that
> they are indeed unique, before discarding them anyway.
Indeed, whilst in theory this sounds like the best approach, in practice it
would require a much more substantial change. We'd need to
1. extend `PlannerContext` with a flag to denote that we're planning a set
expression
2. set that flag in `set_expr_to_plan`, right after we plan the left side
3. thread the flag down into `SqlToRel::project`
4. then we'd either need to
a. introduce a breaking change to `LogicalPlanBuilder::project` in order
to pass that flag, or
b. introduce a new method on `LogicalPlanBuilder`, something like
`project_without_validation`, where we'd skip `validate_unique_names` (but not
the normalize & columnize steps of `project_with_validation`)
5. unset the flag at the outermost set expression (i.e. where we've set it),
so that further sql->plan calls go through `validate_unique_names` again
All in all it's more verbose and complex compared to this approach, but if
you'd like, I could open a PR for that so that we can compare.
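
To make the steps above a bit more concrete, here is a rough, self-contained
toy sketch of the flag-based alternative. It does not use the real DataFusion
types or signatures: the `PlannerContext` struct, the `skip_unique_check` flag
name, and the simplified `project`, `validate_unique_names` and `plan_union`
functions are all hypothetical stand-ins that operate on plain column names.

```rust
use std::collections::HashSet;

/// Toy stand-in for the planner context (step 1).
/// `skip_unique_check` is a hypothetical flag name, not an existing field.
#[derive(Default)]
struct PlannerContext {
    skip_unique_check: bool,
}

/// Toy stand-in for the uniqueness check: reject repeated output names.
fn validate_unique_names(names: &[&str]) -> Result<(), String> {
    let mut seen = HashSet::new();
    for name in names {
        if !seen.insert(*name) {
            return Err(format!("duplicate expression name: {name}"));
        }
    }
    Ok(())
}

/// Toy projection (steps 3 and 4): the flag decides whether the
/// uniqueness check runs; everything else would stay the same.
fn project(ctx: &PlannerContext, names: &[&str]) -> Result<Vec<String>, String> {
    if !ctx.skip_unique_check {
        validate_unique_names(names)?;
    }
    Ok(names.iter().map(|n| n.to_string()).collect())
}

/// Toy set-expression planning (steps 2 and 5): plan the left side with
/// validation, set the flag for the right side, then restore the old value.
fn plan_union(
    ctx: &mut PlannerContext,
    left: &[&str],
    right: &[&str],
) -> Result<Vec<String>, String> {
    // The left-most query defines the output names, so it is always validated.
    let schema = project(ctx, left)?;

    // Remember the previous value so that nested set expressions leave the
    // flag set until the outermost one restores it.
    let previous = std::mem::replace(&mut ctx.skip_unique_check, true);
    let right_result = project(ctx, right);
    ctx.skip_unique_check = previous; // restored even if the right side failed

    let right_columns = right_result?;
    if right_columns.len() != schema.len() {
        return Err("set operation inputs must have the same number of columns".to_string());
    }
    // The right side's names are discarded; only the left side's survive.
    Ok(schema)
}
```

In this toy model, `plan_union(&mut ctx, &["a", "b"], &["count(*)", "count(*)"])`
succeeds and keeps the names `a` and `b`, while a plain
`project(&ctx, &["count(*)", "count(*)"])` outside of a set expression is
still rejected.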