goldmedal commented on PR #11035:
URL: https://github.com/apache/datafusion/pull/11035#issuecomment-2293855497
> Yes, I saw something like that in the code: using tmp_table as the default
> alias. But I'm not sure if it is the right way, because it might cause problems
> when resolving column names?
@holicc
After some experimentation, I found that this is not straightforward. I tried
implementing a `TableProvider` with a custom `get_logical_plan` method that
sets an alias for the table by default. However, that inner plan is only
invoked during the analysis phase, which is too late to change column names
because all projections have already been planned.
The plan will look like this:
```sql
> EXPLAIN SELECT sum(a) FROM '/Users/jax/git/datafusion/datafusion/core/tests/data/2.json'
+---------------+-------------------------------------------------------------------------------------------------------------------------+
| plan_type     | plan                                                                                                                    |
+---------------+-------------------------------------------------------------------------------------------------------------------------+
| logical_plan  | Aggregate: groupBy=[[]], aggr=[[sum(/Users/jax/git/datafusion/datafusion/core/tests/data/2.json.a)]]                    |
|               |   SubqueryAlias: /Users/jax/git/datafusion/datafusion/core/tests/data/2.json                                            |
|               |     TableScan: ?url? projection=[a]                                                                                     |
| physical_plan | AggregateExec: mode=Final, gby=[], aggr=[sum(/Users/jax/git/datafusion/datafusion/core/tests/data/2.json.a)]            |
|               |   CoalescePartitionsExec                                                                                                |
|               |     AggregateExec: mode=Partial, gby=[], aggr=[sum(/Users/jax/git/datafusion/datafusion/core/tests/data/2.json.a)]      |
|               |       RepartitionExec: partitioning=RoundRobinBatch(8), input_partitions=1                                              |
|               |         JsonExec: file_groups={1 group: [[Users/jax/git/datafusion/datafusion/core/tests/data/2.json]]}, projection=[a] |
|               |                                                                                                                         |
+---------------+-------------------------------------------------------------------------------------------------------------------------+
```
If we want to improve readability, we might need to create an `AnalyzerRule`
for it. However, this is not easy due to the complexity of column resolution,
as you mentioned. I think that we could address this issue in a separate pull
request if needed.
A simpler solution is to manually add an alias when querying:
```sql
> EXPLAIN SELECT sum(a) FROM '/Users/jax/git/datafusion/datafusion/core/tests/data/2.json' as t
+---------------+-------------------------------------------------------------------------------------------------------------------------+
| plan_type     | plan                                                                                                                    |
+---------------+-------------------------------------------------------------------------------------------------------------------------+
| logical_plan  | Aggregate: groupBy=[[]], aggr=[[sum(t.a)]]                                                                              |
|               |   SubqueryAlias: t                                                                                                      |
|               |     TableScan: /Users/jax/git/datafusion/datafusion/core/tests/data/2.json projection=[a]                               |
| physical_plan | AggregateExec: mode=Final, gby=[], aggr=[sum(t.a)]                                                                      |
|               |   CoalescePartitionsExec                                                                                                |
|               |     AggregateExec: mode=Partial, gby=[], aggr=[sum(t.a)]                                                                |
|               |       RepartitionExec: partitioning=RoundRobinBatch(8), input_partitions=1                                              |
|               |         JsonExec: file_groups={1 group: [[Users/jax/git/datafusion/datafusion/core/tests/data/2.json]]}, projection=[a] |
|               |                                                                                                                         |
+---------------+-------------------------------------------------------------------------------------------------------------------------+
```
This is a straightforward way to produce a more readable plan without
complicating the code.
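If the query text is generated by a program rather than typed by hand, the alias can be attached while building the SQL string. The following is only a minimal sketch using the Rust standard library: `alias_for` and `aliased_sum_query` are hypothetical helper names, not part of DataFusion, and the sketch assumes the column name is already a valid unquoted identifier.

```rust
use std::path::Path;

/// Hypothetical helper (not a DataFusion API): derive a short,
/// identifier-safe alias from the file stem of a path.
fn alias_for(path: &str) -> String {
    let stem = Path::new(path)
        .file_stem()
        .and_then(|s| s.to_str())
        .unwrap_or("t");
    // Replace anything that is not an ASCII letter, digit, or underscore.
    let cleaned: String = stem
        .chars()
        .map(|c| if c.is_ascii_alphanumeric() || c == '_' { c } else { '_' })
        .collect();
    // An unquoted identifier should not start with a digit, so prefix it.
    match cleaned.chars().next() {
        Some(c) if c.is_ascii_alphabetic() || c == '_' => cleaned,
        _ => format!("t_{}", cleaned),
    }
}

/// Hypothetical helper: build a `sum` query over a file path with an
/// explicit alias, so the plan shows the alias instead of the full path.
/// Assumes `column` is already a valid unquoted identifier.
fn aliased_sum_query(column: &str, path: &str) -> String {
    let alias = alias_for(path);
    // Double any single quotes so the path stays one SQL string literal.
    let quoted = path.replace('\'', "''");
    format!("SELECT sum({}.{}) FROM '{}' AS {}", alias, column, quoted, alias)
}

fn main() {
    let sql = aliased_sum_query(
        "a",
        "/Users/jax/git/datafusion/datafusion/core/tests/data/2.json",
    );
    println!("{}", sql);
}
```

For the path used above, the file stem is `2`, so the sketch produces the alias `t_2` and the plan would refer to `t_2.a` rather than the full path.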
cc @alamb