LiaCastaneda opened a new issue, #14799: URL: https://github.com/apache/datafusion/issues/14799
### Describe the bug I have encountered an error while doing a query with a join: `failed to optimize plan: Internal error: Input field name count(Int64(1)) does not match with the projection expression count(Int64(1)):1` This occurs when I do a join and the resulting schema has more than 2 fields with a count(*) (or any aggregator), I understand that datafusion internally tries to rename the column names to be able to create a DFSchema, since arrow does not support duplicate names. While debugging I saw that what it does internally is: 1. The name tracker sees `count(Int64(1))` 2. The name traccker sees `count(Int64(1))`again and renames it to `count(Int64(1)):1` 3. The name traccker sees `count(Int64(1))`again and renames it to `count(Int64(1)):2` or `count(Int64(1)):1` again? I have another related error that arises when trying to create the logical plan that is: `failed to translate modified plan to DataFusion: Schema error: Schema contains duplicate unqualified field name "count(Int64(1)):1"` On my local clone I managed to make datafusion generate a logical plan by tweaking [this function ](https://github.com/apache/datafusion/blob/84232d806070f4b177e35148255c501a5f6e740b/datafusion/expr/src/logical_plan/builder.rs#L1392) but then when trying to build the physical plan I get the error I mentioned first. ### To Reproduce ``` CREATE TABLE left_table ( id INT PRIMARY KEY, category TEXT, timestamp TIMESTAMP ); CREATE TABLE right_table ( id INT PRIMARY KEY, category TEXT, timestamp TIMESTAMP ); INSERT INTO left_table (id, category, timestamp) VALUES (1, 'business_logic', '2024-02-15 10:00:00'), (2, 'attack_attempt', '2024-02-15 10:05:00'); INSERT INTO right_table (id, category, timestamp) VALUES (1, 'info', '2024-02-15 10:10:00'), (2, 'low', '2024-02-15 10:15:00'); WITH first_agg AS ( SELECT id, COUNT(*) AS count_first FROM left_table GROUP BY id ), second_agg AS ( SELECT id, COUNT(*) AS count_second FROM right_table GROUP BY id ), third_agg AS ( SELECT id, COUNT(*) AS count_third FROM right_table GROUP BY id ), fourth_random_table AS ( SELECT id, category FROM right_table GROUP BY id, category ) select count_first, category, count_second, count_third from first_agg LEFT JOIN fourth_random_table using (id) LEFT JOIN second_agg using (id) LEFT JOIN third_agg using (id) ` ``` ### Expected behavior _No response_ ### Additional context I'm using version 43.0 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org --------------------------------------------------------------------- To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org For additional commands, e-mail: github-h...@datafusion.apache.org