fullstart opened a new issue, #14147: URL: https://github.com/apache/datafusion/issues/14147
### Describe the bug

I encountered an issue joining dataframes with duplicate column names when they are created by reading a file (I tried CSV and Parquet). Dataframes created from a Python dict join without problems. I tested with the latest version of DataFusion on Windows.

### To Reproduce

Joining works with dataframes created from a dict:

```
from datafusion import SessionContext
ctx = SessionContext()
x1 = ctx.from_pydict({'id1': [1, 2, 4, 5, 6], 'col2': [3, 4, 3, 5, 2], 'col3': [3, 4, 1, 2, 3]})
x2 = ctx.from_pydict({'id1': [1, 2, 4, 5, 6], 'col2': [3, 4, 3, 5, 2], 'col3': [5, 6, 7, 8, 9]})
x1.join(x2, on="id1")
Out[16]:
DataFrame()
+-----+------+------+-----+------+------+
| id1 | col2 | col3 | id1 | col2 | col3 |
+-----+------+------+-----+------+------+
| 1   | 3    | 3    | 1   | 3    | 5    |
| 2   | 4    | 4    | 2   | 4    | 6    |
| 4   | 3    | 1    | 4   | 3    | 7    |
| 5   | 5    | 2    | 5   | 5    | 8    |
| 6   | 2    | 3    | 6   | 2    | 9    |
+-----+------+------+-----+------+------+
```

The same join fails after writing the dataframes to CSV and reading them back:

```
x1.write_csv("df1.csv")
x2.write_csv("df2.csv")
x1_f = ctx.read_csv("df1.csv")
x2_f = ctx.read_csv("df2.csv")
x1_f.join(x2_f, on="id1")
---------------------------------------------------------------------------
Exception                                 Traceback (most recent call last)
Cell In[21], line 1
----> 1 x1_f.join(x2_f, on="id1")

File ~\prj\datafusion_test\venv\Lib\site-packages\datafusion\dataframe.py:468, in DataFrame.join(self, right, on, how, left_on, right_on, join_keys)
    465 if isinstance(right_on, str):
    466     right_on = [right_on]
--> 468 return DataFrame(self.df.join(right.df, how, left_on, right_on))

Exception: Schema error: No field named id1. Valid fields are "?table?"."1", "?table?"."3".
```

### Expected behavior

_No response_

### Additional context

_No response_
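A possible reading of the error message, not confirmed anywhere in this report: the valid fields `"1"` and `"3"` match the first data row of `x1` (`1, 3, 3`). That is what one would see if `df1.csv` was written without a header row and then read back with the first row treated as column names. The sketch below reproduces that effect with Python's standard `csv` module only. It makes no claim about how DataFusion writes or reads CSV, and the variable names are illustrative.

```python
import csv
import io

# Rows copied from x1 in the report; the header row (id1, col2, col3)
# is deliberately left out to simulate a header-less CSV file.
rows = [(1, 3, 3), (2, 4, 4), (4, 3, 1), (5, 5, 2), (6, 2, 3)]

buf = io.StringIO()
csv.writer(buf).writerows(rows)

# A reader that assumes a header consumes the first data row as column names.
buf.seek(0)
reader = csv.DictReader(buf)
inferred_names = reader.fieldnames
remaining_rows = list(reader)

print(inferred_names)           # column names taken from the first data row
print("id1" in inferred_names)  # the join key is no longer a column name
print(len(remaining_rows))      # one data row was lost to the header
```

If this is what happened, opening `df1.csv` in a text editor and checking whether its first line is `id1,col2,col3` would confirm or rule it out.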