[I] Schema error after loading parquet stored with datafusion.execution.keep_partition_by_columns = TRUE [datafusion]

via GitHub Sat, 06 Sep 2025 21:12:05 -0700


valkum opened a new issue, #17420:
URL: https://github.com/apache/datafusion/issues/17420


   ### Describe the bug
   
   When reading a Hive-partitioned Parquet dataset that was written with 
`datafusion.execution.keep_partition_by_columns = TRUE`, the created table has 
two columns with the same name, which raises `Schema error: Schema contains 
duplicate qualified field name table.<partition_col>`.
   
   ### To Reproduce
   
   Create a sample table with at least a `group` column to partition by.
   ```
   SET datafusion.execution.keep_partition_by_columns = TRUE;
   COPY table TO 'test'
   STORED AS PARQUET
   PARTITIONED BY (group);
   ```
   Then load the output as an external table and query it:
   ```
   CREATE EXTERNAL TABLE test2
   STORED AS PARQUET
   PARTITIONED BY (group)
   LOCATION 'test';
   
   SELECT * FROM test2;
   ```
   
   
   
   ### Expected behavior
   
   In a session with `datafusion.execution.keep_partition_by_columns = 
TRUE`, I guess it makes sense to drop one of the two duplicate columns.
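
To make the idea concrete, here is a toy sketch in plain Python. It is not DataFusion code and makes no claim about how DataFusion builds its schema internally. It assumes only what the report describes: the Parquet files already contain the partition column, and the `PARTITIONED BY` clause adds a column of the same name. The function names and the `id` and `value` columns are hypothetical.

```python
from collections import Counter


def table_schema(file_columns, partition_columns):
    # Toy model: columns stored in the Parquet files, followed by the
    # partition columns declared in PARTITIONED BY.
    return list(file_columns) + list(partition_columns)


def duplicate_names(schema):
    # Names that occur more than once in the schema.
    counts = Counter(schema)
    return [name for name in counts if counts[name] > 1]


def drop_duplicate_partition_columns(file_columns, partition_columns):
    # One possible fix: skip file columns that are also declared as
    # partition columns, so each name appears exactly once.
    declared = set(partition_columns)
    kept = [name for name in file_columns if name not in declared]
    return kept + list(partition_columns)


# Hypothetical columns; "group" was kept in the files by
# keep_partition_by_columns and is also declared in PARTITIONED BY.
file_columns = ["id", "value", "group"]
partition_columns = ["group"]

naive = table_schema(file_columns, partition_columns)
print(duplicate_names(naive))  # ['group']

deduped = drop_duplicate_partition_columns(file_columns, partition_columns)
print(deduped)  # ['id', 'value', 'group']
```

Which copy to keep (the one stored in the file or the one derived from the directory name) is a design choice this sketch does not settle; it keeps the declared partition column only for simplicity.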
   
   ### Additional context
   
   _No response_


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

