gabotechs opened a new issue, #20041:
URL: https://github.com/apache/datafusion/issues/20041

   ### Describe the bug
   
   When reading Parquet files with dictionary-encoded columns, if a file has 
constant column values (detected from statistics where min == max), the scan 
fails with a schema mismatch.
   
   Error:
   
   ArrowError(InvalidArgumentError("column types must match schema types, 
expected Dictionary(UInt16, Utf8) but found Utf8 at column index 1"))
   
   The root cause is in constant_value_from_stats() in opener.rs. When 
statistics indicate a column has a constant value, that value is used as a 
literal replacement in the projection. However, the statistics store values 
using the "unpacked" type (e.g., Utf8) rather than the dictionary type (e.g., 
Dictionary(UInt16, Utf8)), causing a type mismatch when constructing the output 
batch.
   
   
   ### To Reproduce
   
   Steps to reproduce are in @gene-bordegaray's PR: 
https://github.com/datafusion-contrib/datafusion-distributed/pull/324
   
   ### Expected behavior
   
   The query should succeed, with the constant value correctly cast to the 
expected dictionary type before being used as a literal replacement in the 
projection.
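
   A minimal sketch of the mismatch and of the expected cast, written against a 
toy model rather than the real Arrow/DataFusion types. Everything here 
(`DataType`, `Literal`, `check_column`, `cast_to_field_type`) is a hypothetical 
stand-in for illustration only. It is not the code in opener.rs, and the 
actual fix may look different.

```rust
// Toy model: these types are hypothetical stand-ins, NOT Arrow's or DataFusion's.
#[derive(Debug, Clone, PartialEq)]
enum DataType {
    Utf8,
    UInt16,
    // Dictionary(key type, value type)
    Dictionary(Box<DataType>, Box<DataType>),
}

// A constant taken from statistics, tagged with the type it was stored as.
#[derive(Debug, Clone, PartialEq)]
struct Literal {
    data_type: DataType,
    value: String,
}

// Mimics the schema check that rejects a column whose type differs from the
// type declared in the output schema.
fn check_column(expected: &DataType, lit: &Literal) -> Result<(), String> {
    if &lit.data_type == expected {
        Ok(())
    } else {
        Err(format!(
            "column types must match schema types, expected {:?} but found {:?}",
            expected, lit.data_type
        ))
    }
}

// Assumed shape of the fix: when the field is a dictionary whose value type
// equals the literal's "unpacked" type, retag the literal with the field type.
// Any other combination is returned unchanged.
fn cast_to_field_type(lit: Literal, field_type: &DataType) -> Literal {
    match field_type {
        DataType::Dictionary(_, value_type) if **value_type == lit.data_type => Literal {
            data_type: field_type.clone(),
            value: lit.value,
        },
        _ => lit,
    }
}
```

   In this model, a `Utf8` literal checked against a `Dictionary(UInt16, Utf8)` 
field fails `check_column`, while the same literal passed through 
`cast_to_field_type` first passes it.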
   
   ### Additional context
   
   _No response_


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

