alamb commented on issue #12510:
URL: https://github.com/apache/datafusion/issues/12510#issuecomment-2366773157

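   I looked into this issue more. I think the schemas in the files are 
fundamentally different. For example, `Title` is `REQUIRED BYTE_ARRAY Title 
(STRING)` in hits.parquet but `OPTIONAL BYTE_ARRAY Title` (no `STRING` 
annotation) in hits_partitioned/hits_55.parquet, so it is read as a string in 
one file and as binary in the other. Short of some sort of configuration to 
always cast Binary --> String, I don't see any way we could special-case this.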
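   To make the mismatch concrete, below is a simplified, hypothetical model in Rust. The `Repetition`, `ArrowType`, `Field` and `ByteArrayColumn` types and the `binary_as_string` flag are illustrative assumptions, not the arrow-rs or DataFusion API.

   The model maps the `Title` column from each file to a type:
   - `BYTE_ARRAY` with the `STRING` annotation becomes `Utf8`.
   - Plain `BYTE_ARRAY` becomes `Binary`.

   Merging the two fails unless an option treats binary as string:

```rust
// Hypothetical, simplified model of mapping a Parquet BYTE_ARRAY column to an
// Arrow-style type and merging the same column from two files.
// These types are illustrative only; they are NOT the arrow-rs/DataFusion API.

#[derive(Debug, Clone, Copy, PartialEq)]
enum Repetition {
    Required,
    Optional,
}

#[derive(Debug, Clone, Copy, PartialEq)]
enum ArrowType {
    Utf8,
    Binary,
}

#[derive(Debug, Clone, PartialEq)]
struct Field {
    name: String,
    data_type: ArrowType,
    nullable: bool,
}

/// A BYTE_ARRAY column as shown in the schema dumps: repetition plus whether
/// it carries the STRING annotation.
struct ByteArrayColumn<'a> {
    name: &'a str,
    repetition: Repetition,
    string_annotation: bool,
}

/// Map a column to a field. `binary_as_string` is a hypothetical option that
/// treats un-annotated BYTE_ARRAY data as UTF-8 text.
fn to_field(col: &ByteArrayColumn, binary_as_string: bool) -> Field {
    let data_type = if col.string_annotation || binary_as_string {
        ArrowType::Utf8
    } else {
        ArrowType::Binary
    };
    Field {
        name: col.name.to_string(),
        data_type,
        nullable: col.repetition == Repetition::Optional,
    }
}

/// In this model, nullability may be widened, but data types must match.
fn merge(a: &Field, b: &Field) -> Result<Field, String> {
    if a.data_type != b.data_type {
        return Err(format!(
            "column {}: cannot merge {:?} with {:?}",
            a.name, a.data_type, b.data_type
        ));
    }
    Ok(Field {
        name: a.name.clone(),
        data_type: a.data_type,
        nullable: a.nullable || b.nullable,
    })
}

fn main() {
    // `Title` as declared in hits.parquet and hits_partitioned/hits_55.parquet.
    let in_hits = ByteArrayColumn {
        name: "Title",
        repetition: Repetition::Required,
        string_annotation: true,
    };
    let in_hits_55 = ByteArrayColumn {
        name: "Title",
        repetition: Repetition::Optional,
        string_annotation: false,
    };

    // Default: Utf8 vs Binary, so the merge is rejected.
    let strict = merge(&to_field(&in_hits, false), &to_field(&in_hits_55, false));
    println!("{:?}", strict);

    // With the hypothetical cast option, both sides map to Utf8 and merge.
    let coerced = merge(&to_field(&in_hits, true), &to_field(&in_hits_55, true));
    println!("{:?}", coerced);
}
```

   In this model, the `REQUIRED`/`OPTIONAL` difference is absorbed by widening nullability. Only the missing `STRING` annotation blocks the merge, which is why a Binary --> String cast setting is the lever.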
   
   hits.parquet
   ```
   Metadata for file: hits.parquet
   
   version: 1
   num of rows: 99997497
   created by: parquet-cpp version 1.5.1-SNAPSHOT
   message schema {
     REQUIRED INT64 WatchID;
     REQUIRED INT32 JavaEnable (INTEGER(16,true));
     REQUIRED BYTE_ARRAY Title (STRING);
   ...
   ```
   
   hits_partitioned/hits_55.parquet
   ```
   Metadata for file: hits_partitioned/hits_55.parquet
   
   version: 1
   num of rows: 1000000
   created by: parquet-cpp version 1.5.1-SNAPSHOT
   message schema {
     OPTIONAL INT64 WatchID;
     OPTIONAL INT32 JavaEnable (INTEGER(16,true));
     OPTIONAL BYTE_ARRAY Title;
   ...
   ```
   
   Thus I am closing this issue as "won't do". Please let me know if you have 
found something different, @thinh2.
   
   
[hits_55.parquet.schema.txt](https://github.com/user-attachments/files/17089765/hits_55.parquet.schema.txt)
   
[hits.parquet.schema.txt](https://github.com/user-attachments/files/17089766/hits.parquet.schema.txt)
   

