mbutrovich opened a new pull request, #1142:
URL: https://github.com/apache/datafusion-comet/pull/1142
The current logic takes the data schema and the required schema from the Java side (in the scan node) and:

1. Converts them back to a Parquet schema
2. Serializes that to the native side
3. Parses it into a schema descriptor
4. Converts that to an Arrow schema

This process introduces conversion errors that are difficult to recover from (e.g. Timestamp(milli) -> INT96 -> Timestamp(nano)).

This PR simplifies the schema serialization and conversion on the native side, building on what @viirya did with the partition schema (thank you for the inspiration!). The data schema and required schema are now serialized as Spark types, which the native side converts directly to Arrow types. We also serialize more schema information (column names, nullability) than we previously did for the partition schema alone.
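For illustration, here is a minimal parquet-rs sketch (mine, not code from this PR) of the failure mode above: once a timestamp column has been rewritten as a legacy INT96 Parquet column, the schema round trip on the native side can only recover it as nanoseconds.

```rust
use std::sync::Arc;

use parquet::arrow::parquet_to_arrow_schema;
use parquet::schema::parser::parse_message_type;
use parquet::schema::types::SchemaDescriptor;

fn main() -> Result<(), Box<dyn std::error::Error>> {
    // A timestamp that was millisecond-precision on the Spark side but was
    // converted back to a legacy INT96 Parquet column. The original time
    // unit is no longer recorded anywhere in this schema.
    let message = "message spark_schema { optional int96 ts; }";
    let parquet_schema = SchemaDescriptor::new(Arc::new(parse_message_type(message)?));

    // parquet-rs maps INT96 to Timestamp(Nanosecond, _), so the round trip
    // lands on a different Arrow type than the one Spark started with.
    let arrow_schema = parquet_to_arrow_schema(&parquet_schema, None)?;
    println!("{:?}", arrow_schema.field(0).data_type()); // Timestamp(Nanosecond, None)
    Ok(())
}
```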
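And a hedged sketch of the direction this PR takes instead: map the serialized Spark types straight to Arrow types, carrying the column name and nullability along. The `SparkType` enum and field names below are illustrative stand-ins, not the PR's actual serialized representation.

```rust
use arrow_schema::{DataType, Field, Schema, TimeUnit};

// Stand-in for the deserialized Spark type info sent from the JVM side.
enum SparkType {
    Long,
    Double,
    String,
    // Spark's TimestampType is microsecond precision.
    Timestamp,
}

// One direct Spark -> Arrow mapping, with no Parquet schema in between,
// so there is no lossy detour through physical types like INT96.
fn to_arrow(t: &SparkType) -> DataType {
    match t {
        SparkType::Long => DataType::Int64,
        SparkType::Double => DataType::Float64,
        SparkType::String => DataType::Utf8,
        SparkType::Timestamp => DataType::Timestamp(TimeUnit::Microsecond, Some("UTC".into())),
    }
}

// Column name and nullability now travel with the type, which the
// partition-schema-only path did not carry.
fn to_arrow_field(name: &str, t: &SparkType, nullable: bool) -> Field {
    Field::new(name, to_arrow(t), nullable)
}

fn main() {
    let schema = Schema::new(vec![
        to_arrow_field("id", &SparkType::Long, false),
        to_arrow_field("event_time", &SparkType::Timestamp, true),
    ]);
    println!("{schema:#?}");
}
```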