comphead commented on issue #15162: URL: https://github.com/apache/datafusion/issues/15162#issuecomment-2714937399
Agree. Arrow-rs, unlike DataFusion, is not very configurable, and to be honest I would love for arrow-rs to support external configs so it could be more flexible in common areas, such as the Parquet reader for INT96, or to redefine some other behavior. But this is another topic. :)

> IMO this is the issue, the spark code is not returning data with the same schema as is then used to construct the RecordBatch. Where does the schema provided to the RecordBatch constructor come from? IMO either this schema needs to be updated to match what spark is actually returning, or the spark code needs to be updated to return the expected schema (e.g. by coercing on output).

Apache Spark expects `element` to come back. This is a hardcoded value, just as `item` is in arrow-rs. Apache Spark users rely on this naming, so changing it would break their queries.

For the specific case at hand, it is the `make_array` function: https://github.com/apache/datafusion/blob/8f3f70877febaa79be3349875e979d3a6e65c30e/datafusion/functions-nested/src/make_array.rs#L278

In this particular code the column array's list field is created as `item` although the schema says `element`, and at this point there is no `SessionContext` from which we could read external params and parametrize the List type with a specific field name in `Field::new`. It is possible to do this in DataFusion, although it is going to be a huge change to cover all array functions, so the first idea was to see whether a solution is possible on the arrow-rs side.

--
This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.
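To make the `item` vs `element` mismatch from the comment concrete, here is a minimal, std-only sketch. It is a simplified model, not the arrow-rs or DataFusion API: `DataType`, `Field`, `check_column`, and `with_list_field_name` are hypothetical stand-ins. The only behavior it borrows from Arrow is that list type equality includes the child field's name. It shows why a column typed `List(item)` is rejected against a schema that declares `List(element)`, and what "coercing on output" could look like.

```rust
// Simplified stand-ins for Arrow-style types (assumption: NOT the arrow-rs API).
#[derive(Debug, Clone, PartialEq)]
enum DataType {
    Int32,
    // A list type carries its child field, including the child's name.
    List(Box<Field>),
}

#[derive(Debug, Clone, PartialEq)]
struct Field {
    name: String,
    data_type: DataType,
}

impl Field {
    fn new(name: &str, data_type: DataType) -> Self {
        Field { name: name.to_string(), data_type }
    }
}

// Hypothetical check modelled on what a record-batch constructor does:
// the column's type must equal the type declared in the schema.
fn check_column(schema_field: &Field, column_type: &DataType) -> Result<(), String> {
    if &schema_field.data_type == column_type {
        Ok(())
    } else {
        Err(format!(
            "column type mismatch: expected {:?}, found {:?}",
            schema_field.data_type, column_type
        ))
    }
}

// Hypothetical coercion: rebuild a (possibly nested) list type so that every
// list child field uses `name`. Non-list types are returned unchanged.
fn with_list_field_name(data_type: &DataType, name: &str) -> DataType {
    match data_type {
        DataType::List(child) => DataType::List(Box::new(Field::new(
            name,
            with_list_field_name(&child.data_type, name),
        ))),
        other => other.clone(),
    }
}

fn main() {
    // What the array function produces: the child field is named "item".
    let produced = DataType::List(Box::new(Field::new("item", DataType::Int32)));

    // What the Spark-side schema declares: the child field is named "element".
    let expected = Field::new(
        "col",
        DataType::List(Box::new(Field::new("element", DataType::Int32))),
    );

    // Same layout, different child name: the check fails.
    assert!(check_column(&expected, &produced).is_err());

    // Coercing the output type to the expected child name makes it pass.
    let coerced = with_list_field_name(&produced, "element");
    assert!(check_column(&expected, &coerced).is_ok());
    println!("coerced type: {:?}", coerced);
}
```

In this model the rename is applied once at the output boundary, which is the alternative to threading a configurable field name through every array function.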
To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org
For queries about this service, please contact Infrastructure at: us...@infra.apache.org