comphead commented on issue #15162:
URL: https://github.com/apache/datafusion/issues/15162#issuecomment-2714937399

   Agree. Arrow-rs is not very configurable, unlike DataFusion, and to be 
honest I would love it if Arrow-rs supported external configs so it could be 
more flexible in common areas, such as how the Parquet reader handles INT96, 
or allow redefining some other behavior. But this is another topic. :) 
   
   > IMO this is the issue, the spark code is not returning data with the same 
schema as is then used to construct the RecordBatch. Where does the schema 
provided to the RecordBatch constructor come from? IMO either this schema needs 
to be updated to match what spark is actually returning, or the spark code 
needs to be updated to return the expected schema (e.g. by coercing on output).
   
   Apache Spark expects `element` to come back. This is a hardcoded value, the 
same way `item` is in Arrow-rs, and Apache Spark users rely on this naming; 
changing it will break Apache Spark users' queries.
   
   For now, the specific case is the `make_array` function:
   
https://github.com/apache/datafusion/blob/8f3f70877febaa79be3349875e979d3a6e65c30e/datafusion/functions-nested/src/make_array.rs#L278
   
   In this particular code the column array's list field is created as `item` 
although the schema says `element`, and at this point there is no 
`SessionContext` from which we could read external params and parametrize the 
list type with a specific field name via `Field::new`. It is possible to do 
this in DataFusion, although it is going to be a huge change to cover all array 
functions, so the first idea was to see whether it is possible to have a 
solution in arrow-rs.
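
   A minimal sketch of why the `item` vs `element` name matters, and what 
"coercing on output" could look like. This is a std-only toy model, not the 
arrow-rs API: the `DataType` and `Field` types below only imitate the shape of 
their arrow-rs namesakes, and `list_of`, `rename_list_field` and `check_column` 
are hypothetical helpers introduced for illustration.

```rust
// Toy model, NOT the arrow-rs API: just enough structure to show that a
// list's inner field name takes part in type equality.

#[derive(Debug, Clone, PartialEq)]
enum DataType {
    Int32,
    List(Box<Field>),
}

#[derive(Debug, Clone, PartialEq)]
struct Field {
    name: String,
    data_type: DataType,
    nullable: bool,
}

impl Field {
    fn new(name: &str, data_type: DataType, nullable: bool) -> Self {
        Field { name: name.to_string(), data_type, nullable }
    }
}

/// Build a list type whose inner field carries the given name.
fn list_of(inner_name: &str, element_type: DataType) -> DataType {
    DataType::List(Box::new(Field::new(inner_name, element_type, true)))
}

/// Hypothetical "coerce on output" step: rename the inner field of a list
/// type (recursing into nested lists) and leave every other type untouched.
fn rename_list_field(data_type: &DataType, new_name: &str) -> DataType {
    match data_type {
        DataType::List(inner) => DataType::List(Box::new(Field::new(
            new_name,
            rename_list_field(&inner.data_type, new_name),
            inner.nullable,
        ))),
        other => other.clone(),
    }
}

/// Imitates a schema check at batch construction: the column type must equal
/// the declared type exactly, nested field names included.
fn check_column(declared: &DataType, actual: &DataType) -> Result<(), String> {
    if declared == actual {
        Ok(())
    } else {
        Err(format!(
            "column type mismatch: expected {:?} but found {:?}",
            declared, actual
        ))
    }
}
```

   In this model, `check_column(&list_of("element", DataType::Int32), 
&list_of("item", DataType::Int32))` returns an `Err`, while passing the second 
argument through `rename_list_field(.., "element")` first makes the check 
succeed. Doing the equivalent for every array function is the "huge change" 
mentioned above.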


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

