alamb opened a new issue, #12119:
URL: https://github.com/apache/datafusion/issues/12119

   ### Is your feature request related to a problem or challenge?
   
   Part of https://github.com/apache/datafusion/issues/11752
   
   We are trying to change DataFusion to use `StringViewArray` by default when 
reading parquet (and in other places where it makes sense, such as the 
`substr` function). StringView enables many interesting optimization 
opportunities. However, StringView is still being adopted across the rest of 
the arrow ecosystem, so if DataFusion begins to emit `StringViewArray` in some 
places, it may cause issues with other parts of the ecosystem (e.g. flight 
clients may not be able to interpret data sent by a server using DataFusion).
   
   
   ### Describe the solution you'd like
   
   I would like DataFusion to retain maximum compatibility at its interfaces, 
but be able to use `StringViewArray` internally when it improves performance.
   
   ### Describe alternatives you've considered
   
   I recommend a config flag that makes it possible to convert 
`Utf8View`/`BinaryView` --> `Utf8` / `Binary` at the query output and I think 
this conversion should be done by default. 
   
   For example we might add this configuration flag:
   
   
   ```
   datafusion.optimizer.expand_views_at_output=true
   ```
   
   If this flag is true, 
   1. add code in the Analyzer (maybe in the TypeCoercion code)
   2. check the output columns of a plan, and if any are `DataType::Utf8View` 
or `DataType::BinaryView`, add a `ProjectionExec` that converts them to 
Utf8/Binary (by adding a cast to `DataType::Utf8` or `DataType::Binary` 
respectively)
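   
   To make the two steps above concrete, here is a minimal, std-only sketch 
of the rewrite logic. It is a toy model, not DataFusion code: the `DataType` 
and `Expr` enums are simplified stand-ins for the real arrow / DataFusion 
types, and the function names `expanded_type` and `expand_views_at_output` 
are hypothetical names chosen for illustration.
   
   ```rust
   /// Toy stand-in for arrow's `DataType` (assumption: only the variants needed here).
   #[allow(dead_code)]
   #[derive(Debug, Clone, PartialEq)]
   enum DataType {
       Int64,
       Utf8,
       Utf8View,
       Binary,
       BinaryView,
   }

   /// Toy output expression: a bare column, or a cast wrapping another expression.
   #[allow(dead_code)]
   #[derive(Debug, Clone, PartialEq)]
   enum Expr {
       Column(String),
       Cast { input: Box<Expr>, to: DataType },
   }

   /// Returns the non-view type that a view type should be expanded to, if any.
   #[allow(dead_code)]
   fn expanded_type(dt: &DataType) -> Option<DataType> {
       match dt {
           DataType::Utf8View => Some(DataType::Utf8),
           DataType::BinaryView => Some(DataType::Binary),
           _ => None,
       }
   }

   /// Builds the projection list to place on top of the plan output.
   /// Returns `None` when the flag is off or when no output column is a view
   /// type, meaning no extra projection is needed.
   #[allow(dead_code)]
   fn expand_views_at_output(
       schema: &[(String, DataType)],
       enabled: bool,
   ) -> Option<Vec<Expr>> {
       if !enabled {
           return None;
       }
       let mut changed = false;
       let exprs: Vec<Expr> = schema
           .iter()
           .map(|(name, dt)| {
               let col = Expr::Column(name.clone());
               match expanded_type(dt) {
                   // View column: wrap it in a cast to the non-view type.
                   Some(to) => {
                       changed = true;
                       Expr::Cast { input: Box::new(col), to }
                   }
                   // Any other column passes through untouched.
                   None => col,
               }
           })
           .collect();
       if changed { Some(exprs) } else { None }
   }
   ```
   
   With an output schema of `id: Int64, name: Utf8View`, this sketch would 
leave `id` as a plain column and wrap `name` in a cast to `Utf8`. Returning 
`None` when nothing changes means plans without view columns get no extra 
projection.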
   
   
   
   
   
   ### Additional context
   
   We already have to do something similar in flight with dictionary arrays


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

