mbutrovich commented on PR #17195:
URL: https://github.com/apache/datafusion/pull/17195#issuecomment-3271252630

   > Can you explain what is happening in Comet? For example, is `regexp_replace` 
being called with `regexp_replace(Utf8View, Utf8View, Utf8View)`?
   
   In my experimental branch adding `StringView` support to Comet, we need a 
way to represent string literals during serialization from the Spark side to 
DataFusion. Currently all string literals come over as `Utf8` and that just 
works. However, with `Utf8View` columns coming out of the Parquet reader, Arrow 
complains about not being able to evaluate filter expressions with mismatched 
types. I changed all string literals to be `Utf8View`, which doesn't really 
change anything underneath for single `ScalarValue`s. Now, however, I 
have problems with functions like `regexp_replace` which expect literals to be 
`Utf8`. Since Comet does not use DataFusion's front-end, we don't get the cast 
operations inserted into the plan that the signature logic is designed for.
   
   I am increasingly of the mind that Comet needs to start doing some passes 
over the physical plan, and type coercion like this might be one reason.
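
   To make the idea concrete, here is a minimal sketch of what such a coercion 
pass could look like. It is a toy model, not DataFusion's or Comet's API: the 
`DataType`, `Expr`, `expected_arg_type`, and `coerce` names are hypothetical 
stand-ins, and the assumption that `regexp_replace` wants `Utf8` for its 
pattern and replacement arguments is hard-coded purely for illustration.

```rust
// Toy model only: these types are hypothetical stand-ins, not DataFusion's API.
#[derive(Debug, Clone, Copy, PartialEq)]
enum DataType {
    Utf8,
    Utf8View,
}

#[derive(Debug, Clone, PartialEq)]
enum Expr {
    Column { name: String, data_type: DataType },
    Literal { value: String, data_type: DataType },
    Cast { input: Box<Expr>, to: DataType },
    ScalarFunction { name: String, args: Vec<Expr> },
}

impl Expr {
    fn data_type(&self) -> DataType {
        match self {
            Expr::Column { data_type, .. } | Expr::Literal { data_type, .. } => *data_type,
            Expr::Cast { to, .. } => *to,
            // Assumption for this sketch: a function returns its first argument's type.
            Expr::ScalarFunction { args, .. } => args
                .first()
                .map(|a| a.data_type())
                .unwrap_or(DataType::Utf8),
        }
    }
}

// Hypothetical signature table: which type a function expects at a given
// argument position. `None` means "accept whatever type arrives".
fn expected_arg_type(func: &str, index: usize) -> Option<DataType> {
    match (func, index) {
        ("regexp_replace", 1) | ("regexp_replace", 2) => Some(DataType::Utf8),
        _ => None,
    }
}

// Walk the expression tree bottom-up and wrap any argument whose type does
// not match the expected one in an explicit cast.
fn coerce(expr: Expr) -> Expr {
    match expr {
        Expr::ScalarFunction { name, args } => {
            let args: Vec<Expr> = args
                .into_iter()
                .enumerate()
                .map(|(i, arg)| {
                    let arg = coerce(arg);
                    match expected_arg_type(&name, i) {
                        Some(want) if arg.data_type() != want => Expr::Cast {
                            input: Box::new(arg),
                            to: want,
                        },
                        _ => arg,
                    }
                })
                .collect();
            Expr::ScalarFunction { name, args }
        }
        Expr::Cast { input, to } => Expr::Cast {
            input: Box::new(coerce(*input)),
            to,
        },
        other => other,
    }
}
```

   In this model, calling `coerce` on `regexp_replace(col, lit, lit)` with all 
three arguments typed `Utf8View` leaves the column untouched and wraps the two 
literals in casts to `Utf8`, which is roughly what the front-end's signature 
logic would otherwise have inserted.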
   
   I think this PR is good to go, but I'm also okay if we decide it's needless 
complexity.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

