alamb opened a new issue, #12180:
URL: https://github.com/apache/datafusion/issues/12180

   ### Is your feature request related to a problem or challenge?
   
   Part of https://github.com/apache/datafusion/issues/11752
   
   As we work to complete StringView support in DataFusion, @2010YOUY01 noticed in https://github.com/apache/datafusion/issues/11752#issuecomment-2308176932 that we don't currently support the regexp-like binary operators (https://datafusion.apache.org/user-guide/sql/operators.html#op-re-match) for StringView columns.
   
   
   Reproducer
   
   ```sql
   CREATE TABLE t0(v0 DOUBLE, v1 DOUBLE, v2 BOOLEAN, v3 BOOLEAN, v4 BOOLEAN, v5 STRING);
   INSERT INTO t0(v1, v5, v2) VALUES (0.7183242196192607, 'Tn', true);
   CREATE TABLE t0_stringview AS SELECT v0, v1, v2, v3, v4, arrow_cast(v5, 'Utf8View') as v5 FROM t0;
   ```
    
   ```sql
   > select v5 ~ 'foo' from t0_stringview;
   Internal error: Data type Utf8View not supported for binary_string_array_flag_op_scalar operation 'regexp_is_match' on string array.
   This was likely caused by a bug in DataFusion's code and we would welcome that you file an bug report in our issue tracker
   ```
   
   ```sql
   > select regexp_match(v5, 'foo') from t0_stringview;
   +--------------------------------------------+
   | regexp_match(t0_stringview.v5,Utf8("foo")) |
   +--------------------------------------------+
   |                                            |
   +--------------------------------------------+
   1 row(s) fetched.
   Elapsed 0.034 seconds.
   ```
   
   ### Describe the solution you'd like
   
   StringView (`Utf8View`) inputs should be supported by these operators, just as `Utf8` inputs are.
   
   ### Describe alternatives you've considered
   
   Here are the relevant operator names:
   
   ```rust
   | Operator::RegexMatch
   | Operator::RegexIMatch
   | Operator::RegexNotMatch
   | Operator::RegexNotIMatch
   | Operator::LikeMatch
   | Operator::ILikeMatch
   | Operator::NotLikeMatch
   | Operator::NotILikeMatch
   ```
   
   Here is the dispatch code:
   
   
https://github.com/apache/datafusion/blob/0f96af5b500efff72314f840a59a736787cc3def/datafusion/physical-expr/src/expressions/binary.rs#L621-L632
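The eight operators differ only in the kind of match (regex vs. `LIKE`), whether the result is negated, and whether matching is case-insensitive, so StringView support has to cover every combination. As a rough, std-only sketch, the decomposition looks like this; `Op`, `MatchKind`, and `flags_for` are hypothetical local stand-ins, not DataFusion's `Operator` type or its actual dispatch code:

```rust
// Hypothetical local mirror of the eight operators listed above.
// This is NOT datafusion_expr::Operator; it only models the dispatch shape.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Op {
    RegexMatch,     // ~
    RegexIMatch,    // ~*
    RegexNotMatch,  // !~
    RegexNotIMatch, // !~*
    LikeMatch,      // ~~
    ILikeMatch,     // ~~*
    NotLikeMatch,   // !~~
    NotILikeMatch,  // !~~*
}

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum MatchKind {
    Regex,
    Like,
}

/// Splits an operator into (kind, negated, case_insensitive).
fn flags_for(op: Op) -> (MatchKind, bool, bool) {
    use MatchKind::*;
    match op {
        Op::RegexMatch => (Regex, false, false),
        Op::RegexIMatch => (Regex, false, true),
        Op::RegexNotMatch => (Regex, true, false),
        Op::RegexNotIMatch => (Regex, true, true),
        Op::LikeMatch => (Like, false, false),
        Op::ILikeMatch => (Like, false, true),
        Op::NotLikeMatch => (Like, true, false),
        Op::NotILikeMatch => (Like, true, true),
    }
}

fn main() {
    // `v5 ~ 'foo'` from the reproducer: plain, case-sensitive regex match.
    assert_eq!(flags_for(Op::RegexMatch), (MatchKind::Regex, false, false));
    // `!~*`: negated and case-insensitive regex match.
    assert_eq!(flags_for(Op::RegexNotIMatch), (MatchKind::Regex, true, true));
    println!("dispatch sketch ok");
}
```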
   
   It appears that the corresponding arrow-rs kernel does not yet have support for StringView:
   https://docs.rs/arrow-string/52.2.0/src/arrow_string/regexp.rs.html#307-311
   
   So what I would suggest is:
   1. Implement a PR in DataFusion that coerces Utf8View arguments to Utf8 for these operators (i.e., casts the arguments back to regular strings).
   2. File an upstream ticket in arrow-rs for supporting StringView in the regexp-like kernels, and leave a link to that ticket in the DataFusion code.
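Step 1 amounts to a type-coercion rule: whenever either operand of one of these operators is `Utf8View`, both sides are cast to `Utf8` before the existing kernel runs. A minimal std-only sketch follows; `DType` and `regex_coercion` are hypothetical names (not DataFusion's coercion API), and `LargeUtf8` handling is omitted for brevity.

```rust
// Hypothetical stand-in for the relevant arrow DataType variants.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum DType {
    Utf8,
    Utf8View,
    Int64,
}

/// Returns the type both operands should be cast to before calling the
/// regexp / LIKE kernel, or None if the operands cannot be coerced.
fn regex_coercion(lhs: DType, rhs: DType) -> Option<DType> {
    use DType::*;
    match (lhs, rhs) {
        // Already plain strings: the existing kernel handles this directly.
        (Utf8, Utf8) => Some(Utf8),
        // Any Utf8View operand is cast back to Utf8 because the arrow-rs
        // kernel does not (yet) accept StringView input.
        // TODO: link the upstream arrow-rs ticket here once it is filed.
        (Utf8View, Utf8) | (Utf8, Utf8View) | (Utf8View, Utf8View) => Some(Utf8),
        // Non-string operands: no valid coercion for these operators.
        _ => None,
    }
}

fn main() {
    // Reproducer: `v5 ~ 'foo'` with a Utf8View column and a Utf8 literal.
    assert_eq!(regex_coercion(DType::Utf8View, DType::Utf8), Some(DType::Utf8));
    // Non-string operands are rejected.
    assert_eq!(regex_coercion(DType::Int64, DType::Utf8), None);
    println!("coercion sketch ok");
}
```

Once arrow-rs gains native StringView support in these kernels, the `Utf8View` arm could be dropped so the data is no longer copied by the cast.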
   
   ### Additional context
   
   _No response_

