pepijnve commented on code in PR #17839:
URL: https://github.com/apache/datafusion/pull/17839#discussion_r2402499183


##########
datafusion/sqllogictest/test_files/string/string_view.slt:
##########
@@ -784,7 +784,7 @@ EXPLAIN SELECT
 FROM test;
 ----
 logical_plan
-01)Projection: regexp_like(test.column1_utf8view, Utf8("^https?://(?:www\.)?([^/]+)/.*$")) AS k
+01)Projection: test.column1_utf8view ~ Utf8View("^https?://(?:www\.)?([^/]+)/.*$") AS k

Review Comment:
   See https://github.com/apache/datafusion/issues/17838#issuecomment-3355083929
   
   The operator logic is in `physical_expr`, while `regexp_like` lives in 
`functions`. Sharing a single implementation would probably require moving the 
common logic to a separate crate. This PR is intended as a stopgap solution for 
the common cases.
   
   We can only rewrite some calls because of the optional `flags` argument. 
The operators can only express case sensitivity (i.e. the `i` flag).
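
   To make the constraint concrete, here is a minimal sketch of the decision, not the code in this PR. `RegexOp` and `operator_for_flags` are hypothetical names, and the sketch assumes the `flags` argument is either absent or a known string literal. It conservatively refuses anything other than "no flags" or exactly `i`.

```rust
/// Local stand-in for the regex match operators; not imported from DataFusion.
#[derive(Debug, PartialEq, Clone, Copy)]
enum RegexOp {
    /// Case-sensitive match, written `~` in SQL.
    Match,
    /// Case-insensitive match, written `~*` in SQL.
    IMatch,
}

/// Returns the operator that could stand in for
/// `regexp_like(value, pattern[, flags])`, or `None` when the flags
/// cannot be expressed by an operator and the call must stay as it is.
fn operator_for_flags(flags: Option<&str>) -> Option<RegexOp> {
    match flags {
        // No flags argument: plain case-sensitive match.
        None => Some(RegexOp::Match),
        // Only the case-insensitivity flag has an operator equivalent.
        Some("i") => Some(RegexOp::IMatch),
        // Anything else (for example "im") has no operator form.
        Some(_) => None,
    }
}
```

   With this shape, `regexp_like(c, p)` maps to `c ~ p`, `regexp_like(c, p, 'i')` maps to `c ~* p`, and every other flag combination is left untouched.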
   
   The operator is more efficient because it uses the 
`regexp_is_match_scalar` kernel when it can, while `regexp_like` always uses 
`regexp_is_match`. `regexp_is_match` does maintain a cache of compiled regexes, 
so at least the pattern isn't compiled over and over again, but it's still 
quite a bit more code to run compared to `regexp_is_match_scalar`.
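
   The difference between the two paths can be sketched roughly as follows. This is a simplified model, not the arrow kernels themselves: `Compiled` is a hypothetical stand-in whose "matching" is a plain substring test, and the `compilations` counter exists only to show how often a pattern is compiled.

```rust
use std::collections::HashMap;

/// Stand-in for a compiled regex. "Compiling" only stores the pattern,
/// and matching is a substring test, which is enough for the sketch.
struct Compiled(String);

impl Compiled {
    fn new(pattern: &str, compilations: &mut usize) -> Self {
        *compilations += 1;
        Compiled(pattern.to_string())
    }

    fn is_match(&self, value: &str) -> bool {
        value.contains(self.0.as_str())
    }
}

/// Scalar-pattern path: compile once up front, then test every value.
fn is_match_scalar(values: &[&str], pattern: &str, compilations: &mut usize) -> Vec<bool> {
    let compiled = Compiled::new(pattern, compilations);
    values.iter().map(|v| compiled.is_match(v)).collect()
}

/// Array-pattern path: one pattern per row. A cache avoids recompiling
/// repeated patterns, but every row still pays for a hash lookup.
fn is_match_array(values: &[&str], patterns: &[&str], compilations: &mut usize) -> Vec<bool> {
    let mut cache: HashMap<String, Compiled> = HashMap::new();
    let mut out = Vec::with_capacity(values.len());
    for (value, pattern) in values.iter().zip(patterns.iter()) {
        if !cache.contains_key(*pattern) {
            let compiled = Compiled::new(pattern, compilations);
            cache.insert(pattern.to_string(), compiled);
        }
        out.push(cache[*pattern].is_match(value));
    }
    out
}
```

   Both functions compile a repeated pattern only once, but the array path does per-row cache bookkeeping that the scalar path skips entirely.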
   
   Additionally there's a regular expression simplification rule that only 
operates on `BinaryExpr` with one of the regex matching operators. The 
transformation here enables that optimisation for `regexp_like` calls as well.
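
   As a toy illustration of why the rewrite matters for that rule: the sketch below uses a hypothetical, heavily reduced `Expr` type rather than DataFusion's, and a made-up simplification (`x ~ '^abc$'` becomes `x = 'abc'` for purely alphanumeric bodies) that is only assumed to resemble the real one. The point is that a rule matching on the operator form never sees a function call until the call has been rewritten.

```rust
/// Reduced, hypothetical expression tree; not DataFusion's `Expr`.
#[derive(Debug, PartialEq, Clone)]
enum Expr {
    Column(String),
    Literal(String),
    /// Stands in for a `BinaryExpr` using the `~` operator.
    RegexMatch(Box<Expr>, Box<Expr>),
    Eq(Box<Expr>, Box<Expr>),
    /// Scalar function call: name and arguments.
    Call(String, Vec<Expr>),
}

/// Toy simplification that only looks at the operator form.
fn simplify_regex(expr: Expr) -> Expr {
    match expr {
        Expr::RegexMatch(left, right) => {
            if let Expr::Literal(p) = right.as_ref() {
                let body = p.strip_prefix('^').and_then(|s| s.strip_suffix('$'));
                if let Some(body) = body {
                    if !body.is_empty() && body.chars().all(|c| c.is_ascii_alphanumeric()) {
                        return Expr::Eq(left, Box::new(Expr::Literal(body.to_string())));
                    }
                }
            }
            Expr::RegexMatch(left, right)
        }
        // Function calls and everything else pass through unchanged.
        other => other,
    }
}

/// Rewrites a two-argument `regexp_like` call into the operator form.
fn rewrite_regexp_like(expr: Expr) -> Expr {
    match expr {
        Expr::Call(name, args) if name == "regexp_like" && args.len() == 2 => {
            let mut args = args.into_iter();
            let value = args.next().unwrap();
            let pattern = args.next().unwrap();
            Expr::RegexMatch(Box::new(value), Box::new(pattern))
        }
        other => other,
    }
}
```

   Applying `simplify_regex` directly to the `regexp_like` call leaves it unchanged, while applying it after `rewrite_regexp_like` yields the equality comparison.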



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

