pepijnve commented on code in PR #17839:
URL: https://github.com/apache/datafusion/pull/17839#discussion_r2402499183
##########
datafusion/sqllogictest/test_files/string/string_view.slt:
##########
@@ -784,7 +784,7 @@ EXPLAIN SELECT
FROM test;
----
logical_plan
-01)Projection: regexp_like(test.column1_utf8view,
Utf8("^https?://(?:www\.)?([^/]+)/.*$")) AS k
+01)Projection: test.column1_utf8view ~
Utf8View("^https?://(?:www\.)?([^/]+)/.*$") AS k
Review Comment:
See https://github.com/apache/datafusion/issues/17838#issuecomment-3355083929
The operator logic is in `physical_expr`, while `regexp_like` lives in
`functions`. We would probably have to move the common logic to a separate
crate. This PR was intended as a stopgap solution for common cases.
We can only rewrite in some cases because of the optional `flags` argument.
With the operators, all you can control is case sensitivity (i.e. the `i` flag).
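
To make the `flags` restriction concrete, here is a minimal sketch of the decision. It is not the code in this PR. `RegexOp` and `operator_for_flags` are hypothetical names, and the enum is only a stand-in for the regex-match operators. It assumes the flags argument, when present, is a string literal.

```rust
/// Hypothetical stand-in for the two positive regex-match operators.
#[derive(Debug, PartialEq, Eq, Clone, Copy)]
pub enum RegexOp {
    /// Case-sensitive match (`~`).
    Match,
    /// Case-insensitive match (`~*`).
    IMatch,
}

/// Returns the operator equivalent to a `regexp_like` call with the given
/// literal flags, or `None` when no operator can express those flags and
/// the function call has to stay as it is.
pub fn operator_for_flags(flags: Option<&str>) -> Option<RegexOp> {
    match flags {
        // No flags argument: plain case-sensitive match.
        None => Some(RegexOp::Match),
        // Only the `i` flag has an operator counterpart.
        Some("i") => Some(RegexOp::IMatch),
        // Any other flag string cannot be expressed with an operator.
        Some(_) => None,
    }
}
```

With this sketch, a call with flags `"im"` yields `None`, so it would be left as a `regexp_like` call.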
The operator is more efficient because it uses the
`regexp_is_match_scalar` kernel when it can, while `regexp_like` always uses
`regexp_is_match`. `regexp_is_match` does maintain a cache of compiled regexes,
so at least the pattern isn't compiled over and over again, but it still
executes quite a bit more code than `regexp_is_match_scalar`.
Additionally, there's a regular expression simplification rule that only
operates on `BinaryExpr` with one of the regex matching operators. The
transformation here enables that optimisation for `regexp_like` calls as well.
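
As an illustration of the kind of rewrite a regex simplification rule can perform once the expression is a `BinaryExpr`, here is a minimal sketch. It is an assumption for illustration, not the rule's actual implementation. `anchored_literal` is a hypothetical helper: it detects a pattern of the form `^literal$` so that a match against it could be replaced by a plain equality comparison.

```rust
/// Illustrative only: if `pattern` is `^literal$` with no regex
/// metacharacters between the anchors, return the literal, so that
/// `col ~ pattern` could be rewritten as `col = literal`.
pub fn anchored_literal(pattern: &str) -> Option<&str> {
    // Require both anchors; bail out with `None` if either is missing.
    let inner = pattern.strip_prefix('^')?.strip_suffix('$')?;
    // Any metacharacter means the pattern is not a plain literal.
    let is_meta = |c: char| r"\.+*?()|[]{}^$".contains(c);
    if inner.chars().any(is_meta) {
        None
    } else {
        Some(inner)
    }
}
```

The pattern in the test above contains `?`, `(` and `[`, so this particular sketch would leave it alone.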
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]