danielhumanmod commented on code in PR #2831:
URL: https://github.com/apache/datafusion-comet/pull/2831#discussion_r2596610973
##########
spark/src/main/scala/org/apache/comet/serde/strings.scala:
##########
@@ -286,3 +286,83 @@ trait CommonStringExprs {
}
}
}
+
+object CometRegExpExtract extends CometExpressionSerde[RegExpExtract] {
+ override def getSupportLevel(expr: RegExpExtract): SupportLevel = {
+ // Check if the pattern is compatible with Spark or allow incompatible patterns
+ expr.regexp match {
+ case Literal(pattern, DataTypes.StringType) =>
+ if (!RegExp.isSupportedPattern(pattern.toString) &&
+ !CometConf.COMET_REGEXP_ALLOW_INCOMPATIBLE.get()) {
+ withInfo(
+ expr,
+ s"Regexp pattern $pattern is not compatible with Spark. " +
+ s"Set ${CometConf.COMET_REGEXP_ALLOW_INCOMPATIBLE.key}=true " +
+ "to allow it anyway.")
+ return Incompatible()
+ }
+ case _ =>
+ return Unsupported(Some("Only literal regexp patterns are supported"))
+ }
+
+ // Check if idx is a literal
+ expr.idx match {
+ case Literal(_, DataTypes.IntegerType) =>
+ Compatible()
+ case _ =>
+ Unsupported(Some("Only literal group index is supported"))
Review Comment:
Looking back at this comment: given that Spark only converts the index into an i32 in the UDF implementation, do we want to keep the behavior aligned and support only IntegerType here?
Let me know if that's acceptable and I can adjust the patch.
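For context on why an Int group index is the natural fit, here is a minimal Scala sketch of `regexp_extract` semantics on the JVM. It assumes the documented Spark behavior: an empty string when the regex or the requested group does not match, and an IllegalArgumentException when the index is out of range. `RegExpExtractSketch` and `regexpExtract` are hypothetical names. This is not Spark's or Comet's code. It only shows that the index ends up as an `Int`, because `java.util.regex.Matcher.group` takes an `int`.

```scala
import java.util.regex.Pattern

// Hypothetical, simplified model of Spark's regexp_extract semantics.
// Not Spark's or Comet's implementation; error messages are illustrative.
object RegExpExtractSketch {
  def regexpExtract(input: String, regex: String, idx: Int): String = {
    val m = Pattern.compile(regex).matcher(input)
    // Validate the index against the pattern's capturing-group count
    // (require throws IllegalArgumentException on failure).
    require(idx >= 0, s"group index cannot be negative: $idx")
    require(idx <= m.groupCount(),
      s"regex group count is ${m.groupCount()}, but group index is $idx")
    if (m.find()) {
      // A group that did not participate in the match yields null; map it to "".
      Option(m.group(idx)).getOrElse("")
    } else {
      // No match at all: return an empty string.
      ""
    }
  }

  def main(args: Array[String]): Unit = {
    println(regexpExtract("100-200", "(\\d+)-(\\d+)", 1)) // 100
    println(regexpExtract("100-200", "(\\d+)-(\\d+)", 2)) // 200
    println(regexpExtract("abc", "(\\d+)", 1).isEmpty)    // true
  }
}
```

Because the index is consumed as an `Int` here, accepting only `IntegerType` literals on the Comet side would avoid any silent narrowing from a wider type.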