walterddr commented on a change in pull request #7114:
URL: https://github.com/apache/pinot/pull/7114#discussion_r709695969
##########
File path:
pinot-common/src/main/java/org/apache/pinot/common/function/scalar/StringFunctions.java
##########
@@ -179,6 +180,48 @@ public static String rtrim(String input) {
return RTRIM.matcher(input).replaceAll("");
}
+ /**
+ * @see Pattern#matches(String, CharSequence)
+ * @param value input value
+ * @param regexp regular expression
+ * @return the matched result.
+ */
+ @ScalarFunction
+ public static String regexpExtract(String value, String regexp) {
+ return regexpExtract(value, regexp, 1, "");
+ }
+
+ /**
+ * Regular expression extract that accepts starting position as argument.
+ * @param value input value
+ * @param regexp regular expression
+ * @param occurrence the specified i-th occurrence to extract
+ * @return the matched result.
+ */
+ @ScalarFunction
+ public static String regexpExtract(String value, String regexp, int occurrence) {
+ return regexpExtract(value, regexp, occurrence, "");
+ }
+
+ /**
+ * Regular expression extract that accepts starting position and i-th occurrence as argument.
+ * @param value input value
+ * @param regexp regular expression
+ * @param occurrence the specified i-th occurrence to extract
+ * @param defaultValue the default value if no match found
+ * @return the matched result
+ */
+ @ScalarFunction
+ public static String regexpExtract(String value, String regexp, int occurrence, String defaultValue) {
+ Pattern p = Pattern.compile(regexp);
+ Matcher matcher = p.matcher(value);
+ if (matcher.find()) {
+ return matcher.group(occurrence - 1);
Review comment:
I had some time to think about it. I was mixing up the Presto definition with the BigQuery definition:
- Presto defines the integer as the index of the capture group.
- BigQuery defines the integer as which occurrence of the ENTIRE regexp match to return.
What I implemented here is actually the Presto version, but I misnamed the 3rd argument "occurrence"; it should have been "groupIdx".
I think we need to take a step back and discuss exactly what we want to support. I will bring this up in the original issue.
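To make the BigQuery reading concrete, here is a minimal sketch, assuming "occurrence" means the 1-based count of matches of the whole regexp and that a missing occurrence falls back to the default value. The class and method names (`NthMatchSketch`, `regexpExtractNthMatch`) are hypothetical and not part of Pinot or this PR; the point is that the integer selects among successive `Matcher.find()` results rather than among capture groups within one match.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class NthMatchSketch {
  /**
   * Hypothetical BigQuery-style helper: returns the text of the occurrence-th (1-based)
   * match of the ENTIRE regexp in value, or defaultValue if there are fewer matches.
   */
  public static String regexpExtractNthMatch(String value, String regexp, int occurrence,
      String defaultValue) {
    if (occurrence < 1) {
      throw new IllegalArgumentException("occurrence must be >= 1, got " + occurrence);
    }
    Matcher matcher = Pattern.compile(regexp).matcher(value);
    int seen = 0;
    // Each find() advances to the next non-overlapping match of the whole pattern.
    while (matcher.find()) {
      seen++;
      if (seen == occurrence) {
        return matcher.group(); // the whole match, not a capture group
      }
    }
    return defaultValue;
  }

  public static void main(String[] args) {
    String value = "a1 b22 c333";
    System.out.println(regexpExtractNthMatch(value, "[a-z]\\d+", 1, ""));     // a1
    System.out.println(regexpExtractNthMatch(value, "[a-z]\\d+", 3, ""));     // c333
    System.out.println(regexpExtractNthMatch(value, "[a-z]\\d+", 4, "none")); // none
  }
}
```

Here the pattern `[a-z]\\d+` has no capture groups at all, yet occurrence 3 is meaningful (it selects `c333`); under the capture-group reading the same integer would refer to a group inside the first match and would be out of range.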
##########
Review comment:
I had some time to think about it. I was mixing up the Presto definition with the BigQuery definition:
- Presto defines the integer as the index of the capture group.
- BigQuery defines the integer as which occurrence of the ENTIRE regexp match to return.
What I implemented here is actually the Presto version, but I misnamed the 3rd argument "occurrence"; it should have been "captureGroupNumber".
I think we need to take a step back and discuss exactly what we want to support. I will bring this up in the original issue.
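For comparison, a minimal sketch of the Presto-style reading with the parameter renamed to `captureGroupNumber`. Assumptions, none of which come from the PR: the number is passed straight to `Matcher.group(int)`, so 0 is the whole match and 1 is the first capture group (unlike the `occurrence - 1` offset in the quoted code); no match, or an optional group that did not take part in the match, returns `defaultValue`; an out-of-range group number throws. `CaptureGroupSketch` is a hypothetical class name.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class CaptureGroupSketch {
  /**
   * Hypothetical Presto-style variant: returns capture group captureGroupNumber of the
   * FIRST match (group 0 is the entire match), or defaultValue if nothing matches.
   */
  public static String regexpExtract(String value, String regexp, int captureGroupNumber,
      String defaultValue) {
    Matcher matcher = Pattern.compile(regexp).matcher(value);
    if (!matcher.find()) {
      return defaultValue;
    }
    if (captureGroupNumber < 0 || captureGroupNumber > matcher.groupCount()) {
      throw new IllegalArgumentException("Pattern has " + matcher.groupCount()
          + " capture groups; cannot access group " + captureGroupNumber);
    }
    String group = matcher.group(captureGroupNumber);
    // group(int) returns null when an optional group did not participate in the match.
    return group != null ? group : defaultValue;
  }

  public static void main(String[] args) {
    String value = "a1 b22 c333";
    System.out.println(regexpExtract(value, "([a-z])(\\d+)", 0, ""));  // a1
    System.out.println(regexpExtract(value, "([a-z])(\\d+)", 2, ""));  // 1
    System.out.println(regexpExtract(value, "x(\\d+)", 1, "none"));    // none
  }
}
```

Only the first match is ever inspected here, so `b22` and `c333` are unreachable through this signature; that is exactly the capability the BigQuery-style occurrence argument adds.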
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.