Jackie-Jiang commented on code in PR #16276:
URL: https://github.com/apache/pinot/pull/16276#discussion_r2216583751


##########
pinot-common/src/main/java/org/apache/pinot/common/function/scalar/regexp/RegexpLikeConstFunctions.java:
##########
@@ -40,6 +41,37 @@ public boolean regexpLike(String inputStr, String 
regexPatternStr) {
     return _matcher.reset(inputStr).find();
   }
 
+  @ScalarFunction
+  public boolean regexpLike(String inputStr, String regexPatternStr, String 
matchParameter) {
+    if (_matcher == null) {
+      _matcher = buildPattern(regexPatternStr, matchParameter).matcher("");
+    }
+
+    return _matcher.reset(inputStr).find();
+  }
+
+  private Pattern buildPattern(String pattern, String matchParameter) {

Review Comment:
   Use `RegexpPatternConverterUtils`



##########
pinot-common/src/main/java/org/apache/pinot/common/request/context/predicate/RegexpLikePredicate.java:
##########
@@ -20,19 +20,28 @@
 
 import java.util.Objects;
 import org.apache.pinot.common.request.context.ExpressionContext;
+import org.apache.pinot.common.utils.RegexpPatternConverterUtils;
 import org.apache.pinot.common.utils.regex.Pattern;
 import org.apache.pinot.common.utils.regex.PatternFactory;
 
 /**
- * Predicate for REGEXP_LIKE.
+ * Predicate for REGEXP_LIKE with optional match parameters
  */
 public class RegexpLikePredicate extends BasePredicate {
   private final String _value;
+  private final Boolean _caseInsensitive;

Review Comment:
   ```suggestion
     private final boolean _caseInsensitive;
   ```



##########
pinot-segment-local/src/main/java/org/apache/pinot/segment/local/utils/fst/RegexpMatcher.java:
##########
@@ -22,36 +22,48 @@
 import java.util.ArrayList;
 import java.util.Collections;
 import java.util.List;
+import org.apache.lucene.util.BytesRef;
 import org.apache.lucene.util.IntsRefBuilder;
 import org.apache.lucene.util.automaton.Automaton;
 import org.apache.lucene.util.automaton.CharacterRunAutomaton;
 import org.apache.lucene.util.automaton.RegExp;
 import org.apache.lucene.util.automaton.Transition;
 import org.apache.lucene.util.fst.FST;
 import org.apache.lucene.util.fst.Util;
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
 
 
 /**
  * RegexpMatcher is a helper to retrieve matching values for a given regexp 
query.
  * Regexp query is converted into an automaton and we run the matching 
algorithm on FST.
+ * Supports both FST<Long> (case-sensitive) and FST<BytesRef> 
(case-insensitive) types.
  *
  * Two main functions of this class are
  *   regexMatchOnFST() Function runs matching on FST (See function comments 
for more details)
  *   match(input) Function builds the automaton and matches given input.
  */
 public class RegexpMatcher {
+  private static final Logger LOGGER = 
LoggerFactory.getLogger(RegexpMatcher.class);

Review Comment:
   (minor) Not used



##########
pinot-segment-local/src/main/java/org/apache/pinot/segment/local/utils/fst/RegexpMatcher.java:
##########
@@ -22,36 +22,48 @@
 import java.util.ArrayList;
 import java.util.Collections;
 import java.util.List;
+import org.apache.lucene.util.BytesRef;
 import org.apache.lucene.util.IntsRefBuilder;
 import org.apache.lucene.util.automaton.Automaton;
 import org.apache.lucene.util.automaton.CharacterRunAutomaton;
 import org.apache.lucene.util.automaton.RegExp;
 import org.apache.lucene.util.automaton.Transition;
 import org.apache.lucene.util.fst.FST;
 import org.apache.lucene.util.fst.Util;
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
 
 
 /**
  * RegexpMatcher is a helper to retrieve matching values for a given regexp 
query.
  * Regexp query is converted into an automaton and we run the matching 
algorithm on FST.
+ * Supports both FST<Long> (case-sensitive) and FST<BytesRef> 
(case-insensitive) types.
  *
  * Two main functions of this class are
  *   regexMatchOnFST() Function runs matching on FST (See function comments 
for more details)
  *   match(input) Function builds the automaton and matches given input.
  */
 public class RegexpMatcher {
+  private static final Logger LOGGER = 
LoggerFactory.getLogger(RegexpMatcher.class);
 
   private final String _regexQuery;
-  private final FST<Long> _fst;
+  private final FST<?> _fst;
   private final Automaton _automaton;
+  private final boolean _isBytesRefFST;
 
-  public RegexpMatcher(String regexQuery, FST<Long> fst) {
-    _regexQuery = regexQuery;
+  public RegexpMatcher(String regexQuery, FST<?> fst) {

Review Comment:
   Suggest adding a separate `RegexpMatcherCaseInsensitive` given the logic is 
not sharable



##########
pinot-core/src/main/java/org/apache/pinot/core/operator/filter/FilterOperatorUtils.java:
##########
@@ -106,6 +107,18 @@ public BaseFilterOperator 
getLeafFilterOperator(QueryContext queryContext, Predi
         }
         return new ScanBasedFilterOperator(queryContext, predicateEvaluator, 
dataSource, numDocs);
       } else if (predicateType == Predicate.Type.REGEXP_LIKE) {
+        // Check if case-insensitive flag is present
+        RegexpLikePredicate regexpLikePredicate = (RegexpLikePredicate) 
predicateEvaluator.getPredicate();
+        boolean isCaseInsensitive = regexpLikePredicate.isCaseInsensitive();
+
+        if (isCaseInsensitive && dataSource.getIFSTIndex() != null && 
dataSource.getDataSourceMetadata().isSorted()

Review Comment:
   (MAJOR) Add a if check for `caseInsensitive` first, then check index 
existence. Current logic will misuse FST index on case insensitive case



##########
pinot-common/src/main/java/org/apache/pinot/common/function/scalar/regexp/RegexpLikeVarFunctions.java:
##########
@@ -33,8 +33,21 @@ private RegexpLikeVarFunctions() {
 
   @ScalarFunction
   public static boolean regexpLikeVar(String inputStr, String regexPatternStr) 
{
-    Pattern pattern = Pattern.compile(regexPatternStr, Pattern.UNICODE_CASE | 
Pattern.CASE_INSENSITIVE);
-    return pattern.matcher(inputStr).find();
+    return Pattern.compile(regexPatternStr, 
Pattern.UNICODE_CASE).matcher(inputStr).find();

Review Comment:
   Not introduced in this PR, but any idea why this one uses UNICODE_CASE? I 
feel it should be consistent with `regexp_like`



##########
pinot-common/src/main/java/org/apache/pinot/common/function/scalar/regexp/RegexpLikeConstFunctions.java:
##########
@@ -40,6 +41,37 @@ public boolean regexpLike(String inputStr, String 
regexPatternStr) {
     return _matcher.reset(inputStr).find();
   }
 
+  @ScalarFunction
+  public boolean regexpLike(String inputStr, String regexPatternStr, String 
matchParameter) {
+    if (_matcher == null) {
+      _matcher = buildPattern(regexPatternStr, matchParameter).matcher("");
+    }
+
+    return _matcher.reset(inputStr).find();
+  }
+
+  private Pattern buildPattern(String pattern, String matchParameter) {

Review Comment:
   Should we also use `PatternFactory` to build pattern here for consistency?



##########
pinot-core/src/main/java/org/apache/pinot/core/plan/FilterPlanNode.java:
##########
@@ -296,19 +297,22 @@ private BaseFilterOperator 
constructPhysicalOperator(FilterContext filter, int n
                 return new TextMatchFilterOperator(textIndexReader, 
(TextMatchPredicate) predicate, numDocs);
               }
             case REGEXP_LIKE:
-              // FST Index is available only for rolled out segments. So, we 
use different evaluator for rolled out and
-              // consuming segments.
-              //
-              // Rolled out segments (immutable): FST Index reader is 
available use FSTBasedEvaluator
-              // else use regular flow of getting predicate evaluator.
-              //
-              // Consuming segments: When FST is enabled, use 
AutomatonBasedEvaluator so that regexp matching logic is
-              // similar to that of FSTBasedEvaluator, else use regular flow 
of getting predicate evaluator.
-              if (dataSource.getFSTIndex() != null) {
+              // Check if case-insensitive flag is present
+              RegexpLikePredicate regexpLikePredicate = (RegexpLikePredicate) 
predicate;
+              boolean isCaseInsensitive = 
regexpLikePredicate.isCaseInsensitive();
+
+              if (isCaseInsensitive && dataSource.getIFSTIndex() != null) {

Review Comment:
   (MAJOR) Same here. Add a if check on `caseInsensitive` itself. We should add 
a unit test to catch this



##########
pinot-common/src/main/java/org/apache/pinot/common/request/context/predicate/RegexpLikePredicate.java:
##########
@@ -44,13 +53,25 @@ public String getValue() {
     return _value;
   }
 
+  public boolean isCaseInsensitive() {
+    return _caseInsensitive;
+  }
+
   public Pattern getPattern() {
     if (_pattern == null) {
-      _pattern = PatternFactory.compile(_value);
+      _pattern = buildPattern(_value, _caseInsensitive);

Review Comment:
   ```suggestion
         _pattern = PatternFactory.compile(_value, _caseInsensitive);
   ```



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to