Zoltan Borok-Nagy has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/23793 )

Change subject: IMPALA-14092 Part3: Enable predicate pushdown for paimon table.
......................................................................


Patch Set 8:

(9 comments)

http://gerrit.cloudera.org:8080/#/c/23793/8/fe/src/main/java/org/apache/impala/planner/paimon/ImpalaExprConverter.java
File fe/src/main/java/org/apache/impala/planner/paimon/ImpalaExprConverter.java:

http://gerrit.cloudera.org:8080/#/c/23793/8/fe/src/main/java/org/apache/impala/planner/paimon/ImpalaExprConverter.java@67
PS8, Line 67:   private static final Pattern LIKE_STARTS_OR_ENDS_WITH_PATTERN =
            :       Pattern.compile("([^%_]*)(?:%+)([^%_]*)");
            :   private static final Pattern LIKE_ENDSWITH_PATTER = 
Pattern.compile("(?:%+)([^%_]*)");
            :   private static final Pattern LIKE_STARTS_WITH_PATTERN =
            :       Pattern.compile("([^%_]*)(?:%+)");
            :   private static final Pattern LIKE_EQUALS_PATTERN = 
Pattern.compile("([^%_]*)");
            :   private static final Pattern 
LIKE_ENDS_WITH_ESCAPED_WILDCARD_PATTERN =
            :       Pattern.compile(".*\\\\%$");
Instead of all these regexes, I think it would be cleaner to write custom code 
that finds the first and last non-escaped wildcards, e.g.:

 int first = -1;
 int last = -1;
 boolean isEscaped = false;

 for (int i = 0; i < input.length(); i++) {
   char c = input.charAt(i);
   if (isEscaped) {
     // Previous character was escape, ignore this character.
     isEscaped = false;
     continue;
   }
   if (c == '\\') {
     // Activate the escape flag for the next character
     isEscaped = true;
     continue;
   }
   if (c == '%' || c == '_') {
     if (first == -1) first = i;
     last = i;
   }
 }

Then if first==-1 and last==-1, then it is an equal match.
Otherwise:

 if (first > 0) {
   builder_.startsWith(...)
 }
 if (last != -1 && last != input.length() - 1) {
   builder_.endsWith(...)

 }


http://gerrit.cloudera.org:8080/#/c/23793/8/fe/src/test/java/org/apache/impala/planner/paimon/PaimonImpalaExprConverterTest.java
File 
fe/src/test/java/org/apache/impala/planner/paimon/PaimonImpalaExprConverterTest.java:

http://gerrit.cloudera.org:8080/#/c/23793/8/fe/src/test/java/org/apache/impala/planner/paimon/PaimonImpalaExprConverterTest.java@216
PS8, Line 216:       checkAcceptableLikeExpr(
             :           String.format("select * from %s where %s regexp 
'^a.*'", TBL, col),
Can you add test for '^a*b'?


http://gerrit.cloudera.org:8080/#/c/23793/8/fe/src/test/java/org/apache/impala/planner/paimon/PaimonImpalaExprConverterTest.java@241
PS8, Line 241:           String.format("select * from %s where %s like 
'aaa%%bbb'", TBL, col),
             :           PREDICATE_OP.STARTS_ENDS_WITH, "aaa", "bbb");
Can you add test for 'aaa%bbb%ccc'? Also for 'aaa_bbb_ccc', and 'aaa_b%c_d%e_f'?


http://gerrit.cloudera.org:8080/#/c/23793/8/fe/src/test/java/org/apache/impala/planner/paimon/PaimonImpalaExprConverterTest.java@267
PS8, Line 267:       checkUnAcceptableExpr(
             :           String.format("select * from %s where %s rlike 
'^a.*.*b$'", TBL, col));
Should be PREDICATE_OP.STARTS_ENDS_WITH, "a", "b"


http://gerrit.cloudera.org:8080/#/c/23793/8/fe/src/test/java/org/apache/impala/planner/paimon/PaimonImpalaExprConverterTest.java@272
PS8, Line 272:       checkUnAcceptableExpr(
             :           String.format("select * from %s where %s like 'a_b'", 
TBL, col));
I think this should be PREDICATE_OP.STARTS_ENDS_WITH, "a", "b"


http://gerrit.cloudera.org:8080/#/c/23793/8/fe/src/test/java/org/apache/impala/planner/paimon/PaimonImpalaExprConverterTest.java@274
PS8, Line 274:       checkUnAcceptableExpr(
             :           String.format("select * from %s where %s like 'a_%%'", 
TBL, col));
This should be PREDICATE_OP.STARTS_WITH, "a"


http://gerrit.cloudera.org:8080/#/c/23793/8/fe/src/test/java/org/apache/impala/planner/paimon/PaimonImpalaExprConverterTest.java@276
PS8, Line 276:       checkUnAcceptableExpr(
             :           String.format("select * from %s where %s like '%%_a'", 
TBL, col));
This should be PREDICATE_OP.ENDS_WITH, "a"


http://gerrit.cloudera.org:8080/#/c/23793/8/fe/src/test/java/org/apache/impala/planner/paimon/PaimonImpalaExprConverterTest.java@278
PS8, Line 278:       checkUnAcceptableExpr(
             :           String.format("select * from %s where %s like 
'a_%%b'", TBL, col));
This should be PREDICATE_OP.STARTS_ENDS_WITH, "a", "b"


http://gerrit.cloudera.org:8080/#/c/23793/8/fe/src/test/java/org/apache/impala/planner/paimon/PaimonImpalaExprConverterTest.java@281
PS8, Line 281:       checkAcceptableLikeExpr(
             :           String.format("select * from %s where %s like 
'a\\\\%%b'", TBL, col),
             :           PREDICATE_OP.STARTS_ENDS_WITH, "a\\\\", "b");
I think the results of these should be verified by e2e tests on actual data.



--
To view, visit http://gerrit.cloudera.org:8080/23793
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Iee07fa35de8381121a20b2976d6424626d705483
Gerrit-Change-Number: 23793
Gerrit-PatchSet: 8
Gerrit-Owner: ji chen <[email protected]>
Gerrit-Reviewer: Impala Public Jenkins <[email protected]>
Gerrit-Reviewer: Riza Suminto <[email protected]>
Gerrit-Reviewer: Steve Carlin <[email protected]>
Gerrit-Reviewer: Zoltan Borok-Nagy <[email protected]>
Gerrit-Reviewer: ji chen <[email protected]>
Gerrit-Comment-Date: Fri, 20 Feb 2026 16:32:14 +0000
Gerrit-HasComments: Yes

Reply via email to