Zoltan Borok-Nagy has posted comments on this change. ( http://gerrit.cloudera.org:8080/23793 )
Change subject: IMPALA-14092 Part3: Enable predicate pushdown for paimon table. ...................................................................... Patch Set 8: (9 comments) http://gerrit.cloudera.org:8080/#/c/23793/8/fe/src/main/java/org/apache/impala/planner/paimon/ImpalaExprConverter.java File fe/src/main/java/org/apache/impala/planner/paimon/ImpalaExprConverter.java: http://gerrit.cloudera.org:8080/#/c/23793/8/fe/src/main/java/org/apache/impala/planner/paimon/ImpalaExprConverter.java@67 PS8, Line 67: private static final Pattern LIKE_STARTS_OR_ENDS_WITH_PATTERN = : Pattern.compile("([^%_]*)(?:%+)([^%_]*)"); : private static final Pattern LIKE_ENDSWITH_PATTER = Pattern.compile("(?:%+)([^%_]*)"); : private static final Pattern LIKE_STARTS_WITH_PATTERN = : Pattern.compile("([^%_]*)(?:%+)"); : private static final Pattern LIKE_EQUALS_PATTERN = Pattern.compile("([^%_]*)"); : private static final Pattern LIKE_ENDS_WITH_ESCAPED_WILDCARD_PATTERN = : Pattern.compile(".*\\\\%$"); Instead of all these regexes, I think it would be cleaner to write custom code that finds the first and last non-escaped wildcards, e.g.: int first = -1; int last = -1; boolean isEscaped = false; for (int i = 0; i < input.length(); i++) { char c = input.charAt(i); if (isEscaped) { // Previous character was escape, ignore this character. isEscaped = false; continue; } if (c == '\\') { // Activate the escape flag for the next character isEscaped = true; continue; } if (c == '%' || c == '_') { if (first == -1) first = i; last = i; } } Then if first==-1 and last==-1, then it is an equal match. Otherwise: if (first > 0) { builder_.startsWith(...) } if (last != -1 && last != input.length() - 1) { builder_.endsWith(...) } http://gerrit.cloudera.org:8080/#/c/23793/8/fe/src/test/java/org/apache/impala/planner/paimon/PaimonImpalaExprConverterTest.java File fe/src/test/java/org/apache/impala/planner/paimon/PaimonImpalaExprConverterTest.java: http://gerrit.cloudera.org:8080/#/c/23793/8/fe/src/test/java/org/apache/impala/planner/paimon/PaimonImpalaExprConverterTest.java@216 PS8, Line 216: checkAcceptableLikeExpr( : String.format("select * from %s where %s regexp '^a.*'", TBL, col), Can you add test for '^a*b'? http://gerrit.cloudera.org:8080/#/c/23793/8/fe/src/test/java/org/apache/impala/planner/paimon/PaimonImpalaExprConverterTest.java@241 PS8, Line 241: String.format("select * from %s where %s like 'aaa%%bbb'", TBL, col), : PREDICATE_OP.STARTS_ENDS_WITH, "aaa", "bbb"); Can you add test for 'aaa%bbb%ccc'? Also for 'aaa_bbb_ccc', and 'aaa_b%c_d%e_f'? http://gerrit.cloudera.org:8080/#/c/23793/8/fe/src/test/java/org/apache/impala/planner/paimon/PaimonImpalaExprConverterTest.java@267 PS8, Line 267: checkUnAcceptableExpr( : String.format("select * from %s where %s rlike '^a.*.*b$'", TBL, col)); Should be PREDICATE_OP.STARTS_ENDS_WITH, "a", "b" http://gerrit.cloudera.org:8080/#/c/23793/8/fe/src/test/java/org/apache/impala/planner/paimon/PaimonImpalaExprConverterTest.java@272 PS8, Line 272: checkUnAcceptableExpr( : String.format("select * from %s where %s like 'a_b'", TBL, col)); I think this should be PREDICATE_OP.STARTS_ENDS_WITH, "a", "b" http://gerrit.cloudera.org:8080/#/c/23793/8/fe/src/test/java/org/apache/impala/planner/paimon/PaimonImpalaExprConverterTest.java@274 PS8, Line 274: checkUnAcceptableExpr( : String.format("select * from %s where %s like 'a_%%'", TBL, col)); This should be PREDICATE_OP.STARTS_WITH, "a" http://gerrit.cloudera.org:8080/#/c/23793/8/fe/src/test/java/org/apache/impala/planner/paimon/PaimonImpalaExprConverterTest.java@276 PS8, Line 276: checkUnAcceptableExpr( : String.format("select * from %s where %s like '%%_a'", TBL, col)); This should be PREDICATE_OP.ENDS_WITH, "a" http://gerrit.cloudera.org:8080/#/c/23793/8/fe/src/test/java/org/apache/impala/planner/paimon/PaimonImpalaExprConverterTest.java@278 PS8, Line 278: checkUnAcceptableExpr( : String.format("select * from %s where %s like 'a_%%b'", TBL, col)); This should be PREDICATE_OP.STARTS_ENDS_WITH, "a", "b" http://gerrit.cloudera.org:8080/#/c/23793/8/fe/src/test/java/org/apache/impala/planner/paimon/PaimonImpalaExprConverterTest.java@281 PS8, Line 281: checkAcceptableLikeExpr( : String.format("select * from %s where %s like 'a\\\\%%b'", TBL, col), : PREDICATE_OP.STARTS_ENDS_WITH, "a\\\\", "b"); I think the results of these should be verified by e2e tests on actual data. -- To view, visit http://gerrit.cloudera.org:8080/23793 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Iee07fa35de8381121a20b2976d6424626d705483 Gerrit-Change-Number: 23793 Gerrit-PatchSet: 8 Gerrit-Owner: ji chen <[email protected]> Gerrit-Reviewer: Impala Public Jenkins <[email protected]> Gerrit-Reviewer: Riza Suminto <[email protected]> Gerrit-Reviewer: Steve Carlin <[email protected]> Gerrit-Reviewer: Zoltan Borok-Nagy <[email protected]> Gerrit-Reviewer: ji chen <[email protected]> Gerrit-Comment-Date: Fri, 20 Feb 2026 16:32:14 +0000 Gerrit-HasComments: Yes
