Csaba Ringhofer has posted comments on this change. ( http://gerrit.cloudera.org:8080/22603 )
Change subject: WIP IMPALA-10349: Support constant folding for non ascii strings ...................................................................... Patch Set 6: (3 comments) The change is still WIP, I didn't want to finalize things before agreeing on storing with String vs byte[] >Supprised that we don't need BE changes. Thrift's binary and string are both std::string in c++, so luckily no change was needed http://gerrit.cloudera.org:8080/#/c/22603/5/fe/src/main/java/org/apache/impala/analysis/StringLiteral.java File fe/src/main/java/org/apache/impala/analysis/StringLiteral.java: http://gerrit.cloudera.org:8080/#/c/22603/5/fe/src/main/java/org/apache/impala/analysis/StringLiteral.java@69 PS5, Line 69: binValue_ = StringUtils.toUtf8Array(value); > Can we just generate this byte array in need? StringLiteral is also used in Maybe we could keep the byte array, and generate the String when needed? This could decrease the memory footprint, as in most case 1 byte per char would be enough instead of 2. If I see correctly we keep the strings in unescaped format. https://github.com/apache/impala/blob/32836eab4a42541e44d95435f815a5d0cf9d3b24/fe/src/main/java/org/apache/impala/catalog/HdfsPartition.java#L1280 http://gerrit.cloudera.org:8080/#/c/22603/5/fe/src/main/java/org/apache/impala/util/StringUtils.java File fe/src/main/java/org/apache/impala/util/StringUtils.java: http://gerrit.cloudera.org:8080/#/c/22603/5/fe/src/main/java/org/apache/impala/util/StringUtils.java@28 PS5, Line 28: class St > Don't need "abstract". Done http://gerrit.cloudera.org:8080/#/c/22603/5/testdata/workloads/functional-planner/queries/PlannerTest/constant-folding.test File testdata/workloads/functional-planner/queries/PlannerTest/constant-folding.test: http://gerrit.cloudera.org:8080/#/c/22603/5/testdata/workloads/functional-planner/queries/PlannerTest/constant-folding.test@488 PS5, Line 488: concat("a", "รก") > Can we add tests for UTF-8 aware string functions in both UTF-8 and non-UTF done, added a test that use a few utf8 aware functions -- To view, visit http://gerrit.cloudera.org:8080/22603 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I70663457a0b0a3443e586350f0a5996bb75ba64a Gerrit-Change-Number: 22603 Gerrit-PatchSet: 6 Gerrit-Owner: Csaba Ringhofer <csringho...@cloudera.com> Gerrit-Reviewer: Csaba Ringhofer <csringho...@cloudera.com> Gerrit-Reviewer: Impala Public Jenkins <impala-public-jenk...@cloudera.com> Gerrit-Reviewer: Quanlong Huang <huangquanl...@gmail.com> Gerrit-Comment-Date: Sat, 05 Apr 2025 10:40:48 +0000 Gerrit-HasComments: Yes