Hello Zoltan Borok-Nagy, Csaba Ringhofer, Noemi Pap-Takacs,
Impala Public Jenkins,

I'd like you to reexamine a change. Please visit

    http://gerrit.cloudera.org:8080/23190

to look at the new patch set (#9).

Change subject: IMPALA-14237: Fix Iceberg partition values encoding
......................................................................

IMPALA-14237: Fix Iceberg partition values encoding

This patch modifies the string overload of
IcebergFunctions::TruncatePartitionTransform so that it always treats
strings as UTF-8 encoded, because the Iceberg specification states
that strings are UTF-8 encoded.
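
As a rough illustration of why the encoding matters for truncation, the
sketch below cuts a UTF-8 byte string at a code point boundary instead of
a byte boundary. It is not the code from this patch: the function name
truncate_utf8 and its signature are hypothetical, and it assumes the input
is valid UTF-8.

```python
def truncate_utf8(data: bytes, width: int) -> bytes:
    """Keep at most `width` code points of a UTF-8 encoded byte string.

    Hypothetical helper for illustration only; assumes `data` is valid UTF-8.
    """
    count = 0
    for i, b in enumerate(data):
        # Continuation bytes look like 10xxxxxx; any other byte starts a
        # new code point.
        if (b & 0xC0) != 0x80:
            if count == width:
                # The code point starting at index i would exceed the width.
                return data[:i]
            count += 1
    return data
```

Truncating the bytes of "árvíz" to width 3 with this sketch keeps the 4
bytes of "árv", whereas a plain 3-byte cut would split the two-byte "á"
from the rest and keep only "ár".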

Also, for Iceberg tables UrlEncode is now called in the standard way
rather than the Hive-compatible way, similar to Java's
URLEncoder.encode() (which the Iceberg API also uses), to conform with
existing practice in Hive, Spark and Trino. This included a change in
the set of characters that are not escaped, to follow the URL Standard's
application/x-www-form-urlencoded format. [1] The corresponding check
was also renamed from ShouldNotEscape to IsUrlSafe for better readability.
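
The encoding rule can be sketched roughly as follows. This is an
illustration of the application/x-www-form-urlencoded behaviour that
Java's URLEncoder.encode() follows, not the code in coding-util.cc: the
Python names is_url_safe and url_encode are hypothetical, and mapping a
space to '+' is taken from URLEncoder, not from this commit message.

```python
# Bytes left unescaped by the application/x-www-form-urlencoded
# percent-encode set: ASCII alphanumerics plus '*', '-', '.' and '_'.
URL_SAFE = frozenset(
    b"ABCDEFGHIJKLMNOPQRSTUVWXYZ"
    b"abcdefghijklmnopqrstuvwxyz"
    b"0123456789*-._"
)


def is_url_safe(byte: int) -> bool:
    """Return True if the byte can be emitted without escaping."""
    return byte in URL_SAFE


def url_encode(value: str) -> str:
    """Percent-encode the UTF-8 bytes of `value` (illustrative sketch)."""
    out = []
    for b in value.encode("utf-8"):
        if is_url_safe(b):
            out.append(chr(b))
        elif b == 0x20:
            # Assumption: space becomes '+', as in Java's URLEncoder.
            out.append("+")
        else:
            out.append(f"%{b:02X}")
    return "".join(out)
```

Under these assumptions a partition value such as "á b" would be written
as "%C3%A1+b", and '~' is escaped as "%7E" because it is not in the safe
set.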

Testing:
 * add and extend e2e tests to check partitions with Unicode characters
 * add be tests to coding-util-test.cc

[1]: https://url.spec.whatwg.org/#application-x-www-form-urlencoded-percent-encode-set

Change-Id: Iabb39727f6dd49b76c918bcd6b3ec62532555755
---
M be/src/exec/table-sink-base.cc
M be/src/exprs/iceberg-functions-ir.cc
M be/src/util/coding-util-test.cc
M be/src/util/coding-util.cc
M testdata/workloads/functional-query/queries/QueryTest/iceberg-partition-transform-insert.test
M testdata/workloads/functional-query/queries/QueryTest/unicode-column-name.test
M tests/query_test/test_insert.py
7 files changed, 165 insertions(+), 26 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/90/23190/9
-- 
To view, visit http://gerrit.cloudera.org:8080/23190
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Iabb39727f6dd49b76c918bcd6b3ec62532555755
Gerrit-Change-Number: 23190
Gerrit-PatchSet: 9
Gerrit-Owner: Daniel Vanko <[email protected]>
Gerrit-Reviewer: Csaba Ringhofer <[email protected]>
Gerrit-Reviewer: Daniel Vanko <[email protected]>
Gerrit-Reviewer: Impala Public Jenkins <[email protected]>
Gerrit-Reviewer: Noemi Pap-Takacs <[email protected]>
Gerrit-Reviewer: Zoltan Borok-Nagy <[email protected]>
