Impala Public Jenkins has submitted this change and it was merged. ( http://gerrit.cloudera.org:8080/23190 )

Change subject: IMPALA-14237: Fix Iceberg partition values encoding
......................................................................

IMPALA-14237: Fix Iceberg partition values encoding

This patch modifies the string overload of
IcebergFunctions::TruncatePartitionTransform so that it always treats
its input as UTF-8 encoded, because the Iceberg specification states
that strings are UTF-8 encoded.

Also, for an Iceberg table, UrlEncode is no longer called in the
Hive-compatible way but in the standard way, similar to Java's
URLEncoder.encode() (which the Iceberg API also uses), to conform with
existing practice in Hive, Spark and Trino. This included a change in
the set of characters that are not escaped, so that it follows the URL
Standard's application/x-www-form-urlencoded format. [1]
Also renamed the corresponding check from ShouldNotEscape to IsUrlSafe
for better readability.

Testing:
 * add and extend e2e tests to check partitions with Unicode characters
 * add be tests to coding-util-test.cc

[1]: https://url.spec.whatwg.org/#application-x-www-form-urlencoded-percent-encode-set

Change-Id: Iabb39727f6dd49b76c918bcd6b3ec62532555755
Reviewed-on: http://gerrit.cloudera.org:8080/23190
Reviewed-by: Impala Public Jenkins <[email protected]>
Tested-by: Impala Public Jenkins <[email protected]>
---
M be/src/exec/table-sink-base.cc
M be/src/exprs/iceberg-functions-ir.cc
M be/src/util/coding-util-test.cc
M be/src/util/coding-util.cc
M testdata/workloads/functional-query/queries/QueryTest/iceberg-partition-transform-insert.test
M testdata/workloads/functional-query/queries/QueryTest/unicode-column-name.test
M tests/query_test/test_insert.py
7 files changed, 165 insertions(+), 26 deletions(-)

Approvals:
  Impala Public Jenkins: Looks good to me, approved; Verified

--
To view, visit http://gerrit.cloudera.org:8080/23190
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: merged
Gerrit-Change-Id: Iabb39727f6dd49b76c918bcd6b3ec62532555755
Gerrit-Change-Number: 23190
Gerrit-PatchSet: 12
Gerrit-Owner: Daniel Vanko <[email protected]>
Gerrit-Reviewer: Csaba Ringhofer <[email protected]>
Gerrit-Reviewer: Daniel Vanko <[email protected]>
Gerrit-Reviewer: Impala Public Jenkins <[email protected]>
Gerrit-Reviewer: Noemi Pap-Takacs <[email protected]>
Gerrit-Reviewer: Zoltan Borok-Nagy <[email protected]>
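
For readers who want a feel for the two behaviours the commit message describes, here is a rough Python sketch. It is not Impala's C++ code: the names is_url_safe, form_url_encode and truncate_utf8 are hypothetical and chosen only for illustration. It assumes the unescaped set is the one left over by the application/x-www-form-urlencoded percent-encode set (ASCII alphanumerics plus *, -, . and _), that a space becomes "+" as in Java's URLEncoder.encode(), and that truncation counts Unicode code points rather than bytes.

```python
# Illustrative sketch only; not Impala's implementation.
# Characters left unescaped, assuming the form-urlencoded rules:
# ASCII alphanumerics plus '*', '-', '.', '_'.
URL_SAFE = frozenset(
    "abcdefghijklmnopqrstuvwxyz"
    "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
    "0123456789"
    "*-._"
)


def is_url_safe(ch: str) -> bool:
    """Hypothetical analogue of the renamed check (IsUrlSafe)."""
    return ch in URL_SAFE


def form_url_encode(value: str) -> str:
    """Percent-encode every UTF-8 byte of each unsafe character.

    A space is written as '+', which is an assumption taken from
    Java's URLEncoder.encode() behaviour.
    """
    out = []
    for ch in value:
        if is_url_safe(ch):
            out.append(ch)
        elif ch == " ":
            out.append("+")
        else:
            out.extend(f"%{b:02X}" for b in ch.encode("utf-8"))
    return "".join(out)


def truncate_utf8(value: bytes, width: int) -> bytes:
    """Keep at most `width` code points of a UTF-8 encoded byte string."""
    count = 0
    for i, b in enumerate(value):
        # Bytes of the form 10xxxxxx continue the current code point;
        # any other byte starts a new one.
        if (b & 0xC0) != 0x80:
            if count == width:
                return value[:i]
            count += 1
    return value


if __name__ == "__main__":
    print(form_url_encode("árvíz 1"))
    print(truncate_utf8("árvíz".encode("utf-8"), 2).decode("utf-8"))
```

Counting code points matters because a byte-based cut could split a multi-byte character such as "á" in half and produce an invalid partition value.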
