Daniel Becker has posted comments on this change. ( http://gerrit.cloudera.org:8080/21131 )
Change subject: IMPALA-11499: Refactor UrlEncode function to handle special characters ...................................................................... Patch Set 11: (17 comments) http://gerrit.cloudera.org:8080/#/c/21131/11//COMMIT_MSG Commit Message: http://gerrit.cloudera.org:8080/#/c/21131/11//COMMIT_MSG@9 PS11, Line 9: The An error - we're introducing it now. http://gerrit.cloudera.org:8080/#/c/21131/11//COMMIT_MSG@14 PS11, Line 14: was matching with matched one of the ... http://gerrit.cloudera.org:8080/#/c/21131/11//COMMIT_MSG@15 PS11, Line 15: '\u00FF' '\u00FF' is not a valid character literal in C++. "\u00FF" was included in a string, it was part of a string literal. At last I found out that it can actually be used in C++ to specify arbitrary unicode values, but of course it may be encoded on multiple bytes, see https://en.cppreference.com/w/cpp/language/escape. Also include in the commit message that including this character was probably a mistake from the beginning, it should have been '\x7F'. http://gerrit.cloudera.org:8080/#/c/21131/11//COMMIT_MSG@18 PS11, Line 18: '\xFF' See the comment above, we should explain that it was a mistake from the beginning. http://gerrit.cloudera.org:8080/#/c/21131/11//COMMIT_MSG@22 PS11, Line 22: Some It would be good to include in which file the new tests are. http://gerrit.cloudera.org:8080/#/c/21131/11//COMMIT_MSG@22 PS11, Line 22: for both parquet and iceberg Parquet and Iceberg are not exclusive things, we also use Parquet in our Iceberg tables. Parquet is a file format and Iceberg is a table format. We could say "tests on both traditional Hive tables and Iceberg tables ...". http://gerrit.cloudera.org:8080/#/c/21131/11/be/src/util/coding-util.cc File be/src/util/coding-util.cc: http://gerrit.cloudera.org:8080/#/c/21131/11/be/src/util/coding-util.cc@25 PS11, Line 25: #include <boost/algorithm/string.hpp> The includes within a group should be in alphabetical order, so this should go before L23. http://gerrit.cloudera.org:8080/#/c/21131/11/be/src/util/coding-util.cc@31 PS11, Line 31: #include "sasl/saslutil.h" Includes from the standard library should be the first group after the header, i.e. this should come after #include <sstream>. http://gerrit.cloudera.org:8080/#/c/21131/11/be/src/util/coding-util.cc@51 PS11, Line 51: '\x20' > Represents space character For those characters that have a readable literal representation, e.g. " ", "\n" etc., we should use that even if Hive doesn't. http://gerrit.cloudera.org:8080/#/c/21131/11/be/src/util/coding-util.cc@57 PS11, Line 57: Used It would be clearer like this: "uppercase" and "hex" only affect the insertion of integers, not that of char values. http://gerrit.cloudera.org:8080/#/c/21131/11/be/src/util/coding-util.cc@62 PS11, Line 62: // Hive-compat mode, and the character is not alphanumeric or is one This doesn't match the code. Try this: "... b) we are not in Hive-compat mode and the character is not alphanumeric and it is not one of the characters specifically excluded from escaping (see ShouldNotEscape())." http://gerrit.cloudera.org:8080/#/c/21131/11/testdata/workloads/functional-query/queries/QueryTest/unicode-column-name.test File testdata/workloads/functional-query/queries/QueryTest/unicode-column-name.test: http://gerrit.cloudera.org:8080/#/c/21131/11/testdata/workloads/functional-query/queries/QueryTest/unicode-column-name.test@302 PS11, Line 302: show tables; I don't think it's needed. Also, if another table is added later before this query, this will have to be changed as well. http://gerrit.cloudera.org:8080/#/c/21131/11/testdata/workloads/functional-query/queries/QueryTest/unicode-column-name.test@309 PS11, Line 309: insert into unicode_partition_values partition(p='运营业务数据') values (0); You don't need to put these inserts into separate QUERY segments. It could also go together with the select on L318. http://gerrit.cloudera.org:8080/#/c/21131/11/testdata/workloads/functional-query/queries/QueryTest/unicode-column-name.test@327 PS11, Line 327: drop table unicode_partition_values; You don't need to drop the table, the tables are created in a unique database, see test_column_unicode.py::TestColumnUnicode::test_create_unicode_table. http://gerrit.cloudera.org:8080/#/c/21131/11/testdata/workloads/functional-query/queries/QueryTest/unicode-column-name.test@337 PS11, Line 337: show tables; I don't think this is needed. http://gerrit.cloudera.org:8080/#/c/21131/11/testdata/workloads/functional-query/queries/QueryTest/unicode-column-name.test@344 PS11, Line 344: insert into unicode_partition_values_iceberg values (0, '运'); You should do the same insertions into the Iceberg table as into the other one. http://gerrit.cloudera.org:8080/#/c/21131/11/testdata/workloads/functional-query/queries/QueryTest/unicode-column-name.test@354 PS11, Line 354: drop table unicode_partition_values_iceberg; See L327. -- To view, visit http://gerrit.cloudera.org:8080/21131 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I88c4aba5d811dfcec809583d0c16fcbc0ca730fb Gerrit-Change-Number: 21131 Gerrit-PatchSet: 11 Gerrit-Owner: Anonymous Coward <[email protected]> Gerrit-Reviewer: Anonymous Coward <[email protected]> Gerrit-Reviewer: Daniel Becker <[email protected]> Gerrit-Reviewer: Impala Public Jenkins <[email protected]> Gerrit-Reviewer: Michael Smith <[email protected]> Gerrit-Reviewer: Quanlong Huang <[email protected]> Gerrit-Reviewer: Zihao Ye <[email protected]> Gerrit-Comment-Date: Mon, 29 Apr 2024 12:34:17 +0000 Gerrit-HasComments: Yes
