Alexey Serbin has posted comments on this change. ( http://gerrit.cloudera.org:8080/23493 )

Change subject: IMPALA-14472: Add create/read support for ARRAY column of Kudu
......................................................................


Patch Set 2: Code-Review+1

(12 comments)

overall looks good to me, just a few questions and nits

http://gerrit.cloudera.org:8080/#/c/23493/1//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/23493/1//COMMIT_MSG@10
PS1, Line 10: intial
nit: initial


http://gerrit.cloudera.org:8080/#/c/23493/1//COMMIT_MSG@10
PS1, Line 10: support
            : to create and select Kudu table with array column type
nit: maybe replace with "support for working with Kudu tables having array type columns"


http://gerrit.cloudera.org:8080/#/c/23493/1/be/src/exec/kudu/kudu-array-inserter.cc
File be/src/exec/kudu/kudu-array-inserter.cc:

http://gerrit.cloudera.org:8080/#/c/23493/1/be/src/exec/kudu/kudu-array-inserter.cc@45
PS1, Line 45: const char* KUDU_MASTER_DEFAULT_ADDR = "localhost:7051"; // Same as in tests/conftest.py
            : const char* KUDU_TEST_TABLE_NAME = "impala::functional_kudu.kudu_array";
nit: these might be 'constexpr const char* const'


http://gerrit.cloudera.org:8080/#/c/23493/1/be/src/exec/kudu/kudu-array-inserter.cc@79
PS1, Line 79: KUDU_ASSERT_OK
nit: is it meant to be KUDU_RETURN_NOT_OK or KUDU_CHECK_OK instead?  Since this isn't running in a gtest environment, I'd rather use KUDU_RETURN_NOT_OK and make the functions return kudu::Status


http://gerrit.cloudera.org:8080/#/c/23493/1/be/src/exec/kudu/kudu-array-inserter.cc@124
PS1, Line 124: vector<KuduError*> errors;
             : bool overflowed;
             : session->GetPendingErrors(&errors, &overflowed);
Since this interface was designed to be C++98-compatible (i.e. no std::unique_ptr is available), when getting pending errors like this it's necessary to deallocate/free the memory if any KuduError is returned.
This code snippet might serve as a reference (AFAIK, Impala's code also has ElementDeleter in gutil/stl_util.h):

  https://github.com/apache/kudu/blob/16689973a72e03649898c568d7ab423bc4bb8a35/src/kudu/client/client-test.cc#L2096-L2099


http://gerrit.cloudera.org:8080/#/c/23493/1/be/src/exec/kudu/kudu-array-inserter.cc@129
PS1, Line 129: KUDU_EXPECT_OK(error->status());
This doesn't make much sense: if there were errors, none of the statuses would be Status::OK.  Or is this just to print out the information on the errors?


http://gerrit.cloudera.org:8080/#/c/23493/1/be/src/exec/kudu/kudu-array-inserter.cc@137
PS1, Line 137: return 0;
Does it make sense to return a non-zero status if anything went wrong?


http://gerrit.cloudera.org:8080/#/c/23493/1/be/src/exec/kudu/kudu-scanner.cc
File be/src/exec/kudu/kudu-scanner.cc:

http://gerrit.cloudera.org:8080/#/c/23493/1/be/src/exec/kudu/kudu-scanner.cc@402
PS1, Line 402: else {
readability nit: 'else' isn't necessary since the 'if' above contains a 'return'


http://gerrit.cloudera.org:8080/#/c/23493/1/fe/src/main/java/org/apache/impala/analysis/CreateTableStmt.java
File fe/src/main/java/org/apache/impala/analysis/CreateTableStmt.java:

http://gerrit.cloudera.org:8080/#/c/23493/1/fe/src/main/java/org/apache/impala/analysis/CreateTableStmt.java@496
PS1, Line 496: "Cannot create table '%s': Type %s is not supported in Kudu",
             : getTbl(), col.getType().toSql()));
It seems the 3rd parameter is missing (given the format string).
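
Going back to the kudu-array-inserter.cc@124 comment on freeing the pending errors: the ownership pattern could look roughly like the sketch below. It is self-contained so that it compiles without the Kudu client: FakeError, GetPendingErrors() and ScopedElementDeleter are hypothetical stand-ins written for this illustration, not the real KuduError, KuduSession::GetPendingErrors() or ElementDeleter. The only point is that the caller owns the returned pointers and a scope guard deletes them on every exit path.

```cpp
#include <cassert>
#include <iostream>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for kudu::client::KuduError (NOT the real class).
struct FakeError {
  std::string message;
  static int live_count;  // number of live instances, to make leaks visible
  explicit FakeError(std::string m) : message(std::move(m)) { ++live_count; }
  ~FakeError() { --live_count; }
};
int FakeError::live_count = 0;

// Hypothetical stand-in for KuduSession::GetPendingErrors(): the caller
// takes ownership of every raw pointer appended to 'errors'.
void GetPendingErrors(std::vector<FakeError*>* errors, bool* overflowed) {
  assert(errors != nullptr && overflowed != nullptr);
  errors->push_back(new FakeError("row 1: already present"));
  errors->push_back(new FakeError("row 2: invalid argument"));
  *overflowed = false;
}

// Scope guard that deletes all elements of the vector when it goes out of
// scope; similar in spirit to ElementDeleter, but written here from scratch.
class ScopedElementDeleter {
 public:
  explicit ScopedElementDeleter(std::vector<FakeError*>* v) : v_(v) {}
  ~ScopedElementDeleter() {
    for (FakeError* e : *v_) delete e;
    v_->clear();
  }
  ScopedElementDeleter(const ScopedElementDeleter&) = delete;
  ScopedElementDeleter& operator=(const ScopedElementDeleter&) = delete;

 private:
  std::vector<FakeError*>* v_;
};

// Logs every pending error and returns how many there were. The guard is
// declared before the vector is filled, so early returns cannot leak.
int ReportPendingErrors() {
  std::vector<FakeError*> errors;
  ScopedElementDeleter drop(&errors);
  bool overflowed = false;
  GetPendingErrors(&errors, &overflowed);
  for (const FakeError* e : errors) {
    std::cerr << "pending error: " << e->message << "\n";
  }
  return static_cast<int>(errors.size());
}
```

With something along these lines, the errors are logged explicitly rather than through KUDU_EXPECT_OK, and the caller could return a non-zero exit code whenever the reported count is greater than zero.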


http://gerrit.cloudera.org:8080/#/c/23493/1/fe/src/main/java/org/apache/impala/util/KuduUtil.java
File fe/src/main/java/org/apache/impala/util/KuduUtil.java:

http://gerrit.cloudera.org:8080/#/c/23493/1/fe/src/main/java/org/apache/impala/util/KuduUtil.java@441
PS1, Line 441: } else {
readability nit: it's possible to omit the 'else' part of the clause because there is a 'return' in the 'if' part above


http://gerrit.cloudera.org:8080/#/c/23493/1/fe/src/test/java/org/apache/impala/analysis/AnalyzeKuduDDLTest.java
File fe/src/test/java/org/apache/impala/analysis/AnalyzeKuduDDLTest.java:

http://gerrit.cloudera.org:8080/#/c/23493/1/fe/src/test/java/org/apache/impala/analysis/AnalyzeKuduDDLTest.java@394
PS1, Line 394: AnalyzesOk("create table tab (x ARRAY<INT> primary key) " +
             : "partition by hash(x) partitions 3 stored as kudu",
             : isExternalPurgeTbl
As of now, I'm not sure Kudu actually works as expected with array columns being part of the primary key.  Probably, on the Kudu side we will need to add a guardrail to explicitly tell it's not an option.  I'll clarify on that and keep you posted.

I guess it's OK to keep this for a while: maybe it will be quite easy to add the missing functionality for that, and instead of reporting an explicit error on such a DDL statement, the underlying Kudu table will indeed be able to work as expected with an array column as a part of the primary key :)


http://gerrit.cloudera.org:8080/#/c/23493/1/testdata/datasets/functional/functional_schema_template.sql
File testdata/datasets/functional/functional_schema_template.sql:

http://gerrit.cloudera.org:8080/#/c/23493/1/testdata/datasets/functional/functional_schema_template.sql@4820
PS1, Line 4820: array_DECIMAL ARRAY<DECIMAL(18,18)>
I'm curious: what drives the selection of array element types for this test scenario?  Is that about special cases in the GetKuduArrayElementSize() implementation?

This looks good to me as-is, but I'd think of adding more columns to cover all the supported types, at least for the following:
  * columns of floating point arrays (FLOAT, DOUBLE)
  * STRING (or BINARY) arrays
  * BOOL arrays: there might be some special handling required to work with raw data returned by ScanBatch::RowPtr::direct_data() in case of BOOL types -- the elements come as bytes (not a single bit per element)



-- 
To view, visit http://gerrit.cloudera.org:8080/23493
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I9282aac821bd30668189f84b2ed8fff7047e7310
Gerrit-Change-Number: 23493
Gerrit-PatchSet: 2
Gerrit-Owner: Riza Suminto <[email protected]>
Gerrit-Reviewer: Abhishek Chennaka <[email protected]>
Gerrit-Reviewer: Alexey Serbin <[email protected]>
Gerrit-Reviewer: Impala Public Jenkins <[email protected]>
Gerrit-Reviewer: Joe McDonnell <[email protected]>
Gerrit-Reviewer: Riza Suminto <[email protected]>
Gerrit-Reviewer: Xuebin Su <[email protected]>
Gerrit-Comment-Date: Fri, 03 Oct 2025 23:30:24 +0000
Gerrit-HasComments: Yes
