Csaba Ringhofer has posted comments on this change. ( http://gerrit.cloudera.org:8080/21492 )
Change subject: IMPALA-12370: allow converting timestamps to UTC when writing Kudu ...................................................................... Patch Set 3: (6 comments) http://gerrit.cloudera.org:8080/#/c/21492/2//COMMIT_MSG Commit Message: http://gerrit.cloudera.org:8080/#/c/21492/2//COMMIT_MSG@16 PS2, Line 16: To be able to read back Kudu tables written by Impala correctly : convert_kudu_utc_timestamps and write_kudu_utc_timestamps need to : have the same value. > +1 The reason for adding a new flag (and not changing the name of the old one) is backward compatibility. See discussion at IMPALA-12322 http://gerrit.cloudera.org:8080/#/c/21492/2/common/thrift/ImpalaService.thrift File common/thrift/ImpalaService.thrift: http://gerrit.cloudera.org:8080/#/c/21492/2/common/thrift/ImpalaService.thrift@882 PS2, Line 882: When true, UNIXTIME_MICRO columns read from Kudu will be interpreted as UTC an > To be complete: TIMESTAMP vs UNIXTIME_MICROS: Impala uses TIMESTAMP as SQL type (and does not support DATETIME). Impala TIMESTAMPs have no timezone associated with individual values (or individual columns), only a "global" (per query) timezone that affects some conversions. Modified the comment, I hope that is is more precise now. http://gerrit.cloudera.org:8080/#/c/21492/2/common/thrift/ImpalaService.thrift@944 PS2, Line 944: > Ditto: please consider updating this similar to the comment for CONVERT_KUD Modified the comment. http://gerrit.cloudera.org:8080/#/c/21492/2/common/thrift/ImpalaService.thrift@946 PS2, Line 946: // written to Kudu as UNI > Make both the query names similar. For read, it starts with CONVERT_ and fo I agree that the current names are not too good, but "CONVERT_KUDU_UTC_TIMESTAMPS" is already released, so changing the name or the semantics could break existing workloads. http://gerrit.cloudera.org:8080/#/c/21492/2/testdata/workloads/functional-query/queries/QueryTest/kudu_timestamp_conversion.test File testdata/workloads/functional-query/queries/QueryTest/kudu_timestamp_conversion.test: http://gerrit.cloudera.org:8080/#/c/21492/2/testdata/workloads/functional-query/queries/QueryTest/kudu_timestamp_conversion.test@a3 PS2, Line 3: > what's 'kutu'? Done http://gerrit.cloudera.org:8080/#/c/21492/2/tests/common/test_result_verifier.py File tests/common/test_result_verifier.py: http://gerrit.cloudera.org:8080/#/c/21492/2/tests/common/test_result_verifier.py@427 PS2, Line 427: if file_format == 'avro' and 'TIMESTAMP' in expected_types: > I don't know much about the context here, but Kudu still doesn't support TI This is about verifying the results in our .test files. TIMESTAMP refers to the Impala data type returned by the query and file_format is about our test vector (we try to run the same queries with different file/table formats). e.g. this would also affect select cast(string_col as timestamp) from kudu_tbl; which doesn't even use UNIXTIME_MICRO. So this looks like an old hack to me to skip some tests Kudu/Avro in .test files while running the others. It was a total surprise when I realized that the results of the tests I wrote is actually not checked. This could still make sense for tests that use nanosecond timestamps as those are not supported in Kudu, but there are better tools to skip those tests. -- To view, visit http://gerrit.cloudera.org:8080/21492 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ibb4995a64e042e7bb261fcc6e6bf7ffce61e9bd1 Gerrit-Change-Number: 21492 Gerrit-PatchSet: 3 Gerrit-Owner: Csaba Ringhofer <[email protected]> Gerrit-Reviewer: Alexey Serbin <[email protected]> Gerrit-Reviewer: Ashwani Raina <[email protected]> Gerrit-Reviewer: Csaba Ringhofer <[email protected]> Gerrit-Reviewer: Impala Public Jenkins <[email protected]> Gerrit-Reviewer: Zihao Ye <[email protected]> Gerrit-Comment-Date: Mon, 10 Jun 2024 14:12:54 +0000 Gerrit-HasComments: Yes
