Alexey Serbin has posted comments on this change. ( http://gerrit.cloudera.org:8080/21492 )
Change subject: WIP: IMPALA-12370: allow converting timestamps to UTC when writing Kudu ...................................................................... Patch Set 2: (7 comments) I took a quick look. I don't know much about Impala's planning phase and whether there are other places that need to be updated in this context, and my comments might be considered as nits/cosmetic stuff. However, I think it makes sense to make a bit of effort to reduce the confusion w.r.t. Kudu not actually supporting a timestamp type, but rather being Unix epoch time with microsecond precision. Thank you! http://gerrit.cloudera.org:8080/#/c/21492/2//COMMIT_MSG Commit Message: http://gerrit.cloudera.org:8080/#/c/21492/2//COMMIT_MSG@12 PS2, Line 12: utc nit: UTC http://gerrit.cloudera.org:8080/#/c/21492/2//COMMIT_MSG@16 PS2, Line 16: To be able to read back Kudu tables written by Impala correctly : convert_kudu_utc_timestamps and write_kudu_utc_timestamps need to : have the same value. This leads to an obvious question: why to introduce an extra option --write_kudu_utc_timestamps then? Shouldn't Impala's behavior be updated to perform the conversion upon writing as well if --convert_kudu_utc_timestamps=true? If it's indeed necessary to separate two different phases of the conversion, maybe it makes sense to articulate why adding this new flag --write_kudu_utc_timestamps is needed? Mentioning a particular use case might be enough, I guess. http://gerrit.cloudera.org:8080/#/c/21492/2/common/thrift/ImpalaService.thrift File common/thrift/ImpalaService.thrift: http://gerrit.cloudera.org:8080/#/c/21492/2/common/thrift/ImpalaService.thrift@882 PS2, Line 882: When true, TIMESTAMPs read from Kudu will be converted from UTC to local time. While you are at this, I'd suggest stop using the deprecated TIMESTAMP name for the corresponding Kudu column type. It's been renamed into UNIXTIME_MICROS a long time ago -- that's to reduce the confusion. That reflects the fact that Kudu doesn't have a timestamp type in the common sense at least how it's been used in the database realm, but columns of the UNIXTIME_MICROS type are used to store Unix epoch time with microsecond precision, and no timezone information is provided along with 64-bit integer (so, technically that's not a timestamp in that sense). Maybe, this should be replaced with: When true, UNIXTIME_MICRO column values read from Kudu tables are converted into UTC timezone timestamps. http://gerrit.cloudera.org:8080/#/c/21492/2/common/thrift/ImpalaService.thrift@944 PS2, Line 944: When true, TIMESTAMPs written to Kudu will be converted from local time to UTC. Ditto: please consider updating this similar to the comment for CONVERT_KUDU_UTC_TIMESTAMPS above. http://gerrit.cloudera.org:8080/#/c/21492/2/testdata/workloads/functional-query/queries/QueryTest/kudu_timestamp_conversion.test File testdata/workloads/functional-query/queries/QueryTest/kudu_timestamp_conversion.test: http://gerrit.cloudera.org:8080/#/c/21492/2/testdata/workloads/functional-query/queries/QueryTest/kudu_timestamp_conversion.test@a3 PS2, Line 3: what's 'kutu'? http://gerrit.cloudera.org:8080/#/c/21492/2/testdata/workloads/functional-query/queries/QueryTest/kudu_timestamp_conversion.test@3 PS2, Line 3: UTC timestamp Please consider updating this -- Kudu doesn't store timestamps, it stores Unix epoch times. http://gerrit.cloudera.org:8080/#/c/21492/2/tests/common/test_result_verifier.py File tests/common/test_result_verifier.py: http://gerrit.cloudera.org:8080/#/c/21492/2/tests/common/test_result_verifier.py@427 PS2, Line 427: if file_format == 'avro' and 'TIMESTAMP' in expected_types: I don't know much about the context here, but Kudu still doesn't support TIMESTAMP type in the sense of storing date/time along with timezone information. In Kudu that's just Unix epoch times with microsecond precision. -- To view, visit http://gerrit.cloudera.org:8080/21492 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ibb4995a64e042e7bb261fcc6e6bf7ffce61e9bd1 Gerrit-Change-Number: 21492 Gerrit-PatchSet: 2 Gerrit-Owner: Csaba Ringhofer <[email protected]> Gerrit-Reviewer: Alexey Serbin <[email protected]> Gerrit-Reviewer: Impala Public Jenkins <[email protected]> Gerrit-Reviewer: Zihao Ye <[email protected]> Gerrit-Comment-Date: Sat, 08 Jun 2024 00:27:08 +0000 Gerrit-HasComments: Yes
