Alexey Serbin has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/21492 )

Change subject: WIP: IMPALA-12370: allow converting timestamps to UTC when 
writing Kudu
......................................................................


Patch Set 2:

(7 comments)

I took a quick look.  I don't know much about Impala's planning phase and 
whether there are other places that need to be updated in this context, and my 
comments might be considered as nits/cosmetic stuff.  However, I think it makes 
sense to make a bit of effort to reduce the confusion w.r.t. Kudu not actually 
supporting a timestamp type, but rather being Unix epoch time with microsecond 
precision.

Thank you!

http://gerrit.cloudera.org:8080/#/c/21492/2//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/21492/2//COMMIT_MSG@12
PS2, Line 12: utc
nit: UTC


http://gerrit.cloudera.org:8080/#/c/21492/2//COMMIT_MSG@16
PS2, Line 16: To be able to read back Kudu tables written by Impala correctly
            : convert_kudu_utc_timestamps and write_kudu_utc_timestamps need to
            : have the same value.
This leads to an obvious question: why to introduce an extra option 
--write_kudu_utc_timestamps then?  Shouldn't Impala's behavior be updated to 
perform the conversion upon writing as well if 
--convert_kudu_utc_timestamps=true?

If it's indeed necessary to separate two different phases of the conversion, 
maybe it makes sense to articulate why adding this new flag 
--write_kudu_utc_timestamps is needed?  Mentioning a particular use case might 
be enough, I guess.


http://gerrit.cloudera.org:8080/#/c/21492/2/common/thrift/ImpalaService.thrift
File common/thrift/ImpalaService.thrift:

http://gerrit.cloudera.org:8080/#/c/21492/2/common/thrift/ImpalaService.thrift@882
PS2, Line 882: When true, TIMESTAMPs read from Kudu will be converted from UTC 
to local time.
While you are at this, I'd suggest stop using the deprecated TIMESTAMP name for 
the corresponding Kudu column type.  It's been renamed into UNIXTIME_MICROS a 
long time ago -- that's to reduce the confusion.

That reflects the fact that Kudu doesn't have a timestamp type in the common 
sense at least how it's been used in the database realm, but columns of the 
UNIXTIME_MICROS type are used to store Unix epoch time with microsecond 
precision, and no timezone information is provided along with 64-bit integer 
(so, technically that's not a timestamp in that sense).

Maybe, this should be replaced with:

  When true, UNIXTIME_MICRO column values read from Kudu tables are converted 
into UTC timezone timestamps.


http://gerrit.cloudera.org:8080/#/c/21492/2/common/thrift/ImpalaService.thrift@944
PS2, Line 944: When true, TIMESTAMPs written to Kudu will be converted from 
local time to UTC.
Ditto: please consider updating this similar to the comment for 
CONVERT_KUDU_UTC_TIMESTAMPS above.


http://gerrit.cloudera.org:8080/#/c/21492/2/testdata/workloads/functional-query/queries/QueryTest/kudu_timestamp_conversion.test
File 
testdata/workloads/functional-query/queries/QueryTest/kudu_timestamp_conversion.test:

http://gerrit.cloudera.org:8080/#/c/21492/2/testdata/workloads/functional-query/queries/QueryTest/kudu_timestamp_conversion.test@a3
PS2, Line 3:
what's 'kutu'?


http://gerrit.cloudera.org:8080/#/c/21492/2/testdata/workloads/functional-query/queries/QueryTest/kudu_timestamp_conversion.test@3
PS2, Line 3: UTC timestamp
Please consider updating this -- Kudu doesn't store timestamps, it stores Unix 
epoch times.


http://gerrit.cloudera.org:8080/#/c/21492/2/tests/common/test_result_verifier.py
File tests/common/test_result_verifier.py:

http://gerrit.cloudera.org:8080/#/c/21492/2/tests/common/test_result_verifier.py@427
PS2, Line 427:     if file_format == 'avro' and 'TIMESTAMP' in expected_types:
I don't know much about the context here, but Kudu still doesn't support 
TIMESTAMP type in the sense of storing date/time along with timezone 
information.  In Kudu that's just Unix epoch times with microsecond precision.



--
To view, visit http://gerrit.cloudera.org:8080/21492
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ibb4995a64e042e7bb261fcc6e6bf7ffce61e9bd1
Gerrit-Change-Number: 21492
Gerrit-PatchSet: 2
Gerrit-Owner: Csaba Ringhofer <[email protected]>
Gerrit-Reviewer: Alexey Serbin <[email protected]>
Gerrit-Reviewer: Impala Public Jenkins <[email protected]>
Gerrit-Reviewer: Zihao Ye <[email protected]>
Gerrit-Comment-Date: Sat, 08 Jun 2024 00:27:08 +0000
Gerrit-HasComments: Yes

Reply via email to