Csaba Ringhofer has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/22293 )

Change subject: IMPALA-13627: Handle legacy Hive timezone conversion
......................................................................


Patch Set 3:

(6 comments)

http://gerrit.cloudera.org:8080/#/c/22293/3//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/22293/3//COMMIT_MSG@37
PS3, Line 37: impact on performance
Did you make some benchmarks? It could be useful to have an estimate of the 
impact, e.g. is it just minor or may ruin performance. 
convert-timestamp-benchmark.cc already exists for similar measurements.
https://github.com/apache/impala/blob/5f4321373a3404867e170f7381db04a843aa7310/be/src/benchmarks/convert-timestamp-benchmark.cc#L380


http://gerrit.cloudera.org:8080/#/c/22293/3/be/src/exec/parquet/parquet-common.h
File be/src/exec/parquet/parquet-common.h:

http://gerrit.cloudera.org:8080/#/c/22293/3/be/src/exec/parquet/parquet-common.h@815
PS3, Line 815:   /// If timestamp t >= v before conversion, then this function 
converts v in such a
             :   /// way that the same will be true after t is converted.
             :   void ConvertMinStatToLocalTime(TimestampValue* v) const;
             :
             :   /// If timestamp t <= v before conversion, then this function 
converts v in such a
             :   /// way that the same will be true after t is converted.
             :   void ConvertMaxStatToLocalTime(TimestampValue* v) const;
As these are not modified, different timezone db will be used during per-row 
conversion and stat filtering. This can lead to incorrectly dropping some rows. 
The simple solution is to turn off min/max filtering for the timestamp column 
in case Hive legacy TZ conversion is used. Another way would be trying to do 
the LT/GT predicated mapping from local to utc, but this can be convoluted near 
DST changes.


http://gerrit.cloudera.org:8080/#/c/22293/3/be/src/service/impala-server.cc
File be/src/service/impala-server.cc:

http://gerrit.cloudera.org:8080/#/c/22293/3/be/src/service/impala-server.cc@362
PS3, Line 362: use_legacy_hive_timestamp_conversion
I don't think that a new flag is really needed as you can override the query 
option in default_query_options. 
use_local_tz_for_unix_timestamp_conversions/convert_legacy_hive_parquet_utc_timestamps
 are there due to legacy reasons, as these flags were added before a query 
option existed.


http://gerrit.cloudera.org:8080/#/c/22293/3/common/thrift/Frontend.thrift
File common/thrift/Frontend.thrift:

http://gerrit.cloudera.org:8080/#/c/22293/3/common/thrift/Frontend.thrift@1081
PS3, Line 1081: i64
nit: why int64? Calendar.get() return int


http://gerrit.cloudera.org:8080/#/c/22293/3/tests/query_test/test_hive_timestamp_conversion.py
File tests/query_test/test_hive_timestamp_conversion.py:

http://gerrit.cloudera.org:8080/#/c/22293/3/tests/query_test/test_hive_timestamp_conversion.py@52
PS3, Line 52:     self.client.execute(
            :        "create table %s.t (d timestamp) stored as parquet" % 
unique_database)
            :     tbl_loc = get_fs_path("/test-warehouse/%s.db/t" % 
unique_database)
            :     
self.filesystem_client.copy_from_local(os.environ['IMPALA_HOME']
            :         + "/testdata/data/hive_kuala_lumpur_legacy.parquet", 
tbl_loc)
There are some convenience functions to do this, see 
create_table_and_copy_files() and create_table_from_parquet() in 
https://github.com/apache/impala/blob/master/tests/common/file_utils.py


http://gerrit.cloudera.org:8080/#/c/22293/3/tests/query_test/test_hive_timestamp_conversion.py@62
PS3, Line 62:         '1899-12-31 16:00:00',
optional: I would prefer moving these tests to a .test file (and merge the 3 
tests to avoid .test file spam)
This is just my preference as I find sql tests more readable without the Python 
"noise".



--
To view, visit http://gerrit.cloudera.org:8080/22293
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I1271ed1da0b74366ab8315e7ec2d4ee47111e067
Gerrit-Change-Number: 22293
Gerrit-PatchSet: 3
Gerrit-Owner: Michael Smith <[email protected]>
Gerrit-Reviewer: Csaba Ringhofer <[email protected]>
Gerrit-Reviewer: Impala Public Jenkins <[email protected]>
Gerrit-Comment-Date: Sat, 11 Jan 2025 12:59:46 +0000
Gerrit-HasComments: Yes

Reply via email to