Csaba Ringhofer has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/20681 )

Change subject: [WIP] IMPALA-12322: Support converting UTC timestamps read from 
Kudu to local time
......................................................................


Patch Set 6:

(3 comments)

Thanks for working on the DST issues! I generally think that doing this for 
compile time predicate push down is critical while runtime filters are 
secondary, as they only matter if the timestamp column is a join key, which 
seems atypical for me.

For testing: I created a test table for Parquet, 
int64_timestamps_at_dst_changes. There are only a few values/tests, but it 
could be a good starting point.

http://gerrit.cloudera.org:8080/#/c/20681/5/be/src/exec/kudu/kudu-scanner.cc
File be/src/exec/kudu/kudu-scanner.cc:

http://gerrit.cloudera.org:8080/#/c/20681/5/be/src/exec/kudu/kudu-scanner.cc@288
PS5, Line 288:         TimestampValue ts_min;
             :         TimestampValue ts_max;
             :         if (state_->query_options().convert_kudu_utc_timestamps 
&&
             :             col_type.type == TYPE_TIMESTAMP) {
             :           ts_min = *reinterpret_cast<const TimestampValue*>(min);
             :           ts_max = *reinterpret_cast<const TimestampValue*>(max);
             :           ConvertLocalTimeMinStatToUTC(&ts_min);
             :           ConvertLocalTimeMaxStatToUTC(&ts_max);
             :           min = &ts_min;
             :           max = &ts_max;
             :         }
             :
> > This can filter out values incorrectly during dst changes. The
Looks great!


http://gerrit.cloudera.org:8080/#/c/20681/5/fe/src/main/java/org/apache/impala/planner/KuduScanNode.java
File fe/src/main/java/org/apache/impala/planner/KuduScanNode.java:

http://gerrit.cloudera.org:8080/#/c/20681/5/fe/src/main/java/org/apache/impala/planner/KuduScanNode.java@573
PS5, Line 573:           kuduPredicate = 
KuduPredicate.newComparisonPredicate(column, op, unixMicros);
> > This may not work correctly during DST change, see my other
I agree with the solution. EQ predicates could be turned to an in-list if it is 
ambiguous. ExprUtil.localTimestampToUnixTimeMicros could get an extra argument 
to define what kind of value is expected. There is an explanation of pre and 
post values at 
https://github.com/google/cctz/blob/2acd9f2b463eee05c354901a9fd5235fe06c017b/include/cctz/time_zone.h#L98


http://gerrit.cloudera.org:8080/#/c/20681/5/fe/src/main/java/org/apache/impala/planner/RuntimeFilterGenerator.java
File fe/src/main/java/org/apache/impala/planner/RuntimeFilterGenerator.java:

http://gerrit.cloudera.org:8080/#/c/20681/5/fe/src/main/java/org/apache/impala/planner/RuntimeFilterGenerator.java@485
PS5, Line 485:         // The filter is targeted for Kudu scan node with source 
timestamp truncation.
             :         String functionName = 
analyzer.getQueryOptions().isConvert_kudu_utc_timestamps() ?
             :             "to_unix_micros" : "utc_to_unix_micros";
             :         Expr toUnixTimeExpr =
> > Bloom filters with timezone conversion are tricky - the problem is
I think that it is ok to disable bloom filters in case 
isConvert_kudu_utc_timestamps is true and the timezone is not UTC. I don't know 
how frequently people use timestamp columns as join keys on Kudu tables - this 
may be niche use case not worth optimizing. If someone needs this to be fast, 
they can still rewrite their query to do the join on UTC timestamps, which 
should be the fastest and most precise way if all tables involved store the 
timestamp as UTC.

I am also ok with adding a query option.

An alternative way would be to not push down the bloom filter to Kudu but 
evaluate in the scanner post timezone conversion. I think that this would work 
correctly, but would be less optimal than pushing down to Kudu and would still 
need some work.



--
To view, visit http://gerrit.cloudera.org:8080/20681
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I9a1e7a13e617cc18deef14289cf9b958588397d3
Gerrit-Change-Number: 20681
Gerrit-PatchSet: 6
Gerrit-Owner: Zihao Ye <[email protected]>
Gerrit-Reviewer: Csaba Ringhofer <[email protected]>
Gerrit-Reviewer: Impala Public Jenkins <[email protected]>
Gerrit-Reviewer: Wenzhe Zhou <[email protected]>
Gerrit-Reviewer: Zihao Ye <[email protected]>
Gerrit-Comment-Date: Wed, 15 Nov 2023 14:30:02 +0000
Gerrit-HasComments: Yes

Reply via email to