Sanjar Akhmedov created HIVE-22477:
--------------------------------------

             Summary: Avro logical type timestamp conversion is slow
                 Key: HIVE-22477
                 URL: https://issues.apache.org/jira/browse/HIVE-22477
             Project: Hive
          Issue Type: Improvement
    Affects Versions: 3.1.0
         Environment: Hive 3.1.0
            Reporter: Sanjar Akhmedov


We have an Avro-backed table with hundreds of billions of timestamps. A simple 
{{SELECT COUNT(*) FROM t}} query takes many hours to complete in version 3.1.0, 
versus tens of minutes in version 1.2.1.

Looking at the attached flamegraph of one of the YARN containers, Hive is 
spending most of its time throwing exceptions during Avro timestamp conversion.

It is generally a good idea to avoid throwing exceptions in performance-critical 
sections: creating an exception is expensive, and repeating that for every 
row or value in a query can have drastic performance implications.
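
To illustrate the pattern (this is a hypothetical sketch, not Hive's actual code), here is one plausible way a lenient parser ends up creating an exception on every call: it tries one format first and falls back to another in the {{catch}} block. The class name, formats and counter below are assumptions made for the example.

```java
import java.time.LocalDateTime;
import java.time.format.DateTimeFormatter;
import java.time.format.DateTimeParseException;

public class LenientParseSketch {
    // Hypothetical fallback format; not taken from Hive.
    private static final DateTimeFormatter SPACE_SEPARATED =
        DateTimeFormatter.ofPattern("uuuu-MM-dd HH:mm:ss");

    // Counts how many exceptions were created, for illustration only.
    static int exceptionsThrown = 0;

    static LocalDateTime parseWithFallback(String text) {
        try {
            // First attempt: ISO format, which requires a 'T' separator.
            return LocalDateTime.parse(text, DateTimeFormatter.ISO_LOCAL_DATE_TIME);
        } catch (DateTimeParseException e) {
            // The exception (and its stack trace) has already been built here,
            // even though the input is valid in the fallback format.
            exceptionsThrown++;
            return LocalDateTime.parse(text, SPACE_SEPARATED);
        }
    }
}
```

If every value in a table takes the fallback path, the cost of building the exception is paid once per value, which is consistent with {{fillInStackTrace}} dominating a profile.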

As far as I can see, there is no reason to convert a numeric timestamp to a 
string and pass it through the very lenient 
{{org.apache.hadoop.hive.common.type.TimestampTZUtil#parse(java.lang.String, 
java.time.ZoneId)}} just to do the timezone conversion.

This patch changes the conversion of {{Date}} and {{Timestamp}} to 
{{TimestampTZ}} such that it doesn't invoke {{parse}}.
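
As a rough sketch of what a parse-free conversion can look like using only {{java.time}} (this is not the patch itself; the class and method names are hypothetical, and it assumes the source value is a zone-less wall-clock time stored as epoch seconds or epoch days relative to UTC):

```java
import java.time.Instant;
import java.time.LocalDate;
import java.time.ZoneId;
import java.time.ZoneOffset;
import java.time.ZonedDateTime;

public class DirectConversionSketch {
    // Builds the wall-clock time from numeric fields, then attaches the
    // target zone while keeping the same local date and time.
    static ZonedDateTime timestampToZoned(long epochSecond, int nanos, ZoneId zone) {
        return ZonedDateTime
            .ofInstant(Instant.ofEpochSecond(epochSecond, nanos), ZoneOffset.UTC)
            .withZoneSameLocal(zone);
    }

    // A date has no time component, so it maps to the start of that day
    // in the target zone.
    static ZonedDateTime dateToZoned(long epochDay, ZoneId zone) {
        return LocalDate.ofEpochDay(epochDay).atStartOfDay(zone);
    }
}
```

No string is formatted or parsed on this path, so no exception is created in the common case.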

JMH timings before:
{code:java}
Benchmark                                     Mode  Cnt      Score   Error  Units
TimestampTZUtilBench.convertDate              avgt    2  10091.990          ns/op
TimestampTZUtilBench.convertTimestamp         avgt    2  10657.596          ns/op
{code}
JMH timings after:
{code:java}
Benchmark                                     Mode  Cnt   Score   Error  Units
TimestampTZUtilBench.convertDate              avgt    2  48.371          ns/op
TimestampTZUtilBench.convertTimestamp         avgt    2  51.170          ns/op
{code}
JMH stack profile before:
{code:java}
Secondary result "org.apache.hive.benchmark.common.TimestampTZUtilBench.convertDate:·stack":
Stack profiler:

....[Thread state distributions]....................................................................
100.0%         RUNNABLE

....[Thread state: RUNNABLE]........................................................................
 97.4%  97.4% java.lang.Throwable.fillInStackTrace
  1.6%   1.6% java.time.format.DateTimeFormatter.parse
  0.2%   0.2% java.time.ZoneId.from
  0.1%   0.1% java.util.HashMap.hash
  0.1%   0.1% java.lang.Number.<init>
  0.1%   0.1% java.time.format.DateTimeFormatterBuilder$CompositePrinterParser.format
  0.1%   0.1% java.lang.StringBuilder.append
  0.1%   0.1% java.util.HashMap.putVal
  0.1%   0.1% java.lang.String.valueOf
  0.1%   0.1% java.util.regex.Pattern$BmpCharProperty.match
  0.2%   0.2% <other>

...

Secondary result "org.apache.hive.benchmark.common.TimestampTZUtilBench.convertTimestamp:·stack":
Stack profiler:

....[Thread state distributions]....................................................................
100.0%         RUNNABLE

....[Thread state: RUNNABLE]........................................................................
 96.5%  96.5% java.lang.Throwable.fillInStackTrace
  1.0%   1.0% java.time.format.DateTimeFormatter.parse
  0.6%   0.6% org.apache.hadoop.hive.common.type.TimestampTZUtil.parse
  0.4%   0.4% java.time.ZoneId.from
  0.2%   0.2% java.time.format.DateTimeFormatterBuilder$CompositePrinterParser.format
  0.2%   0.2% java.time.format.Parsed.resolveFields
  0.2%   0.2% java.lang.String.valueOf
  0.1%   0.1% java.lang.StringBuilder.append
  0.1%   0.1% java.util.HashMap.hash
  0.1%   0.1% java.time.format.DateTimeParseContext.toResolved
  0.6%   0.6% <other>
{code}
JMH stack profile after:
{code:java}
Secondary result "org.apache.hive.benchmark.common.TimestampTZUtilBench.convertDate:·stack":
Stack profiler:

....[Thread state distributions]....................................................................
100.0%         RUNNABLE

....[Thread state: RUNNABLE]........................................................................
 91.6%  91.6% java.time.ZonedDateTime.ofInstant
  8.0%   8.0% org.apache.hive.benchmark.common.generated.TimestampTZUtilBench_convertDate_jmhTest.convertDate_avgt_jmhStub
  0.1%   0.1% java.time.zone.ZoneRules.<init>
  0.1%   0.1% java.time.LocalDateTime.ofEpochSecond
  0.1%   0.1% org.apache.hadoop.hive.common.type.TimestampTZUtil.convert
  0.1%   0.1% java.time.LocalDate.ofEpochDay
  0.1%   0.1% java.time.ZonedDateTime.create

...

Secondary result "org.apache.hive.benchmark.common.TimestampTZUtilBench.convertTimestamp:·stack":
Stack profiler:

....[Thread state distributions]....................................................................
100.0%         RUNNABLE

....[Thread state: RUNNABLE]........................................................................
 90.7%  90.7% java.time.ZonedDateTime.ofInstant
  9.0%   9.0% org.apache.hive.benchmark.common.generated.TimestampTZUtilBench_convertTimestamp_jmhTest.convertTimestamp_avgt_jmhStub
  0.1%   0.1% java.time.zone.ZoneRules.<init>
  0.1%   0.1% org.apache.hive.benchmark.common.generated.TimestampTZUtilBench_convertTimestamp_jmhTest.convertTimestamp_AverageTime
  0.1%   0.1% java.time.LocalDateTime.ofEpochSecond
  0.1%   0.1% java.time.LocalDate.ofEpochDay
  0.1%   0.1% java.time.ZonedDateTime.create
{code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
