Sanjar Akhmedov created HIVE-22477:
--------------------------------------

             Summary: Avro logical type timestamp conversion is slow
                 Key: HIVE-22477
                 URL: https://issues.apache.org/jira/browse/HIVE-22477
             Project: Hive
          Issue Type: Improvement
    Affects Versions: 3.1.0
         Environment: Hive 3.1.0
            Reporter: Sanjar Akhmedov

We have an Avro-backed table with hundreds of billions of timestamps. A simple {{SELECT COUNT(*) FROM t}} query takes many hours to complete in version 3.1.0, versus tens of minutes in version 1.2.1.

Looking at the attached flamegraph of one of the YARN containers, Hive spends most of its time throwing exceptions during Avro timestamp conversion. It is generally a good idea to avoid throwing exceptions in performance-critical sections: creating an exception is an expensive operation, and repeating that cost for many rows or values in a query can have drastic performance implications.

As far as I can see, there is no reason to convert a numeric timestamp to a string and then enter the very lenient {{org.apache.hadoop.hive.common.type.TimestampTZUtil#parse(java.lang.String, java.time.ZoneId)}} just to do the timezone conversion. This patch changes the conversion of {{Date}} and {{Timestamp}} to {{TimestampTZ}} so that it no longer invokes {{parse}}.

JMH timings before:
{code:java}
Benchmark                              Mode  Cnt      Score   Error  Units
TimestampTZUtilBench.convertDate       avgt    2  10091.990          ns/op
TimestampTZUtilBench.convertTimestamp  avgt    2  10657.596          ns/op
{code}

JMH timings after:
{code:java}
Benchmark                              Mode  Cnt   Score   Error  Units
TimestampTZUtilBench.convertDate       avgt    2  48.371          ns/op
TimestampTZUtilBench.convertTimestamp  avgt    2  51.170          ns/op
{code}

JMH stack profile before:
{code:java}
Secondary result "org.apache.hive.benchmark.common.TimestampTZUtilBench.convertDate:·stack":
Stack profiler:

....[Thread state distributions]....................................................................
100.0% RUNNABLE

....[Thread state: RUNNABLE]........................................................................
 97.4%  97.4% java.lang.Throwable.fillInStackTrace
  1.6%   1.6% java.time.format.DateTimeFormatter.parse
  0.2%   0.2% java.time.ZoneId.from
  0.1%   0.1% java.util.HashMap.hash
  0.1%   0.1% java.lang.Number.<init>
  0.1%   0.1% java.time.format.DateTimeFormatterBuilder$CompositePrinterParser.format
  0.1%   0.1% java.lang.StringBuilder.append
  0.1%   0.1% java.util.HashMap.putVal
  0.1%   0.1% java.lang.String.valueOf
  0.1%   0.1% java.util.regex.Pattern$BmpCharProperty.match
  0.2%   0.2% <other>

...

Secondary result "org.apache.hive.benchmark.common.TimestampTZUtilBench.convertTimestamp:·stack":
Stack profiler:

....[Thread state distributions]....................................................................
100.0% RUNNABLE

....[Thread state: RUNNABLE]........................................................................
 96.5%  96.5% java.lang.Throwable.fillInStackTrace
  1.0%   1.0% java.time.format.DateTimeFormatter.parse
  0.6%   0.6% org.apache.hadoop.hive.common.type.TimestampTZUtil.parse
  0.4%   0.4% java.time.ZoneId.from
  0.2%   0.2% java.time.format.DateTimeFormatterBuilder$CompositePrinterParser.format
  0.2%   0.2% java.time.format.Parsed.resolveFields
  0.2%   0.2% java.lang.String.valueOf
  0.1%   0.1% java.lang.StringBuilder.append
  0.1%   0.1% java.util.HashMap.hash
  0.1%   0.1% java.time.format.DateTimeParseContext.toResolved
  0.6%   0.6% <other>
{code}

JMH stack profile after:
{code:java}
Secondary result "org.apache.hive.benchmark.common.TimestampTZUtilBench.convertDate:·stack":
Stack profiler:

....[Thread state distributions]....................................................................
100.0% RUNNABLE

....[Thread state: RUNNABLE]........................................................................
 91.6%  91.6% java.time.ZonedDateTime.ofInstant
  8.0%   8.0% org.apache.hive.benchmark.common.generated.TimestampTZUtilBench_convertDate_jmhTest.convertDate_avgt_jmhStub
  0.1%   0.1% java.time.zone.ZoneRules.<init>
  0.1%   0.1% java.time.LocalDateTime.ofEpochSecond
  0.1%   0.1% org.apache.hadoop.hive.common.type.TimestampTZUtil.convert
  0.1%   0.1% java.time.LocalDate.ofEpochDay
  0.1%   0.1% java.time.ZonedDateTime.create

...

Secondary result "org.apache.hive.benchmark.common.TimestampTZUtilBench.convertTimestamp:·stack":
Stack profiler:

....[Thread state distributions]....................................................................
100.0% RUNNABLE

....[Thread state: RUNNABLE]........................................................................
 90.7%  90.7% java.time.ZonedDateTime.ofInstant
  9.0%   9.0% org.apache.hive.benchmark.common.generated.TimestampTZUtilBench_convertTimestamp_jmhTest.convertTimestamp_avgt_jmhStub
  0.1%   0.1% java.time.zone.ZoneRules.<init>
  0.1%   0.1% org.apache.hive.benchmark.common.generated.TimestampTZUtilBench_convertTimestamp_jmhTest.convertTimestamp_AverageTime
  0.1%   0.1% java.time.LocalDateTime.ofEpochSecond
  0.1%   0.1% java.time.LocalDate.ofEpochDay
  0.1%   0.1% java.time.ZonedDateTime.create
{code}

--
This message was sent by Atlassian Jira
(v8.3.4#803005)
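
For illustration only, the idea described in this issue (going from a numeric epoch value straight to a zoned date-time instead of formatting it as text and parsing it back) can be sketched with plain {{java.time}} calls. This is not the actual patch and does not use Hive's {{TimestampTZUtil}}; the class name {{EpochConversionSketch}}, the methods {{viaString}} and {{direct}}, the formatter pattern, and the sample values are all hypothetical and chosen only for this sketch.

```java
import java.time.Instant;
import java.time.LocalDateTime;
import java.time.ZoneId;
import java.time.ZoneOffset;
import java.time.ZonedDateTime;
import java.time.format.DateTimeFormatter;

// Hypothetical sketch; not Hive code and not the patch attached to this issue.
public class EpochConversionSketch {

    // Assumed text format, used only to imitate a string round trip.
    private static final DateTimeFormatter FORMATTER =
            DateTimeFormatter.ofPattern("uuuu-MM-dd HH:mm:ss.SSS");

    // Round trip through text: format the epoch value as a UTC string,
    // parse it back, then shift the result to the target zone.
    static ZonedDateTime viaString(long epochMillis, ZoneId zone) {
        String text = LocalDateTime
                .ofInstant(Instant.ofEpochMilli(epochMillis), ZoneOffset.UTC)
                .format(FORMATTER);
        return LocalDateTime.parse(text, FORMATTER)
                .atZone(ZoneOffset.UTC)
                .withZoneSameInstant(zone);
    }

    // Direct conversion: no string is built and no parser is involved.
    static ZonedDateTime direct(long epochMillis, ZoneId zone) {
        return ZonedDateTime.ofInstant(Instant.ofEpochMilli(epochMillis), zone);
    }

    public static void main(String[] args) {
        ZoneId zone = ZoneId.of("America/Los_Angeles"); // sample zone
        long epochMillis = 1_573_430_400_000L;          // sample value

        System.out.println(direct(epochMillis, zone));
        // Both paths should agree on the resulting date-time.
        System.out.println(viaString(epochMillis, zone).equals(direct(epochMillis, zone)));
    }
}
```

The direct path relies on {{ZonedDateTime.ofInstant}}, the same call that dominates the "after" stack profile. The string path here is deliberately simple and does not throw, so it only hints at the cost of the original code, where most of the time went to {{Throwable.fillInStackTrace}}.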