[ https://issues.apache.org/jira/browse/HIVE-22477?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Sanjar Akhmedov updated HIVE-22477:
-----------------------------------
    Attachment: flamegraph.svg

> Avro logical type timestamp conversion is slow
> ----------------------------------------------
>
>                 Key: HIVE-22477
>                 URL: https://issues.apache.org/jira/browse/HIVE-22477
>             Project: Hive
>          Issue Type: Improvement
>    Affects Versions: 3.1.0
>         Environment: Hive 3.1.0
>            Reporter: Sanjar Akhmedov
>            Priority: Major
>              Labels: Performance
>         Attachments: flamegraph.svg
>
>
> We have an Avro-backed table with hundreds of billions of timestamps. A simple
> {{SELECT COUNT(*) FROM t}} query takes many hours to complete in version
> 3.1.0 versus tens of minutes in version 1.2.1.
> Looking at the attached flamegraph of one of the YARN containers, Hive is
> spending most of its time throwing exceptions during Avro timestamp
> conversion.
> It is generally a good idea to avoid throwing exceptions in
> performance-critical sections: exception creation is an expensive operation,
> and repeating it for many rows/values in a query can have drastic
> performance implications.
> As far as I can see, there is no reason to convert a numeric timestamp to a
> string and enter the very lenient
> {{org.apache.hadoop.hive.common.type.TimestampTZUtil#parse(java.lang.String,
> java.time.ZoneId)}} to do the timezone conversion.
> This patch changes the conversion of {{Date}} and {{Timestamp}} to
> {{TimestampTZ}} such that it doesn't invoke {{parse}}.
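The difference between the two conversion styles described above can be sketched with plain {{java.time}}. This is an illustration only, not the attached patch: the class and method names are hypothetical, the value is assumed to be epoch milliseconds, and the string round trip below is a simplified stand-in for the lenient {{parse}} path (it does not throw, it only shows the extra formatting and parsing work that the direct path skips).

```java
import java.time.Instant;
import java.time.LocalDateTime;
import java.time.ZoneId;
import java.time.ZonedDateTime;
import java.time.format.DateTimeFormatter;

// Hypothetical helper, not part of Hive.
final class DirectTimestampConversion {

    // Stand-in for the slow path: render the numeric value as text,
    // then run the text back through a parser.
    static ZonedDateTime convertViaString(long epochMillis, ZoneId zone) {
        String text = LocalDateTime.ofInstant(Instant.ofEpochMilli(epochMillis), zone)
                .format(DateTimeFormatter.ISO_LOCAL_DATE_TIME);
        return LocalDateTime.parse(text, DateTimeFormatter.ISO_LOCAL_DATE_TIME).atZone(zone);
    }

    // Direct path: go from the numeric value to a zoned date-time
    // without any intermediate string or parser.
    static ZonedDateTime convertDirect(long epochMillis, ZoneId zone) {
        return ZonedDateTime.ofInstant(Instant.ofEpochMilli(epochMillis), zone);
    }

    public static void main(String[] args) {
        ZoneId utc = ZoneId.of("UTC");
        long epochMillis = 1573552800123L; // arbitrary sample value

        System.out.println(convertViaString(epochMillis, utc));
        System.out.println(convertDirect(epochMillis, utc));
    }
}
```

In a fixed-offset zone such as UTC both methods yield the same instant. In zones with daylight-saving overlaps the local-time round trip can resolve to a different offset, which is one more reason to prefer the direct path when the input is already numeric.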
> JMH timings before:
> {code:java}
> Benchmark                              Mode  Cnt      Score   Error  Units
> TimestampTZUtilBench.convertDate       avgt    2  10091.990          ns/op
> TimestampTZUtilBench.convertTimestamp  avgt    2  10657.596          ns/op
> {code}
> JMH timings after:
> {code:java}
> Benchmark                              Mode  Cnt   Score   Error  Units
> TimestampTZUtilBench.convertDate       avgt    2  48.371          ns/op
> TimestampTZUtilBench.convertTimestamp  avgt    2  51.170          ns/op
> {code}
> JMH stack profile before:
> {code:java}
> Secondary result "org.apache.hive.benchmark.common.TimestampTZUtilBench.convertDate:·stack":
> Stack profiler:
> ....[Thread state distributions]....................................................................
> 100.0%         RUNNABLE
> ....[Thread state: RUNNABLE]........................................................................
>  97.4%  97.4% java.lang.Throwable.fillInStackTrace
>   1.6%   1.6% java.time.format.DateTimeFormatter.parse
>   0.2%   0.2% java.time.ZoneId.from
>   0.1%   0.1% java.util.HashMap.hash
>   0.1%   0.1% java.lang.Number.<init>
>   0.1%   0.1% java.time.format.DateTimeFormatterBuilder$CompositePrinterParser.format
>   0.1%   0.1% java.lang.StringBuilder.append
>   0.1%   0.1% java.util.HashMap.putVal
>   0.1%   0.1% java.lang.String.valueOf
>   0.1%   0.1% java.util.regex.Pattern$BmpCharProperty.match
>   0.2%   0.2% <other>
> ...
> Secondary result "org.apache.hive.benchmark.common.TimestampTZUtilBench.convertTimestamp:·stack":
> Stack profiler:
> ....[Thread state distributions]....................................................................
> 100.0%         RUNNABLE
> ....[Thread state: RUNNABLE]........................................................................
>  96.5%  96.5% java.lang.Throwable.fillInStackTrace
>   1.0%   1.0% java.time.format.DateTimeFormatter.parse
>   0.6%   0.6% org.apache.hadoop.hive.common.type.TimestampTZUtil.parse
>   0.4%   0.4% java.time.ZoneId.from
>   0.2%   0.2% java.time.format.DateTimeFormatterBuilder$CompositePrinterParser.format
>   0.2%   0.2% java.time.format.Parsed.resolveFields
>   0.2%   0.2% java.lang.String.valueOf
>   0.1%   0.1% java.lang.StringBuilder.append
>   0.1%   0.1% java.util.HashMap.hash
>   0.1%   0.1% java.time.format.DateTimeParseContext.toResolved
>   0.6%   0.6% <other>
> {code}
> JMH stack profile after:
> {code:java}
> Secondary result "org.apache.hive.benchmark.common.TimestampTZUtilBench.convertDate:·stack":
> Stack profiler:
> ....[Thread state distributions]....................................................................
> 100.0%         RUNNABLE
> ....[Thread state: RUNNABLE]........................................................................
>  91.6%  91.6% java.time.ZonedDateTime.ofInstant
>   8.0%   8.0% org.apache.hive.benchmark.common.generated.TimestampTZUtilBench_convertDate_jmhTest.convertDate_avgt_jmhStub
>   0.1%   0.1% java.time.zone.ZoneRules.<init>
>   0.1%   0.1% java.time.LocalDateTime.ofEpochSecond
>   0.1%   0.1% org.apache.hadoop.hive.common.type.TimestampTZUtil.convert
>   0.1%   0.1% java.time.LocalDate.ofEpochDay
>   0.1%   0.1% java.time.ZonedDateTime.create
> ...
> Secondary result "org.apache.hive.benchmark.common.TimestampTZUtilBench.convertTimestamp:·stack":
> Stack profiler:
> ....[Thread state distributions]....................................................................
> 100.0%         RUNNABLE
> ....[Thread state: RUNNABLE]........................................................................
>  90.7%  90.7% java.time.ZonedDateTime.ofInstant
>   9.0%   9.0% org.apache.hive.benchmark.common.generated.TimestampTZUtilBench_convertTimestamp_jmhTest.convertTimestamp_avgt_jmhStub
>   0.1%   0.1% java.time.zone.ZoneRules.<init>
>   0.1%   0.1% org.apache.hive.benchmark.common.generated.TimestampTZUtilBench_convertTimestamp_jmhTest.convertTimestamp_AverageTime
>   0.1%   0.1% java.time.LocalDateTime.ofEpochSecond
>   0.1%   0.1% java.time.LocalDate.ofEpochDay
>   0.1%   0.1% java.time.ZonedDateTime.create
> {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)