[ https://issues.apache.org/jira/browse/HIVE-15335?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15768058#comment-15768058 ]
Matt McCline commented on HIVE-15335:
-------------------------------------

A query benchmark on V1 showed very high cost in HiveDecimalWritable (serialization/deserialization, and creation of a HiveDecimal for getHiveDecimal) and in ORC decimal deserialization (BigInteger).

The cost of a V1 decimal add turns out not to be the add itself but the cost of HiveDecimalWritable.getHiveDecimal() and then serializing the result back into BigInteger bytes for HiveDecimalWritable.set.

Code everywhere was calling getHiveDecimal to pass values around between components. Making HiveDecimalWritable a fast, first-class citizen was a major part of this change. That included making HiveDecimalWritable the object of choice to pass around or operate on directly. E.g. vectorized SUM aggregation eliminated almost all calls to HiveDecimalWritable.getHiveDecimal() for its summing.

One query benchmark on the new code showed a 3X improvement, and the add method cost was in the noise.

So storing decimals in 1 long instead of 3 (i.e. the so-called fast path) isn't the place to look. Microbenchmarks on add cost miss the boat.

The fast path is using HiveDecimalWritable.mutableAdd and the fast V2 serialization/deserialization methods, including the HiveDecimal.create family / HiveDecimalWritable.set family.
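To illustrate the point about where the add cost goes, here is a minimal, self-contained Java sketch. It does not use the Hive classes: SlowWritable and FastWritable are hypothetical stand-ins invented for this example, and holding the value in a single long is a simplification, not a description of the V2 layout. The first loop deserializes and reserializes BigInteger bytes on every add, as described for V1; the second mutates the accumulator in place, in the spirit of mutableAdd.

```java
import java.math.BigDecimal;
import java.math.BigInteger;

public class DecimalSumSketch {

    /** Hypothetical V1-style holder: value kept as serialized BigInteger bytes plus a scale. */
    static final class SlowWritable {
        private byte[] unscaledBytes;
        private int scale;

        SlowWritable(BigDecimal value) {
            set(value);
        }

        // Every read deserializes the bytes into new BigInteger and BigDecimal objects.
        BigDecimal get() {
            return new BigDecimal(new BigInteger(unscaledBytes), scale);
        }

        // Every write serializes the value back into a new byte array.
        void set(BigDecimal value) {
            unscaledBytes = value.unscaledValue().toByteArray();
            scale = value.scale();
        }
    }

    /** Hypothetical V2-style holder: value kept in a primitive and mutated in place. */
    static final class FastWritable {
        private long unscaled;
        private final int scale;

        FastWritable(long unscaled, int scale) {
            this.unscaled = unscaled;
            this.scale = scale;
        }

        // In-place add: no intermediate objects; addExact throws on long overflow.
        void mutableAdd(long otherUnscaled) {
            unscaled = Math.addExact(unscaled, otherUnscaled);
        }

        BigDecimal toBigDecimal() {
            return BigDecimal.valueOf(unscaled, scale);
        }
    }

    /** Sums by round-tripping through the serialized form on every add. */
    public static BigDecimal sumViaRoundTrip(long[] unscaledValues, int scale) {
        SlowWritable sum = new SlowWritable(BigDecimal.valueOf(0, scale));
        for (long v : unscaledValues) {
            BigDecimal current = sum.get();                      // deserialize
            sum.set(current.add(BigDecimal.valueOf(v, scale)));  // add, then reserialize
        }
        return sum.get();
    }

    /** Sums by mutating one accumulator; converts to BigDecimal only once at the end. */
    public static BigDecimal sumMutable(long[] unscaledValues, int scale) {
        FastWritable sum = new FastWritable(0L, scale);
        for (long v : unscaledValues) {
            sum.mutableAdd(v);
        }
        return sum.toBigDecimal();
    }

    public static void main(String[] args) {
        long[] values = {12345L, 67890L, -100L}; // 123.45, 678.90, -1.00 at scale 2
        System.out.println(sumViaRoundTrip(values, 2));
        System.out.println(sumMutable(values, 2));
    }
}
```

Both methods return the same sum; the difference is only in how many objects and byte arrays are created per element, which is the overhead the comment above attributes to the V1 path.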
> Fast Decimal
> ------------
>
> Key: HIVE-15335
> URL: https://issues.apache.org/jira/browse/HIVE-15335
> Project: Hive
> Issue Type: Bug
> Components: Hive
> Reporter: Matt McCline
> Assignee: Matt McCline
> Priority: Critical
> Attachments: HIVE-15335.01.patch, HIVE-15335.02.patch,
> HIVE-15335.03.patch, HIVE-15335.04.patch, HIVE-15335.05.patch,
> HIVE-15335.06.patch, HIVE-15335.07.patch, HIVE-15335.08.patch,
> HIVE-15335.09.patch, HIVE-15335.091.patch, HIVE-15335.092.patch,
> HIVE-15335.093.patch, HIVE-15335.094.patch, HIVE-15335.095.patch,
> HIVE-15335.096.patch, HIVE-15335.097.patch, HIVE-15335.098.patch,
> HIVE-15335.099.patch, HIVE-15335.0991.patch
>
>
> Replace HiveDecimal implementation that currently represents the decimal
> internally as a BigDecimal with a faster version that does not allocate extra
> objects.
> Replace HiveDecimalWritable implementation with a faster version that has new
> mutable* calls (e.g. mutableAdd, mutableEnforcePrecisionScale, etc) and
> stores the result as a fast decimal instead of a slow byte array containing a
> serialized BigInteger.
> Provide faster ways to serialize/deserialize decimals.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)