[jira] [Commented] (HIVE-6017) Contribute Decimal128 high-performance decimal(p, s) package from Microsoft to Hive

Michael Howard (JIRA) Tue, 24 Feb 2015 15:13:14 -0800

    [ 
https://issues.apache.org/jira/browse/HIVE-6017?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14335635#comment-14335635
 ]


Michael Howard commented on HIVE-6017:
--------------------------------------

These are observations about the problems with "native" multiplication and 
division in Decimal128. 

Hive 0.13 source code is visible at:
http://grepcode.com/file/repo1.maven.org/maven2/org.apache.hive/hive-common/0.13.0/org/apache/hadoop/hive/common/type/Decimal128.java

The "native" Decimal128 multiply and divide code is not in use. 

Instead, there is a comment which says:
> As of 2/11/2014 this has a known bug in multiplication. ...
> The fix employed now is to replace this code with code that uses Java 
> BigDecimal multiply. 

The fundamental problem with the multiply code is that 
UnsignedInt128.multiplyArrays4And4To4NoOverflow() is broken. 

There are multiple stages where the intermediate/temp variable 'product' is 
calculated. 
Unfortunately, carry information is being lost (out the top) whenever the 
results of two or more multiplications are being summed together. 

It is perfectly fine to say:
  uint64 = uint32 * uint32
or
  uint64 = uint32 * uint32 + carry

However, the following is invalid because it will lose (carry) information
 uint64 = (uint32 * uint32) + (uint32 * uint32)

an example case is:
0xFFFFFFFFL * 0xFFFFFFFFL + 0xFFFFFFFFL * 0xFFFFFFFFL

This sum cannot be held in a uint64 ... a carry out the top is lost. 

This is easier to envision if you do the same operation with 8=>16 bits instead 
of 32=>64 bits
0xFF * 0xFF = 0xFE01 // ok
0xFF * 0xFF + 0xFF = 0xFE01 + 0xFF = 0xFEFF // ok
0xFF * 0xFF + 0xFF * 0xFF = 0xFE01 + 0xFE01 = 0x1FC02 // not ok ... overflows 
16 bits ... carry lost

The UnsignedInt128.multiplyArrays4And4To4NoOverflow() and 
UnsignedInt128.multiplyArrays4And4To8() methods use this pattern repeatedly, 
and therefore are broken. 

These routines need to be implemented in such a way that the right-hand-side 
contains no more than a single multiply (uint32 * uint32) and a single addition 
(+ uint32)

This is fine:
1899    product = (right[0] & SqlMathUtil.LONG_MASK)
1900        * (left[0] & SqlMathUtil.LONG_MASK);
1901    int z0 = (int) product;

This is not ... the sum will overflow and 'product' will have lost the carry
1903    product = (right[0] & SqlMathUtil.LONG_MASK)
1904        * (left[1] & SqlMathUtil.LONG_MASK)
1905        + (right[1] & SqlMathUtil.LONG_MASK)
1906        * (left[0] & SqlMathUtil.LONG_MASK) + (product >>> 32);
1907    int z1 = (int) product;

This code needs to be rewritten as a sum of terms where each term is a 
uint32[4]*uint32 which is left shifted. 

UnsignedInt128.multiplyDestructive(int) and 
UnsignedInt128.addDestructive(int[]) are just fine. 
SqlMathUtil.multiplyMultiprecision(int[], int) is also fine. 

UnsignedInt128.multiplyArrays4And4To4NoOverflow() and 
UnsignedInt128.multiplyArrays4And4To8()  need to be implemented using these as 
building blocks. 

The goal of this was to provide "high-performance". The potential benefit comes 
from fact that these objects are mutable and it avoids the object creation/GC 
associated with BigDecimal. Therefore, it is a little painful to see:

1173  public void More ...multiplyDestructive(Decimal128 right, short newScale) 
{
1174    HiveDecimal rightHD = HiveDecimal.create(right.toBigDecimal());
1175    HiveDecimal thisHD = HiveDecimal.create(this.toBigDecimal());
1176    HiveDecimal result = thisHD.multiply(rightHD);
...
1185    this.update(result.bigDecimalValue().toPlainString(), newScale);


Enough on multiplication. Regarding division ...

1240  public void More ...divideDestructiveNativeDecimal128

has a comment:
  As of 1/20/2014 this has a known bug in division. See 
TestDecimal128.testKnownPriorErrors(). 

TestDecimal128.testKnownPriorErrors uses both Decimal128.multiplication and 
Decimal128.division. We know that there is a problem with multiplication. 
Unclear whether or not there is a problem with Decimal128.division.

I have not investigated Decimal128.division in detail. 
Based upon a quick glance, I did not see any fundamental problems with 
SqlMathUtil.divideMultiPrecision(). So the fundamental underpinnings might be 
OK. 

However, I did not see any reference to RoundingMode behavior in 
Decimal128.divideDestructiveNativeDecimal128(). 
Decimal128 should produce the same results as BigDecimal ... that is at least 
desired, and probably required. 
In order to accomplish this Decimal128 will need to deal with RoundingModes ... 
or at least round consistently with RoundingMode.HALF_UP.


> Contribute Decimal128 high-performance decimal(p, s) package from Microsoft 
> to Hive
> -----------------------------------------------------------------------------------
>
>                 Key: HIVE-6017
>                 URL: https://issues.apache.org/jira/browse/HIVE-6017
>             Project: Hive
>          Issue Type: Sub-task
>    Affects Versions: 0.13.0
>            Reporter: Eric Hanson
>            Assignee: Eric Hanson
>             Fix For: 0.13.0
>
>         Attachments: HIVE-6017.01.patch, HIVE-6017.02.patch, 
> HIVE-6017.03.patch, HIVE-6017.04.patch
>
>
> Contribute the Decimal128 high-performance decimal package developed by 
> Microsoft to Hive. This was originally written for Microsoft PolyBase by 
> Hideaki Kimura.
> This code is about 8X more efficient than Java BigDecimal for typical 
> operations. It uses a finite (128 bit) precision and can handle up to 
> decimal(38, X). It is also "mutable" so you can change the contents of an 
> existing object. This helps reduce the cost of new() and garbage collection.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Commented] (HIVE-6017) Contribute Decimal128 high-performance decimal(p, s) package from Microsoft to Hive

Reply via email to