[ https://issues.apache.org/jira/browse/HIVE-5762?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13843601#comment-13843601 ]
Eric Hanson commented on HIVE-5762:
-----------------------------------

I'm thinking about using this basic structure for a decimal column vector for limited-precision decimals. Then a utility package of static functions can be implemented to do decimal arithmetic on individual values. It should be possible to make this a lot faster than if the code relies on java.math.BigDecimal, because it is less general, and because new() and garbage collection will be reduced.

{code}
public class DecimalColumnVector extends ColumnVector {
  public int precision;  // precision of all elements in vector (max 38)
  public int scale;      // scale of all elements in vector (max 38)
  public static final int WORDS_PER_VALUE = 4;

  /**
   * Logically a vector of 128 bit unsigned int, that is "little-endian." This
   * means that for a value v, v[0] is least significant. The 4-word
   * 32 bit values are treated as unsigned. However, the high-order bit
   * of the highest word (word 3) must be 0.
   */
  public int[][] vector;
  public byte[] sign; // -1 if negative, 0 if zero, 1 if positive

  public DecimalColumnVector() {
    super(VectorizedRowBatch.DEFAULT_SIZE);
    final int len = VectorizedRowBatch.DEFAULT_SIZE;
    vector = new int[len][];
    for (int i = 0; i < len; i++) {
      vector[i] = new int[WORDS_PER_VALUE];
    }
    sign = new byte[len];
  }
  ...
}
{code}

> Implement vectorized support for the DECIMAL data type
> ------------------------------------------------------
>
>                 Key: HIVE-5762
>                 URL: https://issues.apache.org/jira/browse/HIVE-5762
>             Project: Hive
>          Issue Type: Sub-task
>            Reporter: Eric Hanson
>
> Add support to allow queries referencing DECIMAL columns and expression
> results to run efficiently in vectorized mode. Include unit tests and
> end-to-end tests.
> Before starting or at least going very far, please write design specification
> (a new section for the design spec attached to HIVE-4160) for how support for
> the different DECIMAL types should work in vectorized mode, and the roadmap,
> and have it reviewed.
> It may be feasible to re-use LongColumnVector and related VectorExpression
> classes for fixed-point decimal in certain data ranges. That should be at
> least considered to get faster performance and save code. For unlimited
> precision DECIMAL, a new column vector subtype may be needed, or a
> BytesColumnVector could be re-used.



--
This message was sent by Atlassian JIRA
(v6.1.4#6159)
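
To illustrate the kind of static utility function the comment above has in mind, here is a minimal sketch of adding two magnitudes stored in the proposed layout (four 32-bit words, little-endian, treated as unsigned, with the top bit of word 3 kept at 0). The class name DecimalMagnitude and the methods addMagnitude and toBigInteger are hypothetical and are not part of Hive. Sign handling, scale alignment and the 38-digit precision limit are left out; the sketch only shows that the word-level arithmetic needs no object allocation.

```java
import java.math.BigInteger;

/** Hypothetical helper for illustration only; not an actual Hive class. */
public final class DecimalMagnitude {
  public static final int WORDS_PER_VALUE = 4;
  private static final long MASK = 0xFFFFFFFFL; // reads an int as unsigned 32 bits

  private DecimalMagnitude() {}

  /**
   * Computes result = a + b on little-endian unsigned magnitudes.
   * result may be the same array as a or b.
   * Returns false if the sum does not fit in 127 bits, i.e. there is a
   * carry out of word 3 or the high-order bit of word 3 would be set.
   */
  public static boolean addMagnitude(int[] a, int[] b, int[] result) {
    long carry = 0;
    for (int i = 0; i < WORDS_PER_VALUE; i++) {
      long sum = (a[i] & MASK) + (b[i] & MASK) + carry;
      result[i] = (int) sum; // keep the low 32 bits
      carry = sum >>> 32;    // 0 or 1
    }
    return carry == 0 && result[WORDS_PER_VALUE - 1] >= 0;
  }

  /** Converts a magnitude to BigInteger; intended for tests, not the hot path. */
  public static BigInteger toBigInteger(int[] v) {
    BigInteger r = BigInteger.ZERO;
    for (int i = WORDS_PER_VALUE - 1; i >= 0; i--) {
      r = r.shiftLeft(32).or(BigInteger.valueOf(v[i] & MASK));
    }
    return r;
  }
}
```

A vectorized expression could call such a function once per row on vector[i], reusing the preallocated int[4] arrays, which is where the saving over java.math.BigDecimal would be expected to come from.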