[ 
https://issues.apache.org/jira/browse/HIVE-6511?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jitendra Nath Pandey updated HIVE-6511:
---------------------------------------

    Attachment: HIVE-6511.1.patch

The longValue function in Decimal128 rounds the value, whereas HiveDecimal 
just discards the fractional part. This patch adds another method to 
Decimal128 that discards the fractional part, and uses it in the 
CastDecimalToLong expression.
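
For illustration only (this is not the patch): a minimal sketch of the two 
behaviours using java.math.BigDecimal instead of Decimal128 or HiveDecimal. 
The class and method names are hypothetical, and RoundingMode.HALF_UP is an 
assumed stand-in for whatever rounding Decimal128.longValue applies. The input 
is the dc value from row 7 of the issue description.

```java
import java.math.BigDecimal;
import java.math.RoundingMode;

// Hypothetical helper class; not part of Hive.
class DecimalCastSketch {

    // Discard the fractional part (round toward zero), then narrow long -> int.
    static int truncateToInt(BigDecimal d) {
        return (int) d.setScale(0, RoundingMode.DOWN).longValue();
    }

    // Round to the nearest integer first, then narrow long -> int.
    // HALF_UP is an assumption made for this sketch.
    static int roundToInt(BigDecimal d) {
        return (int) d.setScale(0, RoundingMode.HALF_UP).longValue();
    }

    public static void main(String[] args) {
        BigDecimal dc = new BigDecimal("-493942492598.691406");
        System.out.println(truncateToInt(dc)); // -21253558 (vectorization off)
        System.out.println(roundToInt(dc));    // -21253559 (vectorization on)
    }
}
```

The two results differ only when rounding changes the integer part, which 
would explain why only some rows in the report are affected.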

> casting from decimal to tinyint,smallint, int and bigint generates different 
> result when vectorization is on
> ------------------------------------------------------------------------------------------------------------
>
>                 Key: HIVE-6511
>                 URL: https://issues.apache.org/jira/browse/HIVE-6511
>             Project: Hive
>          Issue Type: Bug
>            Reporter: Jitendra Nath Pandey
>            Assignee: Jitendra Nath Pandey
>         Attachments: HIVE-6511.1.patch
>
>
> select dc, cast(dc as int), cast(dc as smallint), cast(dc as tinyint) from 
> vectortab10korc limit 20 generates the following result when vectorization 
> is enabled:
> {code}
> 4619756289662.078125  -1628520834     -16770  126
> 1553532646710.316406  -1245514442     -2762   54
> 3367942487288.360352  688127224       -776    -8
> 4386447830839.337891  1286221623      12087   55
> -3234165331139.458008 -54957251       27453   61
> -488378613475.326172  1247658269      -16099  29
> -493942492598.691406  -21253559       -19895  73
> 3101852523586.039062  886135874       23618   66
> 2544105595941.381836  1484956709      -23515  37
> -3997512403067.0625   1102149509      30597   -123
> -1183754978977.589355 1655994718      31070   94
> 1408783849655.676758  34576568        -26440  -72
> -2993175106993.426758 417098319       27215   79
> 3004723551798.100586  -1753555402     -8650   54
> 1103792083527.786133  -14511544       -28088  72
> 469767055288.485352   1615620024      26552   -72
> -1263700791098.294434 -980406074      12486   -58
> -4244889766496.484375 -1462078048     30112   -96
> -3962729491139.782715 1525323068      -27332  60
> NULL  NULL    NULL    NULL
> {code}
> When vectorization is disabled, the result looks like this:
> {code}
> 4619756289662.078125  -1628520834     -16770  126
> 1553532646710.316406  -1245514442     -2762   54
> 3367942487288.360352  688127224       -776    -8
> 4386447830839.337891  1286221623      12087   55
> -3234165331139.458008 -54957251       27453   61
> -488378613475.326172  1247658269      -16099  29
> -493942492598.691406  -21253558       -19894  74
> 3101852523586.039062  886135874       23618   66
> 2544105595941.381836  1484956709      -23515  37
> -3997512403067.0625   1102149509      30597   -123
> -1183754978977.589355 1655994719      31071   95
> 1408783849655.676758  34576567        -26441  -73
> -2993175106993.426758 417098319       27215   79
> 3004723551798.100586  -1753555402     -8650   54
> 1103792083527.786133  -14511545       -28089  71
> 469767055288.485352   1615620024      26552   -72
> -1263700791098.294434 -980406074      12486   -58
> -4244889766496.484375 -1462078048     30112   -96
> -3962729491139.782715 1525323069      -27331  61
> NULL  NULL    NULL    NULL
> {code}
> This issue is visible only for certain decimal values. In the above example, 
> rows 7, 11, 12, 15, and 19 generate different results; these are the rows 
> whose fractional part is at least 0.5.
> vectortab10korc table schema:
> {code}
> t                     tinyint                 from deserializer   
> si                    smallint                from deserializer   
> i                     int                     from deserializer   
> b                     bigint                  from deserializer   
> f                     float                   from deserializer   
> d                     double                  from deserializer   
> dc                    decimal(38,18)          from deserializer   
> bo                    boolean                 from deserializer   
> s                     string                  from deserializer   
> s2                    string                  from deserializer   
> ts                    timestamp               from deserializer   
>                
> # Detailed Table Information           
> Database:             default                  
> Owner:                xyz                      
> CreateTime:           Tue Feb 25 21:54:28 UTC 2014     
> LastAccessTime:       UNKNOWN                  
> Protect Mode:         None                     
> Retention:            0                        
> Location:             hdfs://host1.domain.com:8020/apps/hive/warehouse/vectortab10korc
> Table Type:           MANAGED_TABLE            
> Table Parameters:              
>       COLUMN_STATS_ACCURATE   true                
>       numFiles                1                   
>       numRows                 10000               
>       rawDataSize             0                   
>       totalSize               344748              
>       transient_lastDdlTime   1393365281          
>                
> # Storage Information          
> SerDe Library:        org.apache.hadoop.hive.ql.io.orc.OrcSerde        
> InputFormat:          org.apache.hadoop.hive.ql.io.orc.OrcInputFormat  
> OutputFormat:         org.apache.hadoop.hive.ql.io.orc.OrcOutputFormat        
>  
> Compressed:           No                       
> Num Buckets:          -1                       
> Bucket Columns:       []                       
> Sort Columns:         []                       
> Storage Desc Params:           
>       serialization.format    1                   
> Time taken: 0.196 seconds, Fetched: 41 row(s)
> {code}



--
This message was sent by Atlassian JIRA
(v6.2#6252)
