[ https://issues.apache.org/jira/browse/HIVE-14893?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Sergey Shelukhin updated HIVE-14893:
------------------------------------
Description:

See the vectorized results for the decimal_11 test added in HIVE-14863.

We cast decimal to various int types. On the non-vectorized side, the cast is specialized for each target type. On the vectorized side, it is only specialized for LongColumnVector, so all the decimals get converted to longs. The LongColumnVector is converted to the proper type somewhere else later (exactly where is unclear), and tinyint, smallint, and int values are truncated at that point.

Logically, I am not sure whether every vectorized expression should be aware of the underlying type of its LongColumnVector (that seems implausible: I am not sure the type information is even available, and if it is, it does not appear to be used in other places), or whether the automatic long-to-smaller-type conversion should be fixed to produce nulls on overflow.

However, it seems like a good idea to do the latter in any case, as a catch-all for all the vectorized expressions that might treat a LongColumnVector as holding longs at all times.

Update: I see tens of places in the code that do something like this:

    (int) ((LongColumnVector) batch.cols[projectionColumnNum]).vector[adjustedIndex]

The same pattern appears for other types. These might all be problematic.
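The truncation described above can be illustrated with a small, self-contained Java sketch. It is not Hive code: `NarrowingSketch`, `IntType`, `truncate`, `narrowOrNull`, and `nullOnOverflow` are hypothetical names, and the plain `vector`/`isNull` arrays only mimic the shape of a long column vector. It contrasts a plain narrowing cast, which keeps only the low-order bits, with an overflow-checked narrowing that produces null, roughly the catch-all proposed in the description.

```java
import java.util.Arrays;

/**
 * Hypothetical sketch (not Hive code): contrasts a plain Java narrowing cast,
 * which silently wraps, with an overflow-checked narrowing that yields null.
 */
class NarrowingSketch {

    /** Integer types that, in this sketch, all share a long[] backing store. */
    enum IntType {
        TINYINT(Byte.MIN_VALUE, Byte.MAX_VALUE),
        SMALLINT(Short.MIN_VALUE, Short.MAX_VALUE),
        INT(Integer.MIN_VALUE, Integer.MAX_VALUE),
        BIGINT(Long.MIN_VALUE, Long.MAX_VALUE);

        final long min;
        final long max;

        IntType(long min, long max) {
            this.min = min;
            this.max = max;
        }
    }

    /** Unchecked narrowing, like an (int) cast on a long: keeps only the low-order bits. */
    static long truncate(long value, IntType type) {
        switch (type) {
            case TINYINT:  return (byte) value;
            case SMALLINT: return (short) value;
            case INT:      return (int) value;
            default:       return value;
        }
    }

    /** Checked narrowing: returns null when the value does not fit the target type. */
    static Long narrowOrNull(long value, IntType type) {
        return (value < type.min || value > type.max) ? null : value;
    }

    /**
     * Hypothetical column-level catch-all: entries that overflow the target
     * type are marked null instead of being truncated.
     */
    static void nullOnOverflow(long[] vector, boolean[] isNull, IntType type) {
        for (int i = 0; i < vector.length; i++) {
            if (!isNull[i] && narrowOrNull(vector[i], type) == null) {
                isNull[i] = true;
            }
        }
    }

    public static void main(String[] args) {
        long fromDecimal = 300L; // e.g. decimal 300 cast to long, destined for a tinyint
        System.out.println("truncated:    " + truncate(fromDecimal, IntType.TINYINT));    // 44
        System.out.println("null-checked: " + narrowOrNull(fromDecimal, IntType.TINYINT)); // null

        long[] vector = {1L, 127L, 128L, -129L, 3_000_000_000L};
        boolean[] isNull = new boolean[vector.length];
        nullOnOverflow(vector, isNull, IntType.TINYINT);
        System.out.println(Arrays.toString(isNull)); // [false, false, true, true, true]
    }
}
```

Returning null rather than throwing matches the "nulls on overflow" option. For code that would rather fail loudly, the JDK's `Math.toIntExact(long)` throws `ArithmeticException` on overflow instead of wrapping the way a plain `(int)` cast does.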
> vectorized execution may convert LongCV to smaller types incorrectly
> --------------------------------------------------------------------
>
>                 Key: HIVE-14893
>                 URL: https://issues.apache.org/jira/browse/HIVE-14893
>             Project: Hive
>          Issue Type: Bug
>            Reporter: Sergey Shelukhin
>            Assignee: Matt McCline
>            Priority: Critical
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)