[ https://issues.apache.org/jira/browse/HIVE-24354?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
László Bodor updated HIVE-24354:
--------------------------------
    Description:
While writing HIVE-24245, I found that ColumnVector doesn't have any methods for getting a value from the vector, such as:
{code}
ColumnVector.getValue(n) // get nth element...mutable?, not mutable? copy?
ColumnVector.getHash(n) // get the murmur hash for the nth element
{code}
Because of this, I ended up writing separate vectorized UDAFs for different data types, where the only difference was the single line that obtains a value from the vector. In the current vector expressions I can see a pattern where we copy the whole expression, together with the abstract logic and the loops (this is something I was already thinking about in the scope of HIVE-21465), but I don't like that approach. When I create an abstract vectorized UDAF and extend it for certain data types, I do bring in the overhead of a function call for every single value, but I don't think this violates basic vectorization principles: we still work on vectors, so e.g. the object inspection overhead is already eliminated.

I propose some convenience methods like the ones above, which would define a strict contract for retrieving data from a ColumnVector, in particular the nth element of the vector.

> ColumnVector should declare abstract convenience methods for getting values
> ---------------------------------------------------------------------------
>
>                 Key: HIVE-24354
>                 URL: https://issues.apache.org/jira/browse/HIVE-24354
>             Project: Hive
>          Issue Type: Improvement
>            Reporter: László Bodor
>            Assignee: László Bodor
>            Priority: Major
>

--
This message was sent by Atlassian Jira
(v8.3.4#803005)
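
To make the proposal in the description concrete, here is a minimal sketch of how the two accessors could let a type-agnostic loop be written once. Everything in it is an assumption for illustration: the `Sketch*` classes are simplified stand-ins rather than Hive's real ColumnVector classes, `getValue` returning a boxed copy is only one of the possible contracts the description leaves open, and `Long.hashCode` / `Double.hashCode` stand in for a murmur hash.

```java
// Hypothetical sketch only: simplified stand-ins, not Hive's actual ColumnVector API.
final class ColumnVectorAccessSketch {

  /** Stand-in for ColumnVector, declaring the two proposed accessors. */
  abstract static class SketchColumnVector {
    /** Returns the nth element; in this sketch always a boxed copy. */
    abstract Object getValue(int n);

    /** Returns a hash of the nth element; a real version would use murmur hash. */
    abstract long getHash(int n);
  }

  /** Stand-in for a long-typed vector. */
  static final class SketchLongColumnVector extends SketchColumnVector {
    final long[] vector;

    SketchLongColumnVector(long... values) {
      this.vector = values;
    }

    @Override
    Object getValue(int n) {
      return vector[n]; // autoboxed to Long
    }

    @Override
    long getHash(int n) {
      return Long.hashCode(vector[n]); // placeholder for a murmur hash
    }
  }

  /** Stand-in for a double-typed vector. */
  static final class SketchDoubleColumnVector extends SketchColumnVector {
    final double[] vector;

    SketchDoubleColumnVector(double... values) {
      this.vector = values;
    }

    @Override
    Object getValue(int n) {
      return vector[n]; // autoboxed to Double
    }

    @Override
    long getHash(int n) {
      return Double.hashCode(vector[n]); // placeholder for a murmur hash
    }
  }

  /**
   * Aggregation loop written once for all types: sums the hashes of the
   * first {@code size} rows. The only type-specific step is getHash(i).
   */
  static long sumOfHashes(SketchColumnVector col, int size) {
    long acc = 0;
    for (int i = 0; i < size; i++) {
      acc += col.getHash(i);
    }
    return acc;
  }

  public static void main(String[] args) {
    SketchColumnVector longs = new SketchLongColumnVector(1L, 2L, 3L);
    SketchColumnVector doubles = new SketchDoubleColumnVector(1.5, 2.5);
    System.out.println(sumOfHashes(longs, 3));
    System.out.println(sumOfHashes(doubles, 2));
    System.out.println(doubles.getValue(0));
  }
}
```

The per-value virtual call is the overhead discussed in the description. The sketch ignores null handling, repeating vectors, and selection arrays, all of which a real contract would have to cover.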