[ 
https://issues.apache.org/jira/browse/HIVE-24354?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

László Bodor updated HIVE-24354:
--------------------------------
    Description: 
While writing HIVE-24245 I found that ColumnVector doesn't have any methods for 
getting a value from the vector, like:
{code}
ColumnVector.getValue(n) // get nth element...mutable?, not mutable? copy?
ColumnVector.getHash(n) // get the murmur hash for the nth element
{code}

Because of this, I ended up writing different vectorized UDAFs for different 
data types, and the only difference was the single line that obtains a value 
from the vector. In the current vector expressions I can see a pattern where 
we copy the whole expression, including the abstract logic and the loops 
(this is something I was already thinking about in the scope of HIVE-21465), 
but I don't like that approach. When I create an abstract vectorized UDAF and 
extend it for certain data types, I do bring in the overhead of a function 
call for every single value, but I don't think this violates basic 
vectorization principles: we still have vectors, so e.g. the object 
inspection overhead is already eliminated.
I propose some convenience methods like the ones above, which would define a 
strict contract for how to retrieve data from a ColumnVector, in particular 
the nth element of the vector.
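
To make the idea concrete, here is a minimal Java sketch of what such a contract might look like. Everything in it is an assumption for illustration: the Toy* classes and DistinctHashCounter are hypothetical stand-ins rather than Hive's actual ColumnVector hierarchy, getValue is assumed to return an immutable or copied value (one possible answer to the "mutable? copy?" question above), and plain Java hash functions stand in for the murmur hash.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

// Hypothetical stand-in for a column vector; NOT Hive's real ColumnVector.
abstract class ToyColumnVector {
    // Assumed contract: returns the nth element as an immutable value or a copy.
    abstract Object getValue(int n);

    // Assumed contract: returns a hash of the nth element.
    // The issue suggests murmur hash; this sketch uses plain Java hashes.
    abstract int getHash(int n);
}

// Type-specific subclass backed by a primitive long array.
final class ToyLongColumnVector extends ToyColumnVector {
    final long[] vector;

    ToyLongColumnVector(long... values) {
        this.vector = values;
    }

    @Override
    Object getValue(int n) {
        return vector[n]; // autoboxed to an immutable Long
    }

    @Override
    int getHash(int n) {
        return Long.hashCode(vector[n]);
    }
}

// Type-specific subclass backed by byte arrays.
final class ToyBytesColumnVector extends ToyColumnVector {
    final byte[][] vector;

    ToyBytesColumnVector(byte[]... values) {
        this.vector = values;
    }

    @Override
    Object getValue(int n) {
        return vector[n].clone(); // defensive copy, caller cannot mutate the vector
    }

    @Override
    int getHash(int n) {
        return Arrays.hashCode(vector[n]);
    }
}

// Hypothetical aggregation written once against the abstract contract.
// It counts distinct hashes, so hash collisions would undercount.
final class DistinctHashCounter {
    private final Set<Integer> seen = new HashSet<>();

    void aggregate(ToyColumnVector col, int size) {
        for (int i = 0; i < size; i++) {
            seen.add(col.getHash(i)); // one virtual call per value
        }
    }

    int count() {
        return seen.size();
    }
}
```

With this shape, the aggregation loop exists only once, and each data type contributes just its own accessor methods, at the cost of one virtual call per value.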


> ColumnVector should declare abstract convenience methods for getting values
> ---------------------------------------------------------------------------
>
>                 Key: HIVE-24354
>                 URL: https://issues.apache.org/jira/browse/HIVE-24354
>             Project: Hive
>          Issue Type: Improvement
>            Reporter: László Bodor
>            Assignee: László Bodor
>            Priority: Major
>
> While writing HIVE-24245 I found that ColumnVector doesn't have any methods 
> for getting a value from the vector, like:
> {code}
> ColumnVector.getValue(n) // get nth element...mutable?, not mutable? copy?
> ColumnVector.getHash(n) // get the murmur hash for the nth element
> {code}
> Because of this, I ended up writing different vectorized UDAFs for different 
> data types, and the only difference was the single line that obtains a value 
> from the vector. In the current vector expressions I can see a pattern where 
> we copy the whole expression, including the abstract logic and the loops 
> (this is something I was already thinking about in the scope of HIVE-21465), 
> but I don't like that approach. When I create an abstract vectorized UDAF 
> and extend it for certain data types, I do bring in the overhead of a 
> function call for every single value, but I don't think this violates basic 
> vectorization principles: we still have vectors, so e.g. the object 
> inspection overhead is already eliminated.
> I propose some convenience methods like the ones above, which would define a 
> strict contract for how to retrieve data from a ColumnVector, in particular 
> the nth element of the vector.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
