[ 
https://issues.apache.org/jira/browse/HIVE-5397?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13811796#comment-13811796
 ] 

Eric Hanson commented on HIVE-5397:
-----------------------------------

Hi Brock,

I'm in favor of encapsulation for most code. But this is different because this 
is a low-level performance enhancement project that has some research behind 
it. The theory behind the vectorized query execution technique that we use was 
published in this paper:

Peter Boncz et al., MonetDB/X100: Hyper-Pipelining Query Execution, Proceedings 
of the CIDR Conference, 2005. 
http://citeseerx.ist.psu.edu/viewdoc/download;jsessionid=C26BD72358252F6A301DA1FF6E37D44B?doi=10.1.1.324.9516&rep=rep1&type=pdf

Please see the performance numbers in the paper.

State-of-the-art query execution systems such as those in Microsoft SQL Server, 
Vectorwise, Vertica, and ParAccel/Redshift (in no particular order) all use 
this strategy or something like it. It's well known in the industry that this 
is a place where being architecture-conscious pays big dividends, and that 
requires some violation of encapsulation. 

The compiler might inline some function calls for us in the inner loops of the 
vector "for" loops, but for the most primitive operations, like arithmetic and 
comparisons, relying on that is too much of a risk in most cases. Arguably, 
using put/get methods to access columns, rather than the array access we use in 
our VectorExpression subclasses, would not lose much performance. But we 
already decided to use array access to get columns, and it is used in hundreds 
of places in the code. I think it is a reasonable choice, and it is not 
necessary to change it.
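
To make the trade-off concrete, here is a minimal Java sketch of the two access 
styles. The names LongColumnSketch, VectorAddSketch, addDirect and 
addWithAccessors are hypothetical and simplified for illustration; they are not 
the actual Hive classes. The first loop indexes public arrays directly, and the 
second goes through get/put methods and is equally fast only if the calls get 
inlined.

```java
// Hypothetical, simplified stand-in for a column of long values.
// This is an illustration only, not the actual Hive class.
final class LongColumnSketch {
    // Public field: an inner loop can index the array directly.
    public final long[] vector;

    LongColumnSketch(int size) {
        this.vector = new long[size];
    }

    // Encapsulated alternative: one method call per element,
    // which is cheap only if the JIT inlines it.
    long get(int i) {
        return vector[i];
    }

    void put(int i, long value) {
        vector[i] = value;
    }
}

final class VectorAddSketch {
    // Direct array access: the loop body is plain array arithmetic.
    static void addDirect(LongColumnSketch a, LongColumnSketch b,
                          LongColumnSketch out, int n) {
        long[] x = a.vector;
        long[] y = b.vector;
        long[] z = out.vector;
        for (int i = 0; i < n; i++) {
            z[i] = x[i] + y[i];
        }
    }

    // Accessor-based version: same result, but its speed depends on
    // the get/put calls being inlined in the loop.
    static void addWithAccessors(LongColumnSketch a, LongColumnSketch b,
                                 LongColumnSketch out, int n) {
        for (int i = 0; i < n; i++) {
            out.put(i, a.get(i) + b.get(i));
        }
    }
}
```

Both methods compute the same output; the sketch only shows what each style 
asks of the compiler, and makes no claim about measured performance.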

-Eric




> VectorizedRowBatch member variables are public.
> -----------------------------------------------
>
>                 Key: HIVE-5397
>                 URL: https://issues.apache.org/jira/browse/HIVE-5397
>             Project: Hive
>          Issue Type: Sub-task
>            Reporter: Jitendra Nath Pandey
>            Assignee: Jitendra Nath Pandey
>
> VectorizedRowBatch exposes members as public to avoid method call overheads. 
> Alternative is to rely on JIT to inline the methods. 



--
This message was sent by Atlassian JIRA
(v6.1#6144)
