[ https://issues.apache.org/jira/browse/HIVE-5397?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13811796#comment-13811796 ]

Eric Hanson commented on HIVE-5397:
-----------------------------------

Hi Brock,

I'm in favor of encapsulation for most code. But this case is different, because this is a low-level performance-enhancement project with some research behind it. The theory behind the vectorized query execution technique we use was published in this paper:

Peter Boncz et al., MonetDB/X100: Hyper-Pipelining Query Execution, Proceedings of the CIDR Conference, 2005.
http://citeseerx.ist.psu.edu/viewdoc/download;jsessionid=C26BD72358252F6A301DA1FF6E37D44B?doi=10.1.1.324.9516&rep=rep1&type=pdf

Please see the performance numbers in the paper. State-of-the-art query execution systems, such as the ones in Microsoft SQL Server, Vectorwise, Vertica, and ParAccel/Redshift (in no particular order), all use this strategy or something like it. It's well known in the industry that this is a place where being architecture-conscious pays big dividends, and that requires some violation of encapsulation.

The compiler might inline some method calls for us in the inner loops of some of the vectorized "for" loops. But for the most primitive operations, like arithmetic and comparisons, relying on the compiler to do that is too much of a risk in most cases.

Arguably, using put/get methods to access columns, rather than the array access we use in our VectorExpression subclasses, would not lose much performance. But we have already decided to use array access to get columns, and it is used in hundreds of places in the code. I think it is a reasonable choice, and it is not necessary to change it.

-Eric

> VectorizedRowBatch member variables are public.
> -----------------------------------------------
>
>                 Key: HIVE-5397
>                 URL: https://issues.apache.org/jira/browse/HIVE-5397
>             Project: Hive
>          Issue Type: Sub-task
>            Reporter: Jitendra Nath Pandey
>            Assignee: Jitendra Nath Pandey
>
> VectorizedRowBatch exposes members as public to avoid method call overheads.
> Alternative is to rely on JIT to inline the methods.



--
This message was sent by Atlassian JIRA
(v6.1#6144)
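For readers outside the thread, here is a minimal Java sketch of the trade-off discussed above: a vectorized "add a scalar to a column" loop written once with direct access to public arrays and once through get/set accessors that the JIT may or may not inline. The classes `LongColumn` and `RowBatch` and the methods `addScalarDirect`/`addScalarViaAccessors` are hypothetical, simplified stand-ins, not Hive's actual `VectorizedRowBatch`/`ColumnVector` API; the real classes carry extra state (for example null and repeating-value flags) that this sketch omits, and it does not measure performance.

```java
import java.util.Arrays;

// Hypothetical, simplified stand-ins for vectorized column/batch classes (not the Hive API).
public class VectorAccessSketch {

    /** A column of long values whose backing array is public, in the style discussed above. */
    static final class LongColumn {
        public final long[] vector;

        LongColumn(int capacity) {
            vector = new long[capacity];
        }

        // Accessor-style alternative: only cheap in hot loops if the JIT inlines it.
        long get(int i) { return vector[i]; }

        void set(int i, long v) { vector[i] = v; }
    }

    /** A batch of rows stored column-wise; fields are public to avoid call overhead. */
    static final class RowBatch {
        public final LongColumn[] cols;
        public int size; // number of valid rows currently in the batch

        RowBatch(int numCols, int capacity) {
            cols = new LongColumn[numCols];
            for (int c = 0; c < numCols; c++) {
                cols[c] = new LongColumn(capacity);
            }
        }
    }

    /** out = in + scalar, using direct array access in the inner loop. */
    static void addScalarDirect(RowBatch batch, int inCol, long scalar, int outCol) {
        long[] in = batch.cols[inCol].vector;   // fetch the arrays once, outside the loop
        long[] out = batch.cols[outCol].vector;
        int n = batch.size;
        for (int i = 0; i < n; i++) {
            out[i] = in[i] + scalar;            // plain array loads/stores, no calls
        }
    }

    /** Same result via accessors: two method calls per row unless the JIT inlines them. */
    static void addScalarViaAccessors(RowBatch batch, int inCol, long scalar, int outCol) {
        LongColumn in = batch.cols[inCol];
        LongColumn out = batch.cols[outCol];
        for (int i = 0; i < batch.size; i++) {
            out.set(i, in.get(i) + scalar);
        }
    }

    public static void main(String[] args) {
        RowBatch batch = new RowBatch(3, 1024);
        batch.size = 4;
        for (int i = 0; i < batch.size; i++) {
            batch.cols[0].vector[i] = i * 10L;  // input column: 0, 10, 20, 30
        }
        addScalarDirect(batch, 0, 5L, 1);
        addScalarViaAccessors(batch, 0, 5L, 2);
        System.out.println(Arrays.toString(Arrays.copyOf(batch.cols[1].vector, batch.size)));
    }
}
```

Running `main` prints `[5, 15, 25, 35]`, and both loops fill their output columns identically. The difference the comment is concerned with is not correctness but whether the per-row `get`/`set` calls in the second version are reliably inlined; whether that matters in practice would need a benchmark on the target JVM.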