[ https://issues.apache.org/jira/browse/HIVE-8128?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14624378#comment-14624378 ]
Dong Chen commented on HIVE-8128:
---------------------------------

Hi [~nezihyigitbasi],

I updated and ran the Hive POC based on the latest changes in your repo:
https://github.com/nezihyigitbasi-nflx/parquet-mr/commits/vector

All looks good. Thanks.

During development, I had some thoughts about the vector API. Could you help take a look at them?

* In {{ColumnVector}}, how about adding two attributes? One is {{boolean noNulls}}, which indicates whether the whole column vector contains no null values. The other is {{boolean isRepeating}}, which indicates whether the same value repeats for the whole column vector. Both could be calculated while we read a vector. The reason we want them is that the Hive vector engine can check these attributes to skip some values. It might be better to calculate them once in Parquet instead of recalculating them by revisiting the vectors in Hive. (I am not sure whether other engines need this, but it should be OK for Parquet to support it.)
* In {{RowBatch}}, how about adding one attribute, {{int size}}, which indicates the number of rows in the batch? This is just for ease of use. Its value should be the same as {{RowBatch.columns\[0\].numValues}}.

What do you think?

> Improve Parquet Vectorization
> -----------------------------
>
>                 Key: HIVE-8128
>                 URL: https://issues.apache.org/jira/browse/HIVE-8128
>             Project: Hive
>          Issue Type: Sub-task
>            Reporter: Brock Noland
>            Assignee: Dong Chen
>             Fix For: parquet-branch
>
>         Attachments: HIVE-8128-parquet.patch.POC, HIVE-8128.1-parquet.patch
>
>
> NO PRECOMMIT TESTS
> What we'll want to do is finish the vectorization work (e.g. VectorizedOrcSerde,
> VectorizedOrcSerde) which was partially done in HIVE-5998.
> As discussed in PARQUET-131, we will work out a Hive POC based on the new
> Parquet vectorized API, and then finish the implementation after it is finalized.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
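
To make the two proposals in the comment concrete, here is a minimal Java sketch. It is not the Parquet vector API from the linked branch: {{IntColumnVector}}, {{append}}, {{readBatch}}, and {{sumNonNull}} are hypothetical names invented for illustration, and only {{noNulls}}, {{isRepeating}}, {{size}}, {{columns}}, and {{numValues}} come from the comment. It assumes the flags are maintained while values are written, so a consumer such as Hive would not need a second pass over the vector.

```java
// Hypothetical sketch; class and method names are assumptions, not the Parquet API.
class VectorFlagsSketch {

    /** Stand-in for a ColumnVector holding nullable int values. */
    static final class IntColumnVector {
        final int[] values;
        final boolean[] isNull;
        int numValues;
        boolean noNulls = true;       // proposed: true if no value in the vector is null
        boolean isRepeating = false;  // proposed: true if every entry equals entry 0

        IntColumnVector(int capacity) {
            values = new int[capacity];
            isNull = new boolean[capacity];
        }

        /** Appends one value (null allowed) and updates both flags in the same pass. */
        void append(Integer v) {
            int i = numValues++;
            isNull[i] = (v == null);
            values[i] = (v == null) ? 0 : v;
            if (v == null) {
                noNulls = false;
            }
            if (i == 0) {
                isRepeating = true; // a single entry trivially repeats
            } else if (isRepeating) {
                // Still repeating only if this entry matches entry 0 (both null, or equal values).
                isRepeating = isNull[i] == isNull[0] && (isNull[i] || values[i] == values[0]);
            }
        }
    }

    /** Stand-in for RowBatch with the proposed size attribute. */
    static final class RowBatch {
        final IntColumnVector[] columns;
        int size; // proposed: number of rows, same as columns[0].numValues

        RowBatch(IntColumnVector... columns) {
            this.columns = columns;
        }
    }

    /** Simulates a reader that fills a batch from in-memory column data. */
    static RowBatch readBatch(Integer[][] columnData) {
        IntColumnVector[] cols = new IntColumnVector[columnData.length];
        for (int c = 0; c < columnData.length; c++) {
            cols[c] = new IntColumnVector(columnData[c].length);
            for (Integer v : columnData[c]) {
                cols[c].append(v);
            }
        }
        RowBatch batch = new RowBatch(cols);
        batch.size = cols.length == 0 ? 0 : cols[0].numValues;
        return batch;
    }

    /** Example consumer: uses the flags to skip per-value work where possible. */
    static long sumNonNull(IntColumnVector col, int size) {
        if (col.isRepeating) {
            // One multiplication instead of a loop.
            return col.isNull[0] ? 0L : (long) col.values[0] * size;
        }
        long total = 0;
        for (int i = 0; i < size; i++) {
            // With noNulls set, the isNull array is never consulted.
            if (col.noNulls || !col.isNull[i]) {
                total += col.values[i];
            }
        }
        return total;
    }

    public static void main(String[] args) {
        RowBatch batch = readBatch(new Integer[][] {{7, 7, 7}, {1, null, 3}});
        System.out.println("size=" + batch.size);
        for (IntColumnVector col : batch.columns) {
            System.out.println("noNulls=" + col.noNulls
                    + " isRepeating=" + col.isRepeating
                    + " sum=" + sumNonNull(col, batch.size));
        }
    }
}
```

In this sketch the consumer reads {{batch.size}} rather than reaching into {{columns\[0\].numValues}}, which is the convenience the second proposal describes. Whether an empty vector should report {{isRepeating}} as true or false is a design choice the comment does not address; the sketch picks false.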