[ https://issues.apache.org/jira/browse/HIVE-8128?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14222774#comment-14222774 ]
Dong Chen commented on HIVE-8128:
---------------------------------

To improve Parquet vectorization, I think we need the following changes, and they should be based on PARQUET-131. These are some initial thoughts, and I will make them more specific after working on the Parquet side for a while.

Assume the RecordReader in Hive will get data of type {{ParquetVectorizedRowBatch}}.

1. The next() method of {{VectorizedParquetRecordReader}} should be {{next(NullWritable key, ParquetVectorizedRowBatch outputBatch)}}. This lets Hive fetch a vectorized batch of Parquet rows at a time.
2. A {{VectorizedParquetHiveSerDe}} will be added to convert {{ParquetVectorizedRowBatch}} to the Hive-recognized {{VectorizedRowBatch}}. To make the conversion efficient, the Parquet vectorized API design might take this into account: the more similar the two kinds of row batch are, the better.
3. Partition support is already in trunk. Whether it works for Parquet should be verified after the main work is done, with changes made if necessary.

> Improve Parquet Vectorization
> -----------------------------
>
>                 Key: HIVE-8128
>                 URL: https://issues.apache.org/jira/browse/HIVE-8128
>             Project: Hive
>          Issue Type: Sub-task
>            Reporter: Brock Noland
>            Assignee: Dong Chen
>
> What we'll want to do is finish the vectorization work (e.g. VectorizedOrcSerde,
> VectorizedOrcSerde) which was partially done in HIVE-5998.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
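
The reader-and-converter flow proposed in points 1 and 2 of the comment can be pictured with a minimal Java sketch. Everything in it is a hypothetical stand-in: {{ParquetBatchSketch}}, {{HiveBatchSketch}}, {{BatchReaderSketch}} and {{BatchConverterSketch}} are illustrative names, not the PARQUET-131 API or Hive's actual {{VectorizedRowBatch}}. Each batch holds a single long column, and the {{NullWritable}} key is omitted. The sketch only aims to show why similar batch layouts keep the conversion down to bulk array copies.

```java
import java.util.Arrays;

// Hypothetical stand-in for the Parquet-side batch (not the PARQUET-131 API).
final class ParquetBatchSketch {
    final long[] values;     // one column of long values
    final boolean[] isNull;  // per-row null flags
    int size;                // number of valid rows in this batch

    ParquetBatchSketch(int capacity) {
        values = new long[capacity];
        isNull = new boolean[capacity];
    }
}

// Hypothetical stand-in for a Hive-side batch with a single long column.
final class HiveBatchSketch {
    final long[] vector;
    final boolean[] isNull;
    boolean noNulls = true;
    int size;

    HiveBatchSketch(int capacity) {
        vector = new long[capacity];
        isNull = new boolean[capacity];
    }
}

// Assumed reader shape: each next() call fills one whole batch of rows.
final class BatchReaderSketch {
    private final long[] source;
    private int pos = 0;

    BatchReaderSketch(long[] source) {
        this.source = source;
    }

    // Returns false once the source is exhausted.
    boolean next(ParquetBatchSketch out) {
        int n = Math.min(out.values.length, source.length - pos);
        if (n <= 0) {
            out.size = 0;
            return false;
        }
        System.arraycopy(source, pos, out.values, 0, n);
        Arrays.fill(out.isNull, 0, n, false);
        out.size = n;
        pos += n;
        return true;
    }
}

// Assumed converter: because both layouts are columnar arrays,
// the conversion is two bulk copies plus a scan for nulls.
final class BatchConverterSketch {
    static void convert(ParquetBatchSketch in, HiveBatchSketch out) {
        if (in.size > out.vector.length) {
            throw new IllegalArgumentException("output batch too small");
        }
        System.arraycopy(in.values, 0, out.vector, 0, in.size);
        System.arraycopy(in.isNull, 0, out.isNull, 0, in.size);
        out.noNulls = true;
        for (int i = 0; i < in.size; i++) {
            if (in.isNull[i]) {
                out.noNulls = false;
                break;
            }
        }
        out.size = in.size;
    }
}
```

A caller would loop on {{next()}} and convert each filled batch before handing it to downstream operators. If the two batch types shared the same layout outright, the copy step could in principle be skipped, which is the motivation behind point 2.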