-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/17899/
-----------------------------------------------------------

Review request for hive, Brock Noland, Eric Hanson, and Jitendra Pandey.


Bugs: HIVE-5998
    https://issues.apache.org/jira/browse/HIVE-5998


Repository: hive-git


Description
-------

The implementation is straightforward and very simple, but offers all the benefits of vectorization possible with a 'shallow' vectorized reader (i.e. one that does not require changes to the parquet-mr project).

The only complication arose because of discrepancies between the object inspector seen by the input format and the actual output provided by the Parquet readers (e.g. the OI declares 'byte' primitives but the Parquet reader outputs IntWritable). I had to create a just-in-time VectorColumnAssigner collection based on whatever writers the Parquet record reader provides. It is assumed the reader does not change its output during the iteration.


Diffs
-----

  ql/src/java/org/apache/hadoop/hive/ql/exec/vector/VectorColumnAssignFactory.java d1a75df
  ql/src/java/org/apache/hadoop/hive/ql/exec/vector/VectorizedRowBatch.java 0b504de
  ql/src/java/org/apache/hadoop/hive/ql/exec/vector/VectorizedRowBatchCtx.java f513188
  ql/src/java/org/apache/hadoop/hive/ql/io/parquet/MapredParquetInputFormat.java d3412df
  ql/src/java/org/apache/hadoop/hive/ql/io/parquet/VectorizedParquetInputFormat.java PRE-CREATION

Diff: https://reviews.apache.org/r/17899/diff/


Testing
-------

Manually tested. I will add a .q query, but I need to get home (to my Mac), where I can actually run the tests and create the expected output(s).


Thanks,

Remus Rusanu