> On April 24, 2013, 11:01 p.m., Eric Hanson wrote:
> > ql/src/java/org/apache/hadoop/hive/ql/io/orc/RunLengthIntegerReader.java, line 97
> > <https://reviews.apache.org/r/10712/diff/2/?file=284237#file284237line97>
> >
> > If there are no nulls in a stripe or split for a column, we should be able to do a fast code path that doesn't need this check and if-else.
> >
> > I haven't seen noNulls get set anywhere. What is the plan for setting noNulls as an optimization? That has a big performance impact in QE (about 30% time savings for filters and arithmetic).

This is being set in the parent class TreeReader::nextVector; a rough sketch of the idea is shown below.
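A minimal illustration of the idea, using hypothetical names (PresentBitReader, NoNullsSketch, and readPresentBits are stand-ins for this note, not classes or methods from the patch): the parent fills isNull for the batch from the present (null) bits and derives noNulls in one pass, so a child reader such as RunLengthIntegerReader can check noNulls once and skip the per-row null test.

    import org.apache.hadoop.hive.ql.exec.vector.ColumnVector;

    public class NoNullsSketch {

      /** Hypothetical stand-in for ORC's present-bit (null) stream reader. */
      interface PresentBitReader {
        /** Returns true if the next value is present, i.e. not null. */
        boolean next();
      }

      /**
       * Fill isNull for batchSize rows and derive noNulls in one pass, so a
       * child reader can take the no-null fast path when the flag is set.
       */
      static void readPresentBits(PresentBitReader present, ColumnVector result,
                                  int batchSize) {
        boolean anyNull = false;
        for (int i = 0; i < batchSize; i++) {
          boolean isPresent = present.next();
          result.isNull[i] = !isPresent;
          anyNull |= !isPresent;
        }
        // One flag per batch instead of one branch per row.
        result.noNulls = !anyNull;
      }
    }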
> On April 24, 2013, 11:01 p.m., Eric Hanson wrote:
> > ql/src/java/org/apache/hadoop/hive/ql/io/orc/RecordReaderImpl.java, line 1486
> > <https://reviews.apache.org/r/10712/diff/2/?file=284236#file284236line1486>
> >
> > I don't understand this. map and struct are not supported yet, so I think this should be unimplemented.

A table is represented as a struct in ORC, so this is required.


> On April 24, 2013, 11:01 p.m., Eric Hanson wrote:
> > ql/src/java/org/apache/hadoop/hive/ql/io/orc/RecordReaderImpl.java, line 1029
> > <https://reviews.apache.org/r/10712/diff/2/?file=284236#file284236line1029>
> >
> > The plan was to not support struct yet, but later, to support a field of a struct just like it was a regular column. Struct field access would just be a naming convention.
> >
> > A query might not access every field of a struct. This reads every field of the struct.
> >
> > I think probably we should leave this unimplemented and then come back and do it later using the naming-convention technique.

A table is represented as a struct in ORC, so this is required. We are not reading all the columns of the table/struct; the ORC record reader reads only the columns that are required. RecordReaderImpl::readStripe() is the method in ORC that does this.


> On April 24, 2013, 11:01 p.m., Eric Hanson wrote:
> > ql/src/java/org/apache/hadoop/hive/ql/io/orc/RecordReaderImpl.java, line 173
> > <https://reviews.apache.org/r/10712/diff/2/?file=284236#file284236line173>
> >
> > I recommend this method take and return a ColumnVector instead of an Object, since I don't think it would ever make sense to not take a ColumnVector subtype.
> >
> > This applies to all nextVector methods.

The reason this method returns an Object is that for struct tree readers the return value is a ColumnVector[], not a ColumnVector. Similarly, each of the complex data type readers can opt to return a different object type; a hypothetical sketch of this shape follows.
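To make the Object-vs-ColumnVector point concrete, here is a simplified, hypothetical reader hierarchy (not the patch's actual TreeReader classes): a scalar reader fills and returns a single ColumnVector, while a struct reader fans out to its child readers and returns a ColumnVector[].

    import org.apache.hadoop.hive.ql.exec.vector.ColumnVector;
    import org.apache.hadoop.hive.ql.exec.vector.LongColumnVector;

    /** Hypothetical hierarchy illustrating why nextVector returns Object. */
    abstract class SketchTreeReader {
      /** Returns a ColumnVector for scalar readers, a ColumnVector[] for struct readers. */
      abstract Object nextVector(Object previous, long batchSize);
    }

    class SketchLongReader extends SketchTreeReader {
      @Override
      Object nextVector(Object previous, long batchSize) {
        LongColumnVector result = (previous == null)
            ? new LongColumnVector((int) batchSize)
            : (LongColumnVector) previous;
        // ... decode batchSize values into result.vector ...
        return result;                      // a single ColumnVector
      }
    }

    class SketchStructReader extends SketchTreeReader {
      private final SketchTreeReader[] fields;

      SketchStructReader(SketchTreeReader[] fields) {
        this.fields = fields;
      }

      @Override
      Object nextVector(Object previous, long batchSize) {
        ColumnVector[] result = (previous == null)
            ? new ColumnVector[fields.length]
            : (ColumnVector[]) previous;
        for (int i = 0; i < fields.length; i++) {
          // Assumes scalar field readers here for brevity; each reuses its slot.
          result[i] = (ColumnVector) fields[i].nextVector(result[i], batchSize);
        }
        return result;                      // an array, not a single ColumnVector
      }
    }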
> On April 24, 2013, 11:01 p.m., Eric Hanson wrote:
> > ql/src/java/org/apache/hadoop/hive/ql/io/orc/RecordReaderImpl.java, line 1475
> > <https://reviews.apache.org/r/10712/diff/2/?file=284236#file284236line1475>
> >
> > Put a javadoc comment describing the method.

The javadoc for this method is at org.apache.hadoop.hive.ql.io.orc.RecordReader.


- Sarvesh


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/10712/#review19674
-----------------------------------------------------------


On April 24, 2013, 9:53 p.m., Sarvesh Sakalanaga wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/10712/
> -----------------------------------------------------------
> 
> (Updated April 24, 2013, 9:53 p.m.)
> 
> 
> Review request for hive.
> 
> 
> Description
> -------
> 
> The patch contains changes to the ORC reader to return a batch of rows instead of a single row. A new method called nextBatch() is added to the ORC reader and to the tree readers of ORC. Currently only int, long, short, double, float, string, and struct columns support batch processing.
> 
> 
> This addresses bug HIVE-4370.
>     https://issues.apache.org/jira/browse/HIVE-4370
> 
> 
> Diffs
> -----
> 
>   ql/src/java/org/apache/hadoop/hive/ql/exec/vector/BytesColumnVector.java 246170d
>   ql/src/java/org/apache/hadoop/hive/ql/io/orc/DynamicByteArray.java fc4e53b
>   ql/src/java/org/apache/hadoop/hive/ql/io/orc/RecordReader.java 05240ce
>   ql/src/java/org/apache/hadoop/hive/ql/io/orc/RecordReaderImpl.java d044cd8
>   ql/src/java/org/apache/hadoop/hive/ql/io/orc/RunLengthIntegerReader.java 2825c64
>   ql/src/test/org/apache/hadoop/hive/ql/io/orc/TestVectorizedORCReader.java PRE-CREATION
> 
> Diff: https://reviews.apache.org/r/10712/diff/
> 
> 
> Testing
> -------
> 
> 
> Thanks,
> 
> Sarvesh Sakalanaga
> 
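As a closing note on the nextBatch() API described in the quoted request: a caller pulls vectorized batches in a loop instead of individual rows. The sketch below is illustrative only; BatchedOrcReader is a hypothetical stand-in that mirrors the described method, and the real nextBatch() is the one the patch adds to org.apache.hadoop.hive.ql.io.orc.RecordReader, whose exact signature is defined there.

    import java.io.IOException;

    import org.apache.hadoop.hive.ql.exec.vector.LongColumnVector;
    import org.apache.hadoop.hive.ql.exec.vector.VectorizedRowBatch;

    public class NextBatchUsageSketch {

      /** Hypothetical stand-in mirroring the batched reader API described above. */
      interface BatchedOrcReader {
        boolean hasNext() throws IOException;
        /** Fills (or allocates) a batch of rows, reusing 'previous' when possible. */
        VectorizedRowBatch nextBatch(VectorizedRowBatch previous) throws IOException;
      }

      /** Sums column 0 across all batches, assuming it is a long column. */
      static long sumFirstColumn(BatchedOrcReader reader) throws IOException {
        long sum = 0;
        VectorizedRowBatch batch = null;
        while (reader.hasNext()) {
          batch = reader.nextBatch(batch);        // a batch of rows, not one row
          LongColumnVector col = (LongColumnVector) batch.cols[0];
          for (int r = 0; r < batch.size; r++) {
            if (col.noNulls || !col.isNull[r]) {  // noNulls enables the fast path
              sum += col.vector[r];
            }
          }
        }
        return sum;
      }
    }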