----------------------------------------------------------- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/11530/#review21203 -----------------------------------------------------------
ql/src/java/org/apache/hadoop/hive/ql/exec/vector/expressions/VectorUDFTimestampFieldLong.java <https://reviews.apache.org/r/11530/#comment44042> Sun Java style convention is to have a blank line before all comments. I don't personally mind but that's what I've been asked to do :-). ql/src/java/org/apache/hadoop/hive/ql/exec/vector/expressions/VectorUDFTimestampFieldLong.java <https://reviews.apache.org/r/11530/#comment44039> Please use full sentences starting with caps and ending with period. ql/src/java/org/apache/hadoop/hive/ql/exec/vector/expressions/VectorUDFTimestampFieldLong.java <https://reviews.apache.org/r/11530/#comment44038> Can you confirm this expression will be constant-folded by the compiler? Otherwise this should be evaluated by hand in advance. ql/src/java/org/apache/hadoop/hive/ql/exec/vector/expressions/VectorUDFTimestampFieldLong.java <https://reviews.apache.org/r/11530/#comment44040> If you know in what cases ms can be negative, can you add that to the comment? That seems unusual. In think this because if the time is negative (before the epoch) you could get negative nanos. So you want to convert before creating the timestamp. Is that right? Please elaborate a little in the comment. ql/src/java/org/apache/hadoop/hive/ql/exec/vector/expressions/VectorUDFTimestampFieldLong.java <https://reviews.apache.org/r/11530/#comment44041> Would it be possible to re-use a timestamp that belongs to the class here, rather than calling new? If so, please do that to speed this up. I think you can do what you need with setTime() and setNanos(). Eliminating new() in the inner loop of vector processing tends to speed things up a lot. ql/src/java/org/apache/hadoop/hive/ql/exec/vector/expressions/VectorUDFWeekOfYearLong.java <https://reviews.apache.org/r/11530/#comment44044> Please explain why constant 4 is correct here and what it means. ql/src/java/org/apache/hadoop/hive/ql/exec/vector/expressions/VectorUDFYearLong.java <https://reviews.apache.org/r/11530/#comment44049> Nice work! Good, compact code to initialize year boundaries. No use of new() or calls to heavy calendar methods in the inner loop. Awesome! :-) ql/src/java/org/apache/hadoop/hive/ql/exec/vector/expressions/VectorUDFYearLong.java <https://reviews.apache.org/r/11530/#comment44045> can you use minYear and maxYear here instead of literals? ql/src/java/org/apache/hadoop/hive/ql/exec/vector/expressions/VectorUDFYearLong.java <https://reviews.apache.org/r/11530/#comment44047> can you use minYear + 1 and maxYear instead of literals? ql/src/java/org/apache/hadoop/hive/ql/exec/vector/expressions/VectorUDFYearLong.java <https://reviews.apache.org/r/11530/#comment44051> Given that this function is moderately heavy anyway (with the binary search) I think making it virtual will not slow things down much. But if it gets faster we should seriously consider creating a separate evaluate method for VectorUDFYearLong and make this a static function to avoid virtual method call overhead. ql/src/java/org/apache/hadoop/hive/ql/exec/vector/expressions/VectorUDFYearLong.java <https://reviews.apache.org/r/11530/#comment44050> Did you consider calculating approximate year using something like approxYear = yearBase + (int)(time / nanosPerYear) - 1; and then linearly search forward to find year boundary? I wonder if that would be faster than binary search. ql/src/test/org/apache/hadoop/hive/ql/exec/vector/expressions/TestVectorTimestampExpressions.java <https://reviews.apache.org/r/11530/#comment44063> Overall this is a good set of unit tests! It's quite comprehensive. Thanks. ql/src/test/org/apache/hadoop/hive/ql/exec/vector/expressions/TestVectorTimestampExpressions.java <https://reviews.apache.org/r/11530/#comment44054> Can you comment this function to explain how you are using long[] inputs? I think I understand but a comment would help. ql/src/test/org/apache/hadoop/hive/ql/exec/vector/expressions/TestVectorTimestampExpressions.java <https://reviews.apache.org/r/11530/#comment44061> if inputs.length == 1 (which I think it often does in your tests), then I % 1 is always 0. So you are always loading up input vectors with all 0. Is there a reason for this? If so, please explain, or if not, consider a wider range of inputs. ql/src/test/org/apache/hadoop/hive/ql/exec/vector/expressions/TestVectorTimestampExpressions.java <https://reviews.apache.org/r/11530/#comment44052> what does /*begin-macro*/ mean? ql/src/test/org/apache/hadoop/hive/ql/exec/vector/expressions/TestVectorTimestampExpressions.java <https://reviews.apache.org/r/11530/#comment44053> I'm trying to standardize on using "batch" instead of "vrg" for batches. vrg is used in a lot of places but g stands for group not batch so it is confusing. ql/src/test/org/apache/hadoop/hive/ql/exec/vector/expressions/TestVectorTimestampExpressions.java <https://reviews.apache.org/r/11530/#comment44060> here you know !(vrg.cols[in].noNulls || vrg.cols[in].isNull[i] == false) ==> !noNulls and isNull[I] == true But you are checking for the value of isNull[I] for "in" in the assert which is a bit misleading. Maybe test against true in the else branch? - Eric Hanson On May 29, 2013, 11:36 p.m., Gopal V wrote: > > ----------------------------------------------------------- > This is an automatically generated e-mail. To reply, visit: > https://reviews.apache.org/r/11530/ > ----------------------------------------------------------- > > (Updated May 29, 2013, 11:36 p.m.) > > > Review request for hive, Jitendra Pandey and Eric Hanson. > > > Description > ------- > > Timestamp UDFs for vectorized long nanosecond timestamps - all of them > convert timestamp into a corresponding long/int value (year, month, > week-of-year, day-of-month, hour, minute, second, unix-timestamp). > > > This addresses bug HIVE-4608. > https://issues.apache.org/jira/browse/HIVE-4608 > > > Diffs > ----- > > > ql/src/java/org/apache/hadoop/hive/ql/exec/vector/expressions/VectorUDFDayOfMonthLong.java > PRE-CREATION > > ql/src/java/org/apache/hadoop/hive/ql/exec/vector/expressions/VectorUDFHourLong.java > PRE-CREATION > > ql/src/java/org/apache/hadoop/hive/ql/exec/vector/expressions/VectorUDFMinuteLong.java > PRE-CREATION > > ql/src/java/org/apache/hadoop/hive/ql/exec/vector/expressions/VectorUDFMonthLong.java > PRE-CREATION > > ql/src/java/org/apache/hadoop/hive/ql/exec/vector/expressions/VectorUDFSecondLong.java > PRE-CREATION > > ql/src/java/org/apache/hadoop/hive/ql/exec/vector/expressions/VectorUDFTimestampFieldLong.java > PRE-CREATION > > ql/src/java/org/apache/hadoop/hive/ql/exec/vector/expressions/VectorUDFUnixTimeStampLong.java > PRE-CREATION > > ql/src/java/org/apache/hadoop/hive/ql/exec/vector/expressions/VectorUDFWeekOfYearLong.java > PRE-CREATION > > ql/src/java/org/apache/hadoop/hive/ql/exec/vector/expressions/VectorUDFYearLong.java > PRE-CREATION > > ql/src/test/org/apache/hadoop/hive/ql/exec/vector/expressions/TestVectorTimestampExpressions.java > PRE-CREATION > > Diff: https://reviews.apache.org/r/11530/diff/ > > > Testing > ------- > > Unit tests included which compare each UDF against its non-vectorized one's > output, with random data and year boundary data (+1,0,-1). > > > Thanks, > > Gopal V > >