Re: Review Request: Vectorized Timestamp functions for long nanosecond based timestamps

Eric Hanson Thu, 30 May 2013 14:30:04 -0700

-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/11530/#review21203
-----------------------------------------------------------




ql/src/java/org/apache/hadoop/hive/ql/exec/vector/expressions/VectorUDFTimestampFieldLong.java
<https://reviews.apache.org/r/11530/#comment44042>

    Sun Java style convention is to have a blank line before all comments. I 
don't personally mind but that's what I've been asked to do :-).



ql/src/java/org/apache/hadoop/hive/ql/exec/vector/expressions/VectorUDFTimestampFieldLong.java
<https://reviews.apache.org/r/11530/#comment44039>

    Please use full sentences starting with caps and ending with period.



ql/src/java/org/apache/hadoop/hive/ql/exec/vector/expressions/VectorUDFTimestampFieldLong.java
<https://reviews.apache.org/r/11530/#comment44038>

    Can you confirm this expression will be constant-folded by the compiler? 
Otherwise this should be evaluated by hand in advance. 



ql/src/java/org/apache/hadoop/hive/ql/exec/vector/expressions/VectorUDFTimestampFieldLong.java
<https://reviews.apache.org/r/11530/#comment44040>

    If you know in what cases ms can be negative, can you add that to the 
comment? That seems unusual. 
    
    In think this because if the time is negative (before the epoch) you could 
get negative nanos. So you want to convert before creating the timestamp. Is 
that right? Please elaborate a little in the comment.



ql/src/java/org/apache/hadoop/hive/ql/exec/vector/expressions/VectorUDFTimestampFieldLong.java
<https://reviews.apache.org/r/11530/#comment44041>

    Would it be possible to re-use a timestamp that belongs to the class here, 
rather than calling new? If so, please do that to speed this up. I think you 
can do what you need with setTime() and setNanos().
    
    Eliminating new() in the inner loop of vector processing tends to speed 
things up a lot.



ql/src/java/org/apache/hadoop/hive/ql/exec/vector/expressions/VectorUDFWeekOfYearLong.java
<https://reviews.apache.org/r/11530/#comment44044>

    Please explain why constant 4 is correct here and what it means.



ql/src/java/org/apache/hadoop/hive/ql/exec/vector/expressions/VectorUDFYearLong.java
<https://reviews.apache.org/r/11530/#comment44049>

    Nice work! Good, compact code to initialize year boundaries. No use of 
new() or calls to heavy calendar methods in the inner loop. Awesome! :-)



ql/src/java/org/apache/hadoop/hive/ql/exec/vector/expressions/VectorUDFYearLong.java
<https://reviews.apache.org/r/11530/#comment44045>

    can you use minYear and maxYear here instead of literals?



ql/src/java/org/apache/hadoop/hive/ql/exec/vector/expressions/VectorUDFYearLong.java
<https://reviews.apache.org/r/11530/#comment44047>

    can you use minYear + 1 and maxYear instead of literals?



ql/src/java/org/apache/hadoop/hive/ql/exec/vector/expressions/VectorUDFYearLong.java
<https://reviews.apache.org/r/11530/#comment44051>

    Given that this function is moderately heavy anyway (with the binary 
search) I think making it virtual will not slow things down much. 
    
    But if it gets faster we should seriously consider creating a separate 
evaluate method for VectorUDFYearLong and make this a static function to avoid 
virtual method call overhead.
    



ql/src/java/org/apache/hadoop/hive/ql/exec/vector/expressions/VectorUDFYearLong.java
<https://reviews.apache.org/r/11530/#comment44050>

    Did you consider calculating approximate year using something like
    
    approxYear = yearBase + (int)(time / nanosPerYear) - 1; 
    
    and then linearly search forward to find year boundary? I wonder if that 
would be faster than binary search.



ql/src/test/org/apache/hadoop/hive/ql/exec/vector/expressions/TestVectorTimestampExpressions.java
<https://reviews.apache.org/r/11530/#comment44063>

    Overall this is a good set of unit tests! It's quite comprehensive. Thanks.



ql/src/test/org/apache/hadoop/hive/ql/exec/vector/expressions/TestVectorTimestampExpressions.java
<https://reviews.apache.org/r/11530/#comment44054>

    Can you comment this function to explain how you are using long[] inputs? I 
think I understand but a comment would help.
    
    



ql/src/test/org/apache/hadoop/hive/ql/exec/vector/expressions/TestVectorTimestampExpressions.java
<https://reviews.apache.org/r/11530/#comment44061>

    if inputs.length == 1 (which I think it often does in your tests), then I % 
1 is always 0. So you are always loading up input vectors with all 0. Is there 
a reason for this? If so, please explain, or if not, consider a wider range of 
inputs.



ql/src/test/org/apache/hadoop/hive/ql/exec/vector/expressions/TestVectorTimestampExpressions.java
<https://reviews.apache.org/r/11530/#comment44052>

    what does /*begin-macro*/ mean?



ql/src/test/org/apache/hadoop/hive/ql/exec/vector/expressions/TestVectorTimestampExpressions.java
<https://reviews.apache.org/r/11530/#comment44053>

    I'm trying to standardize on using "batch" instead of "vrg" for batches. 
vrg is used in a lot of places but g stands for group not batch so it is 
confusing.



ql/src/test/org/apache/hadoop/hive/ql/exec/vector/expressions/TestVectorTimestampExpressions.java
<https://reviews.apache.org/r/11530/#comment44060>

    here you know !(vrg.cols[in].noNulls || vrg.cols[in].isNull[i] == false) 
==> 
    !noNulls and isNull[I] == true
    
    But you are checking for the value of isNull[I] for "in" in the assert 
which is a bit misleading. 
    
    Maybe test against true in the else branch?


- Eric Hanson


On May 29, 2013, 11:36 p.m., Gopal V wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/11530/
> -----------------------------------------------------------
> 
> (Updated May 29, 2013, 11:36 p.m.)
> 
> 
> Review request for hive, Jitendra Pandey and Eric Hanson.
> 
> 
> Description
> -------
> 
> Timestamp UDFs for vectorized long nanosecond timestamps - all of them 
> convert timestamp into a corresponding long/int value (year, month, 
> week-of-year, day-of-month, hour, minute, second, unix-timestamp).
> 
> 
> This addresses bug HIVE-4608.
>     https://issues.apache.org/jira/browse/HIVE-4608
> 
> 
> Diffs
> -----
> 
>   
> ql/src/java/org/apache/hadoop/hive/ql/exec/vector/expressions/VectorUDFDayOfMonthLong.java
>  PRE-CREATION 
>   
> ql/src/java/org/apache/hadoop/hive/ql/exec/vector/expressions/VectorUDFHourLong.java
>  PRE-CREATION 
>   
> ql/src/java/org/apache/hadoop/hive/ql/exec/vector/expressions/VectorUDFMinuteLong.java
>  PRE-CREATION 
>   
> ql/src/java/org/apache/hadoop/hive/ql/exec/vector/expressions/VectorUDFMonthLong.java
>  PRE-CREATION 
>   
> ql/src/java/org/apache/hadoop/hive/ql/exec/vector/expressions/VectorUDFSecondLong.java
>  PRE-CREATION 
>   
> ql/src/java/org/apache/hadoop/hive/ql/exec/vector/expressions/VectorUDFTimestampFieldLong.java
>  PRE-CREATION 
>   
> ql/src/java/org/apache/hadoop/hive/ql/exec/vector/expressions/VectorUDFUnixTimeStampLong.java
>  PRE-CREATION 
>   
> ql/src/java/org/apache/hadoop/hive/ql/exec/vector/expressions/VectorUDFWeekOfYearLong.java
>  PRE-CREATION 
>   
> ql/src/java/org/apache/hadoop/hive/ql/exec/vector/expressions/VectorUDFYearLong.java
>  PRE-CREATION 
>   
> ql/src/test/org/apache/hadoop/hive/ql/exec/vector/expressions/TestVectorTimestampExpressions.java
>  PRE-CREATION 
> 
> Diff: https://reviews.apache.org/r/11530/diff/
> 
> 
> Testing
> -------
> 
> Unit tests included which compare each UDF against its non-vectorized one's 
> output, with random data and year boundary data (+1,0,-1).
> 
> 
> Thanks,
> 
> Gopal V
> 
>

Re: Review Request: Vectorized Timestamp functions for long nanosecond based timestamps

Reply via email to