[ https://issues.apache.org/jira/browse/HIVE-5979?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13842163#comment-13842163 ]

Gopal V commented on HIVE-5979:
-------------------------------

(Pasted from an email)

The nanosecond SQL timestamp support in Java (java.sql.Timestamp) is horribly
broken for usability.

https://github.com/apache/hive/blob/trunk/ql/src/java/org/apache/hadoop/hive/ql/exec/vector/expressions/VectorUDFTimestampFieldLong.java#L52

Read my comments there on how it handles negative timestamps and sub-second
timings.

Because integer division in Java rounds towards zero, a pre-epoch value can
end up with a negative remainder; this causes hell with the restriction that
the value passed to setNanos() must always be non-negative (0 to 999999999).
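To make the rounding problem concrete, here is a small standalone sketch (the class and method names are hypothetical, not Hive code). It shows Java's truncating `/` and `%` on a timestamp 1.5 seconds before the epoch, and how `Math.floorDiv`/`Math.floorMod` (available since Java 8) keep the remainder non-negative so `setNanos()` accepts it.

```java
import java.sql.Timestamp;

public class TruncatingDivisionDemo {

    // Naive split: Java's % truncates toward zero, so a negative
    // millisecond value yields a negative sub-second remainder.
    public static long naiveNanos(long epochMillis) {
        return (epochMillis % 1000L) * 1_000_000L;
    }

    // Floor-based split: Math.floorMod with a positive divisor always
    // returns a value in [0, 1000), so the nanos fraction is never negative.
    public static long flooredNanos(long epochMillis) {
        return Math.floorMod(epochMillis, 1000L) * 1_000_000L;
    }

    public static void main(String[] args) {
        long millis = -1500L; // 1.5 seconds before the Unix epoch

        System.out.println(millis / 1000L);               // -1 (truncated toward zero)
        System.out.println(millis % 1000L);               // -500
        System.out.println(Math.floorDiv(millis, 1000L)); // -2
        System.out.println(Math.floorMod(millis, 1000L)); // 500

        Timestamp ts = new Timestamp(0L);
        try {
            // -500000000 is outside the range setNanos() accepts.
            ts.setNanos((int) naiveNanos(millis));
        } catch (IllegalArgumentException e) {
            System.out.println("setNanos rejected: " + e.getMessage());
        }

        ts.setNanos((int) flooredNanos(millis)); // 500000000 is accepted
        System.out.println(ts.getNanos());       // 500000000
    }
}
```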

On top of that, it always stores the time as one long and one int (Unix-epoch
seconds plus nanos). The millisecond fraction is kept in the nanos field, so
setNanos() always overwrites the millisecond fraction of the time, which is
why getNanos() has to be added to the new value.

http://grepcode.com/file/repository.grepcode.com/java/root/jdk/openjdk/6-b14/java/sql/Timestamp.java#Timestamp.setTime%28long%29
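The overwrite behaviour can be seen with a short sketch against the standard `java.sql.Timestamp` API (the class and helper names are hypothetical). Calling `setNanos()` directly drops the 500 ms carried in the constructor argument, while adding the new nanos to `getNanos()` first preserves it.

```java
import java.sql.Timestamp;

public class SetNanosOverwriteDemo {

    // Sets the nanos directly: this replaces the whole sub-second part,
    // including the millisecond fraction taken from epochMillis.
    public static Timestamp overwrite(long epochMillis, int nanos) {
        Timestamp ts = new Timestamp(epochMillis);
        ts.setNanos(nanos);
        return ts;
    }

    // Adds sub-millisecond nanos on top of the existing fraction by folding
    // getNanos() back in. Assumes the sum stays below 1_000_000_000.
    public static Timestamp addSubMillisNanos(long epochMillis, int extraNanos) {
        Timestamp ts = new Timestamp(epochMillis);
        ts.setNanos(ts.getNanos() + extraNanos);
        return ts;
    }

    public static void main(String[] args) {
        Timestamp ts = new Timestamp(1500L);
        System.out.println(ts.getTime());  // 1500
        System.out.println(ts.getNanos()); // 500000000 (the .500 s lives here)

        Timestamp lost = overwrite(1500L, 123);
        System.out.println(lost.getTime()); // 1000 (the 500 ms is gone)

        Timestamp kept = addSubMillisNanos(1500L, 123);
        System.out.println(kept.getTime());  // 1500
        System.out.println(kept.getNanos()); // 500000123
    }
}
```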

That makes sense, until you realize that a negative millisecond time is
stored as a negative whole-second value plus a positive nanosecond value
(for example, -1500 ms becomes -2 s plus 500000000 ns).
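The pre-epoch representation can be observed directly. In this sketch (names are hypothetical), the whole-second part is recovered by subtracting `getNanos() / 1_000_000` from `getTime()`, because `getTime()` adds that millisecond fraction back onto the stored seconds.

```java
import java.sql.Timestamp;

public class NegativeTimestampDemo {

    // Recovers the whole-second value (in ms) that Timestamp keeps in its
    // inherited millisecond field: getTime() returns that value plus
    // getNanos() / 1_000_000, so subtract the fraction again.
    public static long storedWholeSecondMillis(Timestamp ts) {
        return ts.getTime() - ts.getNanos() / 1_000_000;
    }

    public static void main(String[] args) {
        Timestamp ts = new Timestamp(-1500L); // 1.5 seconds before the epoch
        System.out.println(ts.getTime());                // -1500
        System.out.println(ts.getNanos());               // 500000000 (positive)
        System.out.println(storedWholeSecondMillis(ts)); // -2000 (rounded down a full second)
    }
}
```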

So when you mix that with Java's negative modulo, you end up with a fairly
ugly kludge that has to handle several edge cases related to the
java.sql.Timestamp implementation.
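One way to sidestep most of those edge cases is to do all the splitting with floor division. The sketch below assumes a timestamp encoded as a single long of nanoseconds since the epoch (the encoding suggested by the `TimestampUtils.assignTimeInNanoSec` frame in the stack trace); the helper names are hypothetical, and this is an illustration rather than Hive's actual fix.

```java
import java.sql.Timestamp;

public final class EpochNanos {

    private static final long NANOS_PER_SECOND = 1_000_000_000L;

    // Hypothetical helper: nanoseconds since the Unix epoch -> Timestamp.
    // floorDiv/floorMod keep the fraction in [0, 999999999], which is
    // exactly the range Timestamp.setNanos() accepts.
    public static Timestamp fromEpochNanos(long epochNanos) {
        long seconds = Math.floorDiv(epochNanos, NANOS_PER_SECOND);
        int nanos = (int) Math.floorMod(epochNanos, NANOS_PER_SECOND);
        // Whole seconds only, so the constructor leaves nanos at 0 ...
        Timestamp ts = new Timestamp(seconds * 1000L);
        // ... and overwriting it loses nothing.
        ts.setNanos(nanos);
        return ts;
    }

    // Hypothetical inverse: whole seconds from getTime() (floored, because
    // getTime() already includes the millisecond fraction) plus getNanos().
    public static long toEpochNanos(Timestamp ts) {
        long seconds = Math.floorDiv(ts.getTime(), 1000L);
        return seconds * NANOS_PER_SECOND + ts.getNanos();
    }

    public static void main(String[] args) {
        long before = -1_500_000_000L; // 1.5 seconds before the epoch
        Timestamp ts = fromEpochNanos(before);
        System.out.println(ts.getTime());                 // -1500
        System.out.println(ts.getNanos());                // 500000000
        System.out.println(toEpochNanos(ts) == before);   // true
    }
}
```

Flooring both directions means negative and positive inputs go through the same code path, instead of special-casing a negative remainder after truncating division.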

> Failure in cast to timestamps.
> ------------------------------
>
>                 Key: HIVE-5979
>                 URL: https://issues.apache.org/jira/browse/HIVE-5979
>             Project: Hive
>          Issue Type: Sub-task
>            Reporter: Jitendra Nath Pandey
>            Assignee: Jitendra Nath Pandey
>
> Query run:
> {code}
> select cast(t as timestamp), cast(si as timestamp),
>        cast(i as timestamp), cast(b as timestamp),
>        cast(f as string), cast(d as timestamp),
>        cast(bo as timestamp), cast(b * 0 as timestamp),
>        cast(ts as timestamp), cast(s as timestamp),
>        cast(substr(s, 1, 1) as timestamp)
> from Table1;
> {code}
> Running this query with hive.vectorized.execution.enabled=true fails with the 
> following exception:
> {noformat}
> 13/12/05 07:56:36 ERROR tez.TezJobMonitor: Status: Failed
> Vertex failed, vertexName=Map 1, vertexId=vertex_1386227234886_0482_1_00, diagnostics=[Task failed, taskId=task_1386227234886_0482_1_00_000000, diagnostics=[AttemptID:attempt_1386227234886_0482_1_00_000000_0 Info:Error: java.lang.RuntimeException: org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error while processing row
>         at org.apache.hadoop.hive.ql.exec.tez.MapRecordProcessor.processRow(MapRecordProcessor.java:205)
>         at org.apache.hadoop.hive.ql.exec.tez.MapRecordProcessor.run(MapRecordProcessor.java:171)
>         at org.apache.hadoop.hive.ql.exec.tez.TezProcessor.run(TezProcessor.java:112)
>         at org.apache.tez.runtime.LogicalIOProcessorRuntimeTask.run(LogicalIOProcessorRuntimeTask.java:201)
>         at org.apache.hadoop.mapred.YarnTezDagChild$4.run(YarnTezDagChild.java:484)
>         at java.security.AccessController.doPrivileged(Native Method)
>         at javax.security.auth.Subject.doAs(Subject.java:396)
>         at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1491)
>         at org.apache.hadoop.mapred.YarnTezDagChild.main(YarnTezDagChild.java:474)
> Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error while processing row
>         at org.apache.hadoop.hive.ql.exec.vector.VectorMapOperator.process(VectorMapOperator.java:45)
>         at org.apache.hadoop.hive.ql.exec.tez.MapRecordProcessor.processRow(MapRecordProcessor.java:193)
>         ... 8 more
> Caused by: java.lang.IllegalArgumentException: nanos > 999999999 or < 0
>         at java.sql.Timestamp.setNanos(Timestamp.java:383)
>         at org.apache.hadoop.hive.ql.exec.vector.TimestampUtils.assignTimeInNanoSec(TimestampUtils.java:27)
>         at org.apache.hadoop.hive.ql.exec.vector.expressions.VectorExpressionWriterFactory$1.writeValue(VectorExpressionWriterFactory.java:412)
>         at org.apache.hadoop.hive.ql.exec.vector.expressions.VectorExpressionWriterFactory$VectorExpressionWriterLong.writeValue(VectorExpressionWriterFactory.java:162)
>         at org.apache.hadoop.hive.ql.exec.vector.VectorizedRowBatch.toString(VectorizedRowBatch.java:152)
>         at org.apache.hadoop.hive.ql.exec.vector.VectorFileSinkOperator.processOp(VectorFileSinkOperator.java:85)
>         at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:786)
>         at org.apache.hadoop.hive.ql.exec.vector.VectorSelectOperator.processOp(VectorSelectOperator.java:129)
>         at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:786)
>         at org.apache.hadoop.hive.ql.exec.TableScanOperator.processOp(TableScanOperator.java:93)
>         at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:786)
>         at org.apache.hadoop.hive.ql.exec.vector.VectorMapOperator.process(VectorMapOperator.java:43)
>         ... 9 more
> {noformat}
> Full log is attached.
> Schema for the table is as follows:
> {code}
> hive> desc Table1;
> OK
> t                     tinyint                 from deserializer
> si                    smallint                from deserializer
> i                     int                     from deserializer
> b                     bigint                  from deserializer
> f                     float                   from deserializer
> d                     double                  from deserializer
> bo                    boolean                 from deserializer
> s                     string                  from deserializer
> s2                    string                  from deserializer
> ts                    timestamp               from deserializer
> Time taken: 0.521 seconds, Fetched: 10 row(s)
> {code}



--
This message was sent by Atlassian JIRA
(v6.1#6144)
