[ 
https://issues.apache.org/jira/browse/HIVE-4525?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13659128#comment-13659128
 ] 

Mikhail Bautin commented on HIVE-4525:
--------------------------------------

I am not quite sure how we would preserve backward compatibility in the 
"writable" part of {{TimestampWritable}} ({{write}}/{{readFields}}) if we 
switched to a unified nanosecond-timestamp-as-long format. If {{readFields}} 
is presented with eight bytes, should it interpret them as a four-byte int 
followed by a VInt, or as a long nanosecond timestamp? Would it try the former 
and fall back to the latter if it finds inconsistencies? What if the bytes of 
a long nanosecond timestamp also happen to form a valid legacy (int/VInt) 
timestamp?
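To make the ambiguity concrete, here is a hypothetical illustration (not Hive code) that writes a nanosecond timestamp as one big-endian long and then reads the same eight bytes under the legacy layout. It assumes, as described above, that the legacy record starts with a four-byte int whose lower 31 bits hold seconds; treating the top bit as the "a VInt follows" flag is an assumption of this sketch. The class and method names are illustrative only.

```java
import java.nio.ByteBuffer;

/**
 * Hypothetical illustration (not Hive code): one eight-byte buffer read
 * under two different layouts. The bytes alone do not say which is meant.
 */
public class AmbiguityDemo {
    // Proposed unified layout: a single big-endian long of nanoseconds.
    static byte[] encodeUnified(long epochNanos) {
        return ByteBuffer.allocate(8).putLong(epochNanos).array();
    }

    static long readAsUnified(byte[] bytes) {
        return ByteBuffer.wrap(bytes).getLong();
    }

    // Legacy layout (simplified): the first four bytes are an int whose
    // lower 31 bits hold seconds since the epoch.
    static long readLegacySeconds(byte[] bytes) {
        int header = ByteBuffer.wrap(bytes).getInt();
        return header & 0x7FFFFFFFL;
    }

    // Assumption of this sketch: the top bit of the header int flags that a
    // VInt with the fractional part follows, i.e. the int is negative.
    static boolean legacyExpectsVInt(byte[] bytes) {
        return ByteBuffer.wrap(bytes).getInt() < 0;
    }

    public static void main(String[] args) {
        long nanos = 1_368_000_000_123_456_789L; // an instant in May 2013, in ns
        byte[] bytes = encodeUnified(nanos);
        System.out.println("as unified long nanos : " + readAsUnified(bytes));
        System.out.println("as legacy seconds     : " + readLegacySeconds(bytes));
        System.out.println("legacy expects a VInt : " + legacyExpectsVInt(bytes));
    }
}
```

Every four-byte prefix decodes to some legacy seconds value in [0, 2^31), so neither the length nor a sanity check on the header can reliably tell the two layouts apart.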

In my patch, I try to maintain backward compatibility as much as possible. If a 
timestamp is in the range that can be represented by the old format, it is 
serialized using the old format. The extended format I've proposed and 
implemented for the full timestamp range builds on top of the existing one and 
can be unambiguously distinguished from the old format by examining the 
serialized bytes.
In addition, the included test, {{TestTimestampWritable}}, covers both the old 
and the new (extended) format, as well as double/BigDecimal conversion, 
getters/setters/constructors and everything else in {{TimestampWritable}} that 
I could test.
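The exact byte layout of the patch is not given in this comment, so the following is only a hypothetical sketch of how an extension can stay distinguishable from a legacy record. It models the legacy record as a four-byte int (an assumed "has nanos" flag bit plus 31 bits of seconds), optionally followed by a variable-length integer holding the nanoseconds, and it substitutes a zig-zag/LEB128 varint for Hadoop's VInt. The marker for the extended form is a negative value in the slot where a legacy writer only ever stores a non-negative nanosecond count; a second varint then carries the high-order seconds bits. All names here are illustrative.

```java
import java.io.*;

/**
 * Hypothetical sketch, NOT the layout used in the patch.
 * Legacy (modelled): int = [hasNanos bit][31-bit seconds], then an optional
 * non-negative varint with nanoseconds. Extended: a negative varint marks the
 * record and is followed by a varint with the high-order seconds bits.
 */
public class ExtendedTimestampCodec {
    private static final long LOW31 = 0x7FFFFFFFL;
    private static final int HAS_NANOS = 0x80000000;

    static byte[] encode(long seconds, int nanos) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        DataOutputStream out = new DataOutputStream(bytes);
        if (seconds >= 0 && seconds <= LOW31) {
            // In range: write exactly what the legacy writer would.
            out.writeInt((nanos != 0 ? HAS_NANOS : 0) | (int) seconds);
            if (nanos != 0) writeVarLong(out, nanos);
        } else {
            // Out of range: force the flag so the marker varint is present.
            out.writeInt(HAS_NANOS | (int) (seconds & LOW31));
            writeVarLong(out, -(nanos + 1L)); // negative => extended record
            writeVarLong(out, seconds >> 31); // remaining high-order bits
        }
        out.flush();
        return bytes.toByteArray();
    }

    /** Returns {seconds, nanos}. */
    static long[] decode(byte[] data) throws IOException {
        DataInputStream in = new DataInputStream(new ByteArrayInputStream(data));
        int header = in.readInt();
        long seconds = header & LOW31;
        long nanos = 0;
        if ((header & HAS_NANOS) != 0) {
            long v = readVarLong(in);
            if (v >= 0) {
                nanos = v;                        // legacy record
            } else {
                nanos = -v - 1;                   // extended record
                seconds |= readVarLong(in) << 31; // restore high bits (and sign)
            }
        }
        return new long[] {seconds, nanos};
    }

    // Zig-zag + LEB128 varint: a stand-in for Hadoop's VInt encoding.
    static void writeVarLong(DataOutputStream out, long value) throws IOException {
        long z = (value << 1) ^ (value >> 63);
        while ((z & ~0x7FL) != 0) {
            out.writeByte((int) ((z & 0x7F) | 0x80));
            z >>>= 7;
        }
        out.writeByte((int) z);
    }

    static long readVarLong(DataInputStream in) throws IOException {
        long z = 0;
        int shift = 0;
        int b;
        do {
            b = in.readUnsignedByte();
            z |= (long) (b & 0x7F) << shift;
            shift += 7;
        } while ((b & 0x80) != 0);
        return (z >>> 1) ^ -(z & 1);
    }

    public static void main(String[] args) throws IOException {
        long[][] samples = {{0, 0}, {1_368_000_000L, 123_456_789}, {-86_400L, 500}, {4_102_444_800L, 0}};
        for (long[] s : samples) {
            long[] back = decode(encode(s[0], (int) s[1]));
            System.out.println(s[0] + "s " + s[1] + "ns -> " + back[0] + "s " + back[1] + "ns");
        }
    }
}
```

In this model, in-range timestamps produce byte-for-byte legacy records, and a reader needs only the sign of the first varint to choose the decoding path.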

I am sure there is a way to handle vector optimizations for timestamps in a 
backward-compatible way, and I don't think this patch would make it much more 
complicated than it already is. However, vectorized computations are a 
performance optimization, while this issue is about correctness. Currently, 
timestamps outside of the ~1970-2038 range are silently corrupted in some 
queries, and this patch fixes that. It is also pretty small and immediately 
available.

> Support timestamps earlier than 1970 and later than 2038
> --------------------------------------------------------
>
>                 Key: HIVE-4525
>                 URL: https://issues.apache.org/jira/browse/HIVE-4525
>             Project: Hive
>          Issue Type: Bug
>            Reporter: Mikhail Bautin
>            Assignee: Mikhail Bautin
>         Attachments: D10755.1.patch
>
>
> TimestampWritable currently serializes timestamps using the lower 31 bits of 
> an int. This does not allow storing timestamps earlier than 1970 or later 
> than a certain point in 2038.
