[ https://issues.apache.org/jira/browse/HIVE-5761?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13836108#comment-13836108 ]

Teddy Choi commented on HIVE-5761:
----------------------------------

I wrote a draft version.

{quote}
DATE shall be implemented within a LongColumnVector. HIVE-4055 represents a
DATE value as the number of days since the epoch (1970-01-01). A vectorized DATE
value will contain this number and an optional cached parse result. The result
of a read operation, or of a complex date function such as date_add or
date_sub, will have an empty cache. The first simple date function applied to
a value, such as year, month or day, will store its parse result in the cache,
and subsequent simple functions will reuse that cache instead of parsing again.
The effect of the cache on performance will be small, since java.util.Date
calculates all fields at once and caches the results.

The first 32-bit set will hold the number of days since the epoch as a signed
integer. Its range is roughly from (2^31/365 - 1970) BC to (2^31/365 + 1970) AD,
i.e., about 5.88 million years on either side of 1970. A comparison between
vectorized DATE values should consider only their first sets.

The second 32-bit set will hold the cached parse result: cached state (1 bit;
0 for not cached, 1 for cached), era (1 bit; 0 for AD, 1 for BC), year
(unsigned 21-bit integer), month (unsigned 4-bit integer) and day of month
(unsigned 5-bit integer), 32 bits in total. A value without a cache will have
only zero bits in its second set. The parsed year, month and day of month are
stored as 1-based values, so each equals the actual calendar number (e.g.,
January is 1). The cacheable range is therefore (2^21 - 1) BC to (2^21 - 1) AD,
which is narrower than the range of the first set. If a date falls outside
this range, its cached state will remain false (0).

The value 0xFFFFFFFF00000000L shall be reserved for future use to indicate data
outside the standard range.
{quote}
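A minimal Java sketch of the encoding described in the draft may help pin down the bit arithmetic. It operates on a plain long[] standing in for the long[] vector field of LongColumnVector. Several choices here are assumptions, not part of the draft: the "first set" is taken to be the high 32 bits of the long, the cache fields are packed from the most significant bit down (cached, era, year, month, day), and java.time.LocalDate (proleptic ISO calendar) is swapped in for java.util.Date, so dates before 1582 will differ from java.util.Date's hybrid Julian/Gregorian calendar. The class PackedDate and all of its methods (fromDays, withCache, monthOf, dateAdd, ...) are hypothetical names used only for illustration.

```java
import java.time.LocalDate;

/**
 * Hypothetical sketch of the draft's DATE encoding in one long.
 * Assumed layout (not fixed by the draft):
 *   bits 63..32  signed days since 1970-01-01 (the "first set")
 *   bit  31      cached flag (1 = cached)
 *   bit  30      era (0 = AD, 1 = BC)
 *   bits 29..9   year within era, 1-based, unsigned 21 bits
 *   bits  8..5   month, 1-based, unsigned 4 bits
 *   bits  4..0   day of month, 1-based, unsigned 5 bits
 */
final class PackedDate {
    private static final long LOW_WORD = 0xFFFFFFFFL;
    private static final int CACHED_BIT = 1 << 31;
    private static final int ERA_BIT = 1 << 30;
    private static final int MAX_YEAR = (1 << 21) - 1; // 2,097,151

    private PackedDate() {}

    /** Uncached value: day count in the high word, all-zero cache word. */
    static long fromDays(int days) {
        return ((long) days) << 32;
    }

    /** Day count; the only part that comparisons look at. */
    static int days(long v) {
        return (int) (v >> 32);
    }

    static int compare(long a, long b) {
        return Integer.compare(days(a), days(b));
    }

    static boolean isCached(long v) { return ((int) v & CACHED_BIT) != 0; }
    static boolean isBC(long v)     { return ((int) v & ERA_BIT) != 0; }
    static int year(long v)         { return ((int) v >>> 9) & MAX_YEAR; }
    static int month(long v)        { return ((int) v >>> 5) & 0xF; }
    static int dayOfMonth(long v)   { return (int) v & 0x1F; }

    /** Fills the cache word; leaves v uncached if its year needs more than 21 bits. */
    static long withCache(long v) {
        if (isCached(v)) {
            return v;
        }
        // Assumption: proleptic ISO calendar via java.time, not java.util.Date.
        LocalDate d = LocalDate.ofEpochDay(days(v));
        int y = d.getYear();               // ISO year 0 is 1 BC, -1 is 2 BC, ...
        boolean bc = y <= 0;
        int eraYear = bc ? 1 - y : y;
        if (eraYear > MAX_YEAR) {
            return v;                      // outside the cache range: stays uncached
        }
        int cache = CACHED_BIT
                | (bc ? ERA_BIT : 0)
                | (eraYear << 9)
                | (d.getMonthValue() << 5)
                | d.getDayOfMonth();
        return (v & ~LOW_WORD) | (cache & LOW_WORD);
    }

    /** A "simple" function: caches on first use, then reads the cached field. */
    static int monthOf(long[] vector, int i) {
        vector[i] = withCache(vector[i]);  // no-op if already cached
        return isCached(vector[i])
                ? month(vector[i])
                : LocalDate.ofEpochDay(days(vector[i])).getMonthValue();
    }

    /** A "complex" function like date_add: its result starts with an empty cache. */
    static long dateAdd(long v, int n) {
        return fromDays(Math.addExact(days(v), n));
    }

    public static void main(String[] args) {
        // 1970-01-01, 1969-12-31 and 2013-12-01, initially uncached
        long[] vector = { fromDays(0), fromDays(-1), fromDays(16040) };
        for (int i = 0; i < vector.length; i++) {
            int m = monthOf(vector, i);    // fills the cache in place
            long v = vector[i];
            System.out.printf("%d -> %s%d-%02d-%02d%n", days(v),
                    isBC(v) ? "BC " : "", year(v), m, dayOfMonth(v));
        }
    }
}
```

Because comparisons read only the high word, two values with the same day count compare equal whether or not either is cached. One consequence of this particular layout is worth checking against the draft: fromDays(-1), the uncached encoding of 1969-12-31, is bit-for-bit equal to 0xFFFFFFFF00000000L, so the reserved value would coincide with a real date unless the day count is placed in the low 32 bits or a different sentinel is chosen.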

> Implement vectorized support for the DATE data type
> ---------------------------------------------------
>
>                 Key: HIVE-5761
>                 URL: https://issues.apache.org/jira/browse/HIVE-5761
>             Project: Hive
>          Issue Type: Sub-task
>            Reporter: Eric Hanson
>            Assignee: Teddy Choi
>
> Add support to allow queries referencing DATE columns and expression results 
> to run efficiently in vectorized mode. This should re-use the code for the
> integer/timestamp types to the extent possible and beneficial. Include 
> unit tests and end-to-end tests. Consider re-using or extending existing 
> end-to-end tests for vectorized integer and/or timestamp operations.



--
This message was sent by Atlassian JIRA
(v6.1#6144)
