[ 
https://issues.apache.org/jira/browse/HIVE-3910?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13676626#comment-13676626
 ] 

Jason Dere commented on HIVE-3910:
----------------------------------

HIVE-4055 already has a patch with an initial implementation of a DATE type, 
which has already done quite a bit of the work for DATE support. Took a look at 
this and I had a few proposed additions to this:

1. Use Joda Time rather than java.sql.Date
The existing patch uses java.sql.Date as the underlying data type (based on 
java.util.Date).  Thejas proposed using the Joda Time library as this is 
supposed to be a better datetime implementation, and is also used by Pig for 
datetime handling.  It does not appear that Joda Time is currently used by Hive 
and so this would need to be pulled in as a dependent library.

2. Storage of DATE values
In the existing patch, DateWritable writes out long value (8 bytes) 
representing seconds since the Unix epoch.  As mentioned in HIVE-3910, since 
DATE is in days, we could reduce the storage space by instead storing a 4-byte 
integer value representing days since some epoch (1970? 4713 BC?). The range of 
dates that we can represent with such an integer representation would be +/- 2 
billion days, or 5.8M years.

3. Considerations for Hive vectorization support
Talking to some folks who are concerned about Hive vectorization (HIVE-4160), 
and in the interests of vectorization support they want the date type to be 
represented as primitive values.  They are proposing that DateWritable would 
hold the integer value (rather than Date value) which will still be usable for 
comparison operations, which would be the most common operations that would be 
used on date types (group-by, sorting).  If an actual Date value is required, 
then DateWritable.get() will generate a Date object based on the 
days-since-epoch integer value.

4. SQL syntax compliance
The existing patch creates date values using a DATE() UDF - DATE('2013-01-01). 
The SQL standard actually has syntax to represent a date literal - DATE 
'2013-01-01'.  The Hive grammar would need to be extended to support the SQL 
syntax.

5. Operations on DATE types
The SQL standard (section 6.14) looks like it just supports DATE operations 
involving the INTERVAL type:
        <datetime value expression> ::=
                <datetime term>
              | <interval value expression> <plus sign> <datetime term>
              | <datetime value expression> <plus sign> <interval term>
              | <datetime value expression> <minus sign> <interval term>

There is currently no interval type support in Hive. Support for the interval 
type will be added as a later item.

6. Compatibility with other types
The existing patch allows a lot of implicit conversion to/from other types 
(numeric, string).  It does appear that TIMESTAMP has set a bit of a precedent 
in allowing a lot of implicit type conversion.  However, given the limited 
operations with other types as described in above from the SQL standard, I 
would propose limiting the amount of implicit conversion that is allowed.  
There are UDFs that the user can use to convert DATE into numeric/string 
values, which can then be used in arithmetic or aggregation functions.  
                
> Create a new DATE datatype
> --------------------------
>
>                 Key: HIVE-3910
>                 URL: https://issues.apache.org/jira/browse/HIVE-3910
>             Project: Hive
>          Issue Type: Task
>            Reporter: Namit Jain
>
> It might be useful to have a DATE datatype along with timestamp.
> This can only store the day (possibly number of days from 1970-01-01,
> and would thus give space savings in binary format).

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

Reply via email to