[ https://issues.apache.org/jira/browse/HIVE-3910?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13676626#comment-13676626 ]
Jason Dere commented on HIVE-3910: ---------------------------------- HIVE-4055 already has a patch with an initial implementation of a DATE type, which has already done quite a bit of the work for DATE support. Took a look at this and I had a few proposed additions to this: 1. Use Joda Time rather than java.sql.Date The existing patch uses java.sql.Date as the underlying data type (based on java.util.Date). Thejas proposed using the Joda Time library as this is supposed to be a better datetime implementation, and is also used by Pig for datetime handling. It does not appear that Joda Time is currently used by Hive and so this would need to be pulled in as a dependent library. 2. Storage of DATE values In the existing patch, DateWritable writes out long value (8 bytes) representing seconds since the Unix epoch. As mentioned in HIVE-3910, since DATE is in days, we could reduce the storage space by instead storing a 4-byte integer value representing days since some epoch (1970? 4713 BC?). The range of dates that we can represent with such an integer representation would be +/- 2 billion days, or 5.8M years. 3. Considerations for Hive vectorization support Talking to some folks who are concerned about Hive vectorization (HIVE-4160), and in the interests of vectorization support they want the date type to be represented as primitive values. They are proposing that DateWritable would hold the integer value (rather than Date value) which will still be usable for comparison operations, which would be the most common operations that would be used on date types (group-by, sorting). If an actual Date value is required, then DateWritable.get() will generate a Date object based on the days-since-epoch integer value. 4. SQL syntax compliance The existing patch creates date values using a DATE() UDF - DATE('2013-01-01). The SQL standard actually has syntax to represent a date literal - DATE '2013-01-01'. The Hive grammar would need to be extended to support the SQL syntax. 5. Operations on DATE types The SQL standard (section 6.14) looks like it just supports DATE operations involving the INTERVAL type: <datetime value expression> ::= <datetime term> | <interval value expression> <plus sign> <datetime term> | <datetime value expression> <plus sign> <interval term> | <datetime value expression> <minus sign> <interval term> There is currently no interval type support in Hive. Support for the interval type will be added as a later item. 6. Compatibility with other types The existing patch allows a lot of implicit conversion to/from other types (numeric, string). It does appear that TIMESTAMP has set a bit of a precedent in allowing a lot of implicit type conversion. However, given the limited operations with other types as described in above from the SQL standard, I would propose limiting the amount of implicit conversion that is allowed. There are UDFs that the user can use to convert DATE into numeric/string values, which can then be used in arithmetic or aggregation functions. > Create a new DATE datatype > -------------------------- > > Key: HIVE-3910 > URL: https://issues.apache.org/jira/browse/HIVE-3910 > Project: Hive > Issue Type: Task > Reporter: Namit Jain > > It might be useful to have a DATE datatype along with timestamp. > This can only store the day (possibly number of days from 1970-01-01, > and would thus give space savings in binary format). -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira