[ https://issues.apache.org/jira/browse/HIVE-5761?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13834489#comment-13834489 ]
Teddy Choi commented on HIVE-5761:
----------------------------------

Eric,

I researched the history of the Hive DATE data type.

1. DATE in ORC: HIVE-4055 already implemented it. It uses an integer field, DateWritable#daysSinceEpoch, to represent a date, so I think there is little chance of using the alternative representation, which I would prefer.
2. Basic operations: We may need to create a java.sql.Date each time. [~thejas] and [~jdere] already suggested the Joda-Time library, which is significantly faster, but there were objections to adding another dependency in HIVE-3910.
3. Complex operations: Fortunately, they will benefit from the DateWritable#daysSinceEpoch representation.
4. Vectorized plan: I'm not sure yet. I will run some tests.

The key question is how to improve the performance of basic operations with the DateWritable#daysSinceEpoch representation. I found that org.joda.time.Chronology does not create objects during repeated calculations (http://stackoverflow.com/questions/6465330/any-good-high-performance-java-library-that-works-with-timestamp). That gives me an idea, but it looks hard to implement. I'll start with a basic implementation using java.sql.Date, then look for more ways to optimize it.

Teddy

> Implement vectorized support for the DATE data type
> ---------------------------------------------------
>
>                 Key: HIVE-5761
>                 URL: https://issues.apache.org/jira/browse/HIVE-5761
>             Project: Hive
>          Issue Type: Sub-task
>            Reporter: Eric Hanson
>
> Add support to allow queries referencing DATE columns and expression results
> to run efficiently in vectorized mode. This should re-use the code for the
> integer/timestamp types to the extent possible and beneficial. Include
> unit tests and end-to-end tests. Consider re-using or extending existing
> end-to-end tests for vectorized integer and/or timestamp operations.

--
This message was sent by Atlassian JIRA
(v6.1#6144)
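The comment notes that org.joda.time.Chronology computes date fields without creating objects. A similar effect can be had with plain integer arithmetic on a days-since-epoch value like DateWritable#daysSinceEpoch. The sketch below is not Hive or Joda-Time code. The class EpochDays and all of its methods are hypothetical, and it substitutes Howard Hinnant's published days_from_civil / civil_from_days algorithms. These assume the proleptic Gregorian calendar with 1970-01-01 as day 0. The sketch shows that year, month, and day-of-month can be derived from an int with no per-row java.sql.Date allocation. It also shows how such a function could run over a long[] batch, roughly as a vectorized expression over a long-backed column might.

```java
/**
 * Hypothetical helper (not part of Hive or Joda-Time): allocation-free conversions
 * between days since 1970-01-01 and proleptic Gregorian year/month/day, following
 * Howard Hinnant's days_from_civil / civil_from_days algorithms.
 */
public final class EpochDays {
    private static final int DAYS_PER_ERA = 146097;   // days in a 400-year Gregorian cycle
    private static final int EPOCH_SHIFT = 719468;    // days from 0000-03-01 to 1970-01-01

    private EpochDays() {}

    /** Days since 1970-01-01 for the given date (month 1..12, day 1..31). */
    public static int daysFromCivil(int y, int m, int d) {
        y -= (m <= 2) ? 1 : 0;                            // Jan/Feb belong to the previous March-based year
        int era = Math.floorDiv(y, 400);
        int yoe = y - era * 400;                          // year of era, [0, 399]
        int mp = (m + 9) % 12;                            // March = 0 ... February = 11
        int doy = (153 * mp + 2) / 5 + d - 1;             // day of March-based year, [0, 365]
        int doe = yoe * 365 + yoe / 4 - yoe / 100 + doy;  // day of era, [0, 146096]
        return era * DAYS_PER_ERA + doe - EPOCH_SHIFT;
    }

    /** Calendar year of the given day number. */
    public static int year(int days) {
        int z = days + EPOCH_SHIFT;
        int era = Math.floorDiv(z, DAYS_PER_ERA);
        int doe = z - era * DAYS_PER_ERA;
        int yoe = (doe - doe / 1460 + doe / 36524 - doe / 146096) / 365;
        int doy = doe - (365 * yoe + yoe / 4 - yoe / 100);
        int y = yoe + era * 400;
        // mp >= 10 means January or February, which belong to the next calendar year
        return (5 * doy + 2) / 153 >= 10 ? y + 1 : y;
    }

    /** Month (1..12) of the given day number. */
    public static int month(int days) {
        int mp = (5 * marchDayOfYear(days) + 2) / 153;    // March = 0 ... February = 11
        return mp < 10 ? mp + 3 : mp - 9;
    }

    /** Day of month (1..31) of the given day number. */
    public static int dayOfMonth(int days) {
        int doy = marchDayOfYear(days);
        int mp = (5 * doy + 2) / 153;
        return doy - (153 * mp + 2) / 5 + 1;
    }

    /** Vectorized-style kernel: year() over the first n entries of a batch, no per-row allocation. */
    public static void yearBatch(long[] daysIn, long[] yearsOut, int n) {
        for (int i = 0; i < n; i++) {
            yearsOut[i] = year((int) daysIn[i]);
        }
    }

    // Day of the March-based year, in [0, 365], where day 0 is March 1.
    private static int marchDayOfYear(int days) {
        int z = days + EPOCH_SHIFT;
        int doe = z - Math.floorDiv(z, DAYS_PER_ERA) * DAYS_PER_ERA;
        int yoe = (doe - doe / 1460 + doe / 36524 - doe / 146096) / 365;
        return doe - (365 * yoe + yoe / 4 - yoe / 100);
    }
}
```

For example, day 11016 gives year 2000, month 2, day 29, and daysFromCivil(1970, 1, 1) returns 0. Because the day number has no time-of-day or time-zone component, these functions avoid the per-value object creation that the comment associates with java.sql.Date. The trade-off is that the calendar arithmetic must be maintained by hand.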