[ https://issues.apache.org/jira/browse/HIVE-5761?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13833368#comment-13833368 ]
Eric Hanson commented on HIVE-5761:
-----------------------------------

Teddy,

Yes, this would be great! Before you start coding, please write a design document of a page or so describing how Date will be represented and how you will operate on it. Please edit it into the HIVE-4160 design doc in MS Word format and post the revised version as a draft. Then we can review that.

One thing to consider is to store a date as an integer number of days in a LongColumnVector, e.g. 0000-01-01 is 1, 0000-01-02 is 2, etc., up to 9999-12-31 being some large integer. But before we conclude this is the way to go, we need to think about how operations will be performed on date data elements represented as longs.

An alternative representation would be to embed YYYY, MM, DD as fields inside a long. E.g. put DD in byte 0, MM in byte 1, and YYYY as a 16-bit int in bytes 2 and 3. Then comparisons will work on these ints directly, because the most significant field (the year) occupies the highest bits. And you can grab a YYYY, MM or DD field out of them with a simple shift and mask. You would not need to consult a calendar.

Obviously we want to avoid using an internal Java Date type as much as possible to get all the standard benefits of vectorization.

In your design, please think about these things:

- How to get data into column vectors from ORC in the vectorized InputFormat (does ORC support the DATE type? How is it stored?)
- How to perform basic operations (comparisons, simple get() functions, etc.)
- How to perform more complex operations (DateDiff, DateAdd, etc.)
- How to output a date at the top of the vectorized plan.

Hopefully you can make all of them fast. If some complex ones need to use Java Date for calendar management etc., that may be fine.

This will need to be broken into several patches. The first one might just be to be able to read date data in vectors from ORC. The next one might be to output date data in an end-to-end query like "select date_column from table" in vectorized mode.
Then you can implement all the comparisons that just piggyback on the LongColumnVector VectorExpression operations that have already been built. Then you can do additional patches for different groups of related operations.

Eric

> Implement vectorized support for the DATE data type
> ---------------------------------------------------
>
>                 Key: HIVE-5761
>                 URL: https://issues.apache.org/jira/browse/HIVE-5761
>             Project: Hive
>          Issue Type: Sub-task
>            Reporter: Eric Hanson
>
> Add support to allow queries referencing DATE columns and expression results
> to run efficiently in vectorized mode. This should re-use the code for the
> integer/timestamp types to the extent possible and beneficial. Include
> unit tests and end-to-end tests. Consider re-using or extending existing
> end-to-end tests for vectorized integer and/or timestamp operations.

--
This message was sent by Atlassian JIRA
(v6.1#6144)
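
For illustration only, the packed YYYY/MM/DD layout suggested in the comment above could be sketched roughly as follows. This is not code from Hive: the class PackedDate and all of its method names are hypothetical, and plain long[] arrays stand in for the storage of a LongColumnVector. It assumes years 0..9999, months 1..12 and days 1..31, and does no validation.

```java
// Hypothetical sketch of the packed date layout: DD in byte 0, MM in byte 1,
// YYYY as a 16-bit value in bytes 2 and 3 of a long. Not part of Hive.
final class PackedDate {

    // Pack year, month and day into one long. Inputs are assumed valid.
    static long pack(int year, int month, int day) {
        return ((long) year << 16) | ((long) month << 8) | (long) day;
    }

    // Field extraction is a shift and a mask; no calendar lookup is needed.
    static int year(long packed) {
        return (int) ((packed >>> 16) & 0xFFFF);
    }

    static int month(long packed) {
        return (int) ((packed >>> 8) & 0xFF);
    }

    static int day(long packed) {
        return (int) (packed & 0xFF);
    }

    // Vectorized-style field extraction over the first n entries of a
    // column, written as a tight loop over primitive arrays.
    static void yearOf(long[] in, long[] out, int n) {
        for (int i = 0; i < n; i++) {
            out[i] = (in[i] >>> 16) & 0xFFFF;
        }
    }

    // Vectorized-style filter "date < bound". Because the year sits in the
    // highest bits, plain long comparison gives chronological order.
    // Writes the indexes of qualifying rows to selected; returns the count.
    static int selectLessThan(long[] in, int n, long bound, int[] selected) {
        int count = 0;
        for (int i = 0; i < n; i++) {
            if (in[i] < bound) {
                selected[count++] = i;
            }
        }
        return count;
    }

    public static void main(String[] args) {
        long[] dates = {
            pack(2013, 11, 27), pack(2013, 12, 1), pack(1999, 12, 31)
        };
        int[] selected = new int[dates.length];
        int n = selectLessThan(dates, dates.length, pack(2013, 12, 1), selected);
        for (int i = 0; i < n; i++) {
            long d = dates[selected[i]];
            System.out.println(year(d) + "-" + month(d) + "-" + day(d));
        }
    }
}
```

The trade-off mentioned in the comment still applies: comparisons and field extraction are cheap in this layout, but operations such as DateDiff or DateAdd cannot be done with plain integer arithmetic on the packed value, whereas a day-count representation makes those simple and makes field extraction the harder case.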