[jira] [Work logged] (HIVE-24693) Parquet Timestamp Values Read/Write Very Slow

ASF GitHub Bot (Jira) Mon, 08 Feb 2021 08:54:07 -0800


     [ 
https://issues.apache.org/jira/browse/HIVE-24693?focusedWorklogId=549701&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-549701
 ]


ASF GitHub Bot logged work on HIVE-24693:
-----------------------------------------

                Author: ASF GitHub Bot
            Created on: 08/Feb/21 16:53
            Start Date: 08/Feb/21 16:53
    Worklog Time Spent: 10m 
      Work Description: klcopp commented on pull request #1938:
URL: https://github.com/apache/hive/pull/1938#issuecomment-775286514


   I've seen a couple of users come back and ask why year 0 shows up as 0001, 
but they seemed satisfied with the explanation that year 0 doesn't exist. They 
don't represent all other users, but since it's not a real year, I'm not sure 
we should let users use it.
   
   According to the wiki, Hive doesn't support dates/timestamps outside of years 
0001–9999. AFAIK Hive currently accepts negative years (though I'm not sure 
they're displayed/stored correctly) and auto-converts 0000 to 0001, since 0000 
doesn't exist. I think we should decide what we want and either update the wiki 
or change Hive's behavior to reject pre-0001 and post-9999 dates. I don't know 
how feasible this is, though.
   
   @jcamachor  I'd be interested in what you think too!
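
   To make the "reject out-of-range years" option concrete, here is a minimal 
sketch in plain java.time. It is not Hive code: the class YearRangeSketch, the 
method isSupportedYear, and the two constants are hypothetical names chosen for 
illustration, and the 0001–9999 bounds are taken from the wiki range mentioned 
above. Note that java.time's ISO calendar does accept year 0 and negative 
years, so a check like this has to be explicit.

```java
import java.time.LocalDate;

// Hypothetical helper, not part of Hive: flags dates outside years 0001-9999.
final class YearRangeSketch {

  // Assumed bounds, taken from the range the wiki documents.
  static final int MIN_YEAR = 1;
  static final int MAX_YEAR = 9999;

  // Returns true only when the date's year lies inside the documented range.
  // LocalDate itself allows year 0 and negative years, so they are
  // filtered out here rather than silently adjusted.
  static boolean isSupportedYear(LocalDate date) {
    int year = date.getYear();
    return year >= MIN_YEAR && year <= MAX_YEAR;
  }

  public static void main(String[] args) {
    System.out.println(isSupportedYear(LocalDate.of(2021, 2, 8)));
    System.out.println(isSupportedYear(LocalDate.of(0, 1, 1)));
  }
}
```

   A caller could use such a check to raise an error instead of converting 
0000 to 0001; whether that is the right behavior is the open question above.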


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
-------------------

    Worklog Id:     (was: 549701)
    Time Spent: 2h 40m  (was: 2.5h)

> Parquet Timestamp Values Read/Write Very Slow
> ---------------------------------------------
>
>                 Key: HIVE-24693
>                 URL: https://issues.apache.org/jira/browse/HIVE-24693
>             Project: Hive
>          Issue Type: Improvement
>            Reporter: David Mollitor
>            Assignee: David Mollitor
>            Priority: Critical
>              Labels: pull-request-available
>          Time Spent: 2h 40m
>  Remaining Estimate: 0h
>
> Parquet {{DataWriteableWriter}} relies on {{NanoTimeUtils}} to convert a 
> timestamp object into a binary value.  It does this by calling 
> {{toString()}} on the timestamp object and then parsing the String.  
> This particular timestamp does not carry a time zone, so the string is 
> something like:
> {{2021-21-03 12:32:23.0000...}}
> The parse code tries to parse the string assuming there is a time zone and, 
> if there is none, falls back and applies the provided "default time zone".  As 
> was noted in [HIVE-24353], if something fails to parse, it is very expensive to 
> try to parse again.  So, for each timestamp in the Parquet file, it:
> * Builds a string from the timestamp
> * Parses it (throws an exception, parses again)
> There is no need for this kind of string manipulation/parsing; it should 
> just use the epoch millis/seconds/time stored internally in the Timestamp 
> object.
> {code:java}
>   // Converts Timestamp to TimestampTZ.
>   public static TimestampTZ convert(Timestamp ts, ZoneId defaultTimeZone) {
>     return parse(ts.toString(), defaultTimeZone);
>   }
> {code}
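
The difference between the two approaches can be sketched with plain java.time 
types. This is an illustration under stated assumptions, not the actual Hive 
patch: LocalDateTime stands in for Hive's Timestamp, ZonedDateTime stands in 
for TimestampTZ, and the class and method names are hypothetical. The first 
method mimics the format-then-parse round trip described above, including the 
failed zone-aware parse; the second attaches the default zone to the fields 
already held in memory.

```java
import java.time.LocalDateTime;
import java.time.ZoneId;
import java.time.ZonedDateTime;
import java.time.format.DateTimeParseException;

// Hypothetical stand-in for the conversion discussed above; not Hive code.
final class TimestampConvertSketch {

  // Slow path: build a string, try a zone-aware parse (which throws because
  // the text has no zone or offset), then parse again and apply the default.
  static ZonedDateTime convertViaString(LocalDateTime ts, ZoneId defaultTimeZone) {
    String text = ts.toString();
    try {
      return ZonedDateTime.parse(text);
    } catch (DateTimeParseException e) {
      return LocalDateTime.parse(text).atZone(defaultTimeZone);
    }
  }

  // Direct path: no string is built and no exception is thrown.
  static ZonedDateTime convertDirect(LocalDateTime ts, ZoneId defaultTimeZone) {
    return ts.atZone(defaultTimeZone);
  }

  public static void main(String[] args) {
    LocalDateTime ts = LocalDateTime.of(2021, 3, 21, 12, 32, 23);
    ZoneId zone = ZoneId.of("UTC");
    // Both paths should yield the same zoned value.
    System.out.println(convertViaString(ts, zone).equals(convertDirect(ts, zone)));
  }
}
```

Both methods are meant to produce the same result; the direct path simply 
skips the per-value string allocation and the exception on the first parse 
attempt, which is where the description above locates the cost.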



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
