Re: [PR] HIVE-29033: ORC reader should not assume that TimestampColumnVector is in UTC time zone [hive]

via GitHub Tue, 24 Jun 2025 15:51:11 -0700


vrozov commented on code in PR #5868:
URL: https://github.com/apache/hive/pull/5868#discussion_r2165063716



##########
ql/src/java/org/apache/hadoop/hive/ql/io/BatchToRowReader.java:
##########
@@ -518,7 +521,8 @@ public static TimestampWritableV2 
nextTimestamp(ColumnVector vector,
         result = (TimestampWritableV2) previous;
       }
       TimestampColumnVector tcv = (TimestampColumnVector) vector;
-      result.setInternal(tcv.time[row], tcv.nanos[row]);
+      result.set(Timestamp.ofEpochSecond(Math.floorDiv(tcv.time[row], 1000L), 
tcv.nanos[row],
+          tcv.isUTC() ? ZoneOffset.UTC : ZoneId.systemDefault()));

Review Comment:
   > the way you pass the UTC flag is hacky and not scalable for future use. 
updated the diff with all the changes, please take a look
   
   The PR uses `TimestampColumnVector` existing `isUTC` flag. It does not 
introduce any new flags. The bug is that Hive currently ignores that flag 
(incorrectly) and always assumes that it is `true` even when it is `false`. The 
proper long term solution is to use the existing flag and initialize it 
correctly, IMO.  I think that introducing a parallel flag is not the right long 
term solution.
   
   Please also see https://github.com/apache/orc/pull/2300



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Re: [PR] HIVE-29033: ORC reader should not assume that TimestampColumnVector is in UTC time zone [hive]

Reply via email to