[ https://issues.apache.org/jira/browse/HIVE-26270?focusedWorklogId=776969&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-776969 ]
ASF GitHub Bot logged work on HIVE-26270:
-----------------------------------------

                Author: ASF GitHub Bot
            Created on: 01/Jun/22 15:49
            Start Date: 01/Jun/22 15:49
    Worklog Time Spent: 10m
      Work Description: zabetak opened a new pull request, #3338:
URL: https://github.com/apache/hive/pull/3338

   ### What changes were proposed in this pull request?
   1. Extract the legacy conversion derivation logic, which is based on file metadata and configuration, into a separate method.
   2. Use the same logic for determining the conversion in both the vectorized and non-vectorized Parquet readers by calling the new method.

   ### Why are the changes needed?
   1. Remedy "wrong" results when using the vectorized reader
   2. Align behavior between vectorized/non-vectorized code

   ### Does this PR introduce _any_ user-facing change?
   Yes, query results may be affected.

   ### How was this patch tested?
   `mvn test -Dtest=TestMiniLlapLocalCliDriver -Dqfile=parquet_timestamp_int96_compatibility_hive3_1_3.q`

   Compare wrong results in https://github.com/apache/hive/commit/5a1512ccf1619d744e65aa1a882326cb9df60dd8 with correct results https://github.com/apache/hive/commit/e38b4ec868043e897ca2cc9da8b40a4742cb4757

Issue Time Tracking
-------------------

            Worklog Id:     (was: 776969)
    Remaining Estimate: 0h
            Time Spent: 10m

> Wrong timestamps when reading Hive 3.1.x Parquet files with vectorized reader
> -----------------------------------------------------------------------------
>
>                 Key: HIVE-26270
>                 URL: https://issues.apache.org/jira/browse/HIVE-26270
>             Project: Hive
>          Issue Type: Bug
>          Components: HiveServer2, Parquet
>            Reporter: Stamatis Zampetakis
>            Assignee: Stamatis Zampetakis
>            Priority: Major
>              Labels: compatibility, timestamp
>          Time Spent: 10m
>  Remaining Estimate: 0h
>
> Parquet files written in Hive 3.1.x onwards with timezone set to US/Pacific.
> {code:sql}
> CREATE TABLE employee (eid INT, birth timestamp) STORED AS PARQUET;
> INSERT INTO employee VALUES
> (1, '1880-01-01 00:00:00'),
> (2, '1884-01-01 00:00:00'),
> (3, '1990-01-01 00:00:00');
> {code}
> Parquet files read with Hive 4.0.0-alpha-1 onwards.
> +Without vectorization+ results are correct.
> {code:sql}
> SELECT * FROM employee;
> {code}
> {noformat}
> 1 1880-01-01 00:00:00
> 2 1884-01-01 00:00:00
> 3 1990-01-01 00:00:00
> {noformat}
> +With vectorization+ some timestamps are shifted.
> {code:sql}
> -- Disable fetch task conversion to force vectorization to kick in
> set hive.fetch.task.conversion=none;
> SELECT * FROM employee;
> {code}
> {noformat}
> 1 1879-12-31 23:52:58
> 2 1884-01-01 00:00:00
> 3 1990-01-01 00:00:00
> {noformat}
> The problem is the same as the one reported under HIVE-24074. The data were written
> using the new Date/Time APIs (java.time) in Hive 3.1.3, and here they
> were read using the old APIs (java.sql).
> The difference from HIVE-24074 is that here the problem appears only for
> vectorized execution, while the non-vectorized reader works fine, so there
> is an *inconsistency in the behavior* of the vectorized and non-vectorized
> readers.
> The non-vectorized reader works fine because it automatically derives that it
> should use the new JDK APIs to read back the timestamp value. This is
> possible in this case because the file contains metadata (i.e.,
> the presence of {{writer.time.zone}}) from which it can infer that the
> timestamps were written using the new Date/Time APIs.
> The inconsistent behavior between the vectorized and non-vectorized readers is a
> regression caused by HIVE-25104. This JIRA is an attempt to re-align the
> behavior between the vectorized and non-vectorized readers.
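The shift in row 1 can be reproduced outside Hive with a small sketch. It uses only `java.time`, and it rests on one labeled assumption: that the legacy read path behaves like a fixed -08:00 offset for these old values. The class and method names are illustrative and are not taken from the Hive code base.

```java
import java.time.Instant;
import java.time.LocalDateTime;
import java.time.ZoneId;
import java.time.ZoneOffset;

public class TimestampShiftDemo {
    // Writer time zone from the issue description.
    static final ZoneId WRITER_ZONE = ZoneId.of("US/Pacific");

    // ASSUMPTION: stands in for the legacy (java.sql-style) read path, modeled
    // here as a fixed standard offset with no local-mean-time history.
    static final ZoneOffset ASSUMED_LEGACY_OFFSET = ZoneOffset.ofHours(-8);

    /** Converts a wall-clock value to an instant using the full java.time zone rules. */
    static Instant writeWithJavaTime(LocalDateTime local) {
        return local.atZone(WRITER_ZONE).toInstant();
    }

    /** Reads the instant back with the same java.time zone rules (round-trips exactly). */
    static LocalDateTime readWithJavaTime(Instant stored) {
        return LocalDateTime.ofInstant(stored, WRITER_ZONE);
    }

    /** Reads the instant back with the assumed fixed legacy offset. */
    static LocalDateTime readWithAssumedLegacyOffset(Instant stored) {
        return LocalDateTime.ofInstant(stored, ASSUMED_LEGACY_OFFSET);
    }

    public static void main(String[] args) {
        for (int year : new int[] {1880, 1884, 1990}) {
            LocalDateTime written = LocalDateTime.of(year, 1, 1, 0, 0, 0);
            Instant stored = writeWithJavaTime(written);
            System.out.println(written
                + " | java.time read: " + readWithJavaTime(stored)
                + " | assumed legacy read: " + readWithAssumedLegacyOffset(stored));
        }
    }
}
```

Under that assumption only the 1880 value moves, because `java.time` applies a local-mean-time offset of -07:52:58 to US/Pacific before the zone adopted standard time in late 1883, while the 1884 and 1990 values already fall under -08:00.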
> Note that if the file metadata is empty, neither the vectorized nor the
> non-vectorized reader can determine which APIs to use for the conversion. In
> this case the user must set
> {{hive.parquet.timestamp.legacy.conversion.enabled}} explicitly to get
> correct results.
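The derivation rule described above can be sketched as a single shared method that both readers call. This is a minimal illustration and not the method introduced by the pull request: `LegacyConversionSketch` and `useLegacyConversion` are hypothetical names, and the real Hive logic may consult further metadata. It encodes only the two rules stated in the description: the presence of `writer.time.zone` implies the new Date/Time APIs, and otherwise the value of `hive.parquet.timestamp.legacy.conversion.enabled` decides.

```java
import java.util.Collections;
import java.util.Map;

public class LegacyConversionSketch {
    // Metadata key named in the issue description.
    static final String WRITER_TIMEZONE_KEY = "writer.time.zone";

    /**
     * Hypothetical helper: decides whether timestamps should be read with the
     * legacy (java.sql-style) conversion.
     *
     * @param fileMetadata key/value metadata of the Parquet file (may be empty)
     * @param legacyConversionConfig user-supplied value of
     *        hive.parquet.timestamp.legacy.conversion.enabled
     */
    static boolean useLegacyConversion(Map<String, String> fileMetadata,
                                       boolean legacyConversionConfig) {
        if (fileMetadata.containsKey(WRITER_TIMEZONE_KEY)) {
            // The writer recorded its zone, so the new java.time APIs were used.
            return false;
        }
        // Nothing to infer from the file: fall back to the explicit configuration.
        return legacyConversionConfig;
    }

    public static void main(String[] args) {
        Map<String, String> withZone =
            Collections.singletonMap(WRITER_TIMEZONE_KEY, "US/Pacific");
        Map<String, String> empty = Collections.emptyMap();

        System.out.println(useLegacyConversion(withZone, true));
        System.out.println(useLegacyConversion(empty, true));
        System.out.println(useLegacyConversion(empty, false));
    }
}
```

Routing both readers through one such method is what keeps their behavior aligned: the decision depends only on the file and the configuration, not on which reader happens to run.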