Hi Ashish, Sankar, I am not sure if you both refer to the same problem.
As far as it concerns reading and writing to Parquet/Avro files the compatibility issues should be resolved as part of HIVE-25104 [1], and HIVE-25219 [2]. If I recall correctly we added some config properties to ease migration. Regarding the UNIX_TIMESTAMP function indeed I remember seeing many JIRA cases reporting problems. Let's find the relation with HIVE-25576 [3] and try to address them. We could opt for a new property but let's continue the discussion in the respective JIRA case. People who have an opinion about the topic can jump in there. Best, Stamatis [1] https://issues.apache.org/jira/browse/HIVE-25104 [2] https://issues.apache.org/jira/browse/HIVE-25219 [3] https://issues.apache.org/jira/browse/HIVE-25576 On Thu, Sep 30, 2021 at 3:29 PM Sankar Hariappan < sankar.hariap...@microsoft.com> wrote: > Hi @Stamatis Zampetakis <zabe...@gmail.com>, @David <dam6...@gmail.com>, > > > > Our current implementation using DateTimeFormatter is not backward > compatible and it leads to migration issues. > > One of our customer who have this use-case where we don’t have a better > options to migrate. > > > > *Hive 1.2/Spark 2.4 (Shared metastore):* > > Set VM time zone to Asia/Bangkok. > > INSERT values (“1400-01-01 00:00:00”) into parquet_table; // Here, parquet > writer converts the data into UTC (- 07:00:00) and stored it. > > > > *Migrate to Hive 3.x/Spark 3.x (Shared metastore)::* > > Set VM time zone to Asia/Bangkok. > > SELECT ts from parquet_table; // Hive returns different value whereas > Spark (spark.sql.legacy.timeParserPolicy=LEGACY) returns 1400-01-01 00:00:00 > > > > It is not easy to change thousands of Hive scripts to handle this > difference and it adds to migration cost. > > I think, it is necessary to enable backward compatibility for smooth > migration. Pls share your thoughts. > > > > Thanks, > > Sankar > > > > *From:* Ashish Sharma <ashishkumarsharm...@gmail.com> > *Sent:* 29 September 2021 19:11 > *To:* dev@hive.apache.org; u...@hive.apache.org > *Cc:* sank...@apache.org > *Subject:* [EXTERNAL] Raise exception instead of silent change for new > DateTimeformatter > > > > *History* > > *Hive 1.2* - > > VM time zone set to Asia/Bangkok > > *Query* - SELECT FROM_UNIXTIME(UNIX_TIMESTAMP('1800-01-01 00:00:00 > UTC','yyyy-MM-dd HH:mm:ss z')); > > *Result* - 1800-01-01 07:00:00 > > *Implementation details* - > > SimpleDateFormat formatter = new SimpleDateFormat(pattern); > Long unixtime = formatter.parse(textval).getTime() / 1000; > Date date = new Date(unixtime * 1000L); > > https://docs.oracle.com/javase/8/docs/api/java/util/Date.html > <https://nam06.safelinks.protection.outlook.com/?url=https%3A%2F%2Fdocs.oracle.com%2Fjavase%2F8%2Fdocs%2Fapi%2Fjava%2Futil%2FDate.html&data=04%7C01%7CSankar.Hariappan%40microsoft.com%7C013a8535c2af4647fb1308d9834ede18%7C72f988bf86f141af91ab2d7cd011db47%7C1%7C0%7C637685197136779324%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C1000&sdata=xxOBj5zDm29DTpPYC6rlgz639Dhn7vpHxALYHdn9VO0%3D&reserved=0> > . > In official documentation they have mentioned that "Unfortunately, the API > for these functions was not amenable to internationalization and The > corresponding methods in Date are deprecated" . Due to that this is > producing wrong result > > *latest hive* - > > set hive.local.time.zone=Asia/Bangkok; > > *Query* - SELECT FROM_UNIXTIME(UNIX_TIMESTAMP('1800-01-01 00:00:00 > UTC','yyyy-MM-dd HH:mm:ss z')); > > *Result* - 1800-01-01 06:42:04 > > *Implementation details* - > > DateTimeFormatter dtformatter = new DateTimeFormatterBuilder() > .parseCaseInsensitive() > .appendPattern(pattern) > .toFormatter(); > > ZonedDateTime zonedDateTime = > ZonedDateTime.parse(textval,dtformatter).withZoneSameInstant(ZoneId.of(timezone)); > Long dttime = zonedDateTime.toInstant().getEpochSecond(); > > > > *Problem*- > > Now *SimpleDateFormat* has been replaced with *DateTimeFormatter* which > is not backward compatible. Causing issues at times for migration to the > new version. Because the older data written using Hive 1.x or 2.x is not > compatible with *DateTimeFormatter*. > > > > *Solution -* > > Introduce an config "hive.legacy.timeParserPolicy" with following values - > *1. EXCEPTION* - compare value of > both SimpleDateFormat & DateTimeFormatter raise exception if doesn't match > *2. LEGACY *- use SimpleDateFormat > *3. CORRECTED *- use DateTimeFormatter > > This will help hive user in the following manner - > 1. Migrate to new version using *LEGACY* > 2. Find values which are not compatible with the new version - *EXCEPTION* > 3. Use latest date apis - *CORRECTED* > > Note: apache spark also face the same issue > https://issues.apache.org/jira/browse/SPARK-30668 > <https://nam06.safelinks.protection.outlook.com/?url=https%3A%2F%2Fissues.apache.org%2Fjira%2Fbrowse%2FSPARK-30668&data=04%7C01%7CSankar.Hariappan%40microsoft.com%7C013a8535c2af4647fb1308d9834ede18%7C72f988bf86f141af91ab2d7cd011db47%7C1%7C0%7C637685197136779324%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C1000&sdata=yYKfiVbSW%2FbfD5V1leqB8cH349Qb6FzYtSn5ClcZrqc%3D&reserved=0> > > > > Hive jira - https://issues.apache.org/jira/browse/HIVE-25576 > <https://nam06.safelinks.protection.outlook.com/?url=https%3A%2F%2Fissues.apache.org%2Fjira%2Fbrowse%2FHIVE-25576&data=04%7C01%7CSankar.Hariappan%40microsoft.com%7C013a8535c2af4647fb1308d9834ede18%7C72f988bf86f141af91ab2d7cd011db47%7C1%7C0%7C637685197136789283%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C1000&sdata=W9ZZoPgtBeA69eF%2FonPtdXdp15PG4%2F1M6rc99G%2BErcc%3D&reserved=0> > > > > > Thanks > > Ashish Sharma >