Hi Ashish, Sankar,

I am not sure if you both refer to the same problem.

As far as reading and writing Parquet/Avro files is concerned, the
compatibility issues should be resolved as part of HIVE-25104 [1] and
HIVE-25219 [2].
If I recall correctly, we added some config properties to ease migration.

Regarding the UNIX_TIMESTAMP function, I do remember seeing many JIRA
cases reporting problems. Let's figure out how they relate to HIVE-25576 [3]
and try to address them.
We could opt for a new property, but let's continue the discussion in the
respective JIRA case. People who have an opinion on the topic can jump
in there.

Best,
Stamatis

[1] https://issues.apache.org/jira/browse/HIVE-25104
[2] https://issues.apache.org/jira/browse/HIVE-25219
[3] https://issues.apache.org/jira/browse/HIVE-25576


On Thu, Sep 30, 2021 at 3:29 PM Sankar Hariappan <
sankar.hariap...@microsoft.com> wrote:

> Hi @Stamatis Zampetakis <zabe...@gmail.com>, @David <dam6...@gmail.com>,
>
>
>
> Our current implementation using DateTimeFormatter is not backward
> compatible, which leads to migration issues.
>
> One of our customers has the following use-case, where we don't have a
> better option to migrate.
>
>
>
> *Hive 1.2/Spark 2.4 (Shared metastore):*
>
> Set VM time zone to Asia/Bangkok.
>
> INSERT INTO parquet_table VALUES ('1400-01-01 00:00:00'); // Here, the
> Parquet writer converts the data to UTC (- 07:00:00) and stores it.
>
>
>
> *Migrate to Hive 3.x/Spark 3.x (Shared metastore):*
>
> Set VM time zone to Asia/Bangkok.
>
> SELECT ts FROM parquet_table; // Hive returns a different value, whereas
> Spark (with spark.sql.legacy.timeParserPolicy=LEGACY) returns 1400-01-01 00:00:00
>
>
>
> It is not easy to change thousands of Hive scripts to handle this
> difference, and it adds to the migration cost.
>
> I think it is necessary to enable backward compatibility for a smooth
> migration. Please share your thoughts.
>
>
>
> Thanks,
>
> Sankar
>
>
>
> *From:* Ashish Sharma <ashishkumarsharm...@gmail.com>
> *Sent:* 29 September 2021 19:11
> *To:* dev@hive.apache.org; u...@hive.apache.org
> *Cc:* sank...@apache.org
> *Subject:* [EXTERNAL] Raise exception instead of silent change for new
> DateTimeformatter
>
>
>
> *History*
>
> *Hive 1.2* -
>
> VM time zone set to Asia/Bangkok
>
> *Query* - SELECT FROM_UNIXTIME(UNIX_TIMESTAMP('1800-01-01 00:00:00
> UTC','yyyy-MM-dd HH:mm:ss z'));
>
> *Result* - 1800-01-01 07:00:00
>
> *Implementation details* -
>
> SimpleDateFormat formatter = new SimpleDateFormat(pattern);
> Long unixtime = formatter.parse(textval).getTime() / 1000;
> Date date = new Date(unixtime * 1000L);
>
> https://docs.oracle.com/javase/8/docs/api/java/util/Date.html
> The official documentation notes that "Unfortunately, the API for these
> functions was not amenable to internationalization" and that the
> corresponding methods in Date are deprecated. Because of that, this path
> produces wrong results.
>
> *Latest Hive* -
>
> set hive.local.time.zone=Asia/Bangkok;
>
> *Query* - SELECT FROM_UNIXTIME(UNIX_TIMESTAMP('1800-01-01 00:00:00
> UTC','yyyy-MM-dd HH:mm:ss z'));
>
> *Result* - 1800-01-01 06:42:04
>
> *Implementation details* -
>
> DateTimeFormatter dtformatter = new DateTimeFormatterBuilder()
>     .parseCaseInsensitive()
>     .appendPattern(pattern)
>     .toFormatter();
>
> ZonedDateTime zonedDateTime = ZonedDateTime.parse(textval, dtformatter)
>     .withZoneSameInstant(ZoneId.of(timezone));
> long dttime = zonedDateTime.toInstant().getEpochSecond();
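>
> For illustration, here is a self-contained sketch comparing the two paths
> outside of Hive (assumptions: Java 8+, an English locale, and the same
> pattern/input/zone as above; the class name is made up, and the legacy
> output may vary with the JDK/tzdata version):
>
> import java.text.SimpleDateFormat;
> import java.time.ZoneId;
> import java.time.ZonedDateTime;
> import java.time.format.DateTimeFormatter;
> import java.time.format.DateTimeFormatterBuilder;
> import java.util.Date;
> import java.util.TimeZone;
>
> public class TimestampParserComparison {
>   public static void main(String[] args) throws Exception {
>     String pattern = "yyyy-MM-dd HH:mm:ss z";
>     String textval = "1800-01-01 00:00:00 UTC";
>     String timezone = "Asia/Bangkok";
>
>     // Legacy path (Hive 1.2): SimpleDateFormat + java.util.Date.
>     // Formatting the parsed instant in Asia/Bangkok gave 1800-01-01 07:00:00
>     // in the report above.
>     SimpleDateFormat legacyParser = new SimpleDateFormat(pattern);
>     long unixtime = legacyParser.parse(textval).getTime() / 1000;
>     SimpleDateFormat legacyPrinter = new SimpleDateFormat("yyyy-MM-dd HH:mm:ss");
>     legacyPrinter.setTimeZone(TimeZone.getTimeZone(timezone));
>     System.out.println("legacy:    " + legacyPrinter.format(new Date(unixtime * 1000L)));
>
>     // New path (current Hive): java.time. For 1800 the tz database records
>     // Bangkok's local mean time (+06:42:04), hence the 06:42:04 result above.
>     DateTimeFormatter dtformatter = new DateTimeFormatterBuilder()
>         .parseCaseInsensitive()
>         .appendPattern(pattern)
>         .toFormatter();
>     ZonedDateTime zonedDateTime = ZonedDateTime.parse(textval, dtformatter)
>         .withZoneSameInstant(ZoneId.of(timezone));
>     System.out.println("corrected: " + zonedDateTime
>         .format(DateTimeFormatter.ofPattern("yyyy-MM-dd HH:mm:ss")));
>   }
> }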
>
>
>
> *Problem* -
>
> *SimpleDateFormat* has now been replaced with *DateTimeFormatter*, which
> is not backward compatible. This causes issues when migrating to the new
> version, because older data written using Hive 1.x or 2.x is not
> compatible with *DateTimeFormatter*.
>
>
>
> *Solution -*
>
> Introduce a config "hive.legacy.timeParserPolicy" with the following values -
> *1. EXCEPTION* - compare the values from both SimpleDateFormat &
> DateTimeFormatter and raise an exception if they don't match
> *2. LEGACY* - use SimpleDateFormat
> *3. CORRECTED* - use DateTimeFormatter
>
> This will help Hive users in the following manner -
> 1. Migrate to the new version using *LEGACY*
> 2. Find values that are not compatible with the new version - *EXCEPTION*
> 3. Use the latest date APIs - *CORRECTED*
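>
> To make this concrete, below is a rough, hypothetical sketch of how such a
> policy switch could wrap the two parse paths (the class and method names
> are made up for illustration; this is not the actual Hive code):
>
> import java.text.ParseException;
> import java.text.SimpleDateFormat;
> import java.time.ZoneId;
> import java.time.ZonedDateTime;
> import java.time.format.DateTimeFormatter;
> import java.time.format.DateTimeFormatterBuilder;
>
> public class TimeParserPolicySketch {
>
>   enum TimeParserPolicy { LEGACY, CORRECTED, EXCEPTION }
>
>   // Returns epoch seconds for the given text, honoring the configured policy.
>   static long parseToEpochSecond(String textval, String pattern, String timezone,
>                                  TimeParserPolicy policy) throws ParseException {
>     switch (policy) {
>       case LEGACY:
>         return legacyParse(textval, pattern);
>       case CORRECTED:
>         return correctedParse(textval, pattern, timezone);
>       case EXCEPTION:
>       default:
>         long legacy = legacyParse(textval, pattern);
>         long corrected = correctedParse(textval, pattern, timezone);
>         if (legacy != corrected) {
>           throw new IllegalStateException("Legacy and corrected parsers disagree for '"
>               + textval + "': " + legacy + " vs " + corrected);
>         }
>         return corrected;
>     }
>   }
>
>   // Old behavior: SimpleDateFormat, as in Hive 1.2.
>   static long legacyParse(String textval, String pattern) throws ParseException {
>     return new SimpleDateFormat(pattern).parse(textval).getTime() / 1000;
>   }
>
>   // New behavior: java.time, as in current Hive.
>   static long correctedParse(String textval, String pattern, String timezone) {
>     DateTimeFormatter formatter = new DateTimeFormatterBuilder()
>         .parseCaseInsensitive()
>         .appendPattern(pattern)
>         .toFormatter();
>     return ZonedDateTime.parse(textval, formatter)
>         .withZoneSameInstant(ZoneId.of(timezone))
>         .toInstant().getEpochSecond();
>   }
> }
>
> With EXCEPTION both parsers run, so it is mainly useful for flagging
> incompatible values during validation rather than in steady-state use.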
>
> Note: Apache Spark also faces the same issue:
> https://issues.apache.org/jira/browse/SPARK-30668
>
>
>
> Hive jira - https://issues.apache.org/jira/browse/HIVE-25576
>
>
>
>
> Thanks
>
> Ashish Sharma
>
