Hi all,

I also feel that adding more config doesn't make sense in this as we are
tightening the date and timestamp format. We should decide upon a single
solution even if it break the compatibility. Below the comparison of HIVE
1.2, HIVE 3.2, MYSQL, PostgreSQL, Oracle


Query

Hive 1.2

Hive 3.2

Mysql

PostgreSQL

ORACLE

select cast('2020-20-20' as date);

NULL

2021-08-20

NULL

date/time field value out of range: "2020-20-20"

not a valid month

select cast(null as date);

NULL

NULL

NULL

NULL

NULL

select cast('2020-02-31' as date);

2020-03-02

2020-03-02

NULL

date/time field value out of range: "2020-02-31"

date format picture ends before converting entire input string

select cast('2020/02/20' as date);

NULL

NULL

2020-02-20

2020-02-20

literal does not match format string

select cast('0000-00-00' as date);

NULL

0002-11-30

NULL

date/time field value out of range: "0000-00-00"

literal does not match format string


>From the comparison it is quite clear that date and timestamp formatting
was much tighter in older versions of HIVE. For most of the wrong date
input *NULL *was the standard response instead of Exception.

Also when I went through the code I found that. While doing the Vector
implementation of some of the date related UDF like datediff etc. MySql was
taken as the gold standard
<https://issues.apache.org/jira/browse/HIVE-15338?focusedCommentId=15727553&page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#comment-15727553>.
So it make more sense that  we should comply with MySql as we already refer
MySql as gold standard and returning NULL as result for wrong dates in cast
is also documented
<https://cwiki.apache.org/confluence/display/hive/languagemanual+types#LanguageManualTypes-CastingDates>


*So I propose to make NULL as the standard response for all parsing errors.*

Thanks
Ashish Sharma

On Tue, Jul 13, 2021 at 9:52 PM Stamatis Zampetakis <zabe...@gmail.com>
wrote:

> Hi all,
>
> Thanks for pushing this forward Ashish!
>
> Actually I am not in favor of creating a flag for this. Either we decide
> consciously to break backward compatibility in the hope that we are
> improving the expected results or we keep the current behavior.
> Adding another flag means that we maintain and support two variants that
> makes the problem of test coverage brought by David even worse.
>
> I second David's idea to run some tests over some well adopted DBMS (MySQL,
> Oracle, MSSQL, Postgres) to see what they return.
> I think Ashish already did some tests over MySQL and MSSQL but personally I
> would like to see some more (dates + engines) in order to express
> a preference.
> We shouldn't forget that since Hive is implemented in Java, having
> functions that are inline with the Java APIs is not such a bad idea.
> The last comment is slightly supportive of the current behavior.
>
> I am including user@ list in the discussion since we should definitely
> consider the feedback of people that are using Hive for real.
>
> Best,
> Stamatis
>
> On Tue, Jul 13, 2021 at 4:31 PM David <dam6...@gmail.com> wrote:
>
> > Hello,
> >
> > Is anyone able to try out a few different vendor RDBMS to see how they
> > handle invalid dates, or provide links to documentation, both for invalid
> > formatting and things like mm-dd-yyy 12-40-2021?
> >
> > Thanks.
> >
> > On Tue, Jul 13, 2021 at 5:14 AM Sankar Hariappan
> > <sankar.hariap...@microsoft.com.invalid> wrote:
> >
> >> I'm supporting this change to return "NULL" for invalid date/timestamp.
> >> In the interest of backward compatibility, can we make all these changes
> >> under a flag which can be enabled by default?
> >>
> >>
> >> Thanks,
> >> Sankar
> >> -----Original Message-----
> >> From: David <dam6...@gmail.com>
> >> Sent: 10 July 2021 07:35
> >> To: dev <d...@hive.apache.org>
> >> Cc: sank...@apache.org; Stamatis Zampetakis <zabe...@gmail.com>
> >> Subject: [EXTERNAL] Re: Move Date and Timestamp parsing from
> >> ResolverStyle.LENIENT to ResolverStyle.STRICT
> >>
> >> Hello,
> >>
> >> I too would be in favor of this. It drastically cuts down on the test
> >> matrix for Hive if we can clamp down on timestamp formats. With that
> being
> >> said, I've tried this and it's a big effort.  I put it down without
> getting
> >> consensus or buy-in or engagement on the effort. Please check out my
> work
> >> here:
> >>
> >>
> >>
> https://nam06.safelinks.protection.outlook.com/?url=https%3A%2F%2Fissues.apache.org%2Fjira%2Fplugins%2Fservlet%2Fmobile%23issue%2FHIVE-24814&amp;data=04%7C01%7CSankar.Hariappan%40microsoft.com%7Cd47432b9d7654d66a46908d943472338%7C72f988bf86f141af91ab2d7cd011db47%7C1%7C0%7C637614795446338436%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C3000&amp;sdata=mvYgaG7liJOwUZmMvgwlo%2B1HvcUsrnzXA3Ltfz5yEYE%3D&amp;reserved=0
> >>
> >>
> >> On Fri, Jul 9, 2021, 9:49 PM Ashish Sharma <
> ashishkumarsharm...@gmail.com
> >> >
> >> wrote:
> >>
> >> > Hi,
> >> >
> >> > When casting incorrect date or timestamp literals to DATE or TIMESTAMP
> >> > data type hive returns wrong values
> >> >
> >> > hive> select cast('2020-20-20' as date);
> >> >
> >> > OK
> >> >
> >> > 2021-08-20
> >> >
> >> > Time taken: 4.436 seconds, Fetched: 1 row(s)
> >> >
> >> >
> >> > I have created a solution draft. Please review the draft and provide
> >> > your valuable feedback on the same.
> >> >
> >> >
> >> >
> >> >
> https://nam06.safelinks.protection.outlook.com/?url=https%3A%2F%2Fdocs
> >> > .google.com
> %2Fdocument%2Fd%2F1YTTPlNq3qyzlKfYVkSl3EFhVQ6-wa9WFRdkdIeCo
> >> > c1Y%2Fedit%3Fusp%3Dsharing&amp;data=04%7C01%7CSankar.Hariappan%40micro
> >> > soft.com
> %7Cd47432b9d7654d66a46908d943472338%7C72f988bf86f141af91ab2d7c
> >> > d011db47%7C1%7C0%7C637614795446338436%7CUnknown%7CTWFpbGZsb3d8eyJWIjoi
> >> > MC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C3000&amp;
> >> > sdata=iihK9wJC%2B1uPktHSE9BpXADvbal1UT7vZ3rwigkgkIY%3D&amp;reserved=0
> >> >
> >> >
> >> >
> https://nam06.safelinks.protection.outlook.com/?url=https%3A%2F%2Fissu
> >> > es.apache.org
> %2Fjira%2Fbrowse%2FHIVE-25306&amp;data=04%7C01%7CSankar.H
> >> > ariappan%40microsoft.com
> %7Cd47432b9d7654d66a46908d943472338%7C72f988bf
> >> > 86f141af91ab2d7cd011db47%7C1%7C0%7C637614795446338436%7CUnknown%7CTWFp
> >> > bGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn
> >> > 0%3D%7C3000&amp;sdata=nW%2Bw%2B0AYn%2BbvOqRLrXghfH0FG%2B1cQW%2BzdrpT%2
> >> > B9R%2B6rA%3D&amp;reserved=0
> >> >
> >> >
> >> > Thank you
> >> >
> >> > Ashish Sharma
> >> >
> >>
> >
>

Reply via email to