I am +1 on returning NULL for seemingly "invalid" dates/timestamps. Invalid may not be the most appropriate term since the parsing adheres to JDK APIs but indeed the results may seem surprising.
I guess we can merge this PR when all the comments are addressed if nobody raises a concern in the meantime. Best, Stamatis On Tue, Jul 20, 2021 at 5:04 PM Sankar Hariappan < sankar.hariap...@microsoft.com> wrote: > +1 > > Thanks Ashish for the comparison! > > I talked to few Hive users (HDInsight) and they supported returning NULL > for invalid date/timestamp inputs instead of returning incorrect results or > exception. > > Can others pls share your thoughts? > > > > Thanks, > > Sankar > > > > *From:* Ashish Sharma <ashishkumarsharm...@gmail.com> > *Sent:* 20 July 2021 14:02 > *To:* d...@hive.apache.org > *Cc:* sankar.hariap...@microsoft.com.invalid; sank...@apache.org; > user@hive.apache.org; David <dam6...@gmail.com> > *Subject:* Re: [EXTERNAL] Re: Move Date and Timestamp parsing from > ResolverStyle.LENIENT to ResolverStyle.STRICT > > > > Hi all, > > > > I also feel that adding more config doesn't make sense in this as we are > tightening the date and timestamp format. We should decide upon a single > solution even if it break the compatibility. Below the comparison of HIVE > 1.2, HIVE 3.2, MYSQL, PostgreSQL, Oracle > > > > > > *Query* > > *Hive 1.2* > > *Hive 3.2* > > *Mysql* > > *PostgreSQL* > > *ORACLE * > > select cast('2020-20-20' as date); > > NULL > > 2021-08-20 > > NULL > > date/time field value out of range: "2020-20-20" > > not a valid month > > select cast(null as date); > > NULL > > NULL > > NULL > > NULL > > NULL > > select cast('2020-02-31' as date); > > 2020-03-02 > > 2020-03-02 > > NULL > > date/time field value out of range: "2020-02-31" > > date format picture ends before converting entire input string > > select cast('2020/02/20' as date); > > NULL > > NULL > > 2020-02-20 > > 2020-02-20 > > literal does not match format string > > select cast('0000-00-00' as date); > > NULL > > 0002-11-30 > > NULL > > date/time field value out of range: "0000-00-00" > > literal does not match format string > > > > > > From the comparison it is quite clear that date and timestamp formatting > was much tighter in older versions of HIVE. For most of the wrong date > input *NULL *was the standard response instead of Exception. > > > > Also when I went through the code I found that. While doing the Vector > implementation of some of the date related UDF like datediff etc. MySql was > taken as the gold standard > <https://nam06.safelinks.protection.outlook.com/?url=https%3A%2F%2Fissues.apache.org%2Fjira%2Fbrowse%2FHIVE-15338%3FfocusedCommentId%3D15727553%26page%3Dcom.atlassian.jira.plugin.system.issuetabpanels%253Acomment-tabpanel%23comment-15727553&data=04%7C01%7CSankar.Hariappan%40microsoft.com%7C5546147ca1b1489c558c08d94b58e75b%7C72f988bf86f141af91ab2d7cd011db47%7C1%7C0%7C637623667537409104%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C1000&sdata=UjkFmuO8PHHkcojm1f%2FPJipguQ1JYMkbl%2F0XzcrvGBg%3D&reserved=0>. > So it make more sense that we should comply with MySql as we already refer > MySql as gold standard and returning NULL as result for wrong dates in cast > is also documented > <https://nam06.safelinks.protection.outlook.com/?url=https%3A%2F%2Fcwiki.apache.org%2Fconfluence%2Fdisplay%2Fhive%2Flanguagemanual%2Btypes%23LanguageManualTypes-CastingDates&data=04%7C01%7CSankar.Hariappan%40microsoft.com%7C5546147ca1b1489c558c08d94b58e75b%7C72f988bf86f141af91ab2d7cd011db47%7C1%7C0%7C637623667537419060%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C1000&sdata=69O6Ct8q%2FK8l31D4yK4eF1fQRFjtl3jAiooP3FWJTJ4%3D&reserved=0> > > > > > *So I propose to make NULL as the standard response for all parsing > errors.* > > > > Thanks > > Ashish Sharma > > > > On Tue, Jul 13, 2021 at 9:52 PM Stamatis Zampetakis <zabe...@gmail.com> > wrote: > > Hi all, > > Thanks for pushing this forward Ashish! > > Actually I am not in favor of creating a flag for this. Either we decide > consciously to break backward compatibility in the hope that we are > improving the expected results or we keep the current behavior. > Adding another flag means that we maintain and support two variants that > makes the problem of test coverage brought by David even worse. > > I second David's idea to run some tests over some well adopted DBMS (MySQL, > Oracle, MSSQL, Postgres) to see what they return. > I think Ashish already did some tests over MySQL and MSSQL but personally I > would like to see some more (dates + engines) in order to express > a preference. > We shouldn't forget that since Hive is implemented in Java, having > functions that are inline with the Java APIs is not such a bad idea. > The last comment is slightly supportive of the current behavior. > > I am including user@ list in the discussion since we should definitely > consider the feedback of people that are using Hive for real. > > Best, > Stamatis > > On Tue, Jul 13, 2021 at 4:31 PM David <dam6...@gmail.com> wrote: > > > Hello, > > > > Is anyone able to try out a few different vendor RDBMS to see how they > > handle invalid dates, or provide links to documentation, both for invalid > > formatting and things like mm-dd-yyy 12-40-2021? > > > > Thanks. > > > > On Tue, Jul 13, 2021 at 5:14 AM Sankar Hariappan > > <sankar.hariap...@microsoft.com.invalid> wrote: > > > >> I'm supporting this change to return "NULL" for invalid date/timestamp. > >> In the interest of backward compatibility, can we make all these changes > >> under a flag which can be enabled by default? > >> > >> > >> Thanks, > >> Sankar > >> -----Original Message----- > >> From: David <dam6...@gmail.com> > >> Sent: 10 July 2021 07:35 > >> To: dev <d...@hive.apache.org> > >> Cc: sank...@apache.org; Stamatis Zampetakis <zabe...@gmail.com> > >> Subject: [EXTERNAL] Re: Move Date and Timestamp parsing from > >> ResolverStyle.LENIENT to ResolverStyle.STRICT > >> > >> Hello, > >> > >> I too would be in favor of this. It drastically cuts down on the test > >> matrix for Hive if we can clamp down on timestamp formats. With that > being > >> said, I've tried this and it's a big effort. I put it down without > getting > >> consensus or buy-in or engagement on the effort. Please check out my > work > >> here: > >> > >> > >> > https://nam06.safelinks.protection.outlook.com/?url=https%3A%2F%2Fissues.apache.org%2Fjira%2Fplugins%2Fservlet%2Fmobile%23issue%2FHIVE-24814&data=04%7C01%7CSankar.Hariappan%40microsoft.com%7Cd47432b9d7654d66a46908d943472338%7C72f988bf86f141af91ab2d7cd011db47%7C1%7C0%7C637614795446338436%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C3000&sdata=mvYgaG7liJOwUZmMvgwlo%2B1HvcUsrnzXA3Ltfz5yEYE%3D&reserved=0 > >> > >> > >> On Fri, Jul 9, 2021, 9:49 PM Ashish Sharma < > ashishkumarsharm...@gmail.com > >> > > >> wrote: > >> > >> > Hi, > >> > > >> > When casting incorrect date or timestamp literals to DATE or TIMESTAMP > >> > data type hive returns wrong values > >> > > >> > hive> select cast('2020-20-20' as date); > >> > > >> > OK > >> > > >> > 2021-08-20 > >> > > >> > Time taken: 4.436 seconds, Fetched: 1 row(s) > >> > > >> > > >> > I have created a solution draft. Please review the draft and provide > >> > your valuable feedback on the same. > >> > > >> > > >> > > >> > > https://nam06.safelinks.protection.outlook.com/?url=https%3A%2F%2Fdocs > >> > .google.com > <https://nam06.safelinks.protection.outlook.com/?url=http%3A%2F%2Fgoogle.com%2F&data=04%7C01%7CSankar.Hariappan%40microsoft.com%7C5546147ca1b1489c558c08d94b58e75b%7C72f988bf86f141af91ab2d7cd011db47%7C1%7C0%7C637623667537419060%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C1000&sdata=buVSLhJVTluXbYd0iy491oHq2LU2e%2FfKfKYRks1co30%3D&reserved=0> > %2Fdocument%2Fd%2F1YTTPlNq3qyzlKfYVkSl3EFhVQ6-wa9WFRdkdIeCo > >> > c1Y%2Fedit%3Fusp%3Dsharing&data=04%7C01%7CSankar.Hariappan%40micro > >> > soft.com > <https://nam06.safelinks.protection.outlook.com/?url=http%3A%2F%2Fsoft.com%2F&data=04%7C01%7CSankar.Hariappan%40microsoft.com%7C5546147ca1b1489c558c08d94b58e75b%7C72f988bf86f141af91ab2d7cd011db47%7C1%7C0%7C637623667537429016%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C1000&sdata=no6XEmDCYcHNdjQduJdynmsEJpLww56lKSEze4LZ8Qo%3D&reserved=0> > %7Cd47432b9d7654d66a46908d943472338%7C72f988bf86f141af91ab2d7c > >> > d011db47%7C1%7C0%7C637614795446338436%7CUnknown%7CTWFpbGZsb3d8eyJWIjoi > >> > MC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C3000& > >> > sdata=iihK9wJC%2B1uPktHSE9BpXADvbal1UT7vZ3rwigkgkIY%3D&reserved=0 > >> > > >> > > >> > > https://nam06.safelinks.protection.outlook.com/?url=https%3A%2F%2Fissu > >> > es.apache.org > <https://nam06.safelinks.protection.outlook.com/?url=http%3A%2F%2Fes.apache.org%2F&data=04%7C01%7CSankar.Hariappan%40microsoft.com%7C5546147ca1b1489c558c08d94b58e75b%7C72f988bf86f141af91ab2d7cd011db47%7C1%7C0%7C637623667537429016%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C1000&sdata=W4mMKQSceHXb1NFvkYwtBvy02B81WoX9mAyPHd0F77s%3D&reserved=0> > %2Fjira%2Fbrowse%2FHIVE-25306&data=04%7C01%7CSankar.H > >> > ariappan%40microsoft.com > <https://nam06.safelinks.protection.outlook.com/?url=http%3A%2F%2F40microsoft.com%2F&data=04%7C01%7CSankar.Hariappan%40microsoft.com%7C5546147ca1b1489c558c08d94b58e75b%7C72f988bf86f141af91ab2d7cd011db47%7C1%7C0%7C637623667537438972%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C1000&sdata=a8wpFdw26oKHEDLE1XqTGVBCIgkHs%2Fj0r9BMnMkszOA%3D&reserved=0> > %7Cd47432b9d7654d66a46908d943472338%7C72f988bf > >> > 86f141af91ab2d7cd011db47%7C1%7C0%7C637614795446338436%7CUnknown%7CTWFp > >> > bGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn > >> > 0%3D%7C3000&sdata=nW%2Bw%2B0AYn%2BbvOqRLrXghfH0FG%2B1cQW%2BzdrpT%2 > >> > B9R%2B6rA%3D&reserved=0 > >> > > >> > > >> > Thank you > >> > > >> > Ashish Sharma > >> > > >> > > > >