+1
Thanks Ashish for the comparison!
I talked to few Hive users (HDInsight) and they supported returning NULL for 
invalid date/timestamp inputs instead of returning incorrect results or 
exception.
Can others pls share your thoughts?

Thanks,
Sankar

From: Ashish Sharma <ashishkumarsharm...@gmail.com>
Sent: 20 July 2021 14:02
To: d...@hive.apache.org
Cc: sankar.hariap...@microsoft.com.invalid; sank...@apache.org; 
user@hive.apache.org; David <dam6...@gmail.com>
Subject: Re: [EXTERNAL] Re: Move Date and Timestamp parsing from 
ResolverStyle.LENIENT to ResolverStyle.STRICT

Hi all,

I also feel that adding more config doesn't make sense in this as we are 
tightening the date and timestamp format. We should decide upon a single 
solution even if it break the compatibility. Below the comparison of HIVE 1.2, 
HIVE 3.2, MYSQL, PostgreSQL, Oracle



Query

Hive 1.2

Hive 3.2

Mysql

PostgreSQL

ORACLE

select cast('2020-20-20' as date);

NULL

2021-08-20

NULL

date/time field value out of range: "2020-20-20"

not a valid month

select cast(null as date);

NULL

NULL

NULL

NULL

NULL

select cast('2020-02-31' as date);

2020-03-02

2020-03-02

NULL

date/time field value out of range: "2020-02-31"

date format picture ends before converting entire input string

select cast('2020/02/20' as date);

NULL

NULL

2020-02-20

2020-02-20

literal does not match format string

select cast('0000-00-00' as date);

NULL

0002-11-30

NULL

date/time field value out of range: "0000-00-00"

literal does not match format string


>From the comparison it is quite clear that date and timestamp formatting was 
>much tighter in older versions of HIVE. For most of the wrong date input NULL 
>was the standard response instead of Exception.

Also when I went through the code I found that. While doing the Vector 
implementation of some of the date related UDF like datediff etc. MySql was 
taken as the gold 
standard<https://nam06.safelinks.protection.outlook.com/?url=https%3A%2F%2Fissues.apache.org%2Fjira%2Fbrowse%2FHIVE-15338%3FfocusedCommentId%3D15727553%26page%3Dcom.atlassian.jira.plugin.system.issuetabpanels%253Acomment-tabpanel%23comment-15727553&data=04%7C01%7CSankar.Hariappan%40microsoft.com%7C5546147ca1b1489c558c08d94b58e75b%7C72f988bf86f141af91ab2d7cd011db47%7C1%7C0%7C637623667537409104%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C1000&sdata=UjkFmuO8PHHkcojm1f%2FPJipguQ1JYMkbl%2F0XzcrvGBg%3D&reserved=0>.
 So it make more sense that  we should comply with MySql as we already refer 
MySql as gold standard and returning NULL as result for wrong dates in cast is 
also 
documented<https://nam06.safelinks.protection.outlook.com/?url=https%3A%2F%2Fcwiki.apache.org%2Fconfluence%2Fdisplay%2Fhive%2Flanguagemanual%2Btypes%23LanguageManualTypes-CastingDates&data=04%7C01%7CSankar.Hariappan%40microsoft.com%7C5546147ca1b1489c558c08d94b58e75b%7C72f988bf86f141af91ab2d7cd011db47%7C1%7C0%7C637623667537419060%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C1000&sdata=69O6Ct8q%2FK8l31D4yK4eF1fQRFjtl3jAiooP3FWJTJ4%3D&reserved=0>

So I propose to make NULL as the standard response for all parsing errors.

Thanks
Ashish Sharma

On Tue, Jul 13, 2021 at 9:52 PM Stamatis Zampetakis 
<zabe...@gmail.com<mailto:zabe...@gmail.com>> wrote:
Hi all,

Thanks for pushing this forward Ashish!

Actually I am not in favor of creating a flag for this. Either we decide
consciously to break backward compatibility in the hope that we are
improving the expected results or we keep the current behavior.
Adding another flag means that we maintain and support two variants that
makes the problem of test coverage brought by David even worse.

I second David's idea to run some tests over some well adopted DBMS (MySQL,
Oracle, MSSQL, Postgres) to see what they return.
I think Ashish already did some tests over MySQL and MSSQL but personally I
would like to see some more (dates + engines) in order to express
a preference.
We shouldn't forget that since Hive is implemented in Java, having
functions that are inline with the Java APIs is not such a bad idea.
The last comment is slightly supportive of the current behavior.

I am including user@ list in the discussion since we should definitely
consider the feedback of people that are using Hive for real.

Best,
Stamatis

On Tue, Jul 13, 2021 at 4:31 PM David 
<dam6...@gmail.com<mailto:dam6...@gmail.com>> wrote:

> Hello,
>
> Is anyone able to try out a few different vendor RDBMS to see how they
> handle invalid dates, or provide links to documentation, both for invalid
> formatting and things like mm-dd-yyy 12-40-2021?
>
> Thanks.
>
> On Tue, Jul 13, 2021 at 5:14 AM Sankar Hariappan
> <sankar.hariap...@microsoft.com.invalid<mailto:sankar.hariap...@microsoft.com.invalid>>
>  wrote:
>
>> I'm supporting this change to return "NULL" for invalid date/timestamp.
>> In the interest of backward compatibility, can we make all these changes
>> under a flag which can be enabled by default?
>>
>>
>> Thanks,
>> Sankar
>> -----Original Message-----
>> From: David <dam6...@gmail.com<mailto:dam6...@gmail.com>>
>> Sent: 10 July 2021 07:35
>> To: dev <d...@hive.apache.org<mailto:d...@hive.apache.org>>
>> Cc: sank...@apache.org<mailto:sank...@apache.org>; Stamatis Zampetakis 
>> <zabe...@gmail.com<mailto:zabe...@gmail.com>>
>> Subject: [EXTERNAL] Re: Move Date and Timestamp parsing from
>> ResolverStyle.LENIENT to ResolverStyle.STRICT
>>
>> Hello,
>>
>> I too would be in favor of this. It drastically cuts down on the test
>> matrix for Hive if we can clamp down on timestamp formats. With that being
>> said, I've tried this and it's a big effort.  I put it down without getting
>> consensus or buy-in or engagement on the effort. Please check out my work
>> here:
>>
>>
>> https://nam06.safelinks.protection.outlook.com/?url=https%3A%2F%2Fissues.apache.org%2Fjira%2Fplugins%2Fservlet%2Fmobile%23issue%2FHIVE-24814&amp;data=04%7C01%7CSankar.Hariappan%40microsoft.com%7Cd47432b9d7654d66a46908d943472338%7C72f988bf86f141af91ab2d7cd011db47%7C1%7C0%7C637614795446338436%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C3000&amp;sdata=mvYgaG7liJOwUZmMvgwlo%2B1HvcUsrnzXA3Ltfz5yEYE%3D&amp;reserved=0
>>
>>
>> On Fri, Jul 9, 2021, 9:49 PM Ashish Sharma 
>> <ashishkumarsharm...@gmail.com<mailto:ashishkumarsharm...@gmail.com>
>> >
>> wrote:
>>
>> > Hi,
>> >
>> > When casting incorrect date or timestamp literals to DATE or TIMESTAMP
>> > data type hive returns wrong values
>> >
>> > hive> select cast('2020-20-20' as date);
>> >
>> > OK
>> >
>> > 2021-08-20
>> >
>> > Time taken: 4.436 seconds, Fetched: 1 row(s)
>> >
>> >
>> > I have created a solution draft. Please review the draft and provide
>> > your valuable feedback on the same.
>> >
>> >
>> >
>> > https://nam06.safelinks.protection.outlook.com/?url=https%3A%2F%2Fdocs
>> > .google.com<https://nam06.safelinks.protection.outlook.com/?url=http%3A%2F%2Fgoogle.com%2F&data=04%7C01%7CSankar.Hariappan%40microsoft.com%7C5546147ca1b1489c558c08d94b58e75b%7C72f988bf86f141af91ab2d7cd011db47%7C1%7C0%7C637623667537419060%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C1000&sdata=buVSLhJVTluXbYd0iy491oHq2LU2e%2FfKfKYRks1co30%3D&reserved=0>%2Fdocument%2Fd%2F1YTTPlNq3qyzlKfYVkSl3EFhVQ6-wa9WFRdkdIeCo
>> > c1Y%2Fedit%3Fusp%3Dsharing&amp;data=04%7C01%7CSankar.Hariappan%40micro
>> > soft.com<https://nam06.safelinks.protection.outlook.com/?url=http%3A%2F%2Fsoft.com%2F&data=04%7C01%7CSankar.Hariappan%40microsoft.com%7C5546147ca1b1489c558c08d94b58e75b%7C72f988bf86f141af91ab2d7cd011db47%7C1%7C0%7C637623667537429016%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C1000&sdata=no6XEmDCYcHNdjQduJdynmsEJpLww56lKSEze4LZ8Qo%3D&reserved=0>%7Cd47432b9d7654d66a46908d943472338%7C72f988bf86f141af91ab2d7c
>> > d011db47%7C1%7C0%7C637614795446338436%7CUnknown%7CTWFpbGZsb3d8eyJWIjoi
>> > MC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C3000&amp;
>> > sdata=iihK9wJC%2B1uPktHSE9BpXADvbal1UT7vZ3rwigkgkIY%3D&amp;reserved=0
>> >
>> >
>> > https://nam06.safelinks.protection.outlook.com/?url=https%3A%2F%2Fissu
>> > es.apache.org<https://nam06.safelinks.protection.outlook.com/?url=http%3A%2F%2Fes.apache.org%2F&data=04%7C01%7CSankar.Hariappan%40microsoft.com%7C5546147ca1b1489c558c08d94b58e75b%7C72f988bf86f141af91ab2d7cd011db47%7C1%7C0%7C637623667537429016%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C1000&sdata=W4mMKQSceHXb1NFvkYwtBvy02B81WoX9mAyPHd0F77s%3D&reserved=0>%2Fjira%2Fbrowse%2FHIVE-25306&amp;data=04%7C01%7CSankar.H
>> > ariappan%40microsoft.com<https://nam06.safelinks.protection.outlook.com/?url=http%3A%2F%2F40microsoft.com%2F&data=04%7C01%7CSankar.Hariappan%40microsoft.com%7C5546147ca1b1489c558c08d94b58e75b%7C72f988bf86f141af91ab2d7cd011db47%7C1%7C0%7C637623667537438972%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C1000&sdata=a8wpFdw26oKHEDLE1XqTGVBCIgkHs%2Fj0r9BMnMkszOA%3D&reserved=0>%7Cd47432b9d7654d66a46908d943472338%7C72f988bf
>> > 86f141af91ab2d7cd011db47%7C1%7C0%7C637614795446338436%7CUnknown%7CTWFp
>> > bGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn
>> > 0%3D%7C3000&amp;sdata=nW%2Bw%2B0AYn%2BbvOqRLrXghfH0FG%2B1cQW%2BzdrpT%2
>> > B9R%2B6rA%3D&amp;reserved=0
>> >
>> >
>> > Thank you
>> >
>> > Ashish Sharma
>> >
>>
>

Reply via email to