Many thanks to Zoltán for starting this thread and driving the
discussion forward. I am happy to see that many people chimed in and
gave feedback about the suggested changes in HIVE-28483; this is a
good indicator of a healthy community.

Since there were some comments/questions about the behavior in
previous Hive versions, I added a new set of tests under HIVE-27586
[1] to give a better overview of the situation. People who are
interested can give a look.

Personally, I am +0 with the changes in HIVE-28483; I like them but
don't know if all users will.

There is already a clear +1 from Okumin and I sense a general positive
sentiment from Attila and Ayush so I will proceed in merging the PR in
the following days unless the tide changes.

Best,
Stamatis

[1] https://issues.apache.org/jira/browse/HIVE-27586


On Thu, Sep 19, 2024 at 8:43 AM Zoltán Rátkai <zrat...@cloudera.com> wrote:
>
> Hi Okumin,
>
> thank you very much to checking them I appreciate it!
> Now we have information on 5 different DB (MySQL, Oracle, Postgre, Spark, 
> Trino) and  3 different versions of Hive about this topic and as I see only 
> Hive 4.0 works like the No. 1 way.
>
>
> Regards,
>
> Zoltan
>
>
> On Wed, Sep 18, 2024 at 3:53 PM Okumin <m...@okumin.com> wrote:
>>
>> Hi,
>>
>> I put supplementary information in the ticket, such as examples of
>> Hive 2, Hive 3, Spark, Trino, or my research of the ANSI standard.
>>
>> Regards,
>> Okumin
>>
>> On Wed, Sep 18, 2024 at 7:17 PM Attila Turoczy
>> <aturo...@cloudera.com.invalid> wrote:
>> >
>> > Hi Folks,
>> >
>> > Let me share my opinion about this change and request. To have an option 
>> > for parse data from string is a very valuable feature especially in OLAP, 
>> > because many of the input data many times come from different formats. It 
>> > is easy and makes sense that the date follows the ISO format. But real 
>> > life is different. Especially on data lakes the sources could come from 
>> > many formats and the ETL job parses excel, json, csv etc to create tables, 
>> > insert rows. For those countries like Hungary where the ISO date format is 
>> > default everywhere the HIVE-27586 would not cause any issue. However, in 
>> > regions such as the US and UK, where date formats like mm/dd/yyyy and 
>> > dd/mm/yyyy are common (e.g., in Excel files), this could present 
>> > challenges. In these cases, users may encounter problems. In my mind 
>> > especially for these formats it has to be null or as an oracle throws an 
>> > error. If users fail to validate the output, this could lead to data 
>> > correctness issues or, worse, potential data loss if source files are 
>> > deleted. Yes, Hive users do not like the behavior change, but one type 
>> > (one of the most popular formats in the US) could lead to more problems.
>> >
>> > -Attila
>> >
>> > On Wed, Sep 18, 2024 at 9:43 AM Ayush Saxena <ayush...@gmail.com> wrote:
>> >>
>> >> Since this is landing in Spam for some reason for most people, I am
>> >> explicitly forwarding it rather than replying, so as people can get
>> >> it.
>> >>
>> >> I will vote for (2) considering it was returning `null` earlier as
>> >> well & the new behaviour came up recently, it would be a behaviour
>> >> change for folks migrating from earlier versions of hive to 4.x & the
>> >> new behaviour ain't that fancy either...
>> >>
>> >> -Ayush
>> >>
>> >> ---------- Forwarded message ---------
>> >> From: Zoltán Rátkai <zrat...@cloudera.com>
>> >> Date: Wed, 18 Sept 2024 at 12:55
>> >> Subject: Re: IMPORTANT: Hive date parsing issue, input needed
>> >> To: <u...@hive.apache.org>
>> >> Cc: dev <dev@hive.apache.org>
>> >>
>> >>
>> >> Hi Ayush,
>> >>
>> >> thanks for your thoughts!
>> >>
>> >> My answers:
>> >>
>> >> - On the official Hive page about types this is not documented at all:
>> >> https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=82706456#LanguageManualTypes-date
>> >>
>> >> "Dates:  DATE values describe a particular year/month/day, in the form
>> >> YYYY-­MM-­DD. For example, DATE '2013-­01-­01'. Date types do not have
>> >> a time of day component. The range of values supported for the Date
>> >> type is 0000-­01-­01 to 9999-­12-­31, dependent on support by the
>> >> primitive Java Date type."
>> >>
>> >> - Since the beginning of Hive it worked like returning null, this was
>> >> introduced one year ago as a side effect of this: HIVE-27586 "Parse
>> >> dates from strings ignoring trailing (potentialy) invalid chars", so
>> >> the goal of this ticket was to be able to handle string to date
>> >> parsing like : "2024-09-18 GARBAGE" and parse it to 2024-09-18 as
>> >> date.
>> >> So it is not a long standing behaviour.
>> >>
>> >> I made a comparison of other DB (Postre, MySQL, ORACLE) and none of
>> >> them handle it like Hive now. Please check it here:
>> >> https://issues.apache.org/jira/browse/HIVE-28483
>> >> I agree with you to align with those. This is why I asked this question.
>> >>
>> >> Thanks and regards,
>> >>
>> >> Zoltan Ratkai
>> >>
>> >>
>> >> On Wed, Sep 18, 2024 at 8:58 AM Ayush Saxena <ayush...@gmail.com> wrote:
>> >> >
>> >> > ++ Adding the dev mailing list, as this might be relevant to them as 
>> >> > well.
>> >> >
>> >> > I have a few additional questions:
>> >> >
>> >> > * Has this behavior been present since the inception of Hive, or did
>> >> > it start occurring more recently? If this has been the behavior for
>> >> > over a decade, it might be best to leave it unchanged. However, if
>> >> > this is a recent development, we should aim to restore the original
>> >> > behavior. In my opinion, any change that alters long-standing behavior
>> >> > could be considered incompatible unless it was introduced to prevent
>> >> > data loss or address a security issue.
>> >> > * How do other engines (e.g., Impala, Spark) handle similar scenarios,
>> >> > and how do other databases like MySQL and PostgreSQL behave in this
>> >> > regard? In the past, we've typically aimed to align with these
>> >> > systems, not necessarily exactly, but closely enough—especially in
>> >> > cases where there's ambiguity or conflict.
>> >> >
>> >> >
>> >> > -Ayush
>> >> >
>> >> >
>> >> > On Wed, 18 Sept 2024 at 12:00, Zoltán Rátkai <zrat...@cloudera.com> 
>> >> > wrote:
>> >> > >
>> >> > > Dear Hive User!
>> >> > >
>> >> > > In Hive the standard date format is YYYY-MM-DD, like “2024-09-17”.
>> >> > > So when casting a string which format is different, like DD-MM-YYYY 
>> >> > > e.g. “17-09-2024" to date, as a Hive user what would you expect as a 
>> >> > > result?
>> >> > >
>> >> > >
>> >> > > 1.  “0017-09-20”  => Year of 17 September 20
>> >> > > or
>> >> > > 2. null
>> >> > >
>> >> > > It is possible to give a date pattern CAST(<string-value> AS DATE 
>> >> > > FORMAT <date-pattern>)and cast with it like "DD-MM-YYYY".
>> >> > > Earlier it gave back null, but currently Hive gives back the first, 
>> >> > > which I think unexpected as a user and customers complain about that. 
>> >> > > There is a debate if it is a bug or feature, so I need your help!
>> >> > >
>> >> > > I would like to ask you to reply with a vote on 1 or 2 what you want 
>> >> > > as a user to happen!
>> >> > >
>> >> > > I really appreciate your input!
>> >> > >
>> >> > > Thank you,
>> >> > >
>> >> > > Zoltan Ratkai

Reply via email to