Hi, 1) As mentioned earlier in the jira and other forums based on the standard Hive shall throw an exception in case of casting a non-date string to date, including the case when the input string doesn't conform to the default date format. (I haven't found any recommendation for the default date format.) However, this would be a big behavior change and AFAIK Hive never worked this way. 2) The general behavior of Hive, when a cast fails, is returning null, so I understand the motivation to restore the behavior before it was HIVE-27586. 3) On the other hand, I guess the motivation behind HIVE-27586 was to make string-to-date casting more flexible since the source data can be in various formats.
I don't have a strong opinion about this but if we want to change the current behavior I'm leaning towards 2) Regards, Krisztian https://issues.apache.org/jira/browse/HIVE-27586 On Wed, Sep 18, 2024 at 3:53 PM Okumin <m...@okumin.com> wrote: > Hi, > > I put supplementary information in the ticket, such as examples of > Hive 2, Hive 3, Spark, Trino, or my research of the ANSI standard. > > Regards, > Okumin > > On Wed, Sep 18, 2024 at 7:17 PM Attila Turoczy > <aturo...@cloudera.com.invalid> wrote: > > > > Hi Folks, > > > > Let me share my opinion about this change and request. To have an option > for parse data from string is a very valuable feature especially in OLAP, > because many of the input data many times come from different formats. It > is easy and makes sense that the date follows the ISO format. But real life > is different. Especially on data lakes the sources could come from many > formats and the ETL job parses excel, json, csv etc to create tables, > insert rows. For those countries like Hungary where the ISO date format is > default everywhere the HIVE-27586 would not cause any issue. However, in > regions such as the US and UK, where date formats like mm/dd/yyyy and > dd/mm/yyyy are common (e.g., in Excel files), this could present > challenges. In these cases, users may encounter problems. In my mind > especially for these formats it has to be null or as an oracle throws an > error. If users fail to validate the output, this could lead to data > correctness issues or, worse, potential data loss if source files are > deleted. Yes, Hive users do not like the behavior change, but one type (one > of the most popular formats in the US) could lead to more problems. > > > > -Attila > > > > On Wed, Sep 18, 2024 at 9:43 AM Ayush Saxena <ayush...@gmail.com> wrote: > >> > >> Since this is landing in Spam for some reason for most people, I am > >> explicitly forwarding it rather than replying, so as people can get > >> it. > >> > >> I will vote for (2) considering it was returning `null` earlier as > >> well & the new behaviour came up recently, it would be a behaviour > >> change for folks migrating from earlier versions of hive to 4.x & the > >> new behaviour ain't that fancy either... > >> > >> -Ayush > >> > >> ---------- Forwarded message --------- > >> From: Zoltán Rátkai <zrat...@cloudera.com> > >> Date: Wed, 18 Sept 2024 at 12:55 > >> Subject: Re: IMPORTANT: Hive date parsing issue, input needed > >> To: <user@hive.apache.org> > >> Cc: dev <d...@hive.apache.org> > >> > >> > >> Hi Ayush, > >> > >> thanks for your thoughts! > >> > >> My answers: > >> > >> - On the official Hive page about types this is not documented at all: > >> > https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=82706456#LanguageManualTypes-date > >> > >> "Dates: DATE values describe a particular year/month/day, in the form > >> YYYY-MM-DD. For example, DATE '2013-01-01'. Date types do not have > >> a time of day component. The range of values supported for the Date > >> type is 0000-01-01 to 9999-12-31, dependent on support by the > >> primitive Java Date type." > >> > >> - Since the beginning of Hive it worked like returning null, this was > >> introduced one year ago as a side effect of this: HIVE-27586 "Parse > >> dates from strings ignoring trailing (potentialy) invalid chars", so > >> the goal of this ticket was to be able to handle string to date > >> parsing like : "2024-09-18 GARBAGE" and parse it to 2024-09-18 as > >> date. > >> So it is not a long standing behaviour. > >> > >> I made a comparison of other DB (Postre, MySQL, ORACLE) and none of > >> them handle it like Hive now. Please check it here: > >> https://issues.apache.org/jira/browse/HIVE-28483 > >> I agree with you to align with those. This is why I asked this question. > >> > >> Thanks and regards, > >> > >> Zoltan Ratkai > >> > >> > >> On Wed, Sep 18, 2024 at 8:58 AM Ayush Saxena <ayush...@gmail.com> > wrote: > >> > > >> > ++ Adding the dev mailing list, as this might be relevant to them as > well. > >> > > >> > I have a few additional questions: > >> > > >> > * Has this behavior been present since the inception of Hive, or did > >> > it start occurring more recently? If this has been the behavior for > >> > over a decade, it might be best to leave it unchanged. However, if > >> > this is a recent development, we should aim to restore the original > >> > behavior. In my opinion, any change that alters long-standing behavior > >> > could be considered incompatible unless it was introduced to prevent > >> > data loss or address a security issue. > >> > * How do other engines (e.g., Impala, Spark) handle similar scenarios, > >> > and how do other databases like MySQL and PostgreSQL behave in this > >> > regard? In the past, we've typically aimed to align with these > >> > systems, not necessarily exactly, but closely enough—especially in > >> > cases where there's ambiguity or conflict. > >> > > >> > > >> > -Ayush > >> > > >> > > >> > On Wed, 18 Sept 2024 at 12:00, Zoltán Rátkai <zrat...@cloudera.com> > wrote: > >> > > > >> > > Dear Hive User! > >> > > > >> > > In Hive the standard date format is YYYY-MM-DD, like “2024-09-17”. > >> > > So when casting a string which format is different, like DD-MM-YYYY > e.g. “17-09-2024" to date, as a Hive user what would you expect as a result? > >> > > > >> > > > >> > > 1. “0017-09-20” => Year of 17 September 20 > >> > > or > >> > > 2. null > >> > > > >> > > It is possible to give a date pattern CAST(<string-value> AS DATE > FORMAT <date-pattern>)and cast with it like "DD-MM-YYYY". > >> > > Earlier it gave back null, but currently Hive gives back the first, > which I think unexpected as a user and customers complain about that. There > is a debate if it is a bug or feature, so I need your help! > >> > > > >> > > I would like to ask you to reply with a vote on 1 or 2 what you > want as a user to happen! > >> > > > >> > > I really appreciate your input! > >> > > > >> > > Thank you, > >> > > > >> > > Zoltan Ratkai >