Thanks Francesco, The two arguments you posted, further strengthen the need to make it DISABLED by default.
On Tue, Feb 22, 2022 at 12:10 PM Francesco Guardiani < france...@ververica.com> wrote: > Hi all, > I'm +1 with what everything you said Marios. > > I'm gonna add another argument on top of that: the "legacy-cast-behavior" > has also a broken type inference, leading to incorrect results or further > errors down in the pipeline[1]. For example, take this: > > SELECT COALESCE(CAST('a' AS INT), 0) ... > > With the legacy cast behavior ENABLED, this is going to lead to the wrong > result, as 'a' is inferred as VARCHAR NOT NULL, the CAST return value is > inferred as INT NOT NULL, so the planner will drop COALESCE, and will never > return 0. Essentially, CAST will infer the wrong nullability leading to > assigning its result to a NOT NULL type, when its value can effectively be > NULL. > > > You introduce a deprecated flag to help users > using old versions of the software to smoothly transition to the new > version, while the new users experience the new features/behavior, > without the need to set a flag. > > This is IMO the major point on why we should disable the legacy cast > behavior by default. This is even more relevant with 1.15 and the upgrade > story, as per the problem described above, because users will now end up > generating plans with wrong type inference, which will be hard to migrate > in the next flink versions. > > FG > > [1] In case you're wondering why it wasn't fixed, the reason is that fixing > it means creating a breaking change, for details > https://github.com/apache/flink/pull/18611#issuecomment-1028174877 > > > On Tue, Feb 22, 2022 at 10:07 AM Marios Trivyzas <mat...@gmail.com> wrote: > > > Hello all! > > > > I would like to bring back the discussion regarding the > > *table.exec.legacy-cast-behaviour* > > configuration option which we are introducing with Flink *1.15*. This > > option provides the users > > with the flexibility to continue using the old (incorrect, according to > SQL > > standards) behaviour > > of *CAST.* > > > > With Flink *1.15* we have introduced a bunch of fixes, improvements and > new > > casting functionality > > between types, see https://issues.apache.org/jira/browse/FLINK-24403, > and > > some of them are > > guarded behind the legacy behaviour option: > > > > - Trimming and padding when casting to CHAR/VARCHAR types to respect > the > > specified length > > - Changes for casting various types to CHAR/VARCHAR/STRING > > - Runtime errors for CAST no longer emit *null *as result but > exceptions > > are thrown with a meaningful message for the cause, that fail the > > pipeline. *TRY_CAST > > *is introduced instead, which emits *null* results instead of throwing > > exceptions. > > > > Those changes become active if users set the > > *table.exec.legacy-cast-behaviour > > *option to *DISABLED*, otherwise > > they will continue to experience the old, *erroneous*, behaviour of > *CAST*. > > > > Currently, we have set the *table.exec.legacy-cast-behaviour *option > > to be *ENABLED > > *by default, so if users want > > to get the new correct behaviour, they are required to set explicitly the > > option to *DISABLED*. Moreover, the option > > itself is marked as deprecated, since the plan is to be removed in the > > future, so that the old, erroneous behaviour > > won't be an option, and the *TRY_CAST* would be the way to go if users > > don't want to have errors and failed pipelines, > > but have *null*s emitted in case of runtime errors when casting. > > > > I would like to start a discussion and maybe ask for voting, so that we > set > > the *table.exec.legacy-cast-behaviour* option > > to *DISABLED *by default, so that users that want to keep their old > > pipelines working the same way, without changing their > > SQL/TableAPI code, would need to explicitly set it to *ENABLED.* > > > > My main argument for changing the default value for the option, is that > the > > *DISABLED* value is the one that enables > > the *correct* behaviour for CAST which should be the default for all new > > users. This way, new FLINK users, or users which > > build new pipelines, from now on would get the correct behaviour by > default > > without the need of changing some flag in their > > configuration. It feels weird to me, especially for people very familiar > > with standard SQL, to be obliged to set some config > > flag, to be able to get the correct behaviour for CAST. On top, users > that > > won't read about this option in our docs, will, > > "blindly", experience the old incorrect behaviour for their new > pipelines, > > and issues that could cause the CAST to fail > > will remain hidden from them, since *nulls* would be emitted. IMHO, this > > last part is also very important during the development > > stages of an application/pipeline. Normally, developers would want to see > > all possible errors/scenarios during development > > stages , in order to build a robust production system. If errors, are > > hidden then, they can easily end up with those errors > > in the production system which would be even harder to discover and > debug, > > since no exception will ever be thrown. > > Imagine that there is a CAST which generates nulls because of runtime > > errors, and its result is used in an aggregate function: > > > > SELECT AVG(CAST(col1 AS FLOAT)), ... FROM t GROUP BY .... > > > > If nulls are emitted by the CAST (let's say because some records have a > > quoted string value for col1, then simply the AVG result > > would be wrong and in would be extremely difficult to realise and fix the > > issue by simply wrapping col1 first with, for example, > > a REGEXP_REPLACE, to get rid of the quotes. > > > > My second argument is that, since we have marked > > *table.exec.legacy-cast-behaviour* as deprecated, and we want to > completely > > remove it in the future, if the default value is *DISABLED*, when it's > > removed we also make a breaking change, by changing the > > default behaviour for all users, which is against the common software > > practices. You introduce a deprecated flag to help users > > using old versions of the software to smoothly transition to the new > > version, while the new users experience the new features/behaviour, > > without the need to set a flag. So, when in the future this flag is > > completely removed, for those "new" users it would be completely > > transparent. If we continue with having the default value *ENABLED*, > those > > new user would experience an "unnatural" breaking change > > when this option is completely removed. > > > > Best, > > Marios > > > -- Marios