+1 Especially as SQL upgrades are right around the corner, it makes sense to get our defaults right.
Seth On Mon, Feb 28, 2022 at 7:14 AM Martijn Visser <mart...@ververica.com> wrote: > +1 for setting this option to disabled by default. I believe failures > should be brought forward as soon as possible, so they can be fixed as fast > as possible. It will also be less confusing for new users. Last but not > least, I believe the impact on existing users will be minimal (since it can > be changed by changing one flag). > > Best regards, > > Martijn > > On Tue, 22 Feb 2022 at 17:55, Marios Trivyzas <mat...@gmail.com> wrote: > > > Thanks Francesco, > > > > The two arguments you posted, further strengthen the need to make it > > DISABLED by default. > > > > On Tue, Feb 22, 2022 at 12:10 PM Francesco Guardiani < > > france...@ververica.com> wrote: > > > > > Hi all, > > > I'm +1 with what everything you said Marios. > > > > > > I'm gonna add another argument on top of that: the > "legacy-cast-behavior" > > > has also a broken type inference, leading to incorrect results or > further > > > errors down in the pipeline[1]. For example, take this: > > > > > > SELECT COALESCE(CAST('a' AS INT), 0) ... > > > > > > With the legacy cast behavior ENABLED, this is going to lead to the > wrong > > > result, as 'a' is inferred as VARCHAR NOT NULL, the CAST return value > is > > > inferred as INT NOT NULL, so the planner will drop COALESCE, and will > > never > > > return 0. Essentially, CAST will infer the wrong nullability leading to > > > assigning its result to a NOT NULL type, when its value can effectively > > be > > > NULL. > > > > > > > You introduce a deprecated flag to help users > > > using old versions of the software to smoothly transition to the new > > > version, while the new users experience the new features/behavior, > > > without the need to set a flag. > > > > > > This is IMO the major point on why we should disable the legacy cast > > > behavior by default. This is even more relevant with 1.15 and the > upgrade > > > story, as per the problem described above, because users will now end > up > > > generating plans with wrong type inference, which will be hard to > migrate > > > in the next flink versions. > > > > > > FG > > > > > > [1] In case you're wondering why it wasn't fixed, the reason is that > > fixing > > > it means creating a breaking change, for details > > > https://github.com/apache/flink/pull/18611#issuecomment-1028174877 > > > > > > > > > On Tue, Feb 22, 2022 at 10:07 AM Marios Trivyzas <mat...@gmail.com> > > wrote: > > > > > > > Hello all! > > > > > > > > I would like to bring back the discussion regarding the > > > > *table.exec.legacy-cast-behaviour* > > > > configuration option which we are introducing with Flink *1.15*. This > > > > option provides the users > > > > with the flexibility to continue using the old (incorrect, according > to > > > SQL > > > > standards) behaviour > > > > of *CAST.* > > > > > > > > With Flink *1.15* we have introduced a bunch of fixes, improvements > and > > > new > > > > casting functionality > > > > between types, see https://issues.apache.org/jira/browse/FLINK-24403 > , > > > and > > > > some of them are > > > > guarded behind the legacy behaviour option: > > > > > > > > - Trimming and padding when casting to CHAR/VARCHAR types to > respect > > > the > > > > specified length > > > > - Changes for casting various types to CHAR/VARCHAR/STRING > > > > - Runtime errors for CAST no longer emit *null *as result but > > > exceptions > > > > are thrown with a meaningful message for the cause, that fail the > > > > pipeline. *TRY_CAST > > > > *is introduced instead, which emits *null* results instead of > > throwing > > > > exceptions. > > > > > > > > Those changes become active if users set the > > > > *table.exec.legacy-cast-behaviour > > > > *option to *DISABLED*, otherwise > > > > they will continue to experience the old, *erroneous*, behaviour of > > > *CAST*. > > > > > > > > Currently, we have set the *table.exec.legacy-cast-behaviour *option > > > > to be *ENABLED > > > > *by default, so if users want > > > > to get the new correct behaviour, they are required to set explicitly > > the > > > > option to *DISABLED*. Moreover, the option > > > > itself is marked as deprecated, since the plan is to be removed in > the > > > > future, so that the old, erroneous behaviour > > > > won't be an option, and the *TRY_CAST* would be the way to go if > users > > > > don't want to have errors and failed pipelines, > > > > but have *null*s emitted in case of runtime errors when casting. > > > > > > > > I would like to start a discussion and maybe ask for voting, so that > we > > > set > > > > the *table.exec.legacy-cast-behaviour* option > > > > to *DISABLED *by default, so that users that want to keep their old > > > > pipelines working the same way, without changing their > > > > SQL/TableAPI code, would need to explicitly set it to *ENABLED.* > > > > > > > > My main argument for changing the default value for the option, is > that > > > the > > > > *DISABLED* value is the one that enables > > > > the *correct* behaviour for CAST which should be the default for all > > new > > > > users. This way, new FLINK users, or users which > > > > build new pipelines, from now on would get the correct behaviour by > > > default > > > > without the need of changing some flag in their > > > > configuration. It feels weird to me, especially for people very > > familiar > > > > with standard SQL, to be obliged to set some config > > > > flag, to be able to get the correct behaviour for CAST. On top, users > > > that > > > > won't read about this option in our docs, will, > > > > "blindly", experience the old incorrect behaviour for their new > > > pipelines, > > > > and issues that could cause the CAST to fail > > > > will remain hidden from them, since *nulls* would be emitted. IMHO, > > this > > > > last part is also very important during the development > > > > stages of an application/pipeline. Normally, developers would want to > > see > > > > all possible errors/scenarios during development > > > > stages , in order to build a robust production system. If errors, are > > > > hidden then, they can easily end up with those errors > > > > in the production system which would be even harder to discover and > > > debug, > > > > since no exception will ever be thrown. > > > > Imagine that there is a CAST which generates nulls because of runtime > > > > errors, and its result is used in an aggregate function: > > > > > > > > SELECT AVG(CAST(col1 AS FLOAT)), ... FROM t GROUP BY .... > > > > > > > > If nulls are emitted by the CAST (let's say because some records > have a > > > > quoted string value for col1, then simply the AVG result > > > > would be wrong and in would be extremely difficult to realise and fix > > the > > > > issue by simply wrapping col1 first with, for example, > > > > a REGEXP_REPLACE, to get rid of the quotes. > > > > > > > > My second argument is that, since we have marked > > > > *table.exec.legacy-cast-behaviour* as deprecated, and we want to > > > completely > > > > remove it in the future, if the default value is *DISABLED*, when > it's > > > > removed we also make a breaking change, by changing the > > > > default behaviour for all users, which is against the common software > > > > practices. You introduce a deprecated flag to help users > > > > using old versions of the software to smoothly transition to the new > > > > version, while the new users experience the new features/behaviour, > > > > without the need to set a flag. So, when in the future this flag is > > > > completely removed, for those "new" users it would be completely > > > > transparent. If we continue with having the default value *ENABLED*, > > > those > > > > new user would experience an "unnatural" breaking change > > > > when this option is completely removed. > > > > > > > > Best, > > > > Marios > > > > > > > > > > > > > -- > > Marios > > >