Indeed, if we manage to use the configuration from *flink-conf.yaml* down the stack, it would be easy for everyone to configure a "system-wide" legacy cast behaviour.
Best regards, Marios On Tue, Mar 1, 2022 at 2:52 PM Timo Walther <twal...@apache.org> wrote: > +1 > > Thanks for bringing up this discussion one more time Marios. > > I strongly support enabling the new behavior in 1.15. It definitely has > implications on existing users, but as Seth said, thinking about the > upcoming upgrade story we need to make sure that at least the core/basic > operations are correct. Otherwise we will have to maintain multiple > versions of functions with broken semantics. > > I since we also try to fix various issues around configuration, maybe it > might still be possible to configure the legacy cast behavior globally > via flink-conf.yaml. This should make the transitioning period easier in > production. > > Regards, > Timo > > Am 28.02.22 um 19:04 schrieb Seth Wiesman: > > +1 > > > > Especially as SQL upgrades are right around the corner, it makes sense to > > get our defaults right. > > > > Seth > > > > On Mon, Feb 28, 2022 at 7:14 AM Martijn Visser <mart...@ververica.com> > > wrote: > > > >> +1 for setting this option to disabled by default. I believe failures > >> should be brought forward as soon as possible, so they can be fixed as > fast > >> as possible. It will also be less confusing for new users. Last but not > >> least, I believe the impact on existing users will be minimal (since it > can > >> be changed by changing one flag). > >> > >> Best regards, > >> > >> Martijn > >> > >> On Tue, 22 Feb 2022 at 17:55, Marios Trivyzas <mat...@gmail.com> wrote: > >> > >>> Thanks Francesco, > >>> > >>> The two arguments you posted, further strengthen the need to make it > >>> DISABLED by default. > >>> > >>> On Tue, Feb 22, 2022 at 12:10 PM Francesco Guardiani < > >>> france...@ververica.com> wrote: > >>> > >>>> Hi all, > >>>> I'm +1 with what everything you said Marios. > >>>> > >>>> I'm gonna add another argument on top of that: the > >> "legacy-cast-behavior" > >>>> has also a broken type inference, leading to incorrect results or > >> further > >>>> errors down in the pipeline[1]. For example, take this: > >>>> > >>>> SELECT COALESCE(CAST('a' AS INT), 0) ... > >>>> > >>>> With the legacy cast behavior ENABLED, this is going to lead to the > >> wrong > >>>> result, as 'a' is inferred as VARCHAR NOT NULL, the CAST return value > >> is > >>>> inferred as INT NOT NULL, so the planner will drop COALESCE, and will > >>> never > >>>> return 0. Essentially, CAST will infer the wrong nullability leading > to > >>>> assigning its result to a NOT NULL type, when its value can > effectively > >>> be > >>>> NULL. > >>>> > >>>>> You introduce a deprecated flag to help users > >>>> using old versions of the software to smoothly transition to the new > >>>> version, while the new users experience the new features/behavior, > >>>> without the need to set a flag. > >>>> > >>>> This is IMO the major point on why we should disable the legacy cast > >>>> behavior by default. This is even more relevant with 1.15 and the > >> upgrade > >>>> story, as per the problem described above, because users will now end > >> up > >>>> generating plans with wrong type inference, which will be hard to > >> migrate > >>>> in the next flink versions. > >>>> > >>>> FG > >>>> > >>>> [1] In case you're wondering why it wasn't fixed, the reason is that > >>> fixing > >>>> it means creating a breaking change, for details > >>>> https://github.com/apache/flink/pull/18611#issuecomment-1028174877 > >>>> > >>>> > >>>> On Tue, Feb 22, 2022 at 10:07 AM Marios Trivyzas <mat...@gmail.com> > >>> wrote: > >>>>> Hello all! > >>>>> > >>>>> I would like to bring back the discussion regarding the > >>>>> *table.exec.legacy-cast-behaviour* > >>>>> configuration option which we are introducing with Flink *1.15*. This > >>>>> option provides the users > >>>>> with the flexibility to continue using the old (incorrect, according > >> to > >>>> SQL > >>>>> standards) behaviour > >>>>> of *CAST.* > >>>>> > >>>>> With Flink *1.15* we have introduced a bunch of fixes, improvements > >> and > >>>> new > >>>>> casting functionality > >>>>> between types, see https://issues.apache.org/jira/browse/FLINK-24403 > >> , > >>>> and > >>>>> some of them are > >>>>> guarded behind the legacy behaviour option: > >>>>> > >>>>> - Trimming and padding when casting to CHAR/VARCHAR types to > >> respect > >>>> the > >>>>> specified length > >>>>> - Changes for casting various types to CHAR/VARCHAR/STRING > >>>>> - Runtime errors for CAST no longer emit *null *as result but > >>>> exceptions > >>>>> are thrown with a meaningful message for the cause, that fail the > >>>>> pipeline. *TRY_CAST > >>>>> *is introduced instead, which emits *null* results instead of > >>> throwing > >>>>> exceptions. > >>>>> > >>>>> Those changes become active if users set the > >>>>> *table.exec.legacy-cast-behaviour > >>>>> *option to *DISABLED*, otherwise > >>>>> they will continue to experience the old, *erroneous*, behaviour of > >>>> *CAST*. > >>>>> Currently, we have set the *table.exec.legacy-cast-behaviour *option > >>>>> to be *ENABLED > >>>>> *by default, so if users want > >>>>> to get the new correct behaviour, they are required to set explicitly > >>> the > >>>>> option to *DISABLED*. Moreover, the option > >>>>> itself is marked as deprecated, since the plan is to be removed in > >> the > >>>>> future, so that the old, erroneous behaviour > >>>>> won't be an option, and the *TRY_CAST* would be the way to go if > >> users > >>>>> don't want to have errors and failed pipelines, > >>>>> but have *null*s emitted in case of runtime errors when casting. > >>>>> > >>>>> I would like to start a discussion and maybe ask for voting, so that > >> we > >>>> set > >>>>> the *table.exec.legacy-cast-behaviour* option > >>>>> to *DISABLED *by default, so that users that want to keep their old > >>>>> pipelines working the same way, without changing their > >>>>> SQL/TableAPI code, would need to explicitly set it to *ENABLED.* > >>>>> > >>>>> My main argument for changing the default value for the option, is > >> that > >>>> the > >>>>> *DISABLED* value is the one that enables > >>>>> the *correct* behaviour for CAST which should be the default for all > >>> new > >>>>> users. This way, new FLINK users, or users which > >>>>> build new pipelines, from now on would get the correct behaviour by > >>>> default > >>>>> without the need of changing some flag in their > >>>>> configuration. It feels weird to me, especially for people very > >>> familiar > >>>>> with standard SQL, to be obliged to set some config > >>>>> flag, to be able to get the correct behaviour for CAST. On top, users > >>>> that > >>>>> won't read about this option in our docs, will, > >>>>> "blindly", experience the old incorrect behaviour for their new > >>>> pipelines, > >>>>> and issues that could cause the CAST to fail > >>>>> will remain hidden from them, since *nulls* would be emitted. IMHO, > >>> this > >>>>> last part is also very important during the development > >>>>> stages of an application/pipeline. Normally, developers would want to > >>> see > >>>>> all possible errors/scenarios during development > >>>>> stages , in order to build a robust production system. If errors, are > >>>>> hidden then, they can easily end up with those errors > >>>>> in the production system which would be even harder to discover and > >>>> debug, > >>>>> since no exception will ever be thrown. > >>>>> Imagine that there is a CAST which generates nulls because of runtime > >>>>> errors, and its result is used in an aggregate function: > >>>>> > >>>>> SELECT AVG(CAST(col1 AS FLOAT)), ... FROM t GROUP BY .... > >>>>> > >>>>> If nulls are emitted by the CAST (let's say because some records > >> have a > >>>>> quoted string value for col1, then simply the AVG result > >>>>> would be wrong and in would be extremely difficult to realise and fix > >>> the > >>>>> issue by simply wrapping col1 first with, for example, > >>>>> a REGEXP_REPLACE, to get rid of the quotes. > >>>>> > >>>>> My second argument is that, since we have marked > >>>>> *table.exec.legacy-cast-behaviour* as deprecated, and we want to > >>>> completely > >>>>> remove it in the future, if the default value is *DISABLED*, when > >> it's > >>>>> removed we also make a breaking change, by changing the > >>>>> default behaviour for all users, which is against the common software > >>>>> practices. You introduce a deprecated flag to help users > >>>>> using old versions of the software to smoothly transition to the new > >>>>> version, while the new users experience the new features/behaviour, > >>>>> without the need to set a flag. So, when in the future this flag is > >>>>> completely removed, for those "new" users it would be completely > >>>>> transparent. If we continue with having the default value *ENABLED*, > >>>> those > >>>>> new user would experience an "unnatural" breaking change > >>>>> when this option is completely removed. > >>>>> > >>>>> Best, > >>>>> Marios > >>>>> > >>> > >>> -- > >>> Marios > >>> > > -- Marios