Hello all!

I would like to bring back the discussion regarding the
*table.exec.legacy-cast-behaviour*
configuration option which we are introducing with Flink *1.15*. This
option provides the users
with the flexibility to continue using the old (incorrect, according to SQL
standards) behaviour
of *CAST.*

With Flink *1.15* we have introduced a bunch of fixes, improvements and new
casting functionality
between types, see https://issues.apache.org/jira/browse/FLINK-24403, and
some of them are
guarded behind the legacy behaviour option:

   - Trimming and padding when casting to CHAR/VARCHAR types to respect the
   specified length
   - Changes for casting various types to CHAR/VARCHAR/STRING
   - Runtime errors for CAST no longer emit *null *as result but exceptions
   are thrown with a meaningful message for the cause, that fail the
pipeline. *TRY_CAST
   *is introduced instead, which emits *null* results instead of throwing
   exceptions.

Those changes become active if users set the *table.exec.legacy-cast-behaviour
*option to *DISABLED*, otherwise
they will continue to experience the old, *erroneous*, behaviour of *CAST*.

Currently, we have set the *table.exec.legacy-cast-behaviour *option
to be *ENABLED
*by default, so if users want
to get the new correct behaviour, they are required to set explicitly the
option to *DISABLED*. Moreover, the option
itself is marked as deprecated, since the plan is to be removed in the
future, so that the old, erroneous behaviour
won't be an option, and the *TRY_CAST* would be the way to go if users
don't want to have errors and failed pipelines,
but have *null*s emitted in case of runtime errors when casting.

I would like to start a discussion and maybe ask for voting, so that we set
the *table.exec.legacy-cast-behaviour* option
to *DISABLED *by default, so that users that want to keep their old
pipelines working the same way, without changing their
SQL/TableAPI code, would need to explicitly set it to *ENABLED.*

My main argument for changing the default value for the option, is that the
*DISABLED* value is the one that enables
the *correct* behaviour for CAST which should be the default for all new
users. This way, new FLINK users, or users which
build new pipelines, from now on would get the correct behaviour by default
without the need of changing some flag in their
configuration. It feels weird to me, especially for people very familiar
with standard SQL, to be obliged to set some config
flag, to be able to get the correct behaviour for CAST. On top, users that
won't read about this option in our docs, will,
"blindly", experience the old incorrect behaviour for their new pipelines,
and issues that could cause the CAST to fail
will remain hidden from them, since *nulls* would be emitted. IMHO, this
last part is also very important during the development
stages of an application/pipeline. Normally, developers would want to see
all possible errors/scenarios during development
stages , in order to build a robust production system. If errors, are
hidden then, they can easily end up with those errors
in the production system which would be even harder to discover and debug,
since no exception will ever be thrown.
Imagine that there is a CAST which generates nulls because of runtime
errors, and its result is used in an aggregate function:

SELECT AVG(CAST(col1 AS FLOAT)), ... FROM t GROUP BY ....

If nulls are emitted by the CAST (let's say because some records have a
quoted string value for col1, then simply the AVG result
would be wrong and in would be extremely difficult to realise and fix the
issue by simply wrapping col1 first with, for example,
a REGEXP_REPLACE, to get rid of the quotes.

My second argument is that, since we have marked
*table.exec.legacy-cast-behaviour* as deprecated, and we want to completely
remove it in the future, if the default value is *DISABLED*, when it's
removed we also make a breaking change, by changing the
default behaviour for all users, which is against the common software
practices. You introduce a deprecated flag to help users
using old versions of the software to smoothly transition to the new
version, while the new users experience the new features/behaviour,
without the need to set a flag. So, when in the future this flag is
completely removed, for those "new" users it would be completely
transparent. If we continue with having the default value *ENABLED*, those
new user would experience an "unnatural" breaking change
when this option is completely removed.

Best,
Marios

Reply via email to