Sorry I forgot to add user ML. I also would like to gather some users
feedback on this thing.
Since I didn't get any feedback on this topic before from users.

Best,
Kurt


On Thu, Nov 18, 2021 at 6:33 PM Kurt Young <ykt...@gmail.com> wrote:

> (added user ML to this thread)
>
> HI all,
>
> I would like to raise a different opinion about this change. I agree with
> Ingo that
> we should not just break some existing behavior, and even if we introduce
> an
> option to control the behavior, i would propose to set the default value
> to current
> behavior.
>
> I want to mention one angle to assess whether we should change it or not,
> which
> is "what could users benefit from the changes". To me, it looks like:
>
> * new users: happy about the behavior
> * existing users: suffer from the change, it either cause them to modify
> the SQL or
> got a call in late night reporting his online job got crashed and couldn't
> be able to
> restart.
>
> I would like to quote another breaking change we did when we adjust the
> time-related
> function in FLIP-162 [1]. In that case, both new users and existing users
> are suffered
> from *incorrectly* implemented time function behavior, and we saw a lots
> of feedbacks and
> complains from various channels. After we fixed that, we never saw related
> problems again.
>
> Back to this topic, do we ever seen a user complain about current CAST
> behavior? Form my
> side, no.
>
> To summarize:
>
> +1 to introduce TRY_CAST to better prepare for the future.
> -1 to modify the default behavior.
> +0 to introduce a config option, but with the default value to existing
> behavior. it's +0 because it
> seems not necessary if i'm -1 to change the default behavior and also
> don't see an urgent to modify.
>
>
> [1]
> https://cwiki.apache.org/confluence/display/FLINK/FLIP-162%3A+Consistent+Flink+SQL+time+function+behavior
>
> Best,
> Kurt
>
>
> On Thu, Nov 18, 2021 at 4:26 PM Ingo Bürk <i...@ververica.com> wrote:
>
>> Hi,
>>
>> first of all, thanks for the summary of both sides, and for bringing up
>> the
>> discussion on this.
>> I think it is obvious that this is not something we can just "break", so
>> the config option seems mandatory to me.
>>
>> Overall I agree with Martijn and Till that throwing errors is the more
>> expected behavior. I mostly think this is valuable default behavior
>> because
>> it allows developers to find mistakes early and diagnose them much easier
>> compare to having to "work backwards" and figure out that it is the CAST
>> that failed. It also means that pipelines using TRY_CAST are
>> self-documenting because using that can signal "we might receive broken
>> data here".
>>
>>
>> Best
>> Ingo
>>
>> On Thu, Nov 18, 2021 at 9:11 AM Till Rohrmann <trohrm...@apache.org>
>> wrote:
>>
>> > Hi everyone,
>> >
>> > personally I would also prefer the system telling me that something is
>> > wrong instead of silently ignoring records. If there is a TRY_CAST
>> function
>> > that has the old behaviour, people can still get the old behaviour. For
>> > backwards compatibility reasons it is a good idea to introduce a switch
>> to
>> > get back the old behaviour. By default we could set it to the new
>> > behaviour, though. Of course, we should explicitly document this new
>> > behaviour so that people are aware of it before running their jobs for
>> days
>> > and then encountering an invalid input.
>> >
>> > Cheers,
>> > Till
>> >
>> > On Thu, Nov 18, 2021 at 9:02 AM Martijn Visser <mart...@ververica.com>
>> > wrote:
>> >
>> > > Hi Caizhi,
>> > >
>> > > Thanks for bringing this up for discussion. I think the important
>> part is
>> > > what do developers expect as the default behaviour of a CAST function
>> > when
>> > > casting fails. If I look at Postgres [1] or MSSQL [2], the default
>> > > behaviour of a CAST failing would be to return an error, which would
>> be
>> > the
>> > > new behaviour. Returning a value when a CAST fails can lead to users
>> not
>> > > understanding immediately where that value comes from. So, I would be
>> in
>> > > favor of the new behaviour by default, but including a configuration
>> flag
>> > > to maintain the old behaviour to avoid that you need to rewrite all
>> these
>> > > jobs.
>> > >
>> > > Best regards,
>> > >
>> > > Martijn
>> > >
>> > > [1] https://www.postgresql.org/docs/current/sql-createcast.html
>> > > [2]
>> > >
>> > >
>> >
>> https://docs.microsoft.com/en-us/sql/t-sql/functions/try-cast-transact-sql?view=sql-server-ver15
>> > >
>> > > On Thu, 18 Nov 2021 at 03:17, Caizhi Weng <tsreape...@gmail.com>
>> wrote:
>> > >
>> > > > Hi devs!
>> > > >
>> > > > We're discussing the behavior of casting functions (including cast,
>> > > > to_timestamp, to_date, etc.) for invalid input in
>> > > > https://issues.apache.org/jira/browse/FLINK-24924. As this topic is
>> > > > crucial
>> > > > to compatibility and usability we'd like to continue discussing this
>> > > > publicly in the mailing list.
>> > > >
>> > > > The main topic is to discuss that shall casting functions return
>> null
>> > > (keep
>> > > > its current behavior) or throw exceptions (introduce a new
>> behavior).
>> > I'm
>> > > > trying to conclude the ideas on both sides. Correct me if I miss
>> > > something.
>> > > >
>> > > > *From the devs who support throwing exceptions (new behavior):*
>> > > >
>> > > > The main concern is that if we silently return null then unexpected
>> > > results
>> > > > or exceptions (mainly NullPointerException) may be produced.
>> However,
>> > it
>> > > > will be hard for users to reason for this because there is no
>> detailed
>> > > > message. If we throw exceptions in the first place, then it's much
>> > easier
>> > > > to catch the exception with nice detailed messages explaining what
>> is
>> > > going
>> > > > wrong. Especially for this case of DATE/TIME/TIMESTAMP it's very
>> > helpful
>> > > to
>> > > > have a detailed error and see where and why the parsing broke.
>> > > >
>> > > > For compatibility concerns, we can provide a TRY_CAST function
>> which is
>> > > > exactly the same as the current CAST function by returning nulls for
>> > > > invalid input.
>> > > >
>> > > > *From the devs who support return null (current behavior):*
>> > > >
>> > > > The main concern is compatibility and usability.
>> > > >
>> > > > On usability: The upstream system may occasionally produce invalid
>> data
>> > > and
>> > > > if we throw exceptions when seeing this it will fail the job again
>> and
>> > > > again even after restart (because the invalid data is always
>> > > > there). Streaming computing is a resident program and users do not
>> want
>> > > it
>> > > > to frequently fail and cannot automatically recover. Most users are
>> > > willing
>> > > > to just skip that record and continue processing. Imagine an online
>> job
>> > > > running for a couple of weeks and suddenly fails due to some
>> unexpected
>> > > > dirty data. What choices do users have to quickly resume the job?
>> > > >
>> > > > On compatibility: There are currently thousands of users and tens of
>> > > > thousands of jobs relying on the current behavior to filter out
>> invalid
>> > > > input. If we change the behavior it will be a disaster for users as
>> > they
>> > > > have to rewrite and check their SQL very carefully.
>> > > >
>> > > >
>> > > > What do you think? We're looking forward to your feedback.
>> > > >
>> > >
>> >
>>
>

Reply via email to