Hi Martijn,

Sorry I didn't see your response! Basically we had a bad event that was
blowing up our Python UDF, so we wanted to change the SQL to add a WHERE
clause that filters out the event to mitigate the issue. Our job happens to
be stateless, so we're okay this time, but if we had used state (like
joining two streams or something) we would have ended up losing that state
to fix this bug. Is the only solution to just use the DataStream API?
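For context, the mitigation was roughly along these lines (the table, column,
and UDF names here are made up for illustration, not our real schema):

```sql
-- Hypothetical schema: drop the poison event before it reaches the UDF
INSERT INTO enriched_events
SELECT event_id, parse_payload(payload)  -- parse_payload is the Python UDF
FROM raw_events
WHERE event_id <> 'known-bad-event-id';  -- the event that crashed the UDF
```

It's exactly this kind of small change to the SELECT that seems to produce an
incompatible execution plan.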

But my main concern is that when an error occurs, if we haven't savepointed
recently, we won't be able to change anything about the job without losing
state. Is the answer just... always savepoint frequently so you never hit
this issue? I'm just a little concerned that small oversights like this will
force us to dump state in our jobs later.
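For context, the only workflow we know of is triggering savepoints by hand
with the Flink CLI, something like the following (the job ID and target path
are placeholders):

```shell
# Trigger a savepoint for a running job, writing to the given directory
flink savepoint <jobId> s3://our-bucket/savepoints

# Or stop the job and take a savepoint in one step
flink stop --savepointPath s3://our-bucket/savepoints <jobId>
```

Neither helps once the job is already crash-looping, which is the situation
we were in.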

Thanks,

Tim

On Fri, Dec 16, 2022 at 3:13 AM Martijn Visser <martijnvis...@apache.org>
wrote:

> Hi Tim,
>
> If I understand correctly, you need to deploy a new SQL statement in order
> to fix your issue? If so, the problem is that a new SQL statement might
> lead to a different execution plan which can't be restored. See
> https://nightlies.apache.org/flink/flink-docs-master/docs/dev/table/concepts/overview/#state-management
> for more details on this topic.
>
> Best regards,
>
> Martijn
>
> On Fri, Dec 16, 2022 at 12:34 AM Timothy Bess <tdbga...@gmail.com> wrote:
>
>> Hi there,
>>
>> We have a pyflink/SQL job that has a bug that we fixed and are trying to
>> deploy. Here's the issue though. The job successfully restores from the
>> checkpoint, but has no recent savepoints. We can't seem to get it to accept
>> our new SQL unless we savepoint/restore, but we can't trigger a savepoint
>> since our bug is crashing the job.
>>
>> How do people generally get around issues like this without losing all
>> Flink state? It seems weird that I'd have to lose my Flink state
>> considering that I can successfully restore the checkpoint. I must be
>> missing something here.
>>
>> Thanks,
>>
>> Tim
>>
>