Hi community,
*General setup*
We are running a Flink standalone job on k8s. We start our job manager and task manager with the job jar right away, using the following command:

> /docker-entrypoint.sh standalone-job --host $1 --fromSavepoint /opt/flink/shared/savepoints/${SAVEPOINT}/ --allowNonRestoredState $2

(meaning our task/job manager pods are combined with our job/service and do not run separately).

We are using a k8s preStop hook script (prestop.sh) that takes a savepoint before our job manager is stopped:

#!/bin/bash
/opt/flink/bin/flink stop --savepointPath /opt/flink/shared/savepoints/ 000000006e6b13320000000000000000

Thus, when we deploy our job or remove the job manager pod, a savepoint is created, and after start-up our Flink service recovers from that savepoint; in case our app is restarted, the job recovers from the checkpoint instead.

Remark - a few days ago we moved to Flink 1.17, but the problem also exists on Flink 1.16.

*Test scenario*
Run traffic incoming from Kafka into our service, which is supposed to write a file. Remove the task and job manager pods before the service is able to finish writing the file, and verify that after Flink recovery the service completes its work, i.e. the file is completely written.

*Problem encountered*
The test scenario works perfectly when recovering from a checkpoint. When trying to recover from a savepoint, the service loads from the savepoint without any error, but the state comes back with a null value:

state = transcodingState.value();
if (state == null) {
    log.info("unable to pull state, creating new");
    state = new TranscodingState();
    transcodingState.update(state);
}

The log line above is written, meaning the state is null.

We also tried changing the recovery command to use -s instead of --fromSavepoint, but the result was the same.

I would appreciate it if someone could assist or come up with any idea.

Best Regards
Ariel
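
P.S. For reference, a minimal sketch of how the state above is declared and read inside a keyed function. This is illustrative only - the key/element types, the descriptor name "transcodingState", and the TranscodingState fields are placeholders, not our exact code:

import org.apache.flink.api.common.state.ValueState;
import org.apache.flink.api.common.state.ValueStateDescriptor;
import org.apache.flink.configuration.Configuration;
import org.apache.flink.streaming.api.functions.KeyedProcessFunction;
import org.apache.flink.util.Collector;

public class TranscodingFunction extends KeyedProcessFunction<String, String, String> {

    // Minimal POJO standing in for the real state class.
    public static class TranscodingState {
        public long bytesWritten;
    }

    private transient ValueState<TranscodingState> transcodingState;

    @Override
    public void open(Configuration parameters) throws Exception {
        // The descriptor name and type are the same ones that were in place
        // when the savepoint was taken, and the operator keeps the same uid.
        transcodingState = getRuntimeContext().getState(
                new ValueStateDescriptor<>("transcodingState", TranscodingState.class));
    }

    @Override
    public void processElement(String value, Context ctx, Collector<String> out) throws Exception {
        TranscodingState state = transcodingState.value();
        if (state == null) {
            // This is the branch we hit after restoring from the savepoint.
            state = new TranscodingState();
            transcodingState.update(state);
        }
        out.collect(value);
    }
}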