Hi all.
I would like to ask whether what I am seeing is expected. We are running Flink
as a Kubernetes session cluster and have checkpoints enabled. When I inspect a
checkpoint, I can see only one file: "_metadata". As I understand it, that is
OK if the state in question is small enough to fit under some size limit. If
not, additional files would be created, and I would expect them to be
referenced in the _metadata file.
Now, when I peek into the _metadata file I can see a lot of segments like this
one:
aced000570
aced000570/XXX_MIDDLEWARE_TEST.dbo.TestTable:0M{"commit_lsn":"00009292:0000bf80:00d9","change_lsn":"00009292:0000bf80:00d9"}-XXX_MIDDLEWARE_TEST.dbo.TestTable?{"type":"CREATE
…
,"comment":null}SourceReaderState?xfile:/var/opt/flink-state/flink-checkpoints/3f6599027678003866f9413bbc0bb705/chk-23/4f068ed4-d186-439d-b056-80a22f0c4653nSourceReaderState
OPERATOR_STATE_DISTRIBUTION_MODESPLIT_DISTRIBUTEVALUE_SERIALIZERrorg.apache.flink.api.common.typeutils.base.array.BytePrimitiveArraySerializer$By
tePrimitiveArraySerializerSnapshot?x^Lstream-splitM{"commit_lsn":"00009292:0000bf80:00d9","change_lsn":"00009292:0000bf80:00d9"}{"change_lsn":"7f"}-XXX_MIDDLEWARE_TEST.dbo.TestTable/XXX_MIDDLEWARE_TEST.dbo.TestTable:0
So the metadata contains a reference to
xfile:/var/opt/flink-state/flink-checkpoints/3f6599027678003866f9413bbc0bb705/chk-23/4f068ed4-d186-439d-b056-80a22f0c4653
Well, there is no such file. Is this a correct checkpoint or is it damaged?
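To cross-check this across all retained checkpoints, a rough script along the following lines could be used. It is only a minimal sketch, not a parser for Flink's checkpoint metadata format: it assumes that each "file:" reference is stored as a string preceded by a two-byte big-endian length (which would make the leading "x" in the dump above a length byte rather than part of the path), and the function names are made up for illustration.

```python
import re
import struct
from pathlib import Path
from urllib.parse import urlparse


def find_file_refs(data: bytes) -> list[str]:
    """Return distinct 'file:' references found in raw metadata bytes.

    ASSUMPTION: each reference is preceded by a 2-byte big-endian length
    prefix. This is a heuristic byte scan, not Flink's own deserializer.
    """
    refs = []
    for match in re.finditer(rb"file:/", data):
        start = match.start()
        if start < 2:
            continue  # no room for a length prefix
        (length,) = struct.unpack(">H", data[start - 2:start])
        raw = data[start:start + length]
        if len(raw) < length:
            continue  # prefix points past the end of the data
        try:
            ref = raw.decode("utf-8")
        except UnicodeDecodeError:
            continue  # not a plain string, so the assumption does not hold here
        if ref.isprintable() and ref not in refs:
            refs.append(ref)
    return refs


def missing_refs(metadata_path: str) -> list[str]:
    """Return the referenced local paths that do not exist on this machine."""
    data = Path(metadata_path).read_bytes()
    local_paths = [urlparse(ref).path for ref in find_file_refs(data)]
    return [p for p in local_paths if not Path(p).exists()]
```

Calling missing_refs() on a checkpoint's _metadata file lists the referenced paths that are absent from the file system where the script runs, so it is only meaningful when run where the checkpoint directory is mounted.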
Just for context, this is a CDC job reading from an MS SQL database and sending
CDC records to Kafka. Our checkpoint parameters are:
execution.checkpointing.num-retained, 3
execution.checkpointing.storage, filesystem
execution.checkpointing.dir, file:///var/opt/flink-state/flink-checkpoints
execution.checkpointing.mode, EXACTLY_ONCE
execution.checkpointing.incremental, true
execution.checkpointing.interval, 300000
execution.checkpointing.timeout, 600000
execution.checkpointing.externalized-checkpoint-retention, RETAIN_ON_CANCELLATION
execution.checkpointing.min-pause, 120000
Nix.