[ https://issues.apache.org/jira/browse/FLINK-37069?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17927631#comment-17927631 ]
Weijie Guo commented on FLINK-37069:
------------------------------------

Hi [~Zakelly], I have tested this according to the instructions.

1. Check out and compile Flink at commit hash: dd4bd434
2. Start a standalone Flink cluster
3. Set `execution.checkpointing.externalized-checkpoint-retention: RETAIN_ON_CANCELLATION` in the Flink conf
4. Run the Flink example
{code:java}
./bin/flink run ./examples/streaming/StateMachineExample.jar \
  --backend forst \
  --checkpoint-dir file:///cp \
  --incremental-checkpoints true
{code}
5. Confirm that a checkpoint is triggered and completed, then cancel the job
6. Restart from the latest checkpoint
{code:java}
./bin/flink run -s file:///cp/ac252d10cfd0e70bc1142557f08132f4/chk-8 ./examples/streaming/StateMachineExample.jar \
  --backend forst \
  --checkpoint-dir file:///cp \
  --incremental-checkpoints true
{code}

But the job failed with the following exception:
{code:java}
Caused by: java.lang.IllegalArgumentException: Unsupported sharing files strategy for org.apache.flink.state.forst.snapshot.ForStIncrementalSnapshotStrategy : FORWARD
	at org.apache.flink.state.forst.snapshot.ForStIncrementalSnapshotStrategy.asyncSnapshot(ForStIncrementalSnapshotStrategy.java:146) ~[flink-dist-2.0-SNAPSHOT.jar:2.0-SNAPSHOT]
	at org.apache.flink.state.forst.snapshot.ForStIncrementalSnapshotStrategy.asyncSnapshot(ForStIncrementalSnapshotStrategy.java:70) ~[flink-dist-2.0-SNAPSHOT.jar:2.0-SNAPSHOT]
	at org.apache.flink.runtime.state.SnapshotStrategyRunner.snapshot(SnapshotStrategyRunner.java:80) ~[flink-dist-2.0-SNAPSHOT.jar:2.0-SNAPSHOT]
	at org.apache.flink.state.forst.ForStKeyedStateBackend.snapshot(ForStKeyedStateBackend.java:484) ~[flink-dist-2.0-SNAPSHOT.jar:2.0-SNAPSHOT]
	at org.apache.flink.streaming.api.operators.StreamOperatorStateHandler.snapshotState(StreamOperatorStateHandler.java:281) ~[flink-dist-2.0-SNAPSHOT.jar:2.0-SNAPSHOT]
{code}

> Cross-team verification for "Disaggregated State Management"
> ------------------------------------------------------------
>
>                 Key: FLINK-37069
>                 URL: https://issues.apache.org/jira/browse/FLINK-37069
>             Project: Flink
>          Issue Type: Sub-task
>            Reporter: Xintong Song
>            Assignee: Weijie Guo
>            Priority: Blocker
>             Fix For: 2.0.0
>
>
> Instructions:
> First of all, please read the related documents briefly (still under review,
> will replace with formal links once merged):
> * Disaggregated State Management:
> [https://github.com/apache/flink/pull/26107/files#diff-bfa19e04bb5c3487c3e9bf514d61c0fa8bb973950fb0ad0e3d4a6898a99b83e3]
> * State V2:
> [https://github.com/apache/flink/pull/26107/files#diff-5d1147987fecbda329132403c1d92384575be220092995c4be491e12b8c50cc9]
> * ForSt State Backend:
> [https://github.com/apache/flink/pull/26107/files#diff-b7c52c06f6ed4d5af6f230d11ba23ea051bf4a08c589d98392143f080c468a87]
> The SQL part is verified in FLINK-37068; here we mainly focus on
> DataStream jobs and APIs.
> 1. Make sure you are verifying this on the release-2.0 branch, since we have
> fixed several bugs since the rc0 package.
> 2. Choose one example in `flink-examples-streaming`. Most of the jobs have
> been rewritten using the new API. Here we take `StateMachineExample` as an
> example.
> 3. Compile and run `StateMachineExample` in a proper environment (I suggest a
> standalone session cluster or YARN), and make sure you pass the following
> command line params:
> {code:bash}
> ./flink run xxxxxxxxx \
>   --backend forst \
>   --checkpoint-dir s3://your/cp/dir \
>   --incremental-checkpoints true
> {code}
> Or set them via `config.yaml`:
> {code:yaml}
> state.backend.type: forst
> execution.checkpointing.incremental: true
> execution.checkpointing.dir: s3://your-bucket/flink-checkpoints
> {code}
> 4. Check that the job is running smoothly and that the periodic checkpoints
> are taken successfully.
> 5. Stop the job and restart from the latest checkpoint.
> It would be great if you could write your own job using the State V2 API and
> follow Steps 3~5 above. It is important to check whether there are any bugs
> in the new State APIs.
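
For the "restart from the latest checkpoint" step, a minimal sketch of how the newest completed checkpoint could be located on a local filesystem. The helper name `latest_checkpoint` is hypothetical and not part of Flink. It assumes the `<checkpoint-dir>/<job-id>/chk-<n>` layout seen in the restore path above, and that a completed checkpoint directory contains a `_metadata` file. It does not cover remote stores such as S3.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not a Flink command): print the chk-<n> directory with
# the highest <n> that contains a _metadata file, i.e. a completed checkpoint.
# Argument: the per-job checkpoint directory, e.g. /cp/<job-id>.
latest_checkpoint() {
  local job_dir="$1" best="" best_n=-1 d n
  for d in "$job_dir"/chk-*; do
    # Skip in-progress or incomplete checkpoints (and the unexpanded glob).
    [ -f "$d/_metadata" ] || continue
    n="${d##*/chk-}"
    # Ignore anything whose suffix is not a plain number.
    case "$n" in
      ''|*[!0-9]*) continue ;;
    esac
    # Compare numerically so that chk-10 sorts after chk-8.
    if [ "$n" -gt "$best_n" ]; then
      best_n="$n"
      best="$d"
    fi
  done
  # Return non-zero when no completed checkpoint was found.
  [ -n "$best" ] && printf '%s\n' "$best"
}
```

With the paths from the comment above, the restore path could then be passed as `-s "file://$(latest_checkpoint /cp/ac252d10cfd0e70bc1142557f08132f4)"` to `./bin/flink run`.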