Re: How can we read checkpoint data for debugging state

2025-02-26 Thread Gabor Somogyi
The plan is to support everything, but the question is when. As a short-term option, I would suggest turning off changelog in the reader and trying to read the state as-is. G
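A minimal sketch of that suggestion, assuming a reader app on Flink 1.19+, a hypothetical checkpoint path, and the same RocksDB backend the original job used; whether the reader honors the standard state.backend.changelog.enabled option is an assumption here, not confirmed by the thread:

    import org.apache.flink.configuration.Configuration;
    import org.apache.flink.configuration.StateChangelogOptions;
    import org.apache.flink.contrib.streaming.state.EmbeddedRocksDBStateBackend;
    import org.apache.flink.state.api.SavepointReader;
    import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

    public class ChangelogOffReader {
        public static void main(String[] args) throws Exception {
            // Turn changelog off in the reader so the state is read as-is.
            Configuration conf = new Configuration();
            conf.set(StateChangelogOptions.ENABLE_STATE_CHANGE_LOG, false);

            StreamExecutionEnvironment env =
                    StreamExecutionEnvironment.getExecutionEnvironment(conf);

            // The chk-<num> directory of a retained checkpoint serves as the path.
            SavepointReader savepoint = SavepointReader.read(
                    env,
                    "s3://my-bucket/flink-checkpoints/<job-id>/chk-100",  // hypothetical
                    new EmbeddedRocksDBStateBackend());
            // ... continue with savepoint.readKeyedState(...) as sketched below.
        }
    }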

Re: How can we read checkpoint data for debugging state

2025-02-26 Thread Sachin Mittal
Hi, Looks like this class is very minimally implemented. What exactly does the State Processor API support? Looking at that class, it seems to only support prioritizedOperatorSubtaskState. Are there any plans to support changelogs in the future? In view of this, what are my options? I wanted to debug …

Re: How can we read checkpoint data for debugging state

2025-02-26 Thread Gabor Somogyi
The State Processor API does not currently support changelog. Please see here [1]. [1] https://github.com/apache/flink/blob/69559fb5d231d704633fed807773cd1853601862/flink-libraries/flink-state-processing-api/src/main/java/org/apache/flink/state/api/runtime/SavepointTaskStateManager.java#L127 G

Re: How can we read checkpoint data for debugging state

2025-02-26 Thread Sachin Mittal
Hello Folks, I am trying to read the Flink checkpointed state and I am getting this error:

    Caused by: java.io.IOException: Failed to restore state backend
        at org.apache.flink.state.api.input.StreamOperatorContextBuilder.build(StreamOperatorContextBuilder.java:140)
        at org.apache.flink.state.api.in…

Re: How can we read checkpoint data for debugging state

2025-02-21 Thread Gabor Somogyi
All in all, Flink generates and stores uid hashes in state, and those hashes are cryptic values. Savepoint metadata can be read the following way:

    var metadata = SavepointLoader.loadSavepointMetadata(path);
    for (var operatorState : metadata.getOperatorStates()) {
        operatorState.getOperatorID();
    }

The task…
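That snippet, assembled into a runnable sketch. Note that SavepointLoader and OperatorIDGenerator are internal classes of flink-state-processor-api, so signatures may shift between versions; the path and uid below are hypothetical:

    import org.apache.flink.runtime.jobgraph.OperatorID;
    import org.apache.flink.state.api.runtime.OperatorIDGenerator;
    import org.apache.flink.state.api.runtime.SavepointLoader;

    public class ListOperatorIds {
        public static void main(String[] args) throws Exception {
            var metadata = SavepointLoader.loadSavepointMetadata(
                    "s3://my-bucket/flink-checkpoints/<job-id>/chk-100");  // hypothetical
            // Print every operator id hash stored in the checkpoint/savepoint.
            for (var operatorState : metadata.getOperatorStates()) {
                System.out.println(operatorState.getOperatorID());
            }
            // An explicit uid can be hashed the same way Flink does, so candidate
            // uids can be matched against the ids printed above.
            OperatorID candidate = OperatorIDGenerator.fromUid("my-uid");  // hypothetical uid
            System.out.println("my-uid -> " + candidate);
        }
    }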

Re: How can we read checkpoint data for debugging state

2025-02-21 Thread Sachin Mittal
Hi, I think now I realize the importance of setting uid. Perhaps the Flink docs should emphasize always setting a uid on your operators, just like we set a name. Unfortunately, for my current pipeline, which is in production, there is no uid set. Is there a way I can find these uids? Thanks

Re: How can we read checkpoint data for debugging state

2025-02-21 Thread Gabor Somogyi
The UID must match in the Flink app `events.uid("my-uid")` and in the reader app `forUid("my-uid")`. In general it's good practice to set a uid in the Flink app for each and every operator; otherwise Flink generates an almost random number for it. When you don't know the generated uid, no worr…
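Both sides of that matching, as a sketch. It assumes a SavepointReader named savepoint as in the sketch above, and MyKeyedStateReaderFunction is a hypothetical reader function like the one sketched further down the thread:

    import org.apache.flink.state.api.OperatorIdentifier;
    import org.apache.flink.streaming.api.datastream.DataStream;

    // Job side: pin the uid on the operator whose state you want to read.
    // events.process(new MyProcessFunction()).uid("my-uid");

    // Reader side: reference the same uid ...
    DataStream<String> byUid = savepoint.readKeyedState(
            OperatorIdentifier.forUid("my-uid"),
            new MyKeyedStateReaderFunction());

    // ... or, when no uid was ever set, use the generated hash taken from the
    // savepoint metadata (see the SavepointLoader sketch above):
    DataStream<String> byHash = savepoint.readKeyedState(
            OperatorIdentifier.forUidHash("<hash-from-metadata>"),  // hypothetical
            new MyKeyedStateReaderFunction());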

Re: How can we read checkpoint data for debugging state

2025-02-21 Thread Gabor Somogyi
Hi Sachin, You're on the right track in general and this should work :) I know that a version upgrade can be complex, but it's worth doing because we've just added a ~95% performance boost to the State Processor API in 1.20 [1]. As you mentioned, the SQL connector is on 2.x, which is not yet feasible…

Re: How can we read checkpoint data for debugging state

2025-02-21 Thread Nikola Milutinovic
.description("This is my first Proc"); Nix,. From: Sachin Mittal Date: Friday, February 21, 2025 at 8:40 AM To: Xuyang Cc: user Subject: Re: How can we read checkpoint data for debugging state Hi, I am working on Flink 1.19.1, so I guess I cannot use the SQL connector as that&#

Re: How can we read checkpoint data for debugging state

2025-02-20 Thread Sachin Mittal
Hi, I am working on Flink 1.19.1, so I guess I cannot use the SQL connector as that's to be released in 2.1.0. If I directly try to use the flink-state-processor-api, my very first question is: how do I know the uid for each keyed state? Is it the step name? As in: events.keyBy(new MyKeySelector(…
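As the replies above note, the uid is whatever was passed to .uid(...), not the step name. For reading the keyed state itself, a hypothetical reader function, assuming the job registered a ValueState<Long> under the descriptor name "my-state" (both the name and types here are assumptions; they must match the original job exactly):

    import org.apache.flink.api.common.state.ValueState;
    import org.apache.flink.api.common.state.ValueStateDescriptor;
    import org.apache.flink.configuration.Configuration;
    import org.apache.flink.state.api.functions.KeyedStateReaderFunction;
    import org.apache.flink.util.Collector;

    public class MyKeyedStateReaderFunction
            extends KeyedStateReaderFunction<String, String> {

        private ValueState<Long> state;

        @Override
        public void open(Configuration parameters) {
            // Must use the same descriptor name and type as the original job.
            state = getRuntimeContext().getState(
                    new ValueStateDescriptor<>("my-state", Long.class));
        }

        @Override
        public void readKey(String key, Context ctx, Collector<String> out)
                throws Exception {
            // Emit one record per key with the stored value.
            out.collect(key + " -> " + state.value());
        }
    }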

How can we read checkpoint data for debugging state

2025-02-20 Thread Sachin Mittal
Hi, So I have a Flink application which stores state (RocksDB state backend) in S3 with the following directory structure:

    s3://{bucket}/flink-checkpoints/{job-id}
        |
        + --shared/
        + --taskowned/
        + --chk-<num>/

I have my job pipeline defined like:

    final DataStream e…
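For reference, the sketches above combine roughly like this (imports as in the earlier sketches; all paths, uids, and class names are hypothetical, and the state backend must match the job's RocksDB backend):

    StreamExecutionEnvironment env =
            StreamExecutionEnvironment.getExecutionEnvironment();

    // Point the reader at the retained checkpoint's chk-<num> directory.
    SavepointReader savepoint = SavepointReader.read(
            env,
            "s3://my-bucket/flink-checkpoints/<job-id>/chk-100",
            new EmbeddedRocksDBStateBackend());

    // Dump the keyed state of one operator for inspection.
    savepoint
            .readKeyedState(
                    OperatorIdentifier.forUid("my-uid"),
                    new MyKeyedStateReaderFunction())
            .print();

    env.execute("read-checkpoint-state");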