Thank you so much for your reply. I apologise I did not mention multiple savepoint files in my last question.
I understand that part. My question was not only about a single savepoint file. When we run a job, we obviously end up with many savepoint files (by running the manual savepoint command repeatedly).

I am asking: is it possible to extract the data from all of the savepoint files?

Thank you again

On Fri, Apr 30, 2021 at 12:42 PM Abdullah bin Omar <abdullahbinoma...@gmail.com> wrote:

> Thank you so much for your reply.
>
> I apologise I did not mention multiple savepoint files in my last
> question.
>
> I understand that part. My question was not only about a single savepoint
> file. When we run a job, we obviously end up with many savepoint files
> (by running the manual savepoint command repeatedly).
>
> I am asking: is it possible to extract the data from all of the savepoint files?
>
> Thank you again
>
> On Fri, Apr 30, 2021 at 12:01 PM David Anderson <dander...@apache.org>
> wrote:
>
>>> So, can't we extract all previous savepoint data by using
>>> ExistingSavepoint?
>>
>> You can extract all of the data from any specific savepoint. Or nearly
>> all data, anyway. There is at least one corner case that isn't covered --
>> ListCheckpointed state -- which has been deprecated and isn't supported by
>> the savepoint API.
>>
>> David
>>
>> On Fri, Apr 30, 2021 at 5:42 PM Abdullah bin Omar <
>> abdullahbinoma...@gmail.com> wrote:
>>
>>> Hi,
>>>
>>> So, can't we extract all previous savepoint data by using
>>> ExistingSavepoint?
>>>
>>> Thank you
>>>
>>> On Fri, Apr 30, 2021 at 10:25 AM David Anderson <dander...@apache.org>
>>> wrote:
>>>
>>>> Abdullah,
>>>>
>>>> The example you are studying -- the one using the state processor API
>>>> -- can be used with any retained checkpoint or savepoint created while
>>>> running the RidesAndFaresSolution job. But this is a very special use of
>>>> checkpoints and savepoints that shows how to extract data from them.
>>>>
>>>> Normally the state processor API is used with savepoints, and not with
>>>> checkpoints.
>>>> This example uses checkpoints so that the example can be
>>>> easily run from the IDE, without requiring a local flink installation.
>>>>
>>>> The normal use for checkpoints is for failure recovery, while
>>>> savepoints are typically used for redeployments and rescaling -- and in
>>>> these cases the state processor API is not involved. You would use "flink
>>>> run -s ..." on the command line to manually resume from a checkpoint or
>>>> savepoint, and in the case of a job failure, the restart will happen
>>>> automatically.
>>>>
>>>> The flink operations playground [1] is a great way to gain more
>>>> understanding of these aspects of flink.
>>>>
>>>> [1]
>>>> https://ci.apache.org/projects/flink/flink-docs-stable/try-flink/flink-operations-playground.html
>>>>
>>>> Best regards,
>>>> David
>>>>
>>>> On Fri, Apr 30, 2021 at 1:56 PM Abdullah bin Omar <
>>>> abdullahbinoma...@gmail.com> wrote:
>>>>
>>>>> Hi,
>>>>>
>>>>> Please tell me whether my understanding in the questions below is
>>>>> correct, and please answer the direct questions.
>>>>>
>>>>> *Question no 1 (about dependency):*
>>>>>
>>>>> *What is the dependency (in pom.xml) for org.apache.flink.training?*
>>>>>
>>>>> I am trying to import
>>>>> org.apache.flink.training.exercises.common.sources.TaxiFareGenerator;
>>>>> However, it cannot be resolved.
>>>>>
>>>>> [note that I am using the group id: <groupId>org.apache.flink</groupId>]
>>>>>
>>>>> *Question No 2 (which one is loaded as an existing savepoint):*
>>>>>
>>>>> According to my understanding after reading [1], the name
>>>>> "ExistingSavepoint" suggests that it will restore all previous
>>>>> savepoints.
>>>>> However, according to [2], the input file is only a checkpoint file.
>>>>>
>>>>> *(i)* *Does that mean that we can only load the last checkpoint file
>>>>> (in case of job failure) by using ExistingSavepoint, to restart the job
>>>>> where it failed?*
>>>>>
>>>>> *(ii)* *And there is no option to load all previous savepoints. Is
>>>>> this correct?*
>>>>>
>>>>> *Question No 3 (about loading an existing savepoint):*
>>>>>
>>>>> ExecutionEnvironment bEnv = ExecutionEnvironment.getExecutionEnvironment();
>>>>>
>>>>> ExistingSavepoint sp = Savepoint.load(bEnv, "hdfs://path/", new
>>>>> MemoryStateBackend());
>>>>>
>>>>> This is the code for loading an existing savepoint. However, I
>>>>> configured a file location in the flink conf for saving savepoints. Then,
>>>>> each time the job is running, I use this command in the terminal:
>>>>>
>>>>> ./bin/flink savepoint jobid
>>>>>
>>>>> and the savepoint file is saved in that location (the one set up in the
>>>>> flink conf).
>>>>>
>>>>> In this case, to load the savepoint, the file location will be the
>>>>> one set up in the flink conf, and FileSystemBackend will have to
>>>>> be used instead of MemoryStateBackend. *Is this correct?*
>>>>>
>>>>> [1]
>>>>> https://ci.apache.org/projects/flink/flink-docs-stable/dev/libs/state_processor_api.html
>>>>>
>>>>> [2]
>>>>> https://github.com/ververica/flink-training/blob/master/state-processor/src/main/java/com/ververica/flink/training/exercises/ReadRidesAndFaresSnapshot.java
>>>>>
>>>>> Thank you
>>>>>
>>>>> On Fri, Apr 23, 2021 at 10:10 AM David Anderson <dander...@apache.org>
>>>>> wrote:
>>>>>
>>>>>> Abdullah,
>>>>>>
>>>>>> ReadRidesAndFaresSnapshot [1] is an example that shows how to use the
>>>>>> State Processor API to display the contents of a snapshot taken while
>>>>>> running RidesAndFaresSolution [2].
>>>>>>
>>>>>> Hopefully that will help you get started.
>>>>>>
>>>>>> [1]
>>>>>> https://github.com/ververica/flink-training/blob/master/state-processor/src/main/java/com/ververica/flink/training/exercises/ReadRidesAndFaresSnapshot.java
>>>>>> [2]
>>>>>> https://github.com/ververica/flink-training/blob/master/rides-and-fares/src/solution/java/org/apache/flink/training/solutions/ridesandfares/RidesAndFaresSolution.java
>>>>>>
>>>>>> Best regards,
>>>>>> David
>>>>>>
>>>>>> On Fri, Apr 23, 2021 at 3:32 PM Abdullah bin Omar <
>>>>>> abdullahbinoma...@gmail.com> wrote:
>>>>>>
>>>>>>> Hi,
>>>>>>>
>>>>>>> Thank you for your reply.
>>>>>>>
>>>>>>> I want to read the previous snapshot (if needed) at the time of
>>>>>>> operation. In [1], there is a portion:
>>>>>>>
>>>>>>> DataSet<Integer> listState = savepoint.readListState(
>>>>>>>     "my-uid",
>>>>>>>     "list-state",
>>>>>>>     Types.INT);
>>>>>>>
>>>>>>> Here, will the function savepoint.readListState() work to read
>>>>>>> the previous snapshot? If so, is the filename of a savepoint
>>>>>>> file similar to my-uid?
>>>>>>>
>>>>>>> [1]
>>>>>>> https://ci.apache.org/projects/flink/flink-docs-stable/dev/libs/state_processor_api.html
>>>>>>>
>>>>>>> Thank you
>>>>>>>
>>>>>>> On Fri, Apr 23, 2021 at 1:11 AM Matthias Pohl <
>>>>>>> matth...@ververica.com> wrote:
>>>>>>>
>>>>>>>> What is it you're trying to achieve in general? The JavaDoc of
>>>>>>>> MetadataV2V3SerializerBase provides a description on the format of the
>>>>>>>> file. Theoretically, you could come up with custom code using the Flink
>>>>>>>> sources to parse the content of the file. But maybe, there's another
>>>>>>>> way to accomplish what you're trying to do.
>>>>>>>>
>>>>>>>> Matthias
>>>>>>>>
>>>>>>>> [1]
>>>>>>>> https://github.com/apache/flink/blob/adaaed426c2e637b8e5ffa3f0d051326038d30aa/flink-runtime/src/main/java/org/apache/flink/runtime/checkpoint/metadata/MetadataV2V3SerializerBase.java#L83
>>>>>>>>
>>>>>>>> On Thu, Apr 22, 2021 at 7:53 PM Abdullah bin Omar <
>>>>>>>> abdullahbinoma...@gmail.com> wrote:
>>>>>>>>
>>>>>>>>> Hi,
>>>>>>>>>
>>>>>>>>> I have a savepoint or checkpoint file from my task. However, the
>>>>>>>>> file is binary. I want to see what the file contains.
>>>>>>>>>
>>>>>>>>> How is it possible to see what information the file has (or how is
>>>>>>>>> it possible to make it human readable)?
>>>>>>>>>
>>>>>>>>> Thank you
>>>>>>>>>
>>>>>>>>> On Thu, Apr 22, 2021 at 10:19 AM Matthias Pohl <
>>>>>>>>> matth...@ververica.com> wrote:
>>>>>>>>>
>>>>>>>>>> Hi Abdullah,
>>>>>>>>>> the metadata file contains handles to the operator states of the
>>>>>>>>>> checkpoint [1]. You might want to have a look into the State
>>>>>>>>>> Processor API [2].
>>>>>>>>>>
>>>>>>>>>> Best,
>>>>>>>>>> Matthias
>>>>>>>>>>
>>>>>>>>>> [1]
>>>>>>>>>> https://github.com/apache/flink/blob/adaaed426c2e637b8e5ffa3f0d051326038d30aa/flink-runtime/src/main/java/org/apache/flink/runtime/checkpoint/metadata/MetadataV2V3SerializerBase.java#L83
>>>>>>>>>> [2]
>>>>>>>>>> https://ci.apache.org/projects/flink/flink-docs-stable/dev/libs/state_processor_api.html
>>>>>>>>>>
>>>>>>>>>> On Thu, Apr 22, 2021 at 4:57 PM Abdullah bin Omar <
>>>>>>>>>> abdullahbinoma...@gmail.com> wrote:
>>>>>>>>>>
>>>>>>>>>>> Hi,
>>>>>>>>>>>
>>>>>>>>>>> (1) What does the snapshot metadata file (binary) contain? Is it
>>>>>>>>>>> possible to read the snapshot metadata file by using Flink
>>>>>>>>>>> deserialization?
>>>>>>>>>>>
>>>>>>>>>>> (2) Is there any function that can be used to see the previous
>>>>>>>>>>> states at the time of operation?
>>>>>>>>>>>
>>>>>>>>>>> Thank you
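
Regarding the question at the top of this thread (extracting the data from all savepoint files): the replies above say that the State Processor API reads one specific savepoint at a time, so one possible approach is to enumerate the savepoint directories and load each of them in turn. What follows is only a minimal sketch of the enumeration step, in plain Java without any Flink dependency. It assumes that the savepoints live on a local file system and that each savepoint is a directory containing a _metadata file; the class name SavepointLister and the method listSavepointDirs are hypothetical names introduced here for illustration, not part of Flink.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical helper (not part of Flink): finds savepoint directories on a local disk.
class SavepointLister {

    // Returns every direct child directory of baseDir that contains a
    // "_metadata" file (assumed savepoint layout), sorted by path.
    static List<Path> listSavepointDirs(Path baseDir) throws IOException {
        try (Stream<Path> children = Files.list(baseDir)) {
            return children
                    .filter(Files::isDirectory)
                    .filter(dir -> Files.isRegularFile(dir.resolve("_metadata")))
                    .sorted()
                    .collect(Collectors.toList());
        }
    }

    public static void main(String[] args) throws IOException {
        // Base directory is taken from the first argument, or the current directory.
        Path baseDir = Paths.get(args.length > 0 ? args[0] : ".");
        for (Path dir : listSavepointDirs(baseDir)) {
            // Each printed URI identifies one savepoint to load separately.
            System.out.println(dir.toUri());
        }
    }
}
```

Each printed path could then be passed, one at a time, as the path argument of the Savepoint.load(...) call quoted earlier in the thread. This listing only works for a local directory; for savepoints stored on HDFS a different way of listing the directories would be needed.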