Hi,

As Gordon said, the metadata will contain the ByteStreamStateHandle. When a ByteStreamStateHandle is written out, its handle name is written too, and that name is a path (as you saw). A ByteStreamStateHandle is created when the state size is smaller than `state.backend.fs.memory-threshold` (default is 1024 bytes).
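To make the threshold rule concrete, here is a minimal sketch in plain Java. It is not Flink code: the class `InlineThresholdSketch` and its methods are hypothetical names made up for illustration, and the strict "smaller than" comparison simply follows the description above (the exact boundary check in Flink may differ). It only shows how state chunks below the threshold add their bytes to the metadata, while larger chunks would be referenced by path.

```java
import java.util.Arrays;
import java.util.List;

public class InlineThresholdSketch {
    // Default of state.backend.fs.memory-threshold mentioned in this thread, in bytes.
    public static final int DEFAULT_THRESHOLD_BYTES = 1024;

    // Hypothetical helper: true if a state chunk of this size would be kept
    // inline in the metadata instead of being written to its own file.
    // Assumption: strict "smaller than", as described in the email.
    public static boolean storedInline(long stateSizeBytes, int thresholdBytes) {
        return stateSizeBytes < thresholdBytes;
    }

    // Sums the bytes of all chunks that would end up inline in the metadata.
    public static long inlineBytes(List<Long> stateSizes, int thresholdBytes) {
        long total = 0;
        for (long size : stateSizes) {
            if (storedInline(size, thresholdBytes)) {
                total += size;
            }
        }
        return total;
    }

    public static void main(String[] args) {
        // Two small chunks (inline) and two large ones (separate files).
        List<Long> sizes = Arrays.asList(200L, 900L, 4096L, 1_000_000L);
        System.out.println(inlineBytes(sizes, DEFAULT_THRESHOLD_BYTES)); // prints 1100
    }
}
```

In this toy model only the 200-byte and 900-byte chunks count toward the metadata size; the other two would appear in the metadata only as file paths.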
If you want to verify this, you can refer to the unit test `CheckpointMetadataLoadingTest#testLoadAndValidateSavepoint` and load the metadata. You will find many `ByteStreamStateHandle` instances, and their names are the strings you saw in the metadata.

Best,
Congxian

Jacob Sevart <jsev...@uber.com> wrote on Fri, Mar 6, 2020 at 3:57 AM:

> Thanks, I will monitor that thread.
>
> I'm having a hard time following the serialization code, but if you know
> anything about the layout, tell me if this makes sense. What I see in the
> hex editor is, first, many HDFS paths. Then gigabytes of unreadable data.
> Then finally another HDFS path at the end.
>
> If it is putting state in there, under normal circumstances, does it make
> sense that it would be interleaved with metadata? I would expect all the
> metadata to come first, and then state.
>
> Jacob
>
> On Thu, Mar 5, 2020 at 10:53 AM Kostas Kloudas <kklou...@gmail.com> wrote:
>
>> Hi Jacob,
>>
>> As I said previously I am not 100% sure what can be causing this
>> behavior, but this is a related thread here:
>>
>> https://urldefense.proofpoint.com/v2/url?u=https-3A__lists.apache.org_thread.html_r3bfa2a3368a9c7850cba778e4decfe4f6dba9607f32addb69814f43d-2540-253Cuser.flink.apache.org-253E&d=DwIBaQ&c=r2dcLCtU9q6n0vrtnDw9vg&r=lTq5mEceM-U-tVfWzKBngg&m=awEv6FqKY6dZ8NIA4KEFc_qQ6aadR_jTAWnO17wtAus&s=P3Xd0IFKJTDIG2MMeP-hOSfY4ohoCEUMQEJhvGecSlI&e=
>>
>> There you can re-post your problem and monitor for answers.
>>
>> Cheers,
>> Kostas
>>
>> On Wed, Mar 4, 2020 at 7:02 PM Jacob Sevart <jsev...@uber.com> wrote:
>> >
>> > Kostas and Gordon,
>> >
>> > Thanks for the suggestions! I'm on RocksDB. We don't have that setting
>> > configured so it should be at the default 1024b. This is the full "state.*"
>> > section showing in the JobManager UI.
>> >
>> > Jacob
>> >
>> > On Wed, Mar 4, 2020 at 2:45 AM Tzu-Li (Gordon) Tai <tzuli...@apache.org> wrote:
>> >>
>> >> Hi Jacob,
>> >>
>> >> Apart from what Klou already mentioned, one slightly possible reason:
>> >>
>> >> If you are using the FsStateBackend, it is also possible that your
>> >> state is small enough to be considered to be stored inline within the
>> >> metadata file.
>> >> That is governed by the "state.backend.fs.memory-threshold"
>> >> configuration, with a default value of 1024 bytes, or can also be
>> >> configured with the `fileStateSizeThreshold` argument when constructing the
>> >> `FsStateBackend`.
>> >> The purpose of that threshold is to ensure that the backend does not
>> >> create a large amount of very small files, where potentially the file
>> >> pointers are actually larger than the state itself.
>> >>
>> >> Cheers,
>> >> Gordon
>> >>
>> >> On Wed, Mar 4, 2020 at 6:17 PM Kostas Kloudas <kklou...@gmail.com> wrote:
>> >>>
>> >>> Hi Jacob,
>> >>>
>> >>> Could you specify which StateBackend you are using?
>> >>>
>> >>> The reason I am asking is that, from the documentation in [1]:
>> >>>
>> >>> "Note that if you use the MemoryStateBackend, metadata and savepoint
>> >>> state will be stored in the _metadata file. Since it is
>> >>> self-contained, you may move the file and restore from any location."
>> >>>
>> >>> I am also cc'ing Gordon who may know a bit more about state formats.
>> >>>
>> >>> I hope this helps,
>> >>> Kostas
>> >>>
>> >>> [1] https://urldefense.proofpoint.com/v2/url?u=https-3A__ci.apache.org_projects_flink_flink-2Ddocs-2Drelease-2D1.6_ops_state_savepoints.html&d=DwIBaQ&c=r2dcLCtU9q6n0vrtnDw9vg&r=lTq5mEceM-U-tVfWzKBngg&m=awEv6FqKY6dZ8NIA4KEFc_qQ6aadR_jTAWnO17wtAus&s=fw0c-Ct21HHJv4MzZRicIaltqHLQOrNvqchzNgCdwkA&e=
>> >>>
>> >>> On Wed, Mar 4, 2020 at 1:25 AM Jacob Sevart <jsev...@uber.com> wrote:
>> >>> >
>> >>> > Per the documentation:
>> >>> >
>> >>> > "The meta data file of a Savepoint contains (primarily) pointers to
>> >>> > all files on stable storage that are part of the Savepoint, in form of
>> >>> > absolute paths."
>> >>> >
>> >>> > I somehow have a _metadata file that's 1.9GB. Running strings on it
>> >>> > I find 962 strings, most of which look like HDFS paths, which leaves a lot
>> >>> > of that file-size unexplained. What else is in there, and how exactly could
>> >>> > this be happening?
>> >>> >
>> >>> > We're running 1.6.
>> >>> >
>> >>> > Jacob
>> >
>> > --
>> > Jacob Sevart
>> > Software Engineer, Safety
>
> --
> Jacob Sevart
> Software Engineer, Safety