Hi

As Gordon said, the metadata will contain ByteStreamStateHandles. When a
ByteStreamStateHandle is written out, its handle name is written out too,
and that name is a path (as you saw). A ByteStreamStateHandle is created
when the state size is smaller than `state.backend.fs.memory-threshold`
(default is 1024 bytes).
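
To make the threshold behaviour concrete, here is a minimal sketch in plain
Java. It is not Flink's code: `Handle`, `snapshot`, and the example path are
hypothetical names used only for illustration, and the exact boundary
comparison (`<` versus `<=`) is an assumption based on the wording above. It
only shows why a metadata file can end up holding path-like names next to raw
state bytes.

```java
import java.nio.charset.StandardCharsets;

public class ThresholdSketch {
    // Default of state.backend.fs.memory-threshold in bytes, as stated in this thread.
    public static final int DEFAULT_THRESHOLD = 1024;

    /** Hypothetical stand-in for a state handle; this is NOT Flink's class. */
    public static final class Handle {
        public final String name;       // path-like handle name, always recorded
        public final byte[] inlineData; // null when the state lives in its own file

        Handle(String name, byte[] inlineData) {
            this.name = name;
            this.inlineData = inlineData;
        }

        public boolean isInline() {
            return inlineData != null;
        }

        /** Bytes this handle contributes to the metadata file in this simplified model. */
        public int metadataBytes() {
            int nameBytes = name.getBytes(StandardCharsets.UTF_8).length;
            return nameBytes + (isInline() ? inlineData.length : 0);
        }
    }

    /**
     * Small state is kept inline next to its name; larger state is only
     * referenced by path. The strict '<' is an assumption of this sketch.
     */
    public static Handle snapshot(String path, byte[] state, int threshold) {
        if (state.length < threshold) {
            return new Handle(path, state.clone());
        }
        return new Handle(path, null);
    }

    public static void main(String[] args) {
        String path = "hdfs:///hypothetical/chk-1/a"; // made-up path for the example
        Handle small = snapshot(path, new byte[100], DEFAULT_THRESHOLD);
        Handle large = snapshot(path, new byte[4096], DEFAULT_THRESHOLD);
        System.out.println("small inline: " + small.isInline()
                + ", metadata bytes: " + small.metadataBytes());
        System.out.println("large inline: " + large.isInline()
                + ", metadata bytes: " + large.metadataBytes());
    }
}
```

In this model every inline handle contributes both its name and its bytes to
the metadata, so names and binary data alternate instead of all names coming
first.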

If you want to verify this, you can refer to the unit test
`CheckpointMetadataLoadingTest#testLoadAndValidateSavepoint` and load the
metadata. You will find many `ByteStreamStateHandle` instances, and their
names are the strings you saw in the metadata.

Best,
Congxian


Jacob Sevart <jsev...@uber.com> wrote on Fri, Mar 6, 2020 at 3:57 AM:

> Thanks, I will monitor that thread.
>
> I'm having a hard time following the serialization code, but if you know
> anything about the layout, tell me if this makes sense. What I see in the
> hex editor is, first, many HDFS paths. Then gigabytes of unreadable data.
> Then finally another HDFS path at the end.
>
> If it is putting state in there, under normal circumstances, does it make
> sense that it would be interleaved with metadata? I would expect all the
> metadata to come first, and then state.
>
> Jacob
>
> On Thu, Mar 5, 2020 at 10:53 AM Kostas Kloudas <kklou...@gmail.com> wrote:
>
>> Hi Jacob,
>>
>> As I said previously I am not 100% sure what can be causing this
>> behavior, but this is a related thread here:
>>
>> https://urldefense.proofpoint.com/v2/url?u=https-3A__lists.apache.org_thread.html_r3bfa2a3368a9c7850cba778e4decfe4f6dba9607f32addb69814f43d-2540-253Cuser.flink.apache.org-253E&d=DwIBaQ&c=r2dcLCtU9q6n0vrtnDw9vg&r=lTq5mEceM-U-tVfWzKBngg&m=awEv6FqKY6dZ8NIA4KEFc_qQ6aadR_jTAWnO17wtAus&s=P3Xd0IFKJTDIG2MMeP-hOSfY4ohoCEUMQEJhvGecSlI&e=
>>
>> You can re-post your problem there and monitor it for answers.
>>
>> Cheers,
>> Kostas
>>
>> On Wed, Mar 4, 2020 at 7:02 PM Jacob Sevart <jsev...@uber.com> wrote:
>> >
>> > Kostas and Gordon,
>> >
>> > Thanks for the suggestions! I'm on RocksDB. We don't have that setting
>> > configured, so it should be at the default of 1024 bytes. This is the
>> > full "state.*" section shown in the JobManager UI.
>> >
>> >
>> >
>> > Jacob
>> >
>> > On Wed, Mar 4, 2020 at 2:45 AM Tzu-Li (Gordon) Tai <tzuli...@apache.org> wrote:
>> >>
>> >> Hi Jacob,
>> >>
>> >> Apart from what Klou already mentioned, one more possible, though less
>> >> likely, reason:
>> >>
>> >> If you are using the FsStateBackend, it is also possible that your
>> >> state is small enough to be stored inline within the metadata file.
>> >> That is governed by the "state.backend.fs.memory-threshold"
>> >> configuration, which has a default value of 1024 bytes and can also be
>> >> set with the `fileStateSizeThreshold` argument when constructing the
>> >> `FsStateBackend`.
>> >> The purpose of that threshold is to ensure that the backend does not
>> >> create a large number of very small files, where the file pointers
>> >> could be larger than the state itself.
>> >>
>> >> Cheers,
>> >> Gordon
>> >>
>> >>
>> >>
>> >> On Wed, Mar 4, 2020 at 6:17 PM Kostas Kloudas <kklou...@gmail.com> wrote:
>> >>>
>> >>> Hi Jacob,
>> >>>
>> >>> Could you specify which StateBackend you are using?
>> >>>
>> >>> The reason I am asking is that, from the documentation in [1]:
>> >>>
>> >>> "Note that if you use the MemoryStateBackend, metadata and savepoint
>> >>> state will be stored in the _metadata file. Since it is
>> >>> self-contained, you may move the file and restore from any location."
>> >>>
>> >>> I am also cc'ing Gordon who may know a bit more about state formats.
>> >>>
>> >>> I hope this helps,
>> >>> Kostas
>> >>>
>> >>> [1]
>> https://urldefense.proofpoint.com/v2/url?u=https-3A__ci.apache.org_projects_flink_flink-2Ddocs-2Drelease-2D1.6_ops_state_savepoints.html&d=DwIBaQ&c=r2dcLCtU9q6n0vrtnDw9vg&r=lTq5mEceM-U-tVfWzKBngg&m=awEv6FqKY6dZ8NIA4KEFc_qQ6aadR_jTAWnO17wtAus&s=fw0c-Ct21HHJv4MzZRicIaltqHLQOrNvqchzNgCdwkA&e=
>> >>>
>> >>> On Wed, Mar 4, 2020 at 1:25 AM Jacob Sevart <jsev...@uber.com> wrote:
>> >>> >
>> >>> > Per the documentation:
>> >>> >
>> >>> > "The meta data file of a Savepoint contains (primarily) pointers to
>> >>> > all files on stable storage that are part of the Savepoint, in form of
>> >>> > absolute paths."
>> >>> >
>> >>> > I somehow have a _metadata file that's 1.9GB. Running strings on it,
>> >>> > I find 962 strings, most of which look like HDFS paths, which leaves
>> >>> > most of the file size unexplained. What else is in there, and how
>> >>> > exactly could this be happening?
>> >>> >
>> >>> > We're running 1.6.
>> >>> >
>> >>> > Jacob
>> >
>> >
>> >
>> > --
>> > Jacob Sevart
>> > Software Engineer, Safety
>>
>
>
> --
> Jacob Sevart
> Software Engineer, Safety
>
