> How did you find "90% of S3 paths of objects are missing"?

I download the _metadata file of the latest checkpoint, extract all paths to objects in the shared directory with a regular expression, and then, for each extracted path, try to fetch the object's information from the bucket (creation date and size). About 90% of the paths point to objects that do not exist.
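For reference, the procedure described above can be sketched roughly as follows. This is only an illustration under stated assumptions, not the exact script that was used: the pattern PATH_RE, the function names, and the `exists` callback (which stands in for a real bucket lookup) are all hypothetical, and the file is treated as opaque bytes rather than parsed.

```python
import re
from typing import Callable, Iterable, List, Tuple

# Assumption: paths show up in the raw file as ASCII text starting with
# s3:// or s3a://. The binary layout of _metadata is NOT parsed here.
PATH_RE = re.compile(rb"s3a?://[A-Za-z0-9._/\-]+")


def extract_paths(raw: bytes) -> List[str]:
    """Return the unique path-like strings in raw, in first-seen order."""
    seen = {}
    for match in PATH_RE.finditer(raw):
        seen.setdefault(match.group().decode("ascii"), None)
    return list(seen)


def split_by_existence(
    paths: Iterable[str], exists: Callable[[str], bool]
) -> Tuple[List[str], List[str]]:
    """Split paths into (present, missing) using a caller-supplied lookup."""
    present: List[str] = []
    missing: List[str] = []
    for path in paths:
        (present if exists(path) else missing).append(path)
    return present, missing


if __name__ == "__main__":
    # Synthetic bytes for illustration only, not a real _metadata file.
    sample = b"\x00\x1fs3://bucket/cp/shared/aaa\x00\x10s3://bucket/cp/shared/bbb\x00"
    fake_bucket = {"s3://bucket/cp/shared/aaa"}  # hypothetical bucket contents
    present, missing = split_by_existence(
        extract_paths(sample), fake_bucket.__contains__
    )
    print(len(missing), "of", len(present) + len(missing), "paths missing")
```

One caveat with this kind of approach: a regular expression over binary data also captures any neighbouring bytes that happen to fall inside the character class, so a match is not guaranteed to be an exact object key.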
________________________________
From: Hangxiang Yu <master...@gmail.com>
Sent: December 22, 2022, 14:21:46
To: Evgeniy Lyutikov; user@flink.apache.org
Subject: Re: Parse checkpoint _metadata file

Hi,

> Is there some way to deserialize the checkpoint _metadata file?

You could use methods such as SavepointLoader#loadSavepointMetadata in the State Processor API to load it.

> If i try to process the file with regular expressions, then approximately 90%
> of S3 paths of objects are actually missing in the bucket and I would like to
> understand how it works and how it is restored if there are links to missing
> files.

I may be missing something. How did you find "90% of S3 paths of objects are missing"?
If you have stopped the job, you can find all files related to the checkpoint using the method above. If you then list all files in the checkpoint directory and compare the listing with that result, you also get the list of remaining files (those the checkpoint does not reference).

On Wed, Dec 21, 2022 at 9:56 PM Evgeniy Lyutikov <eblyuti...@avito.ru> wrote:

Hello All

Is there some way to deserialize the checkpoint _metadata file?
I want to understand what the checkpoint saves and how the occupied space is distributed.
If I try to process the file with regular expressions, approximately 90% of the S3 paths of objects are actually missing in the bucket, and I would like to understand how this works and how state is restored if there are links to missing files.
We use Flink 1.14.4

Thanks

--
Best,
Hangxiang.
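
The comparison suggested in the reply (files referenced by the checkpoint metadata versus a listing of the checkpoint directory) could be sketched as below. This is a minimal illustration only: the function name and the result keys are hypothetical, and both inputs are assumed to be already available as plain path strings in the same form, however they were obtained.

```python
from typing import Dict, Iterable, List


def compare_references(
    referenced: Iterable[str], listed: Iterable[str]
) -> Dict[str, List[str]]:
    """Compare paths referenced by the metadata with paths found in the bucket.

    referenced: paths taken from the checkpoint metadata (assumed input)
    listed:     paths returned by listing the checkpoint directory (assumed input)
    """
    referenced_set = set(referenced)
    listed_set = set(listed)
    return {
        # referenced by the checkpoint but absent from the listing
        "missing": sorted(referenced_set - listed_set),
        # present in the listing but not referenced by this checkpoint
        "unreferenced": sorted(listed_set - referenced_set),
        # referenced and present
        "present": sorted(referenced_set & listed_set),
    }


if __name__ == "__main__":
    # Made-up paths for illustration only.
    result = compare_references(
        referenced=["s3://bucket/cp/shared/a", "s3://bucket/cp/shared/b"],
        listed=["s3://bucket/cp/shared/b", "s3://bucket/cp/shared/c"],
    )
    for key, paths in result.items():
        print(key, paths)
```

Both sides need to use the same path form (scheme, bucket, and key) for the set difference to be meaningful, so some normalization may be required first.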