> How did you find "90% of S3 paths of objects are missing" ?

I download the _metadata file of the latest checkpoint and extract all paths to 
objects in the shared directory with a regular expression. For each extracted 
path I then try to fetch the object's metadata (creation date and size) from 
the bucket. About 90% of the paths point to objects that do not exist.
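
For illustration, the extraction step described above could look roughly like 
the following Java sketch. It is not the exact script that was used: the class 
name MetadataPathScan, the regular expression, and the assumption that paths 
appear as plain ASCII strings starting with s3://, s3a:// or s3n:// are all 
hypothetical. One limitation of a raw scan like this is that a printable byte 
from a neighbouring binary field can be glued onto a match, so the extracted 
strings are only candidates.

```java
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashSet;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class MetadataPathScan {
    // Assumption: paths are embedded as ASCII text with an s3/s3a/s3n scheme.
    // The character class accepts printable, non-space ASCII only.
    private static final Pattern S3_PATH =
            Pattern.compile("s3[an]?://[\\x21-\\x7E]+");

    // Returns the distinct candidate paths in order of first appearance.
    static Set<String> extractPaths(byte[] metadataBytes) {
        // ISO-8859-1 maps every byte to exactly one char, so the binary
        // parts of the file cannot cause decoding errors.
        String text = new String(metadataBytes, StandardCharsets.ISO_8859_1);
        Set<String> paths = new LinkedHashSet<>();
        Matcher matcher = S3_PATH.matcher(text);
        while (matcher.find()) {
            paths.add(matcher.group());
        }
        return paths;
    }
}
```

The bytes would come from something like Files.readAllBytes on the downloaded 
_metadata file. Each returned string would then be checked against the bucket.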

________________________________
From: Hangxiang Yu <master...@gmail.com>
Sent: December 22, 2022, 14:21:46
To: Evgeniy Lyutikov; user@flink.apache.org
Subject: Re: Parse checkpoint _metadata file

Hi,
> Is there some way to deserialize the checkpoint _metadata file?
You could use a method such as SavepointLoader#loadSavepointMetadata from the 
State Processor API to load it.

> If I try to process the file with regular expressions, then approximately 90% 
> of S3 paths of objects are actually missing in the bucket and I would like to 
> understand how it works and how it is restored if there are links to missing 
> files.
I may be missing something.
How did you find "90% of S3 paths of objects are missing" ?
If you have stopped the job, you can find all files that belong to the 
checkpoint using the method above.
If you then list all files in the checkpoint directory and compare that listing 
with the result, you also get the list of the remaining files.
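
As a rough sketch of that comparison, assuming both the paths referenced by the 
metadata and the bucket listing have already been collected into sets of 
strings (the class and method names here are hypothetical):

```java
import java.util.Set;
import java.util.TreeSet;

final class CheckpointFileDiff {
    // Paths the metadata refers to that do not appear in the bucket listing.
    static Set<String> missingInBucket(Set<String> referenced, Set<String> listed) {
        Set<String> result = new TreeSet<>(referenced);
        result.removeAll(listed);
        return result;
    }

    // Files in the bucket listing that the metadata does not refer to,
    // i.e. the "remaining" files.
    static Set<String> notReferenced(Set<String> referenced, Set<String> listed) {
        Set<String> result = new TreeSet<>(listed);
        result.removeAll(referenced);
        return result;
    }
}
```

Both sets need to use the same path form (scheme, bucket, key) for the 
comparison to be meaningful.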



On Wed, Dec 21, 2022 at 9:56 PM Evgeniy Lyutikov 
<eblyuti...@avito.ru<mailto:eblyuti...@avito.ru>> wrote:

Hello All
Is there some way to deserialize the checkpoint _metadata file?

I want to understand what the checkpoint saves and how the occupied space is 
distributed.

If I try to process the file with regular expressions, then approximately 90% 
of S3 paths of objects are actually missing in the bucket and I would like to 
understand how it works and how it is restored if there are links to missing 
files.

We use Flink 1.14.4

Thanks




--
Best,
Hangxiang.
