[ https://issues.apache.org/jira/browse/FLINK-10963?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Kostas Kloudas closed FLINK-10963.
----------------------------------
    Resolution: Fixed

Merged to master with bf5dc6c8bac73e7f8d54e983be2080cf6ce48a94
and to release-1.7 with 5de507835b2b9a93376820b79a435b8efe53b8a6

> Cleanup small objects uploaded to S3 as independent objects
> -----------------------------------------------------------
>
>                 Key: FLINK-10963
>                 URL: https://issues.apache.org/jira/browse/FLINK-10963
>             Project: Flink
>          Issue Type: Sub-task
>          Components: filesystem-connector
>    Affects Versions: 1.7.0
>            Reporter: Kostas Kloudas
>            Assignee: Kostas Kloudas
>            Priority: Major
>              Labels: pull-request-available
>             Fix For: 1.7.1
>
>
> The S3 {{RecoverableWriter}} uses the Multipart Upload (MPU) feature of S3 to
> upload the different part files. This means that a large part file is split
> into chunks of at least 5MB, each of which is uploaded independently as soon
> as it is ready.
> This 5MB minimum size requires special handling of data smaller than 5MB that
> is still pending when a checkpoint barrier arrives. Such small pieces are
> uploaded as independent objects (not associated with an active MPU). This way,
> when Flink needs to restore, it simply downloads them and resumes writing to
> them.
> These small objects are currently not cleaned up, which wastes space on S3.

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
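
For readers unfamiliar with the mechanism described in the issue, here is a minimal in-memory sketch of it in Java. It is not Flink's code and does not call S3: the class name, method names, and key format are all hypothetical, the "bucket" is a plain map, and the cleanup policy shown (delete the small objects once the part file is committed) is only one plausible policy, not necessarily what the merged fix does.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical model of the behaviour described in FLINK-10963.
// It only tracks sizes; no real S3 or Flink API is used.
class SmallPartCleanupSketch {
    // S3 requires every MPU part except the last to be at least 5 MiB.
    static final long MIN_PART_SIZE = 5L * 1024 * 1024;

    // Stand-in for the bucket: key -> size of each independent small object.
    final Map<String, Long> independentObjects = new HashMap<>();
    // Sizes of the parts already uploaded to the active MPU.
    final List<Long> mpuParts = new ArrayList<>();

    private long buffered = 0;
    private int tempCounter = 0;

    // Buffer data; upload an MPU part as soon as the minimum size is reached.
    void write(long bytes) {
        buffered += bytes;
        if (buffered >= MIN_PART_SIZE) {
            mpuParts.add(buffered);
            buffered = 0;
        }
    }

    // On a checkpoint barrier, a tail below 5 MiB cannot become an MPU part,
    // so it is stored as an independent object and its key is returned.
    String onCheckpoint() {
        if (buffered == 0) {
            return null;
        }
        String key = "tmp-part-" + (tempCounter++); // hypothetical key format
        independentObjects.put(key, buffered);
        return key;
    }

    // On restore, download the small object and resume writing to it.
    void restore(String key) {
        buffered = (key == null) ? 0 : independentObjects.get(key);
    }

    // On commit, the tail goes in as the last MPU part (which may be small),
    // and the now-unneeded independent objects are deleted (assumed policy).
    void commit() {
        if (buffered > 0) {
            mpuParts.add(buffered);
            buffered = 0;
        }
        independentObjects.clear();
    }
}
```

Without the final `independentObjects.clear()` in `commit()`, every checkpoint taken with a pending tail would leave one more object behind, which is the wasted space the issue reports.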