Our job just crashed while running a savepoint: it ran out of disk space. I inspected the disk and found the following:

-rw------- 1 yarn yarn 10139680768 Dec 12 22:14 presto-s3-10125099138119182412.tmp
-rw------- 1 yarn yarn 10071916544 Dec 12 22:14 presto-s3-10363672991943897408.tmp
-rw------- 1 yarn yarn 10276716544 Dec 12 22:14 presto-s3-12109236276406796165.tmp
-rw------- 1 yarn yarn  9420505088 Dec 12 22:14 presto-s3-12584127250588531727.tmp
-rw------- 1 yarn yarn 10282295296 Dec 12 22:14 presto-s3-14352553379340277827.tmp
-rw------- 1 yarn yarn  9463644160 Dec 12 22:14 presto-s3-14552162277341829612.tmp
-rw------- 1 yarn yarn 10447626240 Dec 12 22:14 presto-s3-14660072789354472725.tmp
-rw------- 1 yarn yarn  9420906496 Dec 12 22:14 presto-s3-15982235495935827021.tmp
-rw------- 1 yarn yarn 10268663808 Dec 12 22:14 presto-s3-16188204950210407933.tmp
-rw------- 1 yarn yarn  9309986816 Dec 12 22:14 presto-s3-17905518564307248197.tmp
-rw------- 1 yarn yarn  9491578880 Dec 12 22:14 presto-s3-1839692230976299010.tmp
-rw------- 1 yarn yarn  9308168192 Dec 12 22:14 presto-s3-2488279210497334939.tmp
-rw------- 1 yarn yarn  9496961024 Dec 12 22:14 presto-s3-3559445453885492666.tmp
-rw------- 1 yarn yarn  9467682816 Dec 12 22:14 presto-s3-4932415031914708987.tmp
-rw------- 1 yarn yarn 10042425344 Dec 12 22:14 presto-s3-5619769647590893462.tmp
So it appears that everything is being written locally to one of our disks before being uploaded to S3. Is there a way to tell Flink or the OS to divide this work across the mounted disks so it doesn't all fall to a single disk? (A configuration sketch is appended after the quoted thread below.)

Thanks!

On Sat, Dec 12, 2020 at 10:12 AM Rex Fenley <r...@remind101.com> wrote:

> Also, a small correction from earlier: there are 4 volumes of 256 GiB, so that's 1 TiB total.
>
> On Sat, Dec 12, 2020 at 10:08 AM Rex Fenley <r...@remind101.com> wrote:
>
>> For our first big test run we wanted to eliminate as many variables as possible, so this is on 1 machine with 1 task manager and parallelism 1. The machine has 4 disks, though, and as you can see, they mostly all use around the same amount of space for storage until a savepoint is triggered.
>>
>> Could it be that, given a parallelism of 1, certain operators' states are pinned to specific drives, and as compaction runs everything is moved over to that drive into a single file? In which case, would greater parallelism distribute the work more evenly?
>>
>> Thanks!
>>
>> On Sat, Dec 12, 2020 at 2:35 AM David Anderson <dander...@apache.org> wrote:
>>
>>> RocksDB does do compaction in the background, and incremental checkpoints simply mirror to S3 the set of RocksDB SST files needed by the current set of checkpoints.
>>>
>>> However, unlike checkpoints, which can be incremental, savepoints are always full snapshots. As for why one host would have much more state than the others, perhaps you have significant key skew, and one task manager is ending up with more than its share of state to manage.
>>>
>>> Best,
>>> David
>>>
>>> On Sat, Dec 12, 2020 at 12:31 AM Rex Fenley <r...@remind101.com> wrote:
>>>
>>>> Hi,
>>>>
>>>> We're using the RocksDB state backend with incremental checkpoints, and savepoints are set up to go to S3. We notice that every time we trigger a savepoint, disk usage on one of the local disks on our host explodes. What is it that savepoints are doing that would use so much disk? Our checkpoints are a few GiB in size; is the savepoint combining all of the checkpoints together at once on disk? I figured that incremental checkpoints would compact over time in the background, is that correct?
>>>>
>>>> Thanks
>>>>
>>>> Graph here. Parallelism is 1 and volume size is 256 GiB.
>>>> [image: Screen Shot 2020-12-11 at 2.59.59 PM.png]

--

Rex Fenley | Software Engineer - Mobile and Backend

Remind.com <https://www.remind.com/> | BLOG <http://blog.remind.com/> | FOLLOW US <https://twitter.com/remindhq> | LIKE US <https://www.facebook.com/remindhq>
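
[Appended sketch] A minimal illustration of the kind of configuration the question above is asking about, using the RocksDBStateBackend API that was current at the time: if several local storage paths are configured, Flink spreads the RocksDB working directories across them instead of piling everything onto one volume. The class name, bucket name, and /mnt/... paths below are placeholders, not values from the thread.

// Sketch only: spread RocksDB's local working directories over several mounted
// volumes. Bucket name and /mnt/... paths are placeholders.
import org.apache.flink.contrib.streaming.state.RocksDBStateBackend;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

public class MultiDiskStateBackend {
    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

        // Checkpoints (incremental) still go to S3; only the local working
        // state lives on the directories configured below.
        RocksDBStateBackend backend =
                new RocksDBStateBackend("s3://my-bucket/checkpoints", true);

        // With several paths configured, RocksDB instances are distributed
        // across them rather than all landing on a single disk.
        backend.setDbStoragePaths(
                "/mnt/disk1/flink/rocksdb",
                "/mnt/disk2/flink/rocksdb",
                "/mnt/disk3/flink/rocksdb",
                "/mnt/disk4/flink/rocksdb");

        env.setStateBackend(backend);
        // ... build and execute the job as usual.
    }
}

Note, though, that the presto-s3-*.tmp files in the listing appear to be local staging files written by the S3 filesystem while it uploads the (full) savepoint, and those are governed by the filesystem's temp/staging directory rather than by the RocksDB paths. Flink's io.tmp.dirs option accepts a list of directories, but whether the Presto S3 staging directory follows it, or needs its own setting, should be checked against the Flink and Presto S3 documentation for the version in use.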