Hey folks, is it possible to add some sort of jitter to the checkpointing logic for massively parallel jobs, to mitigate the burst load on durable storage when a checkpoint is triggered?
Thanks, Matyas