On 25/07/16 12:55, Lachlan Musicman wrote:

>  - how does slurm put jobs into suspended mode given that some may have
> large amounts of data in memory?

I suspect it depends on how you've configured Slurm for checkpointing.

CheckpointType
    The system-initiated checkpoint method to be used for user jobs.

With BLCR (Berkeley Lab Checkpoint/Restart) the job should be able to
resume where it left off, even on another node. With a restart-style
method, however, the job may simply have started again from scratch,
or from its own internal checkpoint if the application has one.
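For example, selecting BLCR would look something like the slurm.conf
line below. This is only a sketch. It assumes your Slurm build includes
the checkpoint/blcr plugin and that BLCR is installed on the compute
nodes. A checkpoint of a running job can then be requested, and later
restarted, with scontrol (<jobid> is a placeholder):

    CheckpointType=checkpoint/blcr

    scontrol checkpoint create <jobid>
    scontrol checkpoint restart <jobid>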

Best of luck!
Chris
-- 
 Christopher Samuel        Senior Systems Administrator
 VLSCI - Victorian Life Sciences Computation Initiative
 Phone: +61 (0)3 903 55545
 http://www.vlsci.org.au/      http://twitter.com/vlsci
