Hi Chris,

On Sunday, September 23, 2018 09:34 AM, Chris Samuel wrote:
> On Saturday, 22 September 2018 4:19:09 PM AEST Raymond Wan wrote:
>> SLURM's ability to suspend jobs must be storing the state in a
>> location outside of this 512 GB. So, you're not helping this by
>> allocating more swap.
>
> I don't believe that's the case. My understanding is that in this mode it's
> just sending processes SIGSTOP and then launching the incoming job so you
> should really have enough swap for the previous job to get swapped out to in
> order to free up RAM for the incoming job.

Hmmmmmm, I'm way out of my comfort zone, but I am curious
about what happens. Unfortunately, I don't think I'm able
to read kernel code, but someone here
(https://stackoverflow.com/questions/31946854/how-does-sigstop-work-in-linux-kernel)
seems to suggest that SIGSTOP and SIGCONT move a process
between the runnable and waiting queues.
I'm not sure if I did the correct test, but I wrote a C
program that allocated a lot of memory:
-----
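#include <stdio.h>
#include <stdlib.h>

#define memsize 160000000  /* bytes to allocate (about 160 MB) */

int main (void) {
    /* volatile keeps the compiler from optimizing the writes away */
    volatile char *foo = malloc (memsize);
    if (foo == NULL) {
        perror ("malloc");
        return EXIT_FAILURE;
    }
    /* Touch every byte so the pages really are resident in RAM. */
    for (size_t i = 0; i < memsize; i++) {
        foo[i] = 0;
    }
    /* Spin forever so the process stays alive and busy until signalled. */
    for (;;) {
    }
}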
-----
Then, I ran it and sent a SIGSTOP to it. According to htop
(I don't know if it's correct), it still seems to be
occupying memory, just without using any CPU cycles.
Perhaps I've done something wrong? I did read elsewhere
that how SIGSTOP is treated can vary from system to
system... I happen to be on an Ubuntu system.
Ray