Hello Cristóbal,
I think you might have a slight misunderstanding of how Slurm
works, which may explain this difference in expectations.
MaxMemPerNode is there to allow the scheduler to plan job
placement according to resources. It does not enforce memory use at runtime.
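As a rough sketch (example values; the cgroup lines describe a typical
enforcement setup, not necessarily yours), the division of labour looks like:

# slurm.conf: MaxMemPerNode only informs job placement decisions
MaxMemPerNode=64000
# Enforcement is usually delegated to the cgroup plugins instead
TaskPlugin=task/cgroup

# cgroup.conf: actually constrain a job's memory at runtime
ConstrainRAMSpace=yes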
Hi,
I am trying to enable gang scheduling on a server with 32 CPU cores
and 4 GPUs.
However, with gang scheduling the CPU jobs (or GPU jobs) are not being
preempted after the time slice, which is set to 30 seconds.
Below is a snapshot of squeue. There are 3 jobs, each needing 32 cores. The
first
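For reference, a minimal gang-scheduling configuration is sketched below
(the partition line and exact values are illustrative):

# slurm.conf sketch for gang scheduling (example values)
PreemptMode=SUSPEND,GANG
SchedulerTimeSlice=30     # time slice in seconds
# Gang scheduling only time-slices jobs on oversubscribed partitions
PartitionName=batch Nodes=ALL Default=YES OverSubscribe=FORCE:2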
Hello,
I have a small 2 compute node GPU cluster, where each node has 2 GPUs.
$ sinfo -o "%20N %10c %10m %25f %30G "
NODELIST             CPUS       MEMORY     AVAIL_FEATURES            GRES
o186i[126-127]       128        64000      (null)
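For completeness, the nodes are declared roughly as in the sketch below; the
gres.conf device paths are an assumption on my part:

# slurm.conf
NodeName=o186i[126-127] CPUs=128 RealMemory=64000 Gres=gpu:2

# gres.conf (device paths assumed)
NodeName=o186i[126-127] Name=gpu File=/dev/nvidia[0-1]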
Hello,
another workaround could be to use the InitScript=/path/to/script.sh option of
the plugin.
For example, if the user's home directory is under autofs:
script.sh:
#!/bin/bash
# Resolve the job owner's username (squeue pads the output, hence the tr)
uid=$(squeue -h -O username -j "$SLURM_JOB_ID" | tr -d '[:space:]')
# cd into the home directory so autofs mounts it inside the job's namespace
cd "/home/$uid"
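The script is then wired up in job_container.conf, along these lines (BasePath
is an example):

# job_container.conf
AutoBasePath=true
BasePath=/var/tmp/slurm
InitScript=/path/to/script.sh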
Best regards
Gizo
> Hi there,
> we excitedly found the job_container
Hi Magnus,
We had the same challenge some time ago. A long description of solutions
is in my Wiki page at
https://wiki.fysik.dtu.dk/Niflheim_system/Slurm_configuration/#temporary-job-directories
The issue may have been solved in
https://bugs.schedmd.com/show_bug.cgi?id=12567, which will be included in an
upcoming release.
We had the same issue when we switched to the job_container plugin. We ended up
running cvmfs_config probe as part of the health check tool so that the
cvmfs repos stay mounted. However, after we switched on power saving we ran
into some race conditions (jobs landed on a node before cvmfs was
mounted).
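The probe itself is essentially of this shape (a sketch; draining the node is
just one possible reaction, adapt it to your health check tool):

#!/bin/bash
# Probe cvmfs; if the repos don't respond, drain the node so jobs avoid it
if ! cvmfs_config probe >/dev/null 2>&1; then
    scontrol update NodeName="$(hostname -s)" State=DRAIN Reason="cvmfs probe failed"
    exit 1
fi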
In my opinion, the problem is with autofs, not with tmpfs. Autofs
simply doesn't work well when you are using detached filesystem namespaces
and bind mounting. We ran into this problem years ago (with an in-house
SPANK plugin doing more or less what tmpfs does), and ended up simply
not using autofs.
I g