Hello Cristóbal,
I think you might have a slight misunderstanding of how Slurm
works, which may explain this difference in expectations.
MaxMemPerNode is there to allow the scheduler to plan job
placement according to resources. It does not enforce memory use at runtime.
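As a rough sketch (example values; the cgroup lines describe a typical
enforcement setup, not necessarily yours), the division of labour looks like:

# slurm.conf: MaxMemPerNode only informs job placement decisions
MaxMemPerNode=64000
# Enforcement is usually delegated to the cgroup plugins instead
TaskPlugin=task/cgroup

# cgroup.conf: actually constrain a job's memory at runtime
ConstrainRAMSpace=yes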
Hi,
I am trying to enable gang scheduling on a server with 32 CPU cores
and 4 GPUs.
However, with gang scheduling the CPU jobs (or GPU jobs) are not being
preempted after the time slice, which is set to 30 seconds.
Below is a snapshot of squeue. There are 3 jobs, each needing 32 cores. The
first
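For reference, a minimal gang-scheduling configuration is sketched below
(the partition line and exact values are illustrative):

# slurm.conf sketch for gang scheduling (example values)
PreemptMode=SUSPEND,GANG
SchedulerTimeSlice=30     # time slice in seconds
# Gang scheduling only time-slices jobs on oversubscribed partitions
PartitionName=batch Nodes=ALL Default=YES OverSubscribe=FORCE:2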
Hello,
I have a small 2 compute node GPU cluster, where each node has 2 GPUs.
$ sinfo -o "%20N %10c %10m %25f %30G "
NODELIST             CPUS       MEMORY     AVAIL_FEATURES            GRES
o186i[126-127]       128        64000      (null)
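For completeness, the nodes are declared roughly as in the sketch below; the
gres.conf device paths are an assumption on my part:

# slurm.conf
NodeName=o186i[126-127] CPUs=128 RealMemory=64000 Gres=gpu:2

# gres.conf (device paths assumed)
NodeName=o186i[126-127] Name=gpu File=/dev/nvidia[0-1]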
Hello,
another workaround could be to use the InitScript=/path/to/script.sh option of
the plugin.
For example, if the user's home directory is under autofs:
script.sh:
#!/bin/bash
# Resolve the job owner's username (squeue pads the output, hence the tr)
uid=$(squeue -h -O username -j "$SLURM_JOB_ID" | tr -d '[:space:]')
# cd into the home directory so autofs mounts it inside the job's namespace
cd "/home/$uid"
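The script is then wired up in job_container.conf, along these lines (BasePath
is an example):

# job_container.conf
AutoBasePath=true
BasePath=/var/tmp/slurm
InitScript=/path/to/script.sh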
Best regards
Gizo
> Hi there,
> we excitedly found the job_container
Hi Magnus,
We had the same challenge some time ago. A long description of solutions
is in my Wiki page at
https://wiki.fysik.dtu.dk/Niflheim_system/Slurm_configuration/#temporary-job-directories
The issue may have been solved in
https://bugs.schedmd.com/show_bug.cgi?id=12567, which will be included in an
upcoming release.
We had the same issue when we switched to the job_container plugin. We ended up
running cvmfs_config probe as part of the health check tool so that the
cvmfs repos stay mounted. However, after we switched on power saving we ran
into some race conditions (jobs landed on a node before cvmfs was
mounted).
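The probe itself is essentially of this shape (a sketch; draining the node is
just one possible reaction, adapt it to your health check tool):

#!/bin/bash
# Probe cvmfs; if the repos don't respond, drain the node so jobs avoid it
if ! cvmfs_config probe >/dev/null 2>&1; then
    scontrol update NodeName="$(hostname -s)" State=DRAIN Reason="cvmfs probe failed"
    exit 1
fi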
In my opinion, the problem is with autofs, not with tmpfs. Autofs
simply doesn't work well when you are using detached filesystem namespaces
and bind mounting. We ran into this problem years ago (with an in-house
SPANK plugin doing more or less what tmpfs does), and ended up simply
not using autofs.
I g