Dear all
I still haven’t found the cause of the problem I raised last week where srun -w
xx runs for some nodes but not for others; thanks for the ideas.
While pursuing this I came across one intriguing result, which I thought I’d
share in case it sparks some ideas. If I give the full path for s
Hi,
Are those typos, or are they really different paths:
/opt/exp_soft/slurm/bin/srun
vs.
which srun
/opt/exp_soft/bin/srun
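A quick way to check which binary the shell actually picks up on a given node; a minimal sketch, assuming nothing beyond the two paths quoted above:
# list every srun on PATH, in the order the shell resolves them
type -a srun
# compare the two candidates directly
/opt/exp_soft/slurm/bin/srun --version
/opt/exp_soft/bin/srun --version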
Ahmet Mercan
On 13.11.2018 at 11:24, Scott Hazelhurst wrote:
> Dear all
> I still haven’t found the cause of the problem I raised last week where srun -w
Dear all,
I am wondering if this is the same issue we are having here as well.
When I add a user to a secondary group some time *after* the
initial user creation, the user cannot access the Slurm partition they are
supposed to. We found two remedies here, more or less by chance:
- rebooting bo
Are you sure this isn't working as designed?
I remember there is something annoying about groups in the manual. Here it
is. This is why I prefer accounts.
*NOTE:* For performance reasons, Slurm maintains a list of user IDs allowed
to use each partition and this is checked at job submission time.
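If it is that cached list, forcing slurmctld to rebuild it should be enough, without a reboot. A minimal sketch, assuming the partition restricts access via AllowGroups and the new membership is already visible on the controller node; the group and partition names here are only placeholders:
# check the controller can actually see the new secondary-group membership
getent group research_group
# confirm the partition restricts by group
scontrol show partition batch | grep -i allow
# rebuild slurmctld's cached list of allowed user IDs
scontrol reconfigure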
Dear Mercan
Thank you! Yes, different paths, so different behaviour. Amazing how you can
spend so much time looking at something and not see it.
On Sunday I did an upgrade from 17.11.10 to 17.11.12 to try to fix the problem,
but had left old binaries in a directory I should not have, so kept
Hi,
I can submit heterogeneous jobs using packjob, like:
#SBATCH -p high_mem
#SBATCH -N 1
#SBATCH --exclusive
#SBATCH packjob
#SBATCH -p low_mem
#SBATCH -N 2
#SBATCH --exclusive
i.e. specify one fat node and two thin nodes for one job.
If I use "squeue/scontrol" to check the job, it i
See the documentation at
https://slurm.schedmd.com/heterogeneous_jobs.html#env_var
There are *_PACK_* environment variables in the job env that describe the
heterogeneous allocation. The batch step of the job (corresponding to your
script) executes on the first node of the first part
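As a rough sketch of what that looks like from inside such a batch script (the variable and option names are the ones documented for the packjob support in 17.11/18.08; the group numbering below just mirrors the example script, so treat it as an assumption):
# show everything Slurm exported about the heterogeneous (pack) allocation
env | grep PACK
# launch a step on a specific pack group: group 0 is the first component
# (high_mem above), group 1 the second
srun --pack-group=0 hostname
srun --pack-group=1 hostname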
After being up since the second week in Oct or so, yesterday our slurm
controller started segfaulting. It was compiled and run on Ubuntu 16.04.1.
Nov 12 14:31:48 nas-11-1 kernel: [2838306.311552] srvcn[9111]: segfault at 58 ip
004b51fa sp 7fbe270efb70 error 4 in slurmctld[40+eb000
Hi Bill,
On Tue, Nov 13, 2018 at 5:35 PM Bill Broadley wrote:
> (gdb) bt
> #0 _step_dealloc_lps (step_ptr=0x555787af0f70) at step_mgr.c:2092
> #1 post_job_step (step_ptr=step_ptr@entry=0x555787af0f70) at step_mgr.c:4720
> #2 0x55578571d1d8 in _post_job_step (step_ptr=0x555787af0f70) at
>