Re: [slurm-users] srun problem -- Can't find an address, check slurm.conf

2018-11-13 Thread Scott Hazelhurst
Dear all, I still haven’t found the cause of the problem I raised last week, where srun -w xx runs for some nodes but not for others. Thanks for the ideas. One intriguing result from pursuing this, which I thought I’d share in case it sparks some ideas: if I give the full path for s

Re: [slurm-users] srun problem -- Can't find an address, check slurm.conf

2018-11-13 Thread mercan
Hi, is this a typo, or are these really different paths: /opt/exp_soft/slurm/bin/srun vs. the output of which srun, /opt/exp_soft/bin/srun? Ahmet Mercan On 13.11.2018 at 11:24, Scott Hazelhurst wrote: Dear all I still haven’t found the cause to the problem I raised last week where srun -w

Re: [slurm-users] Slurm missing non primary group memberships

2018-11-13 Thread Joerg Sassmannshausen
Dear all, I am wondering if that is the same issue we are having here as well. When I add users to a secondary group some time *after* the initial user installation, the user cannot access the Slurm partition they are supposed to. We found two remedies here, more or less by chance: - rebooting bo

Re: [slurm-users] Slurm missing non primary group memberships

2018-11-13 Thread Antony Cleave
Are you sure this isn't working as designed? I remember there is something annoying about groups in the manual. Here it is. This is why I prefer accounts. *NOTE:* For performance reasons, Slurm maintains a list of user IDs allowed to use each partition and this is checked at job submission time.

Re: [slurm-users] srun problem -- Can't find an address, check slurm.conf

2018-11-13 Thread Scott Hazelhurst
Dear Mercan, thank you! Yes, different paths, so different behaviour. Amazing how you can spend so much time looking at something and not see it. On Sunday I did an upgrade from 17.11.10 to 17.11.12 to try to fix the problem, but had left old binaries in a directory I should not have, so kept
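
The root cause in this thread was two different srun binaries reachable on the same host. As a minimal sketch of how such duplicates could be spotted, the helper below lists every executable of a given name along a search path, in lookup order. The function name find_all_on_path and its optional second argument are hypothetical and not part of Slurm.

```shell
# Hypothetical helper (not part of Slurm): print every executable file
# named "$1" found along a colon-separated search path, in lookup order.
# The search path defaults to $PATH; an explicit one can be passed as "$2".
find_all_on_path() (
    name=$1
    search_path=${2:-$PATH}
    set -f      # no glob expansion while splitting the path
    IFS=:       # split on colons; confined to this subshell
    for dir in $search_path; do
        if [ -f "$dir/$name" ] && [ -x "$dir/$name" ]; then
            printf '%s\n' "$dir/$name"
        fi
    done
)
```

Calling `find_all_on_path srun` on an affected node and then running each listed binary with `--version` would show whether a stale installation is being picked up. More than one line of output is already a hint that leftover binaries are shadowing, or being shadowed by, the upgraded ones.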

[slurm-users] heterogeneous jobs using packjob

2018-11-13 Thread Jing Gong
Hi, I can submit heterogeneous jobs using packjob like #SBATCH -p high_mem #SBATCH -N 1 #SBATCH --exclusive #SBATCH packjob #SBATCH -p log_mem #SBATCH -N 2 #SBATCH --exclusive i.e. specify 1 fat node and two thin nodes for one job. If I use "squeue/scontrol" to check the job, it i

Re: [slurm-users] heterogeneous jobs using packjob

2018-11-13 Thread Jeffrey Frey
See the documentation at https://slurm.schedmd.com/heterogeneous_jobs.html#env_var There are *_PACK_* environment variables in the job env that describe the heterogeneous allocation. The batch step of the job (corresponding to your script) executes on the first node of the first part
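
One way to see what the heterogeneous allocation looks like from inside the batch script is to dump the variables matching the *_PACK_* pattern mentioned above. This is only a sketch: the function name list_pack_vars is hypothetical, and no specific variable names are assumed beyond that pattern.

```shell
# Hypothetical helper: print every environment variable whose name
# contains _PACK_, sorted by name. Prints nothing when none are set,
# for example when run outside a heterogeneous job.
list_pack_vars() {
    env | grep '^[A-Za-z0-9_]*_PACK_' | sort || true
}
```

Placing a call to `list_pack_vars` near the top of the batch script would record in the job output which nodes and resources each component of the job received.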

[slurm-users] Slurmctld 18.08.1 and 18.08.3 segfault

2018-11-13 Thread Bill Broadley
After being up since the second week in Oct or so, yesterday our Slurm controller started segfaulting. It was compiled/run on Ubuntu 16.04.1. Nov 12 14:31:48 nas-11-1 kernel: [2838306.311552] srvcn[9111]: segfault at 58 ip 004b51fa sp 7fbe270efb70 error 4 in slurmctld[40+eb000

Re: [slurm-users] Slurmctld 18.08.1 and 18.08.3 segfault

2018-11-13 Thread Kilian Cavalotti
Hi Bill, On Tue, Nov 13, 2018 at 5:35 PM Bill Broadley wrote: > (gdb) bt > #0 _step_dealloc_lps (step_ptr=0x555787af0f70) at step_mgr.c:2092 > #1 post_job_step (step_ptr=step_ptr@entry=0x555787af0f70) at step_mgr.c:4720 > #2 0x55578571d1d8 in _post_job_step (step_ptr=0x555787af0f70) at >