Re: [slurm-users] Segfault with 32 processes, OK with 30 ???

2020-10-07 Thread Chris Samuel
On Tuesday, 6 October 2020 12:12:41 AM PDT Diego Zuccato wrote:
> At least I couldn't replicate launching manually (it always says "no
> slots available" unless I use mpirun -np 16 ...). I'm no MPI expert
> (actually less than a noob!) so I can't rule out it's unrelated to
> Slurm.

I mostly hope t…
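For context on the "no slots available" message: Open MPI's mpirun by default refuses to start more ranks than the slots it detects on the node. A minimal sketch for checking this outside Slurm (the mpirun line is shown as a comment since MPI may not be installed; --oversubscribe is Open MPI's option for lifting the slot limit):

```shell
# How many "slots" would an MPI launcher see on this node?  Open MPI
# derives the default slot count from the detected cores, and
# "no slots available" is its refusal to start more ranks than that.
slots=$(nproc)
echo "slots on this node: $slots"

# To test the launcher by itself, Open MPI accepts --oversubscribe to
# allow more ranks than slots, e.g.:
#   mpirun --oversubscribe -np 32 hostname
```

If 32 ranks segfault under mpirun outside Slurm as well, that would point away from Slurm and toward the application or MPI stack.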

Re: [slurm-users] unable to run on all the logical cores

2020-10-07 Thread Diego Zuccato
On 08/10/20 08:19, David Bellot wrote:
> good spot. At least, scontrol show job is now saying that each job only
> requires one "CPU", so it seems all the cores are treated the same way now.
> Though I still have the problem of not using more than half the cores.
> So I suppose it might be du…

Re: [slurm-users] Segfault with 32 processes, OK with 30 ???

2020-10-07 Thread Diego Zuccato
On 06/10/20 13:45, Riebs, Andy wrote:
Well, the cluster is quite heterogeneous, and node bl0-02 only has 24 threads available:

str957-bl0-02:~$ lscpu
Architecture:        x86_64
CPU op-mode(s):      32-bit, 64-bit
Byte Order:          Little Endian
Address sizes:       46 bits physical, 48 bi…
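The thread count Diego quotes is the product of sockets, cores per socket, and threads per core. A sketch for computing it from lscpu on any node (assumes a Linux node with an English-locale lscpu; cluster-wide, `sinfo -N -o '%N %c'` lists CPUs per node):

```shell
# Hardware threads per node = Socket(s) x Core(s) per socket x
# Thread(s) per core, all as reported by lscpu.  On bl0-02 the product
# is 24, so a 32-rank job cannot fit on that node alone.
sockets=$(LC_ALL=C lscpu | awk -F: '/^Socket\(s\)/          {gsub(/ /,""); print $2}')
cores=$(LC_ALL=C lscpu   | awk -F: '/^Core\(s\) per socket/ {gsub(/ /,""); print $2}')
threads=$(LC_ALL=C lscpu | awk -F: '/^Thread\(s\) per core/ {gsub(/ /,""); print $2}')
echo "$((sockets * cores * threads)) hardware threads on this node"
```

On a heterogeneous cluster this is worth running on every node, since a count that works on the large nodes will oversubscribe the small ones.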

Re: [slurm-users] unable to run on all the logical cores

2020-10-07 Thread David Bellot
Hi Rodrigo,

good spot. At least, scontrol show job is now saying that each job only requires one "CPU", so it seems all the cores are treated the same way now. Though I still have the problem of not using more than half the cores. So I suppose it might be due to the way I submit (batchtools in thi…

Re: [slurm-users] unable to run on all the logical cores

2020-10-07 Thread Rodrigo Santibáñez
Hi David,

I had the same problem some time ago when configuring my first server. Could you try SelectTypeParameters=CR_CPU instead of SelectTypeParameters=CR_Core?

Best regards,
Rodrigo.

On Thu, Oct 8, 2020, 02:16 David Bellot wrote:
> Hi,
>
> my Slurm cluster has a dozen machines configured as fo…
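Rodrigo's suggestion amounts to a one-line change in slurm.conf. A sketch of the before/after (with CR_Core the consumable unit is a whole core, so on a 2-threads-per-core node only half the logical CPUs are schedulable units; CR_CPU makes each hardware thread schedulable on its own):

```
# slurm.conf -- before: each allocation consumes a whole core,
# i.e. both hyperthreads at once
#SelectTypeParameters=CR_Core

# after: each logical CPU (hardware thread) is an allocatable unit
SelectTypeParameters=CR_CPU
```

Changing the select plugin parameters typically requires restarting slurmctld and the slurmd daemons, not just an `scontrol reconfigure`.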

[slurm-users] unable to run on all the logical cores

2020-10-07 Thread David Bellot
Hi,

my Slurm cluster has a dozen machines configured as follows:

NodeName=foobar01 CPUs=80 Boards=1 SocketsPerBoard=2 CoresPerSocket=20 ThreadsPerCore=2 RealMemory=257243 State=UNKNOWN

and scheduling is:

# SCHEDULING
SchedulerType=sched/backfill
SelectType=select/cons_tres
SelectTypeParameters=…
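As a quick consistency check on that NodeName line (a sketch; the arithmetic just restates the fields David posted):

```shell
# A NodeName line is only self-consistent if
#   CPUs = SocketsPerBoard * CoresPerSocket * ThreadsPerCore
# Here: 2 sockets x 20 cores x 2 threads = 80, matching CPUs=80.
cpus=$((2 * 20 * 2))
echo "expected CPUs=$cpus"

# On the node itself, `slurmd -C` prints the hardware line slurmd
# actually detects, which can be pasted into slurm.conf.
```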

Re: [slurm-users] Simple free for all cluster

2020-10-07 Thread Marcus Wagner
Hi Jason,

we intend to have a maximum wallclock time of 5 days. We chose this to be able to do timely maintenance without disturbing or killing users' jobs. Yet we see that some users and/or codes need a longer runtime. That is why we set the maxtime for the partitions to…
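In slurm.conf terms, the policy Marcus describes might look roughly like this (a sketch only; the partition names, node list, and the access-controlled long-running partition are made-up illustrations, and the time format is days-hours:minutes:seconds):

```
# 5-day wallclock ceiling, so a scheduled maintenance can drain the
# nodes within a predictable window
PartitionName=batch Nodes=ALL Default=YES MaxTime=5-00:00:00 State=UP

# hypothetical: a separate, access-restricted partition for the few
# users/codes that genuinely need a longer runtime
PartitionName=long Nodes=ALL AllowGroups=longrunners MaxTime=14-00:00:00 State=UP
```

Splitting long-running work into its own partition keeps the maintenance guarantee for the bulk of jobs while still offering an escape hatch.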

Re: [slurm-users] Simple free for all cluster

2020-10-07 Thread Diego Zuccato
On 06/10/20 16:53, Jason Simms wrote:
> FWIW, I define the DefaultTime as 5 minutes, which effectively means for
> any "real" job that users must actually define a time. It helps users
> get into that habit, because in the absence of a DefaultTime, most will
> not even bother to think critical…
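Jason's convention is a single extra field on the partition definition. A minimal sketch (partition name and node list are illustrative; time format is hours:minutes:seconds):

```
# Jobs submitted without --time inherit a 5-minute limit and die
# quickly, nudging users to always specify a realistic wallclock time.
PartitionName=batch Nodes=ALL Default=YES DefaultTime=00:05:00 State=UP
```

With this in place, `sbatch --time=02:00:00 job.sh` runs for up to two hours, while a submission that omits --time is killed after five minutes.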