Re: [slurm-users] [EXT] incorrect number of cpu's being reported in srun job

2021-06-17 Thread Sean Crosby
Hi Sid, On our cluster, it performs just like your PBS cluster. $ srun -N 1 --cpus-per-task 8 --time 01:00:00 --mem 2g --partition physicaltest -q hpcadmin --pty python3 srun: job 27060036 queued and waiting for resources srun: job 27060036 has been allocated resources Python 3.6.8 (default, Aug

[slurm-users] incorrect number of cpu's being reported in srun job

2021-06-17 Thread Sid Young
G'Day all, I've had a question from a user of our new HPC, the following should explain it: ➜ srun -N 1 --cpus-per-task 8 --time 01:00:00 --mem 2g --pty python3 Python 3.6.8 (default, Nov 16 2020, 16:55:22) [GCC 4.8.5 20150623 (Red Hat 4.8.5-44)] on linux Type "help", "copyright", "credits" or "l
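The snippet is cut off before the reported numbers, so the exact symptom is not shown here. A frequent cause of this kind of report is that Python's `os.cpu_count()` and `multiprocessing.cpu_count()` return every logical CPU on the node, not the CPUs Slurm allocated to the job. That cause is an assumption, not something the thread confirms. The minimal sketch below is Linux only (`os.sched_getaffinity` is not available elsewhere) and compares three sources from inside the `srun ... --pty python3` session. The function name `report_cpu_counts` is hypothetical:

```python
import os


def report_cpu_counts() -> dict:
    """Compare the CPU counts a Python process can see inside a Slurm job."""
    return {
        # Every logical CPU on the node, regardless of the job's allocation.
        "os.cpu_count": os.cpu_count(),
        # CPUs this process may run on (its affinity mask). This reflects the
        # allocation only if the cluster binds tasks to CPUs.
        "sched_getaffinity": len(os.sched_getaffinity(0)),
        # Set by Slurm when --cpus-per-task is given; None outside such a job.
        "SLURM_CPUS_PER_TASK": os.environ.get("SLURM_CPUS_PER_TASK"),
    }


if __name__ == "__main__":
    for source, value in report_cpu_counts().items():
        print(f"{source:>20}: {value}")
```

Suppose `sched_getaffinity` matches `--cpus-per-task 8` while `os.cpu_count` shows the whole node. In that case, worker pools should be sized from the affinity mask or from `SLURM_CPUS_PER_TASK`. If both show the whole node, the cluster is probably not binding tasks to CPUs (for example via task/affinity or task/cgroup), which is a site configuration question.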

Re: [slurm-users] [External] Different max number of jobs in individual and array jobs

2021-06-17 Thread Paul Brunk
Hi: If you (Shaohao) mean you want to limit all running jobs to the sum of up to N non-array-jobs and up to M array jobs, could you have N "local" licenses of LicenseName 'nonarray' (e.g.) and M "local" licenses of LicenseName 'array', and cause job_submit.lua to add a request for a license of the
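Paul's suggestion relies on Slurm "local" licenses. These are declared in slurm.conf with the `Licenses=` parameter and requested per job with `sbatch -L` / `--licenses`. The toy Python model below is not Slurm code, and `LicensePool` and `license_for` are hypothetical names. It shows why the scheme caps each class of job separately: every running job holds one license of its class, and a job whose class is exhausted stays pending. The model assumes each running array task holds its own license. The job_submit.lua step is reduced to the `license_for` stand-in.

```python
from dataclasses import dataclass, field


@dataclass
class LicensePool:
    """Toy model of Slurm local licenses: a fixed count per license name."""
    total: dict[str, int]
    in_use: dict[str, int] = field(default_factory=dict)

    def try_start(self, name: str) -> bool:
        """Start a job needing one license of `name` if one is free."""
        used = self.in_use.get(name, 0)
        if used >= self.total[name]:
            return False  # the job stays pending until a license is released
        self.in_use[name] = used + 1
        return True

    def finish(self, name: str) -> None:
        """Release the license held by a finished job."""
        self.in_use[name] -= 1


def license_for(is_array_task: bool) -> str:
    # Stand-in for the job_submit.lua step that tags each job with a license.
    return "array" if is_array_task else "nonarray"


if __name__ == "__main__":
    pool = LicensePool(total={"nonarray": 2, "array": 3})  # N=2, M=3 (example values)
    jobs = [False, False, False, True, True, True, True]   # True = array task
    print([pool.try_start(license_for(j)) for j in jobs])
    # [True, True, False, True, True, True, False]
```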

Re: [slurm-users] Reply: how to check what slurm is doing when job pending with reason=none?

2021-06-17 Thread Fulcomer, Samuel
You can specify a partition priority in the partition line in slurm.conf, e.g. Priority=65000 (I forget what the max is...) On Thu, Jun 17, 2021 at 10:31 PM wrote: > Thanks for the help. We tried to reduce the sched_interval and the pending > time decreased as expected. > > But the influence of

[slurm-users] Reply: how to check what slurm is doing when job pending with reason=none?

2021-06-17 Thread taleintervenor
Thanks for the help. We tried reducing sched_interval, and the pending time decreased as expected. But the influence of 'sched_interval' is global, and setting it too small may put pressure on the slurmctld server. Since we only want a quick response on the debug partition (which is designed to let user fre

Re: [slurm-users] [External] sbatch: error: memory allocation failure

2021-06-17 Thread Prentice Bisbal
Mike, You don't include your entire sbatch script, so it's really hard to say what's going wrong when we only have a single line to work with. Based on what you have told us, I'm guessing you are specifying a memory requirement per node greater than 128000. When you specify a nodelist, Slurm
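Prentice's guess is easy to check by hand. `--mem` is a per-node request whose default unit is megabytes, and the suffixes K, M, G and T are accepted. It therefore must not exceed the node's configured memory. The minimal sketch below assumes that the 128000 mentioned above is the node's `RealMemory` in MB. `mem_to_mb` and `fits_on_node` are hypothetical helpers:

```python
import re

# Multipliers to megabytes for the suffixes accepted on --mem (K, M, G, T).
_TO_MB = {"K": 1 / 1024, "M": 1, "G": 1024, "T": 1024 * 1024}


def mem_to_mb(spec: str) -> float:
    """Convert a --mem value such as '2g' or '128000' to megabytes."""
    match = re.fullmatch(r"(\d+)([KMGTkmgt]?)", spec.strip())
    if match is None:
        raise ValueError(f"unrecognised --mem value: {spec!r}")
    number, suffix = match.groups()
    # No suffix means megabytes.
    return int(number) * _TO_MB[suffix.upper() or "M"]


def fits_on_node(mem_spec: str, node_real_memory_mb: int = 128000) -> bool:
    """True if the per-node request does not exceed the node's memory (assumed 128000 MB)."""
    return mem_to_mb(mem_spec) <= node_real_memory_mb


if __name__ == "__main__":
    for spec in ("2g", "128000", "128g"):
        print(spec, mem_to_mb(spec), fits_on_node(spec))
```

For example, `--mem=128g` is 131072 MB and would not fit under a 128000 MB limit, even though it looks like "128".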

Re: [slurm-users] [External] Re: nodes going to down* and getting stuck in, that state

2021-06-17 Thread Prentice Bisbal
Did you ever get this resolved? If so, what was the issue? I see this error: Can't open PID file /var/run/slurmd.pid (yet?)...ory I know systemctl shows slurmd running, but I've had some issues with 'systemctl status' and always like to confirm a daemon is running with 'ps'. Prentice On 6
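Prentice describes trusting the process table over `systemctl status`. That check can be sketched in Python for Linux by reading `/proc` directly, which is roughly what `pgrep -x slurmd` or `ps` does. The PID file path comes from the error quoted above. `pids_named` and `pid_file_is_live` are hypothetical helpers:

```python
from pathlib import Path


def pids_named(name: str) -> list[int]:
    """Return PIDs whose /proc/<pid>/comm equals `name` (Linux only)."""
    pids = []
    for entry in Path("/proc").iterdir():
        if not entry.name.isdigit():
            continue  # skip non-process entries such as /proc/meminfo
        try:
            comm = (entry / "comm").read_text().strip()
        except OSError:
            continue  # process exited mid-scan, or no permission
        if comm == name:
            pids.append(int(entry.name))
    return pids


def pid_file_is_live(pid_file: str = "/var/run/slurmd.pid") -> bool:
    """True if the PID file exists and names a process that is still running."""
    try:
        pid = int(Path(pid_file).read_text().strip())
    except (OSError, ValueError):
        return False  # missing or unreadable PID file, as in the error above
    return Path(f"/proc/{pid}").exists()


if __name__ == "__main__":
    print("slurmd PIDs:", pids_named("slurmd"))
    print("PID file points at a live process:", pid_file_is_live())
```

A running slurmd combined with a missing or stale PID file would suggest a PID file path mismatch rather than a dead daemon. One example would be a mismatch between slurm.conf's `SlurmdPidFile` and the systemd unit. That reading is an assumption, not a diagnosis from the thread.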

Re: [slurm-users] [External] Different max number of jobs in individual and array jobs

2021-06-17 Thread Prentice Bisbal
... to complete my thought, I don't think what you want to do is possible. If M is the number of job steps, and N is total jobs, M cannot be greater than N. Prentice On 6/17/21 3:24 PM, Prentice Bisbal wrote: I know I'm a few weeks late with this response. I actually looked into this 4-6 w

Re: [slurm-users] [External] Different max number of jobs in individual and array jobs

2021-06-17 Thread Prentice Bisbal
I know I'm a few weeks late with this response. I actually looked into this 4-6 weeks ago. According to the Slurm documentation, an individual job step counts as a job when evaluating job limits. Pay attention to the note in the documentation below. From https://slurm.schedmd.com/slurm.conf.ht

Re: [slurm-users] [External] Re: DMTCP or MANA with Slurm?

2021-06-17 Thread Prentice Bisbal
Still no reply to any of my e-mails to the mailing list. I have looked through the archives, and while traffic there is very light, it's all questions from people asking for help who never get it. I'm not the only one who thinks this project is dead: https://sourceforge.net/p/dmtcp/mailman/dmt

[slurm-users] Is anyone running the slurmctld and slurmdbd services from within a container?

2021-06-17 Thread Lee Reynolds
(I apologize if this is a double post; there is conflicting information online about how to send messages to this list.) Our current cluster is running CentOS 7.9 and we are anticipating setting up a new cluster by the end of the year that will most likely be running one of the CentOS 8.x alterna

[slurm-users] Is anyone running the slurmctld and slurmdbd services from within a container?

2021-06-17 Thread Lee Reynolds
Our current cluster is running CentOS 7.9 and we are anticipating setting up a new cluster by the end of the year that will most likely be running one of the CentOS 8.x alternatives (Rocky/Alma/???) with the latest version of Slurm. Our team is investigating whether it would be appropriate to ru