Re: [slurm-users] Regarding multiple slurm server on one machine

2021-07-27 Thread Valerio Bellizzomi
If you use qemu-kvm beware: qemu-kvm doesn't allow communication of virtual machines with the host, therefore your slurm servers must be all virtual machines. On Wed, 2021-07-28 at 13:55 +1000, Sid Young wrote: > Why not spin them up as Virtual machines... then you could build real > (separate) cl

Re: [slurm-users] Regarding multiple slurm server on one machine

2021-07-27 Thread Sid Young
Why not spin them up as Virtual machines... then you could build real (separate) clusters. Sid Young W: https://off-grid-engineering.com W: (personal) https://sidyoung.com/ W: (personal) https://z900collector.wordpress.com/ On Wed, Jul 28, 2021 at 12:07 AM Brian Andrus wrote: > You can run mul

Re: [slurm-users] Weird one - deleting a user

2021-07-27 Thread Bill Wichser
The cluster doesn't exist though. This was what I tried first. [root@della5 bill]# sacctmgr show RunawayJobs cluster=tukey sacctmgr: error: Slurmctld running on cluster tukey is not up, can't check running jobs Bill On 7/27/21 4:59 PM, Carlos Fenoy wrote: Hi, You can cleanup those jobs w

Re: [slurm-users] Weird one - deleting a user

2021-07-27 Thread Douglas Jacobsen
Try running `sacctmgr show runawayjobs` (or similar; see the manual to be sure). My bet is that the user has a job that is apparently still running according to the database, and this will at least tell you about it. Doug Jacobsen, Ph.D. NERSC Senior Computing Engineer Group Lead, Computational Systems Group Na

Re: [slurm-users] Weird one - deleting a user

2021-07-27 Thread Carlos Fenoy
Hi, You can clean up those jobs with sacctmgr. https://slurm.schedmd.com/sacctmgr.html sacctmgr show RunawayJobs This will list the runaway jobs and, if there are any, will ask if you want to fix them. Regards, Carlos On Tue, 27 Jul 2021 at 22:49, Bill Wichser wrote: > [root@della5 bill]# sacctmgr -i de

[slurm-users] Weird one - deleting a user

2021-07-27 Thread Bill Wichser
[root@della5 bill]# sacctmgr -i delete user mable Error with request: Job(s) active, cancel job(s) before remove JobID = 602995 C = tukey A = politics U = mable Yup, when a user has an active job they cannot be deleted from the database. The thing is, this cluster tukey has been o

Re: [slurm-users] Regarding multiple slurm server on one machine

2021-07-27 Thread Brian Andrus
You can run multiple slurmctld on one machine, but they have to be on different ports. What you are asking to be able to do, however, would not work the way you may think. They do not talk to each other. You can run squeue and point to a different config file (one for each master) and get the in
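A minimal sketch of the "point squeue at a different config file" idea from the message above, assuming each controller has its own copy of slurm.conf on the machine doing the querying. The config paths and cluster labels are hypothetical; the only Slurm-specific piece relied on is the SLURM_CONF environment variable, which the Slurm client commands read to locate their configuration file.

```python
import os
import subprocess

# Hypothetical mapping of cluster label -> slurm.conf copy for that controller.
CLUSTER_CONFIGS = {
    "master": "/etc/slurm/master/slurm.conf",
    "master1": "/etc/slurm/master1/slurm.conf",
    "master2": "/etc/slurm/master2/slurm.conf",
}


def squeue_env(conf_path, base_env=None):
    """Return a copy of the environment with SLURM_CONF set to conf_path."""
    env = dict(os.environ if base_env is None else base_env)
    env["SLURM_CONF"] = conf_path
    return env


def query_all(configs=CLUSTER_CONFIGS, runner=subprocess.run):
    """Run squeue once per config file and collect its text output per label."""
    results = {}
    for label, conf_path in configs.items():
        proc = runner(
            ["squeue"],
            env=squeue_env(conf_path),
            capture_output=True,
            text=True,
            check=False,  # a controller that is down should not abort the loop
        )
        results[label] = proc.stdout
    return results
```

Calling `query_all()` on a host with the Slurm client tools installed would return one block of squeue output per controller. The controllers still know nothing about each other; this only gathers their separate answers in one place.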

Re: [slurm-users] [EXT] slurmctld.log over 500 MB

2021-07-27 Thread Sean Crosby
Hi Felix, From one of the recent Slurm user group meetings, the recommended way to logrotate the Slurm logs is to send SIGUSR2. My logrotate entry is /var/log/slurm/slurmctld.log { compress missingok nocopytruncate nocreate delaycompress nomail notifempty noolddir rotate 5
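The "send SIGUSR2" step can be sketched as below: a small helper that reads the daemon's PID from a pid file and signals it, which is the kind of thing a logrotate post-rotation hook would run. This is only an illustration, not the logrotate entry from the message (which is cut off above); the pid file location is an assumption and should match whatever SlurmctldPidFile is set to in slurm.conf.

```python
import os
import signal


def read_pid(pid_file):
    """Read the daemon PID from a pid file (first whitespace-separated token)."""
    with open(pid_file) as handle:
        return int(handle.read().split()[0])


def request_log_reopen(pid_file):
    """Send SIGUSR2 to the process named in pid_file; return the PID signalled."""
    pid = read_pid(pid_file)
    os.kill(pid, signal.SIGUSR2)
    return pid
```

For example, `request_log_reopen("/var/run/slurmctld.pid")` (path assumed) would ask slurmctld to reopen its log file after the old one has been moved aside, so no copy-and-truncate of a live log is needed.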

[slurm-users] slurmctld.log over 500 MB

2021-07-27 Thread Felix
Hello, my slurmctld.log is 600 MB. I am looking for a working way to set up log rotation for slurmctld.log. There is currently none for Slurm on my system, which runs Slurm 20.02. Is there any possibility? Thank you, Felix -- Dr. Eng. Farcas Felix National Institute of Research

[slurm-users] Regarding multiple slurm server on one machine

2021-07-27 Thread pravin
Hello all, I have a question regarding Slurm. Is it possible to show multiple Slurm servers on one machine? I have three different machines (master, master1, master2) running with their own slurmctld and compute nodes, but I want to show all the Slurm information on master3. For example n