[slurm-users] Re: Executing srun -n X where X is greater than total CPU in entire cluster

2024-05-30 Thread Dan Healy via slurm-users
Following up on this in case anyone can provide some insight, please. On Thu, May 16, 2024 at 8:32 AM Dan Healy wrote: > Hi there, SLURM community, > > I swear I've done this before, but now it's failing on a new cluster I'm > deploying. We have 6 compute nodes with 64 CPUs each (384 CPUs total).

[slurm-users] Re: Executing srun -n X where X is greater than total CPU in entire cluster

2024-05-30 Thread Diego Zuccato via slurm-users
IIUC you can't do that. You either allow overcommit or you split your job into multiple smaller jobs that fit. The resources you're requesting must be available at the same time: if your job needs 2 CPUs and you want to run it in parallel, just use a job array. If you request 500 CPUs it mean
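
The job-array suggestion above could be sketched roughly as follows. This is only an illustration under stated assumptions: 500 independent tasks of 2 CPUs each, and `./my_task` is a hypothetical program name that does not appear in the thread. The script is written to a scratch file and printed, not submitted.

```shell
# Generate (but do not submit) a batch script for a job array.
# Assumptions: 500 independent tasks, 2 CPUs each; ./my_task is hypothetical.
script="${TMPDIR:-/tmp}/array_job.sh"
cat > "$script" <<'EOF'
#!/bin/bash
#SBATCH --array=0-499
#SBATCH --ntasks=1
#SBATCH --cpus-per-task=2
# Each array element is scheduled on its own, as soon as 2 CPUs are free,
# so the whole array never has to fit in the cluster at the same time.
srun ./my_task "$SLURM_ARRAY_TASK_ID"
EOF
cat "$script"
# Submit with: sbatch "$script"
```

Unlike a single `srun -n 500`, the array elements do not need all their CPUs to be available simultaneously; the scheduler starts as many as fit and queues the rest.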

[slurm-users] Re: slurmdbd not connecting to mysql (mariadb)

2024-05-30 Thread Radhouane Aniba via slurm-users
Thank you Ahmet, I don't have a firewall active. And because slurmdbd cannot connect to the database, I am not able to activate it through systemctl. I will share the output of slurmdbd -D -vvv shortly, but overall it always says it is trying to connect to the db and then retries a coupl

[slurm-users] Re: slurmdbd not connecting to mysql (mariadb)

2024-05-30 Thread mercan via slurm-users
Did you try connecting to the database using the mysql command? mysql --user=slurm --password=slurmdbpass slurm_acct_db C. Ahmet Mercan On 30.05.2024 14:48, Radhouane Aniba via slurm-users wrote: Thank you Ahmet, I don't have a firewall active. And because slurmdbd cannot connect to the database I am no

[slurm-users] Re: slurmdbd not connecting to mysql (mariadb)

2024-05-30 Thread Radhouane Aniba via slurm-users
Yes, I can connect to my database using mysql --user=slurm --password=slurmdbpass slurm_acct_db, and after checking, there is no firewall blocking mysql. Also, here is the output of slurmdbd -D -vvv (note I can only run this with sudo): sudo slurmdbd -D -vvv slurmdbd: debug: Log f

[slurm-users] Re: slurmdbd not connecting to mysql (mariadb)

2024-05-30 Thread mercan via slurm-users
You should fix this error. This is not a warning, it is an error: "slurmdbd: error: Database settings not recommended values: innodb_buffer_pool_size innodb_lock_wait_timeout". You can find info in the Slurm documentation: https://slurm.schedmd.com/accounting.html#slurm-accounting-configuratio
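
A minimal sketch of what that fix might look like. The values are the ones I recall from the Slurm accounting page linked above and should be verified there; the scratch file name is hypothetical, and the directory where MariaDB reads drop-in files varies by distribution.

```shell
# Write a candidate InnoDB fragment to a scratch file for review.
# The file name and the values are assumptions; check the Slurm accounting docs.
cnf="${TMPDIR:-/tmp}/slurm-innodb.cnf"
cat > "$cnf" <<'EOF'
[mysqld]
innodb_buffer_pool_size=4096M
innodb_log_file_size=64M
innodb_lock_wait_timeout=900
EOF
cat "$cnf"
# After copying the fragment into the server's config directory and
# restarting MariaDB, the live values can be inspected with:
#   mysql --user=slurm --password=slurmdbpass -e "SHOW VARIABLES LIKE 'innodb_%'"
```

Checking the live server variables matters because the message names settings of the running database server, which may differ from what the cnf files on disk suggest.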

[slurm-users] Re: slurmdbd not connecting to mysql (mariadb)

2024-05-30 Thread Brian Andrus via slurm-users
That SIGTERM message means something is telling slurmdbd to quit. Check your cron jobs, maintenance scripts, etc. Slurmdbd is being told to shut down. If you are running in the foreground, a ^C does that. If you run a kill or killall on it, you will get that same message. Brian Andrus On 5/30

[slurm-users] Slurm version 24.05.0 is now available

2024-05-30 Thread Marshall Garey via slurm-users
We are pleased to announce the availability of Slurm 24.05.0. To highlight some new features in 24.05: - Isolated Job Step management. Enabled on a job-by-job basis with the --stepmgr option, or globally through SlurmctldParameters=enable_stepmgr. - Federation - Allow for client command operati

[slurm-users] Re: slurmdbd not connecting to mysql (mariadb)

2024-05-30 Thread Radhouane Aniba via slurm-users
Thank you Ahmet and Brian. Ahmet, which conf in particular is slurmdbd reading from? I parsed all the cnf files for mysql and I cannot find the data it is displaying here: slurmdbd: debug2: Attempting to connect to localhost:3306 slurmdbd: debug2: innodb_buffer_pool_size: 134217728 slurmdbd: debu

[slurm-users] Re: slurmdbd not connecting to mysql (mariadb)

2024-05-30 Thread Radhouane Aniba via slurm-users
Ok, I made some progress here. I removed and purged slurmdbd, mysql, mariadb, etc. and started from scratch. I added the recommended mysqld requirements and started slurmdbd manually: sudo slurmdbd -D /path/to/conf and everything worked well. When I tried to start the service: sudo systemctl start slur

[slurm-users] Re: slurmdbd not connecting to mysql (mariadb)

2024-05-30 Thread Radhouane Aniba via slurm-users
Manually running it through sudo slurmdbd -D /path/to/conf is very quick on my fresh install. Trying to start slurmdbd through systemctl takes 3 minutes, and then it crashes and fails. Is there an alternative to systemctl to start slurmdbd in the background? But most importantly I wanted to kno

[slurm-users] Re: slurmdbd not connecting to mysql (mariadb)

2024-05-30 Thread Ryan Novosielski via slurm-users
Are you looking at the log/what appears on the screen, and do you know for a fact that it is all the way up (should say "version started" at the end)? If that’s not it, you could have a permissions thing or something. I do not expect you’d need to extend the timeout for a normal run. I suspect

[slurm-users] Re: slurmdbd not connecting to mysql (mariadb)

2024-05-30 Thread Radhouane Aniba via slurm-users
Yes, when I run it manually it says something like this: [2024-05-31T00:20:01.142] Accounting storage MYSQL plugin loaded [2024-05-31T00:20:01.146] slurmdbd version 19.05.5 started But when I try to do it through systemctl: [2024-05-31T00:21:30.953] Terminate signal (SIGINT or SIGTERM) received [20
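
One thing that may be worth checking when the daemon starts fine by hand but is terminated under systemctl (a guess, not something confirmed in this thread) is whether the PID file slurmdbd writes matches the one the systemd unit waits for. A rough sketch, where both default paths are assumptions to adjust for the local install:

```shell
# Compare the PID file path slurmdbd writes with the one systemd waits for.
# Both default paths below are assumptions; adjust them for your install.
conf="${SLURMDBD_CONF:-/etc/slurm/slurmdbd.conf}"
unit="${SLURMDBD_UNIT:-/lib/systemd/system/slurmdbd.service}"

# Print the value of the first "key=value" line for the given key
# (matched case-insensitively) in the given file.
get_value() {
    grep -i "^[[:space:]]*$1[[:space:]]*=" "$2" 2>/dev/null \
        | head -n 1 | cut -d= -f2- | tr -d '[:space:]'
}

conf_pid=$(get_value PidFile "$conf")
unit_pid=$(get_value PIDFile "$unit")
echo "slurmdbd.conf PidFile: ${conf_pid:-<not set>}"
echo "unit file PIDFile:     ${unit_pid:-<not set>}"
if [ -n "$conf_pid" ] && [ -n "$unit_pid" ] && [ "$conf_pid" != "$unit_pid" ]; then
    echo "mismatch: systemd may wait for a PID file that never appears"
fi
# The service's own view of the failure is available via:
#   systemctl status slurmdbd
#   journalctl -u slurmdbd
```

If the two paths differ, systemd can time out waiting for the start to complete and then stop the service, which would show up in the slurmdbd log as a terminate signal.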

[slurm-users] Re: slurmdbd not connecting to mysql (mariadb)

2024-05-30 Thread Radhouane Aniba via slurm-users
I also run both commands using sudo, so I am assuming permissions should not be the issue? My cluster user is root (I know, not good, but I'm testing things out). On Fri, May 31, 2024 at 12:03 AM Radhouane Aniba wrote: > Yes when I run it manually it says something like this > > [2024-05-31T00:20:0