Following up on this in case anyone can provide some insight, please.
On Thu, May 16, 2024 at 8:32 AM Dan Healy wrote:
> Hi there, SLURM community,
>
> I swear I've done this before, but now it's failing on a new cluster I'm
> deploying. We have 6 compute nodes with 64 CPUs each (384 CPUs total).
IIUC you can't do that.
You either allow overcommit or split your job into multiple smaller
jobs that fit.
The resources you're requesting must all be available at the same time: if
your job needs 2 CPUs and you want to run it in parallel, just use a job
array. If you request 500 CPUs, it means
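This is a minimal sketch of the job-array route. It assumes the 500 CPUs of work can be split into independent 2-CPU pieces. The job name, the 250-element count and the placeholder task body are all hypothetical. Slurm schedules each array element on its own, so no single request has to fit inside the cluster's 384 CPUs. With the whole cluster free, at most 384 / 2 = 192 elements run at once and the rest stay queued.

```shell
#!/bin/bash
# Hypothetical job-array script: 250 tasks x 2 CPUs = 500 CPUs of work in total,
# but each array element is a separate allocation of only 2 CPUs.
#SBATCH --job-name=array-demo
#SBATCH --array=0-249
#SBATCH --ntasks=1
#SBATCH --cpus-per-task=2

# One unit of work; the echo is a placeholder for the real per-task command.
run_task() {
    echo "array task $1: ${SLURM_CPUS_PER_TASK:-2} CPUs"
}

# Slurm exports SLURM_ARRAY_TASK_ID to each element; default to 0 so the
# script can also be dry-run outside Slurm.
run_task "${SLURM_ARRAY_TASK_ID:-0}"
```

Submit it with `sbatch array-demo.sh` (the file name is hypothetical). To cap how many elements run at the same time, add a `%` throttle such as `--array=0-249%100`.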
Thank you Ahmet,
I don't have a firewall active.
Because slurmdbd cannot connect to the database, I am not able to get it
activated through systemctl. I will share the output of
slurmdbd -D -vvv shortly, but overall it keeps saying it is trying to connect
to the db and then retries a couple
Did you try to connect to the database using the mysql command?
mysql --user=slurm --password=slurmdbpass slurm_acct_db
C. Ahmet Mercan
On 30.05.2024 14:48, Radhouane Aniba via slurm-users wrote:
> Thank you Ahmet,
> I don't have a firewall active.
> And because slurmdbd cannot connect to the database I am no
Yes, I can connect to my database using
mysql --user=slurm --password=slurmdbpass slurm_acct_db, and after checking
the firewall question, there is no firewall blocking MySQL.
Also, here is the output of slurmdbd -D -vvv (note: I can only run this with
sudo):
sudo slurmdbd -D -vvv
slurmdbd: debug: Log f
You should fix this error. It is not a warning, it is an error:
"slurmdbd: error: Database settings not recommended values:
innodb_buffer_pool_size innodb_lock_wait_timeout"
You can find more information in the Slurm documentation:
https://slurm.schedmd.com/accounting.html#slurm-accounting-configuratio
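This is a sketch of applying those settings through a MariaDB/MySQL option file. The values mirror the example my.cnf on the Slurm accounting page, which you should check for your Slurm version. Size innodb_buffer_pool_size to your RAM and database. The file name 99-slurmdbd.cnf is hypothetical. The default directory assumes a Debian/Ubuntu MariaDB layout; RHEL-family systems typically include /etc/my.cnf.d instead.

```shell
# Hypothetical helper: write an option file with the InnoDB settings that
# slurmdbd complains about. Values follow the Slurm docs' example my.cnf.
write_slurm_innodb_cnf() {
    local dir=${1:-/etc/mysql/mariadb.conf.d}
    cat > "${dir}/99-slurmdbd.cnf" <<'EOF'
[mysqld]
innodb_buffer_pool_size=4096M
innodb_log_file_size=64M
innodb_lock_wait_timeout=900
EOF
}

# Usage (as root), then restart the database so the values take effect:
#   write_slurm_innodb_cnf /etc/mysql/mariadb.conf.d
#   systemctl restart mariadb
```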
That SIGTERM message means something is telling slurmdbd to quit.
Check your cron jobs, maintenance scripts, etc. slurmdbd is being told
to shut down. If you are running in the foreground, a ^C does that. If
you run kill or killall on it, you will get that same message.
Brian Andrus
On 5/30
We are pleased to announce the availability of Slurm 24.05.0.
To highlight some new features in 24.05:
- Isolated Job Step management. Enabled on a job-by-job basis with the
--stepmgr option, or globally through SlurmctldParameters=enable_stepmgr.
- Federation - Allow for client command operati
Thank you Ahmet and Brian,
Ahmet, which config file in particular is slurmdbd reading from? I parsed all the
.cnf files for MySQL and I cannot find the values it is displaying here:
slurmdbd: debug2: Attempting to connect to localhost:3306
slurmdbd: debug2: innodb_buffer_pool_size: 134217728
slurmdbd: debu
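On the "which config file" question: the debug2 lines suggest these values come from the running server after slurmdbd connects. They do not appear to come from a file slurmdbd parses itself. 134217728 bytes is 128 MiB, the stock InnoDB default. The fix is therefore on the server side. `mysqld --verbose --help` lists the option files mysqld reads, and `my_print_defaults mysqld` prints the options those files set.

Below is a minimal checker for the live values. It assumes the example values from the Slurm accounting page as targets. slurmdbd's own minimum checks may differ by version, and the function name is hypothetical.

```shell
# Hypothetical helper: report InnoDB variables below the target values.
# Input: "name<TAB>value" lines, e.g. from
#   mysql -N -B -e "SHOW VARIABLES LIKE 'innodb%'" | check_innodb_vars
check_innodb_vars() {
    awk -F'\t' '
        BEGIN {
            # Targets taken from the Slurm docs example my.cnf (an assumption).
            want["innodb_buffer_pool_size"]  = 4096 * 1024 * 1024
            want["innodb_log_file_size"]     = 64 * 1024 * 1024
            want["innodb_lock_wait_timeout"] = 900
        }
        # Print only variables we care about that are below their target.
        ($1 in want) && ($2 + 0 < want[$1]) {
            printf "%s=%s (want >= %.0f)\n", $1, $2, want[$1]
        }
    '
}
```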
OK, I made some progress here.
I removed and purged slurmdbd, MySQL, MariaDB, etc. and started from scratch.
I added the recommended mysqld settings.
I started slurmdbd manually with sudo slurmdbd -D /path/to/conf and everything
worked well.
When I tried to start the service with sudo systemctl start slur
Running it manually with sudo slurmdbd -D /path/to/conf is very quick on
my fresh install.
Trying to start slurmdbd through systemctl takes 3 minutes, and then it
crashes and fails.
Is there an alternative to systemctl for starting slurmdbd in the
background?
But most importantly, I wanted to know
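One thing worth ruling out is offered here as an assumption, not as the confirmed cause in this thread. When a unit is Type=forking with a PIDFile=, systemd waits for that file after slurmdbd forks. If the start job exceeds TimeoutStartSec=, systemd itself sends SIGTERM, which would produce the same "Terminate signal" line. A mismatch between the unit's PIDFile= and slurmdbd.conf's PidFile= can cause exactly that wait.

The helper below is hypothetical. It assumes /var/run is a symlink to /run, which is true on current systemd distributions. It also assumes /var/run/slurmdbd.pid as slurmdbd.conf's documented default when PidFile is unset.

As for starting slurmdbd in the background without systemctl: -D is what keeps slurmdbd in the foreground, so `sudo slurmdbd` without -D daemonizes it.

```shell
# Map the /var/run alias onto /run so equivalent paths compare equal.
norm_run() {
    case $1 in
        /var/run/*) echo "/run/${1#/var/run/}" ;;
        *) echo "$1" ;;
    esac
}

# Hypothetical helper: compare the PID file systemd waits for with the one
# slurmdbd writes. Returns 1 on mismatch.
# Usage (paths are assumptions):
#   systemctl cat slurmdbd > /tmp/slurmdbd.unit
#   pidfile_mismatch /tmp/slurmdbd.unit /etc/slurm/slurmdbd.conf
pidfile_mismatch() {
    local unit_pid conf_pid
    # Last assignment wins, as with systemd drop-ins shown by "systemctl cat".
    unit_pid=$(sed -n 's/^[[:space:]]*PIDFile=//p' "$1" | tail -n 1)
    # Slurm config keys are case-insensitive.
    conf_pid=$(sed -n 's/^[[:space:]]*[Pp][Ii][Dd][Ff][Ii][Ll][Ee][[:space:]]*=[[:space:]]*//p' "$2" | tail -n 1)
    conf_pid=${conf_pid:-/var/run/slurmdbd.pid}
    if [ -z "$unit_pid" ]; then
        echo "unit sets no PIDFile=; nothing to compare"
        return 0
    fi
    unit_pid=$(norm_run "$unit_pid")
    conf_pid=$(norm_run "$conf_pid")
    if [ "$unit_pid" = "$conf_pid" ]; then
        echo "match: $unit_pid"
        return 0
    fi
    echo "mismatch: systemd waits for $unit_pid, slurmdbd writes $conf_pid"
    return 1
}
```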
Are you looking at the log/what appears on the screen, and do you know for a
fact that it is all the way up (it should say "version started" at the
end)?
If that's not it, you could have a permissions issue or something similar.
I do not expect you'd need to extend the timeout for a normal run. I suspect
Yes, when I run it manually it says something like this:
[2024-05-31T00:20:01.142] Accounting storage MYSQL plugin loaded
[2024-05-31T00:20:01.146] slurmdbd version 19.05.5 started
But when I try it through systemctl:
[2024-05-31T00:21:30.953] Terminate signal (SIGINT or SIGTERM) received
[20
I also run both commands with sudo, so I am assuming permissions should not
be the issue? My cluster user is root (I know, not good, but I'm testing
things out).
On Fri, May 31, 2024 at 12:03 AM Radhouane Aniba wrote:
> Yes when I run it manually it says something like this
>
> [2024-05-31T00:20:0
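Brian's check can be scripted. A successful start ends with a "slurmdbd version … started" line like the one above. This is a minimal sketch that assumes slurmdbd logs to the file named by LogFile= in slurmdbd.conf; the path in the usage comment and the function name are hypothetical. It counts the lines already in the log before the start attempt, so a "started" line from an earlier run is not mistaken for the current one.

```shell
# Hypothetical helper: wait until slurmdbd logs its "version ... started" line.
# Args: log file, timeout in seconds, number of pre-existing lines to skip.
# Usage sketch (as root; path hypothetical):
#   n=$(wc -l < /var/log/slurm/slurmdbd.log)
#   systemctl start slurmdbd &
#   wait_for_slurmdbd /var/log/slurm/slurmdbd.log 120 "$n"
wait_for_slurmdbd() {
    local log=$1 timeout=${2:-60} skip=${3:-0} waited=0
    while [ "$waited" -lt "$timeout" ]; do
        # Only look at lines written after the first $skip lines.
        if tail -n "+$((skip + 1))" "$log" 2>/dev/null | grep -q 'slurmdbd version .* started'; then
            echo "started after ${waited}s"
            return 0
        fi
        sleep 1
        waited=$((waited + 1))
    done
    echo "no 'started' line in $log after ${timeout}s" >&2
    return 1
}
```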