[slurm-users] Re: Executing srun -n X where X is greater than total CPU in entire cluster

2024-05-30 Thread Dan Healy via slurm-users
Following up on this in case anyone can provide some insight, please.

On Thu, May 16, 2024 at 8:32 AM Dan Healy  wrote:

> Hi there, SLURM community,
>
> I swear I've done this before, but now it's failing on a new cluster I'm
> deploying. We have 6 compute nodes with 64 CPUs each (384 CPUs total). When I
> run `srun -n 500 hostname`, the task gets queued because there aren't 500
> CPUs available.
>
> Wasn't there an option that allows this to run so that the first 384
> tasks execute and the remaining ones execute when resources free up?
>
> Here's my conf:
>
> # Slurm Cgroup Configs used on controllers and workers
> slurm_cgroup_config:
>   CgroupAutomount: yes
>   ConstrainCores: yes
>   ConstrainRAMSpace: yes
>   ConstrainSwapSpace: yes
>   ConstrainDevices: yes
>
> # Slurm conf file settings
> slurm_config:
>   AccountingStorageType: "accounting_storage/slurmdbd"
>   AccountingStorageEnforce: "limits"
>   AuthAltTypes: "auth/jwt"
>   ClusterName: "cluster"
>   AccountingStorageHost : "{{ hostvars[groups['controller'][0]].ansible_hostname }}"
>   DefMemPerCPU: 1024
>   InactiveLimit: 120
>   JobAcctGatherType: "jobacct_gather/cgroup"
>   JobCompType: "jobcomp/none"
>   MailProg: "/usr/bin/mail"
>   MaxArraySize: 4
>   MaxJobCount: 10
>   MinJobAge: 3600
>   ProctrackType: "proctrack/cgroup"
>   ReturnToService: 2
>   SelectType: "select/cons_tres"
>   SelectTypeParameters: "CR_Core_Memory"
>   SlurmctldTimeout: 30
>   SlurmctldLogFile: "/var/log/slurm/slurmctld.log"
>   SlurmdLogFile: "/var/log/slurm/slurmd.log"
>   SlurmdSpoolDir: "/var/spool/slurm/d"
>   SlurmUser: "{{ slurm_user.name }}"
>   SrunPortRange: "6-61000"
>   StateSaveLocation: "/var/spool/slurm/ctld"
>   TaskPlugin: "task/affinity,task/cgroup"
>   UnkillableStepTimeout: 120
>
>
> --
> Thanks,
>
> Daniel Healy
>


-- 
Thanks,

Daniel Healy

-- 
slurm-users mailing list -- slurm-users@lists.schedmd.com
To unsubscribe send an email to slurm-users-le...@lists.schedmd.com


[slurm-users] Re: Executing srun -n X where X is greater than total CPU in entire cluster

2024-05-30 Thread Diego Zuccato via slurm-users

IIUC you can't do that.

You either allow overcommit or you split your job into multiple smaller
jobs that fit.


The resources you're requesting must be available at the same time: if
your job needs 2 CPUs and you want to run it in parallel, just use a job
array. If you request 500 CPUs, it means your job cannot run with just 384.
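A minimal sketch of the job-array approach, assuming the 500 `hostname` runs are independent single-CPU tasks. Each array task is scheduled on its own, so Slurm starts as many as there are free CPUs and keeps the rest pending until CPUs free up. The script file name, job name and output pattern are illustrative choices. Note that the configuration posted above sets `MaxArraySize: 4`, which only allows array indices 0-3, so it would have to be raised (to at least 500) before this array is accepted; `MaxJobCount: 10` may also need raising.

```shell
#!/bin/bash
# Sketch: 500 independent one-CPU tasks submitted as a job array.
# Each array task is scheduled separately, so tasks start as CPUs
# become free instead of waiting for 500 CPUs at once.
# Submit with: sbatch hostname_array.sh   (file name is hypothetical)
#SBATCH --job-name=hostname-array
#SBATCH --array=0-499
#SBATCH --ntasks=1
#SBATCH --cpus-per-task=1
#SBATCH --output=hostname_%A_%a.out

# In --output, %A expands to the array job ID and %a to the array index.
echo "array task ${SLURM_ARRAY_TASK_ID}"
srun hostname
```

The other option, overcommitting, would be `srun --overcommit -n 500 hostname`; roughly speaking, that launches all 500 tasks at once with more than one task per CPU rather than queuing them.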


Diego

--
Diego Zuccato
DIFA - Dip. di Fisica e Astronomia
Servizi Informatici
Alma Mater Studiorum - Università di Bologna
V.le Berti-Pichat 6/2 - 40127 Bologna - Italy
tel.: +39 051 20 95786



[slurm-users] Re: slurmdbd not connecting to mysql (mariadb)

2024-05-30 Thread Radhouane Aniba via slurm-users
Thank you Ahmet,
I don't have a firewall active.
Because slurmdbd cannot connect to the database, I am not able to get it
activated through systemctl. I will share the output of slurmdbd -D -vvv
shortly, but overall it always says it is trying to connect to the DB, then
retries a couple of times and crashes.

R.




On Thu, May 30, 2024 at 2:51 AM mercan  wrote:

> Hi;
>
> Did you check whether you can connect to the DB with your conf parameters from head-node:
>
> mysql --user=slurm --password=slurmdbpass  slurm_acct_db
>
> Also, check for and stop the firewall and SELinux if they are running.
>
> Finally, you can stop slurmdbd, then run it in a terminal with:
>
> slurmdbd -D -vvv
>
> Regards;
>
> C. Ahmet Mercan
>
> On 30.05.2024 00:05, Radhouane Aniba via slurm-users wrote:
>
> Hi everyone
> I am trying to get slurmdbd to run on my local home server but I am really
> struggling.
> Note: I am a novice Slurm user.
> My slurmdbd always times out even though all the details in the conf file
> are correct.
>
> My log looks like this
>
> [2024-05-29T20:51:30.088] Accounting storage MYSQL plugin loaded
> [2024-05-29T20:51:30.088] debug2: ArchiveDir = /tmp
> [2024-05-29T20:51:30.088] debug2: ArchiveScript = (null)
> [2024-05-29T20:51:30.088] debug2: AuthAltTypes = (null)
> [2024-05-29T20:51:30.088] debug2: AuthInfo = (null)
> [2024-05-29T20:51:30.088] debug2: AuthType = auth/munge
> [2024-05-29T20:51:30.088] debug2: CommitDelay = 0
> [2024-05-29T20:51:30.088] debug2: DbdAddr = localhost
> [2024-05-29T20:51:30.088] debug2: DbdBackupHost = (null)
> [2024-05-29T20:51:30.088] debug2: DbdHost = head-node
> [2024-05-29T20:51:30.088] debug2: DbdPort = 7032
> [2024-05-29T20:51:30.088] debug2: DebugFlags = (null)
> [2024-05-29T20:51:30.088] debug2: DebugLevel = 6
> [2024-05-29T20:51:30.088] debug2: DebugLevelSyslog = 10
> [2024-05-29T20:51:30.088] debug2: DefaultQOS = (null)
> [2024-05-29T20:51:30.088] debug2: LogFile = /var/log/slurmdbd.log
> [2024-05-29T20:51:30.088] debug2: MessageTimeout = 100
> [2024-05-29T20:51:30.088] debug2: Parameters = (null)
> [2024-05-29T20:51:30.088] debug2: PidFile = /run/slurmdbd.pid
> [2024-05-29T20:51:30.088] debug2: PluginDir = /usr/lib/x86_64-linux-gnu/slurm-wlm
> [2024-05-29T20:51:30.088] debug2: PrivateData = none
> [2024-05-29T20:51:30.088] debug2: PurgeEventAfter = 1 months*
> [2024-05-29T20:51:30.088] debug2: PurgeJobAfter = 12 months*
> [2024-05-29T20:51:30.088] debug2: PurgeResvAfter = 1 months*
> [2024-05-29T20:51:30.088] debug2: PurgeStepAfter = 1 months
> [2024-05-29T20:51:30.088] debug2: PurgeSuspendAfter = 1 months
> [2024-05-29T20:51:30.088] debug2: PurgeTXNAfter = 12 months
> [2024-05-29T20:51:30.088] debug2: PurgeUsageAfter = 24 months
> [2024-05-29T20:51:30.088] debug2: SlurmUser = root(0)
> [2024-05-29T20:51:30.089] debug2: StorageBackupHost = (null)
> [2024-05-29T20:51:30.089] debug2: StorageHost = localhost
> [2024-05-29T20:51:30.089] debug2: StorageLoc = slurm_acct_db
> [2024-05-29T20:51:30.089] debug2: StoragePort = 3306
> [2024-05-29T20:51:30.089] debug2: StorageType = accounting_storage/mysql
> [2024-05-29T20:51:30.089] debug2: StorageUser = slurm
> [2024-05-29T20:51:30.089] debug2: TCPTimeout = 2
> [2024-05-29T20:51:30.089] debug2: TrackWCKey = 0
> [2024-05-29T20:51:30.089] debug2: TrackSlurmctldDown= 0
> [2024-05-29T20:51:30.089] debug2: acct_storage_p_get_connection: request new connection 1
> [2024-05-29T20:51:30.089] debug2: Attempting to connect to localhost:3306
> [2024-05-29T20:51:30.090] slurmdbd version 19.05.5 started
> [2024-05-29T20:51:30.090] debug2: running rollup at Wed May 29 20:51:30 2024
> [2024-05-29T20:51:30.091] debug2: Everything rolled up
> [2024-05-29T20:51:49.673] Terminate signal (SIGINT or SIGTERM) received
> [2024-05-29T20:51:49.673] debug: rpc_mgr shutting down
>
>
>
> my config file looks like this
>
> ArchiveEvents=yes
> ArchiveJobs=yes
> ArchiveResvs=yes
> ArchiveSteps=no
> ArchiveSuspend=no
> ArchiveTXN=no
> ArchiveUsage=no
> PurgeEventAfter=1month
> PurgeJobAfter=12month
> PurgeResvAfter=1month
> PurgeStepAfter=1month
> PurgeSuspendAfter=1month
> PurgeTXNAfter=12month
> PurgeUsageAfter=24month
> # Authentication info
> AuthType=auth/munge
> # slurmDBD info
> DbdAddr=localhost
> DbdHost=head-node
> DbdPort=7032
> SlurmUser=root
> MessageTimeout=100
> DebugLevel=5
> #DefaultQOS=normal,standby
> LogFile=/var/log/slurmdbd.log
> PidFile=/run/slurmdbd.pid
> #PrivateData=accounts,users,usage,jobs
> #TrackWCKey=yes
> #
> # Database info
> StorageType=accounting_storage/mysql
> StorageHost=localhost
> StoragePort=3306
> StoragePass=slurmdbpass
> StorageUser=slurm
> StorageLoc=slurm_acct_db
> I used standard names and passwords to get started and will change them later.
>
> But every time I try to start slurmdbd.service it crashes, and I get the
> log I shared above.
>
> I use these versions
>
> slurmdbd -V
> slurm-wlm 19.05.5
> mysql Ver 15.1 Distrib 10.3.39-MariaDB, for debian-linux-gnu (x86_64) using readline 5.2
> Everything else is working p

[slurm-users] Re: slurmdbd not connecting to mysql (mariadb)

2024-05-30 Thread mercan via slurm-users

Did you try to connect to the database using the mysql command?

mysql --user=slurm --password=slurmdbpass  slurm_acct_db


C. Ahmet Mercan


[slurm-users] Re: slurmdbd not connecting to mysql (mariadb)

2024-05-30 Thread Radhouane Aniba via slurm-users
Yes, I can connect to my database using mysql --user=slurm
--password=slurmdbpass slurm_acct_db, and after checking, there is no
firewall blocking MySQL.

Also, here is the output of slurmdbd -D -vvv (note: I can only run this
with sudo):

sudo slurmdbd -D -vvv
slurmdbd: debug: Log file re-opened
slurmdbd: debug: Munge authentication plugin loaded
slurmdbd: debug2: mysql_connect() called for db slurm_acct_db
slurmdbd: debug2: Attempting to connect to localhost:3306
slurmdbd: debug2: innodb_buffer_pool_size: 134217728
slurmdbd: debug2: innodb_log_file_size: 50331648
slurmdbd: debug2: innodb_lock_wait_timeout: 50
slurmdbd: error: Database settings not recommended values: innodb_buffer_pool_size innodb_lock_wait_timeout
slurmdbd: Accounting storage MYSQL plugin loaded
slurmdbd: debug2: ArchiveDir = /tmp
slurmdbd: debug2: ArchiveScript = (null)
slurmdbd: debug2: AuthAltTypes = (null)
slurmdbd: debug2: AuthInfo = (null)
slurmdbd: debug2: AuthType = auth/munge
slurmdbd: debug2: CommitDelay = 0
slurmdbd: debug2: DbdAddr = localhost
slurmdbd: debug2: DbdBackupHost = (null)
slurmdbd: debug2: DbdHost = hannibal-hn
slurmdbd: debug2: DbdPort = 7032
slurmdbd: debug2: DebugFlags = (null)
slurmdbd: debug2: DebugLevel = 6
slurmdbd: debug2: DebugLevelSyslog = 10
slurmdbd: debug2: DefaultQOS = (null)
slurmdbd: debug2: LogFile = /var/log/slurmdbd.log
slurmdbd: debug2: MessageTimeout = 100
slurmdbd: debug2: Parameters = (null)
slurmdbd: debug2: PidFile = /run/slurmdbd.pid
slurmdbd: debug2: PluginDir = /usr/lib/x86_64-linux-gnu/slurm-wlm
slurmdbd: debug2: PrivateData = none
slurmdbd: debug2: PurgeEventAfter = 1 months*
slurmdbd: debug2: PurgeJobAfter = 12 months*
slurmdbd: debug2: PurgeResvAfter = 1 months*
slurmdbd: debug2: PurgeStepAfter = 1 months
slurmdbd: debug2: PurgeSuspendAfter = 1 months
slurmdbd: debug2: PurgeTXNAfter = 12 months
slurmdbd: debug2: PurgeUsageAfter = 24 months
slurmdbd: debug2: SlurmUser = root(0)
slurmdbd: debug2: StorageBackupHost = (null)
slurmdbd: debug2: StorageHost = localhost
slurmdbd: debug2: StorageLoc = slurm_acct_db
slurmdbd: debug2: StoragePort = 3306
slurmdbd: debug2: StorageType = accounting_storage/mysql
slurmdbd: debug2: StorageUser = slurm
slurmdbd: debug2: TCPTimeout = 2
slurmdbd: debug2: TrackWCKey = 0
slurmdbd: debug2: TrackSlurmctldDown= 0
slurmdbd: debug2: acct_storage_p_get_connection: request new connection 1
slurmdbd: debug2: Attempting to connect to localhost:3306
slurmdbd: slurmdbd version 19.05.5 started
slurmdbd: debug2: running rollup at Thu May 30 13:50:08 2024
slurmdbd: debug2: Everything rolled up


It goes like this for some time and then it crashes with this message

slurmdbd: Terminate signal (SIGINT or SIGTERM) received
slurmdbd: debug: rpc_mgr shutting down



[slurm-users] Re: slurmdbd not connecting to mysql (mariadb)

2024-05-30 Thread mercan via slurm-users

You should fix this error. It is not a warning, it is an error:

"slurmdbd: error: Database settings not recommended values: innodb_buffer_pool_size innodb_lock_wait_timeout"


You can find more information in the Slurm documentation:

https://slurm.schedmd.com/accounting.html#slurm-accounting-configuration-before-build
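A sketch of one way to apply that on a Debian/Ubuntu MariaDB install like the one in this thread, assuming the packaged /etc/mysql/mariadb.cnf still includes the /etc/mysql/mariadb.conf.d/ directory (the Debian default). The drop-in file name 99-slurmdbd.cnf is hypothetical, chosen so it sorts after 50-server.cnf. The values follow the style of the example on the accounting page; check that page for the recommendations for your Slurm version, and size innodb_buffer_pool_size to the RAM the machine can spare.

```shell
#!/bin/bash
# Sketch: raise the two InnoDB settings slurmdbd flagged, then verify.
# 99-slurmdbd.cnf is a hypothetical name; *.cnf files in this directory are read.
sudo tee /etc/mysql/mariadb.conf.d/99-slurmdbd.cnf > /dev/null <<'EOF'
[mysqld]
innodb_buffer_pool_size=1024M
innodb_lock_wait_timeout=900
EOF

# The server reads option files only at startup.
sudo systemctl restart mariadb

# Confirm the running server now reports the new values.
mysql --user=slurm --password \
  -e "SHOW VARIABLES WHERE Variable_name IN ('innodb_buffer_pool_size', 'innodb_lock_wait_timeout');"
```

Afterwards, re-run `slurmdbd -D -vvv`; if the error line then lists innodb_log_file_size as well, raise that setting in the same file.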


C. Ahmet Mercan



[slurm-users] Re: slurmdbd not connecting to mysql (mariadb)

2024-05-30 Thread Brian Andrus via slurm-users

That SIGTERM message means something is telling slurmdbd to quit.

Check your cron jobs, maintenance scripts, etc. Slurmdbd is being told
to shut down. If you are running it in the foreground, a ^C does that. If
you run kill or killall on it, you will get that same message.
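A sketch of where to look for the sender of that signal on a systemd host. One possibility worth ruling out for the systemctl case (an assumption, not something confirmed in this thread): if the slurmdbd unit waits for a PID file that slurmdbd never writes, systemd gives up after its start timeout and stops the service with SIGTERM, which would fit both the "always times out" symptom and the log message. The slurmdbd.conf path below is also an assumption (Debian/Ubuntu 19.05 packages use /etc/slurm-llnl); adjust it to your install.

```shell
#!/bin/bash
# Sketch: find out what is stopping slurmdbd (run as root or with sudo).

# Unit state and recent log lines; a systemd start timeout shows up here
# as something like "start operation timed out".
systemctl status slurmdbd --no-pager
journalctl -u slurmdbd -b --no-pager | tail -n 50

# The PID file systemd waits for vs. the one slurmdbd is configured to write.
systemctl cat slurmdbd | grep -i '^PIDFile'
grep -i '^PidFile' /etc/slurm-llnl/slurmdbd.conf   # path is an assumption

# Scheduled jobs that might kill or restart the daemon.
crontab -l
ls /etc/cron.d /etc/cron.daily
```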


Brian Andrus


[slurm-users] Slurm version 24.05.0 is now available

2024-05-30 Thread Marshall Garey via slurm-users

We are pleased to announce the availability of Slurm 24.05.0.

To highlight some new features in 24.05:

- Isolated Job Step management. Enabled on a job-by-job basis with the 
--stepmgr option, or globally through SlurmctldParameters=enable_stepmgr.
- Federation - Allow for client command operation while SlurmDBD is
unavailable.
- New MaxTRESRunMinsPerAccount and MaxTRESRunMinsPerUser QOS limits.
- New USER_DELETE reservation flag.
- New Flags=rebootless option on Features for node_features/helpers 
which indicates the given feature can be enabled without rebooting the node.
- Cloud power management options: New "max_powered_nodes=" option
in SlurmctldParameters, and new SuspendExcNodes=: syntax
allowing for  nodes out of a given node list to be excluded.
- StdIn/StdOut/StdErr now stored in SlurmDBD accounting records for 
batch jobs.
- New switch/nvidia_imex plugin for IMEX channel management on NVIDIA 
systems.
- New RestrictedCoresPerGPU option at the Node level, designed to ensure 
GPU workloads always have access to a certain number of CPUs even when 
nodes are running non-GPU workloads concurrently.


The Slurm documentation has also been updated to the 24.05 release. 
(Older versions can be found in the archive, linked from the main 
documentation page.)


Slurm can be downloaded from https://www.schedmd.com/downloads.php .

--
Marshall Garey
Release Management, Support, and Development
SchedMD LLC - Commercial Slurm Development and Support



[slurm-users] Re: slurmdbd not connecting to mysql (mariadb)

2024-05-30 Thread Radhouane Aniba via slurm-users
Thank you Ahmet and Brian,

Ahmet, which conf file in particular is slurmdbd reading these values from? I
went through all the .cnf files for MySQL and I cannot find the values it is
displaying here:

slurmdbd: debug2: Attempting to connect to localhost:3306
slurmdbd: debug2: innodb_buffer_pool_size: 134217728
slurmdbd: debug2: innodb_log_file_size: 50331648
slurmdbd: debug2: innodb_lock_wait_timeout: 50
slurmdbd: error: Database settings not recommended values: innodb_buffer_pool_size innodb_lock_wait_timeout


sudo tree /etc/mysql/*
/etc/mysql/conf.d
├── mysql.cnf
└── mysqldump.cnf
/etc/mysql/debian.cnf
/etc/mysql/debian-start
/etc/mysql/FROZEN
/etc/mysql/mariadb.cnf
/etc/mysql/mariadb.conf.d
├── 50-client.cnf
├── 50-mysql-clients.cnf
├── 50-mysqld_safe.cnf
└── 50-server.cnf
/etc/mysql/my.cnf
/etc/mysql/my.cnf.fallback
/etc/mysql/mysql.cnf
/etc/mysql/mysql.conf.d
├── mysql.cnf
└── mysqld.cnf
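A sketch of how to see which option files the server reads and which values it actually uses. The values in the log above (134217728 bytes = 128 MiB, 50331648 bytes = 48 MiB, 50 seconds) match MariaDB 10.3's built-in defaults, which would explain why none of the .cnf files mention them: nothing sets them, so the server falls back to its compiled-in defaults. The /usr/sbin/mysqld path is the usual Debian location and is an assumption here.

```shell
#!/bin/bash
# Sketch: locate MariaDB's option files and the values the server uses.

# Which option files mysqld reads, in order.
/usr/sbin/mysqld --verbose --help 2>/dev/null | grep -A1 'Default options'

# Every option mysqld would pick up from those files (including !includedir files).
/usr/sbin/mysqld --print-defaults

# Effective values in the running server; a variable not listed by
# --print-defaults comes from the compiled-in default.
mysql --user=slurm --password \
  -e "SHOW VARIABLES LIKE 'innodb_buffer_pool_size'; SHOW VARIABLES LIKE 'innodb_log_file_size'; SHOW VARIABLES LIKE 'innodb_lock_wait_timeout';"
```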

On Thu, May 30, 2024 at 12:21 PM Brian Andrus via slurm-users <
slurm-users@lists.schedmd.com> wrote:

> That SIGTERM message means something is telling slurmdbd to quit.
>
> Check your cron jobs, maintenance scripts, etc. Slurmdbd is being told to
> shut down. If you are running in the foreground, a ^C does that. If you run
> a kill or killall on it, you will get that same message.
>
> Brian Andrus
> On 5/30/2024 6:53 AM, Radhouane Aniba via slurm-users wrote:
>
> Yes, I can connect to my database using mysql --user=slurm
> --password=slurmdbpass slurm_acct_db, and after checking, there is no
> firewall blocking mysql.
>
> Also, here is the output of slurmdbd -D -vvv (note: I can only run this
> with sudo):
>
> sudo slurmdbd -D -vvv
> slurmdbd: debug: Log file re-opened
> slurmdbd: debug: Munge authentication plugin loaded
> slurmdbd: debug2: mysql_connect() called for db slurm_acct_db
> slurmdbd: debug2: Attempting to connect to localhost:3306
> slurmdbd: debug2: innodb_buffer_pool_size: 134217728
> slurmdbd: debug2: innodb_log_file_size: 50331648
> slurmdbd: debug2: innodb_lock_wait_timeout: 50
> slurmdbd: error: Database settings not recommended values:
> innodb_buffer_pool_size innodb_lock_wait_timeout
> slurmdbd: Accounting storage MYSQL plugin loaded
> slurmdbd: debug2: ArchiveDir = /tmp
> slurmdbd: debug2: ArchiveScript = (null)
> slurmdbd: debug2: AuthAltTypes = (null)
> slurmdbd: debug2: AuthInfo = (null)
> slurmdbd: debug2: AuthType = auth/munge
> slurmdbd: debug2: CommitDelay = 0
> slurmdbd: debug2: DbdAddr = localhost
> slurmdbd: debug2: DbdBackupHost = (null)
> slurmdbd: debug2: DbdHost = hannibal-hn
> slurmdbd: debug2: DbdPort = 7032
> slurmdbd: debug2: DebugFlags = (null)
> slurmdbd: debug2: DebugLevel = 6
> slurmdbd: debug2: DebugLevelSyslog = 10
> slurmdbd: debug2: DefaultQOS = (null)
> slurmdbd: debug2: LogFile = /var/log/slurmdbd.log
> slurmdbd: debug2: MessageTimeout = 100
> slurmdbd: debug2: Parameters = (null)
> slurmdbd: debug2: PidFile = /run/slurmdbd.pid
> slurmdbd: debug2: PluginDir = /usr/lib/x86_64-linux-gnu/slurm-wlm
> slurmdbd: debug2: PrivateData = none
> slurmdbd: debug2: PurgeEventAfter = 1 months*
> slurmdbd: debug2: PurgeJobAfter = 12 months*
> slurmdbd: debug2: PurgeResvAfter = 1 months*
> slurmdbd: debug2: PurgeStepAfter = 1 months
> slurmdbd: debug2: PurgeSuspendAfter = 1 months
> slurmdbd: debug2: PurgeTXNAfter = 12 months
> slurmdbd: debug2: PurgeUsageAfter = 24 months
> slurmdbd: debug2: SlurmUser = root(0)
> slurmdbd: debug2: StorageBackupHost = (null)
> slurmdbd: debug2: StorageHost = localhost
> slurmdbd: debug2: StorageLoc = slurm_acct_db
> slurmdbd: debug2: StoragePort = 3306
> slurmdbd: debug2: StorageType = accounting_storage/mysql
> slurmdbd: debug2: StorageUser = slurm
> slurmdbd: debug2: TCPTimeout = 2
> slurmdbd: debug2: TrackWCKey = 0
> slurmdbd: debug2: TrackSlurmctldDown= 0
> slurmdbd: debug2: acct_storage_p_get_connection: request new connection 1
> slurmdbd: debug2: Attempting to connect to localhost:3306
> slurmdbd: slurmdbd version 19.05.5 started
> slurmdbd: debug2: running rollup at Thu May 30 13:50:08 2024
> slurmdbd: debug2: Everything rolled up
>
>
> It goes like this for some time, and then it crashes with this message:
>
> slurmdbd: Terminate signal (SIGINT or SIGTERM) received
> slurmdbd: debug: rpc_mgr shutting down
>
>
> On Thu, May 30, 2024 at 8:18 AM mercan 
> wrote:
>
>> Did you try to connect to the database using the mysql command?
>>
>> mysql --user=slurm --password=slurmdbpass  slurm_acct_db
>>
>> C. Ahmet Mercan
>>
>> On 30.05.2024 14:48, Radhouane Aniba via slurm-users wrote:
>>
>> Thank you Ahmet,
>> I don't have a firewall active.
>> And because slurmdbd cannot connect to the database, I am not able to get
>> it activated through systemctl. I will share the output of
>> slurmdbd -D -vvv shortly, but overall it always says it is trying to
>> connect to the db, then retries a couple of times and crashes.
>>
>> R.
>>
>>
>>
>>
>> On Thu, May 30, 2024 at 2:51 AM mercan 
>> wrote:
>>
>>> Hi;
>>>
>>> Did you check can you connect db with your conf parameters fr

[slurm-users] Re: slurmdbd not connecting to mysql (mariadb)

2024-05-30 Thread Radhouane Aniba via slurm-users
OK, I made some progress here.

I removed and purged slurmdbd, mysql, mariadb, etc., and started from scratch.
I added the recommended mysqld settings.

I started slurmdbd manually (sudo slurmdbd -D /path/to/conf) and everything
worked well.

When I tried to start the service with sudo systemctl start slurmdbd.service, it
didn't work:

sudo systemctl status  slurmdbd.service
● slurmdbd.service - Slurm DBD accounting daemon
 Loaded: loaded (/etc/systemd/system/slurmdbd.service; enabled; vendor preset: enabled)
 Active: failed (Result: timeout) since Fri 2024-05-31 00:21:30 UTC; 2min 5s ago
Process: 6258 ExecStart=/usr/sbin/slurmdbd -D /etc/slurm-llnl/slurmdbd.conf (code=exited, status=0/SUCCESS)

May 31 00:20:00 hannibal-hn systemd[1]: Starting Slurm DBD accounting daemon...
May 31 00:21:30 hannibal-hn systemd[1]: slurmdbd.service: start operation timed out. Terminating.
May 31 00:21:30 hannibal-hn systemd[1]: slurmdbd.service: Failed with result 'timeout'.
May 31 00:21:30 hannibal-hn systemd[1]: Failed to start Slurm DBD accounting daemon.

Even though it is the same command?!

Any ideas?
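One detail in the journal may matter more than the command line itself. systemd logged "Starting..." at 00:20:00 and "start operation timed out" at 00:21:30, exactly 90 seconds later. 90 s is systemd's default start timeout (DefaultTimeoutStartSec), assuming neither the unit's TimeoutStartSec= nor /etc/systemd/system.conf changes it. Below is a minimal Python sketch that measures that gap from the journal lines. The year 2024 is supplied by hand from the systemctl status output, because short journal timestamps omit it.

```python
import re
from datetime import datetime

# systemd's default start timeout (DefaultTimeoutStartSec), assuming neither
# TimeoutStartSec= in the unit nor /etc/systemd/system.conf overrides it.
DEFAULT_START_TIMEOUT_S = 90

# Short journal timestamps such as "May 31 00:20:00" carry no year.
STAMP = re.compile(r"^([A-Z][a-z]{2} +\d{1,2} \d{2}:\d{2}:\d{2}) ", re.MULTILINE)


def start_wait_seconds(journal_text, year=2024):
    """Seconds between the first and last timestamped journal lines."""
    stamps = [
        datetime.strptime(f"{year} {m.group(1)}", "%Y %b %d %H:%M:%S")
        for m in STAMP.finditer(journal_text)
    ]
    return (stamps[-1] - stamps[0]).total_seconds()


# The journal lines from the systemctl status output, unwrapped.
journal = """\
May 31 00:20:00 hannibal-hn systemd[1]: Starting Slurm DBD accounting daemon...
May 31 00:21:30 hannibal-hn systemd[1]: slurmdbd.service: start operation timed out. Terminating.
May 31 00:21:30 hannibal-hn systemd[1]: Failed to start Slurm DBD accounting daemon.
"""

wait = start_wait_seconds(journal)
print(wait, wait == DEFAULT_START_TIMEOUT_S)  # 90.0 True
```

A gap that matches the default suggests that systemd stopped slurmdbd because it never considered the service started, rather than slurmdbd failing on its own. The SIGTERM message discussed earlier in the thread would fit that reading.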


On Thu, May 30, 2024 at 5:02 PM Radhouane Aniba  wrote:

> Thank you Ahmet and Brian,
>
> Ahmet, which config file in particular is slurmdbd reading from? I parsed all
> the .cnf files for MySQL and I cannot find the values it is displaying here:
>
> slurmdbd: debug2: Attempting to connect to localhost:3306
> slurmdbd: debug2: innodb_buffer_pool_size: 134217728
> slurmdbd: debug2: innodb_log_file_size: 50331648
> slurmdbd: debug2: innodb_lock_wait_timeout: 50
> slurmdbd: error: Database settings not recommended values:
> innodb_buffer_pool_size innodb_lock_wait_timeout
>
>
> sudo tree /etc/mysql/*
> /etc/mysql/conf.d
> ├── mysql.cnf
> └── mysqldump.cnf
> /etc/mysql/debian.cnf
> /etc/mysql/debian-start
> /etc/mysql/FROZEN
> /etc/mysql/mariadb.cnf
> /etc/mysql/mariadb.conf.d
> ├── 50-client.cnf
> ├── 50-mysql-clients.cnf
> ├── 50-mysqld_safe.cnf
> └── 50-server.cnf
> /etc/mysql/my.cnf
> /etc/mysql/my.cnf.fallback
> /etc/mysql/mysql.cnf
> /etc/mysql/mysql.conf.d
> ├── mysql.cnf
> └── mysqld.cnf
>
> On Thu, May 30, 2024 at 12:21 PM Brian Andrus via slurm-users <
> slurm-users@lists.schedmd.com> wrote:
>
>> That SIGTERM message means something is telling slurmdbd to quit.
>>
>> Check your cron jobs, maintenance scripts, etc. Slurmdbd is being told to
>> shut down. If you are running in the foreground, a ^C does that. If you run
>> a kill or killall on it, you will get that same message.
>>
>> Brian Andrus
>> On 5/30/2024 6:53 AM, Radhouane Aniba via slurm-users wrote:
>>
>> Yes, I can connect to my database using mysql --user=slurm
>> --password=slurmdbpass slurm_acct_db, and after checking, there is no
>> firewall blocking mysql.
>>
>> Also, here is the output of slurmdbd -D -vvv (note: I can only run this
>> with sudo):
>>
>> sudo slurmdbd -D -vvv
>> slurmdbd: debug: Log file re-opened
>> slurmdbd: debug: Munge authentication plugin loaded
>> slurmdbd: debug2: mysql_connect() called for db slurm_acct_db
>> slurmdbd: debug2: Attempting to connect to localhost:3306
>> slurmdbd: debug2: innodb_buffer_pool_size: 134217728
>> slurmdbd: debug2: innodb_log_file_size: 50331648
>> slurmdbd: debug2: innodb_lock_wait_timeout: 50
>> slurmdbd: error: Database settings not recommended values:
>> innodb_buffer_pool_size innodb_lock_wait_timeout
>> slurmdbd: Accounting storage MYSQL plugin loaded
>> slurmdbd: debug2: ArchiveDir = /tmp
>> slurmdbd: debug2: ArchiveScript = (null)
>> slurmdbd: debug2: AuthAltTypes = (null)
>> slurmdbd: debug2: AuthInfo = (null)
>> slurmdbd: debug2: AuthType = auth/munge
>> slurmdbd: debug2: CommitDelay = 0
>> slurmdbd: debug2: DbdAddr = localhost
>> slurmdbd: debug2: DbdBackupHost = (null)
>> slurmdbd: debug2: DbdHost = hannibal-hn
>> slurmdbd: debug2: DbdPort = 7032
>> slurmdbd: debug2: DebugFlags = (null)
>> slurmdbd: debug2: DebugLevel = 6
>> slurmdbd: debug2: DebugLevelSyslog = 10
>> slurmdbd: debug2: DefaultQOS = (null)
>> slurmdbd: debug2: LogFile = /var/log/slurmdbd.log
>> slurmdbd: debug2: MessageTimeout = 100
>> slurmdbd: debug2: Parameters = (null)
>> slurmdbd: debug2: PidFile = /run/slurmdbd.pid
>> slurmdbd: debug2: PluginDir = /usr/lib/x86_64-linux-gnu/slurm-wlm
>> slurmdbd: debug2: PrivateData = none
>> slurmdbd: debug2: PurgeEventAfter = 1 months*
>> slurmdbd: debug2: PurgeJobAfter = 12 months*
>> slurmdbd: debug2: PurgeResvAfter = 1 months*
>> slurmdbd: debug2: PurgeStepAfter = 1 months
>> slurmdbd: debug2: PurgeSuspendAfter = 1 months
>> slurmdbd: debug2: PurgeTXNAfter = 12 months
>> slurmdbd: debug2: PurgeUsageAfter = 24 months
>> slurmdbd: debug2: SlurmUser = root(0)
>> slurmdbd: debug2: StorageBackupHost = (null)
>> slurmdbd: debug2: StorageHost = localhost
>> slurmdbd: debug2: StorageLoc = slurm_acct_db
>> slurmdbd: debug2: StoragePort = 3306
>> slurmdbd: debug2: StorageType = accounting_storage/mysql
>> slurmdbd: debug2: StorageUser = slurm
>> slurmdbd: debug2: TCPTimeout = 2
>> s

[slurm-users] Re: slurmdbd not connecting to mysql (mariadb)

2024-05-30 Thread Radhouane Aniba via slurm-users
Manually running it with sudo slurmdbd -D /path/to/conf is very quick on
my fresh install.

Trying to start slurmdbd through systemctl takes 3 minutes, and then it
crashes and fails.

Is there an alternative to systemctl for starting slurmdbd in the
background?

But most importantly, I want to know why it takes so long through
systemctl. Maybe I can increase the timeout limit?
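On the timeout question, one common cause of exactly this symptom (the daemon runs fine by hand, but systemd reports "start operation timed out") is a mismatch between the unit's Type= and how the daemon is launched. With Type=forking, systemd treats the service as started only once the ExecStart process exits after forking. slurmdbd -D stays in the foreground, so that never happens, and systemd waits until the start timeout. Type=notify similarly requires the daemon to report readiness to systemd. The unit file was not posted, so this is an assumption to check, not a diagnosis. Below is a minimal, hypothetical Python helper (not a systemd or Slurm tool) that reads a unit's [Service] section and flags the forking/foreground combination. The sample unit is invented for illustration.

```python
# Hypothetical diagnostic, not a Slurm or systemd tool.


def service_settings(unit_text):
    """Collect key=value pairs from the [Service] section (later keys win)."""
    settings, section = {}, None
    for raw in unit_text.splitlines():
        line = raw.strip()
        if not line or line.startswith(("#", ";")):
            continue  # skip blank lines and comments
        if line.startswith("[") and line.endswith("]"):
            section = line[1:-1]
        elif section == "Service" and "=" in line:
            key, _, value = line.partition("=")
            settings[key.strip()] = value.strip()
    return settings


def forking_foreground_mismatch(unit_text):
    """True if Type=forking is combined with a foreground (-D) ExecStart."""
    svc = service_settings(unit_text)
    unit_type = svc.get("Type", "simple")  # systemd default when ExecStart= is set
    foreground = "-D" in svc.get("ExecStart", "").split()
    return unit_type == "forking" and foreground


# Hypothetical unit: the real /etc/systemd/system/slurmdbd.service was not posted.
SAMPLE_UNIT = """\
[Unit]
Description=Slurm DBD accounting daemon

[Service]
Type=forking
ExecStart=/usr/sbin/slurmdbd -D /etc/slurm-llnl/slurmdbd.conf
PIDFile=/run/slurmdbd.pid
"""

print(forking_foreground_mismatch(SAMPLE_UNIT))
```

If the real unit does turn out to be Type=forking, there are two consistent options. One is to drop -D from ExecStart so that slurmdbd daemonizes itself; running slurmdbd without -D is also the plain way to start it in the background without systemctl. The other is to keep -D and set Type=simple in a drop-in created with systemctl edit slurmdbd. Raising TimeoutStartSec would most likely just delay the same termination.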

On Thu, May 30, 2024 at 11:54 PM Ryan Novosielski 
wrote:

> It may take longer to start than systemd allows for. How long does it take
> to start from the command line? It’s common to need to run it manually for
> upgrades to complete.
>
> --
> #BlackLivesMatter
> 
> || \\UTGERS, |---*O*---
> ||_// the State  | Ryan Novosielski - novos...@rutgers.edu
> || \\ University | Sr. Technologist - 973/972.0922 (2x0922) ~*~ RBHS Campus
> ||  \\of NJ  | Office of Advanced Research Computing - MSB
> A555B, Newark
>  `'
>
> On May 30, 2024, at 20:24, Radhouane Aniba via slurm-users <
> slurm-users@lists.schedmd.com> wrote:
>
> OK, I made some progress here.
>
> I removed and purged slurmdbd, mysql, mariadb, etc., and started from
> scratch.
> I added the recommended mysqld settings.
>
> I started slurmdbd manually (sudo slurmdbd -D /path/to/conf) and everything
> worked well.
>
> When I tried to start the service with sudo systemctl start slurmdbd.service,
> it didn't work:
>
> sudo systemctl status  slurmdbd.service
> ● slurmdbd.service - Slurm DBD accounting daemon
>  Loaded: loaded (/etc/systemd/system/slurmdbd.service; enabled; vendor
> preset: enabled)
>  Active: failed (Result: timeout) since Fri 2024-05-31 00:21:30 UTC;
> 2min 5s ago
> Process: 6258 ExecStart=/usr/sbin/slurmdbd -D
> /etc/slurm-llnl/slurmdbd.conf (code=exited, status=0/SUCCESS)
>
> May 31 00:20:00 hannibal-hn systemd[1]: Starting Slurm DBD accounting
> daemon...
> May 31 00:21:30 hannibal-hn systemd[1]: slurmdbd.service: start operation
> timed out. Terminating.
> May 31 00:21:30 hannibal-hn systemd[1]: slurmdbd.service: Failed with
> result 'timeout'.
> May 31 00:21:30 hannibal-hn systemd[1]: Failed to start Slurm DBD
> accounting daemon.
>
> Even though it is the same command?!
>
> Any ideas?
>
>
> On Thu, May 30, 2024 at 5:02 PM Radhouane Aniba  wrote:
>
>> Thank you Ahmet and Brian,
>>
>> Ahmet, which config file in particular is slurmdbd reading from? I parsed all
>> the .cnf files for MySQL and I cannot find the values it is displaying here:
>>
>> slurmdbd: debug2: Attempting to connect to localhost:3306
>> slurmdbd: debug2: innodb_buffer_pool_size: 134217728
>> slurmdbd: debug2: innodb_log_file_size: 50331648
>> slurmdbd: debug2: innodb_lock_wait_timeout: 50
>> slurmdbd: error: Database settings not recommended values:
>> innodb_buffer_pool_size innodb_lock_wait_timeout
>>
>>
>> sudo tree /etc/mysql/*
>> /etc/mysql/conf.d
>> ├── mysql.cnf
>> └── mysqldump.cnf
>> /etc/mysql/debian.cnf
>> /etc/mysql/debian-start
>> /etc/mysql/FROZEN
>> /etc/mysql/mariadb.cnf
>> /etc/mysql/mariadb.conf.d
>> ├── 50-client.cnf
>> ├── 50-mysql-clients.cnf
>> ├── 50-mysqld_safe.cnf
>> └── 50-server.cnf
>> /etc/mysql/my.cnf
>> /etc/mysql/my.cnf.fallback
>> /etc/mysql/mysql.cnf
>> /etc/mysql/mysql.conf.d
>> ├── mysql.cnf
>> └── mysqld.cnf
>>
>> On Thu, May 30, 2024 at 12:21 PM Brian Andrus via slurm-users <
>> slurm-users@lists.schedmd.com> wrote:
>>
>>> That SIGTERM message means something is telling slurmdbd to quit.
>>>
>>> Check your cron jobs, maintenance scripts, etc. Slurmdbd is being told
>>> to shut down. If you are running in the foreground, a ^C does that. If you
>>> run a kill or killall on it, you will get that same message.
>>>
>>> Brian Andrus
>>> On 5/30/2024 6:53 AM, Radhouane Aniba via slurm-users wrote:
>>>
>>> Yes, I can connect to my database using mysql --user=slurm
>>> --password=slurmdbpass slurm_acct_db, and after checking, there is no
>>> firewall blocking mysql.
>>>
>>> Also, here is the output of slurmdbd -D -vvv (note: I can only run this
>>> with sudo):
>>>
>>> sudo slurmdbd -D -vvv
>>> slurmdbd: debug: Log file re-opened
>>> slurmdbd: debug: Munge authentication plugin loaded
>>> slurmdbd: debug2: mysql_connect() called for db slurm_acct_db
>>> slurmdbd: debug2: Attempting to connect to localhost:3306
>>> slurmdbd: debug2: innodb_buffer_pool_size: 134217728
>>> slurmdbd: debug2: innodb_log_file_size: 50331648
>>> slurmdbd: debug2: innodb_lock_wait_timeout: 50
>>> slurmdbd: error: Database settings not recommended values:
>>> innodb_buffer_pool_size innodb_lock_wait_timeout
>>> slurmdbd: Accounting storage MYSQL plugin loaded
>>> slurmdbd: debug2: ArchiveDir = /tmp
>>> slurmdbd: debug2: ArchiveScript = (null)
>>> slurmdbd: debug2: AuthAltTypes = (null)
>>> slurmdbd: debug2: AuthInfo = (null)
>>> slurmdbd: debug2: AuthType = auth/munge
>>> slurmdbd: debug2: CommitDelay = 0
>>> slurmdbd: debug2: DbdAddr = localhost
>>> slurmdbd: debug2: DbdBackupHost = (null)
>>> slurmdbd: debug2: Db

[slurm-users] Re: slurmdbd not connecting to mysql (mariadb)

2024-05-30 Thread Ryan Novosielski via slurm-users
Are you looking at the log/what appears on the screen, and do you know for a 
fact that it is all the way up (should say "version  started” at the 
end)?

If that’s not it, you could have a permissions thing or something.

I do not expect you’d need to extend the timeout for a normal run. I suspect it 
is doing something.

--
#BlackLivesMatter

|| \\UTGERS, |---*O*---
||_// the State  | Ryan Novosielski - novos...@rutgers.edu
|| \\ University | Sr. Technologist - 973/972.0922 (2x0922) ~*~ RBHS Campus
||  \\of NJ  | Office of Advanced Research Computing - MSB A555B, Newark
 `'

On May 30, 2024, at 23:57, Radhouane Aniba  wrote:

Manually running it with sudo slurmdbd -D /path/to/conf is very quick on my
fresh install.

Trying to start slurmdbd through systemctl takes 3 minutes, and then it crashes
and fails.

Is there an alternative to systemctl for starting slurmdbd in the background?

But most importantly, I want to know why it takes so long through systemctl.
Maybe I can increase the timeout limit?

On Thu, May 30, 2024 at 11:54 PM Ryan Novosielski <novos...@rutgers.edu> wrote:
It may take longer to start than systemd allows for. How long does it take to 
start from the command line? It’s common to need to run it manually for 
upgrades to complete.

On May 30, 2024, at 20:24, Radhouane Aniba via slurm-users <slurm-users@lists.schedmd.com> wrote:

OK, I made some progress here.

I removed and purged slurmdbd, mysql, mariadb, etc., and started from scratch.
I added the recommended mysqld settings.

I started slurmdbd manually (sudo slurmdbd -D /path/to/conf) and everything
worked well.

When I tried to start the service with sudo systemctl start slurmdbd.service, it
didn't work:

sudo systemctl status  slurmdbd.service
● slurmdbd.service - Slurm DBD accounting daemon
 Loaded: loaded (/etc/systemd/system/slurmdbd.service; enabled; vendor 
preset: enabled)
 Active: failed (Result: timeout) since Fri 2024-05-31 00:21:30 UTC; 2min 
5s ago
Process: 6258 ExecStart=/usr/sbin/slurmdbd -D /etc/slurm-llnl/slurmdbd.conf 
(code=exited, status=0/SUCCESS)

May 31 00:20:00 hannibal-hn systemd[1]: Starting Slurm DBD accounting daemon...
May 31 00:21:30 hannibal-hn systemd[1]: slurmdbd.service: start operation timed 
out. Terminating.
May 31 00:21:30 hannibal-hn systemd[1]: slurmdbd.service: Failed with result 
'timeout'.
May 31 00:21:30 hannibal-hn systemd[1]: Failed to start Slurm DBD accounting 
daemon.

Even though it is the same command?!

Any ideas?

--
Rad Aniba, PhD




[slurm-users] Re: slurmdbd not connecting to mysql (mariadb)

2024-05-30 Thread Radhouane Aniba via slurm-users
Yes, when I run it manually, it says something like this:

[2024-05-31T00:20:01.142] Accounting storage MYSQL plugin loaded
[2024-05-31T00:20:01.146] slurmdbd version 19.05.5 started

But when I try to do it through systemctl:

[2024-05-31T00:21:30.953] Terminate signal (SIGINT or SIGTERM) received
[2024-05-31T00:21:30.953] debug:  rpc_mgr shutting down
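The "started" line is stamped 00:20:01, one second after systemd logged "Starting Slurm DBD accounting daemon..." at 00:20:00, and the SIGTERM arrives at 00:21:30, the moment systemd gave up. If both excerpts come from the same slurmdbd log, that suggests the systemd-launched slurmdbd did come up and was later stopped by systemd, rather than failing to start. Below is a minimal Python sketch of that timestamp comparison. The window boundaries are copied from the journal excerpt earlier in the thread, with the year taken from the systemctl status output. The helper names are illustrative only.

```python
from datetime import datetime

# Window in which systemd was waiting for slurmdbd (from the journal lines).
SYSTEMD_START = datetime(2024, 5, 31, 0, 20, 0)
SYSTEMD_GAVE_UP = datetime(2024, 5, 31, 0, 21, 30)


def slurm_log_time(line):
    """Parse the leading [YYYY-MM-DDTHH:MM:SS.mmm] stamp of a slurmdbd log line."""
    return datetime.fromisoformat(line[1:line.index("]")])


def started_during_systemd_attempt(log_lines):
    """True if a 'started' line falls inside the window systemd was waiting."""
    return any(
        "started" in line and SYSTEMD_START <= slurm_log_time(line) <= SYSTEMD_GAVE_UP
        for line in log_lines
    )


log_lines = [
    "[2024-05-31T00:20:01.142] Accounting storage MYSQL plugin loaded",
    "[2024-05-31T00:20:01.146] slurmdbd version 19.05.5 started",
    "[2024-05-31T00:21:30.953] Terminate signal (SIGINT or SIGTERM) received",
]
print(started_during_systemd_attempt(log_lines))  # True
```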



On Fri, May 31, 2024 at 12:01 AM Ryan Novosielski 
wrote:

> Are you looking at the log/what appears on the screen, and do you know for
> a fact that it is all the way up (should say "version  started”
> at the end)?
>
> If that’s not it, you could have a permissions thing or something.
>
> I do not expect you’d need to extend the timeout for a normal run. I
> suspect it is doing something.
>
> On May 30, 2024, at 23:57, Radhouane Aniba  wrote:
>
> Manually running it with sudo slurmdbd -D /path/to/conf is very quick
> on my fresh install.
>
> Trying to start slurmdbd through systemctl takes 3 minutes, and then it
> crashes and fails.
>
> Is there an alternative to systemctl for starting slurmdbd in the
> background?
>
> But most importantly, I want to know why it takes so long through
> systemctl. Maybe I can increase the timeout limit?
>
> On Thu, May 30, 2024 at 11:54 PM Ryan Novosielski 
> wrote:
>
>> It may take longer to start than systemd allows for. How long does it
>> take to start from the command line? It’s common to need to run it manually
>> for upgrades to complete.
>>
>> On May 30, 2024, at 20:24, Radhouane Aniba via slurm-users <
>> slurm-users@lists.schedmd.com> wrote:
>>
>> OK, I made some progress here.
>>
>> I removed and purged slurmdbd, mysql, mariadb, etc., and started from
>> scratch.
>> I added the recommended mysqld settings.
>>
>> I started slurmdbd manually (sudo slurmdbd -D /path/to/conf) and everything
>> worked well.
>>
>> When I tried to start the service with sudo systemctl start slurmdbd.service,
>> it didn't work:
>>
>> sudo systemctl status  slurmdbd.service
>> ● slurmdbd.service - Slurm DBD accounting daemon
>>  Loaded: loaded (/etc/systemd/system/slurmdbd.service; enabled;
>> vendor preset: enabled)
>>  Active: failed (Result: timeout) since Fri 2024-05-31 00:21:30 UTC;
>> 2min 5s ago
>> Process: 6258 ExecStart=/usr/sbin/slurmdbd -D
>> /etc/slurm-llnl/slurmdbd.conf (code=exited, status=0/SUCCESS)
>>
>> May 31 00:20:00 hannibal-hn systemd[1]: Starting Slurm DBD accounting
>> daemon...
>> May 31 00:21:30 hannibal-hn systemd[1]: slurmdbd.service: start operation
>> timed out. Terminating.
>> May 31 00:21:30 hannibal-hn systemd[1]: slurmdbd.service: Failed with
>> result 'timeout'.
>> May 31 00:21:30 hannibal-hn systemd[1]: Failed to start Slurm DBD
>> accounting daemon.
>>
>> Even though it is the same command?!
>>
>> Any ideas?
>>
>> --
> *Rad Aniba, PhD*
>
>
>

-- 
*Rad Aniba, PhD*


[slurm-users] Re: slurmdbd not connecting to mysql (mariadb)

2024-05-30 Thread Radhouane Aniba via slurm-users
I also ran both commands using sudo, so I assume permissions should not be
the issue? My cluster user is root (I know, not good, but I'm testing
things out).

On Fri, May 31, 2024 at 12:03 AM Radhouane Aniba  wrote:

> Yes, when I run it manually, it says something like this:
>
> [2024-05-31T00:20:01.142] Accounting storage MYSQL plugin loaded
> [2024-05-31T00:20:01.146] slurmdbd version 19.05.5 started
>
> But when I try to do it through systemctl:
>
> [2024-05-31T00:21:30.953] Terminate signal (SIGINT or SIGTERM) received
> [2024-05-31T00:21:30.953] debug:  rpc_mgr shutting down
>
>
>
> On Fri, May 31, 2024 at 12:01 AM Ryan Novosielski 
> wrote:
>
>> Are you looking at the log/what appears on the screen, and do you know
>> for a fact that it is all the way up (should say "version 
>> started” at the end)?
>>
>> If that’s not it, you could have a permissions thing or something.
>>
>> I do not expect you’d need to extend the timeout for a normal run. I
>> suspect it is doing something.
>>
>> On May 30, 2024, at 23:57, Radhouane Aniba  wrote:
>>
>> Manually running it with sudo slurmdbd -D /path/to/conf is very quick
>> on my fresh install.
>>
>> Trying to start slurmdbd through systemctl takes 3 minutes, and then it
>> crashes and fails.
>>
>> Is there an alternative to systemctl for starting slurmdbd in the
>> background?
>>
>> But most importantly, I want to know why it takes so long through
>> systemctl. Maybe I can increase the timeout limit?
>>
>> On Thu, May 30, 2024 at 11:54 PM Ryan Novosielski 
>> wrote:
>>
>>> It may take longer to start than systemd allows for. How long does it
>>> take to start from the command line? It’s common to need to run it manually
>>> for upgrades to complete.
>>>
>>> On May 30, 2024, at 20:24, Radhouane Aniba via slurm-users <
>>> slurm-users@lists.schedmd.com> wrote:
>>>
>>> OK, I made some progress here.
>>>
>>> I removed and purged slurmdbd, mysql, mariadb, etc., and started from
>>> scratch.
>>> I added the recommended mysqld settings.
>>>
>>> I started slurmdbd manually (sudo slurmdbd -D /path/to/conf) and
>>> everything worked well.
>>>
>>> When I tried to start the service with sudo systemctl start slurmdbd.service,
>>> it didn't work:
>>>
>>> sudo systemctl status  slurmdbd.service
>>> ● slurmdbd.service - Slurm DBD accounting daemon
>>>  Loaded: loaded (/etc/systemd/system/slurmdbd.service; enabled;
>>> vendor preset: enabled)
>>>  Active: failed (Result: timeout) since Fri 2024-05-31 00:21:30 UTC;
>>> 2min 5s ago
>>> Process: 6258 ExecStart=/usr/sbin/slurmdbd -D
>>> /etc/slurm-llnl/slurmdbd.conf (code=exited, status=0/SUCCESS)
>>>
>>> May 31 00:20:00 hannibal-hn systemd[1]: Starting Slurm DBD accounting
>>> daemon...
>>> May 31 00:21:30 hannibal-hn systemd[1]: slurmdbd.service: start
>>> operation timed out. Terminating.
>>> May 31 00:21:30 hannibal-hn systemd[1]: slurmdbd.service: Failed with
>>> result 'timeout'.
>>> May 31 00:21:30 hannibal-hn systemd[1]: Failed to start Slurm DBD
>>> accounting daemon.
>>>
>>> Even though it is the same command?!
>>>
>>> Any ideas?
>>>
>>> --
>> *Rad Aniba, PhD*
>>
>>
>>
>
> --
> *Rad Aniba, PhD*
>
>

-- 
*Rad Aniba, PhD*
