Re: [slurm-users] "Plugin is corrupted" message when using drmaa / debugging libslurm

2022-07-01 Thread Jean-Christophe HAESSIG
On 01/07/2022 15:05, Chris Samuel wrote: > On 29/6/22 09:01, Jean-Christophe HAESSIG wrote: > >> No, the job is placed through the DRMAA API, which enables programs to place >> jobs in a cluster-agnostic way. The program doesn't know it is talking to >> Slurm. The DRMAA

Re: [slurm-users] "Plugin is corrupted" message when using drmaa / debugging libslurm

2022-07-01 Thread Jean-Christophe HAESSIG
On 29/06/2022 15:01, Jean-Christophe HAESSIG wrote: Hi, Turns out I had libslurm36_20.11.7+really20.11.4-2+deb11u1 but slurm-wlm-basic-plugins was only at version 20.11.7+really20.11.4 because libslurm36 was installed some time after the last Slurm upgrade. The libraries were incompatible but
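The mismatch described above (libslurm36 at 20.11.7+really20.11.4-2+deb11u1 while slurm-wlm-basic-plugins stayed at 20.11.7+really20.11.4) can be caught by comparing the installed package versions. A minimal sketch, assuming the versions have already been collected into a dict (for example with `dpkg-query -W`). The helper name `find_version_mismatch` is hypothetical, and it only checks for exact equality; it does not implement Debian's version ordering rules.

```python
from typing import Dict, Optional

# Versions taken from the message above; in practice they could be collected with
# dpkg-query -W -f='${Package} ${Version}\n' libslurm36 slurm-wlm-basic-plugins
INSTALLED = {
    "libslurm36": "20.11.7+really20.11.4-2+deb11u1",
    "slurm-wlm-basic-plugins": "20.11.7+really20.11.4",
}


def find_version_mismatch(installed: Dict[str, str]) -> Optional[Dict[str, str]]:
    """Return the package/version map if the versions differ, else None.

    Hypothetical helper: exact string comparison only, no Debian
    version ordering.
    """
    if len(set(installed.values())) > 1:
        return dict(installed)
    return None


if __name__ == "__main__":
    mismatch = find_version_mismatch(INSTALLED)
    if mismatch:
        # Print each package so the odd one out is easy to spot.
        for pkg, ver in sorted(mismatch.items()):
            print(f"{pkg}: {ver}")
    else:
        print("all Slurm packages report the same version")
```

Exact comparison is deliberately strict: library and plugin packages built from the same source upload carry the same version string, so any difference is worth investigating.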

Re: [slurm-users] "Plugin is corrupted" message when using drmaa / debugging libslurm

2022-06-29 Thread Jean-Christophe HAESSIG
On 28/06/2022 23:14, Chris Samuel wrote: > On 28/6/22 12:19 pm, Jean-Christophe HAESSIG wrote: Hi, > I suspect this is where your error is happening: > > https://github.com/SchedMD/slurm/blob/1ce55318222f89fbc862ce559edfd17e911fee38/src/common/plugin.c#L284 > > Yes I

[slurm-users] "Plugin is corrupted" message when using drmaa / debugging libslurm

2022-06-28 Thread Jean-Christophe HAESSIG
Hi, I'm facing a weird issue where launching a job through drmaa (https://github.com/natefoo/slurm-drmaa) aborts with the message "Plugin is corrupted", but only when that job is placed from one of my compute nodes. Running the command from the login node seems to work. My cluster runs Slurm 2

Re: [slurm-users] sreport outputs invalid values due to corrupted data

2022-03-09 Thread Jean-Christophe HAESSIG
On 09/03/2022 14:46, Loris Bennett wrote: > Hi Jean-Christophe, Hi, >scontrol show runawayjobs Thank you, I didn't know about that functionality. So, I undid the fiddling I had done on the database and ran sacctmgr show runawayjobs. It found the jobs and I 'fixed' them. Apparently it didn't

[slurm-users] sreport outputs invalid values due to corrupted data

2022-03-09 Thread Jean-Christophe HAESSIG
Hi, I recently noticed impossible usage values returned by sreport: my cluster was reportedly used at 100%. Upon further investigation, I found about 6000 jobs launched on 2020-08-31 that were 'COMPLETED' but had their CPUTime still increasing, amounting to about 500 days. The root cause for t
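Jobs like those described above (state COMPLETED but CPUTime still growing) turned out to be runaway jobs, which `sacctmgr show runawayjobs` lists and can fix. As an illustration of the symptom only, the sketch below scans pipe-delimited `sacct` output (as produced with `-P`/`--parsable2`) for COMPLETED jobs that have no end time. It assumes the missing end time is printed as `Unknown`; the sample rows and the `find_suspect_jobs` name are made up for illustration.

```python
from typing import List

# Hypothetical sample of `sacct -P --format=JobID,State,End` output;
# the first line is the header that sacct prints.
SAMPLE = """JobID|State|End
1001|COMPLETED|2020-08-31T10:12:00
1002|COMPLETED|Unknown
1003|RUNNING|Unknown
"""


def find_suspect_jobs(sacct_output: str) -> List[str]:
    """Return job IDs whose state is COMPLETED but whose End is missing.

    Assumes a header line and '|' as the field separator (sacct -P),
    and that a missing end time is shown as 'Unknown'.
    """
    lines = sacct_output.strip().splitlines()
    header = lines[0].split("|")
    # Locate columns by name so the --format order does not matter.
    idx_id, idx_state, idx_end = (header.index(k) for k in ("JobID", "State", "End"))
    suspects = []
    for line in lines[1:]:
        fields = line.split("|")
        if fields[idx_state] == "COMPLETED" and fields[idx_end] == "Unknown":
            suspects.append(fields[idx_id])
    return suspects


if __name__ == "__main__":
    print(find_suspect_jobs(SAMPLE))  # ['1002']
```

A RUNNING job without an end time is expected, so the check requires both conditions; the authoritative fix remains `sacctmgr show runawayjobs`.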

Re: [slurm-users] Using hyperthreaded processors

2020-11-05 Thread Jean-Christophe HAESSIG
On Wednesday 04 November 2020 at 21:41 +, Sebastian T Smith wrote: > Hi, Hi, > We have Hyper-threading/SMT enabled on our cluster. It's challenging > to fully utilize threads, as Brian suggests. We have a few workloads > that benefit from it being enabled Our cluster services tasks in the

[slurm-users] Using hyperthreaded processors

2020-11-04 Thread Jean-Christophe HAESSIG
Hi, I would like to make good use of hyperthreaded processors and I have already skimmed through a number of posts and the documentation. It is pretty clear that Slurm likes to allocate processing units up to the core level, and to be able to allocate threads one has to either: - not declare Sockets/Co
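The core-versus-thread distinction matters because, per the slurm.conf documentation, when `CPUs` is omitted a node's CPU count defaults to the product of Boards, Sockets, CoresPerSocket and ThreadsPerCore. The arithmetic below is a small sketch with a made-up node (1 board, 2 sockets, 16 cores per socket, 2 threads per core) showing how many CPUs a whole-core allocation consumes. The helper names are hypothetical, and the rounding up to whole cores is an assumption that mirrors the core-level allocation described above.

```python
import math


def default_cpus(boards: int, sockets: int, cores_per_socket: int, threads_per_core: int) -> int:
    """Logical CPUs Slurm assumes for a node when CPUs= is omitted in slurm.conf."""
    return boards * sockets * cores_per_socket * threads_per_core


def cpus_consumed(requested_threads: int, threads_per_core: int) -> int:
    """CPUs used if an allocation is rounded up to whole cores (illustrative assumption)."""
    cores = math.ceil(requested_threads / threads_per_core)
    return cores * threads_per_core


if __name__ == "__main__":
    # Hypothetical node: 1 board, 2 sockets, 16 cores per socket, 2 threads per core.
    print(default_cpus(1, 2, 16, 2))  # 64
    print(cpus_consumed(1, 2))        # 2: a single-thread task still occupies a full core
    print(cpus_consumed(3, 2))        # 4: three threads round up to two cores
```

The gap between requested threads and consumed CPUs is exactly the capacity that goes unused when single-threaded tasks are packed at core granularity.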