On 5/16/24 20:27, Yuengling, Philip J. via slurm-users wrote:
I'm writing up some Ansible code to manage Slurm software updates, and I
haven't found any documentation about slurmdbd behavior if the
mysql/mariadb database doesn't upgrade successfully.
I would discourage the proposed Slurm update
On 5/17/24 05:16, Ratnasamy, Fritz via slurm-users wrote:
What is the "official" process to remove nodes safely? I have drained
the nodes so jobs are completed and put them in down state after they are
completely drained.
I edited the slurm.conf file to remove the nodes. After some time, I can
If I'm not mistaken, the man page for slurm.conf (or one of the others) lists either what action is needed when changing each option, or has a combined list of which changes require what (I can never remember and would have to look it up anyway).
Hi,
What is the "official" process to remove nodes safely? I have drained the
nodes so jobs are completed and put them in down state after they are
completely drained.
I edited the slurm.conf file to remove the nodes. After some time, I can
see that the nodes were removed from the partition with
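A rough sketch of the sequence usually suggested, not taken from this thread, with node names and the reason string as placeholders:

    # drain the nodes and wait for running jobs to finish
    scontrol update NodeName=node[01-04] State=DRAIN Reason="decommission"
    # once "scontrol show node" reports them drained, mark them down
    scontrol update NodeName=node[01-04] State=DOWN Reason="decommission"
    # remove the NodeName=/PartitionName= entries from slurm.conf on every
    # host, then restart the daemons; adding or removing nodes is one of the
    # changes a plain "scontrol reconfigure" is generally not enough for
    systemctl restart slurmctld        # on the controller
    systemctl restart slurmd           # on the remaining compute nodes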
Hi,
I have got a very simple LD_PRELOAD that can do this. Maybe I should see if I
can force slurmstepd to be run with that LD_PRELOAD and then see if that does
it.
Ultimately I'm trying to get all the useful accounting metrics into a ClickHouse
database. If the LD_PRELOAD on slurmstepd seems to
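Not an answer to the slurmstepd question, but for the ClickHouse side one hedged sketch is to pull finished-job records from sacct and bulk-load them; the table name slurm_jobs and its column layout are assumptions, and fields such as MaxRSS still carry unit suffixes that would need parsing:

    # export recent job records as tab-separated rows and stream them in
    sacct -a -P -n --delimiter=$'\t' \
          --format=JobID,User,Elapsed,MaxRSS,TotalCPU,State \
          --starttime=2024-05-16 \
      | clickhouse-client --query \
          "INSERT INTO slurm_jobs FORMAT TabSeparated"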
I don't really have an answer for you, just responding to make your message
pop out in the "flood" of other topics we've got since you posted.
On our cluster we configure cancelling our jobs because it makes more sense
for our situation, so I have no experience with jobs resuming from being
suspended
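If the original question was about preemption (it isn't quoted here), the cancel-instead-of-suspend behaviour described above is usually the PreemptMode choice in slurm.conf; an illustrative fragment only, not the poster's actual configuration:

    PreemptType=preempt/partition_prio
    PreemptMode=CANCEL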
Not exactly the answer to your question (which I don't know) but if you can
get to prefix whatever is executed with this
https://github.com/NCAR/peak_memusage (which also uses getrusage) or a
variant you will be able to do that.
On Thu, May 16, 2024 at 4:10 PM Emyr James via slurm-users <slurm-users@lists.schedmd.com> wrote:
Hi,
We are trying out slurm having been running grid engine for a long while.
In Grid Engine, the cgroup peak memory and max_rss are generated at the end of
a job and recorded. It logs the information from the cgroup hierarchy as well
as doing a getrusage() call right at the end on the parent PID
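For comparison, a minimal sketch of reading the same number on a Slurm node with cgroup v2 at the end of a job, e.g. from an Epilog script; the cgroup path is a guess and differs between cgroup v1/v2 and Slurm versions:

    #!/bin/bash
    # hypothetical Epilog sketch: log the job's peak memory from cgroup v2
    cg=/sys/fs/cgroup/system.slice/slurmstepd.scope/job_${SLURM_JOB_ID}
    peak=$(cat "${cg}/memory.peak" 2>/dev/null)
    logger "slurm job ${SLURM_JOB_ID} memory.peak=${peak:-unknown}"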
Hi everyone,
I'm writing up some Ansible code to manage Slurm software updates, and I
haven't found any documentation about slurmdbd behavior if the mysql/mariadb
database doesn't upgrade successfully.
What I do know is that if it is successful, I can expect to see "Conversion done: success!"
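For what it's worth, the manual-conversion step usually suggested looks roughly like this; the database name and file paths are placeholders:

    systemctl stop slurmdbd
    mysqldump --single-transaction slurm_acct_db > slurm_acct_db.pre-upgrade.sql
    # run the new slurmdbd in the foreground so the schema conversion output
    # is visible; stop it with Ctrl-C once it settles, then check the log
    slurmdbd -D -vvv 2>&1 | tee slurmdbd-upgrade.log
    grep "Conversion done: success!" slurmdbd-upgrade.log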
Hi there, SLURM community,
I swear I've done this before, but now it's failing on a new cluster I'm
deploying. We have 6 compute nodes with 64 CPUs each (384 CPUs total). When I
run `srun -n 500 hostname`, the job gets queued since there aren't 500 CPUs
available.
Wasn't there an option that allows
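If the half-remembered option is the one that lets srun start more tasks than there are allocated CPUs, it may be --overcommit (-O); a quick test with the same command:

    srun -n 500 --overcommit hostname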
I figured out that the mailing list may not be appropriate for this message, so
I've created a bug report instead:
https://bugs.schedmd.com/show_bug.cgi?id=19894