"Ohlerich, Martin" writes:
> Hello Björn-Helge.
>
>
> Sigh ...
>
> First of all, of course, many thanks! This indeed helped a lot!
Good!
> b) This only works if I have to specify --mem for a task. Although
> manageable, I wonder why one needs to be that restrictive. In
> principle, in the use c
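A minimal sketch of the pattern under discussion (the 2G value and ./my_task are placeholders, not from the original post): each concurrent step asks for an explicit memory share so the steps can be packed into the allocation side by side.

srun -N 1 -n 1 -c 1 --mem=2G --exact ./my_task input1.dat &
srun -N 1 -n 1 -c 1 --mem=2G --exact ./my_task input2.dat &
wait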
Hi Rob,
> Are you just creating those files and then including them in slurm.conf?
Yes.
We’re using puppet, but you could get the same results using jinja2.
The workflow we use is a little convoluted: the original YAML files are
validated, then JSON-formatted data is written to intermediate files
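Roughly (just a sketch, not our actual setup; the j2 CLI from j2cli and the file names here are assumptions), the rendered partial configs are then pulled into slurm.conf with Include:

j2 templates/nodes.conf.j2 nodes.json > /etc/slurm/nodes.conf
j2 templates/partitions.conf.j2 partitions.json > /etc/slurm/partitions.conf
# slurm.conf then just includes the generated files:
#   Include /etc/slurm/nodes.conf
#   Include /etc/slurm/partitions.conf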
Alright. I didn't see that option for GNU parallel. Retrying a task that failed
for a good reason (e.g. due to OOM) probably does not make much sense. And if the
farming job timed out, GNU parallel does not resume from the former state when
that job is restarted, does it? I guess book-keeping is an extra issue
On 18/01/2023 15:22, Ohlerich, Martin wrote:
But Magnus (thanks for the link!) is right. This is still far away from a
feature-rich job- or task-farming concept, where at least some overview of the
passed/failed/missing task statistics is available, etc.
GNU parallel has log output and options
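For instance, a job log plus the resume options cover most of that book-keeping
(a sketch; ./my_task and the file names are placeholders):

parallel --joblog tasks.log --resume-failed --retries 2 \
  -j $SLURM_NTASKS srun -N 1 -n 1 -c 1 --exact ./my_task {} ::: *.input
# tasks.log records exit code and runtime per task; rerunning the same command
# after a timeout skips finished tasks and retries the failed ones.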
Sure ;) My example was just for quick reproducibility.
The complete job farm script is (if that's of interest):
->
#!/bin/bash
#SBATCH -J jobfarm_test
#SBATCH -o log.%x.%j.%N.out
#SBATCH -D ./
#SBATCH --mail-type=NONE
#SBATCH --time=00:05:00
#SBATCH --export
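A sketch of how the body of such a script could continue (assuming one *.input
file per task and a placeholder ./my_task, i.e. not the original script):

for f in *.input; do
    srun -N 1 -n 1 -c 1 --exact ./my_task "$f" &
    # throttle to at most $SLURM_NTASKS concurrent steps (needs bash >= 4.3 for wait -n)
    while (( $(jobs -rp | wc -l) >= SLURM_NTASKS )); do wait -n; done
done
wait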
Generating the *.conf files from parseable/testable sources is an interesting
idea. You mention nodes.conf and partitions.conf. I can't find any
documentation on those. Are you just creating those files and then including
them in slurm.conf?
Rob
From: slurm-
This sounds like a great idea. My org has been strangely resistant to setting
up HA for Slurm; this might be a good enough reason. Thanks.
Rob
From: slurm-users on behalf of Brian Andrus
Sent: Tuesday, January 17, 2023 5:54 PM
To: slurm-users@lists.schedmd.co
Hi Martin,
Just a tip: use gnu parallel instead of a for loop. Much easier and more
powerful.
Like:
parallel -j $SLURM_NTASKS srun -N 1 -n 1 -c 1 --exact ::: *.input
Ward
Hi Martin,
I faced a similar problem where I had to deal with a huge taskfarm
(1000s of tasks processing 1TB of satellite data) with varying run
times and memory requirements. I ended up writing a REST server that
hands out tasks to clients. I then simply fired up an array job where
each job would
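The client side of such a setup can stay very small. A rough sketch (the server
URL and its /next, /done, /failed endpoints are purely hypothetical):

#!/bin/bash
#SBATCH --array=1-100
# each array task keeps asking the task server for work until none is left
while task=$(curl -sf http://taskserver.example:8080/next) && [ -n "$task" ]; do
    if ./process "$task"; then
        curl -sf -X POST "http://taskserver.example:8080/done?task=$task"
    else
        curl -sf -X POST "http://taskserver.example:8080/failed?task=$task"
    fi
done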
Hello Björn-Helge.
Sigh ...
First of all, of course, many thanks! This indeed helped a lot!
Two comments:
a) Why do the interfaces of Slurm tools change? I once learned that interfaces
should be designed to be as stable as possible. Otherwise, users get
frustrated and go away.
b) This only
"Ohlerich, Martin" writes:
> Dear Colleagues,
>
>
> For quite some years now, we have been facing issues again and again on our
> clusters with so-called job-farming (or task-farming) concepts in Slurm jobs
> using srun. And it bothers me that we can hardly help users with requests in
> this
Dear Colleagues,
For quite some years now, we have been facing issues again and again on our
clusters with so-called job-farming (or task-farming) concepts in Slurm jobs
using srun. And it bothers me that we can hardly help users with requests in
this regard.
From the documentation (https: