Re: [slurm-users] srun jobfarming hassle question

2023-01-18 Thread Bjørn-Helge Mevik
"Ohlerich, Martin" writes: > Hello Björn-Helge. > > > Sigh ... > > First of all, of course, many thanks! This indeed helped a lot! Good! > b) This only works if I have to specify --mem for a task. Although > manageable, I wonder why one needs to be that restrictive. In > principle, in the use c

Re: [slurm-users] Maintaining slurm config files for test and production clusters

2023-01-18 Thread Greg Wickham
Hi Rob, > Are you just creating those files and then including them in slurm.conf? Yes. We’re using Puppet, but you could get the same results using Jinja2. The workflow we use is a little convoluted: the original YAML files are validated, then JSON-formatted data is written to intermediate fi
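
As a rough illustration of generating an included file from validated source data, here is a minimal shell sketch. It is not the Puppet workflow described above: the helper name render_nodes_conf and the whitespace-separated input layout (node name, CPU count, memory in MB) are assumptions made for this example. Only the NodeName, CPUs and RealMemory keywords come from slurm.conf itself.

```shell
#!/bin/bash
# Hypothetical helper: render NodeName lines for a nodes.conf file from
# "name cpus memory_mb" records read on stdin. The input layout is an
# assumption for this sketch, not the format used in the thread.
render_nodes_conf() {
    local name cpus mem v
    while read -r name cpus mem; do
        # Skip blank lines and comment lines.
        case "$name" in ''|'#'*) continue ;; esac
        # Validate before writing: both numeric fields must be plain digits.
        for v in "$cpus" "$mem"; do
            case "$v" in
                ''|*[!0-9]*)
                    echo "invalid entry for node: $name" >&2
                    return 1
                    ;;
            esac
        done
        printf 'NodeName=%s CPUs=%s RealMemory=%s\n' "$name" "$cpus" "$mem"
    done
}
```

The output could be redirected to a file such as nodes.conf and pulled into slurm.conf with an Include line. Rejecting bad records before anything is written mirrors the validate-then-generate idea, so a malformed source never produces a partial config on the production cluster.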

Re: [slurm-users] srun jobfarming hassle question

2023-01-18 Thread Ohlerich, Martin
Alright. I didn't see that option for GNU parallel. Retrying a task that failed for good reasons (e.g. due to OOM) probably does not make much sense. And if the farming job timed out, GNU parallel does not start from the former state when that job is restarted, does it? I guess book-keeping is an extra issu

Re: [slurm-users] srun jobfarming hassle question

2023-01-18 Thread Ward Poelmans
On 18/01/2023 15:22, Ohlerich, Martin wrote: > But Magnus (Thanks for the Link!) is right. This is still far away from a feature-rich job- or task-farming concept, where at least some overview of the passed/failed/missing task statistics is available etc. GNU parallel has log output and options
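
To make the passed/failed overview concrete, here is a minimal sketch that summarizes a job log written by GNU parallel's --joblog option. It assumes the default tab-separated log layout, in which the seventh column is Exitval. The helper name summarize_joblog is hypothetical and not part of GNU parallel.

```shell
#!/bin/bash
# Hypothetical helper: count succeeded and failed tasks in a GNU parallel
# job log (created with --joblog). Assumes the default tab-separated
# layout where column 7 holds the task's exit value.
summarize_joblog() {
    awk -F'\t' '
        NR == 1 { next }            # skip the header row
        $7 == 0 { passed++; next }  # exit value 0: task succeeded
        { failed++ }                # any other exit value: task failed
        END { printf "passed=%d failed=%d\n", passed, failed }
    ' "$1"
}
```

A farming job could write the log with something like parallel --joblog farm.log ... and call summarize_joblog farm.log at the end. Tasks that never started do not appear in the log, so the number of missing tasks would have to be derived by comparing the total against the number of input files.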

Re: [slurm-users] srun jobfarming hassle question

2023-01-18 Thread Ohlerich, Martin
Sure ;) My example was just for fast reproducibility. The complete job farm script is (if that's of interest):
->
#!/bin/bash
#SBATCH -J jobfarm_test
#SBATCH -o log.%x.%j.%N.out
#SBATCH -D ./
#SBATCH --mail-type=NONE
#SBATCH --time=00:05:00
#SBATCH --export

Re: [slurm-users] Maintaining slurm config files for test and production clusters

2023-01-18 Thread Groner, Rob
Generating the *.conf files from parseable/testable sources is an interesting idea. You mention nodes.conf and partitions.conf. I can't find any documentation on those. Are you just creating those files and then including them in slurm.conf? Rob From: slurm-

Re: [slurm-users] Maintaining slurm config files for test and production clusters

2023-01-18 Thread Groner, Rob
This sounds like a great idea. My org has been strangely resistant to setting up HA for Slurm; this might be a good enough reason. Thanks. Rob From: slurm-users on behalf of Brian Andrus Sent: Tuesday, January 17, 2023 5:54 PM To: slurm-users@lists.schedmd.co

Re: [slurm-users] srun jobfarming hassle question

2023-01-18 Thread Ward Poelmans
Hi Martin, Just a tip: use gnu parallel instead of a for loop. Much easier and more powerful. Like: parallel -j $SLURM_NTASKS srun -N 1 -n 1 -c 1 --exact ::: *.input Ward

Re: [slurm-users] [ext] Re: srun jobfarming hassle question

2023-01-18 Thread Hagdorn, Magnus Karl Moritz
Hi Martin, I faced a similar problem where I had to deal with a huge taskfarm (1000s of tasks processing 1TB of satellite data) with varying run times and memory requirements. I ended up writing a REST server that hands out tasks to clients. I then simply fired up an array job where each job would

Re: [slurm-users] srun jobfarming hassle question

2023-01-18 Thread Ohlerich, Martin
Hello Björn-Helge. Sigh ... First of all, of course, many thanks! This indeed helped a lot! Two comments: a) Why are the interfaces of Slurm tools changed? I once learned that interfaces must be designed to be as stable as possible. Otherwise, users get frustrated and go away. b) This only

Re: [slurm-users] srun jobfarming hassle question

2023-01-18 Thread Bjørn-Helge Mevik
"Ohlerich, Martin" writes: > Dear Colleagues, > > > for quite some years now, we have again and again been facing issues on our > clusters with so-called job-farming (or task-farming) concepts in Slurm jobs > using srun. And it bothers me that we can hardly help users with requests in > this

[slurm-users] srun jobfarming hassle question

2023-01-18 Thread Ohlerich, Martin
Dear Colleagues, for quite some years now, we have again and again been facing issues on our clusters with so-called job-farming (or task-farming) concepts in Slurm jobs using srun. And it bothers me that we can hardly help users with requests in this regard. From the documentation (https: