Chip,
Thank you for your prompt response. We could do that, but the helper is
optional, may at times involve additional helpers depending on the inputs
to the problem being solved, and we don't know a priori how many helpers
might be needed.
Alan
On 2/9/24 10:59, Chip Seraphine wrote:
Hi Sylvain,
In the spirit of "better late than never": is this still a problem?
If so, is this a new install or an update?
What environment/compiler are you using? The error
undefined reference to `__nv_init_env'
seems to indicate that you are doing something CUDA-related, which I think
you should not be doing.
If you would like the high-watermark memory utilization after the job
completes, https://github.com/NCAR/peak_memusage is a great tool. Of course,
it has the limitation that you need to know that you want that information
*before* starting the job, which may or may not be a problem for your use
case.
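For example, in a batch script like the one below it would look something
like this (the exact binary name and options depend on your install, so
check the repository README):

#SBATCH -n 256
# assumes the peak_memusage wrapper is on PATH (e.g. via a module or a
# local build); it wraps each task that srun launches
srun peak_memusage solver inputfile

With one wrapper per task, each task's peak usage should get reported as it
finishes.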
Hello,
I'm wondering if there's a way to tell how much memory my job is using
per node. I'm doing
#SBATCH -n 256
srun solver inputfile
When I run sacct -o maxvmsize, the result apparently is the maximum VSZ
of the largest solver process, not the maximum of the sum of them all
(unlike when calli
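[A concrete illustration of the sacct query being described, with a
placeholder job ID; the Max* fields sacct records are per-task maxima per
step, not per-node sums, which matches the behaviour described above:]

sacct -j 123456 --units=G \
      -o JobID,MaxRSS,MaxRSSNode,MaxVMSize,MaxVMSizeNode,NNodes,NTasks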
Managed to narrow it down a little bit. Our groups file is pretty large and we
have a handful of individual groups that are also quite large as shown below
[root@batch1 ~]# wc /etc/group
6075 6075 349457 /etc/group
[root@batch1 ~]# grep 8xxx2 /etc/group | wc -c
56959
It looks like one of th
Hello,
TL;DR: How does the relative QOS flag work?
I have a QOS and I want it to be collectively restricted to 50% of the
reachable cores in the cluster. I’ve been managing this by dividing my core
count by 2 to get N, and doing ‘sacctmgr update qos foobar set MaxTRES=cpu=N’.
That’s fine,
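[A sketch of that manual calculation; the sacctmgr line is quoted from the
question above, while the sinfo parsing and the -i (no-confirmation) flag
are assumptions added here:]

# %C prints CPU counts as allocated/idle/other/total; take the total.
TOTAL_CPUS=$(sinfo -h -o '%C' | awk -F/ '{print $4}')
N=$(( TOTAL_CPUS / 2 ))
sacctmgr -i update qos foobar set MaxTRES=cpu=$N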
Normally I'd address this by having an sbatch script allocate enough resources
for both jobs (specifying one node), and then kick off the helper as a separate
step (assuming I am understanding your issue correctly).
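Something along these lines, for example (program names are made up, and on
some Slurm versions you may need --exact or --exclusive on the sruns so the
two steps can run side by side within the allocation):

#!/bin/bash
#SBATCH -N 1      # keep both on one node
#SBATCH -n 2      # one CPU for the long-running job, one held for the helper

# Each srun is its own job step inside the same allocation; the second
# step can be launched whenever the helper is actually needed.
srun -n 1 ./long_running_job &
srun -n 1 ./helper &
wait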
On 2/9/24, 9:57 AM, "Alan Stange via slurm-users"
mailto:slurm-users@lists.sc
Hello all,
I'm somewhat new to Slurm, but a long-time user of other batch systems.
Assume we have a simple cluster of uniform racks of systems with no
special resources, and our jobs are all single cpu tasks.
Let's say I have a long-running job in the cluster, which needs to spawn
a helper process
Hi Alistair,
I was holding off replying in the hope someone would have a good answer. In
lieu of that, here’s my partial answer:
When I looked at trying to report per-user and per-group qos values a few
months ago, I discovered that SLURM reports the information via this command:
scontrol -o show a
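[The command above is cut off in the archive. Purely to illustrate working
with scontrol's -o output, which prints one record per line as key=value
pairs, a filter might look like the following; the entity and field names
are placeholders, not the actual command:]

scontrol -o show <entity> |
  awk '{ for (i = 1; i <= NF; i++) if ($i ~ /^(Name|MaxTRESPU)=/) printf "%s ", $i; print "" }'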