Re: [slurm-users] Aborting a job from inside the prolog

2023-06-20 Thread Alexander Grund
On 19.06.23 at 17:32, Gerhard Strangar wrote: Try to exit with 0, because it's not your prolog that failed. That seemingly works. I do see value in exiting with 1 to drain the node, to investigate why/what exactly has failed, although it may be better not to drain it. I'm a bit nervous wit
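A minimal sketch of that approach, assuming the checks live directly in the Prolog script (the `check_job` helper is hypothetical): per the advice above, the script cancels the job itself and still exits 0, so the node is not drained and the job is not requeued.

```bash
#!/bin/bash
# Prolog sketch (assumption: installed as Prolog= in slurm.conf).
# check_job is a hypothetical, site-specific validation helper.
if ! check_job "$SLURM_JOB_ID" "$SLURM_JOB_USER"; then
    # Cancel the job rather than failing the prolog; a non-zero exit
    # here would drain the node and requeue the (sbatch) job.
    scancel "$SLURM_JOB_ID"
fi
exit 0
```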

Re: [slurm-users] Disable --no-allocate support for a node/SlurmD

2023-06-15 Thread Alexander Grund
Hi, Ah okay, so your requirements include completely insulating (some) jobs from outside access, including root? Correct. I've seen this kind of requirement when working with e.g. non-defaced medical data - generally a tough problem imo, because this level of data security seems more or less incompa

Re: [slurm-users] Disable --no-allocate support for a node/SlurmD

2023-06-14 Thread Alexander Grund
nsidered secure. So while I fully agree that those plugins are better suited and likely easier to use, I fear that it is much easier to prevent them from running, and hence bypass those restrictions, than it would be with something (local) at the level of the SlurmD. Please correct me if I misunderstood anything. Kind Regards, Alexander Grund

[slurm-users] Disable --no-allocate support for a node/SlurmD

2023-06-14 Thread Alexander Grund
Hi, we do some additional checking on a user and the batch script in a Prolog script. However, the `--no-allocate`/`-Z` option bypasses allocation and hence execution of the Prolog/Epilog. Is there a way to configure SlurmD to deny access to jobs without allocations or, more generally, all interactive
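For reference, a sketch of the slurm.conf side that ties such checks to the allocation; the paths are illustrative, and this does not by itself block `-Z` jobs, which skip the allocation entirely:

```
# slurm.conf excerpt (illustrative paths)
Prolog=/etc/slurm/prolog.sh     # runs on the compute node for each job
Epilog=/etc/slurm/epilog.sh
PrologFlags=Alloc               # run the prolog at allocation time, not at first task launch
```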

[slurm-users] Aborting a job from inside the prolog

2023-06-14 Thread Alexander Grund
Hi, we are doing some checking on the user's job inside the prolog script, and upon failure of those checks the job should be canceled. Our first approach with `scancel $SLURM_JOB_ID; exit 1` doesn't seem to work, as the (sbatch) job still gets re-queued. Is this possible at all (i.e. prevent

Re: [slurm-users] Meaning of --cpus-per-task and --mem-per-cpu when SMT processors are used

2020-03-05 Thread Alexander Grund
: debug2: spank: spank_nv_gpufreq.so: exit = 0 So good idea, it seems someone defined "SLURM_HINT=nomultithread" in all users' environments. Removing that makes the allocation succeed. -- ~~ Alexander Grund Interdisziplinäre Anwendungsunterstützung und Koordin
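A quick way to verify and work around this in a similar setup; `SLURM_HINT` and `--hint` are standard, while the concrete request sizes below are only illustrative:

```bash
# Check whether the hint is injected into the user environment
env | grep '^SLURM_HINT='        # e.g. SLURM_HINT=nomultithread

# Drop it for the current shell and retry the allocation
unset SLURM_HINT
salloc --nodes=1 --cpus-per-task=4

# Or override it explicitly for a single step
srun --hint=multithread --cpus-per-task=4 hostname
```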

Re: [slurm-users] Meaning of --cpus-per-task and --mem-per-cpu when SMT processors are used

2020-03-04 Thread Alexander Grund
"CPU" for srun/sbatch/salloc means "(physical) core". "CPU" for scontrol (and pyslurm, which seems to wrap this) means "thread". This is confusing, but at least the question seems to be answered now. -- ~~ Alexander Grund Inter
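A small illustration of the two meanings; the node name and topology numbers are made up, and the core-counting behavior of `--cpus-per-task` is this thread's conclusion for the system in question, not a general rule:

```bash
# scontrol counts hardware threads ("CPUs" = threads)
scontrol show node node001 | grep -E 'CPUTot|CoresPerSocket|ThreadsPerCore'
#   e.g. CoresPerSocket=12 ... CPUTot=48 ... ThreadsPerCore=2

# For srun/sbatch/salloc on that system, "CPU" is counted in physical cores,
# so this requests 4 cores (8 hardware threads with 2-way SMT):
srun --cpus-per-task=4 --pty bash
```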

[slurm-users] Meaning of --cpus-per-task and --mem-per-cpu when SMT processors are used

2020-03-04 Thread Alexander Grund
d what are "Cores" to SLURM? Why does it mix up those two? Most importantly: does this mean `--cpus-per-task` can be as high as 176 on this node, and `--mem-per-cpu` up to the reported "RealMemory"/176? Thanks a lot, Alexander Grund --

[slurm-users] Getting task distribution from environment

2020-02-03 Thread Alexander Grund
CPUs allocated to each task? I know of SLURM_CPUS_PER_TASK, but can the number of CPUs be different per task? Thanks, Alex -- ~~~~~~ Alexander Grund Interdisziplinäre Anwendungsunterstützung und Koordination (IAK) Technische Universität Dresden Zentrum für Informationsdi
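For reference, a sketch of reading the distribution from the job environment; `SLURM_TASKS_PER_NODE`, `SLURM_JOB_CPUS_PER_NODE` and `SLURM_CPUS_PER_TASK` are standard output variables, while the request sizes are only illustrative:

```bash
#!/bin/bash
#SBATCH --nodes=2
#SBATCH --ntasks=5

# Compact per-node task counts, e.g. "3,2" or "2(x2),1"
echo "Tasks per node: ${SLURM_TASKS_PER_NODE}"
# CPUs available on each allocated node, same compact format
echo "CPUs per node:  ${SLURM_JOB_CPUS_PER_NODE}"
# Only set if --cpus-per-task was given explicitly
echo "CPUs per task:  ${SLURM_CPUS_PER_TASK:-not set}"
```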

Re: [slurm-users] SLURM_NTASKS not defined after salloc

2018-07-11 Thread Alexander Grund
Unfortunately this will not work. Example: `salloc --nodes=3 --exclusive`. I'm wondering why there is a discrepancy between the environment variables and scontrol. The latter clearly shows "NumNodes=3 NumCPUs=72 NumTasks=3 CPUs/Task=1" (yes, I realize those values are inconsistent too, but a
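A minimal reproduction of the observation (output values are illustrative and assume the 3-node allocation above):

```bash
$ salloc --nodes=3 --exclusive
$ echo "${SLURM_NTASKS:-unset}"          # not defined in the allocation shell
unset
$ scontrol show job "$SLURM_JOB_ID" | grep -o 'NumTasks=[0-9]*'
NumTasks=3
$ srun env | grep '^SLURM_NTASKS=' | sort -u   # but defined inside a step started by srun
SLURM_NTASKS=3
```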

Re: [slurm-users] SLURM_NTASKS not defined after salloc

2018-07-11 Thread Alexander Grund
röm: On Wed, 11 Jul 2018 14:10:51 +0200 Alexander Grund wrote: > Hi all, > > is it expected/intended that the env variable SLURM_NTASKS is not > defined after salloc? It only gets defined after an srun command. > The number of tasks appears in `scontrol -d show job ` though. >

[slurm-users] SLURM_NTASKS not defined after salloc

2018-07-11 Thread Alexander Grund
Hi all, is it expected/intended that the env variable SLURM_NTASKS is not defined after salloc? It only gets defined after an srun command. The number of tasks appears in `scontrol -d show job ` though. So is this a bug in our installation, or expected behavior? Thanks, Alex