[slurm-users] How can I tell the OS that was used to build SLURM?

2024-06-20 Thread Carl Ponder via slurm-users
We're seeing SLURM misbehaving on one of our clusters, which runs Ubuntu 22.04. Among other problems, we see an error message regarding a missing library version that would have been shipped on Ubuntu 20.04, not 22.04. It's not clear that the library is being called from a SLURM component or

[slurm-users] How do I set SBATCH_EXCLUSIVE to its default value?

2023-05-19 Thread Carl Ponder
The SBATCH_EXCLUSIVE environment variable is supposed to be equivalent to using the --exclusive flag on the command line or in the sbatch header: --exclusive[={user|mcs}] The job allocation can not share nodes with other running jobs (or just other users with the "=user" option or with

[slurm-users] Priority jobs interfering with predictive scheduling

2023-04-12 Thread Carl Ponder
Our cluster has some nodes separated into their own partition for running interactive sessions, which are required to be short and to use only a few nodes. I've always disliked this approach because I see some of the interactive nodes sitting idle while other jobs are waiting on the batch partition.

Re: [slurm-users] slurm and singularity

2023-02-07 Thread Carl Ponder
Take a look at this extension to SLURM: https://github.com/NVIDIA/pyxis You put the container path on the srun command line and each rank runs inside its own copy of the image.

[slurm-users] Is there a way to restrict the NODES*TIME product for a partition?

2022-10-05 Thread Carl Ponder
In our scaling tests, it's normal to expect the job run-times to reduce as we increase the node-counts. Is there a way in SLURM to limit the NODES*TIME product for a partition, or do we just have to define a different partition (with a different duration-limit) for each job size?
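
To make the NODES*TIME idea concrete, here is a minimal bash sketch of the arithmetic only. It is not a SLURM feature or setting: the helper name max_minutes_for_nodes and the 960 node-minute budget are made up for illustration.

```shell
#!/bin/bash
# Hypothetical helper (not part of SLURM): given a fixed node-minute budget,
# print the longest run-time, in whole minutes, that a job using the given
# node-count could request while keeping NODES*TIME within the budget.
max_minutes_for_nodes() {
    local budget_node_minutes="$1"   # assumed budget, e.g. 960 node-minutes
    local nodes="$2"                 # node-count of the job
    if [ "$nodes" -le 0 ]; then
        echo "node-count must be positive" >&2
        return 1
    fi
    # Integer division: the result is rounded down to whole minutes.
    echo $(( budget_node_minutes / nodes ))
}

# Show how the allowed duration shrinks as the node-count grows.
for n in 1 2 4 8; do
    echo "$n nodes -> $(max_minutes_for_nodes 960 "$n") minutes"
done
```

With a single budget like this, one rule would cover every job size instead of one partition per size; whether SLURM can enforce such a rule is the open question in the post.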

Re: [slurm-users] Get original script of a job

2021-03-05 Thread Carl Ponder
I put this line in my job-control file (written in bash) to capture the original as part of the run: cp $0 $RUNDIR/$SLURM_JOB_NAME The $0 gives the full path to the working copy of the script, so it expands to this for example: /fs/slurm/var/spool/job67842/slurm_script It depends on t
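
A slightly more defensive version of the same copy step might look like the sketch below. The function name save_job_script and the fallback file name job_script are hypothetical; RUNDIR is the poster's own variable and is assumed to be set by the job script.

```shell
#!/bin/bash
# Hypothetical helper: copy the script that is currently running into a run
# directory so the original job script is kept alongside the results.
save_job_script() {
    local script_path="$1"                                  # normally "$0"
    local run_dir="$2"                                      # e.g. "$RUNDIR"
    local job_name="${3:-${SLURM_JOB_NAME:-job_script}}"    # fallback name is an assumption
    # Create the run directory if needed, then copy; quoting protects paths
    # that contain spaces.
    mkdir -p "$run_dir" && cp -- "$script_path" "$run_dir/$job_name"
}
```

Inside a batch script this would be called as save_job_script "$0" "$RUNDIR", which does the same thing as the one-line cp above but also creates the directory and tolerates an unset SLURM_JOB_NAME.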

[slurm-users] Differentiating subsets of PENDING state

2020-08-11 Thread Carl Ponder
One of the things I appreciate about SLURM is that I can write simple statements like this: squeue -t R -a -O "nodelist:20,jobid:8,username:14,timelimit:14,timeused:12,PARTITION:13,QOS:18,command:0" that shows the list of running jobs along with stats showing when they can be expected to

[slurm-users] Using "curses" over an "srun" connection

2020-01-27 Thread Carl Ponder
I wrote a script that uses "screen" to create side-by-side windows that run co-operating processes and shows their outputs together. This looks fine when I run it remotely over an "ssh" connection. (Note that I don't need to use "ssh -X"). If I run it over an "srun" connection using forms like t