[slurm-users] Re: sbatch: Node count specification invalid - when only specifying --ntasks

2024-06-11 Thread George Leaver via slurm-users
Hi Loris,

> Doesn't splitting up your jobs over two partitions mean that either one of 
> the two partitions could be full, while the other has idle nodes?

Yes, potentially, and we may move away from our current config at some point
(it's a bit of a hangover from an SGE cluster). It hasn't really been an issue
so far.

Do you find fragmentation a problem? Or do you just let the bf scheduler
handle that (assuming jobs have realistic wallclock requests)?

But for now, it would be handy if users didn't need to adjust their jobscripts
(or we didn't need to write a submit script).

Regards,
George

--
George Leaver
Research Infrastructure, IT Services, University of Manchester
http://ri.itservices.manchester.ac.uk | @UoM_eResearch


--
slurm-users mailing list -- slurm-users@lists.schedmd.com
To unsubscribe send an email to slurm-users-le...@lists.schedmd.com



[slurm-users] Re: sbatch: Node count specification invalid - when only specifying --ntasks

2024-06-11 Thread Loris Bennett via slurm-users
Hi George,

George Leaver via slurm-users  writes:

> Hi Loris,
>
>> Doesn't splitting up your jobs over two partitions mean that either
>> one of the two partitions could be full, while the other has idle
>> nodes?
>
> Yes, potentially, and we may move away from our current config at some
> point (it's a bit of a hangover from an SGE cluster). It hasn't really
> been an issue so far.
>
> Do you find fragmentation a problem? Or do you just let the bf scheduler
> handle that (assuming jobs have realistic wallclock requests)?

Well, with essentially only one partition we don't have fragmentation
related to that.  We did use to have multiple partitions for different
run-times, and then we did have fragmentation.  However, I couldn't see any
advantage in that setup, so we moved to one partition and various QOSs to
handle, say, test or debug jobs.  That said, users do still sometimes add
potentially arbitrary constraints to their job scripts, such as the number
of nodes for MPI jobs.  Whereas in principle it may be a good idea to reduce
the MPI overhead by reducing the number of nodes, in practice any such
advantage may well be cancelled out or exceeded by the extra time the job is
going to have to wait for those specific resources.
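
For illustration only (the QOS name, limits and username below are made-up
examples, not our actual settings), a short test/debug QOS can be created
with sacctmgr and then requested at submit time:

    # create a QOS for short test/debug jobs -- name and limits are examples
    sacctmgr add qos debug
    sacctmgr modify qos debug set Priority=1000 MaxWall=00:30:00 MaxTRESPerUser=cpu=64

    # the QOS also has to be allowed on the user's association, e.g.
    sacctmgr modify user where name=alice set qos+=debug

    # users then submit with
    sbatch --qos=debug job.sh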

Backfill works fairly well for us, although indeed not without a little
badgering of users to get them to specify appropriate run-times.
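
In case it helps, these are roughly the relevant knobs in slurm.conf (the
values and the partition/node names below are purely illustrative, not our
production settings):

    # backfill scheduling; values are examples only
    SchedulerType=sched/backfill
    SchedulerParameters=bf_window=10080,bf_continue,bf_max_job_test=500
    # a sensible DefaultTime/MaxTime on the partition encourages realistic
    # wallclock requests
    PartitionName=main Nodes=node[001-100] DefaultTime=01:00:00 MaxTime=7-00:00:00 State=UP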

> But for now, it would be handy if users didn't need to adjust their
> jobscripts (or we didn't need to write a submit script).

If you ditch one of the partitions, you could always use a job submit
plug-in to replace the invalid partition specified by the job with the
remaining partition.
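
A minimal job_submit.lua along these lines should do it (the partition
names "old_part" and "main" are just placeholders, and this assumes
JobSubmitPlugins=lua is set in slurm.conf):

    -- /etc/slurm/job_submit.lua
    -- Rewrite requests for a retired partition to the remaining one.
    function slurm_job_submit(job_desc, part_list, submit_uid)
        if job_desc.partition == "old_part" then
            job_desc.partition = "main"
            slurm.log_info("job_submit/lua: moved job from old_part to main")
        end
        return slurm.SUCCESS
    end

    function slurm_job_modify(job_desc, job_rec, part_list, submit_uid)
        return slurm.SUCCESS
    end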

Cheers,

Loris


> Regards,
> George
>
> --
> George Leaver
> Research Infrastructure, IT Services, University of Manchester
> http://ri.itservices.manchester.ac.uk | @UoM_eResearch
>
>
> --
> slurm-users mailing list -- slurm-users@lists.schedmd.com
> To unsubscribe send an email to slurm-users-le...@lists.schedmd.com
-- 
Dr. Loris Bennett (Herr/Mr)
FUB-IT (ex-ZEDAT), Freie Universität Berlin

-- 
slurm-users mailing list -- slurm-users@lists.schedmd.com
To unsubscribe send an email to slurm-users-le...@lists.schedmd.com


[slurm-users] Re: srun hostname - Socket timed out on send/recv operation

2024-06-11 Thread Arnuld via slurm-users
I enabled "debug3" logging and saw this in the node log:

error: mpi_conf_send_stepd: unable to resolve MPI plugin offset from
plugin_id=106. This error usually results from a job being submitted
against an MPI plugin which was not compiled into slurmd but was for job
submission command.
error: _send_slurmstepd_init: mpi_conf_send_stepd(9, 106) failed: No error

I removed the "MpiDefault" option from slurm.conf and now "srun -N2 -l
hostname" returns the hostnames of all machines.



On Tue, Jun 11, 2024 at 11:05 AM Arnuld  wrote:

> I have two machines. When I run "srun hostname" on one machine (it's both
> a controller and a node) then I get the hostname fine, but I get a socket
> timed out error in these two situations:
>
> 1) "srun hostname" on 2nd machine (it's a node)
> 2) "srun -N 2 hostname" on controller
>
> "scontrol show node" shows both mach2 and mach4. "sinfo" shows both nodes
> too.  Also, the job gets stuck forever in the CG state after the error. Here is
> the output:
>
> $ srun -N 2 hostname
> mach2
> srun: error: slurm_receive_msgs: [[mach4]:6818] failed: Socket timed out
> on send/recv operation
> srun: error: Task launch for StepId=.0 failed on node hpc4: Socket
> timed out on send/recv operation
> srun: error: Application launch failed: Socket timed out on send/recv
> operation
> srun: Job step aborted
>
>
> Output from "squeue", 3 seconds apart:
>
> Tue Jun 11 05:09:56 2024
>  JOBID PARTITION     NAME     USER ST   TIME  NODES NODELIST(REASON)
>             poxo hostname   arnuld  R   0:19      2 mach4,mach2
>
> Tue Jun 11 05:09:59 2024
>  JOBID PARTITION     NAME     USER ST   TIME  NODES NODELIST(REASON)
>             poxo hostname   arnuld CG   0:20      1 mach4
>
>

-- 
slurm-users mailing list -- slurm-users@lists.schedmd.com
To unsubscribe send an email to slurm-users-le...@lists.schedmd.com