[slurm-users] Re: sbatch: Node count specification invalid - when only specifying --ntasks
Hi Loris,

> Doesn't splitting up your jobs over two partitions mean that either one of
> the two partitions could be full, while the other has idle nodes?

Yes, potentially, and we may move away from our current config at some point
(it's a bit of a hangover from an SGE cluster.) It hasn't really been an
issue so far.

Do you find fragmentation a problem? Or do you just let the bf scheduler
handle that (assuming jobs have a realistic wallclock request)?

But for now, it would be handy if users didn't need to adjust their
jobscripts (or we didn't need to write a submit script.)

Regards,
George

--
George Leaver
Research Infrastructure, IT Services, University of Manchester
http://ri.itservices.manchester.ac.uk | @UoM_eResearch

--
slurm-users mailing list -- slurm-users@lists.schedmd.com
To unsubscribe send an email to slurm-users-le...@lists.schedmd.com
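As background, the jobscripts in question look roughly like the following
sketch: the user requests only a task count and a realistic wallclock limit
(which is what lets backfill do its job), and leaves the node count and
partition choice to the scheduler. Account, module and executable names here
are placeholders.

    #!/bin/bash
    #SBATCH --job-name=my_mpi_job
    #SBATCH --ntasks=96           # total MPI ranks; no --nodes, no --partition
    #SBATCH --time=08:00:00       # realistic wallclock helps the backfill scheduler
    #SBATCH --account=myproject   # placeholder

    module load mympi/1.0         # placeholder module
    srun ./my_mpi_app             # placeholder executable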
[slurm-users] Re: sbatch: Node count specification invalid - when only specifying --ntasks
Hi George,

George Leaver via slurm-users writes:

> Hi Loris,
>
>> Doesn't splitting up your jobs over two partitions mean that either
>> one of the two partitions could be full, while the other has idle
>> nodes?
>
> Yes, potentially, and we may move away from our current config at some
> point (it's a bit of a hangover from an SGE cluster.) Hasn't really
> been an issue at the moment.
>
> Do you find fragmentation a problem? Or do you just let the bf scheduler
> handle that (assuming jobs have a realistic wallclock request?)

Well, with essentially only one partition we don't have fragmentation
related to that. When we did have multiple partitions for different
run-times, we did have fragmentation. However, I couldn't see any
advantage in that setup, so we moved to one partition and various QOSs
to handle, say, test or debug jobs.

However, users do still sometimes add potentially arbitrary conditions
to their job scripts, such as the number of nodes for MPI jobs. Whereas
in principle it may be a good idea to reduce the MPI overhead by
reducing the number of nodes, in practice any such advantage may well
be cancelled out or exceeded by the extra time the job has to wait for
those specific resources.

Backfill works fairly well for us, although indeed not without a little
badgering of users to get them to specify appropriate run-times.

> But for now, would be handy if users didn't need to adjust their jobscripts
> (or we didn't need to write a submit script.)

If you ditch one of the partitions, you could always use a job_submit
plugin to replace the invalid partition specified by the job with the
remaining partition.

Cheers,

Loris

> Regards,
> George
>
> --
> George Leaver
> Research Infrastructure, IT Services, University of Manchester
> http://ri.itservices.manchester.ac.uk | @UoM_eResearch
>
> --
> slurm-users mailing list -- slurm-users@lists.schedmd.com
> To unsubscribe send an email to slurm-users-le...@lists.schedmd.com

--
Dr. Loris Bennett (Herr/Mr)
FUB-IT (ex-ZEDAT), Freie Universität Berlin

--
slurm-users mailing list -- slurm-users@lists.schedmd.com
To unsubscribe send an email to slurm-users-le...@lists.schedmd.com
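A minimal sketch of the job_submit/lua approach described above, assuming
JobSubmitPlugins=lua is set in slurm.conf and the script is installed as
job_submit.lua in the slurm.conf directory; the partition names "serial" and
"multicore" are placeholders for the retired and remaining partitions:

    -- job_submit.lua: remap a retired partition name to the remaining one.
    -- Partition names below are placeholders.
    function slurm_job_submit(job_desc, part_list, submit_uid)
       if job_desc.partition == "serial" then
          slurm.log_info("remapping partition serial -> multicore for uid %u",
                         submit_uid)
          job_desc.partition = "multicore"
       end
       return slurm.SUCCESS
    end

    function slurm_job_modify(job_desc, job_rec, part_list, submit_uid)
       return slurm.SUCCESS
    end

With something like this in place, existing jobscripts that still name the
old partition keep working without users having to edit them.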
[slurm-users] Re: srun hostname - Socket timed out on send/recv operation
I enabled "debug3" logging and saw this in the node log: error: mpi_conf_send_stepd: unable to resolve MPI plugin offset from plugin_id=106. This error usually results from a job being submitted against an MPI plugin which was not compiled into slurmd but was for job submission command. error: _send_slurmstepd_init: mpi_conf_send_stepd(9, 106) failed: No error I removed "MpiDefault" option from slurm.conf and now "srun -N2 -l hostname" returns hostnames of all machines On Tue, Jun 11, 2024 at 11:05 AM Arnuld wrote: > I have two machines. When I run "srum hostname" on one machine (it's both > a controller and a node) then I get the hostname fine but I get socket > timed out error in these two situations: > > 1) "srun hostname" on 2nd machine (it's a node) > 2) "srun -N 2 hostname" on controller > > "scontrol show node" shows both mach2 and mach4. "sinfo" shows both nodes > too. Also the job gets stuck forever in CG state after the error. Here is > the output: > > $ srun -N 2 hostname > mach2 > srun: error: slurm_receive_msgs: [[mach4]:6818] failed: Socket timed out > on send/recv operation > srun: error: Task launch for StepId=.0 failed on node hpc4: Socket > timed out on send/recv operation > srun: error: Application launch failed: Socket timed out on send/recv > operation > srun: Job step aborted > > > Output form "squeue" 3 seconds apart > > Tue Jun 11 05:09:56 2024 > JOBID PARTITION NAME USER ST TIME NODES > NODELIST(REASON) > poxo hostname arnuld R 0:19 2 > mach4,mach2 > > Tue Jun 11 05:09:59 2024 > JOBID PARTITION NAME USER ST TIME NODES > NODELIST(REASON) > poxo hostname arnuld CG 0:20 1 mach4 > > -- slurm-users mailing list -- slurm-users@lists.schedmd.com To unsubscribe send an email to slurm-users-le...@lists.schedmd.com