Re: [slurm-users] slurm 17.11.2: Socket timed out on send/recv operation

2018-01-23 Thread James A. Peltier
lla Baccarelli" |> |> Sent: Wednesday, January 17, 2018 5:41:54 PM |> Subject: Re: [slurm-users] slurm 17.11.2: Socket timed out on send/recv |> operation |> |> Hi John |> |> thanks for the infos. |> We are investigating the slowdown of sssd and I found some bug |>

Re: [slurm-users] slurm 17.11.2: Socket timed out on send/recv operation

2018-01-22 Thread Alessandro Federico
"John DeSantis" > Cc: hpc-sysmgt-i...@cineca.it, "Slurm User Community List" > , "Isabella Baccarelli" > > Sent: Wednesday, January 17, 2018 5:41:54 PM > Subject: Re: [slurm-users] slurm 17.11.2: Socket timed out on send/recv > operation > > Hi

Re: [slurm-users] slurm 17.11.2: Socket timed out on send/recv operation

2018-01-17 Thread Alessandro Federico
ssage - > From: "John DeSantis" > To: "Alessandro Federico" > Cc: "Slurm User Community List" , "Isabella > Baccarelli" , > hpc-sysmgt-i...@cineca.it > Sent: Wednesday, January 17, 2018 3:30:43 PM > Subject: Re: [slurm-users]

Re: [slurm-users] slurm 17.11.2: Socket timed out on send/recv operation

2018-01-17 Thread John DeSantis
> > thank you very much > ale > > - Original Message - > > From: "John DeSantis" > > To: "Matthieu Hautreux" > > Cc: hpc-sysmgt-i...@cineca.it, "Slurm User Community List" > > , "Isabella Baccarelli" > >

Re: [slurm-users] slurm 17.11.2: Socket timed out on send/recv operation

2018-01-17 Thread Alessandro Federico
k you very much ale - Original Message - > From: "John DeSantis" > To: "Matthieu Hautreux" > Cc: hpc-sysmgt-i...@cineca.it, "Slurm User Community List" > , "Isabella Baccarelli" > > Sent: Tuesday, January 16, 2018 8:20:20 PM > Su

Re: [slurm-users] slurm 17.11.2: Socket timed out on send/recv operation

2018-01-16 Thread John DeSantis
> > > > system. The time-based history has proven more helpful at times > > > > than log contents by themselves. > > > > > > > > See Giovanni Torres' post on setting this up... > > > > > > > > http://gio

Re: [slurm-users] slurm 17.11.2: Socket timed out on send/recv operation

2018-01-16 Thread Matthieu Hautreux
_slurm_rpc_epilog_complete: 1 Note very large processing time from > > _slurm_rpc_job_pack_alloc_info: 3 Note very large processing time > > from _slurm_rpc_step_complete: > > > > processing times are always around tens of seconds. > > > > I'm attaching sdiag outp

Re: [slurm-users] slurm 17.11.2: Socket timed out on send/recv operation

2018-01-16 Thread John DeSantis
e processing time > from _slurm_rpc_step_complete: > > processing times are always around tens of seconds. > > I'm attaching sdiag output and slurm.conf. > > thanks > ale > > ----- Original Message - > > From: "Trevor Cooper" > > To: "S
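The per-RPC counts of "Note very large processing time" warnings quoted above can be tallied from a slurmctld log with a few lines of Python. This is only a minimal sketch: the function name `tally_slow_rpcs` is made up for illustration, and the regular expression assumes nothing about the log line beyond the warning text quoted in this thread.

```python
import re
from collections import Counter

# Matches the warning text quoted in this thread and captures the RPC handler name.
PATTERN = re.compile(r"Note very large processing time from (\w+)")

def tally_slow_rpcs(lines):
    """Count 'very large processing time' warnings per RPC handler.

    `lines` is any iterable of log lines, for example an open file object.
    """
    counts = Counter()
    for line in lines:
        match = PATTERN.search(line)
        if match:
            counts[match.group(1)] += 1
    return counts

# Hypothetical sample lines; only the warning phrase is taken from the thread.
sample = [
    "Note very large processing time from _slurm_rpc_step_complete:",
    "Note very large processing time from _slurm_rpc_epilog_complete:",
    "Note very large processing time from _slurm_rpc_step_complete:",
    "an unrelated log line",
]
for name, count in tally_slow_rpcs(sample).most_common():
    print(count, name)
```

For a real log, pass an open file instead of the sample list, e.g. `tally_slow_rpcs(open(path))`, where `path` is wherever your site writes the slurmctld log.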

Re: [slurm-users] slurm 17.11.2: Socket timed out on send/recv operation

2018-01-16 Thread Alessandro Federico
let you know if it solves the problem. > > > > Thanks > > ale > > > > - Original Message ----- > >> From: "John DeSantis" > >> To: "Alessandro Federico" > >> Cc: slurm-users@lists.schedmd.com, "Isabella Baccare

Re: [slurm-users] slurm 17.11.2: Socket timed out on send/recv operation

2018-01-16 Thread Alessandro Federico
Hi Trevor thank you very much we'll give it a try ale - Original Message - > From: "Trevor Cooper" > To: "Slurm User Community List" > Sent: Tuesday, January 16, 2018 12:10:21 AM > Subject: Re: [slurm-users] slurm 17.11.2: Socket timed out on s

Re: [slurm-users] slurm 17.11.2: Socket timed out on send/recv operation

2018-01-15 Thread Cooper, Trevor
@lists.schedmd.com, "Isabella Baccarelli" >> , hpc-sysmgt-i...@cineca.it >> Sent: Friday, January 12, 2018 7:58:38 PM >> Subject: Re: [slurm-users] slurm 17.11.2: Socket timed out on send/recv >> operation >> >> Ciao Alessandro, >> >>> D

Re: [slurm-users] slurm 17.11.2: Socket timed out on send/recv operation

2018-01-15 Thread Alessandro Federico
"John DeSantis" > To: "Alessandro Federico" > Cc: slurm-users@lists.schedmd.com, "Isabella Baccarelli" > , hpc-sysmgt-i...@cineca.it > Sent: Friday, January 12, 2018 7:58:38 PM > Subject: Re: [slurm-users] slurm 17.11.2: Socket timed out on send/re

Re: [slurm-users] slurm 17.11.2: Socket timed out on send/recv operation

2018-01-12 Thread John DeSantis
Ciao Alessandro,

> Do we have to apply any particular setting to avoid incurring the
> problem?

What is your "MessageTimeout" value in slurm.conf? If it's at the default of 10, try changing it to 20.

I'd also check and see if the slurmctld log is reporting anything pertaining to the server thr
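To check what MessageTimeout a given slurm.conf would yield before changing it, something like the following may help. It is a rough sketch, not Slurm's own parser: the helper name `message_timeout` is hypothetical, the default of 10 is the one mentioned above, and it assumes the parameter sits on its own `Key=Value` line.

```python
def message_timeout(conf_text, default=10):
    """Return the MessageTimeout (seconds) found in slurm.conf text.

    Falls back to `default` when the parameter is absent or commented out.
    Assumes one Key=Value pair per line for this parameter.
    """
    value = default
    for raw in conf_text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments and whitespace
        if not line:
            continue
        key, sep, val = line.partition("=")
        # Parameter names in slurm.conf are case insensitive.
        if sep and key.strip().lower() == "messagetimeout":
            value = int(val.strip())
    return value

# Hypothetical config fragment applying the suggested change.
example_conf = """
# MessageTimeout=10
MessageTimeout=20
"""
print(message_timeout(example_conf))
```

On a live system, `scontrol show config` reports the value the running daemons are actually using, which is the more reliable check after a reconfigure.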

[slurm-users] slurm 17.11.2: Socket timed out on send/recv operation

2018-01-12 Thread Alessandro Federico
Hi all, we are setting up SLURM 17.11.2 on a small test cluster of about 100 nodes. Sometimes we get the error in the subject when running any SLURM command (e.g. sinfo, squeue, scontrol reconf). Do we have to apply any particular setting to avoid incurring the problem? We found t