Hello,

I should preface this by saying that I've only recently started getting my 
head around Grid Engine, and as such may not have all the information I should 
for administering the grid, but someone has to do it. Anyway...


Our company came across an issue recently where one of the nodes becomes very 
delayed in its response to grid submissions. Whether it is a qsub, qrsh or 
qlogin submission, jobs can take anywhere from 30 s to 4-5 min to submit 
successfully. In particular, while users may complain that a qsub job looks 
like it has been submitted but does nothing, doing a qlogin to the node in 
question gives the following:

Your job 287104 ("QLOGIN") has been submitted
waiting for interactive job to be scheduled ...timeout (3 s) expired while waiting on socket fd 7

While searching through the backlogs I've seen a number of forum threads bring 
up this message, but none of them seems to reach a conclusion that I could 
start delving into on our end.

So far, the only thing that has resolved the issue is rebooting the node in 
question, and not having any real idea why it happens is becoming a general 
frustration.

Any initial thoughts or pointers would be greatly appreciated.

Kind Regards,

Derek

_______________________________________________
users mailing list
users@gridengine.org
https://gridengine.org/mailman/listinfo/users
