Hi,

> On 23.08.2017 at 13:02, Ondrej Valousek <ondrej.valou...@s3group.com> wrote:
>
> Hi List,
>
> When running qstat, I sometimes receive messages like:
> "ERROR: failed receiving gdi request response for mid=1 (got syncron message
> receive timeout error)".
>
> Also, qping -info shows a warning/error and a high number of qmaster clients
> (> 40) at the times when I receive messages like the one above.
> So it seems to me that qmaster is not able to handle a higher number of
> clients for some reason.
>
> I can think of two possible reasons:
>
> 1. Buggy jsv script (but jsv should not be executed when running just
> 'qstat', right?)

Correct.

> 2. Qmaster spool directory stored on shared NFS storage

Yes, it would be better to have it local on the node where the qmaster is running (unless you want a redundant setup of two qmasters, in which case it has to be on a shared device, of course).

> Could someone tell me more about this? Has anyone experienced a similar issue?
> It seems to me that qmaster should handle ~100 clients without any substantial
> problem (at least the machine's CPU load is minimal).

If your clients are calling `qstat` that often, it might be good to throttle the number of `qstat` invocations. If they need its output only to start follow-up jobs, one could instead use the job_id/job_name to start the next job, or use `inotify` (Linux) or `qevent` (SGE), either of which reduces the polling load.

-- Reuti

_______________________________________________
SGE-discuss mailing list
SGE-discuss@liv.ac.uk
https://arc.liv.ac.uk/mailman/listinfo/sge-discuss
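
The two suggestions in the reply (throttling `qstat` and chaining jobs by job_id/job_name instead of polling) could be sketched roughly as follows. This is only an illustration under stated assumptions, not something taken from the thread: the names `ThrottledCommand` and `dependent_qsub_argv` and the 30-second interval are hypothetical, and reading "use the job_id/job_name to start the next" as qsub's `-hold_jid` option is my interpretation.

```python
import subprocess
import time


class ThrottledCommand:
    """Cache a command's output and rerun it at most once per `min_interval` seconds.

    Hypothetical helper: many callers can share one instance so that the
    qmaster sees one `qstat` request per interval instead of one per caller.
    """

    def __init__(self, argv, min_interval=30.0, runner=None, clock=time.monotonic):
        self.argv = list(argv)
        self.min_interval = min_interval      # assumed value, tune per site
        self._runner = runner or self._run    # injectable for testing
        self._clock = clock
        self._last_run = None
        self._cached = None

    @staticmethod
    def _run(argv):
        # Run the real command and return its standard output as text.
        result = subprocess.run(
            argv, check=True, capture_output=True, text=True, timeout=60
        )
        return result.stdout

    def output(self):
        """Return cached output, refreshing it only if the interval has elapsed."""
        now = self._clock()
        if self._last_run is None or now - self._last_run >= self.min_interval:
            self._cached = self._runner(self.argv)
            self._last_run = now
        return self._cached


def dependent_qsub_argv(script, after):
    """Build a qsub command line that holds `script` until job `after` is done.

    `after` may be a job id or a job name; `-hold_jid` is assumed here to be
    what "use the job_id/job_name to start the next" refers to.
    """
    return ["qsub", "-hold_jid", str(after), script]
```

A shared `ThrottledCommand(["qstat"])` would replace direct `qstat` calls in wrapper scripts, while `subprocess.run(dependent_qsub_argv("next.sh", job_id))` lets the scheduler start the follow-up job itself, so no polling is needed at all.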