I moved the database to the local filesystem. It helped dramatically: the number of qmaster clients no longer exceeds 18 (it varied widely before), and qping no longer shows any problems.
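For anyone who wants to check where their own spool directory lives, here is a rough sketch that reports the filesystem type behind a path. It assumes a Linux host with /proc/mounts; the function names and the NETWORK_FS set are made up for illustration and are not part of SGE.

```python
import os

# Assumption: filesystem types treated as "network" for this check.
NETWORK_FS = {"nfs", "nfs4", "cifs"}


def parse_mounts(text):
    """Return (mount_point, fs_type) pairs from /proc/mounts-style text."""
    mounts = []
    for line in text.splitlines():
        fields = line.split()
        if len(fields) >= 3:
            # Field 2 is the mount point; /proc/mounts escapes spaces as \040.
            mounts.append((fields[1].replace("\\040", " "), fields[2]))
    return mounts


def fs_type_for(path, mounts):
    """Return the fs type of the longest mount point containing `path`."""
    best_point, best_type = "", None
    for mount_point, fs_type in mounts:
        prefix = mount_point.rstrip("/") + "/"
        # ">=" lets a later mount on the same point shadow an earlier one.
        if (path + "/").startswith(prefix) and len(mount_point) >= len(best_point):
            best_point, best_type = mount_point, fs_type
    return best_type


def spool_fs_type(spool_dir, mounts_file="/proc/mounts"):
    """Look up the filesystem type for `spool_dir` on this host."""
    with open(mounts_file) as f:
        mounts = parse_mounts(f.read())
    return fs_type_for(os.path.realpath(spool_dir), mounts)


def is_network_fs(fs_type):
    """True if the type is one of the network filesystems listed above."""
    return fs_type in NETWORK_FS
```

Calling spool_fs_type() with the qmaster spool path of your installation and passing the result to is_network_fs() tells you whether the spool still sits on a network mount.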
I think it would also help to use asynchronous NFS (although that is a bit dangerous), but since we do not need a shadow qmaster, moving to local storage was the obvious choice.

Ondrej

>-----Original Message-----
>From: SGE-discuss [mailto:sge-discuss-boun...@liverpool.ac.uk] On Behalf Of Ondrej Valousek
>Sent: Wednesday, August 23, 2017 2:03 PM
>To: Reuti <re...@staff.uni-marburg.de>
>Cc: SGE-discuss@liv.ac.uk <sge-disc...@liverpool.ac.uk>
>Subject: Re: [SGE-discuss] Problem with sgemaster & higher number of clients
>
>Hi Reuti,
>
>So if I understand correctly, it is unlikely for qstat to time out with this error because of a JSV issue.
>You say the NFS could cause performance problems, but is it likely to expect them at ~40 clients?
>
>Also, we are using a Netapp-based NFS server which only offers synchronous exports.
>I am also thinking of moving to a Linux-based NFS server with asynchronous exports (much faster round-trip times).
>
>Thanks,
>Ondrej
>
>>-----Original Message-----
>>From: Reuti [mailto:re...@staff.uni-marburg.de]
>>Sent: Wednesday, August 23, 2017 1:25 PM
>>To: Ondrej Valousek <ondrej.valou...@s3group.com>
>>Cc: SGE-discuss@liv.ac.uk <sge-disc...@liverpool.ac.uk>
>>Subject: Re: [SGE-discuss] Problem with sgemaster & higher number of clients
>>
>>Hi,
>>
>>> On 23.08.2017 at 13:02, Ondrej Valousek <ondrej.valou...@s3group.com> wrote:
>>>
>>> Hi List,
>>>
>>> When running qstat, I sometimes receive messages like:
>>> "ERROR: failed receiving gdi request response for mid=1 (got syncron message receive timeout error)".
>>>
>>> Also, qping -info shows a warning/error and a high number of qmaster clients (> 40) at times when I receive messages like the above.
>>> So it seems to me that qmaster is not able to handle a higher number of clients for some reason.
>>>
>>> I am thinking of two possible reasons:
>>>
>>> 1. Buggy jsv script (but jsv should not be executed when running just 'qstat', right?)
>>
>>Correct.
>>
>>
>>> 2. Qmaster spool directory stored on shared NFS storage
>>
>>Yes, it would be better to have it local on the node where the qmaster is running (unless you want to have a redundant setup of two qmasters, where it has to be on a shared device of course).
>>
>>
>>> Could someone tell me more about this? Has anyone experienced a similar issue?
>>> It seems to me that qmaster should handle ~100 clients without any substantial problem (at least machine CPU load is minimal).
>>
>>If your clients are using `qstat` that often, it might be good to throttle the number of invocations of `qstat`. If they need this to start other jobs, one could look into using the job_id/job_name to start the next one, or use `inotify` (Linux) or `qevent` (SGE), and so reduce the poll load.
>>
>>-- Reuti
>-----
>
>The information contained in this e-mail and in any attachments is confidential and is designated solely for the attention of the intended recipient(s).
>If you are not an intended recipient, you must not use, disclose, copy, distribute or retain this e-mail or any part thereof.
>If you have received this e-mail in error, please notify the sender by return e-mail and delete all copies of this e-mail from your computer system(s).
>Please direct any additional queries to: communicati...@s3group.com. Thank You.
>Silicon and Software Systems Limited (S3 Group). Registered in Ireland no. 378073.
>Registered Office: South County Business Park, Leopardstown, Dublin 18.
>
>_______________________________________________
>SGE-discuss mailing list
>SGE-discuss@liv.ac.uk
>https://arc.liv.ac.uk/mailman/listinfo/sge-discuss
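
Regarding the suggestion in the thread to throttle `qstat` invocations: one possible approach, sketched below, is to let a long-running polling script reuse the last `qstat` output for a while instead of contacting qmaster on every loop iteration. The class name CachedCommand, the 30-second default, and the `runner`/`clock` hooks are assumptions made for this illustration and are not part of SGE.

```python
import subprocess
import time


class CachedCommand:
    """Run a command at most once per `ttl` seconds; otherwise reuse its output."""

    def __init__(self, argv, ttl=30.0, runner=None, clock=time.monotonic):
        self.argv = list(argv)
        self.ttl = ttl
        # `runner` and `clock` can be swapped out, e.g. for testing.
        self._runner = runner or self._run
        self._clock = clock
        self._output = None
        self._fetched_at = None

    @staticmethod
    def _run(argv):
        # Raises CalledProcessError if the command exits non-zero.
        result = subprocess.run(argv, capture_output=True, text=True, check=True)
        return result.stdout

    def get(self):
        """Return cached output, refreshing it only when older than `ttl`."""
        now = self._clock()
        if self._fetched_at is None or now - self._fetched_at >= self.ttl:
            self._output = self._runner(self.argv)
            self._fetched_at = now
        return self._output
```

A script would create `CachedCommand(["qstat"])` once and call `.get()` wherever it previously ran `qstat` directly. Note that this cache lives inside a single process, so it only reduces load from scripts that poll in a loop; many independent short-lived callers would need a shared cache (for example a file) instead.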