Hello all,

We have a rather old installation of SGE that has been running for years
without any problems. For the last 2-3 weeks I've been experiencing an
odd problem: when issuing any command (qsub, qstat, qping, etc.) I get
the following error:

    error: commlib error: access denied (server host resolves destination host "<server address>" as "(HOST_NOT_RESOLVABLE)")

    error: unable to contact qmaster using port 6444 on host "<server address>"

There are several odd things about this:

  * Nothing has changed on the server or the clients in the months
    before the error started appearing.
  * This happens from most of the clients, but not all.
  * The error persists for 5-10 minutes, and then everything works fine.
  * Both gethostbyname and gethostbyaddr return the correct values from
    the client while the error occurs (I haven't had a chance to try
    them from the master during these episodes).

I have a feeling that this has something to do with DNS and reverse
lookups, but I don't know where to start debugging it.
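
One possible starting point is to run the same forward and reverse
lookups from the master during an episode, since the error text says it
is the server that fails to resolve the client. The following is only a
minimal sketch in Python, not anything that ships with SGE: the function
name check_round_trip and its forward/reverse parameters are made up for
illustration, and it goes through the system resolver, which may not
behave exactly like SGE's own lookup path.

```python
import socket


def check_round_trip(name,
                     forward=socket.gethostbyname_ex,
                     reverse=socket.gethostbyaddr):
    """Forward-resolve `name`, then reverse-resolve each address found.

    Returns a list of (address, reverse_name_or_None) pairs. An empty
    list means the forward lookup itself failed. The `forward` and
    `reverse` parameters are hypothetical hooks so the resolver calls
    can be swapped out; by default the system resolver is used.
    """
    try:
        _, _, addresses = forward(name)
    except OSError:  # socket.gaierror and socket.herror are OSError subclasses
        return []

    results = []
    for addr in addresses:
        try:
            reverse_name = reverse(addr)[0]
        except OSError:
            # The address has no usable reverse (PTR) entry right now.
            reverse_name = None
        results.append((addr, reverse_name))
    return results
```

Calling check_round_trip() from the master for each affected client
hostname every few seconds, and logging the result with a timestamp,
should show whether a reverse name turns into None at the same moments
the commlib error appears.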

Does anyone have any idea what I should look at?

Thanks,

-- 
Valerio Luccio             (212) 998-8736
Center for Brain Imaging   4 Washington Place, Room 157
New York University        New York, NY 10003

    "In an open world, who needs windows or gates ?"

_______________________________________________
users mailing list
users@gridengine.org
https://gridengine.org/mailman/listinfo/users