Hi,

> On 08.12.2016 at 04:18, Tim Landscheidt <t...@tim-landscheidt.de> wrote:
>
> (anonymous) wrote:
>
>>> with gridengine-master 6.2u5-7.3 (Ubuntu Trusty), our
>>> /var/lib/gridengine/spool/qmaster/messages gets constantly
>>> filled with:
>
>>> | 12/07/2016 04:11:43|worker|tools-grid-master|E|got load report of unknown exec host "tools-exec-1204.eqiad.wmflabs"
>
>>> (tools-exec-1204.eqiad.wmflabs is a host that no longer
>>> exists.)
>
>>> How can I convince the grid master to "move on",
>>> i.e. "accept" that it did receive a load report from an
>>> unknown host, or "delete" the load report from its inbox?
>
>> Do you have any custom load sensors defined, either on a
>> global or local level per exechost? The machine in question
>> was completely removed and shut down?
>
> I don't think we have any custom load sensors defined, but
> your latter question caused me to reconsider the facts: The
> host was shut down, removed from DNS and an entry for that
> host removed from
> /var/lib/gridengine/default/common/host_aliases, /but/ the
> grid master had not been restarted afterwards, i.e. it was
> still working with the old host_aliases that had an entry
> for that host. After "service gridengine-master restart",
> the error no longer shows up in
> /var/lib/gridengine/spool/qmaster/messages. So I assume the
> outdated host_aliases confused the grid master.

Usually the host_aliases file is read live, i.e. you can change any
entry in it and the change is honored immediately, without a restart.
Maybe the order matters: in your case, removing the host first from
the exec host list, then from host_aliases, and finally from the
network entirely might have worked right away.

-- Reuti
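
The removal order suggested above could be sketched roughly as follows. This is only an illustration, not a tested procedure for 6.2u5: the helper names `drop_host` and `removal_plan` are made up for this example, the one-host-entry-per-line layout of host_aliases is an assumption, and nothing is actually executed. `qconf -de` is the usual way to delete an execution host, but it may refuse while the host is still referenced by a queue or host group.

```python
import shlex

# Path taken from the thread above.
HOST_ALIASES = "/var/lib/gridengine/default/common/host_aliases"


def drop_host(aliases_text, host):
    """Return host_aliases content without any line that names `host`.

    Assumes whitespace-separated names, one host entry per line.
    """
    kept = [line for line in aliases_text.splitlines()
            if host not in line.split()]
    return "\n".join(kept) + ("\n" if kept else "")


def removal_plan(host):
    """Return the ordered steps as text; this is a dry run only."""
    return [
        "qconf -de " + shlex.quote(host),
        "edit " + HOST_ALIASES + " to drop " + host,
        "shut down " + host + " and remove it from DNS",
    ]


if __name__ == "__main__":
    steps = removal_plan("tools-exec-1204.eqiad.wmflabs")
    for number, step in enumerate(steps, 1):
        print(number, step)
```

Keeping the plan as plain strings makes it easy to review the order before running anything by hand on the master.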