Hi,

> On 08.12.2016 at 04:18, Tim Landscheidt <t...@tim-landscheidt.de> wrote:
> 
> (anonymous) wrote:
> 
>>> with gridengine-master 6.2u5-7.3 (Ubuntu Trusty), our
>>> /var/lib/gridengine/spool/qmaster/messages gets constantly
>>> filled with:
> 
>>> | 12/07/2016 04:11:43|worker|tools-grid-master|E|got load report of unknown exec host "tools-exec-1204.eqiad.wmflabs"
> 
>>> (tools-exec-1204.eqiad.wmflabs is a host that no longer
>>> exists.)
> 
>>> How can I convince the grid master to "move on",
>>> i. e. "accept" that it did receive a load report from an
>>> unknown host, or "delete" the load report from its inbox?
> 
>> Do you have any custom load sensors defined, either on a
>> global or local level per exechost? The machine in question
>> was completely removed and shut down?
> 
> I don't think we have any custom load sensors defined, but
> your latter question caused me to reconsider the facts: The
> host was shut down, removed from DNS and an entry for that
> host removed from
> /var/lib/gridengine/default/common/host_aliases, /but/ the
> grid master had not been restarted afterwards, i. e. it was
> still working with the old host_aliases that had an entry
> for that host.  After "service gridengine-master restart",
> the error no longer shows up in
> /var/lib/gridengine/spool/qmaster/messages.  So I assume the
> outdated host_aliases confused the grid master.

Usually the host_aliases file is read live, i.e. you can change any entry 
therein and it will be honored immediately without a restart. The order of the 
steps may matter, though: in your case, removing the host first from the 
exechost lists, then from host_aliases, and only finally from the network 
completely might have given instant success.
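
To check after each of these steps whether the error has really stopped, one could count the offending entries in the messages file per host. What follows is only a minimal sketch in Python: the pattern is derived from the single log line quoted above (other versions may word the message differently), and the function name `unknown_exec_hosts` is made up for illustration.

```python
import re
from collections import Counter

# Assumption: the message text matches the one log line quoted in this thread.
UNKNOWN_HOST = re.compile(r'got load report of unknown\s+exec host "([^"]+)"')


def unknown_exec_hosts(lines):
    """Count 'unknown exec host' errors per host name.

    `lines` is any iterable of strings, e.g. an open file handle on the
    qmaster messages file.
    """
    counts = Counter()
    for line in lines:
        match = UNKNOWN_HOST.search(line)
        if match:
            counts[match.group(1)] += 1
    return counts
```

Passing it an open handle on /var/lib/gridengine/spool/qmaster/messages (the path from the original report) gives a per-host count; if the count for the removed host stops growing between two runs, the qmaster has stopped logging load reports from it.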

-- Reuti


_______________________________________________
users mailing list
users@gridengine.org
https://gridengine.org/mailman/listinfo/users
