On Thu, Apr 19, 2012 at 7:06 PM, Vladislav Bogdanov <bub...@hoster-ok.com> wrote:
> 19.04.2012 11:24, Andreas Kurz wrote:
>> On 04/18/2012 11:46 PM, ruslan usifov wrote:
>>> 2012/4/18 Andreas Kurz <andr...@hastexo.com>
>>>> On 04/17/2012 09:31 PM, ruslan usifov wrote:
>>>>> 2012/4/17 Proskurin Kirill <k.prosku...@corp.mail.ru>
>>>>>> On 04/17/2012 03:46 PM, ruslan usifov wrote:
>>>>>>> 2012/4/17 Andreas Kurz <andr...@hastexo.com>
>>>>>>>> On 04/14/2012 11:14 PM, ruslan usifov wrote:
>>>>>>>>> Hello
>>>>>>>>>
>>>>>>>>> I removed 2 nodes from the cluster with the following sequence:
>>>>>>>>>
>>>>>>>>> crm_node --force -R <id of node1>
>>>>>>>>> crm_node --force -R <id of node2>
>>>>>>>>> cibadmin --delete --obj_type nodes --crm_xml '<node uname="node1"/>'
>>>>>>>>> cibadmin --delete --obj_type status --crm_xml '<node_state uname="node1"/>'
>>>>>>>>> cibadmin --delete --obj_type nodes --crm_xml '<node uname="node2"/>'
>>>>>>>>> cibadmin --delete --obj_type status --crm_xml '<node_state uname="node2"/>'
>>>>>>>>>
>>>>>>>>> The nodes are deleted after this, but if, for example, I restart
>>>>>>>>> (reboot) one of the existing nodes in the working cluster, the deleted
>>>>>>>>> nodes appear again in OFFLINE state.
>>>>>>
>>>>>> I had this problem some time ago.
>>>>>> I "solved" it with something like this:
>>>>>>
>>>>>> crm node delete NODENAME
>>>>>> crm_node --force --remove NODENAME
>>>>>> cibadmin --delete --obj_type nodes --crm_xml '<node uname="NODENAME"/>'
>>>>>> cibadmin --delete --obj_type status --crm_xml '<node_state uname="NODENAME"/>'
>>>>>
>>>>> I do the same, but sometimes after a cluster reconfiguration (a node
>>>>> failed due to a power supply failure) the removed nodes appear again;
>>>>> this has happened 3-4 times.
>>>>
>>>> And the same behavior if you switch your cluster into maintenance-mode
>>>> (to avoid service downtime) and stop/start pacemaker and corosync
>>>> completely?
>>>
>>> We will have a maintenance window this Friday (20.04.2012), so after
>>> that I can report more info.
>>
>> Of course, that is the safest option ... though you won't have any service
>> downtime if you enable maintenance-mode prior to the cluster restart.
>
> Unless you are using DLM (CLVM, GFS2, OCFS2). Then you should not stop
> corosync - dlm_controld uses CPG.
>
> And DLM may use pacemaker components for fencing (cib, attrd, stonith,
> depending on the version).
>
>>> PS: I had a similar situation on another cluster some time ago; there I
>>> fully restarted the cluster and the problem reproduced. But after some
>>> time (about 1-2 weeks) the non-existent nodes ceased to appear.
>>
>> Now that is really strange ... if that happens again, the
>> corosync/pacemaker log files would be really interesting to have a look at.
>
> I recall that this has been a known issue for a rather long time.
> One needs to do a full (not rolling) restart to make a node fully disappear.
> I checked this again not so long ago, and yes, node deletion does not
> work with the current master branch (or very close to it) - the node
> appears again after a pacemaker restart on any other node.
Not really enough info to do anything about.

> Maybe it is because of the lrmd cache, like with failed actions? It looks
> very similar to that.

Nope. The cache is for the local node; if the node is gone, so is its cache.
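
For reference, a minimal sketch of the workaround discussed above: enable
maintenance-mode, do a full (not rolling) restart on every node, then delete
the stale node entries with the same commands already quoted in the thread.
This assumes crmsh and classic init scripts; the exact service names, and
whether you may stop corosync at all (see the DLM caveat above), depend on
your distribution and on whether pacemaker runs as a corosync plugin, so
treat it as a starting point rather than a recipe.

# 1. Keep resources running untouched while the cluster stack is down
crm configure property maintenance-mode=true

# 2. Full (not rolling) stop on EVERY node - do not do this if DLM,
#    CLVM, GFS2 or OCFS2 are in use (see the caveat above)
/etc/init.d/pacemaker stop    # may not exist if pacemaker runs as a plugin
/etc/init.d/corosync stop

# 3. Start the stack again on all nodes
/etc/init.d/corosync start
/etc/init.d/pacemaker start

# 4. Remove the stale node entries (same commands as quoted above;
#    NODENAME is a placeholder for the removed node's uname)
crm node delete NODENAME
crm_node --force --remove NODENAME
cibadmin --delete --obj_type nodes --crm_xml '<node uname="NODENAME"/>'
cibadmin --delete --obj_type status --crm_xml '<node_state uname="NODENAME"/>'

# 5. Verify the node is gone, then hand resources back to the cluster
crm_mon -1
crm configure property maintenance-mode=false

Setting maintenance-mode first is what keeps the services up while pacemaker
and corosync are stopped, which is the point Andreas made above.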