I'm using external/ssh in my test cluster (a bunch of VMs), and for some reason the cluster has tried to fence one of the nodes but failed, like this:
Oct 14 15:54:45 ctest0 stonith-ng: [2006]: info: call_remote_stonith: Requesting that ctest2 perform op off ctest1
Oct 14 15:54:45 ctest0 stonith-ng: [2006]: info: call_remote_stonith: No remaining peers capable of terminating ctest1
Oct 14 15:54:45 ctest0 stonith-ng: [2006]: ERROR: remote_op_done: Already sent notifications for 'off of ctest1 by (null)' (op=6ca32814-1272-482f-bb67-f0b46daef78b, for=d49c9501-bcd3-4563-87a0-303f1b6d4c22, state=1): Operation timed out
Oct 14 15:54:46 ctest0 stonith-ng: [2006]: info: call_remote_stonith: Requesting that ctest2 perform op off ctest1
Oct 14 15:54:46 ctest0 stonith-ng: [2006]: info: call_remote_stonith: No remaining peers capable of terminating ctest1
Oct 14 15:54:46 ctest0 stonith-ng: [2006]: ERROR: remote_op_done: Already sent notifications for 'off of ctest1 by (null)' (op=6ca32814-1272-482f-bb67-f0b46daef78b, for=d49c9501-bcd3-4563-87a0-303f1b6d4c22, state=1): Operation timed out
Oct 14 15:54:47 ctest0 stonith-ng: [2006]: info: call_remote_stonith: Requesting that ctest2 perform op off ctest1
Oct 14 15:54:47 ctest0 stonith-ng: [2006]: info: call_remote_stonith: No remaining peers capable of terminating ctest1
Oct 14 15:54:47 ctest0 stonith-ng: [2006]: ERROR: remote_op_done: Already sent notifications for 'off of ctest1 by (null)' (op=6ca32814-1272-482f-bb67-f0b46daef78b, for=d49c9501-bcd3-4563-87a0-303f1b6d4c22, state=1): Operation timed out
Oct 14 15:54:48 ctest0 stonith-ng: [2006]: info: call_remote_stonith: Requesting that ctest2 perform op off ctest1
Oct 14 15:54:48 ctest0 stonith-ng: [2006]: info: call_remote_stonith: No remaining peers capable of terminating ctest1
Oct 14 15:54:48 ctest0 stonith-ng: [2006]: ERROR: remote_op_done: Already sent notifications for 'off of ctest1 by (null)' (op=6ca32814-1272-482f-bb67-f0b46daef78b, for=d49c9501-bcd3-4563-87a0-303f1b6d4c22, state=1): Operation timed out

I'll look into why that is, but the result is that there are 20 'at' jobs in the queue, and every time the machine starts up it shuts itself down again. That's easy enough to fix, but it probably shouldn't happen (even with external/ssh, which is advertised as 'not for production').

Is there a way to schedule an at job in such a way that it cancels a previously scheduled job? I can't see one...

James
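Something along these lines is what I'm after (just an untested sketch; the job-file path and the scheduled command here are invented for illustration, not what external/ssh actually runs):

    # Clear out the stale jobs already sitting in the queue:
    # atq prints one pending job per line, job number first; atrm removes it.
    for job in $(atq | awk '{print $1}'); do atrm "$job"; done

    # What I'd like the agent to be able to do: remember the job it queued
    # last time and remove it before queuing a new one.
    JOBFILE=/var/run/stonith-ssh.atjob   # invented path for this sketch
    [ -f "$JOBFILE" ] && atrm "$(cat "$JOBFILE")" 2>/dev/null
    echo "/sbin/shutdown -h now" | at now + 1 minute 2>&1 \
        | sed -n 's/^job \([0-9]*\).*/\1/p' > "$JOBFILE"

i.e. since at has no notion of named or replaceable jobs, the only approach I can see is keeping the job number around and atrm'ing it yourself.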