On 10/16/2010 at 09:45 AM, Jai <away...@gmail.com> wrote:
> I have set up a DRBD->Xen failover cluster. Last night at around 02:50 it
> failed over the resources from server "bravo" to "alpha". I'm trying to find
> out what caused the failover. I don't see anything in the logs that
> indicates the cause, but I don't really know what to look for. If someone
> could help me understand these logs and what I'm looking for, that would be
> great. I'm not even sure how far back I need to go.
I reckon it's this:

  Oct 16 02:46:04 bravo attrd: [25098]: info: attrd_perform_update: Sent update 161: pingval=0

which suggests bravo lost connectivity to 12.12.12.1 around that time, causing
the failover.

For reference, if you're looking at pengine logs: a few lines above where it
says "info: process_pe_message: Transition NNN: PEngine Input stored in:
/var/lib/pengine/pe-input-MMM.bz2", you'll see what it's about to do to your
resources. If this is just "Leave resource FOO (Started/Master/Slave etc.)",
that transition is probably boring. If it says "Start FOO (...)" or
"Promote/Demote/Stop FOO (...)", it means something has changed. Scroll up a
bit further, to above where pengine is saying "unpack_config",
"determine_node_status" etc., and you should see a message suggesting the
cause of the change (failed op, timeout, ping attribute modified, etc.). It
might be a bit inscrutable sometimes, but it'll be there somewhere...

HTH

Tim

-- 
Tim Serong <tser...@novell.com>
Senior Clustering Engineer, OPS Engineering, Novell Inc.

_______________________________________________
Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
http://oss.clusterlabs.org/mailman/listinfo/pacemaker

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://developerbugs.linux-foundation.org/enter_bug.cgi?product=Pacemaker
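The log-reading procedure described in the reply above can be sketched as a small Python script that flags two things: attrd updates where the ping attribute drops to 0, and pengine transitions whose planned actions include anything other than "Leave". This is a minimal sketch under stated assumptions, not a Pacemaker tool. The attribute name pingval and the two marker strings are copied from the log lines quoted in this thread; the exact wording of the per-resource action lines ("Leave resource FOO (...)", "Start FOO (...)", and so on), the extra verbs Move/Restart/Recover, and the idea of filtering on lines containing "pengine" are assumptions that may not match every Pacemaker version. The script name scan_pacemaker_log.py is hypothetical.

```python
import re
import sys
from typing import Iterable, List, Tuple

# Patterns copied from the log lines quoted in this thread. The action-line
# format (verb, optional "resource", name, then "(") is an ASSUMPTION.
PING_RE = re.compile(r"attrd_perform_update: Sent update (\d+): pingval=(\d+)")
TRANSITION_RE = re.compile(
    r"process_pe_message: Transition (\d+): PEngine Input stored in: (\S+)")
ACTION_RE = re.compile(
    r"\b(Leave|Start|Stop|Promote|Demote|Move|Restart|Recover)\s+"
    r"(?:resource\s+)?(\S+)\s+\(")


def scan_log(lines: Iterable[str]) -> List[Tuple[str, str]]:
    """Return (kind, detail) findings that are worth a closer look."""
    findings: List[Tuple[str, str]] = []
    pending: List[Tuple[str, str]] = []  # actions seen since the last transition
    for line in lines:
        ping = PING_RE.search(line)
        if ping and ping.group(2) == "0":
            # The ping attribute dropped to 0: the node lost its ping target(s).
            findings.append(("ping", line.strip()))
            continue
        if "pengine" in line:  # ASSUMPTION: action lines come from pengine
            action = ACTION_RE.search(line)
            if action:
                pending.append((action.group(1), action.group(2)))
                continue
        trans = TRANSITION_RE.search(line)
        if trans:
            # Anything other than "Leave" means the cluster decided to change something.
            changes = [f"{verb} {rsc}" for verb, rsc in pending if verb != "Leave"]
            if changes:
                findings.append((
                    "transition",
                    f"Transition {trans.group(1)} ({trans.group(2)}): "
                    + ", ".join(changes),
                ))
            pending = []  # the next transition starts with a clean slate
    return findings


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python scan_pacemaker_log.py /var/log/messages
    with open(sys.argv[1], errors="replace") as fh:
        for kind, detail in scan_log(fh):
            print(f"[{kind}] {detail}")
```

Run against the syslog that covers the failover window, the "[ping]" and "[transition]" lines give timestamps and transition numbers to start from; the reasoning behind each change still has to be read from the pengine messages above the transition, as described in the reply.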