Hello, does anyone have an idea?
It seems that at 13:06:38 the resources were started on the slave member. But then there is something wrong on server01:

Feb 8 13:06:39 server01 pengine: [19469]: info: determine_online_status: Node server01 is online
Feb 8 13:06:39 server01 pengine: [19469]: notice: unpack_rsc_op: Operation apache2_monitor_0 found resource apache2 active on server01
Feb 8 13:06:39 server01 pengine: [19469]: notice: group_print: Resource Group: supervision-grp
Feb 8 13:06:39 server01 pengine: [19469]: notice: native_print:     fs-data (ocf::heartbeat:Filesystem): Stopped
Feb 8 13:06:39 server01 pengine: [19469]: notice: native_print:     nagios-ip (ocf::heartbeat:IPaddr2): Stopped
Feb 8 13:06:39 server01 pengine: [19469]: notice: native_print:     apache2 (ocf::heartbeat:apache): Started server01
Feb 8 13:06:39 server01 pengine: [19469]: notice: native_print:     nagios (lsb:nagios3): Stopped

But I don't understand what is failing: is it DRBD or apache2 that causes the issue? Any idea?

On 10 February 2012 09:39, Hugo Deprez <hugo.dep...@gmail.com> wrote:
> Hello,
>
> Please find the corosync logs attached to this mail.
> If you have any tips :)
>
> Regards,
>
> Hugo
>
> On 8 February 2012 15:39, Florian Haas <flor...@hastexo.com> wrote:
>
>> On Wed, Feb 8, 2012 at 2:29 PM, Hugo Deprez <hugo.dep...@gmail.com> wrote:
>> > Dear community,
>> >
>> > I am currently running several corosync / DRBD clusters using VMs
>> > running on a VMware ESXi host.
>> > The guest OS is Debian Squeeze.
>> >
>> > The active member of the cluster just froze; the VM was unreachable.
>> > But the resources did not manage to move to the other node.
>> >
>> > My cluster has the following resources:
>> >
>> > Resource Group: grp
>> >     fs-data (ocf::heartbeat:Filesystem):
>> >     nagios-ip (ocf::heartbeat:IPaddr2):
>> >     apache2 (ocf::heartbeat:apache):
>> >     nagios (lsb:nagios3):
>> >     pnp (lsb:npcd):
>> >
>> > I am currently troubleshooting this issue and don't really know where
>> > to look. Of course I had a look at the logs, but it is pretty hard for
>> > me to understand what happened.
>>
>> It's pretty hard for anyone else to understand _without_ logs. :)
>>
>> > I noticed that the VM crashed at 12:09 and that the cluster only tried
>> > to move the resources at 12:58; this does not make sense to me. Or
>> > maybe the host wasn't totally down?
>>
>> Log analysis is where I would start.
>>
>> > One last thing: I noticed that if I start apache2 on the slave server,
>> > corosync doesn't detect that the resource is started. Could that be an
>> > issue?
>>
>> Sure it could, but Pacemaker should happily recover from that.
>>
>> Cheers,
>> Florian
>>
>> --
>> Need help with High Availability?
>> http://www.hastexo.com/now
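PS: in case it is the same trap others have hit, the "unpack_rsc_op: Operation apache2_monitor_0 found resource apache2 active on server01" line usually means the initial probe found apache2 already running even though Pacemaker never started it there (for example because the Debian init script started it at boot). A rough, untested sketch of what I would try, assuming the stock crm shell on Squeeze:

    # make sure init no longer starts apache2 at boot; Pacemaker should own it
    /etc/init.d/apache2 stop
    update-rc.d -f apache2 remove

    # show cluster status once, including fail counts
    crm_mon -1 -f

    # clear the stale state and force a re-probe of apache2
    crm resource cleanup apache2

After that I would expect the group to start in order (fs-data, nagios-ip, apache2, nagios, pnp) on whichever node the cluster picks, but I have not verified this on my setup yet.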