On 20 Feb 2014, at 10:04 pm, Andrey Groshev <gre...@yandex.ru> wrote:
> 20.02.2014, 13:57, "Andrew Beekhof" <and...@beekhof.net>:
>> On 20 Feb 2014, at 5:33 pm, Andrey Groshev <gre...@yandex.ru> wrote:
>>> 20.02.2014, 01:22, "Andrew Beekhof" <and...@beekhof.net>:
>>>> On 20 Feb 2014, at 4:18 am, Andrey Groshev <gre...@yandex.ru> wrote:
>>>>> 19.02.2014, 06:47, "Andrew Beekhof" <and...@beekhof.net>:
>>>>>> On 18 Feb 2014, at 9:29 pm, Andrey Groshev <gre...@yandex.ru> wrote:
>>>>>>> Hi, ALL and Andrew!
>>>>>>>
>>>>>>> Today is a good day - I did a lot of killing, and took a lot of fire in return.
>>>>>>> In general - I am happy (almost like an elephant) :)
>>>>>>> Besides the resources, eight processes on the node matter to me:
>>>>>>> corosync, pacemakerd, cib, stonithd, lrmd, attrd, pengine, crmd.
>>>>>>> I killed them with different signals (4, 6, 11 and even 9).
>>>>>>> The behavior does not depend on the signal number - that's good.
>>>>>>> If STONITH sends a reboot to the node, it reboots and rejoins the cluster - also good.
>>>>>>> But the behavior differs depending on which daemon was killed.
>>>>>>>
>>>>>>> They fall into four groups:
>>>>>>> 1. corosync, cib - STONITH works 100%.
>>>>>>> Killed with any signal - STONITH is called and the node reboots.
>>>>>> excellent
>>>>>>> 3. stonithd, attrd, pengine - STONITH not needed.
>>>>>>> These daemons simply restart; resources stay running.
>>>>>> right
>>>>>>> 2. lrmd, crmd - strange STONITH behavior.
>>>>>>> Sometimes STONITH is called - with the corresponding reaction.
>>>>>>> Sometimes the daemon restarts
>>>>>> The daemon will always try to restart, the only variable is how long it takes the peer to notice and initiate fencing.
>>>>>> If the failure happens just before they're due to receive the totem token, the failure will be very quickly detected and the node fenced.
>>>>>> If the failure happens just after, then detection will take longer - giving the node longer to recover and not be fenced.
>>>>>>
>>>>>> So fence/not fence is normal and to be expected.
>>>>>>> and the MS:pgsql resource restarts with a large delay.
>>>>>>> One time after a crmd restart, pgsql did not restart.
>>>>>> I would not expect pgsql to ever restart - if the RA does its job properly anyway.
>>>>>> In the case where the node is not fenced, the crmd will respawn and the PE will request that it re-detect the state of all resources.
>>>>>>
>>>>>> If the agent reports "all good", then there is nothing more to do.
>>>>>> If the agent is not reporting "all good", you should really be asking why.
>>>>>>> 4. pacemakerd - nothing happens.
>>>>>> On non-systemd based machines, correct.
>>>>>>
>>>>>> On a systemd based machine pacemakerd is respawned and reattaches to the existing daemons.
>>>>>> Any subsequent daemon failure will be detected and the daemon respawned.
>>>>> And! I almost forgot about IT!
>>>>> Does another (NORMAL) variant, method or idea exist?
>>>>> Without this ... @$%#$%&$%^&$%^&##@#$$^$%& !!!!!
>>>>> Otherwise - it's a full epic fail ;)
>>>> -ENOPARSE
>>> OK, I'll set aside my personal attitude toward "systemd".
>>> Let me explain.
>>>
>>> Somewhere near the beginning of this thread, I wrote:
>>> A.G.: Who knows who runs lrmd?
>>> A.B.: Pacemakerd.
>>> That's one!
>>>
>>> Let's look at the list of processes:
>>> # ps -axf
>>> .....
>>> 6067 ?  Ssl   7:24 corosync
>>> 6092 ?  S     0:25 pacemakerd
>>> 6094 ?  Ss  116:13  \_ /usr/libexec/pacemaker/cib
>>> 6095 ?  Ss    0:25  \_ /usr/libexec/pacemaker/stonithd
>>> 6096 ?  Ss    1:27  \_ /usr/libexec/pacemaker/lrmd
>>> 6097 ?  Ss    0:49  \_ /usr/libexec/pacemaker/attrd
>>> 6098 ?  Ss    0:25  \_ /usr/libexec/pacemaker/pengine
>>> 6099 ?  Ss    0:29  \_ /usr/libexec/pacemaker/crmd
>>> .....
>>> That's two!
>>
>> What's two? I don't follow.
> In the sense that it spawns the other processes. But that does not matter.
>
>>> And more, more...
>>> Now you should understand why I want this process to always be running.
>>> I don't think anyone here needs that explained!
>>>
>>> And now you tell me "pacemakerd works nicely, but only on systemd distros"!!!
>>
>> No, I'm saying it works _better_ on systemd distros.
>> On non-systemd distros you still need quite a few unlikely-to-happen failures to trigger a situation in which the node still gets fenced and recovered (assuming no-one saw any of the error messages and ran "service pacemaker restart" prior to the additional failures).
>>
> Can you show me the place where:
> "On a systemd based machine pacemakerd is respawned and reattaches to the existing daemons."?

The code for it is in mcp/pacemaker.c, look for find_and_track_existing_processes().
(A rough sketch of the idea is in the P.S. at the bottom of this mail.)

The ps tree will look different though:

6094 ?  Ss  116:13 /usr/libexec/pacemaker/cib
6095 ?  Ss    0:25 /usr/libexec/pacemaker/stonithd
6096 ?  Ss    1:27 /usr/libexec/pacemaker/lrmd
6097 ?  Ss    0:49 /usr/libexec/pacemaker/attrd
6098 ?  Ss    0:25 /usr/libexec/pacemaker/pengine
6099 ?  Ss    0:29 /usr/libexec/pacemaker/crmd
...
6666 ?  S     0:25 pacemakerd

but pacemakerd will be watching the old children and respawning them on failure. At which point you might see:

6094 ?  Ss  116:13 /usr/libexec/pacemaker/cib
6096 ?  Ss    1:27 /usr/libexec/pacemaker/lrmd
6097 ?  Ss    0:49 /usr/libexec/pacemaker/attrd
6098 ?  Ss    0:25 /usr/libexec/pacemaker/pengine
6099 ?  Ss    0:29 /usr/libexec/pacemaker/crmd
...
6666 ?  S     0:25 pacemakerd
6667 ?  Ss    0:25  \_ /usr/libexec/pacemaker/stonithd

> If I respawn the pacemakerd process via upstart, does it "reattach to the existing daemons"?

If upstart is capable of detecting the pacemakerd failure and automagically respawning it, then yes - the same process will happen.
(See the P.S. for a hypothetical upstart job along those lines.)

>
>>> What should I do now?
>>> * Integrate systemd into CentOS?
>>> * Migrate to Fedora?
>>> * Buy RHEL7!?
>>
>> Option 3 is particularly good :)
>
> It's too easy. Normal heroes always take the roundabout way :)
>
>>> Each of those options is fine, but none of them fits my case.
>>>
>>> P.S. And I'm not even talking about distros which haven't migrated to systemd (and won't).
>>
>> Are there any? Even debian and ubuntu have raised the white flag.
>
> That is a digression, certainly, but potentially it could be any Unix-like system.
>
>>> Don't be offended! We do the same.
>>> We build a secret military factory,
>>> put a large concrete fence around it,
>>> top the wall with barbed wire, but forget to install the gates. :)
>>>>>>> And then I can kill any process in the third group. They do not restart.
>>>>>> Until they become needed.
>>>>>> E.g. if the DC goes to invoke the policy engine, that will fail, causing the crmd to fail and the node to be fenced.
>>>>>>> Generally, don't touch corosync, cib and maybe lrmd, crmd.
>>>>>>>
>>>>>>> What do you think about this?
>>>>>>> The main question of this thread we have settled.
>>>>>>> But this varied behavior is another big problem.
>>>>>>>
>>>>>>> 17.02.2014, 08:52, "Andrey Groshev" <gre...@yandex.ru>:
>>>>>>>> 17.02.2014, 02:27, "Andrew Beekhof" <and...@beekhof.net>:
>>>>>>>>> With no quick follow-up, dare one hope that means the patch worked? :-)
>>>>>>>> Hi,
>>>>>>>> No, unfortunately the chief changed my plans on Friday and I spent all day on a parallel project.
>>>>>>>> I hope that today I'll have time to carry out the necessary tests.
>>>>>>>>> On 14 Feb 2014, at 3:37 pm, Andrey Groshev <gre...@yandex.ru> wrote:
>>>>>>>>>> Yes, of course. Starting to build everything and test now )
>>>>>>>>>>
>>>>>>>>>> 14.02.2014, 04:41, "Andrew Beekhof" <and...@beekhof.net>:
>>>>>>>>>>> The previous patch wasn't quite right.
>>>>>>>>>>> Could you try this new one?
>>>>>>>>>>>
>>>>>>>>>>> http://paste.fedoraproject.org/77123/13923376/
>>>>>>>>>>>
>>>>>>>>>>> [11:23 AM] beekhof@f19 ~/Development/sources/pacemaker/devel ☺
>>>>>>>>>>> # git diff
>>>>>>>>>>> diff --git a/crmd/callbacks.c b/crmd/callbacks.c
>>>>>>>>>>> index ac4b905..d49525b 100644
>>>>>>>>>>> --- a/crmd/callbacks.c
>>>>>>>>>>> +++ b/crmd/callbacks.c
>>>>>>>>>>> @@ -199,8 +199,7 @@ peer_update_callback(enum crm_status_type type, crm_node_t * node, const void *d
>>>>>>>>>>>              stop_te_timer(down->timer);
>>>>>>>>>>>
>>>>>>>>>>>              flags |= node_update_join | node_update_expected;
>>>>>>>>>>> -            crm_update_peer_join(__FUNCTION__, node, crm_join_none);
>>>>>>>>>>> -            crm_update_peer_expected(__FUNCTION__, node, CRMD_JOINSTATE_DOWN);
>>>>>>>>>>> +            crmd_peer_down(node, FALSE);
>>>>>>>>>>>              check_join_state(fsa_state, __FUNCTION__);
>>>>>>>>>>>
>>>>>>>>>>>              update_graph(transition_graph, down);
>>>>>>>>>>> diff --git a/crmd/crmd_utils.h b/crmd/crmd_utils.h
>>>>>>>>>>> index bc472c2..1a2577a 100644
>>>>>>>>>>> --- a/crmd/crmd_utils.h
>>>>>>>>>>> +++ b/crmd/crmd_utils.h
>>>>>>>>>>> @@ -100,6 +100,7 @@ void crmd_join_phase_log(int level);
>>>>>>>>>>>  const char *get_timer_desc(fsa_timer_t * timer);
>>>>>>>>>>>  gboolean too_many_st_failures(void);
>>>>>>>>>>>  void st_fail_count_reset(const char * target);
>>>>>>>>>>> +void crmd_peer_down(crm_node_t *peer, bool full);
>>>>>>>>>>>
>>>>>>>>>>>  # define fsa_register_cib_callback(id, flag, data, fn) do { \
>>>>>>>>>>>      fsa_cib_conn->cmds->register_callback( \
>>>>>>>>>>> diff --git a/crmd/te_actions.c b/crmd/te_actions.c
>>>>>>>>>>> index f31d4ec..3bfce59 100644
>>>>>>>>>>> --- a/crmd/te_actions.c
>>>>>>>>>>> +++ b/crmd/te_actions.c
>>>>>>>>>>> @@ -80,11 +80,8 @@ send_stonith_update(crm_action_t * action, const char *target, const char *uuid)
>>>>>>>>>>>          crm_info("Recording uuid '%s' for node '%s'", uuid, target);
>>>>>>>>>>>          peer->uuid = strdup(uuid);
>>>>>>>>>>>      }
>>>>>>>>>>> -    crm_update_peer_proc(__FUNCTION__, peer, crm_proc_none, NULL);
>>>>>>>>>>> -    crm_update_peer_state(__FUNCTION__, peer, CRM_NODE_LOST, 0);
>>>>>>>>>>> -    crm_update_peer_expected(__FUNCTION__, peer, CRMD_JOINSTATE_DOWN);
>>>>>>>>>>> -    crm_update_peer_join(__FUNCTION__, peer, crm_join_none);
>>>>>>>>>>>
>>>>>>>>>>> +    crmd_peer_down(peer, TRUE);
>>>>>>>>>>>      node_state =
>>>>>>>>>>>          do_update_node_cib(peer,
>>>>>>>>>>>                             node_update_cluster | node_update_peer | node_update_join |
>>>>>>>>>>> diff --git a/crmd/te_utils.c b/crmd/te_utils.c
>>>>>>>>>>> index ad7e573..0c92e95 100644
>>>>>>>>>>> --- a/crmd/te_utils.c
>>>>>>>>>>> +++ b/crmd/te_utils.c
>>>>>>>>>>> @@ -247,10 +247,7 @@ tengine_stonith_notify(stonith_t * st, stonith_event_t * st_event)
>>>>>>>>>>>
>>>>>>>>>>>          }
>>>>>>>>>>>
>>>>>>>>>>> -        crm_update_peer_proc(__FUNCTION__, peer, crm_proc_none, NULL);
>>>>>>>>>>> -        crm_update_peer_state(__FUNCTION__, peer, CRM_NODE_LOST, 0);
>>>>>>>>>>> -        crm_update_peer_expected(__FUNCTION__, peer, CRMD_JOINSTATE_DOWN);
>>>>>>>>>>> -        crm_update_peer_join(__FUNCTION__, peer, crm_join_none);
>>>>>>>>>>> +        crmd_peer_down(peer, TRUE);
>>>>>>>>>>>      }
>>>>>>>>>>>  }
>>>>>>>>>>>
>>>>>>>>>>> diff --git a/crmd/utils.c b/crmd/utils.c
>>>>>>>>>>> index 3988cfe..2df53ab 100644
>>>>>>>>>>> --- a/crmd/utils.c
>>>>>>>>>>> +++ b/crmd/utils.c
>>>>>>>>>>> @@ -1077,3 +1077,13 @@ update_attrd_remote_node_removed(const char *host, const char *user_name)
>>>>>>>>>>>      crm_trace("telling attrd to clear attributes for remote host %s", host);
>>>>>>>>>>>      update_attrd_helper(host, NULL, NULL, user_name, TRUE, 'C');
>>>>>>>>>>>  }
>>>>>>>>>>> +
>>>>>>>>>>> +void crmd_peer_down(crm_node_t *peer, bool full)
>>>>>>>>>>> +{
>>>>>>>>>>> +    if(full && peer->state == NULL) {
>>>>>>>>>>> +        crm_update_peer_state(__FUNCTION__, peer, CRM_NODE_LOST, 0);
>>>>>>>>>>> +        crm_update_peer_proc(__FUNCTION__, peer, crm_proc_none, NULL);
>>>>>>>>>>> +    }
>>>>>>>>>>> +    crm_update_peer_join(__FUNCTION__, peer, crm_join_none);
>>>>>>>>>>> +    crm_update_peer_expected(__FUNCTION__, peer, CRMD_JOINSTATE_DOWN);
>>>>>>>>>>> +}
>>>>>>>>>>>
>>>>>>>>>>> On 16 Jan 2014, at 7:24 pm, Andrey Groshev <gre...@yandex.ru> wrote:
>>>>>>>>>>>> 16.01.2014, 01:30, "Andrew Beekhof" <and...@beekhof.net>:
>>>>>>>>>>>>> On 16 Jan 2014, at 12:41 am, Andrey Groshev <gre...@yandex.ru> wrote:
>>>>>>>>>>>>>> 15.01.2014, 02:53, "Andrew Beekhof" <and...@beekhof.net>:
>>>>>>>>>>>>>>> On 15 Jan 2014, at 12:15 am, Andrey Groshev <gre...@yandex.ru> wrote:
>>>>>>>>>>>>>>>> 14.01.2014, 10:00, "Andrey Groshev" <gre...@yandex.ru>:
>>>>>>>>>>>>>>>>> 14.01.2014, 07:47, "Andrew Beekhof" <and...@beekhof.net>:
>>>>>>>>>>>>>>>>>> Ok, here's what happens:
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> 1. node2 is lost
>>>>>>>>>>>>>>>>>> 2. fencing of node2 starts
>>>>>>>>>>>>>>>>>> 3. node2 reboots (and the cluster starts)
>>>>>>>>>>>>>>>>>> 4. node2 returns to the membership
>>>>>>>>>>>>>>>>>> 5. node2 is marked as a cluster member
>>>>>>>>>>>>>>>>>> 6. the DC tries to bring it into the cluster, but needs to cancel the active transition first.
>>>>>>>>>>>>>>>>>>    Which is a problem, since the node2 fencing operation is part of that
>>>>>>>>>>>>>>>>>> 7. node2 is in a transition (pending) state until fencing passes or fails
>>>>>>>>>>>>>>>>>> 8a. fencing fails: the transition completes and the node joins the cluster
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> That's the theory, except we automatically try again. Which isn't appropriate.
>>>>>>>>>>>>>>>>>> This should be relatively easy to fix.
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> 8b. fencing passes: the node is incorrectly marked as offline
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> This I have no idea how to fix yet.
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> On another note, it doesn't look like this agent works at all.
>>>>>>>>>>>>>>>>>> The node has been back online for a long time and the agent is still timing out after 10 minutes.
>>>>>>>>>>>>>>>>>> So "Once the script makes sure that the victim will rebooted and again available via ssh - it exit with 0." does not seem true.
>>>>>>>>>>>>>>>>> Damn. Looks like you're right. At some point I broke my agent and didn't notice. I'll look into it.
>>>>>>>>>>>>>>>> I repaired my agent - after sending the reboot it now waits on STDIN.
>>>>>>>>>>>>>>>> The "normal" behavior is back - it hangs in "pending" until I manually send a reboot. :)
>>>>>>>>>>>>>>> Right. Now you're in case 8b.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Can you try this patch: http://paste.fedoraproject.org/68450/38973966
>>>>>>>>>>>>>> I spent the whole day on experiments.
>>>>>>>>>>>>>> Here is what happens:
>>>>>>>>>>>>>> 1. Built the cluster.
>>>>>>>>>>>>>> 2. On node-2, sent signal (-4) - killed corosync
>>>>>>>>>>>>>> 3. From node-1 (the DC) - stonith sent reboot
>>>>>>>>>>>>>> 4. The node rebooted and resources started.
>>>>>>>>>>>>>> 5. Again: on node-2, sent signal (-4) - killed corosync
>>>>>>>>>>>>>> 6. Again: from node-1 (the DC) - stonith sent reboot
>>>>>>>>>>>>>> 7. Node-2 rebooted and hangs in "pending"
>>>>>>>>>>>>>> 8. Waited, waited..... rebooted manually.
>>>>>>>>>>>>>> 9. Node-2 rebooted and resources started.
>>>>>>>>>>>>>> 10. GOTO step 2
>>>>>>>>>>>>> Logs?
>>>>>>>>>>>> Yesterday I wrote a separate mail about why I could not attach the logs.
>>>>>>>>>>>> Please read it; it contains a few more questions.
>>>>>>>>>>>> Today it started hanging again, in the same cycle.
>>>>>>>>>>>> Logs here: http://send2me.ru/crmrep2.tar.bz2
>>>>>>>>>>>>>>>> New logs: http://send2me.ru/crmrep1.tar.bz2
>>>>>>>>>>>>>>>>>> On 14 Jan 2014, at 1:19 pm, Andrew Beekhof <and...@beekhof.net> wrote:
>>>>>>>>>>>>>>>>>>> Apart from anything else, your timeout needs to be bigger:
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> Jan 13 12:21:36 [17223] dev-cluster2-node1.unix.tensor.ru stonith-ng: ( commands.c:1321 ) error: log_operation: Operation 'reboot' [11331] (call 2 from crmd.17227) for host 'dev-cluster2-node2.unix.tensor.ru' with device 'st1' returned: -62 (Timer expired)
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> On 14 Jan 2014, at 7:18 am, Andrew Beekhof <and...@beekhof.net> wrote:
>>>>>>>>>>>>>>>>>>>> On 13 Jan 2014, at 8:31 pm, Andrey Groshev <gre...@yandex.ru> wrote:
>>>>>>>>>>>>>>>>>>>>> 13.01.2014, 02:51, "Andrew Beekhof" <and...@beekhof.net>:
>>>>>>>>>>>>>>>>>>>>>> On 10 Jan 2014, at 9:55 pm, Andrey Groshev <gre...@yandex.ru> wrote:
>>>>>>>>>>>>>>>>>>>>>>> 10.01.2014, 14:31, "Andrey Groshev" <gre...@yandex.ru>:
>>>>>>>>>>>>>>>>>>>>>>>> 10.01.2014, 14:01, "Andrew Beekhof" <and...@beekhof.net>:
>>>>>>>>>>>>>>>>>>>>>>>>> On 10 Jan 2014, at 5:03 pm, Andrey Groshev <gre...@yandex.ru> wrote:
>>>>>>>>>>>>>>>>>>>>>>>>>> 10.01.2014, 05:29, "Andrew Beekhof" <and...@beekhof.net>:
>>>>>>>>>>>>>>>>>>>>>>>>>>> On 9 Jan 2014, at 11:11 pm, Andrey Groshev <gre...@yandex.ru> wrote:
>>>>>>>>>>>>>>>>>>>>>>>>>>>> 08.01.2014, 06:22, "Andrew Beekhof" <and...@beekhof.net>:
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> On 29 Nov 2013, at 7:17 pm, Andrey Groshev <gre...@yandex.ru> wrote:
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Hi, ALL.
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> I'm still trying to cope with the fact that after the fence, the node hangs in "pending".
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Please define "pending". Where did you see this?
>>>>>>>>>>>>>>>>>>>>>>>>>>>> In crm_mon:
>>>>>>>>>>>>>>>>>>>>>>>>>>>> ......
>>>>>>>>>>>>>>>>>>>>>>>>>>>> Node dev-cluster2-node2 (172793105): pending
>>>>>>>>>>>>>>>>>>>>>>>>>>>> ......
>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>> The experiment went like this:
>>>>>>>>>>>>>>>>>>>>>>>>>>>> Four nodes in the cluster.
>>>>>>>>>>>>>>>>>>>>>>>>>>>> On one of them, kill corosync or pacemakerd (signal 4 or 6 or 11).
>>>>>>>>>>>>>>>>>>>>>>>>>>>> Thereafter the remaining nodes keep rebooting it, under various pretexts: "softly whistling", "flying low", "not a cluster member!"...
>>>>>>>>>>>>>>>>>>>>>>>>>>>> Then "Too many failures ...." fell out in the log.
>>>>>>>>>>>>>>>>>>>>>>>>>>>> All this time the status in crm_mon was "pending".
>>>>>>>>>>>>>>>>>>>>>>>>>>>> Depending on the wind direction, it changed to "UNCLEAN".
>>>>>>>>>>>>>>>>>>>>>>>>>>>> Much time has passed and I can no longer describe the behavior accurately...
>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>> Now I am in the following state:
>>>>>>>>>>>>>>>>>>>>>>>>>>>> I tried to localize the problem, and came here with this.
>>>>>>>>>>>>>>>>>>>>>>>>>>>> I set a big value in the property stonith-timeout="600s".
>>>>>>>>>>>>>>>>>>>>>>>>>>>> And got the following behavior:
>>>>>>>>>>>>>>>>>>>>>>>>>>>> 1. pkill -4 corosync
>>>>>>>>>>>>>>>>>>>>>>>>>>>> 2. the node with the DC calls my fence agent "sshbykey"
>>>>>>>>>>>>>>>>>>>>>>>>>>>> 3. It sends reboot to the victim and waits until she comes back to life.
>>>>>>>>>>>>>>>>>>>>>>>>>>> Hmmm.... what version of pacemaker?
>>>>>>>>>>>>>>>>>>>>>>>>>>> This sounds like a timing issue that we fixed a while back
>>>>>>>>>>>>>>>>>>>>>>>>>> It was the 1.1.11 from December 3.
>>>>>>>>>>>>>>>>>>>>>>>>>> Now trying a full update and retest.
>>>>>>>>>>>>>>>>>>>>>>>>> That should be recent enough. Can you create a crm_report the next time you reproduce?
>>>>>>>>>>>>>>>>>>>>>>>> Of course yes. A little delay.... :)
>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>> ......
>>>>>>>>>>>>>>>>>>>>>>>> cc1: warnings being treated as errors
>>>>>>>>>>>>>>>>>>>>>>>> upstart.c: In function ‘upstart_job_property’:
>>>>>>>>>>>>>>>>>>>>>>>> upstart.c:264: error: implicit declaration of function ‘g_variant_lookup_value’
>>>>>>>>>>>>>>>>>>>>>>>> upstart.c:264: error: nested extern declaration of ‘g_variant_lookup_value’
>>>>>>>>>>>>>>>>>>>>>>>> upstart.c:264: error: assignment makes pointer from integer without a cast
>>>>>>>>>>>>>>>>>>>>>>>> gmake[2]: *** [libcrmservice_la-upstart.lo] Error 1
>>>>>>>>>>>>>>>>>>>>>>>> gmake[2]: Leaving directory `/root/ha/pacemaker/lib/services'
>>>>>>>>>>>>>>>>>>>>>>>> make[1]: *** [all-recursive] Error 1
>>>>>>>>>>>>>>>>>>>>>>>> make[1]: Leaving directory `/root/ha/pacemaker/lib'
>>>>>>>>>>>>>>>>>>>>>>>> make: *** [core] Error 1
>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>> I'm trying to solve this problem.
>>>>>>>>>>>>>>>>>>>>>>> It's not getting solved quickly...
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>> https://developer.gnome.org/glib/2.28/glib-GVariant.html#g-variant-lookup-value
>>>>>>>>>>>>>>>>>>>>>>> g_variant_lookup_value () Since 2.28
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>> # yum list installed glib2
>>>>>>>>>>>>>>>>>>>>>>> Loaded plugins: fastestmirror, rhnplugin, security
>>>>>>>>>>>>>>>>>>>>>>> This system is receiving updates from RHN Classic or Red Hat Satellite.
>>>>>>>>>>>>>>>>>>>>>>> Loading mirror speeds from cached hostfile
>>>>>>>>>>>>>>>>>>>>>>> Installed Packages
>>>>>>>>>>>>>>>>>>>>>>> glib2.x86_64    2.26.1-3.el6    installed
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>> # cat /etc/issue
>>>>>>>>>>>>>>>>>>>>>>> CentOS release 6.5 (Final)
>>>>>>>>>>>>>>>>>>>>>>> Kernel \r on an \m
>>>>>>>>>>>>>>>>>>>>>> Can you try this patch?
>>>>>>>>>>>>>>>>>>>>>> Upstart jobs won't work, but the code will compile
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>> diff --git a/lib/services/upstart.c b/lib/services/upstart.c
>>>>>>>>>>>>>>>>>>>>>> index 831e7cf..195c3a4 100644
>>>>>>>>>>>>>>>>>>>>>> --- a/lib/services/upstart.c
>>>>>>>>>>>>>>>>>>>>>> +++ b/lib/services/upstart.c
>>>>>>>>>>>>>>>>>>>>>> @@ -231,12 +231,21 @@ upstart_job_exists(const char *name)
>>>>>>>>>>>>>>>>>>>>>>  static char *
>>>>>>>>>>>>>>>>>>>>>>  upstart_job_property(const char *obj, const gchar * iface, const char *name)
>>>>>>>>>>>>>>>>>>>>>>  {
>>>>>>>>>>>>>>>>>>>>>> +    char *output = NULL;
>>>>>>>>>>>>>>>>>>>>>> +
>>>>>>>>>>>>>>>>>>>>>> +#if !GLIB_CHECK_VERSION(2,28,0)
>>>>>>>>>>>>>>>>>>>>>> +    static bool err = TRUE;
>>>>>>>>>>>>>>>>>>>>>> +
>>>>>>>>>>>>>>>>>>>>>> +    if(err) {
>>>>>>>>>>>>>>>>>>>>>> +        crm_err("This version of glib is too old to support upstart jobs");
>>>>>>>>>>>>>>>>>>>>>> +        err = FALSE;
>>>>>>>>>>>>>>>>>>>>>> +    }
>>>>>>>>>>>>>>>>>>>>>> +#else
>>>>>>>>>>>>>>>>>>>>>>      GError *error = NULL;
>>>>>>>>>>>>>>>>>>>>>>      GDBusProxy *proxy;
>>>>>>>>>>>>>>>>>>>>>>      GVariant *asv = NULL;
>>>>>>>>>>>>>>>>>>>>>>      GVariant *value = NULL;
>>>>>>>>>>>>>>>>>>>>>>      GVariant *_ret = NULL;
>>>>>>>>>>>>>>>>>>>>>> -    char *output = NULL;
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>      crm_info("Calling GetAll on %s", obj);
>>>>>>>>>>>>>>>>>>>>>>      proxy = get_proxy(obj, BUS_PROPERTY_IFACE);
>>>>>>>>>>>>>>>>>>>>>> @@ -272,6 +281,7 @@ upstart_job_property(const char *obj, const gchar * iface, const char *name)
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>      g_object_unref(proxy);
>>>>>>>>>>>>>>>>>>>>>>      g_variant_unref(_ret);
>>>>>>>>>>>>>>>>>>>>>> +#endif
>>>>>>>>>>>>>>>>>>>>>>      return output;
>>>>>>>>>>>>>>>>>>>>>>  }
>>>>>>>>>>>>>>>>>>>>> Ok :) I patched the source.
>>>>>>>>>>>>>>>>>>>>> Typed "make rc" - the same error.
>>>>>>>>>>>>>>>>>>>> Because it's not building your local changes
>>>>>>>>>>>>>>>>>>>>> Made a new copy via "fetch" - the same error.
>>>>>>>>>>>>>>>>>>>>> It seems that if ClusterLabs-pacemaker-Pacemaker-1.1.11-rc3.tar.gz does not exist, it downloads it.
>>>>>>>>>>>>>>>>>>>>> Otherwise it uses the existing archive.
>>>>>>>>>>>>>>>>>>>>> Cut log .......
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>> # make rc
>>>>>>>>>>>>>>>>>>>>> make TAG=Pacemaker-1.1.11-rc3 rpm
>>>>>>>>>>>>>>>>>>>>> make[1]: Entering directory `/root/ha/pacemaker'
>>>>>>>>>>>>>>>>>>>>> rm -f pacemaker-dirty.tar.* pacemaker-tip.tar.* pacemaker-HEAD.tar.*
>>>>>>>>>>>>>>>>>>>>> if [ ! -f ClusterLabs-pacemaker-Pacemaker-1.1.11-rc3.tar.gz ]; then \
>>>>>>>>>>>>>>>>>>>>>     rm -f pacemaker.tar.*; \
>>>>>>>>>>>>>>>>>>>>>     if [ Pacemaker-1.1.11-rc3 = dirty ]; then \
>>>>>>>>>>>>>>>>>>>>>         git commit -m "DO-NOT-PUSH" -a; \
>>>>>>>>>>>>>>>>>>>>>         git archive --prefix=ClusterLabs-pacemaker-Pacemaker-1.1.11-rc3/ HEAD | gzip > ClusterLabs-pacemaker-Pacemaker-1.1.11-rc3.tar.gz; \
>>>>>>>>>>>>>>>>>>>>>         git reset --mixed HEAD^; \
>>>>>>>>>>>>>>>>>>>>>     else \
>>>>>>>>>>>>>>>>>>>>>         git archive --prefix=ClusterLabs-pacemaker-Pacemaker-1.1.11-rc3/ Pacemaker-1.1.11-rc3 | gzip > ClusterLabs-pacemaker-Pacemaker-1.1.11-rc3.tar.gz; \
>>>>>>>>>>>>>>>>>>>>>     fi; \
>>>>>>>>>>>>>>>>>>>>>     echo `date`: Rebuilt ClusterLabs-pacemaker-Pacemaker-1.1.11-rc3.tar.gz; \
>>>>>>>>>>>>>>>>>>>>> else \
>>>>>>>>>>>>>>>>>>>>>     echo `date`: Using existing tarball: ClusterLabs-pacemaker-Pacemaker-1.1.11-rc3.tar.gz; \
>>>>>>>>>>>>>>>>>>>>> fi
>>>>>>>>>>>>>>>>>>>>> Mon Jan 13 13:23:21 MSK 2014: Using existing tarball: ClusterLabs-pacemaker-Pacemaker-1.1.11-rc3.tar.gz
>>>>>>>>>>>>>>>>>>>>> .......
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>> Well, "make rpm" built the rpms and I created the cluster.
>>>>>>>>>>>>>>>>>>>>> I ran the same tests and confirmed the behavior.
>>>>>>>>>>>>>>>>>>>>> crm_report log here - http://send2me.ru/crmrep.tar.bz2
>>>>>>>>>>>>>>>>>>>> Thanks!
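P.S. For the curious: below is a minimal standalone sketch of the *idea* behind find_and_track_existing_processes() - hypothetical code, not the real mcp implementation (the real thing is mainloop-driven and matches processes far more carefully; the daemon name, path and polling here are just for illustration):

/*
 * Hypothetical sketch only - NOT the real find_and_track_existing_processes().
 * Idea: on startup, look for a child daemon that is already running; from then
 * on, watch it and respawn it if it ever goes away.
 */
#include <ctype.h>
#include <dirent.h>
#include <signal.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

/* Return the pid of the first process whose comm matches 'name', or 0.
 * Note: /proc/<pid>/comm needs a reasonably recent kernel. */
static pid_t find_existing(const char *name)
{
    struct dirent *entry;
    DIR *proc = opendir("/proc");

    if (proc == NULL) {
        return 0;
    }
    while ((entry = readdir(proc)) != NULL) {
        char path[64], comm[64] = "";
        FILE *f;

        if (!isdigit((unsigned char)entry->d_name[0])) {
            continue;                         /* not a pid directory */
        }
        snprintf(path, sizeof(path), "/proc/%s/comm", entry->d_name);
        f = fopen(path, "r");
        if (f == NULL) {
            continue;                         /* process exited meanwhile */
        }
        if (fgets(comm, sizeof(comm), f) != NULL) {
            comm[strcspn(comm, "\n")] = '\0';
        }
        fclose(f);
        if (strcmp(comm, name) == 0) {
            closedir(proc);
            return (pid_t)atoi(entry->d_name);
        }
    }
    closedir(proc);
    return 0;
}

int main(void)
{
    const char *daemon = "lrmd";              /* the child we want to adopt */
    pid_t pid = find_existing(daemon);        /* adopt a survivor, if any */

    for (;;) {
        /* kill(pid, 0) delivers no signal; it only checks existence */
        if (pid <= 0 || kill(pid, 0) < 0) {
            pid = fork();                     /* gone (or never found): respawn */
            if (pid == 0) {
                execl("/usr/libexec/pacemaker/lrmd", "lrmd", (char *)NULL);
                _exit(1);                     /* exec failed */
            }
        }
        sleep(1);                             /* real code reacts to events */
    }
}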
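And if you do try the upstart route on CentOS 6, the interesting part is upstart's respawn stanza. A hypothetical /etc/init/pacemaker.conf might look like this - untested, and it assumes corosync is also an upstart job, that pacemakerd lives at that path on your build, and that pacemakerd stays in the foreground so upstart can supervise it:

# /etc/init/pacemaker.conf - hypothetical example, untested
description "Pacemaker cluster manager"

# assumes corosync is itself managed by an upstart job
start on started corosync
stop on stopping corosync

# restart pacemakerd whenever it dies,
# giving up after 10 respawns within 5 seconds
respawn
respawn limit 10 5

exec /usr/sbin/pacemakerd

Whether pacemakerd really keeps the foreground on your build is something to verify before relying on this.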
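For comparison, systemd gets the same effect with a single directive. An illustrative unit fragment (not the shipped pacemaker.service, just the relevant idea):

# illustrative fragment of a unit file - not the shipped pacemaker.service
[Service]
ExecStart=/usr/sbin/pacemakerd
# systemd restarts pacemakerd on abnormal exit; on respawn,
# find_and_track_existing_processes() re-adopts the surviving children
Restart=on-failure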
_______________________________________________
Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
http://oss.clusterlabs.org/mailman/listinfo/pacemaker

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org