13.01.2014, 02:51, "Andrew Beekhof" <and...@beekhof.net>:
> On 10 Jan 2014, at 9:55 pm, Andrey Groshev <gre...@yandex.ru> wrote:
>> 10.01.2014, 14:31, "Andrey Groshev" <gre...@yandex.ru>:
>>> 10.01.2014, 14:01, "Andrew Beekhof" <and...@beekhof.net>:
>>>> On 10 Jan 2014, at 5:03 pm, Andrey Groshev <gre...@yandex.ru> wrote:
>>>>> 10.01.2014, 05:29, "Andrew Beekhof" <and...@beekhof.net>:
>>>>>> On 9 Jan 2014, at 11:11 pm, Andrey Groshev <gre...@yandex.ru> wrote:
>>>>>>> 08.01.2014, 06:22, "Andrew Beekhof" <and...@beekhof.net>:
>>>>>>>> On 29 Nov 2013, at 7:17 pm, Andrey Groshev <gre...@yandex.ru> wrote:
>>>>>>>>> Hi, ALL.
>>>>>>>>>
>>>>>>>>> I'm still trying to cope with the fact that after a fence the
>>>>>>>>> node hangs in "pending".
>>>>>>>> Please define "pending". Where did you see this?
>>>>>>> In crm_mon:
>>>>>>> ......
>>>>>>> Node dev-cluster2-node2 (172793105): pending
>>>>>>> ......
>>>>>>>
>>>>>>> The experiment was like this:
>>>>>>> Four nodes in a cluster.
>>>>>>> On one of them, kill corosync or pacemakerd (signal 4 or 6 or 11).
>>>>>>> Thereafter the remaining nodes constantly reboot it, under various
>>>>>>> pretexts: "softly whistling", "fly low", "not a cluster member!"...
>>>>>>> Then "Too many failures ...." fell out in the log.
>>>>>>> All this time the status in crm_mon was "pending"; depending on the
>>>>>>> wind direction it changed to "UNCLEAN".
>>>>>>> Much time has passed, and I can no longer describe the behavior
>>>>>>> accurately...
>>>>>>>
>>>>>>> Now I am in the following state:
>>>>>>> I tried to localize the problem and arrived here with this.
>>>>>>> I set a big value in the property stonith-timeout="600s".
>>>>>>> And got the following behavior:
>>>>>>> 1. pkill -4 corosync
>>>>>>> 2. The node with the DC calls my fence agent "sshbykey".
>>>>>>> 3. It reboots the victim and waits until it comes back to life.
>>>>>> Hmmm.... what version of pacemaker?
>>>>>> This sounds like a timing issue that we fixed a while back
>>>>> It was version 1.1.11 from December 3.
>>>>> I will now try a full update and retest.
>>>> That should be recent enough. Can you create a crm_report the next
>>>> time you reproduce?
>>> Of course yes. A little delay.... :)
>>>
>>> ......
>>> cc1: warnings being treated as errors
>>> upstart.c: In function ‘upstart_job_property’:
>>> upstart.c:264: error: implicit declaration of function ‘g_variant_lookup_value’
>>> upstart.c:264: error: nested extern declaration of ‘g_variant_lookup_value’
>>> upstart.c:264: error: assignment makes pointer from integer without a cast
>>> gmake[2]: *** [libcrmservice_la-upstart.lo] Error 1
>>> gmake[2]: Leaving directory `/root/ha/pacemaker/lib/services'
>>> make[1]: *** [all-recursive] Error 1
>>> make[1]: Leaving directory `/root/ha/pacemaker/lib'
>>> make: *** [core] Error 1
>>>
>>> I'm trying to solve this problem.
>> It is not getting solved quickly...
>>
>> https://developer.gnome.org/glib/2.28/glib-GVariant.html#g-variant-lookup-value
>> g_variant_lookup_value ()    Since 2.28
>>
>> # yum list installed glib2
>> Loaded plugins: fastestmirror, rhnplugin, security
>> This system is receiving updates from RHN Classic or Red Hat Satellite.
>> Loading mirror speeds from cached hostfile
>> Installed Packages
>> glib2.x86_64    2.26.1-3.el6    installed
>>
>> # cat /etc/issue
>> CentOS release 6.5 (Final)
>> Kernel \r on an \m
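Just to be sure it is the toolchain and not a stale header somewhere:
g_variant_lookup_value() only appeared in glib 2.28, while CentOS 6.5
ships glib2 2.26, so the compiler is right to reject it. A tiny standalone
check (just an illustration, not pacemaker code; the file name is mine -
build it with: gcc check-glib.c $(pkg-config --cflags --libs glib-2.0))
prints the version the headers advertise next to the version of the
library actually linked:

#include <glib.h>
#include <stdio.h>

int
main(void)
{
    const gchar *too_old = NULL;

    /* compile time: the glib headers we are building against */
    printf("headers: %d.%d.%d\n",
           GLIB_MAJOR_VERSION, GLIB_MINOR_VERSION, GLIB_MICRO_VERSION);

    /* run time: the glib library we are actually linked with */
    printf("library: %u.%u.%u\n",
           glib_major_version, glib_minor_version, glib_micro_version);

    /* glib_check_version() returns NULL when the running library is at
     * least the requested version, otherwise a human-readable reason */
    too_old = glib_check_version(2, 28, 0);
    printf("glib >= 2.28: %s\n", too_old == NULL ? "yes" : too_old);

    return 0;
}

On this box it should print 2.26 for both lines, which is exactly why
upstart.c does not build here.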
> Can you try this patch?
> Upstart jobs won't work, but the code will compile
>
> diff --git a/lib/services/upstart.c b/lib/services/upstart.c
> index 831e7cf..195c3a4 100644
> --- a/lib/services/upstart.c
> +++ b/lib/services/upstart.c
> @@ -231,12 +231,21 @@ upstart_job_exists(const char *name)
>  static char *
>  upstart_job_property(const char *obj, const gchar * iface, const char *name)
>  {
> +    char *output = NULL;
> +
> +#if !GLIB_CHECK_VERSION(2,28,0)
> +    static bool err = TRUE;
> +
> +    if(err) {
> +        crm_err("This version of glib is too old to support upstart jobs");
> +        err = FALSE;
> +    }
> +#else
>      GError *error = NULL;
>      GDBusProxy *proxy;
>      GVariant *asv = NULL;
>      GVariant *value = NULL;
>      GVariant *_ret = NULL;
> -    char *output = NULL;
>
>      crm_info("Calling GetAll on %s", obj);
>      proxy = get_proxy(obj, BUS_PROPERTY_IFACE);
> @@ -272,6 +281,7 @@ upstart_job_property(const char *obj, const gchar * iface, const char *name)
>
>      g_object_unref(proxy);
>      g_variant_unref(_ret);
> +#endif
>      return output;
>  }
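If I read the guard right, GLIB_CHECK_VERSION(2,28,0) is evaluated against
the glib headers at compile time, so on this box (glib2 2.26) the whole
GDBus body is compiled out and only the stub remains; the static flag makes
the "too old" error print once per process instead of on every property
lookup. A distilled standalone model of the same pattern (my illustration,
not the pacemaker code):

#include <glib.h>

static char *
job_property_stub(const char *name)
{
    char *output = NULL;

#if !GLIB_CHECK_VERSION(2,28,0)
    /* old glib: complain once, then stay quiet and return nothing */
    static gboolean warned = FALSE;

    if (!warned) {
        g_warning("glib < 2.28: upstart job properties are unsupported");
        warned = TRUE;
    }
#else
    /* new glib: the real GDBus property lookup would go here */
    output = g_strdup_printf("value-of-%s", name);
#endif
    return output;
}

int
main(void)
{
    char *value = job_property_stub("author");

    g_print("got: %s\n", value ? value : "(null)");
    g_free(value);
    return 0;
}

Either way it compiles, and on old glib callers simply see an empty result.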
Ok :) I patched the source. Typed "make rc" - the same error. Made a new
copy via "fetch" - the same error. It turns out that if
ClusterLabs-pacemaker-Pacemaker-1.1.11-rc3.tar.gz does not exist, the
Makefile builds it from git; otherwise it uses the existing archive - so
my patched tree was never packed, and the stale tarball has to be removed
(rm -f ClusterLabs-pacemaker-Pacemaker-1.1.11-rc3.tar.gz) before "make rc"
picks up the patch. Trimmed log:

.......
# make rc
make TAG=Pacemaker-1.1.11-rc3 rpm
make[1]: Entering directory `/root/ha/pacemaker'
rm -f pacemaker-dirty.tar.* pacemaker-tip.tar.* pacemaker-HEAD.tar.*
if [ ! -f ClusterLabs-pacemaker-Pacemaker-1.1.11-rc3.tar.gz ]; then \
    rm -f pacemaker.tar.*; \
    if [ Pacemaker-1.1.11-rc3 = dirty ]; then \
        git commit -m "DO-NOT-PUSH" -a; \
        git archive --prefix=ClusterLabs-pacemaker-Pacemaker-1.1.11-rc3/ HEAD | gzip > ClusterLabs-pacemaker-Pacemaker-1.1.11-rc3.tar.gz; \
        git reset --mixed HEAD^; \
    else \
        git archive --prefix=ClusterLabs-pacemaker-Pacemaker-1.1.11-rc3/ Pacemaker-1.1.11-rc3 | gzip > ClusterLabs-pacemaker-Pacemaker-1.1.11-rc3.tar.gz; \
    fi; \
    echo `date`: Rebuilt ClusterLabs-pacemaker-Pacemaker-1.1.11-rc3.tar.gz; \
else \
    echo `date`: Using existing tarball: ClusterLabs-pacemaker-Pacemaker-1.1.11-rc3.tar.gz; \
fi
Mon Jan 13 13:23:21 MSK 2014: Using existing tarball: ClusterLabs-pacemaker-Pacemaker-1.1.11-rc3.tar.gz
.......

Well, "make rpm" built the rpms and I created the cluster. I ran the same
tests and confirmed the behavior. The crm_report log is here -
http://send2me.ru/crmrep.tar.bz2
(A standalone sketch of the fork / child-watch / timeout pattern from my
analysis quoted below is in the P.S. at the end of this mail.)

>>>>>>> Once the script makes sure that the victim has rebooted and is
>>>>>>> reachable again via ssh, it exits with 0.
>>>>>>> Every command is logged on both the victim and the killer - all right.
>>>>>>> 4. A little later, the status of the victim node in crm_mon
>>>>>>> changes to online.
>>>>>>> 5. BUT... not one resource starts! Despite the fact that
>>>>>>> "crm_simulate -sL" shows the correct resource to start:
>>>>>>>    * Start pingCheck:3 (dev-cluster2-node2)
>>>>>>> 6. In this state we spend the next 600 seconds.
>>>>>>>    After this timeout expires, another node (not the DC) decides
>>>>>>>    to kill our victim again.
>>>>>>>    Every command is again logged on both the victim and the
>>>>>>>    killer - all documented :)
>>>>>>> 7. NOW all resources start in the right sequence.
>>>>>>>
>>>>>>> I am almost happy, but I do not like it: two reboots and 10
>>>>>>> minutes of waiting ;)
>>>>>>> And if something happens on another node, this behavior is
>>>>>>> superimposed on the old one, and no resources start until the
>>>>>>> last node has been rebooted twice.
>>>>>>>
>>>>>>> I tried to understand this behavior.
>>>>>>> As I understand it:
>>>>>>> 1. Ultimately, ./lib/fencing/st_client.c calls
>>>>>>>    internal_stonith_action_execute().
>>>>>>> 2. It forks and pipes from the child.
>>>>>>> 3. It asynchronously calls mainloop_child_add() with a callback
>>>>>>>    to stonith_action_async_done.
>>>>>>> 4. It adds timeouts via g_timeout_add() for the TERM and KILL
>>>>>>>    signals.
>>>>>>>
>>>>>>> If everything goes right, stonith_action_async_done must be
>>>>>>> called and the timeout removed.
>>>>>>> For some reason this does not happen. I sit and think ....
>>>>>>>>> At this time, there are constant re-elections.
>>>>>>>>> Also, I noticed a difference in how pacemaker starts up.
>>>>>>>>> At a normal startup:
>>>>>>>>> * corosync
>>>>>>>>> * pacemakerd
>>>>>>>>> * attrd
>>>>>>>>> * pengine
>>>>>>>>> * lrmd
>>>>>>>>> * crmd
>>>>>>>>> * cib
>>>>>>>>>
>>>>>>>>> When it hangs at startup:
>>>>>>>>> * corosync
>>>>>>>>> * pacemakerd
>>>>>>>>> * attrd
>>>>>>>>> * pengine
>>>>>>>>> * crmd
>>>>>>>>> * lrmd
>>>>>>>>> * cib
>>>>>>>> Are you referring to the order of the daemons here?
>>>>>>>> The cib should not be at the bottom in either case.
>>>>>>>>> Who knows who runs lrmd?
>>>>>>>> Pacemakerd.
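P.S. To make sure I read steps 2-4 of my analysis above correctly, I wrote
a minimal standalone model of the pattern - plain glib calls only, so this
is my sketch and not the real st_client.c code (which forks and sets up
pipes itself, and uses pacemaker's mainloop_child_add() wrapper): spawn
the agent, watch the child from the mainloop, and arm a timeout that only
fires if the exit callback never does.

#include <glib.h>
#include <signal.h>

static GMainLoop *loop = NULL;
static guint timer_id = 0;

/* the happy path: runs when the child exits and cancels the timeout */
static void
action_done(GPid pid, gint status, gpointer user_data)
{
    if (timer_id) {
        g_source_remove(timer_id);
        timer_id = 0;
    }
    g_print("agent exited, raw status %d\n", status);
    g_spawn_close_pid(pid);
    g_main_loop_quit(loop);
}

/* the sad path: runs only if the child is still alive after 10 seconds */
static gboolean
action_timeout(gpointer user_data)
{
    GPid pid = GPOINTER_TO_INT(user_data);

    g_print("agent timed out, sending SIGTERM\n");
    kill(pid, SIGTERM);
    timer_id = 0;
    return FALSE;   /* one-shot source, do not re-arm */
}

int
main(void)
{
    GPid pid = 0;
    gchar *argv[] = { "sleep", "3", NULL };   /* stand-in for the agent */

    loop = g_main_loop_new(NULL, FALSE);

    /* start the child without reaping it ourselves, so the child watch
     * below is the one that collects its exit status */
    if (!g_spawn_async(NULL, argv, NULL,
                       G_SPAWN_SEARCH_PATH | G_SPAWN_DO_NOT_REAP_CHILD,
                       NULL, NULL, &pid, NULL)) {
        g_error("failed to spawn agent");
    }

    g_child_watch_add(pid, action_done, NULL);
    timer_id = g_timeout_add_seconds(10, action_timeout,
                                     GINT_TO_POINTER(pid));

    g_main_loop_run(loop);
    g_main_loop_unref(loop);
    return 0;
}

If action_done() never runs, only the timeout path is left - which matches
what I am seeing: the node gets fenced, but the async-done callback (and
with it the timeout removal) apparently never happens.

_______________________________________________
Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
http://oss.clusterlabs.org/mailman/listinfo/pacemaker

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org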