05.03.2014, 04:04, "Andrew Beekhof" <and...@beekhof.net>:
> On 25 Feb 2014, at 8:30 pm, Andrey Groshev <gre...@yandex.ru> wrote:
>
>> 21.02.2014, 12:04, "Andrey Groshev" <gre...@yandex.ru>:
>>> 21.02.2014, 05:53, "Andrew Beekhof" <and...@beekhof.net>:
>>>> On 19 Feb 2014, at 7:53 pm, Andrey Groshev <gre...@yandex.ru> wrote:
>>>>> 19.02.2014, 09:49, "Andrew Beekhof" <and...@beekhof.net>:
>>>>>> On 19 Feb 2014, at 4:18 pm, Andrey Groshev <gre...@yandex.ru> wrote:
>>>>>>> 19.02.2014, 09:08, "Andrew Beekhof" <and...@beekhof.net>:
>>>>>>>> On 19 Feb 2014, at 4:00 pm, Andrey Groshev <gre...@yandex.ru> wrote:
>>>>>>>>> 19.02.2014, 06:48, "Andrew Beekhof" <and...@beekhof.net>:
>>>>>>>>>> On 18 Feb 2014, at 11:05 pm, Andrey Groshev <gre...@yandex.ru> wrote:
>>>>>>>>>>> Hi, ALL and Andrew!
>>>>>>>>>>>
>>>>>>>>>>> Today was a good day - I killed a lot, and got shot at a lot.
>>>>>>>>>>> In general - I am happy (almost like an elephant) :)
>>>>>>>>>>> Besides resources, eight processes on the node are important to me:
>>>>>>>>>>> corosync, pacemakerd, cib, stonithd, lrmd, attrd, pengine, crmd.
>>>>>>>>>>> I killed them with different signals (4, 6, 11 and even 9).
>>>>>>>>>>> The behavior does not depend on the signal number - that's good.
>>>>>>>>>>> If STONITH sends a reboot to the node, it reboots and rejoins the cluster - that's good too.
>>>>>>>>>>> But the behavior differs depending on which daemon is killed.
>>>>>>>>>>>
>>>>>>>>>>> It turned out there are four groups:
>>>>>>>>>>> 1. corosync, cib - STONITH works 100%.
>>>>>>>>>>> Killing with any signal calls STONITH and a reboot.
>>>>>>>>>>>
>>>>>>>>>>> 2. lrmd, crmd - strange STONITH behavior.
>>>>>>>>>>> Sometimes STONITH is called, with the corresponding reaction.
>>>>>>>>>>> Sometimes the daemon restarts and resources restart with a long delay for MS:pgsql.
>>>>>>>>>>> One time, after crmd restarted, pgsql didn't restart.
>>>>>>>>>>>
>>>>>>>>>>> 3. stonithd, attrd, pengine - STONITH not needed.
>>>>>>>>>>> These daemons simply restart; resources stay running.
>>>>>>>>>>>
>>>>>>>>>>> 4. pacemakerd - nothing happens.
>>>>>>>>>>> And after that I can kill any process of the third group. They do not restart.
>>>>>>>>>>> Generally: don't touch corosync, cib and maybe lrmd, crmd.
>>>>>>>>>>>
>>>>>>>>>>> What do you think about this?
>>>>>>>>>>> The main question of this topic is settled.
>>>>>>>>>>> But this varied behavior is another big problem.
>>>>>>>>>>>
>>>>>>>>>>> Forgot the logs: http://send2me.ru/pcmk-Tue-18-Feb-2014.tar.bz2
>>>>>>>>>> Which of the various conditions above do the logs cover?
>>>>>>>>> All of them, over the whole day.
>>>>>>>> Are you trying to torture me?
>>>>>>>> Can you give me a rough idea what happened when?
>>>>>>> No, there are 8 processes killed with signal 4, plus repeated experiments with unknown outcome :)
>>>>>>> It is easier to run new experiments and collect separate new logs.
>>>>>>> Which variant is more interesting?
>>>>>> The long delay in restarting pgsql.
>>>>>> Everything else seems correct.
>>>>> It didn't even try to start pgsql.
>>>>> There are three tests in the logs.
>>>>> kill -s4 lrmd pid.
>>>>> 1. STONITH
>>>>> 2. STONITH
>>>>> 3. hangs
>>>> It's waiting on a value for default_ping_set
>>>>
>>>> It seems we're calling monitor for pingCheck but for some reason it's not performing an update:
>>>>
>>>> # grep 2632.*lrmd.*pingCheck /Users/beekhof/Downloads/pcmk-Wed-19-Feb-2014/dev-cluster2-node2.unix.tensor.ru/corosync.log
>>>> Feb 19 10:49:58 [2632] dev-cluster2-node2.unix.tensor.ru lrmd: info: process_lrmd_get_rsc_info: Resource 'pingCheck' not found (3 active resources)
>>>> Feb 19 10:49:58 [2632] dev-cluster2-node2.unix.tensor.ru lrmd: info: process_lrmd_get_rsc_info: Resource 'pingCheck:3' not found (3 active resources)
>>>> Feb 19 10:49:58 [2632] dev-cluster2-node2.unix.tensor.ru lrmd: info: process_lrmd_rsc_register: Added 'pingCheck' to the rsc list (4 active resources)
>>>> Feb 19 10:49:58 [2632] dev-cluster2-node2.unix.tensor.ru lrmd: debug: log_execute: executing - rsc:pingCheck action:monitor call_id:19
>>>> Feb 19 10:50:00 [2632] dev-cluster2-node2.unix.tensor.ru lrmd: debug: operation_finished: pingCheck_monitor_0:2658 - exited with rc=0
>>>> Feb 19 10:50:00 [2632] dev-cluster2-node2.unix.tensor.ru lrmd: debug: operation_finished: pingCheck_monitor_0:2658:stderr [ -- empty -- ]
>>>> Feb 19 10:50:00 [2632] dev-cluster2-node2.unix.tensor.ru lrmd: debug: operation_finished: pingCheck_monitor_0:2658:stdout [ -- empty -- ]
>>>> Feb 19 10:50:00 [2632] dev-cluster2-node2.unix.tensor.ru lrmd: debug: log_finished: finished - rsc:pingCheck action:monitor call_id:19 pid:2658 exit-code:0 exec-time:2039ms queue-time:0ms
>>>> Feb 19 10:50:00 [2632] dev-cluster2-node2.unix.tensor.ru lrmd: debug: log_execute: executing - rsc:pingCheck action:monitor call_id:20
>>>> Feb 19 10:50:02 [2632] dev-cluster2-node2.unix.tensor.ru lrmd: debug: operation_finished: pingCheck_monitor_10000:2816 - exited with rc=0
>>>> Feb 19 10:50:02 [2632] dev-cluster2-node2.unix.tensor.ru lrmd: debug: operation_finished: pingCheck_monitor_10000:2816:stderr [ -- empty -- ]
>>>> Feb 19 10:50:02 [2632] dev-cluster2-node2.unix.tensor.ru lrmd: debug: operation_finished: pingCheck_monitor_10000:2816:stdout [ -- empty -- ]
>>>>
>>>> Could you add:
>>>>
>>>> export OCF_TRACE_RA=1
>>>>
>>>> to the top of the ping agent and retest?
>>> Today the fourth attempt worked.
>>> I even wondered whether it matters how the process is killed (kill -s 4 pid or pkill -4 lrmd).
>>> Logs: http://send2me.ru/pcmk-Fri-21-Feb-2014.tar.bz2
>> Hi,
>> You haven't looked at it?
>
> Not yet. I've been hitting ACLs with a large hammer.
> Where are we up to with this? Do I disregard this one and look at the most recent email?
>
Hi.
No. These are two different cases.

* Resources don't start after lrmd is killed.
  Logs: http://send2me.ru/pcmk-Fri-21-Feb-2014.tar.bz2
* The entire cluster is put into standby (all nodes standby) and the second node hangs in pending.
  With the latest rebuilt RPM the problem was not reproduced, so for now it can be considered not a problem.
  Logs: http://send2me.ru/pcmk-04-Mar-2014-2.tar.bz2

_______________________________________________
Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
http://oss.clusterlabs.org/mailman/listinfo/pacemaker

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org
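
For anyone going through the log archives, the grep on the lrmd lines quoted in this thread can be done mechanically. Below is a minimal sketch, not part of Pacemaker, that pairs each `log_execute` line with its `log_finished` line for one resource and reports the call_ids that were started but have no matching finish line. The function name `unfinished_calls` and the regular expressions are my own assumptions, written only against the line format quoted in this thread. A missing `log_finished` line does not by itself explain why default_ping_set was not updated; it only narrows down where to look.

```python
import re

# Patterns are assumptions based only on the lrmd debug lines quoted in this thread.
EXECUTE = re.compile(r"log_execute:\s+executing - rsc:(\S+) action:(\S+) call_id:(\d+)")
FINISHED = re.compile(r"log_finished:\s+finished - rsc:(\S+) action:(\S+) call_id:(\d+)")


def unfinished_calls(lines, resource):
    """Return {call_id: action} for calls with a log_execute line but no log_finished line."""
    pending = {}
    for line in lines:
        started = EXECUTE.search(line)
        if started and started.group(1) == resource:
            pending[int(started.group(3))] = started.group(2)
            continue
        finished = FINISHED.search(line)
        if finished and finished.group(1) == resource:
            # Drop the call once its finish line is seen.
            pending.pop(int(finished.group(3)), None)
    return pending


# Three of the lines quoted in this thread, joined back into single log lines.
sample = [
    "Feb 19 10:49:58 [2632] dev-cluster2-node2.unix.tensor.ru lrmd: debug: "
    "log_execute: executing - rsc:pingCheck action:monitor call_id:19",
    "Feb 19 10:50:00 [2632] dev-cluster2-node2.unix.tensor.ru lrmd: debug: "
    "log_finished: finished - rsc:pingCheck action:monitor call_id:19 "
    "pid:2658 exit-code:0 exec-time:2039ms queue-time:0ms",
    "Feb 19 10:50:00 [2632] dev-cluster2-node2.unix.tensor.ru lrmd: debug: "
    "log_execute: executing - rsc:pingCheck action:monitor call_id:20",
]

print(unfinished_calls(sample, "pingCheck"))  # {20: 'monitor'}
```

To run it on a real file, pass an open file object instead of `sample`, for example `unfinished_calls(open("corosync.log"), "pingCheck")`, since the function only iterates over lines.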