21.02.2014, 12:04, "Andrey Groshev" <gre...@yandex.ru>:
> 21.02.2014, 05:53, "Andrew Beekhof" <and...@beekhof.net>:
>
>> On 19 Feb 2014, at 7:53 pm, Andrey Groshev <gre...@yandex.ru> wrote:
>>> 19.02.2014, 09:49, "Andrew Beekhof" <and...@beekhof.net>:
>>>> On 19 Feb 2014, at 4:18 pm, Andrey Groshev <gre...@yandex.ru> wrote:
>>>>> 19.02.2014, 09:08, "Andrew Beekhof" <and...@beekhof.net>:
>>>>>> On 19 Feb 2014, at 4:00 pm, Andrey Groshev <gre...@yandex.ru> wrote:
>>>>>>> 19.02.2014, 06:48, "Andrew Beekhof" <and...@beekhof.net>:
>>>>>>>> On 18 Feb 2014, at 11:05 pm, Andrey Groshev <gre...@yandex.ru> wrote:
>>>>>>>>> Hi, ALL and Andrew!
>>>>>>>>>
>>>>>>>>> Today was a good day - I did a lot of killing, and got shot at a lot.
>>>>>>>>> In general - I am happy (almost like an elephant) :)
>>>>>>>>> Besides the resources, eight processes on the node are important to me:
>>>>>>>>> corosync, pacemakerd, cib, stonithd, lrmd, attrd, pengine, crmd.
>>>>>>>>> I killed them with different signals (4, 6, 11 and even 9).
>>>>>>>>> The behavior does not depend on the signal number, which is good.
>>>>>>>>> If STONITH sends a reboot to the node, it reboots and rejoins
>>>>>>>>> the cluster, which is also good.
>>>>>>>>> But the behavior differs depending on which daemon is killed.
>>>>>>>>>
>>>>>>>>> It turned out there are four groups:
>>>>>>>>> 1. corosync, cib - STONITH works 100%.
>>>>>>>>> Killing with any signal triggers STONITH and a reboot.
>>>>>>>>>
>>>>>>>>> 2. lrmd, crmd - strange STONITH behavior.
>>>>>>>>> Sometimes STONITH is called, with the corresponding reaction.
>>>>>>>>> Sometimes the daemon restarts and the resources restart, with a large
>>>>>>>>> delay for MS:pgsql.
>>>>>>>>> Once, after crmd restarted, pgsql did not restart.
>>>>>>>>>
>>>>>>>>> 3. stonithd, attrd, pengine - STONITH is not needed.
>>>>>>>>> These daemons simply restart; resources stay running.
>>>>>>>>>
>>>>>>>>> 4. pacemakerd - nothing happens.
>>>>>>>>> And then I can kill any process of the third group. They do not
>>>>>>>>> restart.
>>>>>>>>> Generally, don't touch corosync, cib and maybe lrmd, crmd.
>>>>>>>>>
>>>>>>>>> What do you think about this?
>>>>>>>>> The main question of this topic is settled.
>>>>>>>>> But this varied behavior is another big problem.
>>>>>>>>>
>>>>>>>>> Forgot the logs: http://send2me.ru/pcmk-Tue-18-Feb-2014.tar.bz2
>>>>>>>> Which of the various conditions above do the logs cover?
>>>>>>> All of them, over the course of the day.
>>>>>> Are you trying to torture me?
>>>>>> Can you give me a rough idea what happened when?
>>>>> No, there are 8 processes times 4 signals, plus repeated experiments
>>>>> with unknown outcomes :)
>>>>> It is easier to run new experiments and collect separate new logs.
>>>>> Which variant is more interesting?
>>>> The long delay in restarting pgsql.
>>>> Everything else seems correct.
>>> It didn't even try to start pgsql.
>>> The logs contain three tests.
>>> kill -s 4 lrmd pid:
>>> 1. STONITH
>>> 2. STONITH
>>> 3. hangs
>> It's waiting on a value for default_ping_set
>>
>> It seems we're calling monitor for pingCheck but for some reason it's not
>> performing an update:
>>
>> # grep 2632.*lrmd.*pingCheck /Users/beekhof/Downloads/pcmk-Wed-19-Feb-2014/dev-cluster2-node2.unix.tensor.ru/corosync.log
>> Feb 19 10:49:58 [2632] dev-cluster2-node2.unix.tensor.ru lrmd: info: process_lrmd_get_rsc_info: Resource 'pingCheck' not found (3 active resources)
>> Feb 19 10:49:58 [2632] dev-cluster2-node2.unix.tensor.ru lrmd: info: process_lrmd_get_rsc_info: Resource 'pingCheck:3' not found (3 active resources)
>> Feb 19 10:49:58 [2632] dev-cluster2-node2.unix.tensor.ru lrmd: info: process_lrmd_rsc_register: Added 'pingCheck' to the rsc list (4 active resources)
>> Feb 19 10:49:58 [2632] dev-cluster2-node2.unix.tensor.ru lrmd: debug: log_execute: executing - rsc:pingCheck action:monitor call_id:19
>> Feb 19 10:50:00 [2632] dev-cluster2-node2.unix.tensor.ru lrmd: debug: operation_finished: pingCheck_monitor_0:2658 - exited with rc=0
>> Feb 19 10:50:00 [2632] dev-cluster2-node2.unix.tensor.ru lrmd: debug: operation_finished: pingCheck_monitor_0:2658:stderr [ -- empty -- ]
>> Feb 19 10:50:00 [2632] dev-cluster2-node2.unix.tensor.ru lrmd: debug: operation_finished: pingCheck_monitor_0:2658:stdout [ -- empty -- ]
>> Feb 19 10:50:00 [2632] dev-cluster2-node2.unix.tensor.ru lrmd: debug: log_finished: finished - rsc:pingCheck action:monitor call_id:19 pid:2658 exit-code:0 exec-time:2039ms queue-time:0ms
>> Feb 19 10:50:00 [2632] dev-cluster2-node2.unix.tensor.ru lrmd: debug: log_execute: executing - rsc:pingCheck action:monitor call_id:20
>> Feb 19 10:50:02 [2632] dev-cluster2-node2.unix.tensor.ru lrmd: debug: operation_finished: pingCheck_monitor_10000:2816 - exited with rc=0
>> Feb 19 10:50:02 [2632] dev-cluster2-node2.unix.tensor.ru lrmd: debug: operation_finished: pingCheck_monitor_10000:2816:stderr [ -- empty -- ]
>> Feb 19 10:50:02 [2632] dev-cluster2-node2.unix.tensor.ru lrmd: debug: operation_finished: pingCheck_monitor_10000:2816:stdout [ -- empty -- ]
>>
>> Could you add:
>>
>> export OCF_TRACE_RA=1
>>
>> to the top of the ping agent and retest?
>
> Today it worked the fourth time.
> I even wondered whether the way of killing makes a difference
> (kill -s 4 pid or pkill -4 lrmd).
> Logs: http://send2me.ru/pcmk-Fri-21-Feb-2014.tar.bz2

Hi,
Haven't you looked at it yet?
>>> http://send2me.ru/pcmk-Wed-19-Feb-2014.tar.bz2

_______________________________________________
Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
http://oss.clusterlabs.org/mailman/listinfo/pacemaker

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org
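
For anyone repeating the log search quoted in this thread, the grep can be wrapped in a small shell function. This is only a sketch: the name filter_daemon_log and its argument order are made up for illustration, and the pattern assumes the log line layout seen in the quoted excerpt (PID in square brackets, then the host, then the daemon name followed by a colon).

```shell
# Hypothetical helper, not part of Pacemaker: print the log lines written by
# one daemon process that mention a given resource.
#   $1 = daemon PID      (e.g. 2632)
#   $2 = daemon name     (e.g. lrmd)
#   $3 = resource id     (e.g. pingCheck)
#   $4 = path to the log (e.g. corosync.log)
filter_daemon_log() {
    # Requiring the brackets around the PID avoids matching the same digits
    # elsewhere in a line (call ids, child PIDs and so on).
    grep -E "\[$1\].* $2:.*$3" "$4"
}
```

Called as `filter_daemon_log 2632 lrmd pingCheck corosync.log`, it should pick out the same kind of lines as the quoted grep, one daemon and one resource at a time, which may make it easier to send a separate excerpt per experiment.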