On 11 Nov 2013, at 5:08 pm, yusuke iida <yusk.i...@gmail.com> wrote:
> Hi, Andrew
>
> I tested by the following versions.
> https://github.com/yuusuke/pacemaker/commit/3b90af1b11a4389f8b4a95a20ef12b8c259e73dc
>
> However, the problem has not been solved yet.
>
> I do not think that this problem can cope with it by batch-limit.
> Execution of a job is interrupted by batch-limit temporarily.
> However, graph will be immediately resumed by trigger_graph called in
> match_graph_event.

batch-limit controls how many in-flight jobs can be performed (and therefore how busy the CIB can be).
If batch-limit=10 and there are still 10 jobs in progress, then calling trigger_graph() over and over does nothing until there are 9 jobs (or fewer), at which point one more can be scheduled.

So if "synchronous message of CIB is sent now ceaseless", then there is a bug somewhere.
Did you confirm that throttle_get_total_job_limit() was returning an appropriate value?

> Since the synchronous message of CIB is sent now ceaseless, the IPC
> message sent from crmd cannot be processed.
>
> The following methods can be considered to solve a problem for this
> CPG message sent continuously.
>
> In order to make the time when a CPG message is processed, it stops
> that DC sends job for a definite period of time.
>
> Or I think that it is necessary to make the priority of a CPG message
> be the same as that of G_PRIORITY_DEFAULT defined by
> gio_poll_dispatch_add().
>
> I attach report which tested.
> https://drive.google.com/file/d/0BwMFJItoO-fVdlIwTVdFOGRkQ0U/edit?usp=sharing
>
> Regards,
> Yusuke
>
> 2013/11/8 Andrew Beekhof <and...@beekhof.net>:
>>
>> On 8 Nov 2013, at 12:10 am, yusuke iida <yusk.i...@gmail.com> wrote:
>>
>>> Hi, Andrew
>>>
>>> The shown code seems not to process correctly.
>>> I wrote correction.
>>> Please check.
>>> https://github.com/yuusuke/pacemaker/commit/3b90af1b11a4389f8b4a95a20ef12b8c259e73dc
>>
>> Ah, yes that looks better.
>> Did it help at all?
>>
>>>
>>> Regards,
>>> Yusuke
>>>
>>> 2013/11/7 Andrew Beekhof <and...@beekhof.net>:
>>>>
>>>> On 7 Nov 2013, at 12:43 pm, yusuke iida <yusk.i...@gmail.com> wrote:
>>>>
>>>>> Hi, Andrew
>>>>>
>>>>> 2013/11/7 Andrew Beekhof <and...@beekhof.net>:
>>>>>>
>>>>>> On 6 Nov 2013, at 4:48 pm, yusuke iida <yusk.i...@gmail.com> wrote:
>>>>>>
>>>>>>> Hi, Andrew
>>>>>>>
>>>>>>> I tested by the following versions.
>>>>>>> https://github.com/ClusterLabs/pacemaker/commit/3492fec7fe58a6fd94071632df27d3fd3fc3ffe3
>>>>>>>
>>>>>>> load-threshold was checked at 60%, 40%, and 20%.
>>>>>>>
>>>>>>> However, the problem was not solved.
>>>>>>> It will not change but timeout will occur.
>>>>>>
>>>>>> That is extremely surprising. I will have a look at your logs today.
>>>>>> How many cores do these machines have btw?
>>>>>
>>>>> The machine which I am using by the test is a virtual machine of KVM.
>>>>> There are four physical servers. Four virtual machines are started on
>>>>> each server.
>>>>> Has four core physical server, I am assigned a core of separate to the
>>>>> virtual machine.
>>>>> The number of CPUs currently assigned to the virtual machine is one piece.
>>>>> The memory is assigning 2048 MB per set.
>>>>
>>>> I think I understand whats happening...
>>>>
>>>> The throttling code is designed to keep the cib's CPU usage from reaching
>>>> 100% (ie. 1 core completely busy).
>>>> In a single core setup, thats already much too late, and with 16 nodes I
>>>> can easily imagine that even 1 job per machine is going to be too much for
>>>> an underpowered CPU.
>>>>
>>>> I'm currently experimenting with:
>>>>
>>>> http://paste.fedoraproject.org/52283/37994581
>>>>
>>>> which may help on both fronts.
>>>>
>>>> Essentially it is trying to dynamically infer a "good" value for
>>>> batch-limit when the CIB is using too much CPU.
>
> --
> ----------------------------------------
> METRO SYSTEMS CO., LTD
>
> Yusuke Iida
> Mail: yusk.i...@gmail.com
> ----------------------------------------

_______________________________________________
Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
http://oss.clusterlabs.org/mailman/listinfo/pacemaker

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org
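
For anyone following the batch-limit point in the reply above, here is a rough toy model of the behaviour it describes: with batch-limit=10 and 10 jobs in flight, repeated triggering starts nothing, and one completion frees exactly one slot. This is only a sketch and not Pacemaker code. ToyGraph, trigger() and complete() are hypothetical names made up for illustration; trigger() merely stands in for the role trigger_graph() plays in the discussion.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class ToyGraph:
    """Toy model of a job graph capped by a batch limit (hypothetical, not Pacemaker code)."""
    batch_limit: int
    pending: deque = field(default_factory=deque)
    in_flight: set = field(default_factory=set)

    def trigger(self) -> int:
        """Start pending jobs while below batch_limit; return how many were started."""
        started = 0
        while self.pending and len(self.in_flight) < self.batch_limit:
            self.in_flight.add(self.pending.popleft())
            started += 1
        return started

    def complete(self, job: str) -> None:
        """Mark an in-flight job as finished, which frees one slot."""
        self.in_flight.remove(job)


graph = ToyGraph(batch_limit=10, pending=deque(f"job{i}" for i in range(15)))
first = graph.trigger()                            # fills all 10 slots
repeats = [graph.trigger() for _ in range(1000)]   # no free slot, so nothing starts
graph.complete("job0")                             # now 9 jobs in flight
after_completion = graph.trigger()                 # exactly one more job starts
```

In this model, extra calls to trigger() are harmless no-ops while the limit is reached, so a continuous stream of new work would point at the limit itself being wrong (for example, the value returned by the throttling code) rather than at the triggering.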