On 29/05/2013, at 6:19 PM, Vladislav Bogdanov <bub...@hoster-ok.com> wrote:
> 29.05.2013 11:01, Andrew Beekhof wrote:
>>
>> On 28/05/2013, at 4:30 PM, Andrew Beekhof <and...@beekhof.net> wrote:
>>
>>>
>>> On 28/05/2013, at 10:12 AM, Andrew Beekhof <and...@beekhof.net> wrote:
>>>
>>>>
>>>> On 27/05/2013, at 5:08 PM, Vladislav Bogdanov <bub...@hoster-ok.com> wrote:
>>>>
>>>>> 27.05.2013 04:20, Yuichi SEINO wrote:
>>>>>> Hi,
>>>>>>
>>>>>> 2013/5/24 Vladislav Bogdanov <bub...@hoster-ok.com>:
>>>>>>> 24.05.2013 06:34, Andrew Beekhof wrote:
>>>>>>>> Any help figuring out where the leaks might be would be very much
>>>>>>>> appreciated :)
>>>>>>>
>>>>>>> One (and the only) suspect is unfortunately crmd itself.
>>>>>>> Its private heap has grown from 2708 to 3680 kB.
>>>>>>>
>>>>>>> All other relevant differences are in qb shm buffers, which are
>>>>>>> controlled and may grow until they reach the configured size.
>>>>>>>
>>>>>>> @Yuichi
>>>>>>> I would recommend trying to run it under valgrind on a testing cluster
>>>>>>> to figure out whether that is a memleak (lost memory) or some history
>>>>>>> data (referenced memory). The latter may be a logical memleak, though.
>>>>>>> You may look in /etc/sysconfig/pacemaker for details.
>>>>>>
>>>>>> I ran valgrind for about 2 days, and I attached the valgrind output
>>>>>> from the ACT node and the SBY node.
>>>>>
>>>>>
>>>>> I do not see any "direct" memory leaks (repeating 'definitely-lost'
>>>>> allocations) there.
>>>>>
>>>>> So what we see is probably one of:
>>>>> * Cache/history/etc, which grows up to some limit (or is expired at
>>>>> some point in time).
>>>>> * Unlimited/non-expirable lists/hashes of data structures, which are
>>>>> correctly freed at exit
>>>>
>>>> There are still plenty of memory chunks not free'd at exit; I'm slowly
>>>> working through those.
>>>
>>> I've pushed the following to my repo:
>>>
>>> + Andrew Beekhof (2 hours ago) d070092: Test: More glib suppressions
>>> + Andrew Beekhof (2 hours ago) ec74bf0: Fix: Fencing: Ensure API object is consistently free'd
>>> + Andrew Beekhof (2 hours ago) 6130d23: Fix: Free additional memory at exit
>>> + Andrew Beekhof (2 hours ago) b76d6be: Refactor: crmd: Allocate a mainloop before doing anything to help valgrind
>>> + Andrew Beekhof (3 hours ago) d4041de: Log: init: Remove unnecessary detail from shutdown message
>>> + Andrew Beekhof (3 hours ago) 282032b: Fix: Clean up internal mainloop structures at exit
>>> + Andrew Beekhof (4 hours ago) 0947721: Fix: Core: Correctly unreference GSource inputs
>>> + Andrew Beekhof (25 hours ago) d94140d: Fix: crmd: Clean up more memory before exit
>>> + Andrew Beekhof (25 hours ago) b44257c: Test: cman: Ignore additional valgrind errors
>>>
>>> If someone would like to run the cluster (no valgrind needed) for a while
>>> with
>>>
>>> export PCMK_trace_functions=mainloop_gio_destroy,mainloop_add_fd,mainloop_del_fd,crmd_exit,crm_peer_destroy,empty_uuid_cache,lrm_state_destroy_all,internal_lrm_state_destroy,do_stop,mainloop_destroy_trigger,mainloop_setup_trigger,do_startup,stonith_api_delete
>>>
>>> and then (after grabbing smaps) shut it down, we should have some
>>> information about any lists/hashes that are growing too large.
>>>
>>> Also, be sure to run with:
>>>
>>> export G_SLICE=always-malloc
>>>
>>> which will prevent glib from accumulating pools of memory and distorting
>>> any results.
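For anyone wanting to reproduce that run, a minimal sketch of wiring the two
exports in, assuming the environment file mentioned above
(/etc/sysconfig/pacemaker) is sourced by the init script and that the stack
is restarted with the sysvinit service command; adjust both to your setup:

    # /etc/sysconfig/pacemaker -- picked up by the daemons on their next start
    export G_SLICE=always-malloc
    export PCMK_trace_functions=mainloop_gio_destroy,mainloop_add_fd,...   # full list as above

    # Restart the stack, let the cluster run its tests, then snapshot
    # crmd's mappings before shutting it down:
    service pacemaker restart
    cat /proc/$(pidof crmd)/smaps > smaps.before-shutdown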
>>
>>
>> I did this today with 2747e25 and it looks to me like there is no leak
>> (anymore?)
>> For context, between smaps.5 and smaps.6, the 4 node cluster ran over 120
>> "standby" tests (lots of PE runs and resource activity).
>> So unless someone can show me otherwise, I'm going to move on :)
>
> I would say I'm convinced ;)
> I'd bet that is because of 0947721; glib programming is not always
> intuitive (you should remember that bug with IO watches).

There aren't so many sources added/removed from crmd though.
If anything, it's the lrmd that would have been most affected by that one.

> And GSources are probably destroyed when you exit mainloop,

Actually not, since there were still refs to them.

> that's why
> we do not see that in valgrind.

They were in valgrind, but as reachable, not "definitely lost".

> Hopefully the mainloop/gio code is now stable as a rock.
>
> Is this the DC or an ordinary member, btw?

DC

>
>>
>> Note that the [heap] changes are actually the memory usage going _backwards_.
>>
>> Raw results below.
>>
>> [root@corosync-host-1 ~]# cat /proc/`pidof crmd`/smaps > smaps.6 ; diff -u smaps.5 smaps.6;
>> --- smaps.5  2013-05-29 02:39:25.032940230 -0400
>> +++ smaps.6  2013-05-29 03:48:51.278940819 -0400
>> @@ -40,16 +40,16 @@
>>  Swap: 0 kB
>>  KernelPageSize: 4 kB
>>  MMUPageSize: 4 kB
>> -0226b000-02517000 rw-p 00000000 00:00 0    [heap]
>> -Size: 2736 kB
>> -Rss: 2268 kB
>> -Pss: 2268 kB
>> +0226b000-02509000 rw-p 00000000 00:00 0    [heap]
>> +Size: 2680 kB
>> +Rss: 2212 kB
>> +Pss: 2212 kB
>>  Shared_Clean: 0 kB
>>  Shared_Dirty: 0 kB
>>  Private_Clean: 0 kB
>> -Private_Dirty: 2268 kB
>> -Referenced: 2268 kB
>> -Anonymous: 2268 kB
>> +Private_Dirty: 2212 kB
>> +Referenced: 2212 kB
>> +Anonymous: 2212 kB
>>  AnonHugePages: 0 kB
>>  Swap: 0 kB
>>  KernelPageSize: 4 kB
>> @@ -112,13 +112,13 @@
>>  MMUPageSize: 4 kB
>>  7f0c6e918000-7f0c6ee18000 rw-s 00000000 00:10 522579    /dev/shm/qb-pengine-event-27411-27412-6-data
>>  Size: 5120 kB
>> -Rss: 3572 kB
>> -Pss: 1785 kB
>> +Rss: 4936 kB
>> +Pss: 2467 kB
>>  Shared_Clean: 0 kB
>> -Shared_Dirty: 3572 kB
>> +Shared_Dirty: 4936 kB
>>  Private_Clean: 0 kB
>>  Private_Dirty: 0 kB
>> -Referenced: 3572 kB
>> +Referenced: 4936 kB
>>  Anonymous: 0 kB
>>  AnonHugePages: 0 kB
>>  Swap: 0 kB
>> @@ -841,7 +841,7 @@
>>  7f0c72b00000-7f0c72b1d000 r-xp 00000000 fd:00 119    /lib64/libselinux.so.1
>>  Size: 116 kB
>>  Rss: 36 kB
>> -Pss: 5 kB
>> +Pss: 4 kB
>>  Shared_Clean: 36 kB
>>  Shared_Dirty: 0 kB
>>  Private_Clean: 0 kB
>> @@ -1401,7 +1401,7 @@
>>  7f0c740c6000-7f0c74250000 r-xp 00000000 fd:00 45    /lib64/libc-2.12.so
>>  Size: 1576 kB
>>  Rss: 588 kB
>> -Pss: 20 kB
>> +Pss: 19 kB
>>  Shared_Clean: 588 kB
>>  Shared_Dirty: 0 kB
>>  Private_Clean: 0 kB
>>
>>
>>>
>>>
>>>> Once we know all memory is being cleaned up, the next step is to check
>>>> the size of things beforehand.
>>>>
>>>> I'm hoping one or more of them show up as unnaturally large, indicating
>>>> things are being added but not removed.
>>>>
>>>>> (f.e. like dlm_controld has (had???) for a
>>>>> debugging buffer, or like the glibc resolver had in EL3). This cannot be
>>>>> caught with valgrind if you use it in a standard way.
>>>>>
>>>>> I believe we have the former. To prove that, it would be very
>>>>> interesting to run under the valgrind *debugger* (--vgdb=yes|full) for a
>>>>> long enough (2-3 weeks) period of time and periodically get the memory
>>>>> allocation state from there (with the 'monitor leak_check full reachable
>>>>> any' gdb command). I wanted to do that a long time ago, but
>>>>> unfortunately did not have enough spare time to even try it (although
>>>>> I have tried to valgrind other programs that way).
>>>>>
>>>>> This is described in the valgrind documentation:
>>>>> http://valgrind.org/docs/manual/manual-core-adv.html#manual-core-adv.gdbserver
>>>>>
>>>>> We probably do not need to specify '--vgdb-error=0' because we do not
>>>>> need to install watchpoints at the start (and we do not need/want to
>>>>> immediately connect to crmd with gdb to tell it to continue); we just
>>>>> need to periodically get the status of memory allocations
>>>>> (a stop / leak_check / cont sequence). That should probably be done in a
>>>>> 'fast' manner, so crmd does not stop for long and the rest of
>>>>> pacemaker does not see it as 'hung'. Again, I did not try that, and I do
>>>>> not know if it is even possible to do that with crmd.
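Concretely, the proposed sequence would look something like the sketch below.
The valgrind options and the monitor command come from the paragraphs above;
the crmd binary path and the exact way crmd gets started under valgrind (via
/etc/sysconfig/pacemaker) are assumptions that depend on the packaging:

    # 1. Arrange for crmd to be started under "valgrind --vgdb=yes ..."
    #    (how to do that is distribution-specific; see /etc/sysconfig/pacemaker).
    # 2. Then, periodically, from another shell:
    gdb /usr/libexec/pacemaker/crmd              # path is an assumption
    (gdb) target remote | vgdb --pid=<crmd-pid>
    (gdb) monitor leak_check full reachable any
    (gdb) detach    # keep the pause short so the rest of pacemaker
                    # does not notice crmd being stopped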
>>>>>
>>>>> And, as pacemaker heavily utilizes glib, which has its own memory
>>>>> allocator (slices), it is better to switch it to 'standard' malloc/free
>>>>> for debugging, with the G_SLICE=always-malloc env var.
>>>>>
>>>>> Last, I did memleak checks for a 'static' (i.e. no operations except
>>>>> monitors are performed) cluster for ~1.1.8, and did not find any. It
>>>>> would be interesting to see if that is true for an 'active' one, which
>>>>> starts/stops resources, handles failures, etc.
>>>>>
>>>>>>
>>>>>> Sincerely,
>>>>>> Yuichi
>>>>>>
>>>>>>>
>>>>>>>>
>>>>>>>> Also, the measurements are in pages... could you run "getconf PAGESIZE"
>>>>>>>> and let us know the result?
>>>>>>>> I'm guessing 4096 bytes.
>>>>>>>>
>>>>>>>> On 23/05/2013, at 5:47 PM, Yuichi SEINO <seino.clust...@gmail.com> wrote:
>>>>>>>>
>>>>>>>>> Hi,
>>>>>>>>>
>>>>>>>>> I reran the test after we updated the packages to the latest tags and
>>>>>>>>> the OS. glue and booth are the latest.
>>>>>>>>>
>>>>>>>>> * Environment
>>>>>>>>> OS: RHEL 6.4
>>>>>>>>> cluster-glue: latest (commit:2755:8347e8c9b94f) + patch
>>>>>>>>> [detail: http://www.gossamer-threads.com/lists/linuxha/dev/85787]
>>>>>>>>> resource-agent: v3.9.5
>>>>>>>>> libqb: v0.14.4
>>>>>>>>> corosync: v2.3.0
>>>>>>>>> pacemaker: v1.1.10-rc2
>>>>>>>>> crmsh: v1.2.5
>>>>>>>>> booth: latest (commit:67e1208973de728958432aaba165766eac1ce3a0)
>>>>>>>>>
>>>>>>>>> * Test procedure
>>>>>>>>> We regularly switch a ticket; the previous test also used the same
>>>>>>>>> procedure. And there was no memory leak when we tested pacemaker-1.1
>>>>>>>>> before pacemaker used libqb.
>>>>>>>>>
>>>>>>>>> * Result
>>>>>>>>> As a result, I think that crmd may cause the memory leak.
>>>>>>>>>
>>>>>>>>> crmd smaps (totals over all address ranges)
>>>>>>>>> In detail, we attached the smaps of the start and the end, and I
>>>>>>>>> recorded smaps every minute.
>>>>>>>>>
>>>>>>>>> Start
>>>>>>>>> RSS: 7396
>>>>>>>>> SHR (Shared_Clean+Shared_Dirty): 3560
>>>>>>>>> Private (Private_Clean+Private_Dirty): 3836
>>>>>>>>>
>>>>>>>>> Interval (about 30h later)
>>>>>>>>> RSS: 18464
>>>>>>>>> SHR: 14276
>>>>>>>>> Private: 4188
>>>>>>>>>
>>>>>>>>> End (about 70h later)
>>>>>>>>> RSS: 19104
>>>>>>>>> SHR: 14336
>>>>>>>>> Private: 4768
>>>>>>>>>
>>>>>>>>> Sincerely,
>>>>>>>>> Yuichi
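For reference, totals like the RSS / SHR / Private numbers above can be
produced by summing the per-mapping counters in /proc/<pid>/smaps (the kernel
reports them in kB); a minimal sketch:

    pid=$(pidof crmd)
    awk '/^Rss:/                   { rss  += $2 }
         /^Shared_(Clean|Dirty):/  { shr  += $2 }
         /^Private_(Clean|Dirty):/ { priv += $2 }
         END { printf "RSS:%d SHR:%d Private:%d (kB)\n", rss, shr, priv }' \
        "/proc/$pid/smaps"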
>>>>>>>>>
>>>>>>>>> 2013/5/15 Yuichi SEINO <seino.clust...@gmail.com>:
>>>>>>>>>> Hi,
>>>>>>>>>>
>>>>>>>>>> I ran the test for about two days.
>>>>>>>>>>
>>>>>>>>>> Environment
>>>>>>>>>>
>>>>>>>>>> OS: RHEL 6.3
>>>>>>>>>> pacemaker-1.1.9-devel (commit 138556cb0b375a490a96f35e7fbeccc576a22011)
>>>>>>>>>> corosync-2.3.0
>>>>>>>>>> cluster-glue latest + patch (detail: http://www.gossamer-threads.com/lists/linuxha/dev/85787)
>>>>>>>>>> libqb-0.14.4
>>>>>>>>>>
>>>>>>>>>> There may be a memory leak in crmd and lrmd. I regularly recorded the
>>>>>>>>>> RSS from ps.
>>>>>>>>>>
>>>>>>>>>> start-up
>>>>>>>>>> crmd: 5332
>>>>>>>>>> lrmd: 3625
>>>>>>>>>>
>>>>>>>>>> interval (about 30h later)
>>>>>>>>>> crmd: 7716
>>>>>>>>>> lrmd: 3744
>>>>>>>>>>
>>>>>>>>>> ending (about 60h later)
>>>>>>>>>> crmd: 8336
>>>>>>>>>> lrmd: 3780
>>>>>>>>>>
>>>>>>>>>> I still haven't run a test that uses pacemaker-1.1.10-rc2, so I will
>>>>>>>>>> run that test.
>>>>>>>>>>
>>>>>>>>>> Sincerely,
>>>>>>>>>> Yuichi
>>>>>>>>>>
>>>>>>>>>> --
>>>>>>>>>> Yuichi SEINO
>>>>>>>>>> METROSYSTEMS CORPORATION
>>>>>>>>>> E-mail: seino.clust...@gmail.com
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> --
>>>>>>>>> Yuichi SEINO
>>>>>>>>> METROSYSTEMS CORPORATION
>>>>>>>>> E-mail: seino.clust...@gmail.com
>>>>>>>>> <smaps_log.tar.gz>
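A minimal sketch of the kind of periodic ps sampling described in the quoted
results above (the one-minute interval and the log path are arbitrary
choices):

    while sleep 60; do
        printf '%s crmd=%s lrmd=%s\n' "$(date +%s)" \
            "$(ps -o rss= -p "$(pidof crmd)")" \
            "$(ps -o rss= -p "$(pidof lrmd)")" >> /var/tmp/pacemaker-rss.log
    done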
_______________________________________________
Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
http://oss.clusterlabs.org/mailman/listinfo/pacemaker

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org