Attached are the valgrind outputs from two separate runs of lrmd with the suggested variables set. Do they help narrow the issue down?
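For reference, this is roughly how I set the suggested variables for those
runs. It assumes a Debian-style init script that sources
/etc/default/pacemaker (exporting the same variables in the shell before
starting pacemaker should work just as well):

# Append the debug variables to the environment file the init script
# sources, then restart pacemaker so lrmd picks them up.
# (/etc/default/pacemaker and "service pacemaker restart" are assumptions
# about the Ubuntu packaging; adjust for your setup.)
cat >>/etc/default/pacemaker <<'EOF'
# Variables for running child daemons under valgrind and/or
# checking for memory problems
export G_SLICE=always-malloc
export MALLOC_PERTURB_=221
export MALLOC_CHECK_=3
export PCMK_valgrind_enabled=lrmd
export VALGRIND_OPTS="--leak-check=full --trace-children=no \
  --num-callers=25 --log-file=/var/lib/pacemaker/valgrind-%p \
  --suppressions=/usr/share/pacemaker/tests/valgrind-pcmk.suppressions \
  --gen-suppressions=all"
EOF
service pacemaker restart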
Thanks

Greg

On 02/05/2014 03:01, "Andrew Beekhof" <and...@beekhof.net> wrote:
>
> On 30 Apr 2014, at 9:01 pm, Greg Murphy <greg.mur...@gamesparks.com> wrote:
>
>> Hi
>>
>> I'm running a two-node Pacemaker cluster on Ubuntu Saucy (13.10),
>> kernel 3.11.0-17-generic and the Ubuntu Pacemaker package, version
>> 1.1.10+git20130802-1ubuntu1.
>
> The problem is that I have no way of knowing what code is/isn't included
> in '1.1.10+git20130802-1ubuntu1'.
> You could try setting the following in your environment before starting
> pacemaker though:
>
> # Variables for running child daemons under valgrind and/or checking
> # for memory problems
> G_SLICE=always-malloc
> MALLOC_PERTURB_=221 # or 0
> MALLOC_CHECK_=3 # or 0,1,2
> PCMK_valgrind_enabled=lrmd
> VALGRIND_OPTS="--leak-check=full --trace-children=no --num-callers=25
>   --log-file=/var/lib/pacemaker/valgrind-%p
>   --suppressions=/usr/share/pacemaker/tests/valgrind-pcmk.suppressions
>   --gen-suppressions=all"
>
>> The cluster is configured with a DRBD master/slave set and then a
>> failover resource group containing MySQL (along with its DRBD
>> filesystem) and a Zabbix Proxy and Agent.
>>
>> Since I built the cluster around two months ago I've noticed that on
>> the active node the memory footprint of lrmd gradually grows to quite
>> a significant size. The cluster was last restarted three weeks ago,
>> and now lrmd has over 1GB of mapped memory on the active node and
>> only 151MB on the passive node. Current excerpts from /proc/PID/status
>> are:
>>
>> Active node
>> VmPeak:  1146740 kB
>> VmSize:  1146740 kB
>> VmLck:         0 kB
>> VmPin:         0 kB
>> VmHWM:    267680 kB
>> VmRSS:    188764 kB
>> VmData:  1065860 kB
>> VmStk:       136 kB
>> VmExe:        32 kB
>> VmLib:     10416 kB
>> VmPTE:      2164 kB
>> VmSwap:   822752 kB
>>
>> Passive node
>> VmPeak:   220832 kB
>> VmSize:   155428 kB
>> VmLck:         0 kB
>> VmPin:         0 kB
>> VmHWM:      4568 kB
>> VmRSS:      3880 kB
>> VmData:    74548 kB
>> VmStk:       136 kB
>> VmExe:        32 kB
>> VmLib:     10416 kB
>> VmPTE:       172 kB
>> VmSwap:        0 kB
>>
>> During the last week or so I've taken a couple of snapshots of
>> /proc/PID/smaps on the active node, and the heap particularly stands
>> out as growing (I have the full outputs captured if they'll help):
>>
>> 20140422
>> 7f92e1578000-7f92f218b000 rw-p 00000000 00:00 0    [heap]
>> Size:          274508 kB
>> Rss:           180152 kB
>> Pss:           180152 kB
>> Shared_Clean:       0 kB
>> Shared_Dirty:       0 kB
>> Private_Clean:      0 kB
>> Private_Dirty: 180152 kB
>> Referenced:    120472 kB
>> Anonymous:     180152 kB
>> AnonHugePages:      0 kB
>> Swap:           91568 kB
>> KernelPageSize:     4 kB
>> MMUPageSize:        4 kB
>> Locked:             0 kB
>> VmFlags: rd wr mr mw me ac
>>
>> 20140423
>> 7f92e1578000-7f92f305e000 rw-p 00000000 00:00 0    [heap]
>> Size:          289688 kB
>> Rss:           184136 kB
>> Pss:           184136 kB
>> Shared_Clean:       0 kB
>> Shared_Dirty:       0 kB
>> Private_Clean:      0 kB
>> Private_Dirty: 184136 kB
>> Referenced:     69748 kB
>> Anonymous:     184136 kB
>> AnonHugePages:      0 kB
>> Swap:          103112 kB
>> KernelPageSize:     4 kB
>> MMUPageSize:        4 kB
>> Locked:             0 kB
>> VmFlags: rd wr mr mw me ac
>>
>> 20140430
>> 7f92e1578000-7f92fc01d000 rw-p 00000000 00:00 0    [heap]
>> Size:          436884 kB
>> Rss:           140812 kB
>> Pss:           140812 kB
>> Shared_Clean:       0 kB
>> Shared_Dirty:       0 kB
>> Private_Clean:    744 kB
>> Private_Dirty: 140068 kB
>> Referenced:     43600 kB
>> Anonymous:     140812 kB
>> AnonHugePages:      0 kB
>> Swap:          287392 kB
>> KernelPageSize:     4 kB
>> MMUPageSize:        4 kB
>> Locked:             0 kB
>> VmFlags: rd wr mr mw me ac
>>
>> I noticed in the release notes for 1.1.10-rc1
>> (https://github.com/ClusterLabs/pacemaker/releases/tag/Pacemaker-1.1.10-rc1)
>> that there was work done to fix "crmd: lrmd: stonithd: fixed memory
>> leaks", but I'm not sure which particular bug this was related to (and
>> those fixes should be in the version I'm running anyway).
>>
>> I've also spotted a few memory leak fixes in
>> https://github.com/beekhof/pacemaker, but I'm not sure whether they
>> relate to my issue (assuming I have a memory leak and this isn't
>> expected behaviour).
>>
>> Is there additional debugging that I can perform to check whether I
>> have a leak, or is there enough evidence to justify upgrading to 1.1.11?
>>
>> Thanks in advance
>>
>> Greg Murphy
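PS: the smaps excerpts quoted above were taken by hand; a loop along these
lines would automate the capture (the output paths and the once-a-day
interval are illustrative, not what was actually run):

# Illustrative capture loop: snapshot lrmd's /proc status and its
# [heap] mapping once a day so growth can be compared over time.
PID=$(pidof lrmd)
while sleep 86400; do
    STAMP=$(date +%Y%m%d)
    cp /proc/$PID/status /var/tmp/lrmd-status-$STAMP
    # Print from the map line containing [heap] through its VmFlags
    # line, i.e. one complete smaps entry.
    awk '/\[heap\]/,/^VmFlags:/' /proc/$PID/smaps \
        >/var/tmp/lrmd-heap-$STAMP
done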
Attachment: lrmd.tgz
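To triage the logs without reading each one end-to-end, the leak summaries
can be grepped out on the node (or after unpacking the tarball); this
assumes the --log-file pattern from the suggested VALGRIND_OPTS, i.e. one
valgrind-<pid> file per process:

# Print each log's LEAK SUMMARY lines; "definitely lost" is the
# strongest signal of a genuine leak.
for f in /var/lib/pacemaker/valgrind-*; do
    echo "== $f"
    grep -E 'definitely lost|indirectly lost|possibly lost' "$f"
done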
_______________________________________________
Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
http://oss.clusterlabs.org/mailman/listinfo/pacemaker

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org