[perf-discuss] Memory leak somewhere, maybe in libc (libc_hwcap2.so.1 / SXCE b105 x64) ?

Martin Bochnig Wed, 11 Feb 2009 18:35:25 -0800

Hello,

I use my x64 Amilo Laptop as webserver, workstation and devel machine at once.
It has 3GB mem and 8GB of zfs swap space.


After less than one week of uptime it ran out of physical memory
and kept allocating more and more by going deeper and deeper into
swap.
This time it was not the Xorg server that consumed the most memory
(only about 180MB), and Firefox 3.x took only 522MB. VirtualBox had to
be stopped because the whole system became sluggish once no free
physical memory was left, so nothing like VirtualBox, Xen or qemu was
running anymore. I also closed all gedit windows and tabs and even
restarted Xorg. After that restart no graphical application was
running except for Gnome itself and one gnome-terminal window with
just 8 tabs, in which I had ssh sessions to a few other systems.

So what consumed all the memory???
/usr/bin/top showed me that one of the 5 wget sessions I had started a
week ago (for fetching opensolaris.org, sdlc.com/osol, genunix.org and
a few LinUX mirrors) had grown to 2GB!!! And one day later it was
2.2GB. So I killed that wget pid and started the same wget (same dir /
website) again. The four other wget processes were between "only"
122MB and about 250MB.

So, one day later they had grown further, but here comes the absolute
HAMMER: One top process was at 1340MB!!!
I mean, ok: I have gotten used to the fact that small daemons like the
network-auto-magic manager can consume 86MB. Also that small, almost
useless little Gnome applets can consume hundreds of MB (wnck-applet
104MB, clock-applet 80MB, mixer-applet2 91MB, trashapplet 87MB,
gnome-panel 114MB, etc etc etc ...).
Remember that each of those amounts would have been considered
sufficient for a whole server machine until just a few years ago. But
can it be? Is this normal?
How can top consume 1340MB? I killed the pid 25420 and ... voila:
1.3GB of mem/swap got freed. So it was not some mis-reporting by the
other top process.
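
A growth like this could be tracked over time instead of being noticed
by chance in top. The following is only a minimal sketch, not something
that was run for this report: it assumes that `ps -o rss= -p PID` prints
the resident set size in KB with no header (which should hold for
POSIX-style ps implementations), and the helper names parse_rss_kb,
growth_kb_per_hour and sample_rss are hypothetical.

```python
import subprocess
import time


def parse_rss_kb(ps_output):
    """Parse the output of `ps -o rss= -p PID` (assumed: one number, in KB)."""
    return int(ps_output.strip())


def growth_kb_per_hour(samples):
    """Least-squares slope of (seconds, rss_kb) samples, scaled to KB/hour."""
    n = len(samples)
    if n < 2:
        raise ValueError("need at least two samples")
    mean_t = sum(t for t, _ in samples) / n
    mean_r = sum(r for _, r in samples) / n
    denom = sum((t - mean_t) ** 2 for t, _ in samples)
    if denom == 0:
        raise ValueError("samples must be taken at different times")
    slope = sum((t - mean_t) * (r - mean_r) for t, r in samples) / denom
    return slope * 3600.0


def sample_rss(pid, count, interval_s):
    """Collect `count` (timestamp, rss_kb) samples for `pid` via ps."""
    samples = []
    for _ in range(count):
        out = subprocess.run(
            ["ps", "-o", "rss=", "-p", str(pid)],
            capture_output=True, text=True, check=True,
        ).stdout
        samples.append((time.time(), parse_rss_kb(out)))
        time.sleep(interval_s)
    return samples


# Possible usage (not executed here): sample pid 25420 once a minute for
# an hour, then print the fitted growth rate.
#   print(growth_kb_per_hour(sample_rss(25420, 60, 60.0)))
```

A steadily positive slope for a process that does no growing work of its
own (such as an idle top) would point to a leak rather than to normal
allocation.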

Performance-analysis experts: Something seems to be wrong, even though
you have a rich set of the most sophisticated test suites. Yet simple
tools like df or top show an end user that something significant *is*
wrong. How can that be? What will you tell a customer who is under a
support contract and who is running a supported SXCE (with a Solaris
Express service plan, I don't know if there still are any) or who is
running a supported Indiana?

Solaris has tended to be memory-hungry and sluggish for quite some time
now (especially starting with Solaris 10 and even more so with 11).
But in this case it must be a memory leak and therefore a severe bug.
I'm not a dtrace expert (normally more interested in other things) and
leave the entire procedure that must follow to others.
But I hereby fulfil my duty as an opensolaris.org (and Solaris in a
wider sense) community member to report this problem.

Shall I file this message as a bug?

I noticed it under this config (which happens to be my primary box) :

bash-3.2$ psrinfo -v
Status of virtual processor 0 as of: 02/12/2009 03:09:19
 on-line since 02/03/2009 15:31:17.
 The i386 processor operates at 2000 MHz,
       and has an i387 compatible floating point processor.
Status of virtual processor 1 as of: 02/12/2009 03:09:19
 on-line since 02/03/2009 15:31:23.
 The i386 processor operates at 2000 MHz,
       and has an i387 compatible floating point processor.
bash-3.2$ prtdiag -v
System Configuration: FUJITSU SIEMENS AMILO Notebook Pa 3515
BIOS Configuration: Phoenix Technologies LTD V1.13            10/06/2008

==== Processor Sockets ====================================

Version                          Location Tag
-------------------------------- --------------------------
Athlon X2 QL-62                  Socket S1G2

==== Memory Device Sockets ================================

Type        Status Set Device Locator      Bank Locator
----------- ------ --- ------------------- ----------------
DDR2        in use 1   S1                  DIMM1
DDR2        in use 2   S2                  DIMM2

==== On-Board Devices =====================================
ATI RS690M
ESS 1869

==== Upgradeable Slots ====================================

ID  Status    Type             Description
--- --------- ---------------- ----------------------------
11  available PCI              MINI PCI
bash-3.2$ uname -a
SunOS unknown 5.11 snv_105 i86pc i386 i86pc
bash-3.2$

(full all-clusters plus oem install with POSIX-C as system locale) and
mostly default config, except for Xorg, which I upgraded to a
self-built version of the fox-gate from February 1st (hence server
1.5.3 with libpciaccess)

Please find two screenshots with top output (one taken before I killed
pid 25420 and one afterwards). Note that the wget processes run in the
background and log their output into text files via
wget -m -p -k xxx > logfile.lod 2>&1 &
http://natamar.org/content/bugs/Screenshot-12.png (image/png) 237K
http://natamar.org/content/bugs/Screenshot-14.png (image/png) 239K
Normally I would have liked to attach them for the records/archives,
but the limit is just 40K.

Regards,
-
Martin Bochnig
Sun Contributor Agreement number OS0335
_______________________________________________
perf-discuss mailing list
perf-discuss@opensolaris.org
