On Wed, 2013-02-20 at 12:18 +0100, Josip Rodin wrote:
> On Wed, Feb 20, 2013 at 10:27:02AM +0000, Ian Campbell wrote:
> > On Sun, 2013-02-17 at 00:22 +0100, Josip Rodin wrote:
> > > Package: linux-image-2.6.32-5-xen-amd64
> >
> > This is in a guest, right? Is it possible to try the non-Xen amd64
> > flavour? I forget the exact status in Squeeze but IIRC most of the domU
> > functionality is present in the -amd64 flavour with the -xen-amd64
> > flavour only being required for dom0 and some of the more advanced domU
> > features.
> >
> > The reason I ask this is that the non-xen flavour is closer to mainline
> > and therefore should be easier to track down the issue with.
> >
> > If you are also able separately to try this with the Wheezy kernel that
> > would be very useful too.
>
> OK, I can install both (it's got PV-GRUB), which do you prefer to test first?
> I'm asking because it'll likely take a few weeks for the bug to appear,
> judging by what it did before.

Probably at this stage I would be more interested in making sure Wheezy
is going to be OK first.

> > > The thing I noticed was the slab_unreclaimable explosion, by a factor
> > > of 122. That... doesn't sound like something that should be happenning.
> >
> > Indeed. Is the system responsive enough to login and
> > examine /proc/slabinfo? There is probably one which has exploded in
> > size, it may even be sufficient to observe this over time and see if one
> > seems to be slowly creeping upwards towards $doom.
> >
> > > I'm going to try to run slabtop the next time I catch it in this state,
> > > in order to try to glean some more information.
> >
> > That would be great.
>
> I did post two consecutive slabtop results...

Sorry, I must have missed these.

> I thought they had all the
> relevant info from /proc/slabinfo.
>
> The two large elements that grew both in the total number of objects and
> the active number were (extracted from my previous message):
>
>   OBJS ACTIVE  USE OBJ SIZE  SLABS OBJ/SLAB CACHE SIZE NAME
> first readout:
>  65419  65419 100%    4.00K  14179        8    453728K kmalloc-4096
>  65390  65390 100%    2.06K  13338       15    426816K net_namespace
> second readout:
>  65428  65428 100%    4.00K  14181        8    453792K kmalloc-4096
>  65391  65391 100%    2.06K  13339       15    426848K net_namespace
>
> How do I trace which process is calling this?

I'm not sure.

The net_namespace one should be easy enough to track in the code since:
        net_cachep = kmem_cache_create("net_namespace", sizeof(struct net),
and therefore users of net_cachep must be responsible. I'd expect there
to be not all that many of those. Are you actually using network
namespaces in the guest?

The kmalloc-4096 one seems a lot more generic, so tracking down its users
is going to be harder, I should think.
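
For watching whether one cache creeps upwards over time, something along
these lines might help. It is only a rough sketch: it assumes the
slabinfo 2.x column layout (name, active_objs, num_objs, objsize, ...),
and the function names and the one-hour interval are made up for
illustration. Reading /proc/slabinfo may need root on newer kernels.

```python
import time


def parse_slabinfo(text):
    """Map cache name -> (num_objs, objsize in bytes) from slabinfo 2.x text."""
    caches = {}
    for line in text.splitlines():
        # Skip blank lines, the "slabinfo - version" banner and the "# name" header.
        if not line.strip() or line.startswith(("slabinfo", "#")):
            continue
        fields = line.split()
        # Assumed columns: name active_objs num_objs objsize objperslab ...
        caches[fields[0]] = (int(fields[2]), int(fields[3]))
    return caches


def growth(before, after):
    """Return (name, delta_objs, delta_bytes) for caches that grew, largest first."""
    rows = []
    for name, (objs, size) in after.items():
        old_objs = before.get(name, (0, size))[0]
        delta = objs - old_objs
        if delta > 0:
            rows.append((name, delta, delta * size))
    return sorted(rows, key=lambda row: row[2], reverse=True)


def watch(interval=3600, path="/proc/slabinfo"):
    """Print the five fastest-growing caches once per interval (hypothetical helper)."""
    with open(path) as f:
        previous = parse_slabinfo(f.read())
    while True:
        time.sleep(interval)
        with open(path) as f:
            current = parse_slabinfo(f.read())
        for name, d_objs, d_bytes in growth(previous, current)[:5]:
            print("%-24s +%d objs  +%dK" % (name, d_objs, d_bytes // 1024))
        previous = current


if __name__ == "__main__":
    watch()
```

This only shows which cache is growing, not who is allocating from it, so
it narrows the search rather than answering the question above.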

The Debian kernels have SLUB:
/boot/config-2.6.32-5-xen-amd64:CONFIG_SLUB_DEBUG=y
/boot/config-2.6.32-5-xen-amd64:CONFIG_SLUB=y
/boot/config-2.6.32-5-xen-amd64:# CONFIG_SLUB_DEBUG_ON is not set
/boot/config-2.6.32-5-xen-amd64:# CONFIG_SLUB_STATS is not set
(same as native). Documentation/vm/slub.txt has some info on adding
debugging stuff there, e.g. adding slub_debug to the command line. It
doesn't look like rebuilding with the other two options would initially be
useful (the first is equivalent to the command line option anyway).

Ian.

>
> In comparison, now, under seemingly normal circumstances, slabtop looks like
> this on that machine:
>
>   OBJS ACTIVE  USE OBJ SIZE  SLABS OBJ/SLAB CACHE SIZE NAME
>  56124  25272  45%    0.11K   1559       36      6236K buffer_head
>  24843  12898  51%    0.19K   1183       21      4732K dentry
>  23100  16107  69%    1.01K   1540       15     24640K nfs_inode_cache
>  11456   6403  55%    0.06K    179       64       716K kmalloc-64
>  10208   8864  86%    0.12K    319       32      1276K kmalloc-128
>   7308   5275  72%    0.55K    522       14      4176K radix_tree_node
>   4947   4940  99%    0.08K     97       51       388K sysfs_dir_cache
>   3584   3573  99%    0.01K      7      512        28K kmalloc-8
>   3200   2016  63%    0.79K    160       20      2560K ext3_inode_cache
>   2068   1981  95%    0.18K     94       22       376K vm_area_struct
>   1792   1790  99%    0.02K      7      256        28K kmalloc-16
>   1692   1631  96%    0.63K    141       12      1128K proc_inode_cache
>   1632   1588  97%    1.00K    102       16      1632K kmalloc-1024
>   1472   1442  97%    0.25K     92       16       368K kmalloc-256
>   1428   1129  79%    0.19K     68       21       272K kmalloc-192
>   1296   1284  99%    4.00K    162        8      5184K kmalloc-4096
>   1275   1270  99%    2.06K     85       15      2720K net_namespace
> [...]
>

-- 
Ian Campbell
Current Noise: Old Man's Child - Twilight Damnation

"Once they go up, who cares where they come down?  That's not my department."
                -- Werner von Braun