Indeed, things should be simpler with fewer pools (generally just one). That said, I suspect I know the reason for the particular problem you're seeing: we currently do a bit too much vdev-level caching. Each vdev can have up to 10MB of cache, so with 132 pools, even if each pool is just a single iSCSI device, that's 1.32GB of cache.

We need to fix this, obviously. In the interim, you might try setting zfs_vdev_cache_size to some smaller value, like 1MB.
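Off the top of my head (so treat this as a sketch rather than gospel), that tuning would go in /etc/system, followed by a reboot:

    set zfs:zfs_vdev_cache_size = 0x100000

If you'd rather not reboot the test box, the same variable can be poked on the live kernel with mdb -- assuming it's still a 32-bit int on your bits (use /Z instead of /W if it turns out to be 64-bit):

    # echo 'zfs_vdev_cache_size/W 0t1048576' | mdb -kw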
Still, I'm curious -- why lots of pools? Administration would be simpler with a single pool containing many filesystems.

Jeff

On Wed, Apr 30, 2008 at 11:48:07AM -0700, Bill Moore wrote:
> A silly question: Why are you using 132 ZFS pools as opposed to a
> single ZFS pool with 132 ZFS filesystems?
>
>
> --Bill
>
> On Wed, Apr 30, 2008 at 01:53:32PM -0400, Chris Siebenmann wrote:
> >  I have a test system with 132 (small) ZFS pools[*], as part of our
> > work to validate a new ZFS-based fileserver environment. In testing,
> > it appears that we can produce situations that will run the kernel out
> > of memory, or at least out of some resource such that things start
> > complaining 'bash: fork: Resource temporarily unavailable'. Sometimes
> > the system locks up solid.
> >
> >  I've found at least two situations that reliably do this:
> >  - trying to 'zpool scrub' each pool in sequence (waiting for each scrub
> >    to complete before starting the next one).
> >  - starting simultaneous sequential read IO from all pools from an NFS client.
> >    (Trying to do the same IO from the server basically kills the server
> >    entirely.)
> >
> >  If I aggregate the same disk space into 12 pools instead of 132, the
> > same IO load does not kill the system.
> >
> >  The ZFS machine is an X2100 M2 with 2GB of physical memory and 1GB
> > of swap, running 64-bit Solaris 10 U4 with an almost current set of
> > patches; it gets the storage from another machine via iSCSI. The pools
> > are non-redundant, with each vdev being a whole iSCSI LUN.
> >
> >  Is this a known issue (or issues)? If it isn't, does anyone have
> > pointers to good tools to trace down what might be happening and where
> > the memory is disappearing? Does the system simply need more memory for
> > this number of pools, and if so, does anyone know how much?
> >
> >  Thanks in advance.
> >
> >  (I was pointed to mdb -k's '::kmastat' by some people on the OpenSolaris
> > IRC channel, but I haven't spotted anything particularly enlightening in
> > its output, and I can't run it once the system has gone over the edge.)
> >
> >  - cks
> > [*: we have an outstanding uncertainty over how many ZFS pools a
> >     single system can sensibly support, so testing something larger
> >     than we'd use in production seemed sensible.]
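P.S. If you want to confirm that the vdev cache really is what's eating the memory before tuning anything, a couple of quick things to look at (a rough sketch -- exact cache names may differ on your bits):

    # echo ::memstat | mdb -k                   # overall kernel vs. free memory breakdown
    # echo ::kmastat | mdb -k | grep zio_buf    # per-cache kernel memory usage

The vdev cache does its reads in 64K chunks, so if it is the culprit I'd expect the zio_buf_65536 cache in the ::kmastat output to account for a large share of that ~1.3GB.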