I have a test system with 132 (small) ZFS pools[*], as part of our work to validate a new ZFS-based fileserver environment. In testing, it appears that we can produce situations that will run the kernel out of memory, or at least out of some resource such that things start complaining 'bash: fork: Resource temporarily unavailable'. Sometimes the system locks up solid.
I've found at least two situations that reliably do this:

 - Trying to 'zpool scrub' each pool in sequence, waiting for each
   scrub to complete before starting the next one (the driving loop is
   sketched in the PS below).
 - Starting simultaneous sequential read IO from all pools from an NFS
   client. (Trying to do the same IO on the server itself basically
   kills the server entirely.)

If I aggregate the same disk space into 12 pools instead of 132, the same IO load does not kill the system.

The ZFS machine is an X2100 M2 with 2GB of physical memory and 1GB of swap, running 64-bit Solaris 10 U4 with an almost current set of patches; it gets its storage from another machine via iSCSI. The pools are non-redundant, with each vdev being a whole iSCSI LUN.

Is this a known issue (or issues)? If it isn't, does anyone have pointers to good tools for tracking down what might be happening and where the memory is disappearing? Does the system simply need more memory for this number of pools, and if so, does anyone know how much? Thanks in advance.

(I was pointed to mdb -k's '::kmastat' by some people on the OpenSolaris IRC channel, but I haven't spotted anything particularly enlightening in its output, and I can't run it once the system has gone over the edge; the PPS sketches one way of capturing data before that point.)

 - cks

[*: we have an outstanding uncertainty over how many ZFS pools a single system can sensibly support, so testing something larger than we'd use in production seemed sensible.]
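PS: for concreteness, the scrub pass was driven by a small loop along these lines (the 'scrub in progress' string it waits on matches the zpool status output here; treat the whole thing as a sketch rather than a polished script):

    #!/bin/bash
    # Scrub every pool in turn, waiting for each scrub to finish
    # before kicking off the next one.
    for pool in $(zpool list -H -o name); do
        zpool scrub "$pool"
        # Poll zpool status until it stops reporting a scrub in progress.
        while zpool status "$pool" | grep 'scrub in progress' >/dev/null; do
            sleep 30
        done
    done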
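PPS: since mdb can't be run once the system has gone over the edge, one approach would be to snapshot kernel memory statistics to files every so often while the load runs, so there is something to look at after the fact. Something like the following; the output directory and interval are arbitrary, and whether '::memstat' and the ZFS ARC kstats are present on this particular Solaris 10 U4 kernel is an assumption, so drop whichever lines don't work:

    #!/bin/bash
    # Periodically dump kernel memory statistics to timestamped files
    # so the data survives a lockup or forced power cycle.
    OUT=/var/tmp/memsnaps        # arbitrary output directory
    mkdir -p "$OUT"
    while true; do
        TS=$(date '+%Y%m%d-%H%M%S')
        echo ::kmastat | mdb -k > "$OUT/kmastat.$TS"
        echo ::memstat | mdb -k > "$OUT/memstat.$TS"
        # ARC usage, if the arcstats kstat exists on this kernel.
        kstat -n arcstats > "$OUT/arcstats.$TS" 2>/dev/null
        sleep 60
    done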