Updates to my problem:

1. The destroy operation appears to be restarting from the same point
after the system hangs and has to be rebooted. Oracle gave me the
following to track progress:

    echo '::pgrep "zpool$" |::walk thread|::findstack -v' | mdb -k | grep dsl_dataset_destroy

then take the first arg of dsl_dataset_destroy and

    echo '<ARG>::print dsl_dataset_t ds_phys->ds_used_bytes' | mdb -k

I am logging these values every minute. Yesterday when I started
tracking this I got a value of 0x75d97516b62; my last data point before
the system hung was 0x4ee1098bdfd. My first data point today, after
rebooting, restarting the logging scripts, and restarting the zpool
import, is 0x7a0b0634a1b. So it looks like I've made no real progress.

2. It looks like the root cause of the original system crash that left
the incomplete zfs recv snapshot is that a zfs recv filled the zpool
(there are two parallel zfs recv's running, one for an old configuration
(many datasets) and one for the new (one large dataset)). My replication
script checks for free space before starting the replication, but we had
a huge data load and replication of it running (3 TB), and when it
started there was room for it, but other (much smaller) data loads and
replication may have consumed it. This system has no other activity on
it; it is just a repository for this replicated data.

So ... it looks like I have:
- a full zpool
- an incomplete (corrupt ?) snapshot from a zfs recv
... and every time I try to import this zpool I hang the system due to
lack of memory (the box has 32 GB of RAM).

Any suggestions how to delete / destroy this incomplete snapshot
without running the system out of RAM ?

On Wed, Aug 3, 2011 at 9:56 AM, Paul Kraus <p...@kraus-haus.org> wrote:
> An additional data point, when i try to do a zdb -e -d and find the
> incomplete zfs recv snapshot I get an error as follows:
>
> # sudo zdb -e -d xxx-yy-01 | grep "%"
> Could not open xxx-yy-01/aaa-bb-01/aaa-bb-01-01/%1309906801, error 16
> #
>
> Anyone know what error 16 means from zdb and how this might impact
> importing this zpool ?
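
On the error 16 question quoted above: assuming zdb is printing a standard errno value there (an assumption; the zdb output itself does not say so), 16 is EBUSY on Solaris, which would suggest the dataset is considered busy or in use rather than unreadable. Below is a minimal Python sketch for looking up an errno number. The helper name `describe_errno` is made up for illustration, and the lookup uses the errno table of whatever machine runs it. That table matches Solaris for the value 16 on Linux and the BSDs, but the match is not guaranteed for every number.

```python
import errno
import os


def describe_errno(code: int) -> str:
    """Return 'NAME: message' for an errno number on the local platform."""
    # errno.errorcode maps numbers to symbolic names for the local OS only.
    name = errno.errorcode.get(code, "UNKNOWN")
    return f"{name}: {os.strerror(code)}"


if __name__ == "__main__":
    # 16 is the value from the zdb "Could not open ..." line.
    print(describe_errno(16))
```

The message text after the colon differs between operating systems; only the symbolic name is meaningful for comparison.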
>
> On Wed, Aug 3, 2011 at 9:19 AM, Paul Kraus <p...@kraus-haus.org> wrote:
>> I am having a very odd problem, and so far the folks at Oracle
>> Support have not provided a working solution, so I am asking the crowd
>> here while still pursuing it via Oracle Support.
>>
>> The system is a T2000 running 10U9 with CPU-2010-01 and two J4400
>> loaded with 1 TB SATA drives. There is one zpool on the J4400 (3 x 15
>> disk vdev + 3 hot spare). This system is the target for zfs send /
>> recv replication from our production server. The OS is UFS on local
>> disk.
>>
>> While I was on vacation this T2000 hung with "out of resource"
>> errors. Other staff tried rebooting, which hung the box. Then they
>> rebooted off of an old BE (10U9 without CPU-2010-01). Oracle Support
>> had them apply a couple patches and an IDR to address zfs "stability
>> and reliability problems" as well as set the following in /etc/system
>>
>> set zfs:zfs_arc_max = 0x700000000 (which is 28 GB)
>> set zfs:arc_meta_limit = 0x700000000 (which is 28 GB)
>>
>> The system has 32 GB RAM and 32 (virtual) CPUs. They then tried
>> importing the zpool and the system hung (after many hours) with the
>> same "out of resource" error. At this point they left the problem for
>> me :-(
>>
>> I removed the zfs.cache from the 10U9 + CPU 2010-10 BE and booted
>> from that. I then applied the IDR (IDR146118-12) and the zfs patch it
>> depended on (145788-03). I did not include the zfs arc and zfs arc
>> meta limits as I did not think they relevant. A zpool import shows the
>> pool is OK and a sampling with zdb -l of the drives shows good labels.
>> I started importing the zpool and after many hours it hung the system
>> with "out of resource" errors. I had a number of tools running to see
>> what was going on. The only thing this system is doing is importing
>> the zpool.
>>
>> ARC had climbed to about 8 GB and then declined to 3 GB by the time
>> the system hung.
>> This tells me that there is something else consuming
>> RAM and the ARC is releasing it.
>>
>> The hung TOP screen showed the largest user process only had 148 MB
>> allocated (and much less resident).
>>
>> VMSTAT showed a scan rate of over 900,000 (NOT a typo) and almost 8 GB
>> of free swap (so whatever is using memory cannot be paged out).
>>
>> So my guess is that there is a kernel module that is consuming all
>> (and more) of the RAM in the box. I am looking for a way to query how
>> much RAM each kernel module is using and script that in a loop (which
>> will hang when the box runs out of RAM next). I am very open to
>> suggestions here.
>>
>> Since this is the recv end of replication, I assume there was a zfs
>> recv going on at the time the system initially hung. I know there was
>> a 3+ TB snapshot replicating (via a 100 Mbps WAN link) when I left for
>> vacation, that may have still been running. I also assume that any
>> partial snapshots (% instead of @) are being removed when the pool is
>> imported. But what could be causing a partial snapshot removal, even
>> of a very large snapshot, to run the system out of RAM ? What caused
>> the initial hang of the system (I assume due to out of RAM) ? I did
>> not think there was a limit to the size of either a snapshot or a zfs
>> recv.
>>
>> Hung TOP screen:
>>
>> load averages: 91.43, 33.48, 18.989 xxx-xxx1
>> 18:45:34
>> 84 processes: 69 sleeping, 12 running, 1 zombie, 2 on cpu
>> CPU states: 95.2% idle, 0.5% user, 4.4% kernel, 0.0% iowait, 0.0% swap
>> Memory: 31.9G real, 199M free, 267M swap in use, 7.7G swap free
>>
>>   PID USERNAME THR PR NCE  SIZE   RES STATE   TIME FLTS   CPU COMMAND
>>   533 root      51 59   0  148M 30.6M run   520:21    0 9.77% java
>>  1210 yyyyyy     1  0   0 5248K 1048K cpu25   2:08    0 2.23% xload
>> 14720 yyyyyy     1 59   0 3248K 1256K cpu24   1:56    0 0.03% top
>>   154 root       1 59   0 4024K 1328K sleep   1:17    0 0.02% vmstat
>>  1268 yyyyyy     1 59   0 4248K 1568K sleep   1:26    0 0.01% iostat
>> ...
>>
>> VMSTAT:
>>
>> kthr    memory         page                      disk        faults      cpu
>> r b   w    swap   free re mf pi po  fr de     sr m0 m1 m2 m3  in  sy  cs us sy id
>> 0 0 112 8117096 211888 55 46  0  0 425  0 912684  0  0  0  0 976 166 836  0  2 98
>> 0 0 112 8117096 211936 53 51  6  0 394  0 926702  0  0  0  0 976 167 833  0  2 98
>>
>> ARC size (B): 4065882656
>>
>> --
>> {--------1---------2---------3---------4---------5---------6---------7---------}
>> Paul Kraus
>> -> Senior Systems Architect, Garnet River ( http://www.garnetriver.com/ )
>> -> Sound Designer: Frankenstein, A New Musical
>> (http://www.facebook.com/event.php?eid=123170297765140)
>> -> Sound Coordinator, Schenectady Light Opera Company (
>> http://www.sloctheater.org/ )
>> -> Technical Advisor, RPI Players

_______________________________________________
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss