Re: [zfs-discuss] zpool import starves machine of memory

Paul Kraus Thu, 04 Aug 2011 10:27:17 -0700

Updates to my problem:

1. The destroy operation appears to be restarting from the same point
after the system hangs and has to be rebooted. Oracle gave me the
following to track progress:


echo '::pgrep "zpool$" |::walk thread|::findstack -v' | mdb -k | grep dsl_dataset_destroy
then take first arg of dsl_dataset_destroy and
echo '<ARG>::print dsl_dataset_t ds_phys->ds_used_bytes' | mdb -k

I am logging these values every minute. Yesterday when I started
tracking this I got a value of 0x75d97516b62, my last data point
before the system hung was 0x4ee1098bdfd. My first data point
today after rebooting, restarting the logging scripts, and restarting
the zpool import is 0x7a0b0634a1b. So it looks like I've made no real
progress.
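A rough sketch for comparing the three readings quoted above. It assumes ds_used_bytes is a plain byte count that shrinks as the destroy frees blocks; that interpretation, and the helper names (to_tib, parse_hex, deltas), are illustrative assumptions and not something Oracle supplied.

```python
# The three ds_phys->ds_used_bytes readings quoted in the message.
READINGS = [
    ("first reading, day 1", 0x75d97516b62),
    ("last reading before hang", 0x4ee1098bdfd),
    ("first reading after reboot", 0x7a0b0634a1b),
]

TIB = 2 ** 40  # bytes per TiB


def to_tib(value):
    """Convert a byte count to TiB (assumes the value is in bytes)."""
    return value / TIB


def parse_hex(text):
    """Parse an mdb-style hex string such as '0x75d97516b62'."""
    return int(text.strip(), 16)


def deltas(readings):
    """Return (label, size in TiB, change from previous reading in TiB)."""
    out = []
    prev = None
    for label, value in readings:
        change = None if prev is None else to_tib(value - prev)
        out.append((label, to_tib(value), change))
        prev = value
    return out


if __name__ == "__main__":
    for label, tib, change in deltas(READINGS):
        note = "" if change is None else f" (change {change:+.2f} TiB)"
        print(f"{label}: {tib:.2f} TiB{note}")
```

Under that assumption, the value dropped by a bit over 2 TiB before the hang and came back slightly above the original reading after the reboot, which matches the conclusion that the work is not being retained.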

2. It looks like the root cause of the original system crash that left
the incomplete zfs recv snapshot is that a zfs recv filled the
zpool (there are two parallel zfs recv's running, one for an old
configuration (many datasets) and one for the new (one large
dataset)). My replication script checks for free space before starting
the replication, but we had a huge data load and replication of it
running (3 TB), and when it started there was room for it, but other
(much smaller) data loads and replication may have consumed it. This
system has no other activity on it, it is just a repository for this
replicated data.
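To illustrate the race described in item 2: a check of free space at start time does not account for space already promised to streams still in flight. The sketch below is hypothetical and is not the actual replication script; the function names, the reserve parameter, and the example sizes are all assumptions chosen for illustration.

```python
TB = 10 ** 12  # decimal terabyte, used only for the example numbers


def naive_can_start(pool_free, stream_size):
    """Start-time check only: does the stream fit in the space free right now?"""
    return pool_free >= stream_size


def can_start(pool_free, stream_size, in_flight_remaining, reserve):
    """Also subtract bytes still owed to running streams, plus a safety reserve.

    in_flight_remaining: bytes each already-running receive has yet to write.
    reserve: bytes to keep free no matter what (hypothetical tunable).
    """
    committed = sum(in_flight_remaining)
    return pool_free - committed - reserve >= stream_size


if __name__ == "__main__":
    pool_free = 3500 * 10 ** 9   # 3.5 TB free at the moment of the check
    big_stream_left = 3 * TB     # large replication still has 3 TB to write
    small_stream = 400 * 10 ** 9  # a new 0.4 TB stream wants to start
    reserve = 200 * 10 ** 9      # keep 0.2 TB free

    print("naive check:", naive_can_start(pool_free, small_stream))
    print("with commitments:",
          can_start(pool_free, small_stream, [big_stream_left], reserve))
```

With these example numbers the naive check lets the small stream start, while the commitment-aware check refuses it because the large stream has already claimed most of the remaining space.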

So ... it looks like I have:
- a full zpool
- an incomplete (corrupt ?) snapshot from a zfs recv
... and every time I try to import this zpool I hang the system due to
lack of memory (the box has 32 GB of RAM).

Any suggestions how to delete / destroy this incomplete snapshot
without running the system out of RAM ?

On Wed, Aug 3, 2011 at 9:56 AM, Paul Kraus <p...@kraus-haus.org> wrote:
> An additional data point, when I try to do a zdb -e -d and find the
> incomplete zfs recv snapshot I get an error as follows:
>
> # sudo zdb -e -d xxx-yy-01 | grep "%"
> Could not open xxx-yy-01/aaa-bb-01/aaa-bb-01-01/%1309906801, error 16
> #
>
> Anyone know what error 16 means from zdb and how this might impact
> importing this zpool ?
>
> On Wed, Aug 3, 2011 at 9:19 AM, Paul Kraus <p...@kraus-haus.org> wrote:
>>    I am having a very odd problem, and so far the folks at Oracle
>> Support have not provided a working solution, so I am asking the crowd
>> here while still pursuing it via Oracle Support.
>>
>>    The system is a T2000 running 10U9 with CPU-2010-01 and two J4400
>> loaded with 1 TB SATA drives. There is one zpool on the J4400 (3 x 15
>> disk vdev + 3 hot spare). This system is the target for zfs send /
>> recv replication from our production server. The OS is UFS on local
>> disk.
>>
>>     While I was on vacation this T2000 hung with "out of resource"
>> errors. Other staff tried rebooting, which hung the box. Then they
>> rebooted off of an old BE (10U9 without CPU-2010-01). Oracle Support
>> had them apply a couple patches and an IDR to address zfs "stability
>> and reliability problems" as well as set the following in /etc/system
>>
>> set zfs:zfs_arc_max = 0x700000000 (which is 28 GB)
>> set zfs:arc_meta_limit = 0x700000000 (which is 28 GB)
>>
>>    The system has 32 GB RAM and 32 (virtual) CPUs. They then tried
>> importing the zpool and the system hung (after many hours) with the
>> same "out of resource" error. At this point they left the problem for
>> me :-(
>>
>>    I removed the zfs.cache from the 10U9 + CPU 2010-10 BE and booted
>> from that. I then applied the IDR (IDR146118-12) and the zfs patch it
>> depended on (145788-03). I did not include the zfs arc and zfs arc
>> meta limits as I did not think they were relevant. A zpool import shows the
>> pool is OK and a sampling with zdb -l of the drives shows good labels.
>> I started importing the zpool and after many hours it hung the system
>> with "out of resource" errors. I had a number of tools running to see
>> what was going on. The only thing this system is doing is importing
>> the zpool.
>>
>> ARC had climbed to about 8 GB and then declined to 3 GB by the time
>> the system hung. This tells me that there is something else consuming
>> RAM and the ARC is releasing it.
>>
>> The hung TOP screen showed the largest user process only had 148 MB
>> allocated (and much less resident).
>>
>> VMSTAT showed a scan rate of over 900,000 (NOT a typo) and almost 8 GB
>> of free swap (so whatever is using memory cannot be paged out).
>>
>>    So my guess is that there is a kernel module that is consuming all
>> (and more) of the RAM in the box. I am looking for a way to query how
>> much RAM each kernel module is using and script that in a loop (which
>> will hang when the box runs out of RAM next). I am very open to
>> suggestions here.
>>
>>   Since this is the recv end of replication, I assume there was a zfs
>> recv going on at the time the system initially hung. I know there was
>> a 3+ TB snapshot replicating (via a 100 Mbps WAN link) when I left for
>> vacation, that may have still been running. I also assume that any
>> partial snapshots (% instead of @) are being removed when the pool is
>> imported. But what could be causing a partial snapshot removal, even
>> of a very large snapshot, to run the system out of RAM ? What caused
>> the initial hang of the system (I assume due to out of RAM) ? I did
>> not think there was a limit to the size of either a snapshot or a zfs
>> recv.
>>
>> Hung TOP screen:
>>
>> load averages: 91.43, 33.48, 18.989             xxx-xxx1             18:45:34
>> 84 processes:  69 sleeping, 12 running, 1 zombie, 2 on cpu
>> CPU states: 95.2% idle,  0.5% user,  4.4% kernel,  0.0% iowait,  0.0% swap
>> Memory: 31.9G real, 199M free, 267M swap in use, 7.7G swap free
>>
>>   PID USERNAME THR PR NCE  SIZE   RES STATE   TIME FLTS    CPU COMMAND
>>   533 root      51 59   0  148M 30.6M run   520:21    0  9.77% java
>>  1210 yyyyyy     1  0   0 5248K 1048K cpu25   2:08    0  2.23% xload
>>  14720 yyyyyy     1 59   0 3248K 1256K cpu24   1:56    0  0.03% top
>>   154 root       1 59   0 4024K 1328K sleep   1:17    0  0.02% vmstat
>>  1268 yyyyyy     1 59   0 4248K 1568K sleep   1:26    0  0.01% iostat
>> ...
>>
>> VMSTAT:
>>
>> kthr      memory            page            disk          faults      cpu
>>  r b w   swap  free  re  mf pi po fr de sr m0 m1 m2 m3   in   sy   cs us sy id
>>  0 0 112 8117096 211888 55 46 0 0 425 0 912684 0 0 0 0  976  166  836  0  2 98
>>  0 0 112 8117096 211936 53 51 6 0 394 0 926702 0 0 0 0  976  167  833  0  2 98
>>
>> ARC size (B): 4065882656
>>
>> --
>> {--------1---------2---------3---------4---------5---------6---------7---------}
>> Paul Kraus
>> -> Senior Systems Architect, Garnet River ( http://www.garnetriver.com/ )
>> -> Sound Designer: Frankenstein, A New Musical
>> (http://www.facebook.com/event.php?eid=123170297765140)
>> -> Sound Coordinator, Schenectady Light Opera Company (
>> http://www.sloctheater.org/ )
>> -> Technical Advisor, RPI Players
>>
>



_______________________________________________
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
