Answers below...

Tuomas Leikola wrote:
The endless resilver problem still persists on OI b147. It restarts just when it should complete.

I see no other solution than to copy the data to safety and recreate the array. Any hints would be appreciated, as that takes days unless I can stop or pause the resilvering.

On Mon, Sep 27, 2010 at 1:13 PM, Tuomas Leikola <tuomas.leik...@gmail.com> wrote:

    Hi!

    My home server had some disk outages due to flaky cabling and
    whatnot, and started resilvering to a spare disk. During this
    another disk or two dropped, and were reinserted into the array. So
    no devices were actually lost, they just were intermittently away
    for a while each.

    The situation is currently as follows:
      pool: tank
     state: ONLINE
    status: One or more devices has experienced an unrecoverable error.  An
            attempt was made to correct the error.  Applications are unaffected.
    action: Determine if the device needs to be replaced, and clear the errors
            using 'zpool clear' or replace the device with 'zpool replace'.
       see: http://www.sun.com/msg/ZFS-8000-9P
     scrub: resilver in progress for 5h33m, 22.47% done, 19h10m to go
    config:

            NAME                       STATE     READ WRITE CKSUM
            tank                       ONLINE       0     0     0
              raidz1-0                 ONLINE       0     0     0
                c11t1d0p0              ONLINE       0     0     0
                c11t2d0                ONLINE       0     0     5
                c11t6d0p0              ONLINE       0     0     0
                spare-3                ONLINE       0     0     0
                  c11t3d0p0            ONLINE       0     0     0  106M resilvered
                  c9d1                 ONLINE       0     0     0  104G resilvered
                c11t4d0p0              ONLINE       0     0     0
                c11t0d0p0              ONLINE       0     0     0
                c11t5d0p0              ONLINE       0     0     0
                c11t7d0p0              ONLINE       0     0     0  93.6G resilvered
              raidz1-2                 ONLINE       0     0     0
                c6t2d0                 ONLINE       0     0     0
                c6t3d0                 ONLINE       0     0     0
                c6t4d0                 ONLINE       0     0     0  2.50K resilvered
                c6t5d0                 ONLINE       0     0     0
                c6t6d0                 ONLINE       0     0     0
                c6t7d0                 ONLINE       0     0     0
                c6t1d0                 ONLINE       0     0     1
            logs
              /dev/zvol/dsk/rpool/log  ONLINE       0     0     0
            cache
              c6t0d0p0                 ONLINE       0     0     0
            spares
              c9d1                     INUSE     currently in use

    errors: No known data errors

    And this has been going on for a week now, always restarting when it
    should complete.

    The questions in my mind atm:
    1. How can I determine the cause of each resilver? Is there a log?

If you're running OI b147 then you should be able to do the following:

# echo "::zfs_dbgmsg" | mdb -k > /var/tmp/dbg.out

Send me the output.
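
If you want to skim the dump yourself first, something like the following may help. It is only a rough sketch: the helper name filter_resilver_msgs is made up for illustration, no particular message wording is assumed, and it needs a grep that accepts -E (on Solaris-derived systems that may mean /usr/xpg4/bin/grep or GNU grep).

```shell
# Hypothetical helper (not part of ZFS or mdb): print, with line numbers,
# the lines of a saved ::zfs_dbgmsg dump that mention resilver, scrub,
# or scan activity. The match is case-insensitive and deliberately broad.
# Assumes a grep that supports -E.
filter_resilver_msgs() {
    grep -i -n -E 'resilver|scrub|scan' "$1"
}

# Example: filter_resilver_msgs /var/tmp/dbg.out
```

The full, unfiltered dump is still the thing to send.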


    2. Why does it resilver the same data over and over, and not just
    the changed bits?

If drives fail before the initial resilver finishes, the resilver restarts and does all the work over again. Are drives still failing randomly for you?
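
One quick way to keep an eye on this is to list the devices whose error counters are non-zero. The sketch below assumes the NAME/STATE/READ/WRITE/CKSUM column layout from your status output; the function name nonzero_error_devices is just an illustration.

```shell
# Hypothetical helper: read 'zpool status' output on stdin and print every
# device row whose READ, WRITE, or CKSUM counter is something other than 0.
# Assumes columns: NAME STATE READ WRITE CKSUM. Counters may carry a
# K/M/G/T suffix when large, so they are compared as strings against "0".
nonzero_error_devices() {
    awk 'NF >= 5 &&
         $3 ~ /^[0-9.]+[KMGT]?$/ &&
         $4 ~ /^[0-9.]+[KMGT]?$/ &&
         $5 ~ /^[0-9.]+[KMGT]?$/ &&
         ($3 != "0" || $4 != "0" || $5 != "0") {
             print $1, "read=" $3, "write=" $4, "cksum=" $5
         }'
}

# Example: zpool status tank | nonzero_error_devices
```

Against the status you posted, this would flag c11t2d0 (5 checksum errors) and c6t1d0 (1 checksum error). Running it again later shows whether those counters are still climbing.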


    3. Can I force-remove c9d1, as it is no longer needed, so that c11t3
    can be resilvered instead?

You can detach the spare and let the resilver work on only c11t3. Can you send me the output of 'zdb -dddd tank 0'?
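
For the detach itself, here is a cautious sketch using the pool and spare names from your status output. The wrapper function detach_spare and the DRY_RUN variable are made up for illustration; only 'zpool detach <pool> <device>' is the actual command.

```shell
# Sketch only: detach the in-use spare so the resilver continues on the
# original disk alone. With DRY_RUN=1 (the default here) the command is
# printed instead of executed, so it can be reviewed first.
detach_spare() {
    pool=$1
    spare=$2
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "zpool detach $pool $spare"
    else
        zpool detach "$pool" "$spare"
    fi
}

# Example (prints the command):  detach_spare tank c9d1
# Example (runs it):             DRY_RUN=0 detach_spare tank c9d1
```

Check 'zpool status tank' afterwards to confirm that c9d1 is back under spares and the resilver is proceeding on c11t3d0p0.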

Thanks,
George
_______________________________________________
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
