On 03/02/2010 23:15, Aleksandr Levchuk wrote:
We switched to OpenSolaris + ZFS. RAID6 + hot spare on LSI Engenio SAN 
hardware worked well for us. (I'm used to the SAN management GUI. Also, 
something that RAID-Z would not be able to do: the SAN lights up the amber 
LEDs on the drives that fail, so I know which one to replace.)

So, I wanted to try to stick to the hardware RAID for data protection. I 
understand that the end-to-end checks of ZFS make it better at detecting 
corruptions.

In my case, I can imagine that ZFS would FREEZE the whole volume when a single 
block or file is found to be corrupted.

Ideally, I would not like this to happen and instead would like to get a log 
with names of corrupted files.

What exactly happens when ZFS detects a corrupted block/file and does not
have redundancy to correct it?

Alex

Your wish is...
That's exactly what should happen: zpool status -v should give you a list of the affected files, which you should then be able to delete. If the corrupted block contains metadata, ZFS should actually be able to fix it on the fly, because all metadata blocks are kept in at least two copies even if no redundancy is configured at the pool level.

Let's test it:

mi...@r600:~# mkfile 128m file1
mi...@r600:~# zpool create test `pwd`/file1
mi...@r600:~# zpool status test
  pool: test
 state: ONLINE
 scrub: none requested
config:

    NAME                        STATE     READ WRITE CKSUM
    test                        ONLINE       0     0     0
      /export/home/milek/file1  ONLINE       0     0     0

errors: No known data errors
mi...@r600:~#
mi...@r600:~# cp /bin/bash /test/file1
mi...@r600:~# cp /bin/bash /test/file2
mi...@r600:~# cp /bin/bash /test/file3
mi...@r600:~# cp /bin/bash /test/file4
mi...@r600:~# cp /bin/bash /test/file5
mi...@r600:~# cp /bin/bash /test/file6
mi...@r600:~# cp /bin/bash /test/file7
mi...@r600:~# cp /bin/bash /test/file8
mi...@r600:~# cp /bin/bash /test/file9
mi...@r600:~# sync
mi...@r600:~# dd if=/dev/zero of=file1 seek=50 count=10000 conv=notrunc
10000+0 records in
10000+0 records out
5120000 bytes (5.1 MB) copied, 0.179617 s, 28.5 MB/s
mi...@r600:~# sync
mi...@r600:~# zpool scrub test
mi...@r600:~# zpool status -v test
  pool: test
 state: DEGRADED
status: One or more devices has experienced an error resulting in data
    corruption.  Applications may be affected.
action: Restore the file in question if possible.  Otherwise restore the
    entire pool from backup.
   see: http://www.sun.com/msg/ZFS-8000-8A
scrub: scrub completed after 0h0m with 7 errors on Thu Feb 4 00:18:40 2010
config:

    NAME                        STATE     READ WRITE CKSUM
    test                        DEGRADED     0     0     7
      /export/home/milek/file1  DEGRADED     0     0    29  too many errors

errors: Permanent errors have been detected in the following files:

        /test/file1
mi...@r600:~#
mi...@r600:~# rm /test/file1
mi...@r600:~# sync
mi...@r600:~# zpool scrub test
mi...@r600:~# zpool status -v test
  pool: test
 state: DEGRADED
status: One or more devices has experienced an unrecoverable error.  An
    attempt was made to correct the error.  Applications are unaffected.
action: Determine if the device needs to be replaced, and clear the errors
    using 'zpool clear' or replace the device with 'zpool replace'.
   see: http://www.sun.com/msg/ZFS-8000-9P
scrub: scrub completed after 0h0m with 0 errors on Thu Feb 4 00:19:55 2010
config:

    NAME                        STATE     READ WRITE CKSUM
    test                        DEGRADED     0     0     7
      /export/home/milek/file1  DEGRADED     0     0    29  too many errors

errors: No known data errors
mi...@r600:~# zpool clear test
mi...@r600:~# zpool scrub test
mi...@r600:~# zpool status -v test
  pool: test
 state: ONLINE
scrub: scrub completed after 0h0m with 0 errors on Thu Feb 4 00:20:12 2010
config:

    NAME                        STATE     READ WRITE CKSUM
    test                        ONLINE       0     0     0
      /export/home/milek/file1  ONLINE       0     0     0

errors: No known data errors
mi...@r600:~#
mi...@r600:~# ls -la /test/
total 7191
drwxr-xr-x  2 root root     10 2010-02-04 00:19 .
drwxr-xr-x 28 root root     30 2010-02-04 00:17 ..
-r-xr-xr-x  1 root root 799040 2010-02-04 00:17 file2
-r-xr-xr-x  1 root root 799040 2010-02-04 00:17 file3
-r-xr-xr-x  1 root root 799040 2010-02-04 00:17 file4
-r-xr-xr-x  1 root root 799040 2010-02-04 00:17 file5
-r-xr-xr-x  1 root root 799040 2010-02-04 00:17 file6
-r-xr-xr-x  1 root root 799040 2010-02-04 00:17 file7
-r-xr-xr-x  1 root root 799040 2010-02-04 00:18 file8
-r-xr-xr-x  1 root root 799040 2010-02-04 00:18 file9
mi...@r600:~#
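
As for the log with names of corrupted files: a minimal sketch, assuming the status output keeps the layout shown in the transcript above (an "errors: Permanent errors ..." line followed by indented paths). The function name list_corrupted_files and the log path in the comment are made up for illustration, and it assumes a POSIX awk (on Solaris that would be nawk or /usr/xpg4/bin/awk rather than the old /usr/bin/awk).

```shell
#!/bin/sh
# Print the paths listed under "Permanent errors" in `zpool status -v` output.
# The status text is read from stdin, so this can be tried without a real pool.
list_corrupted_files() {
    awk '
        # A new "pool:" header ends the file list of the previous pool.
        /^[ \t]*pool:/ { in_list = 0 }
        # The file list starts right after this line.
        /^errors: Permanent errors have been detected/ { in_list = 1; next }
        # Inside the list, print every non-blank line without its indentation.
        in_list && NF > 0 {
            sub(/^[ \t]+/, "")
            print
        }
    '
}

# Possible use (pool name "test" as in the transcript; log path is hypothetical):
#   zpool status -v test | list_corrupted_files >> /var/tmp/zfs-corrupt.log
```

When the pool reports "errors: No known data errors" the function prints nothing, so it could be run from cron after a scrub and only produce output when there is something to restore.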


--
Robert Milkowski
http://milek.blogspot.com

_______________________________________________
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
