Hi All,
I noticed some behavior on a ZFS filesystem that was confusing to me and
was hoping someone could shed some light on it. The summary: I created
two files, waited one minute, bounced the node, and found that the files
were gone when the node came back. There was a bad disk in the pool at
the time, which I believe is contributing to the problem. Details below.
thanks,
peter
--
Our platform is a modified x2100 system with 4 disks. We are running
this version of Solaris:
$ more /etc/release
Solaris 10 11/06 s10x_u3wos_05a X86
Copyright 2006 Sun Microsystems, Inc. All Rights Reserved.
Use is subject to license terms.
Assembled 13 September 2006
One of the 4 disks (/dev/dsk/c1t0d0) is flaky and is emitting errors
like these:
Jan 19 00:32:55 somehost scsi: [ID 107833 kern.warning] WARNING: /[EMAIL PROTECTED],0/pci108e,[EMAIL PROTECTED]/[EMAIL PROTECTED],0 (sd1):
Jan 19 00:32:55 somehost    Error for Command: read(10)    Error Level: Retryable
Jan 19 00:32:55 somehost scsi: [ID 107833 kern.notice]    Requested Block: 23676213    Error Block: 1761607680
Jan 19 00:32:55 somehost scsi: [ID 107833 kern.notice]    Vendor: ATA    Serial Number:
Jan 19 00:32:55 somehost scsi: [ID 107833 kern.notice]    Sense Key: Media Error
Jan 19 00:32:55 somehost scsi: [ID 107833 kern.notice]    ASC: 0x11 (unrecovered read error), ASCQ: 0x0, FRU: 0x0
This disk participates in a pool:
$ zpool list
NAME     SIZE    USED   AVAIL    CAP  HEALTH  ALTROOT
tank    20.5G   1.11G   19.4G     5%  ONLINE  -
$ zpool status
pool: tank
state: ONLINE
scrub: none requested
config:
NAME            STATE     READ WRITE CKSUM
tank            ONLINE       0     0     0
  mirror        ONLINE       0     0     0
    c0t0d0s3    ONLINE       0     0     0
    c0t1d0s3    ONLINE       0     0     0
    c1t0d0s3    ONLINE       0     0     0
    c1t1d0s3    ONLINE       0     0     0
  mirror        ONLINE       0     0     0
    c0t0d0s5    ONLINE       0     0     0
    c0t1d0s5    ONLINE       0     0     0
    c1t0d0s5    ONLINE       0     0     0
    c1t1d0s5    ONLINE       0     0     0
errors: No known data errors
The filesystem is mounted like this:
$ mount
...
/config on tank/config read/write/setuid/devices/exec/atime/dev=2d50003 on Fri Jan 19 00:39:31 2007
...
I created two files and waited 60 seconds, thinking this would be enough
time for the data to sync to disk before bouncing the node.
$ echo hi > /config/file
$ cat /config/file
hi
$ ls -l /config/file
-rw-r--r-- 1 root root 3 Jan 19 00:35 /config/file
$ echo bye > /config/otherfile
$ ls -l /config/otherfile
-rw-r--r-- 1 root root 4 Jan 19 00:35 /config/otherfile
$ more /config/otherfile
bye
$ date
Fri Jan 19 00:36:06 GMT 2007
$ sleep 60
$ date
Fri Jan 19 00:37:13 GMT 2007
$ cat /config/file
hi
$ cat /config/otherfile
bye
I then caused the system to reboot abruptly (using remote power control,
so no sync happened during the reboot). What I noticed is that the files
were not there after the node bounce:
$ Read from remote host somehost: Connection reset by peer
Connection to somehost closed.
$ ssh somehost
Sun Microsystems Inc. SunOS 5.10 Generic January 2005
$ ls -l /config/file
/config/file: No such file or directory
$ ls -l /config/otherfile
/config/otherfile: No such file or directory
$ zpool status
pool: tank
state: ONLINE
scrub: none requested
config:
NAME            STATE     READ WRITE CKSUM
tank            ONLINE       0     0     0
  mirror        ONLINE       0     0     0
    c0t0d0s3    ONLINE       0     0     0
    c0t1d0s3    ONLINE       0     0     0
    c1t0d0s3    ONLINE       0     0     0
    c1t1d0s3    ONLINE       0     0     0
  mirror        ONLINE       0     0     0
    c0t0d0s5    ONLINE       0     0     0
    c0t1d0s5    ONLINE       0     0     0
    c1t0d0s5    ONLINE       0     0     0
    c1t1d0s5    ONLINE       0     0     0
errors: No known data errors
$ zpool list
NAME     SIZE    USED   AVAIL    CAP  HEALTH  ALTROOT
tank    20.5G   1.11G   19.4G     5%  ONLINE  -
Note that the bad disk caused a normal reboot of this node to hang, and I
verified that running sync from the command line also hung. I don't know
how ZFS (or Solaris) handles a failing disk: can one bad disk block I/O
to the whole pool, even I/O destined for the healthy disks?
Was it reasonable to assume that after 60 seconds the data would have
reached persistent storage even without an explicit sync? I confess I
don't know how the underlying layers are implemented. Are there mount
options or other configuration parameters we should tweak to get more
reliable behavior in this case?
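
In the meantime, the workaround I'm considering is having the application
force its own writes to disk rather than relying on a timed wait. Below is
a minimal sketch of what I mean (my own illustration, using the
/config/file path from the test above; I'm assuming that a successful
fsync() means the data is on stable storage and would survive the power
cycle, which is part of what I'd like confirmed):

/* write_synced.c: write a small file and force it to stable storage
 * before returning.  Illustration only. */
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

int main(void)
{
    const char *path = "/config/file";   /* path from the test above */
    const char *data = "hi\n";
    int fd;

    fd = open(path, O_WRONLY | O_CREAT | O_TRUNC, 0644);
    if (fd < 0) {
        perror("open");
        return 1;
    }
    if (write(fd, data, strlen(data)) != (ssize_t)strlen(data)) {
        perror("write");
        close(fd);
        return 1;
    }
    /* Flush this file's data and metadata out of memory; the assumption
     * is that once this returns 0, the file survives an abrupt power
     * cycle. */
    if (fsync(fd) != 0) {
        perror("fsync");
        close(fd);
        return 1;
    }
    close(fd);
    return 0;
}

Opening the file with O_DSYNC instead would presumably give a similar
per-write guarantee, but either way I'd like to understand what the
filesystem promises when no explicit sync is issued.
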
As far as I've seen, this behavior is reproducible, in case someone on
the ZFS team wishes to take a closer look at this scenario.