A little more information today. I had a feeling that ZFS would carry on for
quite some time before giving an error, and today I've shown that you can keep
working with the filesystem for at least half an hour with the disk removed.
I suspect that on a lightly loaded system you could carry on working for
several hours without any indication that there is a problem. It looks to me
like ZFS is caching reads & writes and, provided requests can be fulfilled from
the cache, it doesn't care whether the disk is present or not.
I would guess that ZFS is attempting to write to the disk in the background,
and that this is silently failing.
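If you want to check that theory (and it is only my guess at what's happening),
one rough way is to watch the physical device while copying data to the pool
and see whether any I/O actually reaches it. A sketch of what I mean:

# Rough check (my own sketch, not an official diagnostic): watch the disk
# the pool sits on while copying files to it. If writes are only going to
# the cache you'd expect to see little or no activity against the device.
# c2t7d0 is the disk from my test - substitute your own device name.
iostat -xn 5 | egrep 'device|c2t7d0'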
Here's the log of the tests I did today. After removing the drive, over a
period of 30 minutes I copied folders to the filesystem, created an archive,
set permissions, and checked properties. I did this both on the command line
and with the graphical file manager in Solaris. Neither reported any errors,
and all the data could be read & written fine, right up until the reboot, at
which point all the data was lost, again without any error.
If you're not interested in the detail, please skip to the end where I've got
some thoughts on just how many problems there are here.
# zpool status test
  pool: test
 state: ONLINE
 scrub: none requested
config:

        NAME        STATE     READ WRITE CKSUM
        test        ONLINE       0     0     0
          c2t7d0    ONLINE       0     0     0

errors: No known data errors
# zfs list test
NAME   USED  AVAIL  REFER  MOUNTPOINT
test   243M   228G   242M  /test
# zpool list test
NAME   SIZE   USED  AVAIL  CAP  HEALTH  ALTROOT
test   232G   243M   232G   0%  ONLINE  -
-- drive removed --
# cfgadm | grep sata1/7
sata1/7        sata-port    empty        unconfigured ok
-- cfgadm knows the drive is removed. How come ZFS does not? --
# cp -r /rc-pool/copytest /test/copytest
# zpool list test
NAME   SIZE   USED  AVAIL  CAP  HEALTH  ALTROOT
test   232G  73.4M   232G   0%  ONLINE  -
# zfs list test
NAME   USED  AVAIL  REFER  MOUNTPOINT
test   142K   228G    18K  /test
-- Yup, still up. Let's start the clock --
# date
Tue Jul 29 09:31:33 BST 2008
# du -hs /test/copytest
 667K   /test/copytest
-- 5 minutes later, still going strong --
# date
Tue Jul 29 09:36:30 BST 2008
# zpool list test
NAME   SIZE   USED  AVAIL  CAP  HEALTH  ALTROOT
test   232G  73.4M   232G   0%  ONLINE  -
# cp -r /rc-pool/copytest /test/copytest2
# ls /test
copytest   copytest2
# du -h -s /test
 1.3M   /test
# zpool list test
NAME   SIZE   USED  AVAIL  CAP  HEALTH  ALTROOT
test   232G  73.4M   232G   0%  ONLINE  -
# find /test | wc -l
    2669
# find /test/copytest | wc -l
    1334
# find /rc-pool/copytest | wc -l
    1334
# du -h -s /rc-pool/copytest
 5.3M   /rc-pool/copytest
-- Not sure why the original pool has 5.3MB of data when I use du. --
-- File Manager reports that they both have the same size --
-- 15 minutes later it's still working. I can read data fine --
# date
Tue Jul 29 09:43:04 BST 2008
# chmod 777 /test/*
# mkdir /rc-pool/test2
# cp -r /test/copytest2 /rc-pool/test2/copytest2
# find /rc-pool/test2/copytest2 | wc -l
    1334
# zpool list test
NAME   SIZE   USED  AVAIL  CAP  HEALTH  ALTROOT
test   232G  73.4M   232G   0%  ONLINE  -
-- and yup, the drive is still offline --
# cfgadm | grep sata1/7
sata1/7        sata-port    empty        unconfigured ok
-- And finally, after 30 minutes the pool is still going strong --
# date
Tue Jul 29 09:59:56 BST 2008
# tar -cf /test/copytest.tar /test/copytest/*
# ls -l
total 3
drwxrwxrwx   3 root     root           3 Jul 29 09:30 copytest
-rwxrwxrwx   1 root     root     4626432 Jul 29 09:59 copytest.tar
drwxrwxrwx   3 root     root           3 Jul 29 09:39 copytest2
# zpool list test
NAME   SIZE   USED  AVAIL  CAP  HEALTH  ALTROOT
test   232G  73.4M   232G   0%  ONLINE  -
After a full 30 minutes there's no indication whatsoever of any problem.
Checking properties of the folder in File Browser reports 2665 items, totalling
9.0MB.
At this point I tried "# zfs set sharesmb=on test". I didn't really expect it
to work, and sure enough, that command hung. zpool status also hung, so I had
to reboot the server.
-- Rebooted server --
Now I found that not only were all the files I'd written in the last 30 minutes
missing, but files that I had deleted several minutes before removing the drive
had re-appeared.
-- /test mount point is still present; I'll probably have to remove that manually --
# cd /
# ls
bin         export       media       proc        system
boot        home         mnt         rc-pool     test
dev         kernel       net         rc-usb      tmp
devices     lib          opt         root        usr
etc         lost+found   platform    sbin        var
-- ZFS still has the pool mounted, but at least now it realises it's not working --
# zpool list
NAME      SIZE   USED  AVAIL  CAP  HEALTH    ALTROOT
rc-pool  2.27T  52.6G  2.21T   2%  DEGRADED  -
test         -      -      -    -  FAULTED   -
# zpool status test
  pool: test
 state: UNAVAIL
status: One or more devices could not be opened.  There are insufficient
        replicas for the pool to continue functioning.
action: Attach the missing device and online it using 'zpool online'.
   see: http://www.sun.com/msg/ZFS-8000-3C
 scrub: none requested
config:

        NAME        STATE     READ WRITE CKSUM
        test        UNAVAIL      0     0     0  insufficient replicas
          c2t7d0    UNAVAIL      0     0     0  cannot open
-- At least re-activating the pool is simple, but gotta love the "No known data errors" line --
# cfgadm -c configure sata1/7
# zpool status test
  pool: test
 state: ONLINE
 scrub: none requested
config:

        NAME        STATE     READ WRITE CKSUM
        test        ONLINE       0     0     0
          c2t7d0    ONLINE       0     0     0

errors: No known data errors
-- But of course, although ZFS thinks it's online, it didn't mount properly --
# cd /test
# ls
# zpool export test
# rm -r /test
# zpool import test
# cd test
# ls
var (copy)   var2
-- Now that's unexpected. Those folders should be long gone. Let's see how many files ZFS failed to delete --
# du -h -s /test
  77M   /test
# find /test | wc -l
   19033
So in addition to accepting new files for a full half hour, ZFS has also failed
to remove 77MB of data contained in nearly 20,000 files. And it's done all
that without reporting any error or problem with the pool.
In fact, if I didn't know what I was looking for, there would be no indication
of a problem at all. Before the reboot I couldn't find out what was going on
because "zpool status" hung. After the reboot it says there's no problem. Both
ZFS and its troubleshooting tools fail in a big way here.
As others have said, "zpool status" should not hang. ZFS already has to know
the state of all the drives and pools it's currently using, so "zpool status"
should simply report the current known status from ZFS's internal state; it
shouldn't need to scan anything. That internal state should also be
cross-checked against cfgadm so that ZFS knows when a disk isn't there, and it
should be updated if the cache can't be flushed to disk. Finally, "zfs list"
and "zpool list" need to borrow state information from the status commands so
that they don't say 'ONLINE' when the pool has problems.
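Until something like that exists, about the best an admin can do is script the
cross-check themselves. A quick sketch along these lines (with the port, device
and pool names from my test hard-coded, so treat it as an illustration rather
than anything general) would at least have caught this case:

#!/bin/sh
# Sketch only: warn if a SATA port is reported empty by cfgadm while
# zpool status still shows the corresponding device as ONLINE.
# PORT, DEV and POOL are the values from my test; change them to suit.
PORT=sata1/7
DEV=c2t7d0
POOL=test

if cfgadm | grep "$PORT" | grep empty > /dev/null; then
    if zpool status "$POOL" | grep "$DEV" | grep ONLINE > /dev/null; then
        echo "WARNING: $DEV ($PORT) removed, but pool $POOL still reports it ONLINE"
    fi
fi

Of course, given that zpool status itself can hang in exactly this situation,
even a script like this isn't reliable, which is rather the point.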
ZFS also needs to deal more intelligently with mount points when a pool has
problems. Leaving the folder lying around in a way that prevents the pool from
mounting properly when the drive is recovered is not good. When the pool
appears to come back online without errors, it would be very easy for somebody
to assume the data has been lost from the pool, without realising that it
simply hasn't mounted and that they're actually looking at an empty folder.
Firstly, ZFS should remove the mount point when problems occur; secondly,
"zfs list" or "zpool status" should tell you that the pool could not
be mounted properly.
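In the meantime, the one check that would have stopped me being fooled by the
empty /test folder is the 'mounted' property. Something as simple as this
(just my own habit now, nothing the tools prompt you to do):

# Don't trust an empty directory at a pool's mount point; ask ZFS whether
# the dataset is actually mounted there first.
zfs get mounted,mountpoint test

# df also shows whether /test is really a ZFS filesystem or just a stray
# directory sitting on the root filesystem.
df -h /test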
"zpool status" really should warn of any ZFS errors that occur, including
things like being unable to mount the pool, CIFS shares failing to come up, and
so on.
And finally, if ZFS does find problems writing from the cache, it really needs
to log somewhere the names of all the files affected and the action that could
not be carried out. ZFS knows which files it was meant to delete here, and it
also knows which files were written. I can accept that with delayed writes
files may occasionally be lost when a failure happens, but I don't accept that
we need to lose all knowledge of the affected files when the filesystem has
complete knowledge of what was affected. If there are any working filesystems
on the server, ZFS should attempt to store a log of the problem there; failing
that, it should e-mail the data out. The admin really needs to know which files
have been affected so that they can notify users of the data loss. I don't know
where you would store this information, but wherever that is, "zpool status"
should report the error and direct the admin to the log file.
I would probably say this could be safely stored on the system drive. Would it
be possible to have a number of possible places to store this log? What I'm
thinking is that if the system drive is unavailable, ZFS could try each pool in
turn and attempt to store the log there.
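As a stop-gap I've been thinking about keeping that record outside ZFS myself:
a cron job that notes which files changed since its last run and appends them
to a log on a different filesystem, so there's at least an external list to
compare against after an incident. A crude sketch (the paths are just the ones
from my test setup), and obviously no substitute for ZFS doing it properly:

#!/bin/sh
# Crude sketch: record files changed since the last run to a log held on
# a different filesystem, so there's an external record of recent writes
# if the pool later loses them. Paths below are examples from my setup.
STAMP=/var/tmp/test-pool.stamp
LOG=/var/log/test-pool-recent-writes.log

if [ -f "$STAMP" ]; then
    echo "--- `date` ---" >> "$LOG"
    find /test -newer "$STAMP" -print >> "$LOG"
fi
touch "$STAMP"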
In fact e-mail alerts or external error logging would be a great addition to
ZFS. Surely it makes sense that filesystem errors would be better off being
stored and handled externally?
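You can get part of the way there today with a cron job. Something like this
sketch, which mails the output of "zpool status -x" whenever it reports
anything other than healthy pools (the address and the idea of running it every
few minutes are just examples):

#!/bin/sh
# Poor man's alerting sketch: "zpool status -x" prints "all pools are
# healthy" when nothing is wrong, so mail anything else to the admin.
# Intended to be run from cron; the address below is just an example.
STATUS=`zpool status -x`
if [ "$STATUS" != "all pools are healthy" ]; then
    echo "$STATUS" | mailx -s "ZFS pool problem on `hostname`" admin@example.com
fi

But proper alerting really belongs in ZFS itself rather than in everyone's
home-grown cron scripts.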
Ross
> Date: Mon, 28 Jul 2008 12:28:34 -0700
> From: [EMAIL PROTECTED]
> Subject: Re: [zfs-discuss] Supermicro AOC-SAT2-MV8 hang when drive removed
> To: [EMAIL PROTECTED]
>
> I'm trying to reproduce and will let you know what I find.
> -- richard