>>>>> "mp" == Mattias Pantzare <[EMAIL PROTECTED]> writes:
>> This is a big one: ZFS can continue writing to an unavailable
>> pool.  It doesn't always generate errors (I've seen it copy
>> over 100MB before erroring), and if not spotted, this *will*
>> cause data loss after you reboot.

mp> This is not unique for zfs. If you need to know that your
mp> writes has reached stable store you have to call fsync().

seconded.  How about this:

 * start the copy

 * pull the disk, without waiting for an error to be reported to the
   application

 * type 'lockfs -fa'.

Does lockfs either hang, or do you get an immediate error as soon as
you request it?  If so, I think it's ok and within the unix tradition
to allow all these writes.  It's just a more extreme version of that
tradition, which might not be an entirely bad compromise if ZFS can
keep up this behaviour and actually retry the unreported failed
writes when confronted with FC, iSCSI, USB, or FireWire targets that
bounce.  I'm not sure whether it can do that yet, but architecturally
I wouldn't want to demand that it return failure to the app too soon,
so long as fsync() still behaves correctly w.r.t. power failures
(there's a small sketch of what I mean by the fsync() contract below).

However, the other problems you report are things I've run into too.
'zpool status' should not be touching the disk at all.  So, we have:

 * 'zpool list' shows ONLINE several minutes after a drive is yanked.
   At the time 'zpool list' still shows ONLINE, 'zpool status' shows
   nothing at all because it hangs, so ONLINE seems too positive a
   report for the situation.  I'd suggest:

   + 'zpool list' should not borrow the ONLINE terminology from
     'zpool status' if the list command means something different by
     the word ONLINE.  Maybe SEEMS_TO_BE_AROUND_SOMEWHERE is more
     appropriate.

   + during this problem, 'zpool list' is available while 'zpool
     status' is not working.  Fine, maybe: during a failure, not all
     status tools will be available.  But it would be nice if, as a
     minimum, some status tool capable of reporting ``pool X is
     failing'' were available.  In the absence of that, you may have
     to reboot the machine without ever knowing which pool's failure
     brought it down.

 * maybe sometimes certain types of status and statistics aren't
   available, but no status-reporting tool should ever be subject to
   blocking inside the kernel.  At worst it should refuse to give
   information and return to a prompt, immediately.  I'm in the habit
   of typing 'zpool status &' during serious problems so I don't lose
   control of the console.

 * 'zpool status' is used when things are failing.  Cabling and
   driver state machines are among the failures from which a volume
   manager should protect us---that's why we say ``buy redundant
   controllers if possible.''  In this scenario, a read is an
   intrusive act, because it can provoke a problem.  So even if
   'zpool status' is only reading, not writing to disk nor to data
   structures inside the kernel, it is still not really a status
   tool.  It's an invasive poking/pinging/restarting/breaking tool.
   Such tools should be segregated, and shouldn't substitute for the
   requirement to have true status tools that only read data
   structures kept in the kernel, never update kernel structures, and
   never touch disks.

   This would be as if 'ps' made an implicit call to rcapd, or
   activated some swapping thread, or something like that.  ``My
   machine is sluggish.  I wonder what's slowing it down.  ...'ps'...
   oh, shit, now it's not responding at all, and I'll never know
   why.''  There can be other tools, too, but I think LVM2 and SVM
   both have carefully non-invasive status tools, don't they?
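To pin down the fsync() point above, here's a minimal sketch (nothing
ZFS-specific about it; the filename and payload are just illustrative)
of what an application has to do if it needs to know its data reached
stable storage.  A successful write() only means the data reached
in-memory state; the unavailable-pool case should surface, if
anywhere, at fsync() or close():

-----8<-----
/*
 * Minimal sketch: check fsync() and close(), don't trust write()
 * alone.  Error handling is abbreviated; "data.out" is illustrative.
 */
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

int
main(void)
{
	const char buf[] = "payload that must survive a reboot\n";
	int fd = open("data.out", O_WRONLY | O_CREAT | O_TRUNC, 0644);

	if (fd == -1) {
		perror("open");
		return (1);
	}

	/* Success here only means the data reached in-memory state. */
	if (write(fd, buf, sizeof (buf) - 1) !=
	    (ssize_t)(sizeof (buf) - 1)) {
		perror("write");
		return (1);
	}

	/* This is where an unreachable pool should finally show up as
	   an error (or a hang), per the fsync() contract. */
	if (fsync(fd) == -1) {
		fprintf(stderr, "fsync: %s -- data NOT on stable store\n",
		    strerror(errno));
		return (1);
	}

	if (close(fd) == -1) {
		perror("close");
		return (1);
	}
	return (0);
}
-----8<-----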
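And to pin down what I mean by a ``true status tool'': on Solaris a
purely read-only query can be served from kstats the kernel already
maintains, with no disk I/O at all.  The sketch below (link with
-lkstat) reads the ZFS ARC size---chosen only because I'm fairly sure
that kstat exists; a real pool-health kstat is something ZFS would
have to export, so treat the names as illustrative:

-----8<-----
/*
 * Sketch of a purely in-kernel status read via libkstat.  Reading
 * zfs:0:arcstats:size never touches the disks; the point is the shape
 * of the interface, not this particular statistic.
 */
#include <kstat.h>
#include <stdio.h>

int
main(void)
{
	kstat_ctl_t *kc = kstat_open();
	kstat_t *ksp;
	kstat_named_t *kn;

	if (kc == NULL) {
		perror("kstat_open");
		return (1);
	}

	ksp = kstat_lookup(kc, "zfs", 0, "arcstats");
	if (ksp == NULL || kstat_read(kc, ksp, NULL) == -1) {
		fprintf(stderr, "arcstats kstat not available\n");
		kstat_close(kc);
		return (1);
	}

	kn = kstat_data_lookup(ksp, "size");
	if (kn != NULL)
		printf("ARC size: %llu bytes\n",
		    (unsigned long long)kn->value.ui64);

	kstat_close(kc);
	return (0);
}
-----8<-----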
This principle should be followed everywhere.  For example, 'iscsiadm
list discovery-address' should simply list the discovery addresses.
It should not implicitly attempt to contact each discovery address in
its list while I wait.

-----8<-----
terabithia:/# time iscsiadm list discovery-address
Discovery Address: 10.100.100.135:3260
Discovery Address: 10.100.100.138:3260

real    0m45.935s
user    0m0.006s
sys     0m0.019s
terabithia:/# jobs
[1]+  Running                 zpool status &
terabithia:/#
-----8<-----

Now, if you're really scalable, try the above again with 100 iSCSI
targets and 20 pools.  A single 'iscsiadm list discovery-address'
command, even if it's sort-of ``working'', can take hours to
complete.  This does not happen on Linux, where I configure through
text files and inspect status through 'cat /proc/...'.

In other words, it's not just that the information 'zpool status'
gives is inaccurate.  It's not just that some information is hidden
(like how sometimes a device listed as ONLINE will say ``no valid
replicas'' when you try to offline it, and sometimes it won't, and
the only way to tell the difference is to attempt the offline---so
trying to 'zpool offline' each device in turn gives you more
indication of pool health than 'zpool status' does on its own).  It's
also that I don't trust 'zpool status' not to affect the information
it's supposed to be reporting.
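As a stopgap, you can at least refuse to let these commands take the
console with them.  This is a rough sketch of the 'zpool status &'
habit with a deadline bolted on (call it watchdog.c, say; the
30-second limit is arbitrary, and it's a workaround, not a fix):

-----8<-----
/*
 * Watchdog sketch: run a possibly-hanging status command ("zpool
 * status", "iscsiadm list discovery-address", ...) in a child and
 * kill it if it hasn't finished within the deadline, so the console
 * is never lost.
 */
#include <signal.h>
#include <stdio.h>
#include <sys/wait.h>
#include <unistd.h>

int
main(int argc, char **argv)
{
	int timeout = 30;	/* seconds; arbitrary */
	int status, waited = 0;
	pid_t pid;

	if (argc < 2) {
		fprintf(stderr, "usage: %s command [args...]\n", argv[0]);
		return (2);
	}

	pid = fork();
	if (pid == -1) {
		perror("fork");
		return (2);
	}
	if (pid == 0) {
		execvp(argv[1], &argv[1]);
		perror("execvp");
		_exit(127);
	}

	/* Poll the child once a second; give up after the deadline. */
	while (waited < timeout) {
		if (waitpid(pid, &status, WNOHANG) == pid)
			return (WIFEXITED(status) ?
			    WEXITSTATUS(status) : 1);
		sleep(1);
		waited++;
	}

	fprintf(stderr, "timed out after %d seconds; killing it\n",
	    timeout);
	(void) kill(pid, SIGKILL);
	(void) waitpid(pid, &status, 0);
	return (1);
}
-----8<-----

Run it as, e.g., './watchdog iscsiadm list discovery-address', and at
least you get your prompt back.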