>>>>> "re" == Richard Elling <[EMAIL PROTECTED]> writes:
 c> If that's really the excuse for this situation, then ZFS is
 c> not ``always consistent on the disk'' for single-VDEV pools.

re> I disagree with your assessment.  The on-disk format (any
re> on-disk format) necessarily assumes no faults on the media.

The media never failed, only the connection to the media.  We have
every reason to believe that every CDB the storage controller
acknowledged as complete was completed and is still there---and that
is the only statement which must be true of unfaulty media.  We've no
strong reason to doubt it.

re> I see no evidence that the data is or is not correct.

The ``evidence'' is that it was on a SAN, and the storage itself
never failed, only the connection between ZFS and the storage.
Remember: this device is 48 x 1TB SATA drives presented as a 42TB LUN
via hardware RAID 6 on a SAS bus, with a ZFS pool on it as a single
device.  This sort of SAN outage happens all the time, so it's not
straining my belief to suggest that probably nothing else happened
other than disruption of the connection between ZFS and the storage.
It's not as if a controller randomly ``acted up'' or something such
that I would suspect a bad disk.

 c> http://mail.opensolaris.org/pipermail/zfs-discuss/2008-June/048375.html

re> I have no idea what Eric is referring to, and it does not
re> match my experience.

Unfortunately it's very easy to match the experience of ``nothing
happened'' and hard to match the experience of ``exactly the same
thing happened to me.''  Have you been provoking ZFS in exactly the
way Eric described: a single-vdev pool on FC where the FC SAN often
has outages, or where the storage is rebooted while ZFS is still
running?  If not, then obviously it doesn't match your experience,
because you have no experience with this situation.  OTOH, if you've
been doing that a lot, your not running into this problem means
something.  Otherwise, it's another case of the home-user defense:
``I can't tell you how close to zero the number of problems I've had
with it is.  It's so close to zero, it is zero, so there's virtually
0% chance what you're saying happened to you really did happen to
you.  And also to this other guy.''

When I say ``doesn't match my experience,'' I mean that I _do_ see
Mac OS X pinwheels, and for me they're ``usually'' traceable back to
VM pressure or a dead NFS server, not some random application-level
user-interface modal wait as others claimed: I'm selecting for the
same situation you are, and getting a different result.

That said, yeah, a CR would be nice.  For such a serious problem, I'd
like to think someone's collected an image of the corrupt filesystem
and is trying to figure out wtf happened.  I care about how safe my
data is, not how pretty your baby is.  I want its relative safety
accurately represented based on the experience available to us.

 c> How about the scenario where you lose power suddenly, but only
 c> half of a mirrored VDEV is available when power is restored?
 c> Is ZFS vulnerable to this type of unfixable corruption in that
 c> scenario, too?

re> No, this works just fine as long as one side works.  But that
re> is a very different case.
re>  -- richard

Why do you regard this case as very different from a single vdev?  I
don't have confidence that it's clearly different w.r.t. whatever
hypothetical bug Eric and Tom have run into.

wm> If data is sent, but corruption somewhere (the SAS bus,
wm> apparently, here) causes bad data to be written, ZFS can
wm> generally detect but not fix that.

Why would there be bad data written?  The SAS bus has checksums.
The problem AIUI was that the bus went away, not that it started
scribbling random data all over the place.  Am I wrong?  Remember
what Tom's SAS bus is connected to.

wm> "verifywrites"

The verification is the storage array returning success to the
command it was issued.  ZFS is supposed to, for example, delay
returning from fsync() until this has happened.  The same mechanism
is used to write batches of things in a well-defined order, which is
supposed to achieve the ``always consistent on the disk'' property.
It depends on the drive/array's ability to accurately report when
data is committed to stable storage, not on rereading what was
written, and this is the correct dependency because ZFS leaves write
caches on, so the drive could satisfy a read from its small on-disk
cache RAM even though that data would be lost if you pulled the
disk's power cord.  The system contains all the tools needed to keep
the consistency promises even if you go around yanking SAS cables.
And this is a data-loss issue, not just an availability issue like
the one we were discussing before w.r.t. pulling drives.

wm> Every filesystem is vulnerable to corruption, all the time.

Every filesystem in recent history makes rigorous guarantees about
what will survive if you pull the connection to the disk array, or
the host's power, at any time you wish.  The guarantees always
include the integrity of data written before an fsync() call, so long
as power/connectivity is lost after fsync() returns.  They also
include enough metadata consistency that you won't lose a whole
friggin' pool, like this scenario with its ``corrupt data, End of
Line'' error.

  UFS+logging, VxFS, FFS+softdep, ext3, XFS, ReiserFS, HFS+: all of
  them make those guarantees.

Disks that go bad, storage subsystems with a RAID5 write hole, PATA
buses that, given noisy cables, autodegrade to a non-CRC mode and
then corrupt data, disks that silently return bad data, controllers
that go nuts and scribble random data as the 5V rail starts dropping
after the cord is pulled, can, yes, all interfere with these
guarantees.  But NONE OF THOSE THINGS HAPPENED IN THIS CASE.

We absolutely do not live in fear that we will lose whole filesystems
if the cord is pulled at the wrong time.  That has not been true
since, like, the early 90's.  Ancient history.
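Since ``what exactly does fsync() promise'' keeps coming up, here is
a minimal sketch, in plain C, of the contract I mean.  The filename
and data are made up for illustration; this isn't code from the
thread or from any of those filesystems, just the application-side
pattern whose durability every one of them guarantees.

    /* Illustration only: the crash-safety contract under discussion.
     * Filenames and data are hypothetical. */
    #include <fcntl.h>
    #include <stdio.h>
    #include <string.h>
    #include <unistd.h>

    int main(void)
    {
        const char *msg = "survives a yanked SAS cable\n";
        int fd = open("data.tmp", O_WRONLY | O_CREAT | O_TRUNC, 0644);
        if (fd < 0) { perror("open"); return 1; }

        if (write(fd, msg, strlen(msg)) != (ssize_t)strlen(msg)) {
            perror("write"); return 1;
        }

        /* fsync() may not return success until the storage stack has
         * acknowledged the data as committed to stable storage.  Once
         * it returns 0, losing power or the path to the array must not
         * lose this data. */
        if (fsync(fd) != 0) { perror("fsync"); return 1; }
        close(fd);

        /* rename() is atomic, so a crash leaves either the old file or
         * the new one, never a corrupt in-between state. */
        if (rename("data.tmp", "data") != 0) { perror("rename"); return 1; }
        return 0;
    }

The point is the dependency: fsync() is only supposed to return
success once the storage stack has acknowledged the commit to stable
storage, which is exactly why ZFS (correctly) relies on the array's
acknowledgement rather than on rereading what it wrote.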