>>>>> "dc" == Daniel Carosone <d...@geek.com.au> writes:
dc> single-disk laptops are a pretty common use-case.

It does not help this case. It helps the case where a single laptop
disk fails and you recover it with dd conv=noerror,sync. That case is
uncommon because few people know how to do it, or bother. It should
not ever be part of your plan even if you do know how, because it will
only help maybe half the time: it's silly to invest in this case.

dc> As an aside, there can be non-device causes of this,
dc> especially when sharing disks with other operating systems,
dc> booting livecd's and etc.

A solution in search of a problem, as opposed to operational
experience. The copies= feature is not so new that we need to imagine
so optimistically, and in practice the advice ``it's generally not
useful'' is the best you can give, because it seems to be
misunderstood more often than it's used in a realistically helpful
way.

>> * drives do not immediately turn red and start brrk-brrking
>> when they go bad. In the real world, they develop latent
>> sector errors,

dc> Yes, exactly - at this point, with copies=1, you get a signal
dc> that your drive is about to go bad, and that data has been
dc> lost. With copies=2, you get a signal that your drive is
dc> about to go bad, but less disruption and data loss to go with
dc> it.

No. To repeat myself: with copies=2 you get a system that freezes and
crashes oddly, and sometimes runs for a while but can never complete a
'zfs send' of the filesystems. With copies=1 you get exactly the same
thing. Imagination does not match experience. This is what you get
even on an x4500: many posters here report that when a disk starts
going bad you need to find it and remove it entirely before you can
attempt any kind of recovery.

dc> I dunno about BER spec, but I have seen sectors go unreadable
dc> many times.

Yes, obviously.

dc> Regardless of what you do in response, and how soon you
dc> replace the drive, copies >1 can cover that interval.
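For readers unfamiliar with the dd rescue mentioned above, a minimal sketch follows. The device paths in the comment are placeholders, not real disks from this thread, and the runnable part works against scratch files in /tmp so it can be tried harmlessly:

```shell
# Sketch of the single-disk rescue referred to above: image a failing
# disk onto a replacement, with conv=noerror,sync telling dd to keep
# going past read errors and pad each unreadable block with zeros.
# Against real (placeholder) devices it would look like:
#   dd if=/dev/dsk/c0t0d0s2 of=/dev/dsk/c0t1d0s2 bs=512 conv=noerror,sync
#
# Demonstrated here on a scratch file standing in for the failing disk:
dd if=/dev/zero of=/tmp/bad.img bs=1k count=64 2>/dev/null
dd if=/tmp/bad.img of=/tmp/good.img bs=1k conv=noerror,sync 2>/dev/null
ls -l /tmp/good.img
```

Note that on a genuinely failing drive each bad sector can stall dd for the drive's full retry timeout, which is exactly the slowness complained about below.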
No, you are caught in taxonomic obsession again. The exposure is not
that parts of the disk gradually go bad in a predictable, controllable
way, with gradually rising probability and a bit of clumpiness you can
avoid by spraying your copies randomly LBA-wise. It's that the disk
slowly accumulates software landmines that keep it from responding to
commands in a reasonable way (raising the response time of each
individual command from 30ms to 30 seconds), and that confuse the
storage stack above it into seemingly-arbitrary, highly
controller-dependent odd behavior (crashes, or multiplying the 30
seconds into somewhere between 180 seconds and a couple of hours).
Once the disk starts going bad, anything you can recover from it is
luck. Aside from disks with maybe one bad sector, where you can note
which file you were reading when the system froze, reboot, and never
read that file again, I just don't think it matches experience to
believe you will get a chance to read the second copy your copies=2
wrote. Remember: if the machine is still functioning but its
performance is reduced 1000-fold, it is functionally equivalent to
frozen for all but the most pedantic purposes.
_______________________________________________
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss