On May 23, 2012, at 2:56 PM, Jim Klimov wrote:

> Thanks again,
>
> 2012-05-24 1:01, Richard Elling wrote:
>>> At least the textual error message implies that if a hot spare
>>> were available for the pool, it would kick in and invalidate
>>> the device I am scrubbing to update into the pool after the
>>> DD-phase (well, it was not DD but a hung-up resilver in this
>>> case, but that is not substantial).
>>
>> The man page is clear on this topic, IMHO
>
> Indeed, even in snv_117 the zpool man page says that. But the
> console/dmesg message was also quite clear, so go figure whom
> to trust (or fear) more ;)
The FMA message is consistent with the man page.

> fmd: [ID 377184 daemon.error] SUNW-MSG-ID: ZFS-8000-GH, TYPE: Fault, VER: 1,
>   SEVERITY: Major
> EVENT-TIME: Wed May 16 03:27:31 MSK 2012
> PLATFORM: Sun Fire X4500, CSN: 0804AMT023, HOSTNAME: thumper
> SOURCE: zfs-diagnosis, REV: 1.0
> EVENT-ID: cc25a316-4018-4f13-c675-d1d84c6325c3
> DESC: The number of checksum errors associated with a ZFS device
>   exceeded acceptable levels. Refer to http://sun.com/msg/ZFS-8000-GH
>   for more information.
> AUTO-RESPONSE: The device has been marked as degraded. An attempt
>   will be made to activate a hot spare if available.
> IMPACT: Fault tolerance of the pool may be compromised.
> REC-ACTION: Run 'zpool status -x' and replace the bad device.

>>>> dd, or similar dumb block copiers, should work fine.
>>>> However, they are inefficient...
>>>
>>> Define efficient? In terms of transferring the 900 GB payload
>>> of a 1 TB HDD used for ZFS for a year - DD would beat resilver
>>> anytime, in terms of getting most or (less likely) all of the
>>> valid bits with data onto the new device. It is the next phase
>>> (getting the rest of the bits into a valid state) that needs
>>> some attention, manual or automated.
>>
>> speed != efficiency
>
> Ummm... this is likely to start a flame war with other posters,
> and you did not say what efficiency means to you. How can we
> compare apples to meat, not even knowing whether the latter is
> a steak or a pork knee?

Efficiency allows use of denominators other than time. Speed is
restricted to a denominator of time. There is no flame war here,
look elsewhere.

> I, for now, choose to stand by the statement that reducing the
> timeframe during which the old disk needs to be in the system is
> a good thing, and that changing the IO pattern from random writes
> into (mostly) sequential writes followed by random reads may also
> be somewhat more efficient, especially under other loads
> (interfering with them less). Even though the whole replacement
> process may take more wallclock time, there are cases where I'd
> likely trust it to do a better job than the original resilvering.
>
> I think someone with equipment could stage an experiment and
> compare the two procedures (existing and proposed) on a nearly
> full and somewhat fragmented pool.

Operationally, your method loses every time.

> Maybe you can disenchant me (not with vague phrases but with
> either theory or practice) and I would then see that my trust is
> blind, misdirected and without basis. =)
>
>> IMHO, this is too operationally complex for most folks. KISS wins.
>
> That's why I proposed to tuck this scenario under the zfs hood
> (DD + selective scrub + ditto writes during the process,
> as an optional alternative to the current resilver), or to explain
> coherently why this should not be done - not for any situation.
> Implementing it as a standard supported command would be KISS ;)
>
> Especially if it is known that, with some quirks, this procedure
> works and may be beneficial in some cases, e.g. by reducing
> the timeframe during which a pool with a flaky disk in place is
> exposed to potential loss of redundancy and large amounts of data;
> in the worst case the loss is constrained to those sectors
> which could not be (correctly) read by DD from the source disk
> and could not be reconstructed from raidz/mirror redundancy due
> to whatever overlaying problems (e.g. a sector from the same
> block died on another disk too).

You have not made a case for why this hybrid and failure-prone
procedure is required. What problem are you trying to solve?
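For the record, the supported path is only a couple of commands; a
minimal sketch, with "tank", c0t1d0 (old) and c0t2d0 (new) as
placeholder pool and device names:

  # zpool status -x                    (identify the degraded device)
  # zpool replace tank c0t1d0 c0t2d0   (resilver onto the new disk)
  # zpool status tank                  (watch the resilver to completion)

The replace keeps the old disk attached and readable until the
resilver finishes, and only allocated blocks are copied, so there is
no window where the pool depends on a raw dd image of unknown quality.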
>> What is it about error counters that frightens you enough to want
>> to clear them often?
>
> In this case, mostly, the fright of having the device kicked
> out of the pool automatically instead of getting it "synced"
> ("resilvered" is an improper term here, I guess) to the proper
> state.

Why not follow the well-designed existing procedure?

> In general - since this is part of some migration procedure
> which is, again, expected to have errors, we don't really care
> about signalling them. Why doesn't the original resilver signal
> several million CKSUM errors per new empty disk when it does
> reconstruction of sectors onto it? I'd say this is functionally
> identical. (At least, it would be - if it were part of a supported
> procedure as I suggest.)
>
> Thanks,
> //Jim Klimov
>
> PS: I pondered for a while whether I should make up an argument
> that on a dying disk's mechanics, lots of random IO (resilver)
> instead of sequential IO (DD) would cause it to die faster, but
> that's just FUD not backed by any scientific data or statistics -
> which you likely have, and which perhaps refute this argument.

The failure data does not support your hypothesis.
 -- richard

--
ZFS and performance consulting
http://www.RichardElling.com
SCALE 10x, Los Angeles, Jan 20-22, 2012