big assumption below...

On May 24, 2012, at 6:06 AM, Jim Klimov wrote:

> Let me try to formulate my idea again... You called a similar
> process "pushing the rope" some time ago, I think.
> 
> I feel like I'm taking an exam in a discipline like philosophy,
> trying to pick answers with no idea of the examiner's preferences -
> is he a former teacher of Communism or an eager convert to some
> new religion? The same answer can lead to an A or to an F on a
> state exam. Ah, that was some fun experience :)
> 
> Well, what we know is what remains after we forget everything
> that we were taught, while the exams are our last chance to
> learn something at all =)
> 
> 2012-05-24 10:28, Richard Elling wrote:
>> You have not made a case for why this hybrid and failure-prone
>> procedure is required. What problem are you trying to solve?
> 
> Bigger-better-faster? ;)
> 
> The original proposal in this thread was about understanding
> how resilvers and scrubs work, why they are so dog-slow on
> HDDs compared to sequential reads, and thinking aloud about
> what could be improved in this area.
> 
> One of the later posts was about improving disk replacement
> (where the original disk is still responsive, but may be
> imperfect) for filled-up, fragmented pools by adding a stage
> of fast raw data transfer followed by verification and
> updating of the new disk image, with a different IO pattern
> than the current resilver's.
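> 
> To make this more concrete, here is a rough sketch of the two
> stages in Python - this is NOT how ZFS does replacement today:
> the device paths are made up, and a real verification pass would
> walk the BP tree and use the ZFS checksums rather than raw chunk
> hashes.
> 
>   #!/usr/bin/env python3
>   # Stage 1: stream the source partition sequentially onto the
>   # target, like dd with a large block size.
>   # Stage 2: re-read both sides in large chunks and rewrite only
>   # the chunks whose hashes disagree.
>   import hashlib
> 
>   SRC = "/dev/dsk/c1t1d0s0"    # hypothetical source partition
>   DST = "/dev/dsk/c1t2d0s0"    # hypothetical target partition
>   CHUNK = 16 * 1024 * 1024     # 16 MB of sequential IO per request
> 
>   def stage1_raw_copy():
>       with open(SRC, "rb") as src, open(DST, "r+b") as dst:
>           while True:
>               buf = src.read(CHUNK)
>               if not buf:
>                   break
>               dst.write(buf)
> 
>   def stage2_verify_and_fix():
>       fixed, offset = 0, 0
>       with open(SRC, "rb") as src, open(DST, "r+b") as dst:
>           while True:
>               a = src.read(CHUNK)
>               if not a:
>                   break
>               dst.seek(offset)
>               b = dst.read(len(a))
>               if hashlib.sha256(a).digest() != hashlib.sha256(b).digest():
>                   dst.seek(offset)
>                   dst.write(a)          # re-copy the stale chunk
>                   fixed += 1
>               offset += len(a)
>       return fixed
> 
>   if __name__ == "__main__":
>       stage1_raw_copy()
>       print("chunks fixed during verification:", stage2_verify_and_fix())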
> 
> This may or may not have benefits in certain (corner?) cases
> which are of practical interest to some users on this list.
> If this discussion leads to a POC by a competent ZFS programmer
> that can be tested on a variety of ZFS pools (without risking
> one's only pool on a home NAS) - so much the better. Then we
> would see whether this scenario is viable or utterly useless
> in every tested case.
> 
> The practical numbers I have from the same box and disks are:
> * Copying from a raidz1 pool of 250 GB disks (9*(4+1)) to a
>  single-disk 3 TB test pool took 24 hours to fill the new
>  disk - including the ZFS overheads.
> * Copying one raw 250 (232) GB partition takes under 2 hours
>  (about 1 hour if it can sustain ~70 MB/s reads from the source
>  without distractions like other pool IO - see the arithmetic
>  after this list).
> * Proper resilvering (walking the whole BP tree of the original
>  pool, reading all blocks from the TLVDEV, writing the
>  reconstructed(?) sectors to the target disk) from one partition
>  to another took 17 hours.
> * Full scrubbing (reading all blocks from the pool, fixing
>  checksum mismatches) takes 25-27 hours.
> * Selective scrubbing - unimplemented, timeframe unknown
>  (walking the whole BP tree of the original pool, reading all
>  blocks from the TLVDEV including the target disk and the
>  original disk, fixing checksum mismatches without panicky
>  messages and/or hotspares kicking in).
>  I *guess* it would run at a speed similar to a resilver's,
>  but be less bound to random-write IO patterns, which may be
>  better for the latencies of other tasks on the system.
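> 
> Just to show where the "about 1 hour" figure comes from, a trivial
> back-of-the-envelope calculation (70 MB/s is only a rough sustained
> average for these disks):
> 
>   size_mb = 232 * 1024        # 232 GB partition expressed in MB
>   rate_mb_s = 70              # assumed sustained sequential read rate
>   print(size_mb / rate_mb_s / 3600.0)   # ~0.94 hours, i.e. about 1 hour
> 
> Compare that with the 17 hours the random-order resilver of the
> same partition took.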
> 
> So, in the case of the original resilver, I replace the
> not-yet-dead disk with a hotspare, and after 17 hours of
> waiting I see whether it resilvered successfully or not.
> During this time the disk can die, for example, leaving my
> pool with lowered protection (or none at all in the case of
> raidz1 or two-way mirrors).
> 
> In the case of the new method proposed for a POC
> implementation, after 1 hour I'd already have a somewhat
> reliable copy of that vdev (a few blocks may have mismatches,

This is a big assumption -- that the disk will keep operating normally
even for data it cannot read. In my experience, this assumption is not
valid for the majority of HDD failure modes. Also, in the case of
consumer-grade disks, a single-sector media error can take a very long
time to retry and fail.

> but if the
> source disk dies or is taken away now, the whole TLVDEV or
> pool is not left degraded with compromised protection). Then,
> after another 17 or so hours of scrubbing, I'd be certain
> that this copy is good.
> 
> If the new writes arriving at this TLVDEV between the start of
> the DD and the end of the scrub are directed to both the source
> disk and its copy, then there are fewer (down to zero) checksum
> discrepancies for the scrub phase to find.
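> 
> In a toy Python sketch (not the real ZFS write pipeline, just the
> shape of the idea), the write path during that window would be a
> wrapper that applies every incoming write to both devices:
> 
>   # Toy illustration only - not how the ZFS write path is built.
>   class DualWriter(object):
>       def __init__(self, src_path, dst_path):
>           self.src = open(src_path, "r+b")   # original disk
>           self.dst = open(dst_path, "r+b")   # its in-progress copy
> 
>       def write(self, offset, data):
>           # The same write lands on both devices, so the later
>           # verification pass finds no stale chunks at this offset.
>           for dev in (self.src, self.dst):
>               dev.seek(offset)
>               dev.write(data)
> 
>       def close(self):
>           self.src.close()
>           self.dst.close()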
> 
>> Why not follow the well-designed existing procedure?
> 
> At first this was theoretical speculation, but a couple of days
> later the incomplete resilver turned it into a practical
> experiment with the idea.
> 
>> The failure data does not support your hypothesis.
> Ok, then my made-up and dismissed argument does not stand ;)
> 
> Thanks for the discussion,

np 
 -- richard

--
ZFS Performance and Training
richard.ell...@richardelling.com
+1-760-896-4422

_______________________________________________
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
