Hi Simon

> I.e. you'll have to manually intervene
> if a consumer drive causes the system to hang, and
> replace it, whereas the RAID edition drives will
> probably report the error quickly and then ZFS will
> rewrite the data elsewhere, and thus maybe not kick
> the drive.

IMHO the relevant aspects are whether ZFS gets an accurate account of cache 
flush status and whether it even notices that a drive has become unresponsive. 
That being said, I have not seen a specific report of ZFS kicking green drives, 
either at random or in a pattern, the way the poor SoHo storage enclosure users 
see all the time.
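
For what it's worth, a quick way to check whether ZFS (or FMA underneath it) 
has actually taken note of a sluggish or unresponsive drive is to look at the 
pool state and the per-device error counters. A minimal sketch on an 
OpenSolaris-style box, nothing specific to any particular pool:

   # report only pools that ZFS considers unhealthy
   zpool status -x

   # per-device soft/hard/transport error counters from the driver
   iostat -En

   # fault management error events, in case a drive was actually retired
   fmdump -e

If green drives were being kicked systematically, I would expect that to show 
up here rather than only as a vague hang.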

> 
> So it sounds preferable to have TLER in operation, if
> one can find a consumer-priced drive that allows it,
> or just take the hit and go with whatever non-TLER
> drive you choose and expect to have to manually
> intervene if a drive plays up. OK for home user where
> he is not too affected, but not good for businesses
> which need to have something recovered quickly.

One point about TLER is that two error-recovery schemes compete when you run a 
consumer drive behind an active RAID controller that has its own timeout 
mechanisms. When you run ZFS on a RAID controller, contrary to the 
best-practice recommendations, an analogous question arises. On the other 
hand, if you run a green consumer drive on a dumb HBA, I don't see what would 
be wrong with it in the first place.
As for manual interventions, the only one I am aware of would be to re-attach 
a single drive. That is no option if you are hit as badly as those miserable 
Thecus N7000 users, who see the entire array of only a handful of drives drop 
out within hours - over and over again - or never even get to finish 
formatting the stripe set.
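
In the ZFS case, "re-attaching a single drive" usually amounts to bringing the 
device back and clearing the errors, or replacing it outright. A minimal 
sketch, with "tank" and c0t2d0 as placeholder pool and device names:

   # bring back a drive that was faulted after a transient hiccup, and clear the errors
   zpool online tank c0t2d0
   zpool clear tank c0t2d0

   # or replace it in place if it really has gone bad
   zpool replace tank c0t2d0

Annoying once in a while, but manageable; watching several drives of one small 
array drop out within hours is a different league.
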
Given how dire the consequences of the rumoured TLER problems are supposed to 
be, I would expect far more, and far more specific, reports on this list if it 
were a systematic issue with ZFS. That aside, we are operating outside the 
supported specs when running consumer-level drives in large arrays - at least 
that is the perspective of Seagate and WD so far.

> 
> > That all rather points to singular issues with
> > firmware bugs or similar than to a systematic
> > issue,
> > doesn't it?
> 
> I'm not sure. Some people in the WDC threads seem to
> report problems with pauses during media streaming
> etc. 

This was again for SoHo storage enclosures - not for ZFS, right?

>  when the
> 32MB+ cache is empty, then it loads another 32MB into
> cache etc and so on? 

I doubt that any current disk has cache management so simplistic that it 
relies on completely cycling the buffer contents, let alone for reads that 
belong to a single file (a disk is basically agnostic of files). Moreover, 
such buffer management would be completely useless for a striped array. I 
don't claim to know much more about what a disk cache actually does, but I am 
afraid that line of thinking is probably not helpful for understanding the 
phenomena people have reported.

I think we are currently seeing quite a lot of change going on in disk 
storage, with many established assumptions being abandoned while backwards 
compatibility is not always taken care of. SAS 6G (will my controller really 
work in a PCIe 1.1 slot?) and 4k sectors are only the most prominent examples. 
In times like these it is probably truer than ever that falling back on 
established technology pays off, even if that means biting the bullet of a 
cost premium on occasion.
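
On the 4k point specifically, it can be worth checking which sector size ZFS 
assumed when a pool was created, since drives that emulate 512-byte sectors 
can silently cost performance. A rough sketch, assuming a pool named "tank" 
and assuming your zdb build prints the ashift property in the pool 
configuration, which I believe current builds do:

   # ashift=9 means 512-byte alignment, ashift=12 would mean 4k
   zdb -C tank | grep ashift

Nothing authoritative, just a quick sanity check before blaming the drives for 
stalls.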

Best regards

Tonmaus
-- 
This message posted from opensolaris.org