On Wed, May 5, 2010 at 8:47 PM, Michael Sullivan
<michael.p.sulli...@mac.com> wrote:
> While it explains how to implement these, there is no information
> regarding failure of a device in a striped L2ARC set of SSD's. I have
> been hard pressed to find this information anywhere, short of testing
> it myself, but I don't have the necessary hardware in a lab to test
> correctly. If someone has pointers to references, could you please
> provide them to chapter and verse, rather than the advice to "Go read
> the manual."
Yes, but the answer is in the man page, so reading it is a good idea:

"If a read error is encountered on a cache device, that read I/O is
reissued to the original storage pool device, which might be part of a
mirrored or raidz configuration."

> I'm running 2009.11 which is the latest OpenSolaris. I should have
> made that clear, and that I don't intend this to be on Solaris 10
> system, and am waiting for the next production build anyway. As you
> say, it does not exist in 2009.06, this is not the latest production
> Opensolaris which is 2009.11, and I'd be more interested in its
> behavior than an older release.

The "latest" is b134, which contains many, many fixes over 2009.11,
though it's a dev release.

> From the information I've been reading about the loss of a ZIL
> device, it will be relocated to the storage pool it is assigned to.
> I'm not sure which version this is in, but it would be nice if
> someone could provide the release number it is included in (and
> actually works), it would be nice. Also, will this functionality be
> included in the mythical 2010.03 release?

It went in somewhere around b118, I think, so it will be in the next
scheduled release.

> Also, I'd be interested to know what features along these lines will
> be available in 2010.03 if it ever sees the light of day.

Look at the latest dev release. b134 was originally slated to be
2010.03, so the feature set of the final release should be very close.

> So what you are saying is that if a single device fails in a striped
> L2ARC VDEV, then the entire VDEV is taken offline and the fallback is
> to simply use the regular ARC and fetch from the pool whenever there
> is a cache miss.

The strict interpretation of the documentation is that the read is
re-issued. My understanding is that the block that failed to be read
would then be read from the original pool.

> Or, does what you are saying here mean that if I have a 4 SSD's in a
> stripe for my L2ARC, and one device fails, the L2ARC will be
> reconfigured dynamically using the remaining SSD's for L2ARC.

Auto-healing in ZFS would resilver the block that failed to be read,
either onto the same device or another cache device in the pool,
exactly as if a read failed on a normal pool device. It wouldn't
reconfigure the cache devices, but each failed read would cause the
blocks to be reallocated to a functioning device, which has the same
effect in the end.

-B

--
Brandon High : bh...@freaks.com
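
For anyone who wants to test this behavior on a scratch pool, here is a
minimal sketch of the zpool commands involved. The pool name (tank) and
the cXtYd0 device names are made up for illustration, and log device
removal only works on a build recent enough to include that feature:

  # Add a striped set of cache (L2ARC) devices to an existing pool.
  zpool add tank cache c0t1d0 c0t2d0 c0t3d0 c0t4d0

  # If one of them fails, it typically shows up as FAULTED or UNAVAIL
  # under the "cache" section; reads that miss or error out fall back
  # to the main pool while the remaining cache devices keep serving.
  zpool status -v tank
  zpool iostat -v tank 5

  # A dead cache device can be dropped and a replacement added.
  zpool remove tank c0t3d0
  zpool add tank cache c0t5d0

  # Log (slog) devices use the same add syntax; removing one requires
  # a build with log device removal support, as discussed above.
  zpool add tank log c0t6d0
  zpool remove tank c0t6d0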