Edward Ned Harvey wrote:
From: zfs-discuss-boun...@opensolaris.org [mailto:zfs-discuss-boun...@opensolaris.org] On Behalf Of Neil Perrin
This is a consequence of the design for performance of the ZIL code.
Intent log blocks are dynamically allocated and chained together.
When reading the intent log we read each block and checksum it
with the embedded checksum within the same block. If we can't read
a block due to an IO error then that is reported, but if the checksum does
not match then we assume it's the end of the intent log chain.
Using this design means the minimum number of writes needed to add
an intent log record is just one.
So corruption of an intent log is not going to generate any errors.
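
To make the behaviour Neil describes concrete, here is a toy model in Python. It is only a sketch, not the actual ZIL code: the block layout, the truncated SHA-256 checksum, and the names make_block and walk_chain are all assumptions made for illustration. Each block carries its own checksum and the address of the next block, and the reader stops silently at the first block whose checksum does not verify.

```python
import hashlib
import struct

BLOCK_SIZE = 64                  # toy block size (hypothetical)
CKSUM_LEN = 8                    # bytes of truncated SHA-256 kept in each block
HEADER = struct.Struct("<IH")    # next-block address, payload length

def make_block(payload, next_addr):
    """Build one log block: header + payload + padding + embedded checksum."""
    body = HEADER.pack(next_addr, len(payload)) + payload
    body = body.ljust(BLOCK_SIZE - CKSUM_LEN, b"\0")
    return body + hashlib.sha256(body).digest()[:CKSUM_LEN]

def walk_chain(device, head):
    """Follow the chain from head, returning the payloads that verify."""
    records, addr = [], head
    while 0 <= addr < len(device):
        block = device[addr]     # a real read could fail with an I/O error here
        body, stored = block[:-CKSUM_LEN], block[-CKSUM_LEN:]
        if hashlib.sha256(body).digest()[:CKSUM_LEN] != stored:
            break                # mismatch: assumed end of chain, no error raised
        next_addr, length = HEADER.unpack_from(body)
        records.append(body[HEADER.size:HEADER.size + length])
        addr = next_addr
    return records

# Four blocks; block 3 was never written, so it ends the chain naturally.
device = [bytes(BLOCK_SIZE)] * 4
device[0] = make_block(b"write A", 1)
device[1] = make_block(b"write B", 2)
device[2] = make_block(b"write C", 3)
intact = walk_chain(device, 0)

# Flip one byte in block 1: the walk now ends early and nothing is reported.
damaged = list(device)
damaged[1] = b"\xff" + damaged[1][1:]
truncated = walk_chain(damaged, 0)
```

In this model a corrupted block in the middle of the chain is indistinguishable from the unwritten block at the tail, which is why the records after it are dropped without any error.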
I didn't know that. Very interesting. This raises another question ...
It's commonly stated that, even with log device removal supported, the most
common failure mode for an SSD is to blindly write without reporting any
errors, and only detect that the device is failed upon read. So ... If an
SSD is in this failure mode, you won't detect it? At bootup, the checksum
will simply mismatch, and we'll chug along forward, having lost the data ...
(nothing can prevent that) ... but we don't know that we've lost data?
If the drive's firmware isn't returning a write error of any kind
then there isn't much that ZFS can really do here (regardless of whether
this is an SSD or not). Turning every write into a read/write operation
would totally defeat the purpose of the ZIL. It's my understanding that
SSDs will eventually transition to read-only devices once they've
exceeded their spare reallocation blocks. This should propagate to the
OS as an EIO which means that ZFS will instead store the ZIL data on the
main storage pool.
Worse yet ... In preparation for the above SSD failure mode, it's commonly
recommended to still mirror your log device, even if you have log device
removal. If you have a mirror, and the data on each half of the mirror
doesn't match each other (one device failed, and the other device is good)
... Do you read the data from *both* sides of the mirror, in order to
discover the corrupted log device, and correctly move forward without data
loss?
Yes, we read all sides of the mirror when we claim (i.e. read) the log
blocks for a log device. This is exactly what a scrub would do for a
mirrored data device.
- George
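
The mirrored read described above can be sketched roughly as follows. This is a toy illustration in Python, not the ZFS implementation: the (payload, checksum) block representation, the CRC32 checksum, and the names checksum_ok and claim_block are assumptions. The point is only that every side of the mirror is read and verified, so a side that silently lost its writes is noticed while the good copy is still used.

```python
import zlib

def checksum_ok(block):
    """A block is a (payload, stored_checksum) pair in this toy model."""
    payload, stored = block
    return zlib.crc32(payload) == stored

def claim_block(sides, index):
    """Read block `index` from every mirror side.

    Returns (payload, bad_sides): the first payload that verifies (or None
    if no side verifies) and the list of side numbers whose copy failed.
    """
    good, bad_sides = None, []
    for side_no, side in enumerate(sides):
        block = side[index]
        if checksum_ok(block):
            if good is None:
                good = block[0]
        else:
            bad_sides.append(side_no)
    return good, bad_sides

def stored(payload):
    return (payload, zlib.crc32(payload))

# Side 0 is healthy; side 1 silently dropped its second write.
side0 = [stored(b"rec0"), stored(b"rec1")]
side1 = [stored(b"rec0"), (b"garbage", zlib.crc32(b"rec1"))]

first = claim_block([side0, side1], 0)
second = claim_block([side0, side1], 1)
```

Here `second` still yields the good record from side 0 while flagging side 1 as bad. If no side verifies, the payload is None, which in this sketch corresponds to the end-of-chain case discussed earlier in the thread.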
_______________________________________________
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss