Your point is well taken that ZFS should not duplicate functionality 
that is already, or should be, available at the device driver level. In 
this case, though, I think it misses the point of what ZFS should be 
doing and is not.

ZFS does its own periodic commits to the disk, and it knows whether 
those commit points have reached the disk or are getting errors. In this 
particular case, those commits are presumably failing, because one of 
the disks they depend on has been removed from the system. (If the 
writes are not being marked as failures, that would definitely be an 
error in the device driver, as you say.) The ZIL has stopped being 
updated, yet ZFS does nothing to announce that this has happened or to 
indicate that a remedy is required.

At the very least, it would be extremely helpful if ZFS had a status to 
report that the ZIL is out of date, or that there are troubles writing 
to it, or something along those lines.

An additional feature would be user-selectable behavior when the ZIL is 
significantly out of date. For example, if the ZIL is more than X 
seconds out of date, new writes to the system could pause, return 
errors, or continue to succeed silently.
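
To make the idea concrete, here is a rough sketch in Python of the kind 
of policy I have in mind. None of these names exist in ZFS; StalePolicy, 
admit_write, and max_lag are made up purely for illustration, and a real 
implementation would live in the kernel and look quite different.

```python
from enum import Enum


class StalePolicy(Enum):
    """Hypothetical user-selectable behaviors for a stale log."""
    PAUSE = "pause"        # block new writes until commits catch up
    ERROR = "error"        # fail new writes immediately
    CONTINUE = "continue"  # keep accepting writes silently


def admit_write(now, last_commit, max_lag, policy):
    """Decide what to do with a new write.

    now and last_commit are timestamps in seconds; max_lag is the
    "X seconds" limit. Returns "accept", "block", or "reject".
    """
    if now - last_commit <= max_lag:
        return "accept"    # the log is fresh enough, nothing special to do
    if policy is StalePolicy.PAUSE:
        return "block"
    if policy is StalePolicy.ERROR:
        return "reject"
    return "accept"        # CONTINUE: the log is stale but we carry on
```

With a 10-second limit and a last successful commit 50 seconds ago, 
admit_write(100.0, 50.0, 10.0, StalePolicy.ERROR) returns "reject", 
while the same call with StalePolicy.CONTINUE returns "accept".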

Earlier in my career, when I worked for a database company, I was 
responsible for a similar bug. It caused a major customer to lose a 
large amount of data when a system rebooted before all good data had 
been successfully committed to disk. The resulting stink caused us to 
add a feature that detected when the writing-to-disk process had fallen 
too far behind and paused new writes to the database until the 
situation was resolved.

Peter

Bob Friesenhahn wrote:
> While I do believe that device drivers, or the fault system, should 
> notify ZFS when a device fails (and ZFS should appropriately react), I 
> don't think that ZFS should be responsible for fault monitoring.  ZFS 
> is in a rather poor position for device fault monitoring, and if it 
> attempts to do so then it will be slow and may misbehave in other 
> ways.  The software which communicates with the device (i.e. the 
> device driver) is in the best position to monitor the device.
>
> The primary goal of ZFS is to be able to correctly read data which was 
> successfully committed to disk.  There are programming interfaces 
> (e.g. fsync(), msync()) which may be used to ensure that data is 
> committed to disk, and which should return an error if there is a 
> problem.  If you were performing your tests over an NFS mount then the 
> results should be considerably different since NFS requests that its 
> data be committed to disk.
>
> Bob
> ======================================
> Bob Friesenhahn
> [EMAIL PROTECTED], http://www.simplesystems.org/users/bfriesen/
> GraphicsMagick Maintainer,    http://www.GraphicsMagick.org/
>
> _______________________________________________
> zfs-discuss mailing list
> zfs-discuss@opensolaris.org
> http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
>   
