On Thu, Oct 18, 2007 at 02:29:27PM -0600, Neil Perrin wrote:
> > So, the only way to lose transactions would be a crash or power loss,
> > leaving outstanding transactions in the log, followed by the log
> > device failing to start up on reboot? I assume that would be handled
> > relatively cleanly (files have out-of-date data), as opposed to
> > something nasty like the pool failing to start up.
>
> I just checked on the behaviour of this. The log is treated as part
> of the main pool. If it is not replicated and disappears, then the pool
> can't be opened - just like any unreplicated device in the main pool.
> If the slog is found but can't be opened or is corrupted, then the
> pool will be opened but the slog isn't used.
>
> This seems a bit inconsistent.
It's worth noting that this is a generic problem. In the world of
metadata replication (ditto blocks), even an unreplicated normal device
does not necessarily render a pool completely faulted. The code needs
to be modified across the board so that the root vdev never ends up in
the FAULTED state, and then pool health is based solely on the ability
to read some basic piece of information, such as a successful
dsl_pool_open(). From the looks of things, this will "just work" if we
get rid of the too_many_errors() call and associated code in
vdev_root.c, but I'm sure there would be some odd edge conditions.

- Eric

--
Eric Schrock, Solaris Kernel Development
http://blogs.sun.com/eschrock

_______________________________________________
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss