Joe -

We definitely don't do great accounting of the 'vdev_islog' state here,
and it's possible to create a situation where the parent replacing vdev
has the state set but the children do not, but I have been unable to
reproduce the behavior you saw.  I have rebooted the system during
resilver, manually detached the replacing vdev, and a variety of other
things, but I've never seen the behavior you describe.  In all cases,
the log state is kept with the replacing vdev and restored when the
resilver completes.  I have also not observed the resilver failing with
a bad log device.

Can you provide more information about how to reproduce this problem?
Perhaps without rebooting into B70 in the middle?

Thanks,

- Eric

On Tue, May 27, 2008 at 01:50:04PM -0700, Eric Schrock wrote:
> Yeah, I noticed this the other day while I was working on an unrelated
> problem.  The basic problem is that log devices are kept within the
> normal vdev tree, and are only distinguished by a bit indicating that
> they are log devices (this is also the source of a number of other
> inconsistencies that Pawel has encountered).
> 
> When doing a replacement, the userland code is responsible for creating
> the vdev configuration to use for the newly attached vdev.  In this
> case, it doesn't preserve the 'is_log' bit correctly.  This should be
> enforced in the kernel - it doesn't make sense to replace a log device
> with a non-log device, ever.
> 
> I have a workspace with some other random ZFS changes, so I'll try to
> include this as well.
> 
> FWIW, removing log devices is significantly easier than removing
> arbitrary devices, since there is no data to migrate (after the current
> txg is synced).  At one point there were plans to do this as a separate
> piece of work (since the vdev changes are needed for the general case
> anyway), but I don't know whether this is still the case.
> 
> - Eric
> 
> On Tue, May 27, 2008 at 01:13:47PM -0700, Joe Little wrote:
> > This past weekend, my holiday was ruined due to a log device
> > "replacement" gone awry.
> > 
> > I posted all about it here:
> > 
> > http://jmlittle.blogspot.com/2008/05/problem-with-slogs-how-i-lost.html
> > 
> > In a nutshell, a resilver of a single log device with itself, due to
> > the fact one can't remove a log device from a pool once defined, caused
> > ZFS to fully resilver but then attach the log device as a stripe to
> > the volume, and no longer as a log device. The subsequent pool failure
> > was exceptionally bad as the volume could no longer be imported and
> > required read-only mounting of whichever remaining filesystems I could,
> > in order to recover data. It would appear that log resilvers are broken, at
> > least up to B85. I haven't seen code changes in this space so I
> > presume this is likely an unaddressed problem.
> > _______________________________________________
> > zfs-discuss mailing list
> > zfs-discuss@opensolaris.org
> > http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
> 
> --
> Eric Schrock, Fishworks                        http://blogs.sun.com/eschrock

--
Eric Schrock, Fishworks                        http://blogs.sun.com/eschrock