Joe - We definitely don't do great accounting of the 'vdev_islog' state here, and it's possible to create a situation where the parent replacing vdev has the state set but the children do not, but I have been unable to reproduce the behavior you saw. I have rebooted the system during resilver, manually detached the replacing vdev, and tried a variety of other things, but I've never seen the behavior you describe. In all cases, the log state is kept with the replacing vdev and restored when the resilver completes. I have also not observed the resilver failing with a bad log device.
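To illustrate what I mean by the accounting: vdev_islog is just a per-vdev_t flag, so nothing currently forces a replacing vdev and its children to agree on it. The kind of propagation that would keep them consistent looks roughly like the sketch below (the vdev_t fields are real, but the helper itself is hypothetical, not code from the gate):

    #include <sys/vdev_impl.h>      /* vdev_t and its fields */

    /*
     * Hypothetical helper (not code from the gate): push the parent
     * replacing vdev's islog state down to all of its children so the
     * parent and children can never disagree.
     */
    static void
    vdev_propagate_islog(vdev_t *pvd)
    {
        uint64_t c;

        for (c = 0; c < pvd->vdev_children; c++) {
            vdev_t *cvd = pvd->vdev_child[c];

            cvd->vdev_islog = pvd->vdev_islog;
            vdev_propagate_islog(cvd);  /* recurse through mirrors, etc. */
        }
    }

Something along these lines would have to run whenever the replacing vdev is created or its config is synced out.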
Can you provide more information about how to reproduce this problem?
Perhaps without rebooting into B70 in the middle?

Thanks,

- Eric

On Tue, May 27, 2008 at 01:50:04PM -0700, Eric Schrock wrote:
> Yeah, I noticed this the other day while I was working on an unrelated
> problem. The basic problem is that log devices are kept within the
> normal vdev tree and are only distinguished by a bit indicating that
> they are log devices (and this is the source of a number of other
> inconsistencies that Pawel has encountered).
>
> When doing a replacement, the userland code is responsible for creating
> the vdev configuration to use for the newly attached vdev. In this
> case, it doesn't preserve the 'is_log' bit correctly. This should be
> enforced in the kernel - it doesn't make sense to replace a log device
> with a non-log device, ever.
>
> I have a workspace with some other random ZFS changes, so I'll try to
> include this fix as well.
>
> FWIW, removing log devices is significantly easier than removing
> arbitrary devices, since there is no data to migrate (after the current
> txg is synced). At one point there were plans to do this as a separate
> piece of work (since the vdev changes are needed for the general case
> anyway), but I don't know whether this is still the case.
>
> - Eric
>
> On Tue, May 27, 2008 at 01:13:47PM -0700, Joe Little wrote:
> > This past weekend, my holiday was ruined by a log device
> > "replacement" gone awry.
> >
> > I posted all about it here:
> >
> > http://jmlittle.blogspot.com/2008/05/problem-with-slogs-how-i-lost.html
> >
> > In a nutshell, a resilver of a single log device with itself (done
> > because one can't remove a log device from a pool once defined)
> > caused ZFS to fully resilver but then attach the log device as a
> > stripe to the volume, and no longer as a log device. The subsequent
> > pool failure was exceptionally bad: the volume could no longer be
> > imported, and I had to mount read-only whatever remaining filesystems
> > I could in order to recover data. It would appear that log resilvers
> > are broken, at least up to B85. I haven't seen code changes in this
> > space, so I presume this is likely an unaddressed problem.
>
> --
> Eric Schrock, Fishworks                  http://blogs.sun.com/eschrock

--
Eric Schrock, Fishworks                  http://blogs.sun.com/eschrock
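For reference, the kernel-side enforcement Eric describes above (never allowing a log device to be replaced by a non-log device) could look roughly like the sketch below. vdev_t, its vdev_islog field, and spa_vdev_attach() are real parts of the ZFS source; the helper, its name, and its call site are assumptions for illustration, not the actual fix:

    #include <sys/vdev_impl.h>      /* vdev_t, vdev_islog */
    #include <sys/errno.h>

    /*
     * Hypothetical check (a sketch, not the shipped fix): refuse a
     * replacement where the new vdev's log status does not match the
     * vdev being replaced.  Something like spa_vdev_attach() could run
     * this before wiring the new vdev into the tree.
     */
    static int
    vdev_check_islog_match(vdev_t *oldvd, vdev_t *newvd)
    {
        if (oldvd->vdev_islog != newvd->vdev_islog)
            return (ENOTSUP);   /* never swap log and non-log vdevs */
        return (0);
    }

Alternatively (or additionally), the userland replacement code could copy the existing 'is_log' bit into the config it builds for the new vdev, but a check in the kernel covers every caller.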