Pete French presumably uttered the following on 07/21/08 07:08:
The *big* issue I have right now is dealing with the slave machine going
down. Once the master no longer has a connection to the ggated devices,
all processes trying to use the device hang in D status. I have tried
pkill'ing ggatec to no avail and ggatec destroy returns a message of
gctl being busy. Trying to ggatec destroy -f panics the machine.
Oddly enough, this was the issue I had with iscsi which made me move
to using ggated instead. On our machines I use '-t 10' as an argument to
ggatec, and this makes it time out once the connection has been down for
a certain amount of time. I am using gmirror on top, not ZFS, and this
handled the drive vanishing from the mirror quite happily. I haven't
tried it with ZFS, which may not like having the device suddenly disappear.
-pete.
What I have found is that the master machine will lock up if the slave disappears
during a large file transfer. I tested this by setting up a zpool mirror on the master
using a ggatec device from the slave. Then I:
pkill'ed ggated on the slave machine,
ran dd if=/dev/zero of=/data1/testfile2 bs=16k count=8192 [128MB] on the master.
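For anyone who wants to repeat the test, the steps above can be sketched roughly as
follows. This is a dry-run sketch, not my exact setup: the slave hostname, the disk
names and the pool name are hypothetical placeholders (the pool name is only guessed
from the /data1 mount point), and by default the script just prints the commands
instead of running them.

```shell
#!/bin/sh
# Dry-run sketch of the test procedure. SLAVE, REMOTE_DEV, LOCAL_DEV and POOL
# are hypothetical placeholders, not values taken from the original post.
SLAVE="slave.example.org"
REMOTE_DEV="/dev/da1"
LOCAL_DEV="da1"
POOL="data1"

# Print a command by default; execute it only when DRY_RUN=0 is set.
run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# On the master: attach the slave's disk with a 10 second timeout,
# then build a ZFS mirror from a local disk and the ggate device.
master_setup() {
    run ggatec create -t 10 -u 0 "$SLAVE" "$REMOTE_DEV"
    run zpool create "$POOL" mirror "$LOCAL_DEV" ggate0
}

# On the master, after "pkill ggated" has been run on the slave:
# write count blocks of 16k to the pool.
master_write_test() {
    run dd if=/dev/zero of="/$POOL/testfile2" bs=16k count="$1"
}
```

Calling master_setup and then master_write_test 8192 prints the command sequence for
the 128MB case; setting DRY_RUN=0 would actually execute it, so only do that on a
scratch machine.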
The dd command finished, and /var/log/messages showed I/O errors on the slave
drive as expected. The log also showed ggatec trying to reconnect every 10 seconds
(ggatec was started with the -t 10 parameter).
Finally, ZFS marked the drive unavailable, which then allowed me to run
ggatec destroy -u 0 without getting the "ioctl(/dev/ggctl): Device busy" error
message. (By the way, ggatec destroy does not kill the "ggatec create" process that
created the device to begin with; I had to pkill ggatec to get that to stop. Bug?)
The above behavior would be acceptable for multi-machine mirroring as it would be
scriptable.
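As a rough idea of what such a script could look like, here is a minimal sketch. It
assumes the pool is called data1 and the ggate device shows up as ggate0 in
"zpool status" (both are assumptions, adjust to taste), and that the state column
reads UNAVAIL once ZFS has given up on the device.

```shell
#!/bin/sh
# Sketch only: POOL and DEV are assumed names, UNIT matches "ggatec destroy -u 0".
POOL="data1"
DEV="ggate0"
UNIT=0

# Read "zpool status" output on stdin and print the STATE column
# of the first line whose first field is the given device name.
device_state() {
    awk -v dev="$1" '$1 == dev { print $2; exit }'
}

# If ZFS has marked the device UNAVAIL, destroy the ggate unit and kill the
# leftover ggatec process (destroy alone does not stop it). Returns 0 when
# cleanup was attempted and 1 when the device is not (yet) UNAVAIL.
cleanup_if_unavail() {
    state=$(zpool status "$POOL" | device_state "$DEV")
    if [ "$state" = "UNAVAIL" ]; then
        ggatec destroy -u "$UNIT"
        pkill ggatec
        return 0
    fi
    return 1
}
```

A cron job or a loop that calls cleanup_if_unavail every few seconds would be enough
for the small-write case described above.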
The problem comes with large writes. I tried to repeat the above with
dd if=/dev/zero of=/data1/testfile2 bs=16k count=32768 [512MB]
which then locked up ZFS, and ultimately the system itself. It seems that once the
write buffer is full, ZFS is unable to fail/unavail the slave drive, and the entire
system becomes unresponsive (I cannot even ssh into it).
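For reference, the sizes quoted for the two dd runs can be checked with a couple of
lines of shell. The helper name is made up for illustration; it relies only on dd's
"k" suffix meaning 1024 bytes.

```shell
#!/bin/sh
# Total size in MiB written by dd, given the block size in KiB and the block count.
dd_total_mib() {
    bs_kib=$1
    count=$2
    echo $(( bs_kib * count / 1024 ))
}

dd_total_mib 16 8192     # the run that completed
dd_total_mib 16 32768    # the run that locked up the system
```

This prints 128 and 512, matching the figures given next to the two dd commands.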
The bottom line is that without some type of "timeout" or "time to fail" (bad I/O to
fail?), zpool + ggate[cd] seems to be an unworkable solution. This is actually a
shame, as the recovery process, swapping from master to slave and back again, was so
much cleaner and faster than using gmirror.
Sven
_______________________________________________
freebsd-stable@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-stable