As part of testing for our planned iSCSI + ZFS NFS server environment,
I wanted to see what would happen if I imported a ZFS pool on two
machines at once (as might happen someday in, for example, a failover
scenario gone horribly wrong).

 What I expected was something between a pool with damage and a pool
that was unrecoverable. What I appear to have got is a ZFS pool
that panics the system whenever you try to import it. The panic is a
'bad checksum (read on <unknown> off 0: ... [L0 packed nvlist]' error
from zfs:zfsctl_ops_root (I've put the whole thing at the end of this
message).

 I got this without doing very much to the dual-imported pool:
        - import on both systems (-f'ing on one)
        - read a large file a few times on both systems
        - zpool export on one system
        - zpool scrub on the other; system panics
        - zpool import now panics either system
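
 For concreteness, here is roughly the sequence of commands involved (the
pool and host names here are made up for illustration, and I may be
misremembering minor details of the exact invocations):

        # import the pool on both hosts; the second import needs -f
        # because ZFS can see the pool is in use on the other system:
        hostA# zpool import tank
        hostB# zpool import -f tank

        # read a large file a few times on each host, e.g.:
        hostA# dd if=/tank/bigfile of=/dev/null bs=1024k
        hostB# dd if=/tank/bigfile of=/dev/null bs=1024k

        # export on one host, scrub on the other:
        hostA# zpool export tank
        hostB# zpool scrub tank        # <- host B panics here

        # from this point on, this panics whichever host tries it:
        zpool import tank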

 One system was a Solaris 10 U4 server with relatively current patches;
the other was Solaris 10 U5 with current patches.  (Both 64-bit x86.)

 What appears to be the same issue was reported back in April 2007 on
the mailing list, in the message
http://mail.opensolaris.org/pipermail/zfs-discuss/2007-April/039238.html,
but I don't see any followups.

 Is this a known and filed bug? Is there any idea when it might be fixed
(or when the fix will appear in Solaris 10)?

 I have to say that I'm disappointed with ZFS's behavior here; I don't
expect a filesystem that claims to have all sorts of checksums and
survive all sorts of disk corruptions to *ever* panic because it doesn't
like the data on the disk. That is very definitely not 'surviving disk
corruption', especially since it seems to have happened to someone who
was not doing violence to their ZFS pools the way I was.

        - cks
[The full panic:
Jun  3 11:05:14 sansol2 genunix: [ID 809409 kern.notice] ZFS: bad checksum 
(read on <unknown> off 0: zio ffffffff8e508340 [L0 packed nvlist] 4000L/600P 
DVA[0]=<0:a8000c000:600> DVA[1]=<0:1040003000:600> fletcher4 lzjb LE contiguous 
birth=119286 fill=1 
cksum=6e160f6970:632da4719324:3057ff16f69527:10e6e1af42eb9b10): error 50
Jun  3 11:05:14 sansol2 unix: [ID 100000 kern.notice] 
Jun  3 11:05:14 sansol2 genunix: [ID 655072 kern.notice] fffffe8000f9dac0 zfs:zfsctl_ops_root+3003724c ()
Jun  3 11:05:14 sansol2 genunix: [ID 655072 kern.notice] fffffe8000f9dad0 zfs:zio_next_stage+65 ()
Jun  3 11:05:14 sansol2 genunix: [ID 655072 kern.notice] fffffe8000f9db00 zfs:zio_wait_for_children+49 ()
Jun  3 11:05:14 sansol2 genunix: [ID 655072 kern.notice] fffffe8000f9db10 zfs:zio_wait_children_done+15 ()
Jun  3 11:05:14 sansol2 genunix: [ID 655072 kern.notice] fffffe8000f9db20 zfs:zio_next_stage+65 ()
Jun  3 11:05:14 sansol2 genunix: [ID 655072 kern.notice] fffffe8000f9db60 zfs:zio_vdev_io_assess+84 ()
Jun  3 11:05:14 sansol2 genunix: [ID 655072 kern.notice] fffffe8000f9db70 zfs:zio_next_stage+65 ()
Jun  3 11:05:14 sansol2 genunix: [ID 655072 kern.notice] fffffe8000f9dbd0 zfs:vdev_mirror_io_done+c1 ()
Jun  3 11:05:14 sansol2 genunix: [ID 655072 kern.notice] fffffe8000f9dbe0 zfs:zio_vdev_io_done+14 ()
Jun  3 11:05:14 sansol2 genunix: [ID 655072 kern.notice] fffffe8000f9dc60 genunix:taskq_thread+bc ()
Jun  3 11:05:14 sansol2 genunix: [ID 655072 kern.notice] fffffe8000f9dc70 unix:thread_start+8 ()
]