2012-02-04 18:27, Jim Klimov wrote:
panicstr = BAD TRAP: type=e (#pf Page fault) rp=ffffff0010a5e920 addr=30 occurred in module "zfs" due to a NULL pointer dereference panicstack = unix:die+dd () | unix:trap+1799 () | unix:cmntrap+e6 () | zfs:ddt_phys_decref+c () | zfs:zio_ddt_free+5c () | zfs:zio_execute+8d () | genunix:taskq_thread+285 () | unix:thread_start+8 () | crashtime = 1328314064 panic-time = February 4, 2012 04:07:44 AM MSK MSK (end fault-list[0])
I've finally looked into the code, and the problem seems weird indeed. With "properly" wrong on-disk data the code can take such a path that it tries to dereference a NULL pointer. This is a bug, methinks: http://src.illumos.org/source/xref/illumos-gate/usr/src/uts/common/fs/zfs/zio.c#zio_ddt_free 2100 static int 2101 zio_ddt_free(zio_t *zio) 2102 { 2103 spa_t *spa = zio->io_spa; 2104 blkptr_t *bp = zio->io_bp; 2105 ddt_t *ddt = ddt_select(spa, bp); 2106 ddt_entry_t *dde; 2107 ddt_phys_t *ddp; 2108 2109 ASSERT(BP_GET_DEDUP(bp)); 2110 ASSERT(zio->io_child_type == ZIO_CHILD_LOGICAL); 2111 2112 ddt_enter(ddt); 2113 freedde = dde = ddt_lookup(ddt, bp, B_TRUE); 2114 ddp = ddt_phys_select(dde, bp); 2115 ddt_phys_decref(ddp); 2116 ddt_exit(ddt); 2117 2118 return (ZIO_PIPELINE_CONTINUE); 2119 } The way I see it, this function: 2109) asserts that the dedup bit is set on the blockpointer, 2113) selects a DDT entry matching the BP with ddt_lookup() Strangely for me, it requests creation of a new DDT entry if a matching one doesn't exist (B_TRUE) - even though this is a DDT-free routine. I doubt this autocreated entry gets a DVA assigned, but I did not trace the code that far to be certain. 2114) then it tries to find the "ddp" pointer to the DDT entry with ddt_phys_select(), which can validly return NULL if no DDT entries, with same DVA[0] and same phys_birth as the bp, exist. I speculated above that an autocreated DDT entry might have no DVA yet, maybe. I also wonder about cases where phys_birth TXGs would differ for some reason, even if an otherwise matching DDT entry exists (BP rewrite in future, some lag in committing-to-disk today)? http://src.illumos.org/source/xref/illumos-gate/usr/src/uts/common/fs/zfs/ddt.c#ddt_phys_select 2115) ddt_phys_decref() just tries to use ddp as a pointer to structure without checking if it is NULL itself: http://src.illumos.org/source/xref/illumos-gate/usr/src/uts/common/fs/zfs/ddt.c#ddt_phys_decref 309 void 310 ddt_phys_decref(ddt_phys_t *ddp) 311 { 312 ASSERT((int64_t)ddp->ddp_refcnt > 0); 313 ddp->ddp_refcnt--; 314 } 315 According to my stacktrace in the kernel panic message, this should be where the system panics and fails. Seemingly this should work if on-disk data is consistent. Since I've had some "strangeness" regarding corrupt data blocks pointed to by DDT, with some hickups and reboots during or soon-after repair attempts, I can't vouch that my on-disk data is consistent, or that the whole block tree is readable and accessible to the OS now. But the code allows for failure is something goes wrong. I'd add a lot more NULL checks for values (especially pointers) returned by functions called from zio_ddt_free() and/or sanity checks for values fed into such functions. Actually, I'd rewrite this part of zio_ddt_free() like this: - freedde = dde = ddt_lookup(ddt, bp, B_TRUE); + freedde = dde = ddt_lookup(ddt, bp, B_FALSE); + if (dde) { ddp = ddt_phys_select(dde, bp); + if (ddp) { ddt_phys_decref(ddp); + } + } ddt_exit(ddt); If ddt_lookup() finds no matching DDE, fine - we're freed. If ddt_phys_select() finds no DDP, not so fine, but that's not a good reason to kernel-panic. Perhaps log a message and maybe leave a leaked block? Or relax the phys_birth check? Am I mistaken anywhere? :) Thanks, //Jim _______________________________________________ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss