2012-02-09 22:43, Jim Klimov пишет:
2012-02-04 18:27, Jim Klimov wrote:
panicstr = BAD TRAP: type=e (#pf Page fault) rp=ffffff0010a5e920
addr=30 occurred in module "zfs" due to a NULL pointer dereference
panicstack = unix:die+dd () | unix:trap+1799 () | unix:cmntrap+e6
() | zfs:ddt_phys_decref+c () | zfs:zio_ddt_free+5c () |
zfs:zio_execute+8d () | genunix:taskq_thread+285 () |
unix:thread_start+8 () | crashtime = 1328314064
panic-time = February 4, 2012 04:07:44 AM MSK MSK
(end fault-list[0])

I've finally looked into the code, and the problem seems
weird indeed. With "properly" wrong on-disk data the code
can take such a path that it tries to dereference a NULL
pointer. (...)

Actually, I'd rewrite this part of zio_ddt_free() like
this (...)
If ddt_lookup() finds no matching DDE, fine - we're freed.
If ddt_phys_select() finds no DDP, not so fine, but that's
not a good reason to kernel-panic. Perhaps log a message
and maybe leave a leaked block? Or relax the phys_birth check?
Am I mistaken anywhere? :)

To follow up, I couraged up and implemented this simple fix
on my (now tweaked) oi_151a installation (diff below).

While importing the pool and then while rsyncing the remaining
problematic files into it, I've got no more kernel panics.
There were several (expected) hits of the diagnostic message
(embedded into ddt_phys_select() below) reporting that it
has indeed failed to find a DDP matching the requested BP,
and returns a NULL. Further checks in ddt_phys_decref() now
skip dereferencing if the pointer is NULL, and the thing
"just works" okay ;)

Possibly, the ASSERTs added below into other functions
should be replaced by if() blocks to fire regardless of
debug/nondebug kernels. Anyhow, I still believe that
functions should do sanity-checks on input values, even
if at cost of some performance - so as to not dereference
NULL pointers, at least... ;)

I am still open to suggestions from ZFS coding gurus as to
why my solution or overall approach might be a bad idea,
or to accept the solution into upstream code ;)

I see that the zfs-code mailing list is essentially dead
for months, and the recent active writers were people from
zfs-dscuss, so I just post here...

===================
Under /code/illumos-gate/usr/src/uts/common/fs/zfs :

--- ddt.c.orig  2012-01-25 20:20:11.000000000 +0400
+++ ddt.c       2012-02-24 02:14:29.273050980 +0400
@@ -297,20 +297,25 @@
 void
 ddt_phys_clear(ddt_phys_t *ddp)
 {
+       ASSERT(ddp != NULL);
        bzero(ddp, sizeof (*ddp));
 }

 void
 ddt_phys_addref(ddt_phys_t *ddp)
 {
+       ASSERT(ddp != NULL);
        ddp->ddp_refcnt++;
 }

 void
 ddt_phys_decref(ddt_phys_t *ddp)
 {
-       ASSERT((int64_t)ddp->ddp_refcnt > 0);
-       ddp->ddp_refcnt--;
+//     ASSERT(ddp != NULL);
+       if (ddp) {
+               ASSERT((int64_t)ddp->ddp_refcnt > 0);
+               ddp->ddp_refcnt--;
+       }
 }

 void
@@ -333,6 +338,17 @@
                    BP_PHYSICAL_BIRTH(bp) == ddp->ddp_phys_birth)
                        return (ddp);
        }
+       (void) printf("ddt_phys_select() found nothing for \n"
+           "\tDVA[BP] =<%llu:%llx:%llx> and phys_birth[BP]  %llu and\n"
+           "\tDVA[DDP]=<%llu:%llx:%llx> and phys_birth[DDP] %llu\n",
+           (u_longlong_t)DVA_GET_VDEV(BP_IDENTITY(bp)),
+           (u_longlong_t)DVA_GET_OFFSET(BP_IDENTITY(bp)),
+           (u_longlong_t)DVA_GET_ASIZE(BP_IDENTITY(bp)),
+           BP_PHYSICAL_BIRTH(bp),
+           (u_longlong_t)DVA_GET_VDEV(&ddp->ddp_dva[0]),
+           (u_longlong_t)DVA_GET_OFFSET(&ddp->ddp_dva[0]),
+           (u_longlong_t)DVA_GET_ASIZE(&ddp->ddp_dva[0]),
+           ddp->ddp_phys_birth);
        return (NULL);
 }

--- zio.c.orig  2012-01-25 20:20:12.000000000 +0400
+++ zio.c       2012-02-10 20:58:02.745614986 +0400
@@ -2111,8 +2111,12 @@

        ddt_enter(ddt);
        freedde = dde = ddt_lookup(ddt, bp, B_TRUE);
-       ddp = ddt_phys_select(dde, bp);
-       ddt_phys_decref(ddp);
+       if (dde) {
+               ddp = ddt_phys_select(dde, bp);
+               if (ddp) {
+                       ddt_phys_decref(ddp);
+               }
+       }
        ddt_exit(ddt);

        return (ZIO_PIPELINE_CONTINUE);

===================

HTH,
//Jim Klimov
_______________________________________________
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss

Reply via email to