Greetings, I see repeatable crashes on some systems after upgrading. The signature is always the same:

operating system: 5.11 snv_139 (i86pc)
panic message: BAD TRAP: type=e (#pf Page fault) rp=ffffff00175f88c0 addr=0 occurred in module "genunix" due to a NULL pointer dereference

list_remove+0x1b(ffffff03e19339f0, ffffff03e0814640)
zfs_acl_release_nodes+0x34(ffffff03e19339c0)
zfs_acl_free+0x16(ffffff03e19339c0)
zfs_znode_free+0x5e(ffffff03e17fa600)
zfs_zinactive+0x9b(ffffff03e17fa600)
zfs_inactive+0x11c(ffffff03e17f8500, ffffff03ee867528, 0)
fop_inactive+0xaf(ffffff03e17f8500, ffffff03ee867528, 0)
vn_rele_dnlc+0x6c(ffffff03e17f8500)
dnlc_purge+0x175()
nfs_idmap_args+0x5e(ffffff00175f8c38)
nfssys+0x1e1(12, 8047dd8)

The stack always looks like the above. The vnode involved is sometimes a file and sometimes a directory; e.g. I have seen the /boot/acpi directory and the /kernel/drv/amd64/acpi_driver file in the vnode's path field.

Looking at the data, I notice that z_acl.list_head indicates a single member in the list (I presume that is the case because list_prev and list_next point to the same address):

(ffffff03e19339c0)::print zfs_acl_t
{
    z_acl_count = 0x6
    z_acl_bytes = 0x30
    z_version = 0x1
    z_next_ace = 0xffffff03e171d210
    z_hints = 0
    z_curr_node = 0xffffff03e0814640
    z_acl = {
        list_size = 0x40
        list_offset = 0
        list_head = {
            list_next = 0xffffff03e0814640
            list_prev = 0xffffff03e0814640
        }
    }

This member's next pointer is bad (sometimes zero, sometimes a low number, e.g. 0x10). The NULL pointer crash happens while trying to follow the list_prev pointer:

0xffffff03e0814640::print zfs_acl_node_t
{
    z_next = {
        list_next = 0
        list_prev = 0
    }
    z_acldata = 0xffffff03e10b6230
    z_allocdata = 0xffffff03e171d200
    z_allocsize = 0x30
    z_size = 0x30
    z_ace_count = 0x6
    z_ace_idx = 0x2
}

This is a repeating pattern: it seems to me there is always a single zfs_acl_node in the list, with NULL or garbage list_next and list_prev pointers.
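
To make the failure mode concrete, here is a minimal C sketch of a circular doubly linked list with a sentinel head. It is a simplified stand-in, not the genunix or ZFS source, and the names node_t, dlist_t and the dlist_* functions are hypothetical. It shows the invariant the dump appears to violate: with one member, the head points at the node and the node should point back at the head. An unchecked unlink in this model writes through node->prev, which would fault at whatever address list_prev holds, consistent with the addr= value in the panic message.

```c
#include <stddef.h>
#include <stdbool.h>

/* Hypothetical, simplified list types; the link is at offset 0 of the node. */
typedef struct node {
	struct node *next;
	struct node *prev;
} node_t;

typedef struct {
	node_t head;	/* sentinel: an empty list points back at itself */
} dlist_t;

static void
dlist_init(dlist_t *l)
{
	l->head.next = l->head.prev = &l->head;
}

static void
dlist_insert_tail(dlist_t *l, node_t *n)
{
	n->prev = l->head.prev;
	n->next = &l->head;
	l->head.prev->next = n;
	l->head.prev = n;
}

/*
 * For a list whose head claims n as its only member, check that n points
 * back at the head. Only pointer values are compared, so a garbage link
 * such as 0x10 is never dereferenced.
 */
static bool
dlist_sole_member_consistent(const dlist_t *l, const node_t *n)
{
	return (l->head.next == n && l->head.prev == n &&
	    n->next == &l->head && n->prev == &l->head);
}

/* Unlink the sole member n; refuse instead of faulting if links are bad. */
static bool
dlist_remove_checked(dlist_t *l, node_t *n)
{
	if (!dlist_sole_member_consistent(l, n))
		return (false);
	n->prev->next = n->next;	/* unchecked, this write faults if prev is 0 or 0x10 */
	n->next->prev = n->prev;
	n->next = n->prev = NULL;
	return (true);
}
```

In this model, a node whose links have been zeroed or overwritten after insertion fails the consistency check even though the head still points at it, which is the state both dumps show.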

For example, in another instance of this crash, the zfs_acl_node looks like this:

> ::stack
list_remove+0x1b(ffffff03e10d24f0, ffffff03e0fc9a00)
zfs_acl_release_nodes+0x34(ffffff03e10d24c0)
zfs_acl_free+0x16(ffffff03e10d24c0)
zfs_znode_free+0x5e(ffffff03e10cc200)
zfs_zinactive+0x9b(ffffff03e10cc200)
zfs_inactive+0x11c(ffffff03e1281840, ffffff03ea5c7010, 0)
fop_inactive+0xaf(ffffff03e1281840, ffffff03ea5c7010, 0)
vn_rele_dnlc+0x6c(ffffff03e1281840)
dnlc_purge+0x175()
nfs_idmap_args+0x5e(ffffff001811ac38)
nfssys+0x1e1(12, 8047dd8)
_sys_sysenter_post_swapgs+0x149()

> ::status
...
panic message: BAD TRAP: type=e (#pf Page fault) rp=ffffff001811a8c0 addr=10 occurred in module "genunix" due to a NULL pointer dereference

> ffffff03e0fc9a00::print zfs_acl_node_t
{
    z_next = {
        list_next = 0xffffff03e10e1cd9
        list_prev = 0x10
    }
    z_acldata = 0
    z_allocdata = 0xffffff03e10cb5d0
    z_allocsize = 0x30
    z_size = 0x30
    z_ace_count = 0x6
    z_ace_idx = 0x2
}

It looks to me like the crash here is the same, and list_next / list_prev are garbage.

Has anybody seen this? Am I skipping too many versions when I image-update? I am hoping someone who knows this code will chime in.

Steve