Patrick Mansfield wrote:
Do you have slab poisoning on (CONFIG_DEBUG_SLAB)?

No, not yet...

I reported the following problem. It looks like nodemgr had a similar
patch to change list_for_each_safe to device_for_each_child, but
device_for_each_child is not "safe"; see this thread:

http://marc.theaimsgroup.com/?t=111931541100002&r=1&w=2

With nothing more from Greg ...

I think DEBUG_SLAB will catch any use-after-free there. I haven't tried
to run with *out* DEBUG_SLAB or to analyze what might happen, so I don't
know the symptoms for fibre channel removal (the call in
scsi_sysfs.c:scsi_remove_target()).
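
To illustrate what "safe" means for a list walk like the one discussed
above, here is a minimal userspace sketch, not kernel code. The list
helpers, struct child and demo_safe_removal() are hypothetical stand-ins
for the kernel's list API. The only point is that the successor has to be
read before the current entry is unlinked and freed.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical userspace stand-in for a kernel-style circular list. */
struct list_node {
        struct list_node *next, *prev;
};

struct child {
        int id;
        struct list_node node;
};

#define node_to_child(p) \
        ((struct child *)((char *)(p) - offsetof(struct child, node)))

static void list_init(struct list_node *head)
{
        head->next = head->prev = head;
}

static void list_add_tail(struct list_node *n, struct list_node *head)
{
        n->prev = head->prev;
        n->next = head;
        head->prev->next = n;
        head->prev = n;
}

static void list_del(struct list_node *n)
{
        n->prev->next = n->next;
        n->next->prev = n->prev;
        n->next = n->prev = NULL;
}

/* Unlink and free every child; returns the number removed. */
static int remove_all_children(struct list_node *head)
{
        struct list_node *pos, *tmp;
        int removed = 0;

        /* tmp is read before pos is freed, so freed memory is never touched. */
        for (pos = head->next, tmp = pos->next; pos != head;
             pos = tmp, tmp = pos->next) {
                list_del(pos);
                free(node_to_child(pos));
                removed++;
        }
        return removed;
}

/*
 * Build a list of count children and remove them all again.
 * Returns the number removed, or -1 if the list is not empty afterwards.
 */
static int demo_safe_removal(int count)
{
        struct list_node head;
        int i, removed;

        assert(count >= 0);
        list_init(&head);
        for (i = 0; i < count; i++) {
                struct child *c = malloc(sizeof(*c));

                if (!c)
                        break;
                c->id = i;
                list_add_tail(&c->node, &head);
        }
        removed = remove_all_children(&head);
        return (head.next == &head && head.prev == &head) ? removed : -1;
}
```

A walk that advanced with pos = pos->next after the free() would read
the freed entry, which is the kind of access slab poisoning is meant to
expose.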

The patch you mention changed nodemgr_remove_host_dev which is
called when a FireWire controller is removed AFAIU. But when a
FireWire device is unplugged or switched off, a different code
path is followed in nodemgr:

static void nodemgr_suspend_ne(struct node_entry *ne)
{
        struct class_device *cdev;
        struct unit_directory *ud;

        HPSB_DEBUG("Node suspended: ID:BUS[" NODE_BUS_FMT "]  GUID[%016Lx]",
                   NODE_BUS_ARGS(ne->host, ne->nodeid),
                   (unsigned long long)ne->guid);

        ne->in_limbo = 1;
        device_create_file(&ne->device, &dev_attr_ne_in_limbo);

        down_write(&ne->device.bus->subsys.rwsem);
        list_for_each_entry(cdev, &nodemgr_ud_class.children, node) {
                ud = container_of(cdev, struct unit_directory, class_dev);

                if (ud->ne != ne)
                        continue;

                if (ud->device.driver &&
                    (!ud->device.driver->suspend ||
                      ud->device.driver->suspend(&ud->device, PMSG_SUSPEND, 0)))
                        device_release_driver(&ud->device);
        }
        up_write(&ne->device.bus->subsys.rwsem);
}

If I understand it correctly, the call of device_release_driver()
leads to sbp2_remove(), which calls scsi_remove_device(), which, in
the case of RBC disks, seems to hang in sd_shutdown()/sd_sync_cache()/
scsi_wait_req().

Since ne->device.bus->subsys.rwsem is held for writing, no other
FireWire device additions or removals can be served until
device_release_driver() has returned. This includes everything that
happens on a second FireWire adapter. (I have two FireWire adapters, and
the other knodemgrd_# never wakes up while the first knodemgrd_#
is locked up.)

Could ieee1394's rwsem cause a deadlock in scsi's device removals?
It would surprise me.
--
Stefan Richter
-=====-=-=-= -=== =====
http://arcgraph.de/sr/