On Thu, 2014-04-24 at 12:02 +0200, Rafael J. Wysocki wrote: > On 4/24/2014 10:59 AM, Li Zhong wrote: > > On Wed, 2014-04-23 at 18:12 +0200, Rafael J. Wysocki wrote: > >> On 4/23/2014 4:23 PM, Tejun Heo wrote: > >>> Hello, Rafael. > >> Hi, > >> > >>> On Wed, Apr 23, 2014 at 12:21:33AM +0200, Rafael J. Wysocki wrote: > >>>> Can you please elaborate a bit? > >>> Because it can get involved in larger locking dependency issues by > >>> joining dependency graphs of two otherwise largely disjoint > >>> subsystems. It has potential to create possible deadlocks which don't > >>> need to exist. > >> Well, I do my best not to add unnecessary locks if that's what you mean. > >> > >>>> It is there to protect hotplug operations involving multiple devices > >>>> (in different subsystems) from racing with each other. Why exactly > >>>> is it bad? > >>> But why would different subsystems, say cpu and memory, use the same > >>> lock? Wouldn't those subsystems already have proper locking inside > >>> their own subsystems? > >> That locking is not sufficient. > >> > >>> Why add this additional global lock across multiple subsystems? > >> That basically is because of how eject works when it is triggered via ACPI. > >> > >> It is signaled for a device at the top of a subtree. It may be a > >> container of some sort and the eject involves everything below that > >> device in the ACPI namespace. That may involve multiple subsystem > >> (CPUs, memory, PCI host bridge, etc.). > >> > >> We do that in two steps, offline (which can fail) and eject proper > >> (which can't fail and makes all of the involved devices go away). All > >> that has to be done in one go with respect to the sysfs-triggered > >> offline/online and that's why the lock is there. > > Thank you for the education. It's more clear to me now why we need this > > lock. > > > > I still have some small questions about when this lock is needed: > > > > I could understand sysfs-triggered online is not acceptable when > > removing devices in multiple subsystems. But maybe concurrent offline > > and remove(with proper per subsystem locks) seems not harmful? > > > > And if we just want to remove some devices in a specific subsystem, e.g. > > like writing /cpu/release, if it just wants to offline and remove some > > cpus, then maybe we don't require the device_hotplug_lock to be taken? > > No and no. > > If the offline phase fails for any device in the subtree, we roll back > the operation > and online devices that have already been offlined by it. Also the ACPI > hot-addition > needs to acquire device_hotplug_lock so that it doesn't race with ejects > and so > that lock needs to be taken by sysfs-triggered offline too.
I can understand that hot-addition needs the device_hotplug lock, but still not very clear about the offline. I guess your are describing following scenario: user A: (trying remove cpu 1 and memory 1-10) lock_device_hotplug offline cpu with cpu locks -- successful offline memories with memory locks -- failed, e.g. for memory8 online cpu and memory with their locks unlock_device_hotplug user B: (trying offline cpu 1) offline cpu with cpu locks But I don't see any problem for user B not taking the device_hotplug lock. The result may be different for user B to take or not take the lock. But I think it could be seen as concurrent online/offline for cpu1 under cpu hotplug locks, which just depends on which is executed last? Or did I miss something here? Thanks, Zhong > Thanks, > Rafael > -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/