Re: [PATCH v13 0/7] fsdax: introduce fs query to support reflink
On Wed, Apr 20, 2022 at 10:54:59PM -0700, Christoph Hellwig wrote:
> On Thu, Apr 21, 2022 at 02:35:02PM +1000, Dave Chinner wrote:
> > Sure, I'm not a maintainer and just the stand-in patch shepherd for
> > a single release. However, being unable to cleanly merge code we
> > need integrated into our local subsystem tree for integration
> > testing because a patch dependency with another subsystem won't gain
> > a stable commit ID until the next merge window is distinctly
> > suboptimal.
>
> Yes. Which is why we've taken a lot of mm patches through other trees,
> sometimes specially crafted for that. So I guess in this case we'll
> just need to take non-trivial dependencies into the XFS tree, and just
> deal with small merge conflicts for the trivial ones.

OK. As Naoya has pointed out, the first dependency/conflict Ruan has
listed looks trivial to resolve.

The second dependency, OTOH, is on a new function added in the patch
pointed to. That said, at first glance it looks to be independent of the
first two patches in that series, so I might just be able to pull that
one patch in and have that leave us with a working fsdax+reflink tree.

Regardless, I'll wait to see how much work the updated XFS/DAX reflink
enablement patchset still requires when Ruan posts it before deciding
what to do here. If it isn't going to be a merge candidate, what to do
with this patchset is moot because there's little to test without
reflink enabled...

Cheers,

Dave.
--
Dave Chinner
da...@fromorbit.com
Re: [PATCH v13 2/7] mm: factor helpers for memory_failure_dev_pagemap
在 2022/4/21 14:13, HORIGUCHI NAOYA(堀口 直也) 写道: On Tue, Apr 19, 2022 at 12:50:40PM +0800, Shiyang Ruan wrote: memory_failure_dev_pagemap code is a bit complex before introduce RMAP feature for fsdax. So it is needed to factor some helper functions to simplify these code. Signed-off-by: Shiyang Ruan Reviewed-by: Darrick J. Wong Reviewed-by: Christoph Hellwig Reviewed-by: Dan Williams Thanks for the refactoring. As I commented to 0/7, the conflict with "mm/hwpoison: fix race between hugetlb free/demotion and memory_failure_hugetlb()" can be trivially resolved. Another few comment below ... --- mm/memory-failure.c | 157 1 file changed, 87 insertions(+), 70 deletions(-) diff --git a/mm/memory-failure.c b/mm/memory-failure.c index e3fbff5bd467..7c8c047bfdc8 100644 --- a/mm/memory-failure.c +++ b/mm/memory-failure.c @@ -1498,6 +1498,90 @@ static int try_to_split_thp_page(struct page *page, const char *msg) return 0; } +static void unmap_and_kill(struct list_head *to_kill, unsigned long pfn, + struct address_space *mapping, pgoff_t index, int flags) +{ + struct to_kill *tk; + unsigned long size = 0; + + list_for_each_entry(tk, to_kill, nd) + if (tk->size_shift) + size = max(size, 1UL << tk->size_shift); + + if (size) { + /* +* Unmap the largest mapping to avoid breaking up device-dax +* mappings which are constant size. The actual size of the +* mapping being torn down is communicated in siginfo, see +* kill_proc() +*/ + loff_t start = (index << PAGE_SHIFT) & ~(size - 1); + + unmap_mapping_range(mapping, start, size, 0); + } + + kill_procs(to_kill, flags & MF_MUST_KILL, false, pfn, flags); +} + +static int mf_generic_kill_procs(unsigned long long pfn, int flags, + struct dev_pagemap *pgmap) +{ + struct page *page = pfn_to_page(pfn); + LIST_HEAD(to_kill); + dax_entry_t cookie; + int rc = 0; + + /* +* Pages instantiated by device-dax (not filesystem-dax) +* may be compound pages. +*/ + page = compound_head(page); + + /* +* Prevent the inode from being freed while we are interrogating +* the address_space, typically this would be handled by +* lock_page(), but dax pages do not use the page lock. This +* also prevents changes to the mapping of this pfn until +* poison signaling is complete. +*/ + cookie = dax_lock_page(page); + if (!cookie) + return -EBUSY; + + if (hwpoison_filter(page)) { + rc = -EOPNOTSUPP; + goto unlock; + } + + if (pgmap->type == MEMORY_DEVICE_PRIVATE) { + /* +* TODO: Handle HMM pages which may need coordination +* with device-side memory. +*/ + return -EBUSY; Don't we need to go to dax_unlock_page() as the origincal code do? + } + + /* +* Use this flag as an indication that the dax page has been +* remapped UC to prevent speculative consumption of poison. +*/ + SetPageHWPoison(page); + + /* +* Unlike System-RAM there is no possibility to swap in a +* different physical page at a given virtual address, so all +* userspace consumption of ZONE_DEVICE memory necessitates +* SIGBUS (i.e. MF_MUST_KILL) +*/ + flags |= MF_ACTION_REQUIRED | MF_MUST_KILL; + collect_procs(page, &to_kill, true); + + unmap_and_kill(&to_kill, pfn, page->mapping, page->index, flags); +unlock: + dax_unlock_page(page, cookie); + return rc; +} + /* * Called from hugetlb code with hugetlb_lock held. * @@ -1644,12 +1728,8 @@ static int memory_failure_dev_pagemap(unsigned long pfn, int flags, struct dev_pagemap *pgmap) { struct page *page = pfn_to_page(pfn); - unsigned long size = 0; - struct to_kill *tk; LIST_HEAD(tokill); Is this variable unused in this function? 
Yes, this one and the one above are mistakes I didn't notice when I was
resolving conflicts with the newer next- branch. I'll fix them in the
next version.

--
Thanks,
Ruan.

> Thanks,
> Naoya Horiguchi
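For reference, a minimal sketch of the fix Ruan is referring to for the
first comment (the missing unlock on the MEMORY_DEVICE_PRIVATE branch).
It reuses the rc/unlock names visible in the quoted mf_generic_kill_procs()
hunk and is only an illustration of the review comment, not the final patch:

	if (pgmap->type == MEMORY_DEVICE_PRIVATE) {
		/*
		 * TODO: Handle HMM pages which may need coordination
		 * with device-side memory.
		 *
		 * Fall out through the unlock label so the entry taken
		 * by dax_lock_page() above is released, rather than
		 * returning with it still held.
		 */
		rc = -EBUSY;
		goto unlock;
	}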
Re: [PATCH v13 5/7] mm: Introduce mf_dax_kill_procs() for fsdax case
On Tue, Apr 19, 2022 at 12:50:43PM +0800, Shiyang Ruan wrote:
> This new function is a variant of mf_generic_kill_procs that accepts a
> file, offset pair instead of a struct to support multiple files sharing
> a DAX mapping. It is intended to be called by the file systems as part
> of the memory_failure handler after the file system performed a reverse
> mapping from the storage address to the file and file offset.
>
> Signed-off-by: Shiyang Ruan
> Reviewed-by: Dan Williams
> Reviewed-by: Christoph Hellwig
> ---
...
> @@ -539,13 +548,41 @@ static void collect_procs_file(struct page *page,
> 		struct list_head *to_kill,
> 	 * to be informed of all such data corruptions.
> 	 */
> 		if (vma->vm_mm == t->mm)
> -			add_to_kill(t, page, vma, to_kill);
> +			add_to_kill(t, page, 0, vma, to_kill);
> 	}
> 	}
> 	read_unlock(&tasklist_lock);
> 	i_mmap_unlock_read(mapping);
> }
>
> +#if IS_ENABLED(CONFIG_FS_DAX)

This macro is equivalent to "#ifdef CONFIG_FS_DAX", and you also add
"#ifdef" below, so aligning to either (I prefer "#ifdef") might be
better.

> +/*
> + * Collect processes when the error hit a fsdax page.
> + */
> +static void collect_procs_fsdax(struct page *page,
> +		struct address_space *mapping, pgoff_t pgoff,
> +		struct list_head *to_kill)

Unlike collect_procs_file(), this new function does not have the
parameter force_early, and passes true unconditionally to
task_early_kill(). Looking at the current code, I noticed the following
code and comment:

	/*
	 * Unlike System-RAM there is no possibility to swap in a
	 * different physical page at a given virtual address, so all
	 * userspace consumption of ZONE_DEVICE memory necessitates
	 * SIGBUS (i.e. MF_MUST_KILL)
	 */
	flags |= MF_ACTION_REQUIRED | MF_MUST_KILL;

, which forcibly sets MF_ACTION_REQUIRED, and I guess that this is the
reason for passing true above. But now I'm not sure that setting these
flags for all error events on NVDIMM is really the right thing. The
above comment sounds to me like memory_failure_dev_pagemap() is called
to handle the event when the data on ZONE_DEVICE memory is about to be
accessed/consumed, but is that the only case? I thought that
memory_failure() can be called by memory scrubbing *before* some process
actually accesses it, and the MCE handler judges whether action is
required or not based on MSRs. Does this not happen on NVDIMM errors?

Anyway this question might be a little off-topic, so not to be a blocker
for this patchset.

Thanks,
Naoya Horiguchi
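For illustration of the force_early question above, here is a rough
sketch (not from the posted patch) of how the flag could be plumbed
through instead of being hard-coded, modeled on collect_procs_file() in
mm/memory-failure.c; the pgoff argument to add_to_kill() follows the
signature change quoted above:

	static void collect_procs_fsdax(struct page *page,
			struct address_space *mapping, pgoff_t pgoff,
			struct list_head *to_kill, bool force_early)
	{
		struct vm_area_struct *vma;
		struct task_struct *tsk;

		i_mmap_lock_read(mapping);
		read_lock(&tasklist_lock);
		for_each_process(tsk) {
			/* let the caller decide, e.g. based on MF_ACTION_REQUIRED */
			struct task_struct *t = task_early_kill(tsk, force_early);

			if (!t)
				continue;
			vma_interval_tree_foreach(vma, &mapping->i_mmap, pgoff, pgoff) {
				if (vma->vm_mm == t->mm)
					add_to_kill(t, page, pgoff, vma, to_kill);
			}
		}
		read_unlock(&tasklist_lock);
		i_mmap_unlock_read(mapping);
	}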
[PATCH v3 2/8] cxl/acpi: Add root device lockdep validation
The CXL "root" device, ACPI0017, is an attach point for coordinating platform level CXL resources and is the parent device for a CXL port topology tree. As such it has distinct locking rules relative to other CXL subsystem objects, but because it is an ACPI device the lock class is established well before it is given to the cxl_acpi driver. However, the lockdep API does support changing the lock class "live" for situations like this. Add a device_lock_set_class() helper that a driver can use in ->probe() to set a custom lock class, and device_lock_reset_class() to return to the default "no validate" class before the custom lock class key goes out of scope after ->remove(). Note the helpers are all macros to support dead code elimination in the CONFIG_PROVE_LOCKING=n case. Suggested-by: Peter Zijlstra Cc: Greg Kroah-Hartman Cc: "Rafael J. Wysocki" Cc: Ingo Molnar Cc: Will Deacon Cc: Waiman Long Cc: Boqun Feng Cc: Alison Schofield Cc: Vishal Verma Cc: Ira Weiny Cc: Ben Widawsky Cc: Jonathan Cameron Signed-off-by: Dan Williams --- drivers/cxl/acpi.c | 15 +++ include/linux/device.h | 25 + 2 files changed, 40 insertions(+) diff --git a/drivers/cxl/acpi.c b/drivers/cxl/acpi.c index d15a6aec0331..e19cea27387e 100644 --- a/drivers/cxl/acpi.c +++ b/drivers/cxl/acpi.c @@ -275,6 +275,15 @@ static int add_root_nvdimm_bridge(struct device *match, void *data) return 1; } +static struct lock_class_key cxl_root_key; + +static void cxl_acpi_lock_reset_class(void *_dev) +{ + struct device *dev = _dev; + + device_lock_reset_class(dev); +} + static int cxl_acpi_probe(struct platform_device *pdev) { int rc; @@ -283,6 +292,12 @@ static int cxl_acpi_probe(struct platform_device *pdev) struct acpi_device *adev = ACPI_COMPANION(host); struct cxl_cfmws_context ctx; + device_lock_set_class(&pdev->dev, &cxl_root_key); + rc = devm_add_action_or_reset(&pdev->dev, cxl_acpi_lock_reset_class, + &pdev->dev); + if (rc) + return rc; + root_port = devm_cxl_add_port(host, host, CXL_RESOURCE_NONE, NULL); if (IS_ERR(root_port)) return PTR_ERR(root_port); diff --git a/include/linux/device.h b/include/linux/device.h index 93459724dcde..82c9d307e7bd 100644 --- a/include/linux/device.h +++ b/include/linux/device.h @@ -850,6 +850,31 @@ static inline bool device_supports_offline(struct device *dev) return dev->bus && dev->bus->offline && dev->bus->online; } +#define __device_lock_set_class(dev, name, key) \ + lock_set_class(&(dev)->mutex.dep_map, name, key, 0, _THIS_IP_) + +/** + * device_lock_set_class - Specify a temporary lock class while a device + *is attached to a driver + * @dev: device to modify + * @key: lock class key data + * + * This must be called with the device_lock() already held, for example + * from driver ->probe(). + */ +#define device_lock_set_class(dev, key)\ + __device_lock_set_class(dev, #key, key) + +/** + * device_lock_reset_class - Return a device to the default lockdep novalidate state + * @dev: device to modify + * + * This must be called with the device_lock() already held, for example + * from driver ->remove(). + */ +#define device_lock_reset_class(dev) \ + device_lock_set_class(dev, &__lockdep_no_validate__) + void lock_device_hotplug(void); void unlock_device_hotplug(void); int lock_device_hotplug_sysfs(void);
[PATCH v3 3/8] cxl: Drop cxl_device_lock()
Now that all CXL subsystem locking is validated with custom lock classes, there is no need for the custom usage of the lockdep_mutex. Cc: Alison Schofield Cc: Vishal Verma Cc: Ira Weiny Cc: Ben Widawsky Cc: Jonathan Cameron Signed-off-by: Dan Williams --- drivers/cxl/core/pmem.c |4 +- drivers/cxl/core/port.c | 55 ++--- drivers/cxl/cxl.h | 78 --- drivers/cxl/mem.c |4 +- drivers/cxl/pmem.c | 12 --- lib/Kconfig.debug |6 6 files changed, 33 insertions(+), 126 deletions(-) diff --git a/drivers/cxl/core/pmem.c b/drivers/cxl/core/pmem.c index e825e261278d..bec7cfb54ebf 100644 --- a/drivers/cxl/core/pmem.c +++ b/drivers/cxl/core/pmem.c @@ -124,10 +124,10 @@ static void unregister_nvb(void *_cxl_nvb) * work to flush. Once the state has been changed to 'dead' then no new * work can be queued by user-triggered bind. */ - cxl_device_lock(&cxl_nvb->dev); + device_lock(&cxl_nvb->dev); flush = cxl_nvb->state != CXL_NVB_NEW; cxl_nvb->state = CXL_NVB_DEAD; - cxl_device_unlock(&cxl_nvb->dev); + device_unlock(&cxl_nvb->dev); /* * Even though the device core will trigger device_release_driver() diff --git a/drivers/cxl/core/port.c b/drivers/cxl/core/port.c index 750aac95ed5f..ea60abda6500 100644 --- a/drivers/cxl/core/port.c +++ b/drivers/cxl/core/port.c @@ -312,10 +312,10 @@ static void cxl_port_release(struct device *dev) struct cxl_port *port = to_cxl_port(dev); struct cxl_ep *ep, *_e; - cxl_device_lock(dev); + device_lock(dev); list_for_each_entry_safe(ep, _e, &port->endpoints, list) cxl_ep_release(ep); - cxl_device_unlock(dev); + device_unlock(dev); ida_free(&cxl_port_ida, port->id); kfree(port); } @@ -556,7 +556,7 @@ static int match_root_child(struct device *dev, const void *match) return 0; port = to_cxl_port(dev); - cxl_device_lock(dev); + device_lock(dev); list_for_each_entry(dport, &port->dports, list) { iter = match; while (iter) { @@ -566,7 +566,7 @@ static int match_root_child(struct device *dev, const void *match) } } out: - cxl_device_unlock(dev); + device_unlock(dev); return !!iter; } @@ -625,13 +625,13 @@ static int add_dport(struct cxl_port *port, struct cxl_dport *new) static void cond_cxl_root_lock(struct cxl_port *port) { if (is_cxl_root(port)) - cxl_device_lock(&port->dev); + device_lock(&port->dev); } static void cond_cxl_root_unlock(struct cxl_port *port) { if (is_cxl_root(port)) - cxl_device_unlock(&port->dev); + device_unlock(&port->dev); } static void cxl_dport_remove(void *data) @@ -738,15 +738,15 @@ static int add_ep(struct cxl_port *port, struct cxl_ep *new) { struct cxl_ep *dup; - cxl_device_lock(&port->dev); + device_lock(&port->dev); if (port->dead) { - cxl_device_unlock(&port->dev); + device_unlock(&port->dev); return -ENXIO; } dup = find_ep(port, new->ep); if (!dup) list_add_tail(&new->list, &port->endpoints); - cxl_device_unlock(&port->dev); + device_unlock(&port->dev); return dup ? 
-EEXIST : 0; } @@ -856,12 +856,12 @@ static void delete_endpoint(void *data) goto out; parent = &parent_port->dev; - cxl_device_lock(parent); + device_lock(parent); if (parent->driver && endpoint->uport) { devm_release_action(parent, cxl_unlink_uport, endpoint); devm_release_action(parent, unregister_port, endpoint); } - cxl_device_unlock(parent); + device_unlock(parent); put_device(parent); out: put_device(&endpoint->dev); @@ -922,7 +922,7 @@ static void cxl_detach_ep(void *data) } parent_port = to_cxl_port(port->dev.parent); - cxl_device_lock(&parent_port->dev); + device_lock(&parent_port->dev); if (!parent_port->dev.driver) { /* * The bottom-up race to delete the port lost to a @@ -930,12 +930,12 @@ static void cxl_detach_ep(void *data) * parent_port ->remove() will have cleaned up all * descendants. */ - cxl_device_unlock(&parent_port->dev); + device_unlock(&parent_port->dev); put_device(&port->dev); continue; } - cxl_device_lock(&port->dev); + device_lock(&port->dev); ep = find_ep(port, &cxlmd->dev);
[PATCH v3 4/8] nvdimm: Replace lockdep_mutex with local lock classes
In response to an attempt to expand dev->lockdep_mutex for device_lock() validation [1], Peter points out [2] that the lockdep API already has the ability to assign a dedicated lock class per subsystem device-type. Use lockdep_set_class() to override the default device_lock() '__lockdep_no_validate__' class for each NVDIMM subsystem device-type. This enables lockdep to detect deadlocks and recursive locking within the device-driver core and the subsystem. Link: https://lore.kernel.org/r/164982968798.684294.15817853329823976469.st...@dwillia2-desk3.amr.corp.intel.com [1] Link: https://lore.kernel.org/r/ylf0dewci8myl...@hirez.programming.kicks-ass.net [2] Suggested-by: Peter Zijlstra Cc: Vishal Verma Cc: Dave Jiang Cc: Ira Weiny Signed-off-by: Dan Williams --- drivers/nvdimm/btt_devs.c |7 +-- drivers/nvdimm/bus.c| 14 +++--- drivers/nvdimm/dax_devs.c |4 ++-- drivers/nvdimm/dimm_devs.c |4 drivers/nvdimm/namespace_devs.c | 10 +- drivers/nvdimm/nd-core.h|2 +- drivers/nvdimm/pfn_devs.c |7 +-- drivers/nvdimm/region_devs.c|4 8 files changed, 37 insertions(+), 15 deletions(-) diff --git a/drivers/nvdimm/btt_devs.c b/drivers/nvdimm/btt_devs.c index e5a58520d398..120821796668 100644 --- a/drivers/nvdimm/btt_devs.c +++ b/drivers/nvdimm/btt_devs.c @@ -178,6 +178,8 @@ bool is_nd_btt(struct device *dev) } EXPORT_SYMBOL(is_nd_btt); +static struct lock_class_key nvdimm_btt_key; + static struct device *__nd_btt_create(struct nd_region *nd_region, unsigned long lbasize, uuid_t *uuid, struct nd_namespace_common *ndns) @@ -205,6 +207,7 @@ static struct device *__nd_btt_create(struct nd_region *nd_region, dev->parent = &nd_region->dev; dev->type = &nd_btt_device_type; device_initialize(&nd_btt->dev); + lockdep_set_class(&nd_btt->dev.mutex, &nvdimm_btt_key); if (ndns && !__nd_attach_ndns(&nd_btt->dev, ndns, &nd_btt->ndns)) { dev_dbg(&ndns->dev, "failed, already claimed by %s\n", dev_name(ndns->claim)); @@ -225,7 +228,7 @@ struct device *nd_btt_create(struct nd_region *nd_region) { struct device *dev = __nd_btt_create(nd_region, 0, NULL, NULL); - __nd_device_register(dev); + nd_device_register(dev); return dev; } @@ -324,7 +327,7 @@ static int __nd_btt_probe(struct nd_btt *nd_btt, if (!nd_btt->uuid) return -ENOMEM; - __nd_device_register(&nd_btt->dev); + nd_device_register(&nd_btt->dev); return 0; } diff --git a/drivers/nvdimm/bus.c b/drivers/nvdimm/bus.c index 7b0d1443217a..85ffa04e93c2 100644 --- a/drivers/nvdimm/bus.c +++ b/drivers/nvdimm/bus.c @@ -334,6 +334,8 @@ struct nvdimm_bus *nvdimm_to_bus(struct nvdimm *nvdimm) } EXPORT_SYMBOL_GPL(nvdimm_to_bus); +static struct lock_class_key nvdimm_bus_key; + struct nvdimm_bus *nvdimm_bus_register(struct device *parent, struct nvdimm_bus_descriptor *nd_desc) { @@ -360,6 +362,7 @@ struct nvdimm_bus *nvdimm_bus_register(struct device *parent, nvdimm_bus->dev.bus = &nvdimm_bus_type; nvdimm_bus->dev.of_node = nd_desc->of_node; device_initialize(&nvdimm_bus->dev); + lockdep_set_class(&nvdimm_bus->dev.mutex, &nvdimm_bus_key); device_set_pm_not_required(&nvdimm_bus->dev); rc = dev_set_name(&nvdimm_bus->dev, "ndbus%d", nvdimm_bus->id); if (rc) @@ -511,7 +514,7 @@ static void nd_async_device_unregister(void *d, async_cookie_t cookie) put_device(dev); } -void __nd_device_register(struct device *dev) +void nd_device_register(struct device *dev) { if (!dev) return; @@ -537,12 +540,6 @@ void __nd_device_register(struct device *dev) async_schedule_dev_domain(nd_async_device_register, dev, &nd_async_domain); } - -void nd_device_register(struct device *dev) -{ - 
device_initialize(dev); - __nd_device_register(dev); -} EXPORT_SYMBOL(nd_device_register); void nd_device_unregister(struct device *dev, enum nd_async_mode mode) @@ -724,6 +721,8 @@ static void ndctl_release(struct device *dev) kfree(dev); } +static struct lock_class_key nvdimm_ndctl_key; + int nvdimm_bus_create_ndctl(struct nvdimm_bus *nvdimm_bus) { dev_t devt = MKDEV(nvdimm_bus_major, nvdimm_bus->id); @@ -734,6 +733,7 @@ int nvdimm_bus_create_ndctl(struct nvdimm_bus *nvdimm_bus) if (!dev) return -ENOMEM; device_initialize(dev); + lockdep_set_class(&dev->mutex, &nvdimm_ndctl_key); device_set_pm_not_required(dev); dev->class = nd_class; dev->parent = &nvdimm_bus->dev; diff --git a/drivers/nvdimm/dax_devs.c b/drivers/nvdimm/dax_devs.c index 99965077bac4..7f4a9d28b670 100644 --- a/drivers/nvdimm/dax_devs.c +++ b/drivers/n
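Stepping back from the individual hunks, every conversion in this patch
is the same short pattern: declare one static lock class key per device
type and assign it immediately after device_initialize(). A condensed
sketch, with a made-up foo device type standing in for the
bus/btt/dax/dimm/namespace/pfn/region variants:

	#include <linux/device.h>
	#include <linux/lockdep.h>

	/* one key per device type => one lockdep class per device type */
	static struct lock_class_key foo_dev_key;

	static void foo_device_setup(struct device *dev)
	{
		device_initialize(dev);
		/*
		 * Override the driver core's default
		 * __lockdep_no_validate__ class so device_lock() on this
		 * device type participates in lockdep validation.
		 */
		lockdep_set_class(&dev->mutex, &foo_dev_key);
	}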
[PATCH v3 6/8] nvdimm: Drop nd_device_lock()
Now that all NVDIMM subsystem locking is validated with custom lock classes, there is no need for the custom usage of the lockdep_mutex. Cc: Vishal Verma Cc: Dave Jiang Cc: Ira Weiny Signed-off-by: Dan Williams --- drivers/nvdimm/btt_devs.c | 16 + drivers/nvdimm/bus.c| 24 +- drivers/nvdimm/core.c | 10 +++--- drivers/nvdimm/dimm_devs.c |8 ++--- drivers/nvdimm/namespace_devs.c | 36 +++-- drivers/nvdimm/nd-core.h| 66 --- drivers/nvdimm/pfn_devs.c | 24 +++--- drivers/nvdimm/pmem.c |2 + drivers/nvdimm/region.c |2 + drivers/nvdimm/region_devs.c| 16 + lib/Kconfig.debug | 17 -- 11 files changed, 66 insertions(+), 155 deletions(-) diff --git a/drivers/nvdimm/btt_devs.c b/drivers/nvdimm/btt_devs.c index 120821796668..fabbb31f2c35 100644 --- a/drivers/nvdimm/btt_devs.c +++ b/drivers/nvdimm/btt_devs.c @@ -50,14 +50,14 @@ static ssize_t sector_size_store(struct device *dev, struct nd_btt *nd_btt = to_nd_btt(dev); ssize_t rc; - nd_device_lock(dev); + device_lock(dev); nvdimm_bus_lock(dev); rc = nd_size_select_store(dev, buf, &nd_btt->lbasize, btt_lbasize_supported); dev_dbg(dev, "result: %zd wrote: %s%s", rc, buf, buf[len - 1] == '\n' ? "" : "\n"); nvdimm_bus_unlock(dev); - nd_device_unlock(dev); + device_unlock(dev); return rc ? rc : len; } @@ -79,11 +79,11 @@ static ssize_t uuid_store(struct device *dev, struct nd_btt *nd_btt = to_nd_btt(dev); ssize_t rc; - nd_device_lock(dev); + device_lock(dev); rc = nd_uuid_store(dev, &nd_btt->uuid, buf, len); dev_dbg(dev, "result: %zd wrote: %s%s", rc, buf, buf[len - 1] == '\n' ? "" : "\n"); - nd_device_unlock(dev); + device_unlock(dev); return rc ? rc : len; } @@ -108,13 +108,13 @@ static ssize_t namespace_store(struct device *dev, struct nd_btt *nd_btt = to_nd_btt(dev); ssize_t rc; - nd_device_lock(dev); + device_lock(dev); nvdimm_bus_lock(dev); rc = nd_namespace_store(dev, &nd_btt->ndns, buf, len); dev_dbg(dev, "result: %zd wrote: %s%s", rc, buf, buf[len - 1] == '\n' ? 
"" : "\n"); nvdimm_bus_unlock(dev); - nd_device_unlock(dev); + device_unlock(dev); return rc; } @@ -126,14 +126,14 @@ static ssize_t size_show(struct device *dev, struct nd_btt *nd_btt = to_nd_btt(dev); ssize_t rc; - nd_device_lock(dev); + device_lock(dev); if (dev->driver) rc = sprintf(buf, "%llu\n", nd_btt->size); else { /* no size to convey if the btt instance is disabled */ rc = -ENXIO; } - nd_device_unlock(dev); + device_unlock(dev); return rc; } diff --git a/drivers/nvdimm/bus.c b/drivers/nvdimm/bus.c index 85ffa04e93c2..a4fc17db707c 100644 --- a/drivers/nvdimm/bus.c +++ b/drivers/nvdimm/bus.c @@ -88,10 +88,7 @@ static int nvdimm_bus_probe(struct device *dev) dev->driver->name, dev_name(dev)); nvdimm_bus_probe_start(nvdimm_bus); - debug_nvdimm_lock(dev); rc = nd_drv->probe(dev); - debug_nvdimm_unlock(dev); - if ((rc == 0 || rc == -EOPNOTSUPP) && dev->parent && is_nd_region(dev->parent)) nd_region_advance_seeds(to_nd_region(dev->parent), dev); @@ -111,11 +108,8 @@ static void nvdimm_bus_remove(struct device *dev) struct module *provider = to_bus_provider(dev); struct nvdimm_bus *nvdimm_bus = walk_to_nvdimm_bus(dev); - if (nd_drv->remove) { - debug_nvdimm_lock(dev); + if (nd_drv->remove) nd_drv->remove(dev); - debug_nvdimm_unlock(dev); - } dev_dbg(&nvdimm_bus->dev, "%s.remove(%s)\n", dev->driver->name, dev_name(dev)); @@ -139,7 +133,7 @@ static void nvdimm_bus_shutdown(struct device *dev) void nd_device_notify(struct device *dev, enum nvdimm_event event) { - nd_device_lock(dev); + device_lock(dev); if (dev->driver) { struct nd_device_driver *nd_drv; @@ -147,7 +141,7 @@ void nd_device_notify(struct device *dev, enum nvdimm_event event) if (nd_drv->notify) nd_drv->notify(dev, event); } - nd_device_unlock(dev); + device_unlock(dev); } EXPORT_SYMBOL(nd_device_notify); @@ -569,9 +563,9 @@ void nd_device_unregister(struct device *dev, enum nd_async_mode mode) * or otherwise let the async path handle it if the * unregistration was already queued. */ - nd_device_l
[PATCH v3 7/8] device-core: Kill the lockdep_mutex
Per Peter [1], the lockdep API has native support for all the use cases lockdep_mutex was attempting to enable. Now that all lockdep_mutex users have been converted to those APIs, drop this lock. Link: https://lore.kernel.org/r/ylf0dewci8myl...@hirez.programming.kicks-ass.net [1] Cc: Greg Kroah-Hartman Cc: "Rafael J. Wysocki" Suggested-by: Peter Zijlstra Signed-off-by: Dan Williams --- drivers/base/core.c|3 --- include/linux/device.h |5 - 2 files changed, 8 deletions(-) diff --git a/drivers/base/core.c b/drivers/base/core.c index 3d6430eb0c6a..2eede2ec3d64 100644 --- a/drivers/base/core.c +++ b/drivers/base/core.c @@ -2864,9 +2864,6 @@ void device_initialize(struct device *dev) kobject_init(&dev->kobj, &device_ktype); INIT_LIST_HEAD(&dev->dma_pools); mutex_init(&dev->mutex); -#ifdef CONFIG_PROVE_LOCKING - mutex_init(&dev->lockdep_mutex); -#endif lockdep_set_novalidate_class(&dev->mutex); spin_lock_init(&dev->devres_lock); INIT_LIST_HEAD(&dev->devres_head); diff --git a/include/linux/device.h b/include/linux/device.h index 82c9d307e7bd..c00ab223da50 100644 --- a/include/linux/device.h +++ b/include/linux/device.h @@ -400,8 +400,6 @@ struct dev_msi_info { * This identifies the device type and carries type-specific * information. * @mutex: Mutex to synchronize calls to its driver. - * @lockdep_mutex: An optional debug lock that a subsystem can use as a - * peer lock to gain localized lockdep coverage of the device_lock. * @bus: Type of bus device is on. * @driver:Which driver has allocated this * @platform_data: Platform data specific to the device. @@ -499,9 +497,6 @@ struct device { core doesn't touch it */ void*driver_data; /* Driver data, set and get with dev_set_drvdata/dev_get_drvdata */ -#ifdef CONFIG_PROVE_LOCKING - struct mutexlockdep_mutex; -#endif struct mutexmutex; /* mutex to synchronize calls to * its driver. */
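For readers who did not follow the earlier series, it may help to recall
the idiom this field enabled. The device.h comment deleted above
describes lockdep_mutex as a peer lock that subsystems took right next
to device_lock(); the following is a paraphrased sketch of that idiom
(the nfit variant removed in patch 5/8 is essentially identical), not
text from this patch:

	/*
	 * Old workaround: because dev->mutex is novalidate, take a shadow
	 * mutex with a real lock class right after it so lockdep can still
	 * see subsystem-local ordering (CONFIG_PROVE_LOCKING builds only).
	 */
	static inline void subsys_device_lock(struct device *dev)
	{
		device_lock(dev);
		mutex_lock(&dev->lockdep_mutex);
	}

	static inline void subsys_device_unlock(struct device *dev)
	{
		mutex_unlock(&dev->lockdep_mutex);
		device_unlock(dev);
	}

With per-type lock classes now validating device_lock() directly in the
CXL and NVDIMM subsystems, nothing takes lockdep_mutex any more, so the
field and its initialization can go.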
[PATCH v3 1/8] cxl: Replace lockdep_mutex with local lock classes
In response to an attempt to expand dev->lockdep_mutex for device_lock() validation [1], Peter points out [2] that the lockdep API already has the ability to assign a dedicated lock class per subsystem device-type. Use lockdep_set_class() to override the default device_lock() '__lockdep_no_validate__' class for each CXL subsystem device-type. This enables lockdep to detect deadlocks and recursive locking within the device-driver core and the subsystem. The lockdep_set_class_and_subclass() API is used for port objects that recursively lock the 'cxl_port_key' class by hierarchical topology depth. Link: https://lore.kernel.org/r/164982968798.684294.15817853329823976469.st...@dwillia2-desk3.amr.corp.intel.com [1] Link: https://lore.kernel.org/r/ylf0dewci8myl...@hirez.programming.kicks-ass.net [2] Suggested-by: Peter Zijlstra Cc: Ingo Molnar Cc: Will Deacon Cc: Waiman Long Cc: Boqun Feng Cc: Alison Schofield Cc: Vishal Verma Cc: Ira Weiny Cc: Ben Widawsky Cc: Jonathan Cameron Signed-off-by: Dan Williams --- drivers/cxl/core/memdev.c |3 +++ drivers/cxl/core/pmem.c |6 ++ drivers/cxl/core/port.c | 13 + 3 files changed, 18 insertions(+), 4 deletions(-) diff --git a/drivers/cxl/core/memdev.c b/drivers/cxl/core/memdev.c index 1f76b28f9826..f7cdcd33504a 100644 --- a/drivers/cxl/core/memdev.c +++ b/drivers/cxl/core/memdev.c @@ -228,6 +228,8 @@ static void detach_memdev(struct work_struct *work) put_device(&cxlmd->dev); } +static struct lock_class_key cxl_memdev_key; + static struct cxl_memdev *cxl_memdev_alloc(struct cxl_dev_state *cxlds, const struct file_operations *fops) { @@ -247,6 +249,7 @@ static struct cxl_memdev *cxl_memdev_alloc(struct cxl_dev_state *cxlds, dev = &cxlmd->dev; device_initialize(dev); + lockdep_set_class(&dev->mutex, &cxl_memdev_key); dev->parent = cxlds->dev; dev->bus = &cxl_bus_type; dev->devt = MKDEV(cxl_mem_major, cxlmd->id); diff --git a/drivers/cxl/core/pmem.c b/drivers/cxl/core/pmem.c index 8de240c4d96b..e825e261278d 100644 --- a/drivers/cxl/core/pmem.c +++ b/drivers/cxl/core/pmem.c @@ -80,6 +80,8 @@ struct cxl_nvdimm_bridge *cxl_find_nvdimm_bridge(struct cxl_nvdimm *cxl_nvd) } EXPORT_SYMBOL_NS_GPL(cxl_find_nvdimm_bridge, CXL); +static struct lock_class_key cxl_nvdimm_bridge_key; + static struct cxl_nvdimm_bridge *cxl_nvdimm_bridge_alloc(struct cxl_port *port) { struct cxl_nvdimm_bridge *cxl_nvb; @@ -99,6 +101,7 @@ static struct cxl_nvdimm_bridge *cxl_nvdimm_bridge_alloc(struct cxl_port *port) cxl_nvb->port = port; cxl_nvb->state = CXL_NVB_NEW; device_initialize(dev); + lockdep_set_class(&dev->mutex, &cxl_nvdimm_bridge_key); device_set_pm_not_required(dev); dev->parent = &port->dev; dev->bus = &cxl_bus_type; @@ -214,6 +217,8 @@ struct cxl_nvdimm *to_cxl_nvdimm(struct device *dev) } EXPORT_SYMBOL_NS_GPL(to_cxl_nvdimm, CXL); +static struct lock_class_key cxl_nvdimm_key; + static struct cxl_nvdimm *cxl_nvdimm_alloc(struct cxl_memdev *cxlmd) { struct cxl_nvdimm *cxl_nvd; @@ -226,6 +231,7 @@ static struct cxl_nvdimm *cxl_nvdimm_alloc(struct cxl_memdev *cxlmd) dev = &cxl_nvd->dev; cxl_nvd->cxlmd = cxlmd; device_initialize(dev); + lockdep_set_class(&dev->mutex, &cxl_nvdimm_key); device_set_pm_not_required(dev); dev->parent = &cxlmd->dev; dev->bus = &cxl_bus_type; diff --git a/drivers/cxl/core/port.c b/drivers/cxl/core/port.c index 2ab1ba4499b3..750aac95ed5f 100644 --- a/drivers/cxl/core/port.c +++ b/drivers/cxl/core/port.c @@ -391,6 +391,8 @@ static int devm_cxl_link_uport(struct device *host, struct cxl_port *port) return devm_add_action_or_reset(host, cxl_unlink_uport, port); 
} +static struct lock_class_key cxl_port_key; + static struct cxl_port *cxl_port_alloc(struct device *uport, resource_size_t component_reg_phys, struct cxl_port *parent_port) @@ -415,9 +417,10 @@ static struct cxl_port *cxl_port_alloc(struct device *uport, * description. */ dev = &port->dev; - if (parent_port) + if (parent_port) { dev->parent = &parent_port->dev; - else + port->depth = parent_port->depth + 1; + } else dev->parent = uport; port->uport = uport; @@ -427,6 +430,7 @@ static struct cxl_port *cxl_port_alloc(struct device *uport, INIT_LIST_HEAD(&port->endpoints); device_initialize(dev); + lockdep_set_class_and_subclass(&dev->mutex, &cxl_port_key, port->depth); device_set_pm_not_required(dev); dev->bus = &cxl_bus_type; dev->type = &cxl_port_type; @@ -457,8 +461,6 @@ struct cxl_port *devm_cxl_add_port(struct device *host, struct device *uport, if (IS_ERR(port))
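The port case deserves a short illustration. Ports nest (a switch
port's parent is another port), so two cxl_port device_lock()s can
legitimately be held at once; giving each port a lockdep subclass equal
to its topology depth lets lockdep distinguish "parent then child" from
genuine recursive locking. A minimal sketch of the idea, using the
names from the patch:

	/* at port creation: depth 0 for the root, children are parent + 1 */
	port->depth = parent_port ? parent_port->depth + 1 : 0;

	device_initialize(&port->dev);
	lockdep_set_class_and_subclass(&port->dev.mutex, &cxl_port_key,
				       port->depth);

	/*
	 * Locking a parent port (lower subclass) and then a child port
	 * (higher subclass) is now seen as two distinct subclasses of
	 * cxl_port_key rather than a self-deadlock.
	 */

Lockdep supports a limited number of subclasses (MAX_LOCKDEP_SUBCLASSES
is 8), which current CXL root/host-bridge/switch/endpoint chains stay
well inside.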
[PATCH v3 0/8] device-core: Enable device_lock() lockdep validation
Changes since v2 [1]:
- Use lockdep_set_class(), lockdep_set_class_and_subclass(), and
  lock_set_class() instead of a 'lockdep_mutex' in 'struct device'.
  (Peter and Waiman)
- Include a fix identified by this new infrastructure

[1]: https://lore.kernel.org/r/164982968798.684294.15817853329823976469.st...@dwillia2-desk3.amr.corp.intel.com

The device_lock() uses lockdep_set_novalidate_class() because it is
taken in too many contexts that cannot be described by a single mutex
lock class. The lack of lockdep coverage leads to deadlock scenarios
landing upstream. To mitigate that problem the lockdep_mutex was added
[2].

The lockdep_mutex, however, is an unscalable hack that overlooks
advancements in the lockdep API to change a given lock's lock class
[3]. With lockdep_set_class() a device subsystem can initialize a
dedicated lock class per device type at device creation time, and with
lock_set_class() a device driver can temporarily override a lockdep
class after the fact.

Use the lockdep class assignment APIs to replace the usage of
lockdep_mutex in the CXL and NVDIMM subsystems, and delete
lockdep_mutex.

[2]: commit 87a30e1f05d7 ("driver-core, libnvdimm: Let device
     subsystems add local lockdep coverage")
[3]: https://lore.kernel.org/r/ylf0dewci8myl...@hirez.programming.kicks-ass.net

---

Dan Williams (8):
      cxl: Replace lockdep_mutex with local lock classes
      cxl/acpi: Add root device lockdep validation
      cxl: Drop cxl_device_lock()
      nvdimm: Replace lockdep_mutex with local lock classes
      ACPI: NFIT: Drop nfit_device_lock()
      nvdimm: Drop nd_device_lock()
      device-core: Kill the lockdep_mutex
      nvdimm: Fix firmware activation deadlock scenarios

 drivers/acpi/nfit/core.c        | 30 ---
 drivers/acpi/nfit/nfit.h        | 24
 drivers/base/core.c             |  3 --
 drivers/cxl/acpi.c              | 15
 drivers/cxl/core/memdev.c       |  3 ++
 drivers/cxl/core/pmem.c         | 10 -
 drivers/cxl/core/port.c         | 68 --
 drivers/cxl/cxl.h               | 78 ---
 drivers/cxl/mem.c               |  4 +-
 drivers/cxl/pmem.c              | 12 +++---
 drivers/nvdimm/btt_devs.c       | 23 +++-
 drivers/nvdimm/bus.c            | 38 ---
 drivers/nvdimm/core.c           | 14 +++
 drivers/nvdimm/dax_devs.c       |  4 +-
 drivers/nvdimm/dimm_devs.c      | 12 --
 drivers/nvdimm/namespace_devs.c | 46 ++-
 drivers/nvdimm/nd-core.h        | 68 +-
 drivers/nvdimm/pfn_devs.c       | 31 +---
 drivers/nvdimm/pmem.c           |  2 +
 drivers/nvdimm/region.c         |  2 +
 drivers/nvdimm/region_devs.c    | 20 ++
 include/linux/device.h          | 30 +--
 lib/Kconfig.debug               | 23
 23 files changed, 209 insertions(+), 351 deletions(-)

base-commit: ce522ba9ef7e2d9fb22a39eb3371c0c64e2a433e
[PATCH v3 8/8] nvdimm: Fix firmware activation deadlock scenarios
Lockdep reports the following deadlock scenarios for CXL root device
power-management (device_prepare()) operations and device_shutdown()
operations on 'nd_region' devices:

---
 Chain exists of:
   &nvdimm_region_key --> &nvdimm_bus->reconfig_mutex --> system_transition_mutex

  Possible unsafe locking scenario:

        CPU0                    CPU1
        ----                    ----
   lock(system_transition_mutex);
                                lock(&nvdimm_bus->reconfig_mutex);
                                lock(system_transition_mutex);
   lock(&nvdimm_region_key);

--
 Chain exists of:
   &cxl_nvdimm_bridge_key --> acpi_scan_lock --> &cxl_root_key

  Possible unsafe locking scenario:

        CPU0                    CPU1
        ----                    ----
   lock(&cxl_root_key);
                                lock(acpi_scan_lock);
                                lock(&cxl_root_key);
   lock(&cxl_nvdimm_bridge_key);

---

These stem from holding nvdimm_bus_lock() over hibernate_quiet_exec(),
which walks the entire system device topology taking device_lock()
along the way.

The nvdimm_bus_lock() is protecting against unregistration, multiple
simultaneous ops callers, and preventing activate_show() from racing
activate_store(). For the first two, the lock is redundant:
unregistration already flushes all ops users, and sysfs already
prevents multiple threads from being active in an ops handler at the
same time. For the last, userspace should already be waiting for its
last activate_store() to complete, and does not need activate_show() to
flush the write side, so this lock usage can be deleted in these
attributes.

Fixes: 48001ea50d17 ("PM, libnvdimm: Add runtime firmware activation support")
Signed-off-by: Dan Williams
---
 drivers/nvdimm/core.c |    4 ----
 1 file changed, 4 deletions(-)

diff --git a/drivers/nvdimm/core.c b/drivers/nvdimm/core.c
index 144926b7451c..7c7f4a43fd4f 100644
--- a/drivers/nvdimm/core.c
+++ b/drivers/nvdimm/core.c
@@ -395,10 +395,8 @@ static ssize_t activate_show(struct device *dev,
 	if (!nd_desc->fw_ops)
 		return -EOPNOTSUPP;
 
-	nvdimm_bus_lock(dev);
 	cap = nd_desc->fw_ops->capability(nd_desc);
 	state = nd_desc->fw_ops->activate_state(nd_desc);
-	nvdimm_bus_unlock(dev);
 
 	if (cap < NVDIMM_FWA_CAP_QUIESCE)
 		return -EOPNOTSUPP;
@@ -443,7 +441,6 @@ static ssize_t activate_store(struct device *dev,
 	else
 		return -EINVAL;
 
-	nvdimm_bus_lock(dev);
 	state = nd_desc->fw_ops->activate_state(nd_desc);
 
 	switch (state) {
@@ -461,7 +458,6 @@ static ssize_t activate_store(struct device *dev,
 	default:
 		rc = -ENXIO;
 	}
-	nvdimm_bus_unlock(dev);
 
 	if (rc == 0)
 		rc = len;
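To make the first report concrete, a rough call-trace sketch of the two
sides of the inversion (paraphrased from the chains above; the exact
frames depend on the firmware-activation backend):

	/*
	 * Thread A: sysfs write to the bus 'activate' attribute
	 *
	 *   activate_store()
	 *     nvdimm_bus_lock()                  # &nvdimm_bus->reconfig_mutex
	 *     nd_desc->fw_ops->activate()
	 *       hibernate_quiet_exec()
	 *         lock_system_sleep()            # system_transition_mutex
	 *         dpm_prepare() -> device_prepare()
	 *           device_lock(nd_region dev)   # &nvdimm_region_key
	 *
	 * Thread B: e.g. an nd_region sysfs attribute or shutdown path
	 *
	 *   device_lock(nd_region dev)           # &nvdimm_region_key
	 *     nvdimm_bus_lock()                  # &nvdimm_bus->reconfig_mutex
	 *
	 * A holds reconfig_mutex (and system_transition_mutex) while
	 * waiting on a region device_lock(); B holds a region
	 * device_lock() while waiting on reconfig_mutex. Dropping
	 * nvdimm_bus_lock() from activate_show()/activate_store() breaks
	 * the reconfig_mutex -> system_transition_mutex link.
	 */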
[PATCH v3 5/8] ACPI: NFIT: Drop nfit_device_lock()
The nfit_device_lock() helper was added to provide lockdep coverage for the NFIT driver's usage of device_lock() on the nvdimm_bus object. Now that nvdimm_bus objects have their own lock class this wrapper can be dropped. Cc: Vishal Verma Cc: Dave Jiang Cc: Ira Weiny Signed-off-by: Dan Williams --- drivers/acpi/nfit/core.c | 30 +++--- drivers/acpi/nfit/nfit.h | 24 2 files changed, 15 insertions(+), 39 deletions(-) diff --git a/drivers/acpi/nfit/core.c b/drivers/acpi/nfit/core.c index fe61f617a943..ae5f4acf2675 100644 --- a/drivers/acpi/nfit/core.c +++ b/drivers/acpi/nfit/core.c @@ -1230,7 +1230,7 @@ static ssize_t hw_error_scrub_store(struct device *dev, if (rc) return rc; - nfit_device_lock(dev); + device_lock(dev); nd_desc = dev_get_drvdata(dev); if (nd_desc) { struct acpi_nfit_desc *acpi_desc = to_acpi_desc(nd_desc); @@ -1247,7 +1247,7 @@ static ssize_t hw_error_scrub_store(struct device *dev, break; } } - nfit_device_unlock(dev); + device_unlock(dev); if (rc) return rc; return size; @@ -1267,10 +1267,10 @@ static ssize_t scrub_show(struct device *dev, ssize_t rc = -ENXIO; bool busy; - nfit_device_lock(dev); + device_lock(dev); nd_desc = dev_get_drvdata(dev); if (!nd_desc) { - nfit_device_unlock(dev); + device_unlock(dev); return rc; } acpi_desc = to_acpi_desc(nd_desc); @@ -1287,7 +1287,7 @@ static ssize_t scrub_show(struct device *dev, } mutex_unlock(&acpi_desc->init_mutex); - nfit_device_unlock(dev); + device_unlock(dev); return rc; } @@ -1304,14 +1304,14 @@ static ssize_t scrub_store(struct device *dev, if (val != 1) return -EINVAL; - nfit_device_lock(dev); + device_lock(dev); nd_desc = dev_get_drvdata(dev); if (nd_desc) { struct acpi_nfit_desc *acpi_desc = to_acpi_desc(nd_desc); rc = acpi_nfit_ars_rescan(acpi_desc, ARS_REQ_LONG); } - nfit_device_unlock(dev); + device_unlock(dev); if (rc) return rc; return size; @@ -1697,9 +1697,9 @@ static void acpi_nvdimm_notify(acpi_handle handle, u32 event, void *data) struct acpi_device *adev = data; struct device *dev = &adev->dev; - nfit_device_lock(dev->parent); + device_lock(dev->parent); __acpi_nvdimm_notify(dev, event); - nfit_device_unlock(dev->parent); + device_unlock(dev->parent); } static bool acpi_nvdimm_has_method(struct acpi_device *adev, char *method) @@ -3152,8 +3152,8 @@ static int acpi_nfit_flush_probe(struct nvdimm_bus_descriptor *nd_desc) struct device *dev = acpi_desc->dev; /* Bounce the device lock to flush acpi_nfit_add / acpi_nfit_notify */ - nfit_device_lock(dev); - nfit_device_unlock(dev); + device_lock(dev); + device_unlock(dev); /* Bounce the init_mutex to complete initial registration */ mutex_lock(&acpi_desc->init_mutex); @@ -3305,8 +3305,8 @@ void acpi_nfit_shutdown(void *data) * acpi_nfit_ars_rescan() submissions have had a chance to * either submit or see ->cancel set. 
*/ - nfit_device_lock(bus_dev); - nfit_device_unlock(bus_dev); + device_lock(bus_dev); + device_unlock(bus_dev); flush_workqueue(nfit_wq); } @@ -3449,9 +3449,9 @@ EXPORT_SYMBOL_GPL(__acpi_nfit_notify); static void acpi_nfit_notify(struct acpi_device *adev, u32 event) { - nfit_device_lock(&adev->dev); + device_lock(&adev->dev); __acpi_nfit_notify(&adev->dev, adev->handle, event); - nfit_device_unlock(&adev->dev); + device_unlock(&adev->dev); } static const struct acpi_device_id acpi_nfit_ids[] = { diff --git a/drivers/acpi/nfit/nfit.h b/drivers/acpi/nfit/nfit.h index 50882bdbeb96..6023ad61831a 100644 --- a/drivers/acpi/nfit/nfit.h +++ b/drivers/acpi/nfit/nfit.h @@ -337,30 +337,6 @@ static inline struct acpi_nfit_desc *to_acpi_desc( return container_of(nd_desc, struct acpi_nfit_desc, nd_desc); } -#ifdef CONFIG_PROVE_LOCKING -static inline void nfit_device_lock(struct device *dev) -{ - device_lock(dev); - mutex_lock(&dev->lockdep_mutex); -} - -static inline void nfit_device_unlock(struct device *dev) -{ - mutex_unlock(&dev->lockdep_mutex); - device_unlock(dev); -} -#else -static inline void nfit_device_lock(struct device *dev) -{ - device_lock(dev); -} - -static inline void nfit_device_unlock(struct device *dev) -{ - device_unlock(dev); -} -#endif - const guid_t *to_nfit_uuid(enum nfit_uuids id); int acpi_nfit_init(struct acpi_nfit_desc *acpi_desc, void *nfit, acpi_size sz); void acpi_nfit_shutdown(void *
[PATCH] libnvdimm/security: change __nvdimm_security_overwrite_query from global to static
Smatch reports this issue:

  security.c:416:6: warning: symbol '__nvdimm_security_overwrite_query'
  was not declared. Should it be static?

__nvdimm_security_overwrite_query is only used in security.c, so change
its storage-class specifier to static.

Signed-off-by: Tom Rix
---
 drivers/nvdimm/security.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/nvdimm/security.c b/drivers/nvdimm/security.c
index 4b80150e4afa..d3e782662bf4 100644
--- a/drivers/nvdimm/security.c
+++ b/drivers/nvdimm/security.c
@@ -413,7 +413,7 @@ static int security_overwrite(struct nvdimm *nvdimm, unsigned int keyid)
 	return rc;
 }
 
-void __nvdimm_security_overwrite_query(struct nvdimm *nvdimm)
+static void __nvdimm_security_overwrite_query(struct nvdimm *nvdimm)
 {
 	struct nvdimm_bus *nvdimm_bus = walk_to_nvdimm_bus(&nvdimm->dev);
 	int rc;
-- 
2.27.0
Re: [PATCH v3 7/8] device-core: Kill the lockdep_mutex
On Thu, Apr 21, 2022 at 08:33:45AM -0700, Dan Williams wrote:
> Per Peter [1], the lockdep API has native support for all the use cases
> lockdep_mutex was attempting to enable. Now that all lockdep_mutex users
> have been converted to those APIs, drop this lock.
>
> Link: https://lore.kernel.org/r/ylf0dewci8myl...@hirez.programming.kicks-ass.net [1]
> Cc: Greg Kroah-Hartman
> Cc: "Rafael J. Wysocki"
> Suggested-by: Peter Zijlstra
> Signed-off-by: Dan Williams

YES!

Acked-by: Greg Kroah-Hartman

Nice work.
Re: [PATCH v3 2/8] cxl/acpi: Add root device lockdep validation
On Thu, Apr 21, 2022 at 08:33:18AM -0700, Dan Williams wrote:
> The CXL "root" device, ACPI0017, is an attach point for coordinating
> platform level CXL resources and is the parent device for a CXL port
> topology tree. As such it has distinct locking rules relative to other
> CXL subsystem objects, but because it is an ACPI device the lock class
> is established well before it is given to the cxl_acpi driver.
>
> However, the lockdep API does support changing the lock class "live" for
> situations like this. Add a device_lock_set_class() helper that a driver
> can use in ->probe() to set a custom lock class, and
> device_lock_reset_class() to return to the default "no validate" class
> before the custom lock class key goes out of scope after ->remove().
>
> Note the helpers are all macros to support dead code elimination in the
> CONFIG_PROVE_LOCKING=n case.
>
> Suggested-by: Peter Zijlstra
> Cc: Greg Kroah-Hartman
> Cc: "Rafael J. Wysocki"
> Cc: Ingo Molnar
> Cc: Will Deacon
> Cc: Waiman Long
> Cc: Boqun Feng
> Cc: Alison Schofield
> Cc: Vishal Verma
> Cc: Ira Weiny
> Cc: Ben Widawsky
> Cc: Jonathan Cameron
> Signed-off-by: Dan Williams
> ---
>  drivers/cxl/acpi.c     | 15 +++
>  include/linux/device.h | 25 +
>  2 files changed, 40 insertions(+)

Much simpler, great work.

Reviewed-by: Greg Kroah-Hartman