(2014/12/03 5:46), K. Y. Srinivasan wrote:
> Andy Whitcroft <a...@canonical.com> initially saw this deadlock. We have
> seen this as well. Here is the original description of the problem (and a
> potential solution) from Andy:
> 
> https://lkml.org/lkml/2014/3/14/451
> 
> Here is an excerpt from that mail:
> 
> "We are seeing machines lockup with what appears to be an ABBA deadlock in
> the memory hotplug system.  These are from the 3.13.6 based Ubuntu kernels.
> The hv_balloon driver is adding memory using add_memory() which takes the
> hotplug lock, and then emits a udev event, and then attempts to lock the
> sysfs device.  In response to the udev event udev opens the sysfs device
> and locks it, then attempts to grab the hotplug lock to online the memory.
> This seems to be inverted nesting in the two cases, leading to the hangs 
> below:
> 
> [  240.608612] INFO: task kworker/0:2:861 blocked for more than 120 seconds.
> [  240.608705] INFO: task systemd-udevd:1906 blocked for more than 120 
> seconds.
> 
> I note that the device hotplug locking allows complete retries (via
> ERESTARTSYS) and if we could detect this at the online stage it
> could be used to get us out.  But before I go down this road I wanted
> to make sure I am reading this right.  Or indeed if the hv_balloon driver
> is just doing this wrong."
> 
> This patch is based on Andy's analysis and suggestion.

How about use lock_device_hotplug() before calling add_memory() in 
hv_mem_hot_add()?
Commit 0f1cfe9d0d06 (mm/hotplug: remove stop_machine() from try_offline_node()) 
said:

  ---
    lock_device_hotplug() serializes hotplug & online/offline operations.  The
    lock is held in common sysfs online/offline interfaces and ACPI hotplug
    code paths.

    And here are the code paths:

    - CPU & Mem online/offline via sysfs online
        store_online()->lock_device_hotplug()

    - Mem online via sysfs state:
        store_mem_state()->lock_device_hotplug()

    - ACPI CPU & Mem hot-add:
        acpi_scan_bus_device_check()->lock_device_hotplug()

    - ACPI CPU & Mem hot-delete:
        acpi_scan_hot_remove()->lock_device_hotplug()
  ---

CPU & Memory online/offline/hotplug are serialized by lock_device_hotplug().
So using lock_device_hotplug() solves the ABBA issue.

Thanks,
Yasuaki Ishimatsu

> 
> Signed-off-by: K. Y. Srinivasan <k...@microsoft.com>
> ---
>   mm/memory_hotplug.c |   24 +++++++++++++++++-------
>   1 files changed, 17 insertions(+), 7 deletions(-)
> 
> diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c
> index 9fab107..e195269 100644
> --- a/mm/memory_hotplug.c
> +++ b/mm/memory_hotplug.c
> @@ -104,19 +104,27 @@ void put_online_mems(void)
>   
>   }
>   
> -static void mem_hotplug_begin(void)
> +static int mem_hotplug_begin(bool trylock)
>   {
>       mem_hotplug.active_writer = current;
>   
>       memhp_lock_acquire();
>       for (;;) {
> -             mutex_lock(&mem_hotplug.lock);
> +             if (trylock) {
> +                     if (!mutex_trylock(&mem_hotplug.lock)) {
> +                             mem_hotplug.active_writer = NULL;
> +                             return -ERESTARTSYS;
> +                     }
> +             } else {
> +                     mutex_lock(&mem_hotplug.lock);
> +             }
>               if (likely(!mem_hotplug.refcount))
>                       break;
>               __set_current_state(TASK_UNINTERRUPTIBLE);
>               mutex_unlock(&mem_hotplug.lock);
>               schedule();
>       }
> +     return 0;
>   }
>   
>   static void mem_hotplug_done(void)
> @@ -969,7 +977,9 @@ int __ref online_pages(unsigned long pfn, unsigned long 
> nr_pages, int online_typ
>       int ret;
>       struct memory_notify arg;
>   
> -     mem_hotplug_begin();
> +     ret = mem_hotplug_begin(true);
> +     if (ret)
> +             return ret;
>       /*
>        * This doesn't need a lock to do pfn_to_page().
>        * The section can't be removed here because of the
> @@ -1146,7 +1156,7 @@ int try_online_node(int nid)
>       if (node_online(nid))
>               return 0;
>   
> -     mem_hotplug_begin();
> +     mem_hotplug_begin(false);
>       pgdat = hotadd_new_pgdat(nid, 0);
>       if (!pgdat) {
>               pr_err("Cannot online node %d due to NULL pgdat\n", nid);
> @@ -1236,7 +1246,7 @@ int __ref add_memory(int nid, u64 start, u64 size)
>               new_pgdat = !p;
>       }
>   
> -     mem_hotplug_begin();
> +     mem_hotplug_begin(false);
>   
>       new_node = !node_online(nid);
>       if (new_node) {
> @@ -1684,7 +1694,7 @@ static int __ref __offline_pages(unsigned long 
> start_pfn,
>       if (!test_pages_in_a_zone(start_pfn, end_pfn))
>               return -EINVAL;
>   
> -     mem_hotplug_begin();
> +     mem_hotplug_begin(false);
>   
>       zone = page_zone(pfn_to_page(start_pfn));
>       node = zone_to_nid(zone);
> @@ -2002,7 +2012,7 @@ void __ref remove_memory(int nid, u64 start, u64 size)
>   
>       BUG_ON(check_hotplug_memory_range(start, size));
>   
> -     mem_hotplug_begin();
> +     mem_hotplug_begin(false);
>   
>       /*
>        * All memory blocks must be offlined before removing memory.  Check
> 


--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Reply via email to