Currently, all newly added memory blocks remain in the 'offline' state unless
someone onlines them. Some Linux distributions carry special udev rules
like:

SUBSYSTEM=="memory", ACTION=="add", ATTR{state}=="offline", ATTR{state}="online"

to make this happen automatically. This is not a great solution for virtual
machines where memory hotplug is used to relieve high memory pressure:
such onlining is slow, and the userspace process doing it (udev) has a
chance of being killed by the OOM killer, as it will probably need to
allocate some memory itself.
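
For illustration, the userspace step that the udev rule above performs
amounts to roughly the following. This is a minimal sketch, not udev's actual
code; the helper name online_if_offline() is hypothetical and the path of the
block's "state" file is supplied by the caller.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper: mirror the udev rule by writing "online" to a
 * memory block's state file if it currently reads "offline".
 * Returns 1 if "online" was written, 0 if the state was not "offline",
 * and -1 on I/O error.
 */
static int online_if_offline(const char *state_path)
{
	char state[16] = "";
	FILE *f;

	assert(state_path != NULL);

	f = fopen(state_path, "r");
	if (!f)
		return -1;
	if (!fgets(state, sizeof(state), f)) {
		fclose(f);
		return -1;
	}
	fclose(f);

	/* sysfs terminates the value with a newline; strip it. */
	state[strcspn(state, "\n")] = '\0';
	if (strcmp(state, "offline") != 0)
		return 0;

	f = fopen(state_path, "w");
	if (!f)
		return -1;
	if (fputs("online", f) == EOF) {
		fclose(f);
		return -1;
	}
	return fclose(f) == 0 ? 1 : -1;
}
```

Every hot-added block needs one such round trip through userspace, which is
the step described above as slow and vulnerable under memory pressure.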

Introduce a default policy for newly added memory blocks in the
/sys/devices/system/memory/hotplug_autoonline file with two possible
values: "offline", which preserves the current behavior, and "online", which
causes all newly added memory blocks to go online as soon as they're added.
The default is "online" when the MEMORY_HOTPLUG_AUTOONLINE kernel config
option is selected.
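
A management tool could read and change the policy from C along these lines.
This is only a sketch under stated assumptions: the helper names
set_autoonline() and get_autoonline() are hypothetical, and the file exists
only on a kernel carrying this patch with the config option enabled.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/* Path of the policy file proposed by this patch. */
#define AUTOONLINE_PATH "/sys/devices/system/memory/hotplug_autoonline"

/* Hypothetical helper: write the policy. Returns 0 on success, -1 on error. */
static int set_autoonline(const char *path, bool online)
{
	FILE *f;

	assert(path != NULL);

	f = fopen(path, "w");
	if (!f)
		return -1;
	if (fputs(online ? "online\n" : "offline\n", f) == EOF) {
		fclose(f);
		return -1;
	}
	return fclose(f) == 0 ? 0 : -1;
}

/*
 * Hypothetical helper: read the policy.
 * Returns 1 for "online", 0 for "offline", -1 on error or unexpected content.
 */
static int get_autoonline(const char *path)
{
	char buf[16] = "";
	FILE *f;

	assert(path != NULL);

	f = fopen(path, "r");
	if (!f)
		return -1;
	if (!fgets(buf, sizeof(buf), f)) {
		fclose(f);
		return -1;
	}
	fclose(f);

	/* The value is newline-terminated; strip it before comparing. */
	buf[strcspn(buf, "\n")] = '\0';
	if (strcmp(buf, "online") == 0)
		return 1;
	if (strcmp(buf, "offline") == 0)
		return 0;
	return -1;
}
```

Calling set_autoonline(AUTOONLINE_PATH, false) as root would correspond to
writing "offline" to the file from a shell.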

Cc: Jonathan Corbet <cor...@lwn.net>
Cc: Greg Kroah-Hartman <gre...@linuxfoundation.org>
Cc: Daniel Kiper <daniel.ki...@oracle.com>
Cc: Dan Williams <dan.j.willi...@intel.com>
Cc: Tang Chen <tangc...@cn.fujitsu.com>
Cc: David Vrabel <david.vra...@citrix.com>
Cc: David Rientjes <rient...@google.com>
Cc: Andrew Morton <a...@linux-foundation.org>
Cc: Naoya Horiguchi <n-horigu...@ah.jp.nec.com>
Cc: Xishi Qiu <qiuxi...@huawei.com>
Cc: Mel Gorman <mgor...@techsingularity.net>
Cc: "K. Y. Srinivasan" <k...@microsoft.com>
Cc: Igor Mammedov <imamm...@redhat.com>
Cc: Kay Sievers <k...@vrfy.org>
Cc: Konrad Rzeszutek Wilk <konrad.w...@oracle.com>
Cc: Boris Ostrovsky <boris.ostrov...@oracle.com>
Signed-off-by: Vitaly Kuznetsov <vkuzn...@redhat.com>
---
- Changes since 'v1':
  Add an 'online' parameter to add_memory_resource(), as it is used by the
  Xen balloon driver, which adds "empty" memory pages [David Vrabel].
  (I don't completely understand what prevents manual onlining in this
   case as we still have all newly added blocks in sysfs ... this is the
   discussion point.)

- Changes since 'RFC':
  It seems nobody is strongly opposed to the idea, thus non-RFC.
  Change memhp_autoonline to bool, as we support only MMOP_ONLINE_KEEP
  and MMOP_OFFLINE for the auto-onlining policy; eliminate 'unknown'
  from show_memhp_autoonline(). [Daniel Kiper]
  Put everything under CONFIG_MEMORY_HOTPLUG_AUTOONLINE, enable the
  feature by default (when the config option is selected) and add a
  kernel parameter (nomemhp_autoonline) to disable the functionality
  at boot when needed.

- RFC:
  I was able to find previous attempts to fix the issue, e.g.:
  http://marc.info/?l=linux-kernel&m=137425951924598&w=2
  http://marc.info/?l=linux-acpi&m=127186488905382
  but I'm not completely sure why they didn't work out, and the solution
  I suggest is not 'smart enough', thus 'RFC'.
---
 Documentation/kernel-parameters.txt |  2 ++
 Documentation/memory-hotplug.txt    | 26 ++++++++++++++++++++------
 drivers/base/memory.c               | 36 ++++++++++++++++++++++++++++++++++++
 drivers/xen/balloon.c               |  2 +-
 include/linux/memory_hotplug.h      |  6 +++++-
 mm/Kconfig                          |  9 +++++++++
 mm/memory_hotplug.c                 | 25 +++++++++++++++++++++++--
 7 files changed, 96 insertions(+), 10 deletions(-)

diff --git a/Documentation/kernel-parameters.txt b/Documentation/kernel-parameters.txt
index 742f69d..652efe1 100644
--- a/Documentation/kernel-parameters.txt
+++ b/Documentation/kernel-parameters.txt
@@ -2537,6 +2537,8 @@ bytes respectively. Such letter suffixes can also be entirely omitted.
                        shutdown the other cpus.  Instead use the REBOOT_VECTOR
                        irq.
 
+       nomemhp_autoonline      Don't automatically online newly added memory.
+
        nomodule        Disable module load
 
        nopat           [X86] Disable PAT (page attribute table extension of
diff --git a/Documentation/memory-hotplug.txt b/Documentation/memory-hotplug.txt
index ce2cfcf..041efac 100644
--- a/Documentation/memory-hotplug.txt
+++ b/Documentation/memory-hotplug.txt
@@ -111,8 +111,9 @@ To use memory hotplug feature, kernel must be compiled with following
 config options.
 
 - For all memory hotplug
-    Memory model -> Sparse Memory  (CONFIG_SPARSEMEM)
-    Allow for memory hot-add       (CONFIG_MEMORY_HOTPLUG)
+    Memory model -> Sparse Memory         (CONFIG_SPARSEMEM)
+    Allow for memory hot-add              (CONFIG_MEMORY_HOTPLUG)
+    Automatically online hot-added memory (CONFIG_MEMORY_HOTPLUG_AUTOONLINE)
 
 - To enable memory removal, the followings are also necessary
     Allow for memory hot remove    (CONFIG_MEMORY_HOTREMOVE)
@@ -254,12 +255,25 @@ If the memory block is online, you'll read "online".
 If the memory block is offline, you'll read "offline".
 
 
-5.2. How to online memory
+5.2. Memory onlining
 ------------
-Even if the memory is hot-added, it is not at ready-to-use state.
-For using newly added memory, you have to "online" the memory block.
+When memory is hot-added, the kernel decides whether or not to "online" it
+according to the policy, which can be read from the "hotplug_autoonline" file
+(requires CONFIG_MEMORY_HOTPLUG_AUTOONLINE):
 
-For onlining, you have to write "online" to the memory block's state file as:
+% cat /sys/devices/system/memory/hotplug_autoonline
+
+The default is "online", which means that newly added memory is onlined
+as soon as it is added. Automatic onlining can be disabled by writing
+"offline" to the "hotplug_autoonline" file:
+
+% echo offline > /sys/devices/system/memory/hotplug_autoonline
+
+or by booting the kernel with the "nomemhp_autoonline" parameter.
+
+If automatic onlining wasn't requested, or some memory block was offlined,
+it is possible to change an individual block's state by writing to its
+"state" file:
 
 % echo online > /sys/devices/system/memory/memoryXXX/state
 
diff --git a/drivers/base/memory.c b/drivers/base/memory.c
index 25425d3..6f9ce3a 100644
--- a/drivers/base/memory.c
+++ b/drivers/base/memory.c
@@ -438,6 +438,39 @@ print_block_size(struct device *dev, struct device_attribute *attr,
 
 static DEVICE_ATTR(block_size_bytes, 0444, print_block_size, NULL);
 
+#ifdef CONFIG_MEMORY_HOTPLUG_AUTOONLINE
+/*
+ * Memory auto online policy.
+ */
+
+static ssize_t
+show_memhp_autoonline(struct device *dev, struct device_attribute *attr,
+                     char *buf)
+{
+       if (memhp_autoonline)
+               return sprintf(buf, "online\n");
+       else
+               return sprintf(buf, "offline\n");
+}
+
+static ssize_t
+store_memhp_autoonline(struct device *dev, struct device_attribute *attr,
+                      const char *buf, size_t count)
+{
+       if (sysfs_streq(buf, "online"))
+               memhp_autoonline = true;
+       else if (sysfs_streq(buf, "offline"))
+               memhp_autoonline = false;
+       else
+               return -EINVAL;
+
+       return count;
+}
+
+static DEVICE_ATTR(hotplug_autoonline, 0644, show_memhp_autoonline,
+                  store_memhp_autoonline);
+#endif
+
 /*
  * Some architectures will have custom drivers to do this, and
  * will not need to do it from userspace.  The fake hot-add code
@@ -737,6 +770,9 @@ static struct attribute *memory_root_attrs[] = {
 #endif
 
        &dev_attr_block_size_bytes.attr,
+#ifdef CONFIG_MEMORY_HOTPLUG_AUTOONLINE
+       &dev_attr_hotplug_autoonline.attr,
+#endif
        NULL
 };
 
diff --git a/drivers/xen/balloon.c b/drivers/xen/balloon.c
index 12eab50..890c3b5 100644
--- a/drivers/xen/balloon.c
+++ b/drivers/xen/balloon.c
@@ -338,7 +338,7 @@ static enum bp_state reserve_additional_memory(void)
        }
 #endif
 
-       rc = add_memory_resource(nid, resource);
+       rc = add_memory_resource(nid, resource, false);
        if (rc) {
                pr_warn("Cannot add additional memory (%i)\n", rc);
                goto err;
diff --git a/include/linux/memory_hotplug.h b/include/linux/memory_hotplug.h
index 2ea574f..367e7d2 100644
--- a/include/linux/memory_hotplug.h
+++ b/include/linux/memory_hotplug.h
@@ -99,6 +99,10 @@ extern void __online_page_free(struct page *page);
 
 extern int try_online_node(int nid);
 
+#ifdef CONFIG_MEMORY_HOTPLUG_AUTOONLINE
+extern bool memhp_autoonline;
+#endif
+
 #ifdef CONFIG_MEMORY_HOTREMOVE
 extern bool is_pageblock_removable_nolock(struct page *page);
 extern int arch_remove_memory(u64 start, u64 size);
@@ -267,7 +271,7 @@ static inline void remove_memory(int nid, u64 start, u64 size) {}
 extern int walk_memory_range(unsigned long start_pfn, unsigned long end_pfn,
                void *arg, int (*func)(struct memory_block *, void *));
 extern int add_memory(int nid, u64 start, u64 size);
-extern int add_memory_resource(int nid, struct resource *resource);
+extern int add_memory_resource(int nid, struct resource *resource, bool online);
 extern int zone_for_memory(int nid, u64 start, u64 size, int zone_default,
                bool for_device);
 extern int arch_add_memory(int nid, u64 start, u64 size, bool for_device);
diff --git a/mm/Kconfig b/mm/Kconfig
index 97a4e06..dd1b8ea 100644
--- a/mm/Kconfig
+++ b/mm/Kconfig
@@ -200,6 +200,15 @@ config MEMORY_HOTREMOVE
        depends on MEMORY_HOTPLUG && ARCH_ENABLE_MEMORY_HOTREMOVE
        depends on MIGRATION
 
+config MEMORY_HOTPLUG_AUTOONLINE
+       bool "Automatically online hot-added memory"
+       depends on MEMORY_HOTPLUG_SPARSE
+       help
+         When memory is hot-added, it is not in a ready-to-use state; a special
+         userspace action is required to online the newly added blocks. With
+         this option enabled, the kernel will try to online all newly added
+         memory automatically.
+
 # Heavily threaded applications may benefit from splitting the mm-wide
 # page_table_lock, so that faults on different parts of the user address
 # space can be handled with less contention: split it at this NR_CPUS.
diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c
index 67d488a..32a7b7c 100644
--- a/mm/memory_hotplug.c
+++ b/mm/memory_hotplug.c
@@ -76,6 +76,18 @@ static struct {
 #define memhp_lock_acquire()      lock_map_acquire(&mem_hotplug.dep_map)
 #define memhp_lock_release()      lock_map_release(&mem_hotplug.dep_map)
 
+#ifdef CONFIG_MEMORY_HOTPLUG_AUTOONLINE
+bool memhp_autoonline = true;
+EXPORT_SYMBOL_GPL(memhp_autoonline);
+
+static int __init setup_memhp_autoonline(char *str)
+{
+       memhp_autoonline = false;
+       return 1;       /* __setup handlers return 1 when the option is handled */
+}
+__setup("nomemhp_autoonline", setup_memhp_autoonline);
+#endif
+
 void get_online_mems(void)
 {
        might_sleep();
@@ -1232,7 +1244,7 @@ int zone_for_memory(int nid, u64 start, u64 size, int zone_default,
 }
 
 /* we are OK calling __meminit stuff here - we have CONFIG_MEMORY_HOTPLUG */
-int __ref add_memory_resource(int nid, struct resource *res)
+int __ref add_memory_resource(int nid, struct resource *res, bool online)
 {
        u64 start, size;
        pg_data_t *pgdat = NULL;
@@ -1292,6 +1304,11 @@ int __ref add_memory_resource(int nid, struct resource *res)
        /* create new memmap entry */
        firmware_map_add_hotplug(start, start + size, "System RAM");
 
+       /* online pages if requested */
+       if (online)
+               online_pages(start >> PAGE_SHIFT, size >> PAGE_SHIFT,
+                            MMOP_ONLINE_KEEP);
+
        goto out;
 
 error:
@@ -1315,7 +1332,11 @@ int __ref add_memory(int nid, u64 start, u64 size)
        if (!res)
                return -EEXIST;
 
-       ret = add_memory_resource(nid, res);
+#ifdef CONFIG_MEMORY_HOTPLUG_AUTOONLINE
+       ret = add_memory_resource(nid, res, memhp_autoonline);
+#else
+       ret = add_memory_resource(nid, res, false);
+#endif
        if (ret < 0)
                release_memory_resource(res);
        return ret;
-- 
2.4.3

