Re: [RFC v9 PATCH 00/21] memory-hotplug: hot-remove physical memory

2012-09-29 Thread Ni zhan Chen

On 09/05/2012 05:25 PM, we...@cn.fujitsu.com wrote:

From: Wen Congyang 

This patch series aims to support physical memory hot-remove.

The patches can free/remove the following things:

   - acpi_memory_info  : [RFC PATCH 4/19]
   - /sys/firmware/memmap/X/{end, start, type} : [RFC PATCH 8/19]
   - iomem_resource: [RFC PATCH 9/19]
   - mem_section and related sysfs files   : [RFC PATCH 10-11, 13-16/19]
   - page table of removed memory  : [RFC PATCH 12/19]
   - node and related sysfs files  : [RFC PATCH 18-19/19]

If you find any functionality missing for physical memory hot-remove, please
let me know.

How to test this patchset?
1. apply this patchset and build the kernel. MEMORY_HOTPLUG, MEMORY_HOTREMOVE,
ACPI_HOTPLUG_MEMORY must be selected.
2. load the module acpi_memhotplug


Hi Yasuaki,

where is the acpi_memhotplug module?


3. hotplug the memory device (it depends on your hardware)
You will see the memory device under the directory /sys/bus/acpi/devices/.
Its name is PNP0C80:XX.
4. online/offline pages provided by this memory device
You can write online/offline to /sys/devices/system/memory/memoryX/state to
online/offline pages provided by this memory device
5. hotremove the memory device
You can hotremove the memory device by the hardware, or writing 1 to
/sys/bus/acpi/devices/PNP0C80:XX/eject.

Note: if the memory provided by the memory device is used by the kernel, it
can't be offlined. It is not a bug.

Known problems:
1. memory can't be offlined when CONFIG_MEMCG is selected.
For example: there is a memory device on node 1. The address range
is [1G, 1.5G). You will find 4 new directories memory8, memory9, memory10,
and memory11 under the directory /sys/devices/system/memory/.
If CONFIG_MEMCG is selected, we will allocate memory to store page cgroup
when we online pages. When we online memory8, the memory that stores its page
cgroup is not provided by this memory device. But when we online memory9, the
memory that stores its page cgroup may be provided by memory8. So we can't
offline memory8 now. We should offline the memory in the reverse order of
onlining.
When the memory device is hotremoved, we will auto offline memory provided
by this memory device. But we don't know which memory is onlined first, so
offlining memory may fail. In such a case, you should offline the memory by
hand before hotremoving the memory device.
2. hotremoving a memory device may cause a kernel panic
This bug will be fixed by Liu Jiang's patch:
https://lkml.org/lkml/2012/7/3/1
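
The workaround for known problem 1 (offlining by hand in reverse order) could
be scripted roughly as below. This is only a user-space sketch, not part of the
patchset. It assumes the blocks were onlined in ascending index order, so the
highest index is offlined first; the helper names set_block_state() and
offline_blocks_reversed() are hypothetical, and the sysfs root is passed in as
a parameter (normally /sys/devices/system/memory).

```c
#include <stdio.h>

/*
 * Write `value` (for example "offline") to <root>/memory<block>/state.
 * Returns 0 on success, -1 on failure. Hypothetical helper.
 */
static int set_block_state(const char *root, unsigned int block,
                           const char *value)
{
    char path[256];
    FILE *f;
    int n, ok;

    n = snprintf(path, sizeof(path), "%s/memory%u/state", root, block);
    if (n < 0 || (size_t)n >= sizeof(path))
        return -1;

    f = fopen(path, "w");
    if (!f)
        return -1;

    ok = fputs(value, f) >= 0;
    /* The write reaches the kernel on flush, so check fclose() too. */
    if (fclose(f) != 0)
        ok = 0;

    return ok ? 0 : -1;
}

/*
 * Offline blocks first..last, highest index first. This assumes the
 * blocks were onlined in ascending order. Stops at the first failure
 * and returns that block index, or -1 if every block was offlined.
 */
static int offline_blocks_reversed(const char *root, unsigned int first,
                                   unsigned int last)
{
    unsigned int block = last + 1;

    while (block-- > first) {
        if (set_block_state(root, block, "offline") != 0)
            return (int)block;
    }
    return -1;
}
```

For the example above, offline_blocks_reversed("/sys/devices/system/memory", 8, 11)
would try memory11, memory10, memory9 and memory8 in that order. If the blocks
were onlined in some other order, the caller has to supply that order instead.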

change log of v9:
  [RFC PATCH v9 8/21]
* add a lock to protect the list map_entries
* add an indicator to firmware_map_entry to remember whether the memory
  is allocated from bootmem
  [RFC PATCH v9 10/21]
* change the macro to inline function
  [RFC PATCH v9 19/21]
* don't offline the node if a cpu on the node is online
  [RFC PATCH v9 21/21]
* create new patch: auto offline page_cgroup when onlining memory block
  failed

change log of v8:
  [RFC PATCH v8 17/20]
* Fix problems when one node's range includes the other nodes
  [RFC PATCH v8 18/20]
* fix building error when CONFIG_MEMORY_HOTPLUG_SPARSE or CONFIG_HUGETLBFS
  is not defined.
  [RFC PATCH v8 19/20]
* don't offline node when some memory sections are not removed
  [RFC PATCH v8 20/20]
* create new patch: clear hwpoisoned flag when onlining pages

change log of v7:
  [RFC PATCH v7 4/19]
* do not continue if acpi_memory_device_remove_memory() fails.
  [RFC PATCH v7 15/19]
* handle usemap in register_page_bootmem_info_section() too.

change log of v6:
  [RFC PATCH v6 12/19]
* fix building error on architectures other than x86

  [RFC PATCH v6 15-16/19]
* fix building error on architectures other than x86

change log of v5:
  * merge the patchset to clear page table and the patchset to hot remove
memory (from ishimatsu) into one big patchset.

  [RFC PATCH v5 1/19]
* rename remove_memory() to offline_memory()/offline_pages()

  [RFC PATCH v5 2/19]
* new patch: implement offline_memory(). This function offlines pages,
  updates the memory block's state, and notifies userspace that the memory
  block's state has changed.

  [RFC PATCH v5 4/19]
* offline and remove memory in acpi_memory_disable_device() too.

  [RFC PATCH v5 17/19]
* new patch: add a new function __remove_zone() to revert the things done
  in the function __add_zone().

  [RFC PATCH v5 18/19]
* flush work before resetting the node device.

change log of v4:
  * remove "memory-hotplug : unify argument of firmware_map_add_early/hotplug"
from the patch series, since the patch is a bugfix. It is being discussed
on another thread. But for testing the patch series, the patch is needed.
So I added the patch as [PATCH 0/13].

  [RFC PATCH v4 2/13]
* check whether memory is online in remove_memory()
*

[PATCH] powerpc/mpc85xx: Change spin table to cached memory

2012-09-29 Thread York Sun
ePAPR v1.1 requires the spin table to be in cached memory. So we need
to change the call argument of ioremap to enable cache and coherence.
We also flush the cache after writing to spin table to keep it compatible
with previous cache-inhibit spin table. Flushing before and after
accessing spin table is recommended by ePAPR.

Signed-off-by: York Sun 
Acked-by: Timur Tabi 
---
This patch applies to 
git://git.kernel.org/pub/scm/linux/kernel/git/galak/powerpc.git next branch.

 arch/powerpc/platforms/85xx/smp.c |   49 +++--
 1 file changed, 36 insertions(+), 13 deletions(-)

diff --git a/arch/powerpc/platforms/85xx/smp.c b/arch/powerpc/platforms/85xx/smp.c
index 6fcfa12..148c2f2 100644
--- a/arch/powerpc/platforms/85xx/smp.c
+++ b/arch/powerpc/platforms/85xx/smp.c
@@ -128,6 +128,19 @@ static void __cpuinit smp_85xx_mach_cpu_die(void)
 }
 #endif
 
+static inline void flush_spin_table(void *spin_table)
+{
+   flush_dcache_range((ulong)spin_table,
+   (ulong)spin_table + sizeof(struct epapr_spin_table));
+}
+
+static inline u32 read_spin_table_addr_l(void *spin_table)
+{
+   flush_dcache_range((ulong)spin_table,
+   (ulong)spin_table + sizeof(struct epapr_spin_table));
+   return in_be32(&((struct epapr_spin_table *)spin_table)->addr_l);
+}
+
 static int __cpuinit smp_85xx_kick_cpu(int nr)
 {
unsigned long flags;
@@ -161,8 +174,8 @@ static int __cpuinit smp_85xx_kick_cpu(int nr)
 
/* Map the spin table */
if (ioremappable)
-   spin_table = ioremap(*cpu_rel_addr,
-   sizeof(struct epapr_spin_table));
+   spin_table = ioremap_prot(*cpu_rel_addr,
+   sizeof(struct epapr_spin_table), _PAGE_COHERENT);
else
spin_table = phys_to_virt(*cpu_rel_addr);
 
@@ -173,7 +186,16 @@ static int __cpuinit smp_85xx_kick_cpu(int nr)
generic_set_cpu_up(nr);
 
if (system_state == SYSTEM_RUNNING) {
+   /*
+* To keep it compatible with old boot program which uses
+* cache-inhibit spin table, we need to flush the cache
+* before accessing spin table to invalidate any staled data.
+* We also need to flush the cache after writing to spin
+* table to push data out.
+*/
+   flush_spin_table(spin_table);
out_be32(&spin_table->addr_l, 0);
+   flush_spin_table(spin_table);
 
/*
 * We don't set the BPTR register here since it already points
@@ -181,9 +203,14 @@ static int __cpuinit smp_85xx_kick_cpu(int nr)
 */
mpic_reset_core(hw_cpu);
 
-   /* wait until core is ready... */
-   if (!spin_event_timeout(in_be32(&spin_table->addr_l) == 1,
-   1, 100)) {
+   /*
+* wait until core is ready...
+* We need to invalidate the stale data, in case the boot
+* loader uses a cache-inhibited spin table.
+*/
+   if (!spin_event_timeout(
+   read_spin_table_addr_l(spin_table) == 1,
+   1, 100)) {
pr_err("%s: timeout waiting for core %d to reset\n",
__func__, hw_cpu);
ret = -ENOENT;
@@ -194,12 +221,10 @@ static int __cpuinit smp_85xx_kick_cpu(int nr)
__secondary_hold_acknowledge = -1;
}
 #endif
+   flush_spin_table(spin_table);
out_be32(&spin_table->pir, hw_cpu);
out_be32(&spin_table->addr_l, __pa(__early_start));
-
-   if (!ioremappable)
-   flush_dcache_range((ulong)spin_table,
-   (ulong)spin_table + sizeof(struct epapr_spin_table));
+   flush_spin_table(spin_table);
 
/* Wait a bit for the CPU to ack. */
if (!spin_event_timeout(__secondary_hold_acknowledge == hw_cpu,
@@ -213,13 +238,11 @@ out:
 #else
smp_generic_kick_cpu(nr);
 
+   flush_spin_table(spin_table);
out_be32(&spin_table->pir, hw_cpu);
out_be64((u64 *)(&spin_table->addr_h),
  __pa((u64)*((unsigned long long *)generic_secondary_smp_init)));
-
-   if (!ioremappable)
-   flush_dcache_range((ulong)spin_table,
-   (ulong)spin_table + sizeof(struct epapr_spin_table));
+   flush_spin_table(spin_table);
 #endif
 
local_irq_restore(flags);
-- 
1.7.9.5
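
The access pattern the patch applies around spin table writes (flush, write,
flush) can be modelled in user space as follows. This is a rough illustration
only, not kernel code: struct spin_table_model is a reduced stand-in for
struct epapr_spin_table that keeps just the two fields written here, and
release_core(), flush_fn and counting_flush() are hypothetical names. In the
kernel the flush callback would correspond to flush_dcache_range().

```c
#include <stddef.h>
#include <stdint.h>

/* Reduced model of the spin table: only the fields written below. */
struct spin_table_model {
    volatile uint32_t addr_l;
    volatile uint32_t pir;
};

/* Stand-in for a cache flush over [start, start + len). */
typedef void (*flush_fn)(void *start, size_t len);

/*
 * Flush before the access so no stale cached copy is used, write the
 * release values, then flush again so the writes reach memory even if
 * the other side maps the table cache-inhibited.
 */
static void release_core(struct spin_table_model *t, uint32_t hw_cpu,
                         uint32_t entry, flush_fn flush)
{
    flush(t, sizeof(*t));
    t->pir = hw_cpu;
    t->addr_l = entry;
    flush(t, sizeof(*t));
}

/* Demo flush that only counts calls; a real one would act on the cache. */
static unsigned int flush_calls;

static void counting_flush(void *start, size_t len)
{
    (void)start;
    (void)len;
    flush_calls++;
}
```

Calling release_core(&table, hw_cpu, entry, counting_flush) invokes the flush
callback once before and once after the two writes, which mirrors the paired
flush_spin_table() calls in the diff.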

