Re: [RFC v9 PATCH 00/21] memory-hotplug: hot-remove physical memory
On 09/05/2012 05:25 PM, we...@cn.fujitsu.com wrote: From: Wen Congyang This patch series aims to support physical memory hot-remove. The patches can free/remove the following things: - acpi_memory_info : [RFC PATCH 4/19] - /sys/firmware/memmap/X/{end, start, type} : [RFC PATCH 8/19] - iomem_resource: [RFC PATCH 9/19] - mem_section and related sysfs files : [RFC PATCH 10-11, 13-16/19] - page table of removed memory : [RFC PATCH 12/19] - node and related sysfs files : [RFC PATCH 18-19/19] If you find lack of function for physical memory hot-remove, please let me know. How to test this patchset? 1. apply this patchset and build the kernel. MEMORY_HOTPLUG, MEMORY_HOTREMOVE, ACPI_HOTPLUG_MEMORY must be selected. 2. load the module acpi_memhotplug Hi Yasuaki, where is the acpi_memhotplug module? 3. hotplug the memory device(it depends on your hardware) You will see the memory device under the directory /sys/bus/acpi/devices/. Its name is PNP0C80:XX. 4. online/offline pages provided by this memory device You can write online/offline to /sys/devices/system/memory/memoryX/state to online/offline pages provided by this memory device 5. hotremove the memory device You can hotremove the memory device by the hardware, or writing 1 to /sys/bus/acpi/devices/PNP0C80:XX/eject. Note: if the memory provided by the memory device is used by the kernel, it can't be offlined. It is not a bug. Known problems: 1. memory can't be offlined when CONFIG_MEMCG is selected. For example: there is a memory device on node 1. The address range is [1G, 1.5G). You will find 4 new directories memory8, memory9, memory10, and memory11 under the directory /sys/devices/system/memory/. If CONFIG_MEMCG is selected, we will allocate memory to store page cgroup when we online pages. When we online memory8, the memory stored page cgroup is not provided by this memory device. But when we online memory9, the memory stored page cgroup may be provided by memory8. So we can't offline memory8 now. We should offline the memory in the reversed order. When the memory device is hotremoved, we will auto offline memory provided by this memory device. But we don't know which memory is onlined first, so offlining memory may fail. In such case, you should offline the memory by hand before hotremoving the memory device. 2. hotremoving memory device may cause kernel panicked This bug will be fixed by Liu Jiang's patch: https://lkml.org/lkml/2012/7/3/1 change log of v9: [RFC PATCH v9 8/21] * add a lock to protect the list map_entries * add an indicator to firmware_map_entry to remember whether the memory is allocated from bootmem [RFC PATCH v9 10/21] * change the macro to inline function [RFC PATCH v9 19/21] * don't offline the node if the cpu on the node is onlined [RFC PATCH v9 21/21] * create new patch: auto offline page_cgroup when onlining memory block failed change log of v8: [RFC PATCH v8 17/20] * Fix problems when one node's range include the other nodes [RFC PATCH v8 18/20] * fix building error when CONFIG_MEMORY_HOTPLUG_SPARSE or CONFIG_HUGETLBFS is not defined. [RFC PATCH v8 19/20] * don't offline node when some memory sections are not removed [RFC PATCH v8 20/20] * create new patch: clear hwpoisoned flag when onlining pages change log of v7: [RFC PATCH v7 4/19] * do not continue if acpi_memory_device_remove_memory() fails. [RFC PATCH v7 15/19] * handle usemap in register_page_bootmem_info_section() too. change log of v6: [RFC PATCH v6 12/19] * fix building error on other archtitectures than x86 [RFC PATCH v6 15-16/19] * fix building error on other archtitectures than x86 change log of v5: * merge the patchset to clear page table and the patchset to hot remove memory(from ishimatsu) to one big patchset. [RFC PATCH v5 1/19] * rename remove_memory() to offline_memory()/offline_pages() [RFC PATCH v5 2/19] * new patch: implement offline_memory(). This function offlines pages, update memory block's state, and notify the userspace that the memory block's state is changed. [RFC PATCH v5 4/19] * offline and remove memory in acpi_memory_disable_device() too. [RFC PATCH v5 17/19] * new patch: add a new function __remove_zone() to revert the things done in the function __add_zone(). [RFC PATCH v5 18/19] * flush work befor reseting node device. change log of v4: * remove "memory-hotplug : unify argument of firmware_map_add_early/hotplug" from the patch series, since the patch is a bugfix. It is being disccussed on other thread. But for testing the patch series, the patch is needed. So I added the patch as [PATCH 0/13]. [RFC PATCH v4 2/13] * check memory is online or not at remove_memory() *
[PATCH] powerpc/mpc85xx: Change spin table to cached memory
ePAPR v1.1 requires the spin table to be in cached memory. So we need to change the call argument of ioremap to enable cache and coherence. We also flush the cache after writing to spin table to keep it compatible with previous cache-inhibit spin table. Flushing before and after accessing spin table is recommended by ePAPR. Signed-off-by: York Sun Acked-by: Timur Tabi --- This patch applies to git://git.kernel.org/pub/scm/linux/kernel/git/galak/powerpc.git next branch. arch/powerpc/platforms/85xx/smp.c | 49 +++-- 1 file changed, 36 insertions(+), 13 deletions(-) diff --git a/arch/powerpc/platforms/85xx/smp.c b/arch/powerpc/platforms/85xx/smp.c index 6fcfa12..148c2f2 100644 --- a/arch/powerpc/platforms/85xx/smp.c +++ b/arch/powerpc/platforms/85xx/smp.c @@ -128,6 +128,19 @@ static void __cpuinit smp_85xx_mach_cpu_die(void) } #endif +static inline void flush_spin_table(void *spin_table) +{ + flush_dcache_range((ulong)spin_table, + (ulong)spin_table + sizeof(struct epapr_spin_table)); +} + +static inline u32 read_spin_table_addr_l(void *spin_table) +{ + flush_dcache_range((ulong)spin_table, + (ulong)spin_table + sizeof(struct epapr_spin_table)); + return in_be32(&((struct epapr_spin_table *)spin_table)->addr_l); +} + static int __cpuinit smp_85xx_kick_cpu(int nr) { unsigned long flags; @@ -161,8 +174,8 @@ static int __cpuinit smp_85xx_kick_cpu(int nr) /* Map the spin table */ if (ioremappable) - spin_table = ioremap(*cpu_rel_addr, - sizeof(struct epapr_spin_table)); + spin_table = ioremap_prot(*cpu_rel_addr, + sizeof(struct epapr_spin_table), _PAGE_COHERENT); else spin_table = phys_to_virt(*cpu_rel_addr); @@ -173,7 +186,16 @@ static int __cpuinit smp_85xx_kick_cpu(int nr) generic_set_cpu_up(nr); if (system_state == SYSTEM_RUNNING) { + /* +* To keep it compatible with old boot program which uses +* cache-inhibit spin table, we need to flush the cache +* before accessing spin table to invalidate any staled data. +* We also need to flush the cache after writing to spin +* table to push data out. +*/ + flush_spin_table(spin_table); out_be32(&spin_table->addr_l, 0); + flush_spin_table(spin_table); /* * We don't set the BPTR register here since it already points @@ -181,9 +203,14 @@ static int __cpuinit smp_85xx_kick_cpu(int nr) */ mpic_reset_core(hw_cpu); - /* wait until core is ready... */ - if (!spin_event_timeout(in_be32(&spin_table->addr_l) == 1, - 1, 100)) { + /* +* wait until core is ready... +* We need to invalidate the stale data, in case the boot +* loader uses a cache-inhibited spin table. +*/ + if (!spin_event_timeout( + read_spin_table_addr_l(spin_table) == 1, + 1, 100)) { pr_err("%s: timeout waiting for core %d to reset\n", __func__, hw_cpu); ret = -ENOENT; @@ -194,12 +221,10 @@ static int __cpuinit smp_85xx_kick_cpu(int nr) __secondary_hold_acknowledge = -1; } #endif + flush_spin_table(spin_table); out_be32(&spin_table->pir, hw_cpu); out_be32(&spin_table->addr_l, __pa(__early_start)); - - if (!ioremappable) - flush_dcache_range((ulong)spin_table, - (ulong)spin_table + sizeof(struct epapr_spin_table)); + flush_spin_table(spin_table); /* Wait a bit for the CPU to ack. */ if (!spin_event_timeout(__secondary_hold_acknowledge == hw_cpu, @@ -213,13 +238,11 @@ out: #else smp_generic_kick_cpu(nr); + flush_spin_table(spin_table); out_be32(&spin_table->pir, hw_cpu); out_be64((u64 *)(&spin_table->addr_h), __pa((u64)*((unsigned long long *)generic_secondary_smp_init))); - - if (!ioremappable) - flush_dcache_range((ulong)spin_table, - (ulong)spin_table + sizeof(struct epapr_spin_table)); + flush_spin_table(spin_table); #endif local_irq_restore(flags); -- 1.7.9.5 ___ Linuxppc-dev mailing list Linuxppc-dev@lists.ozlabs.org https://lists.ozlabs.org/listinfo/linuxppc-dev