On Tue, 15 Apr 2014 14:37:01 +0800 Hu Tao <hu...@cn.fujitsu.com> wrote:
> On Mon, Apr 14, 2014 at 06:44:42PM +0200, Igor Mammedov wrote:
> > On Mon, 14 Apr 2014 15:25:01 +0800
> > Hu Tao <hu...@cn.fujitsu.com> wrote:
> >
> > > On Fri, Apr 04, 2014 at 03:36:58PM +0200, Igor Mammedov wrote:
> > > > Needed for Windows to use hotplugged memory device, otherwise
> > > > it complains that server is not configured for memory hotplug.
> > > > Tests shows that aftewards it uses dynamically provided
> > > > proximity value from _PXM() method if available.
> > > >
> > > > Signed-off-by: Igor Mammedov <imamm...@redhat.com>
> > > > ---
> > > >  hw/i386/acpi-build.c | 14 ++++++++++++++
> > > >  1 file changed, 14 insertions(+)
> > > >
> > > > diff --git a/hw/i386/acpi-build.c b/hw/i386/acpi-build.c
> > > > index ef89e99..012b100 100644
> > > > --- a/hw/i386/acpi-build.c
> > > > +++ b/hw/i386/acpi-build.c
> > > > @@ -1197,6 +1197,8 @@ build_srat(GArray *table_data, GArray *linker,
> > > >      uint64_t curnode;
> > > >      int srat_start, numa_start, slots;
> > > >      uint64_t mem_len, mem_base, next_base;
> > > > +    PCMachineState *pcms = PC_MACHINE(qdev_get_machine());
> > > > +    ram_addr_t hotplug_as_size = memory_region_size(&pcms->hotplug_memory);
> > > >
> > > >      srat_start = table_data->len;
> > > >
> > > > @@ -1261,6 +1263,18 @@ build_srat(GArray *table_data, GArray *linker,
> > > >          acpi_build_srat_memory(numamem, 0, 0, 0, MEM_AFFINITY_NOFLAGS);
> > > >      }
> > > >
> > > > +    /*
> > > > +     * Fake entry required by Windows to enable memory hotplug in OS.
> > > > +     * Individual DIMM devices override proximity set here via _PXM method,
> > > > +     * which returns associated with it NUMA node id.
> > > > +     */
> > > > +    if (hotplug_as_size) {
> > > > +        numamem = acpi_data_push(table_data, sizeof *numamem);
> > > > +        acpi_build_srat_memory(numamem, pcms->hotplug_memory_base,
> > > > +                               hotplug_as_size, 0, MEM_AFFINITY_HOTPLUGGABLE |
> > > > +                               MEM_AFFINITY_ENABLED);
> > > > +    }
> > > > +
> > >
> > > Hi Igor,
> > >
> > > With the faked entry, memory unplug doesn't work. Entries should be set
> > > up for each node with correct flags(enable, hotpluggable) to make memory
> > > unplug work.
> > Could you be more specific, what and how doesn't work and why there is
> > need for SRAT entries per DIMM?
> > I've briefly tested with your unplug patches and linux seemed be ok with
> > unplug, i.e. device node was removed from /sys after receiving remove
> > notification.
>
>
> Following are fail cases:
>

I did some testing using the upstream kernel with hot-remove enabled.
I tested only the "this patch" case.

> -----------------------------------------------------------------------+----------------------------+------------
> guest commands                                                         | this patch                 | hacked SRAT
> -----------------------------------------------------------------------+----------------------------+------------
> echo 'online' > /sys/devices/system/memory/memory32/state && \         |                            |
> echo 'offline' > /sys/devices/system/memory/memory32/state             | fail                       | success

Works for me, but offlining is allowed to fail, since page migration may
fail if the memory section, or part of it, is not movable.

> -----------------------------------------------------------------------+----------------------------+------------
> echo 'online' > /sys/devices/system/memory/memory32/state && \         |                            |
> echo 1 > /sys/devices/LNXSYSTM\:00/device\:00/PNP0C80\:00/eject        | fail                       | success

The same as #1.

> -----------------------------------------------------------------------+----------------------------+------------
> echo 'online_movable' > /sys/devices/system/memory/memory32/state      | fail[first memory block]   | fail

This is specific to the Linux implementation; it should be fixed in the
guest and has nothing to do with the QEMU side.
PS: all hot-added memory sections can be onlined with 'online_movable'
if that is done in reverse order.

> -----------------------------------------------------------------------+----------------------------+------------
> echo 'online_movable' > /sys/devices/system/memory/memory35/state && \ |                            |
> echo 'offline' > /sys/devices/system/memory/memory35/state             | success[last memory block] | success
> -----------------------------------------------------------------------+----------------------------+------------
> echo 'online_movable' > /sys/devices/system/memory/memory32/state && \ |                            |
> echo 1 > /sys/devices/LNXSYSTM\:00/device\:00/PNP0C80\:00/eject        | success[last memory block] | success
> -----------------------------------------------------------------------+----------------------------+------------

Offline/eject of a movable memory section is guaranteed to succeed,
hence no issue.

Reading the upstream kernel code, it honors the PNP0C80._PXM value, which
overrides anything that was provided in SRAT. So I don't see why a hacked
SRAT would make any difference. Could you verify with the latest upstream
kernel?

PS: do not forget to check the "removable" attribute before marking a case
as failed.
One time, I've seen a guest panic on a "successful" eject of a ZONE_NORMAL
memory section since the guest was still using it (so there are still
hot-remove bugs in the kernel), and "removable" doesn't guarantee anything
for a ZONE_NORMAL memory section.

>
> Hacke SRAT memory entry:
>
>   PXM: 0
>   range: 4G ~ 4G + 512M
>   flags: Enabled Hot-Pluggable
>
>   PXM: 1
>   range: 4G + 512M ~ 5G
>   flags: Enabled Hot-Pluggable
>
> So I think we should add maxmem to -numa and build SRAT accordingly.
> But there is something I'm not sure with. I added dimm in node 1, but
> it's memory range fell in node 0. Users always can cause the mismatch
> with dimm,start,node.
>
>
> This is the relevent part in command line:
>
> qemu command line: -m 512M,slots=4,maxmem=2G \
>                    -object memory-ram,id=foo,size=512M \
>                    -numa node,id=n0,mem=256M -numa node,id=n1,mem=256M
>
> (qemu monitor) device_add dimm,id=d0,memdev=foo,node=1
>
> > >
> > > Windows has not been tested yet. I encountered a problem that there is
> > > no SRAT in Windows so even memory hotplug doesn't work. (but there is
> > > in Linux with the same configuration).
> > For Windows to work one needs to add "-numa node" CLI option so that
> > SRAT would be exposed to guest.
>
> Thanks. I need to double-check.
>
> > Paolo suggested to enable -numa node by default, I guess we can do it
> > once NUMA re-factoring is merged.
> >
> > That said, I haven't found any information that Windows supports
> > memory hot-remove. Google tells that only hot-add is supported
> > for up to WS2008R2. I've tested WS2012R2, it doesn't work either,
> > i.e. it sees but ignores Notify request.
> >
> > >
> > > Regards,
> > > Hu Tao
> > >

--
Regards,
  Igor
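
The "removable" check from the PS above can be sketched roughly as follows.
This is only an illustration, not part of the patch or of any kernel
tooling: the function name try_offline and the MEMORY_SYSFS override are
hypothetical, and it assumes the guest kernel exposes both the "removable"
and "state" attributes for each memory block.

```shell
#!/bin/sh
# Hypothetical helper: try to offline a memory block only if the kernel
# reports it as removable, so that a refused offline of a non-removable
# block is not counted as a test failure.
# MEMORY_SYSFS is an assumption made for dry runs against a fake tree.
MEMORY_SYSFS="${MEMORY_SYSFS:-/sys/devices/system/memory}"

try_offline() {
    block="$MEMORY_SYSFS/memory$1"

    # No attribute at all: nothing can be said about this block.
    if [ ! -r "$block/removable" ]; then
        echo "memory$1: no removable attribute"
        return 2
    fi

    # removable=0 means offlining may legitimately fail; skip the case.
    if [ "$(cat "$block/removable")" != "1" ]; then
        echo "memory$1: not removable, skipping"
        return 2
    fi

    # The write itself fails when the kernel refuses to offline the block.
    if echo offline > "$block/state" 2>/dev/null; then
        echo "memory$1: offline ok"
    else
        echo "memory$1: offline failed"
        return 1
    fi
}
```

For example, "try_offline 32" inside the guest; only a return status of 1
would then count as a real failure. As noted above, even removable=1
guarantees nothing for a ZONE_NORMAL section.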