Hello,

On Wed, Mar 26, 2025 at 09:46:11AM -0500, Gaurav Batra wrote:
> Hello Michal,
>
> In the patch to fix the pmemory bug, I made some changes to the code that
> determines Max memory an LPAR can have (excluding pmemory). This information
> is needed while creating Dynamic DMA Window (DDW). These changes are in the
> main line code path of DDW creation. This might have irritated QEMU somehow,
> no idea yet on how.

Yes, it's definitely something in the DDW code.

Booting with the disable_ddw=1 kernel parameter avoids the qemu crash.

The kernels in
https://download.opensuse.org/repositories/Kernel:/SLE15-SP7/pool/ppc64le/
have the patch applied.

Booting one of these kernels inside a qemu VM with a PCI device (such as
the USB hub) and then rebooting the VM crashes qemu.

Thanks

Michal

>
> Thanks,
>
> Gaurav
>
> On 3/19/25 12:29 PM, Michal Suchánek wrote:
> > Hello,
> >
> > looks like this upsets some assumption qemu has about these windows.
> >
> > https://lists.nongnu.org/archive/html/qemu-devel/2025-03/msg05137.html
> >
> > When Linux kernel that has this patch applied is running inside a qemu
> > VM with a PCI device and the VM is rebooted qemu crashes shortly after
> > the next Linux kernel starts.
> >
> > This is quite curious since qemu does AFAIK not support pmemory at all.
> >
> > Any idea what went wrong there?
> >
> > Thanks
> >
> > Michal
> >
> > On Thu, Jan 30, 2025 at 12:38:54PM -0600, Gaurav Batra wrote:
> > > iommu_mem_notifier() is invoked when RAM is dynamically added/removed. This
> > > notifier call is responsible to add/remove TCEs from the Dynamic DMA Window
> > > (DDW) when TCEs are pre-mapped. TCEs are pre-mapped only for RAM and not
> > > for persistent memory (pmemory). For DMA buffers in pmemory, TCEs are
> > > dynamically mapped when the device driver instructs to do so.
> > >
> > > The issue is 'daxctl' command is capable of adding pmemory as "System RAM"
> > > after LPAR boot. The command to do so is -
> > >
> > > daxctl reconfigure-device --mode=system-ram dax0.0 --force
> > >
> > > This will dynamically add pmemory range to LPAR RAM eventually invoking
> > > iommu_mem_notifier(). The address range of pmemory is way beyond the Max
> > > RAM that the LPAR can have. Which means, this range is beyond the DDW
> > > created for the device, at device initialization time.
> > >
> > > As a result when TCEs are pre-mapped for the pmemory range, by
> > > iommu_mem_notifier(), PHYP HCALL returns H_PARAMETER. This failed the
> > > command, daxctl, to add pmemory as RAM.
> > >
> > > The solution is to not pre-map TCEs for pmemory.
> > >
> > > Signed-off-by: Gaurav Batra <gba...@linux.ibm.com>
> > > ---
> > >  arch/powerpc/include/asm/mmzone.h      |  1 +
> > >  arch/powerpc/mm/numa.c                 |  2 +-
> > >  arch/powerpc/platforms/pseries/iommu.c | 29 ++++++++++++++------------
> > >  3 files changed, 18 insertions(+), 14 deletions(-)
> > >
> > > diff --git a/arch/powerpc/include/asm/mmzone.h b/arch/powerpc/include/asm/mmzone.h
> > > index d99863cd6cde..049152f8d597 100644
> > > --- a/arch/powerpc/include/asm/mmzone.h
> > > +++ b/arch/powerpc/include/asm/mmzone.h
> > > @@ -29,6 +29,7 @@ extern cpumask_var_t node_to_cpumask_map[];
> > >  #ifdef CONFIG_MEMORY_HOTPLUG
> > >  extern unsigned long max_pfn;
> > >  u64 memory_hotplug_max(void);
> > > +u64 hot_add_drconf_memory_max(void);
> > >  #else
> > >  #define memory_hotplug_max() memblock_end_of_DRAM()
> > >  #endif
> > > diff --git a/arch/powerpc/mm/numa.c b/arch/powerpc/mm/numa.c
> > > index 3c1da08304d0..603a0f652ba6 100644
> > > --- a/arch/powerpc/mm/numa.c
> > > +++ b/arch/powerpc/mm/numa.c
> > > @@ -1336,7 +1336,7 @@ int hot_add_scn_to_nid(unsigned long scn_addr)
> > >  	return nid;
> > >  }
> > >
> > > -static u64 hot_add_drconf_memory_max(void)
> > > +u64 hot_add_drconf_memory_max(void)
> > >  {
> > >  	struct device_node *memory = NULL;
> > >  	struct device_node *dn = NULL;
> > > diff --git a/arch/powerpc/platforms/pseries/iommu.c b/arch/powerpc/platforms/pseries/iommu.c
> > > index 29f1a0cc59cd..abd9529a8f41 100644
> > > --- a/arch/powerpc/platforms/pseries/iommu.c
> > > +++ b/arch/powerpc/platforms/pseries/iommu.c
> > > @@ -1284,17 +1284,13 @@ static LIST_HEAD(failed_ddw_pdn_list);
> > >
> > >  static phys_addr_t ddw_memory_hotplug_max(void)
> > >  {
> > > -	resource_size_t max_addr = memory_hotplug_max();
> > > -	struct device_node *memory;
> > > +	resource_size_t max_addr;
> > >
> > > -	for_each_node_by_type(memory, "memory") {
> > > -		struct resource res;
> > > -
> > > -		if (of_address_to_resource(memory, 0, &res))
> > > -			continue;
> > > -
> > > -		max_addr = max_t(resource_size_t, max_addr, res.end + 1);
> > > -	}
> > > +#if defined(CONFIG_NUMA) && defined(CONFIG_MEMORY_HOTPLUG)
> > > +	max_addr = hot_add_drconf_memory_max();
> > > +#else
> > > +	max_addr = memblock_end_of_DRAM();
> > > +#endif
> > >
> > >  	return max_addr;
> > >  }
> > >
> > > @@ -1600,7 +1596,7 @@ static bool enable_ddw(struct pci_dev *dev, struct device_node *pdn)
> > >
> > >  	if (direct_mapping) {
> > >  		/* DDW maps the whole partition, so enable direct DMA mapping */
> > > -		ret = walk_system_ram_range(0, memblock_end_of_DRAM() >> PAGE_SHIFT,
> > > +		ret = walk_system_ram_range(0, ddw_memory_hotplug_max() >> PAGE_SHIFT,
> > >  					    win64->value, tce_setrange_multi_pSeriesLP_walk);
> > >  		if (ret) {
> > >  			dev_info(&dev->dev, "failed to map DMA window for %pOF: %d\n",
> > > @@ -2346,11 +2342,17 @@ static int iommu_mem_notifier(struct notifier_block *nb, unsigned long action,
> > >  	struct memory_notify *arg = data;
> > >  	int ret = 0;
> > >
> > > +	/* This notifier can get called when onlining persistent memory as well.
> > > +	 * TCEs are not pre-mapped for persistent memory. Persistent memory will
> > > +	 * always be above ddw_memory_hotplug_max()
> > > +	 */
> > > +
> > >  	switch (action) {
> > >  	case MEM_GOING_ONLINE:
> > >  		spin_lock(&dma_win_list_lock);
> > >  		list_for_each_entry(window, &dma_win_list, list) {
> > > -			if (window->direct) {
> > > +			if (window->direct && (arg->start_pfn << PAGE_SHIFT) <
> > > +				ddw_memory_hotplug_max()) {
> > >  				ret |= tce_setrange_multi_pSeriesLP(arg->start_pfn,
> > >  						arg->nr_pages, window->prop);
> > >  			}
> > > @@ -2362,7 +2364,8 @@ static int iommu_mem_notifier(struct notifier_block *nb, unsigned long action,
> > >  	case MEM_OFFLINE:
> > >  		spin_lock(&dma_win_list_lock);
> > >  		list_for_each_entry(window, &dma_win_list, list) {
> > > -			if (window->direct) {
> > > +			if (window->direct && (arg->start_pfn << PAGE_SHIFT) <
> > > +				ddw_memory_hotplug_max()) {
> > >  				ret |= tce_clearrange_multi_pSeriesLP(arg->start_pfn,
> > >  						arg->nr_pages, window->prop);
> > >  			}
> > >
> > > base-commit: 95ec54a420b8f445e04a7ca0ea8deb72c51fe1d3
> > > --
> > > 2.39.3 (Apple Git-146)
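
For anyone following the thread, the guard that the quoted patch adds to iommu_mem_notifier() can be modelled in user space roughly as below. This is only a sketch under stated assumptions: range_is_premapped() and its parameters are hypothetical names made up for illustration, and ddw_max merely stands in for whatever ddw_memory_hotplug_max() returns in the kernel. It does not reproduce the kernel code and says nothing about the qemu crash itself.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical user-space model of the check added to iommu_mem_notifier().
 *
 * start_pfn  - first page frame number of the range being onlined/offlined
 * page_shift - log2 of the page size (e.g. 16 for 64K pages)
 * ddw_max    - stand-in for ddw_memory_hotplug_max(): the highest RAM
 *              address the direct DMA window is assumed to cover
 */
static bool range_is_premapped(uint64_t start_pfn, unsigned int page_shift,
			       uint64_t ddw_max)
{
	/* Shifting a 64-bit value by 64 or more is undefined in C. */
	assert(page_shift < 64);

	/* Ranges starting at or above ddw_max (pmemory) are skipped. */
	return (start_pfn << page_shift) < ddw_max;
}
```

With 64K pages and a window assumed to cover 1 TiB, range_is_premapped(0x100, 16, 1ULL << 40) is true, while a range starting exactly at the limit, range_is_premapped(1ULL << 24, 16, 1ULL << 40), is false because the comparison is strict.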