* Balbir Singh <balb...@nvidia.com> wrote:
> > Yes, turning off CONFIG_HSA_AMD_SVM fixes the issue, the strange memory
> > resource
> >
> >   afe00000000-affffffffff : 0000:03:00.0
> >
> > is gone.
> >
> > If one would add a max_phys_addr argument to devm_request_free_mem_region()
> > (which returns the resource address in kgd2kfd_init_zone_device()) one
> > could keep the memory below the 44-bit limit with CONFIG_HSA_AMD_SVM
> > enabled.
>
> Thanks for reporting the result, does this patch work?
>
> diff --git a/arch/x86/mm/init_64.c b/arch/x86/mm/init_64.c
> index 01ea7c6df303..14f42f8012ab 100644
> --- a/arch/x86/mm/init_64.c
> +++ b/arch/x86/mm/init_64.c
> @@ -968,8 +968,9 @@ int add_pages(int nid, unsigned long start_pfn, unsigned long nr_pages,
>  	WARN_ON_ONCE(ret);
>  
>  	/* update max_pfn, max_low_pfn and high_memory */
> -	update_end_of_memory_vars(start_pfn << PAGE_SHIFT,
> -				  nr_pages << PAGE_SHIFT);
> +	if (!params->pgmap)
> +		update_end_of_memory_vars(start_pfn << PAGE_SHIFT,
> +					  nr_pages << PAGE_SHIFT);
>  
>  	return ret;
>  }
>
> It basically prevents max_pfn from moving when the inserted memory is
> ZONE_DEVICE.
>
> FYI: It's a test patch and will still create issues if the amount of
> physically present memory is very high, because the driver needs to
> enable use_dma32 in that case.

So this patch does the trick for Bert, and I'm wondering what the best
fix here would be overall, because it's a tricky situation.

Am I correct in assuming that with enough physical memory this bug
would trigger, with and without nokaslr?

I *think* the best approach going forward would be to add the above
quirk to the x86 memory setup code, but also issue a kernel warning at
that point with all the relevant information included, so that the
driver's use_dma32 bug can at least be indicated?

That might also trigger on other systems: if this scenario can arise
this easily, I doubt it's the only affected driver ...

Thanks,

	Ingo
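
PS: as a rough illustration, the quirk-plus-warning could look something
like the completely untested sketch below; the pr_warn_once() wording and
placement are assumptions, not something actually proposed in the thread:

int add_pages(int nid, unsigned long start_pfn, unsigned long nr_pages,
	      struct mhp_params *params)
{
	int ret;

	ret = __add_pages(nid, start_pfn, nr_pages, params);
	WARN_ON_ONCE(ret);

	if (params->pgmap) {
		/*
		 * ZONE_DEVICE memory (e.g. GPU VRAM placed by
		 * devm_request_free_mem_region()) can sit far above real
		 * RAM, so don't let it drag max_pfn up - but leave a
		 * trace in the log so a driver that sizes its DMA mask
		 * from max_pfn (the use_dma32 case) can be spotted.
		 */
		pr_warn_once("add_pages: not raising max_pfn for ZONE_DEVICE range [mem %#018lx-%#018lx]\n",
			     start_pfn << PAGE_SHIFT,
			     ((start_pfn + nr_pages) << PAGE_SHIFT) - 1);
	} else {
		/* update max_pfn, max_low_pfn and high_memory */
		update_end_of_memory_vars(start_pfn << PAGE_SHIFT,
					  nr_pages << PAGE_SHIFT);
	}

	return ret;
}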
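
PS #2: the max_phys_addr idea from the quoted mail could also be
approximated caller-side, without changing the core API. A hypothetical,
untested helper - the name, the phys_limit parameter and the -ENOSPC
return are all illustrative, not an existing interface:

/*
 * Request a free mem region for device memory, but reject placements the
 * device cannot address (a real version would also release the region on
 * the error path before returning).
 */
static int request_vram_region_below(struct device *dev, unsigned long size,
				     u64 phys_limit, struct resource **out)
{
	struct resource *res;

	res = devm_request_free_mem_region(dev, &iomem_resource, size);
	if (IS_ERR(res))
		return PTR_ERR(res);

	if (res->end >= phys_limit) {
		dev_warn(dev, "free mem region %pR exceeds %#llx DMA limit\n",
			 res, phys_limit);
		return -ENOSPC;
	}

	*out = res;
	return 0;
}

kgd2kfd_init_zone_device() could then pass 1ULL << 44 as phys_limit;
actually steering the allocation below the limit, rather than just
detecting the failure, would still need support in the core resource
code, which is presumably why the quoted mail suggests a new argument.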