Sourabh Jain <sourabhj...@linux.ibm.com> writes:

> Hello Ritesh,
>
> On 04/03/25 10:27, Ritesh Harjani (IBM) wrote:
>> Sourabh Jain <sourabhj...@linux.ibm.com> writes:
>>
>>> Hello Ritesh,
>>>
>>> Thanks for the review.
>>>
>>> On 02/03/25 12:05, Ritesh Harjani (IBM) wrote:
>>>> Sourabh Jain <sourabhj...@linux.ibm.com> writes:
>>>>
>>>>> The fadump kernel boots with limited memory solely to collect the
>>>>> kernel core dump. Having gigantic hugepages in the fadump kernel
>>>>> is of no use.
>>>> Sure, got it.
>>>>
>>>>> Many times, the fadump kernel encounters OOM (Out of Memory)
>>>>> issues if gigantic hugepages are allocated.
>>>>>
>>>>> To address this, disable gigantic hugepages if fadump is active by
>>>>> returning early from arch_hugetlb_valid_size() using
>>>>> hugepages_supported(). When fadump is active, the global variable
>>>>> hugetlb_disabled is set to true, which is later used by the
>>>>> PowerPC-specific hugepages_supported() function to determine
>>>>> hugepage support.
>>>>>
>>>>> Returning early from arch_hugetlb_valid_size() not only disables
>>>>> gigantic hugepages but also avoids unnecessary hstate
>>>>> initialization for every hugepage size supported by the platform.
>>>>>
>>>>> Kernel logs related to hugepages with this patch included:
>>>>> kernel argument passed: hugepagesz=1G hugepages=1
>>>>>
>>>>> First kernel: gigantic hugepage got allocated
>>>>> =============================================
>>>>>
>>>>> dmesg | grep -i "hugetlb"
>>>>> -------------------------
>>>>> HugeTLB: registered 1.00 GiB page size, pre-allocated 1 pages
>>>>> HugeTLB: 0 KiB vmemmap can be freed for a 1.00 GiB page
>>>>> HugeTLB: registered 2.00 MiB page size, pre-allocated 0 pages
>>>>> HugeTLB: 0 KiB vmemmap can be freed for a 2.00 MiB page
>>>>>
>>>>> $ cat /proc/meminfo | grep -i "hugetlb"
>>>>> ---------------------------------------
>>>>> Hugetlb: 1048576 kB
>>>> Was this tested with patch [1] in your local tree?
>>>>
>>>> [1]: https://web.git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux.git/commit/?id=d629d7a8efc33
>>>>
>>>> IIUC, this patch [1] disables the boot-time allocation of
>>>> hugepages. Isn't it also disabling the boot-time allocation of
>>>> gigantic huge pages requested via cmdline params like
>>>> hugepagesz=1G and hugepages=2?
>>> Yes, I had the patch [1] in my tree.
>>>
>>> My understanding is that gigantic pages are allocated before normal
>>> huge pages.
>>>
>>> In hugepages_setup() in hugetlb.c, we have:
>>>
>>>         if (hugetlb_max_hstate && hstate_is_gigantic(parsed_hstate))
>>>                 hugetlb_hstate_alloc_pages(parsed_hstate);
>>>
>>> I believe the above code allocates memory for gigantic pages, and
>>> hugetlb_init() is called later because it is a subsys_initcall.
>>>
>>> So, by the time the kernel reaches hugetlb_init(), the gigantic
>>> pages are already allocated. Isn't that right?
>>>
>>> Please let me know your opinion.
>> Yes, you are right. We are allocating hugepages from memblock;
>> however, this isn't getting advertised anywhere, i.e. there is no way
>> one can know from any user interface whether hugepages were allocated
>> or not. So for the fadump kernel, when the hugepagesz= and hugepages=
>> params are passed, it will allocate gigantic pages but won't
>> advertise this in meminfo or anywhere else. This was adding to the
>> confusion when I tested this (which wasn't clear from the commit msg
>> either).
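(To make the ordering discussed above concrete: the relevant pieces of
mm/hugetlb.c look roughly like this. This is paraphrased and heavily
abridged, so the exact code will differ.)

        /* Runs early, from start_kernel() -> parse_args(): */
        static int __init hugepages_setup(char *s)
        {
                /* ... cmdline parsing ... */

                /*
                 * Gigantic pages have to come from memblock, so they
                 * are allocated right here, at cmdline-parsing time.
                 */
                if (hugetlb_max_hstate && hstate_is_gigantic(parsed_hstate))
                        hugetlb_hstate_alloc_pages(parsed_hstate);

                return 1;
        }
        __setup("hugepages=", hugepages_setup);

        /* Runs much later, as a subsys_initcall: */
        static int __init hugetlb_init(void)
        {
                /*
                 * With hugetlb_disabled set by fadump, we bail out
                 * here, so the gigantic pages grabbed above are never
                 * initialized or advertised in /proc/meminfo.
                 */
                if (!hugepages_supported())
                        return 0;

                /* ... hstate init, sysfs and /proc reporting ... */
                return 0;
        }
        subsys_initcall(hugetlb_init);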
>>
>> And I guess this is happening in the fadump kernel because of our
>> patch [1], which added a check for whether hugetlb_disabled is true
>> in hugepages_supported(). Due to this, hugetlb_init() now skips the
>> rest of the initialization for those gigantic pages which were
>> allocated from the cmdline options in hugepages_setup().
>>
>> [1]: https://lore.kernel.org/linuxppc-dev/20241202054310.928610-1-sourabhj...@linux.ibm.com/
>>
>> Now, as we know from below, fadump can set hugetlb_disabled in
>> early_setup(), i.e. fadump can mark hugetlb_disabled as true in:
>>
>>         early_setup() -> early_init_devtree() -> fadump_reserve_mem()
>>
>> whereas hugepages_setup() and hugepagesz_setup() get called late in:
>>
>>         start_kernel() -> parse_args()
>>
>> And we already check for hugepages_supported() in all necessary calls
>> in mm/hugetlb.c. So IMO, this check should go in mm/hugetlb.c, in
>> hugepagesz_setup() and hugepages_setup(), because otherwise every
>> arch will end up duplicating this by adding a hugepages_supported()
>> check in its implementation of arch_hugetlb_valid_size().
>>
>> e.g. references of hugepages_supported() checks in mm/hugetlb.c:
>>
>> mm/hugetlb.c          hugetlb_show_meminfo_node      4959  if (!hugepages_supported())
>> mm/hugetlb.c          hugetlb_report_node_meminfo    4943  if (!hugepages_supported())
>> mm/hugetlb.c          hugetlb_report_meminfo         4914  if (!hugepages_supported())
>> mm/hugetlb.c          hugetlb_overcommit_handler     4848  if (!hugepages_supported())
>> mm/hugetlb.c          hugetlb_sysctl_handler_common  4809  if (!hugepages_supported())
>> mm/hugetlb.c          hugetlb_init                   4461  if (!hugepages_supported()) {
>> mm/hugetlb.c          dissolve_free_hugetlb_folios   2211  if (!hugepages_supported())
>> fs/hugetlbfs/inode.c  init_hugetlbfs_fs              1604  if (!hugepages_supported()) {
>>
>> Let me also see the history on why this wasn't done earlier though...
>>
>> ... Oh, actually there is more history to this. See [2]. We already
>> had a hugepages_supported() check in hugepages_setup() and other
>> places earlier, which was removed to fix some other problem on ppc ;)
>>
>> [2]: https://web.git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=c2833a5bf75b3657c4dd20b3709c8c702754cb1f
>>
>> Hence I believe this needs a wider cleanup than just fixing it for
>> our arch. I see there is a patch series already fixing these code
>> paths, which is also cleaning up the path of gigantic hugepage
>> allocation in hugepages_setup(). I think it is in the mm-unstable
>> branch too. Can we please review & test that to make sure that the
>> fadump usecase of disabling hugepages & gigantic pages is covered
>> properly?
>>
>> [3]: https://lore.kernel.org/all/20250228182928.2645936-1-f...@google.com/
>
> I evaluated the patch series [3] for the fadump issue, and here are my
> observations:
>
> Currently, the patch series [3] does not fix the issue I am trying to
> address with this patch.
>
> With patch series [3] applied, I see the following logs when booting
> the fadump kernel with hugepagesz=1G hugepages=1:
>
> With just patch series [3]:
> ---------------------------
>
> kdump:/# dmesg | grep -i HugeTLB
> [    0.000000] HugeTLB: allocating 10 of page size 1.00 GiB failed.  Only allocated 9 hugepages.
> [    0.405964] HugeTLB support is disabled!
> [    0.409162] HugeTLB: huge pages not supported, ignoring associated command-line parameters
> [    0.437740] hugetlbfs: disabling because there are no supported hugepage sizes
>
> One good thing is that the kernel now at least reports the gigantic
> pages allocated, which was not the case before. I think patch series
> [3] introduced that improvement.
>
> Now, on top of patch series [3], I applied this fix, and the kernel
> prints the following logs:
>
> Patch series [3] + this fix:
> ----------------------------
>
> [    0.000000] HugeTLB: unsupported hugepagesz=1G
> [    0.000000] HugeTLB: hugepages=10 does not follow a valid hugepagesz, ignoring
> [    0.366158] HugeTLB support is disabled!
> [    0.398004] hugetlbfs: disabling because there are no supported hugepage sizes
>
> With these logs, one can clearly identify what is happening.
>
> What are your thoughts on this fix now?
>
> Do you still think handling this in generic code is better?
I believe so, yes (unless we have a valid reason for not doing that).
hugepages_supported() is an arch-specific call. If you look at the
prints above, what we are essentially saying is that this is not a
valid hugepagesz. But that's not really the case, right? What it
should reflect instead is simply that hugepage support is disabled.

That being said, I will have to go and look into that series to
suggest where in that path it should use the hugepages_supported()
arch call to check whether hugepages are supported before initializing
anything. And hopefully, as you suggested, since our cmdline parsing
ordering problem is already solved, it should not be a problem to use
hugepages_supported() during the cmdline parsing phase. But let me
check that series and get back. (I have put a rough sketch of the kind
of check I mean at the end of this mail.)

> Given that I was already advised to handle things in arch code. [4]
>
> Or should we keep it this way?
> One advantage of handling things in arch_hugetlb_valid_size() is that
> it helps avoid populating hstates, since it is not required anyway. I
> am not claiming that it is not possible in generic code.

IMO, even adding the hugepages_supported() check at the right place
should avoid populating hstates too. But let's confirm that.

-ritesh

> Thoughts?
>
> Thanks,
> Sourabh Jain
>
> [3]: https://lore.kernel.org/all/20250228182928.2645936-1-f...@google.com/
> [4]: https://lore.kernel.org/all/20250122150613.28a92438@thinkpad-T15/
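To make the generic option concrete, below is a rough, untested sketch
of the kind of check I mean in mm/hugetlb.c's cmdline parser. The
pr_warn text is made up, and the actual placement may well turn out
different once I go through series [3]:

        static int __init hugepagesz_setup(char *s)
        {
                unsigned long size;

                /*
                 * fadump sets hugetlb_disabled in fadump_reserve_mem(),
                 * long before parse_args() runs, so this check is
                 * reliable here and no arch needs to duplicate it in
                 * its arch_hugetlb_valid_size().
                 */
                if (!hugepages_supported()) {
                        pr_warn("HugeTLB: support is disabled, ignoring hugepagesz=%s\n", s);
                        return 1;
                }

                size = (unsigned long)memparse(s, NULL);

                if (!arch_hugetlb_valid_size(size)) {
                        pr_err("HugeTLB: unsupported hugepagesz=%s\n", s);
                        return 1;
                }

                /* ... existing hstate setup continues unchanged ... */
                return 1;
        }
        __setup("hugepagesz=", hugepagesz_setup);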