Sourabh Jain <sourabhj...@linux.ibm.com> writes:

> Hello Ritesh,
>
>
> On 04/03/25 10:27, Ritesh Harjani (IBM) wrote:
>> Sourabh Jain <sourabhj...@linux.ibm.com> writes:
>>
>>> Hello Ritesh,
>>>
>>> Thanks for the review.
>>>
>>> On 02/03/25 12:05, Ritesh Harjani (IBM) wrote:
>>>> Sourabh Jain <sourabhj...@linux.ibm.com> writes:
>>>>
>>>>> The fadump kernel boots with limited memory solely to collect the kernel
>>>>> core dump. Having gigantic hugepages in the fadump kernel is of no use.
>>>> Sure got it.
>>>>
>>>>> Many times, the fadump kernel encounters OOM (Out of Memory) issues if
>>>>> gigantic hugepages are allocated.
>>>>>
>>>>> To address this, disable gigantic hugepages if fadump is active by
>>>>> returning early from arch_hugetlb_valid_size() using
>>>>> hugepages_supported(). When fadump is active, the global variable
>>>>> hugetlb_disabled is set to true, which is later used by the
>>>>> PowerPC-specific hugepages_supported() function to determine hugepage
>>>>> support.
>>>>>
>>>>> Returning early from arch_hugetlb_valid_size() not only disables
>>>>> gigantic hugepages but also avoids unnecessary hstate initialization for
>>>>> every hugepage size supported by the platform.
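>>>>>
>>>>> In code, the change amounts to roughly the following (a minimal
>>>>> sketch; the actual size checks in the powerpc implementation are
>>>>> more involved, and the SZ_* comparison is illustrative only):
>>>>>
>>>>> /* arch/powerpc/mm/hugetlbpage.c */
>>>>> bool __init arch_hugetlb_valid_size(unsigned long size)
>>>>> {
>>>>> 	/*
>>>>> 	 * When fadump is active, it sets hugetlb_disabled early in
>>>>> 	 * boot, so hugepages_supported() returns false and no hstate
>>>>> 	 * is registered for this size.
>>>>> 	 */
>>>>> 	if (!hugepages_supported())
>>>>> 		return false;
>>>>>
>>>>> 	/* ... existing per-platform size validation ... */
>>>>> 	return size == SZ_2M || size == SZ_1G;
>>>>> }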
>>>>>
>>>>> kernel logs related to hugepages with this patch included:
>>>>> kernel argument passed: hugepagesz=1G hugepages=1
>>>>>
>>>>> First kernel: gigantic hugepage got allocated
>>>>> ==============================================
>>>>>
>>>>> dmesg | grep -i "hugetlb"
>>>>> -------------------------
>>>>> HugeTLB: registered 1.00 GiB page size, pre-allocated 1 pages
>>>>> HugeTLB: 0 KiB vmemmap can be freed for a 1.00 GiB page
>>>>> HugeTLB: registered 2.00 MiB page size, pre-allocated 0 pages
>>>>> HugeTLB: 0 KiB vmemmap can be freed for a 2.00 MiB page
>>>>>
>>>>> $ cat /proc/meminfo | grep -i "hugetlb"
>>>>> -------------------------------------
>>>>> Hugetlb:         1048576 kB
>>>> Was this tested with patch [1] in your local tree?
>>>>
>>>> [1]: 
>>>> https://web.git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux.git/commit/?id=d629d7a8efc33
>>>>
>>>> IIUC, this patch [1] disables the boot time allocation of hugepages.
>>>> Isn't it also disabling the boot time allocation for gigantic huge pages
>>>> passed by the cmdline params like hugepagesz=1G and hugepages=2 ?
>>> Yes, I had the patch [1] in my tree.
>>>
>>> My understanding is that gigantic pages are allocated before normal huge
>>> pages.
>>>
>>> In hugepages_setup() in hugetlb.c, we have:
>>>
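>>>       /*
>>>        * Boot-time allocation of gigantic pages happens right here at
>>>        * cmdline parse time, while memblock is still available,
>>>        * i.e. well before hugetlb_init() runs as a subsys_initcall.
>>>        */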
>>>       if (hugetlb_max_hstate && hstate_is_gigantic(parsed_hstate))
>>>           hugetlb_hstate_alloc_pages(parsed_hstate);
>>>
>>> I believe the above code allocates memory for gigantic pages, and
>>> hugetlb_init() is
>>> called later because it is a subsys_initcall.
>>>
>>> So, by the time the kernel reaches hugetlb_init(), the gigantic pages
>>> are already
>>> allocated. Isn't that right?
>>>
>>> Please let me know your opinion.
>> Yes, you are right. We are allocating hugepages from memblock, however
>> this isn't getting advertised anywhere, i.e. there is no way one can
>> tell from any user interface whether the hugepages were allocated or not.
>> i.e. for the fadump kernel, when hugepagesz= and hugepages= params are
>> passed, it will allocate gigantic pages but won't advertise this in
>> meminfo or anywhere else. This was adding to the confusion when I tested
>> this (which wasn't clear from the commit msg either).
>>
>> And I guess this is happening during fadump kernel because of our patch
>> [1], which added a check to see whether hugetlb_disabled is true in
>> hugepages_supported(). Due to this hugetlb_init() is now not doing the
>> rest of the initialization for those gigantic pages which were allocated
>> due to cmdline options from hugepages_setup().
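>>
>> For reference, the check in hugetlb_init() looks roughly like this
>> (paraphrased from mm/hugetlb.c; it is the source of the "ignoring
>> associated command-line parameters" message in the logs further down):
>>
>> static int __init hugetlb_init(void)
>> {
>> 	if (!hugepages_supported()) {
>> 		if (hugetlb_max_hstate || default_hstate_max_huge_pages)
>> 			pr_warn("HugeTLB: huge pages not supported, ignoring associated command-line parameters\n");
>> 		return 0;	/* skips all hstate/sysfs initialization */
>> 	}
>> 	/* ... normal hstate initialization continues here ... */
>> 	return 0;
>> }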
>>
>> [1]: 
>> https://lore.kernel.org/linuxppc-dev/20241202054310.928610-1-sourabhj...@linux.ibm.com/
>>
>> Now, as we know from below, fadump can set hugetlb_disabled in
>> early_setup(), i.e. fadump marks hugetlb_disabled as true in
>> early_setup() -> early_init_devtree() -> fadump_reserve_mem()
>>
>> And hugepages_setup() and hugepagesz_setup() get called much later in
>> start_kernel() -> parse_args()
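>>
>> In (simplified, illustrative) code, the ordering is:
>>
>> /* arch/powerpc/kernel/fadump.c: fadump_reserve_mem(), simplified */
>> if (fw_dump.dump_active)
>> 	hugetlb_disabled = true;	/* reached via early_init_devtree() */
>>
>> /*
>>  * init/main.c: start_kernel() -> parse_args() runs much later and
>>  * invokes hugepagesz_setup() / hugepages_setup().
>>  */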
>>
>>
>> And we already check for hugepages_supported() in all necessary calls in
>> mm/hugetlb.c. So IMO, this check should go in mm/hugetlb.c in
>> hugepagesz_setup() and hugepages_setup(). Because otherwise every arch
>> implementation will end up duplicating this by adding
>> hugepages_supported() check in their arch implementation of
>> arch_hugetlb_valid_size().
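>>
>> i.e. something like the sketch below (illustrative only, not a tested
>> patch; the pr_warn() wording is made up, and a similar early return
>> would be needed in hugepages_setup() too):
>>
>> /* mm/hugetlb.c */
>> static int __init hugepagesz_setup(char *s)
>> {
>> 	unsigned long size;
>>
>> 	if (!hugepages_supported()) {
>> 		pr_warn("HugeTLB: support disabled, ignoring hugepagesz=%s\n", s);
>> 		return 1;	/* consume the param, allocate nothing */
>> 	}
>>
>> 	size = (unsigned long)memparse(s, NULL);
>> 	if (!arch_hugetlb_valid_size(size)) {
>> 		pr_err("HugeTLB: unsupported hugepagesz=%s\n", s);
>> 		return 1;
>> 	}
>>
>> 	/* ... existing hstate setup (hugetlb_add_hstate() etc.) ... */
>> 	return 1;
>> }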
>>
>> e.g. references of hugepages_supported() checks in mm/hugetlb.c
>>
>> mm/hugetlb.c hugetlb_show_meminfo_node 4959 if (!hugepages_supported())
>> mm/hugetlb.c hugetlb_report_node_meminfo 4943 if (!hugepages_supported())
>> mm/hugetlb.c hugetlb_report_meminfo 4914 if (!hugepages_supported())
>> mm/hugetlb.c hugetlb_overcommit_handler 4848 if (!hugepages_supported())
>> mm/hugetlb.c hugetlb_sysctl_handler_common 4809 if (!hugepages_supported())
>> mm/hugetlb.c hugetlb_init 4461 if (!hugepages_supported()) {
>> mm/hugetlb.c dissolve_free_hugetlb_folios 2211 if (!hugepages_supported())
>> fs/hugetlbfs/inode.c init_hugetlbfs_fs 1604 if (!hugepages_supported()) {
>>
>>
>> Let me also see the history on why this wasn't done earlier though...
>>
>> ... Oh actually there is more history to this. See [2]. We already had
>> a hugepages_supported() check in hugepages_setup() and other places
>> earlier, which was removed to fix some other problem on ppc ;)
>>
>> [2]: 
>> https://web.git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=c2833a5bf75b3657c4dd20b3709c8c702754cb1f
>>
>>
>> Hence I believe this needs a wider cleanup than just fixing it for our
>> arch. I see there is a patch series already fixing these code paths,
>> which is also cleaning up the path of gigantic hugepage allocation in
>> hugepages_setup(). I think it is in the mm-unstable branch too. Can we
>> please review & test that to make sure that the fadump usecase of
>> disabling hugepages & gigantic pages is getting covered properly?
>>
>> [3]: https://lore.kernel.org/all/20250228182928.2645936-1-f...@google.com/
>
> I evaluated the patch series [3] for the fadump issue, and here are my 
> observations:
>
> Currently, the patch series [3] does not fix the issue I am trying to 
> address with this patch.
>
> With patch series [3] applied, I see the following logs when booting the 
> fadump kernel with hugepagesz=1G hugepages=10:
> With just Patch series [3]:
> ------------------------------------
>
> kdump:/# dmesg | grep -i HugeTLB
> [    0.000000] HugeTLB: allocating 10 of page size 1.00 GiB failed. Only allocated 9 hugepages.
> [    0.405964] HugeTLB support is disabled!
> [    0.409162] HugeTLB: huge pages not supported, ignoring associated command-line parameters
> [    0.437740] hugetlbfs: disabling because there are no supported hugepage sizes
>
> One good thing is that the kernel now at least reports the gigantic 
> pages allocated, which was
> not the case before. I think patch series [3] introduced that improvement.
>
> Now, on top of patch series [3], I applied this fix, and the kernel 
> prints the following logs:
>
> Patch series [3] + this fix:
> ------------------------------------
>
[    0.000000] HugeTLB: unsupported hugepagesz=1G
[    0.000000] HugeTLB: hugepages=10 does not follow a valid hugepagesz, ignoring
[    0.366158] HugeTLB support is disabled!
[    0.398004] hugetlbfs: disabling because there are no supported hugepage sizes
>
> With these logs, one can clearly identify what is happening.
>
> What are your thoughts on this fix now?
>
> Do you still think handling this in generic code is better?

I believe so, yes (unless we have a valid reason for not doing that).
hugepages_supported() is an arch specific call. If you look at the prints
above, what we are essentially saying is that this is not a valid
hugepagesz. But that's not really the case, right? What it should
reflect instead is that hugepages support is disabled.

That being said, I will have to go and look into that series to suggest
where in that path it should use the hugepages_supported() arch call to
check whether hugepages are supported before initializing. And hopefully,
as you suggested, since our cmdline parsing problem was already solved,
it should not be a problem to use hugepages_supported() during the
cmdline parsing phase. But let me check that series and get back.


> Given that I was already advised to handle things in arch code [4].
>
> Or should we keep it this way?
> One advantage of handling things in arch_hugetlb_valid_size() is that it
> helps avoid populating hstates, since they are not required anyway. I am
> not claiming that this is not possible in generic code.

IMO, even adding a hugepages_supported() check at the right place should
avoid populating hstates too. But let's confirm that.
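
(In the hugepagesz_setup() sketch above, bailing out before
hugetlb_add_hstate() is reached means no hstate gets populated for that
size, matching what the early return in arch_hugetlb_valid_size()
achieves today.)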

-ritesh

>
> Thoughts?
>
> Thanks,
> Sourabh Jain
>
>
> [3]: https://lore.kernel.org/all/20250228182928.2645936-1-f...@google.com/
> [4]: https://lore.kernel.org/all/20250122150613.28a92438@thinkpad-T15/
