Hi all,

Thanks for the detailed explanations!

So, what we understood from this is the following (please correct us if it is 
wrong):

Before version 18.05:
- By default, the division of huge pages between NUMA nodes was left to Linux 
(best effort).
- Forcing Linux to divide huge pages between NUMA nodes required enabling the 
configuration option "CONFIG_RTE_EAL_NUMA_AWARE_HUGEPAGES".
- The enforcement was done via the "libnuma" library.

From version 18.05:
- The mentioned configuration option is ignored, so that by default, all huge 
pages are allocated on NUMA node 0.
- If the "libnuma" library exists in the system, huge pages will be divided 
between NUMA nodes without any special configuration.
- The above is relevant to architectures that support NUMA, e.g. x86 (which we 
use).
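
To make the difference between best-effort and per-node placement concrete, 
here is a toy model of the cgroups case that Anatoly describes in the quoted 
text below. It is only a sketch under my own assumptions, not DPDK code: the 
function names and the "limit" parameter are made up for illustration, and 
real allocation goes through the kernel (and libnuma), not through arrays.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Toy model only, not DPDK code.
 * free_pages[n]: free huge pages on NUMA node n.
 * limit: cap on the total number of huge pages (e.g. a cgroups limit).
 * got[n]: output, pages obtained on node n.
 */

/* Best effort: drain each node in order until the cap is reached. */
void alloc_best_effort(const int *free_pages, size_t n_nodes, int limit,
                       int *got)
{
    int total = 0;

    assert(limit >= 0);
    for (size_t n = 0; n < n_nodes; n++) {
        int take = free_pages[n];

        if (take > limit - total)
            take = limit - total;
        got[n] = take;
        total += take;
    }
}

/* Per-node placement: take only what is wanted on each node. */
void alloc_per_node(const int *free_pages, const int *wanted, size_t n_nodes,
                    int limit, int *got)
{
    int total = 0;

    assert(limit >= 0);
    for (size_t n = 0; n < n_nodes; n++) {
        int take = wanted[n] < free_pages[n] ? wanted[n] : free_pages[n];

        if (take > limit - total)
            take = limit - total;
        got[n] = take;
        total += take;
    }
}
```

With 8 free pages on each of two nodes, a cap of 4 pages, and 2 pages wanted 
per node, the best-effort variant ends up with {4, 0} (nothing on node 1), 
while the per-node variant ends up with {2, 2}.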

Thanks,
Asaf

-----Original Message-----
From: Ilya Maximets <i.maxim...@samsung.com> 
Sent: Tuesday, November 27, 2018 06:50 PM
To: Burakov, Anatoly <anatoly.bura...@intel.com>; Hemant Agrawal 
<hemant.agra...@nxp.com>; Asaf Sinai <asa...@radware.com>; dev@dpdk.org; Thomas 
Monjalon <tho...@monjalon.net>
Cc: Ilia Ferdman <il...@radware.com>; Sasha Hodos <sas...@radware.com>
Subject: Re: [dpdk-dev] CONFIG_RTE_EAL_NUMA_AWARE_HUGEPAGES: no difference in 
memory pool allocations, when enabling/disabling this configuration

On 27.11.2018 13:33, Burakov, Anatoly wrote:
> On 27-Nov-18 10:26 AM, Hemant Agrawal wrote:
>>
>> On 11/26/2018 8:55 PM, Asaf Sinai wrote:
>>> +CC Ilia & Sasha.
>>>
>>> -----Original Message-----
>>> From: Burakov, Anatoly <anatoly.bura...@intel.com>
>>> Sent: Monday, November 26, 2018 04:57 PM
>>> To: Ilya Maximets <i.maxim...@samsung.com>; Asaf Sinai 
>>> <asa...@radware.com>; dev@dpdk.org; Thomas Monjalon 
>>> <tho...@monjalon.net>
>>> Subject: Re: [dpdk-dev] CONFIG_RTE_EAL_NUMA_AWARE_HUGEPAGES: no 
>>> difference in memory pool allocations, when enabling/disabling this 
>>> configuration
>>>
>>> On 26-Nov-18 2:32 PM, Ilya Maximets wrote:
>>>> On 26.11.2018 17:21, Burakov, Anatoly wrote:
>>>>> On 26-Nov-18 2:10 PM, Ilya Maximets wrote:
>>>>>> On 26.11.2018 16:42, Burakov, Anatoly wrote:
>>>>>>> On 26-Nov-18 1:20 PM, Ilya Maximets wrote:
>>>>>>>> On 26.11.2018 16:16, Ilya Maximets wrote:
>>>>>>>>> On 26.11.2018 15:50, Burakov, Anatoly wrote:
>>>>>>>>>> On 26-Nov-18 11:43 AM, Burakov, Anatoly wrote:
>>>>>>>>>>> On 26-Nov-18 11:33 AM, Asaf Sinai wrote:
>>>>>>>>>>>> Hi Anatoly,
>>>>>>>>>>>>
>>>>>>>>>>>> We did not check it with "testpmd", only with our application.
>>>>>>>>>>>> From the beginning, we did not enable this configuration 
>>>>>>>>>>>> (see the attached files), and everything works fine.
>>>>>>>>>>>> Of course, we rebuild DPDK whenever we change the configuration.
>>>>>>>>>>>> Please note that we use DPDK 17.11.3; maybe this is why it works 
>>>>>>>>>>>> fine?
>>>>>>>>>>> Just tested with DPDK 17.11, and yes, it does work the way you are 
>>>>>>>>>>> describing. This is not intended behavior. I will look into it.
>>>>>>>>>>>
>>>>>>>>>> +CC author of commit introducing CONFIG_RTE_EAL_NUMA_AWARE_HUGEPAGES.
>>>>>>>>>>
>>>>>>>>>> Looking at the code, i think this config option needs to be reworked 
>>>>>>>>>> and we should clarify what we mean by this option. It appears that 
>>>>>>>>>> i've misunderstood what this option was actually intended to do, and i 
>>>>>>>>>> also think its naming could be improved because it's confusing and 
>>>>>>>>>> misleading.
>>>>>>>>>>
>>>>>>>>>> In 17.11, this option does *not* prevent EAL from using NUMA - it 
>>>>>>>>>> merely disables using libnuma to perform memory allocation. This 
>>>>>>>>>> looks like intended (if counter-intuitive) behavior - disabling this 
>>>>>>>>>> option will simply revert DPDK to working as it did before this 
>>>>>>>>>> option was introduced (i.e. best-effort allocation). This is why 
>>>>>>>>>> your code still works - because EAL still does allocate memory on 
>>>>>>>>>> socket 1, and *knows* that it's socket 1 memory. It still supports 
>>>>>>>>>> NUMA.
>>>>>>>>>>
>>>>>>>>>> The commit message for these changes states that the actual purpose 
>>>>>>>>>> of this option is to enable "balanced" hugepage allocation. In case 
>>>>>>>>>> of cgroups limitations, previously, DPDK would've exhausted all 
>>>>>>>>>> hugepages on master core's socket before attempting to allocate from 
>>>>>>>>>> other sockets, but by the time we've reached cgroups limits on 
>>>>>>>>>> numbers of hugepages, we might not have reached socket 1 and thus 
>>>>>>>>>> missed out on the pages we could've allocated, but didn't. Using 
>>>>>>>>>> libnuma solves this issue, because now we can allocate pages on 
>>>>>>>>>> sockets we want, instead of hoping we won't run out of hugepages 
>>>>>>>>>> before we get the memory we need.
>>>>>>>>>>
>>>>>>>>>> From 18.05 onwards, this option works differently (and arguably 
>>>>>>>>>> wrongly). More specifically, it disallows allocations on sockets other 
>>>>>>>>>> than 0, and it also makes it so that EAL does not check which socket 
>>>>>>>>>> the memory *actually* came from. So, not only is allocating memory from 
>>>>>>>>>> socket 1 disabled, but allocating from socket 0 may even get you 
>>>>>>>>>> memory from socket 1!
>>>>>>>>> I'd consider this as a bug.
>>>>>>>>>
>>>>>>>>>> +CC Thomas
>>>>>>>>>>
>>>>>>>>>> The CONFIG_RTE_EAL_NUMA_AWARE_HUGEPAGES option is a misnomer, 
>>>>>>>>>> because it makes it seem like this option disables NUMA support, 
>>>>>>>>>> which is not the case.
>>>>>>>>>>
>>>>>>>>>> I would also argue that it is not relevant to 18.05+ memory 
>>>>>>>>>> subsystem, and should only work in legacy mode, because it is 
>>>>>>>>>> *impossible* to make it work right in the new memory subsystem, and 
>>>>>>>>>> here's why:
>>>>>>>>>>
>>>>>>>>>> Without libnuma, we have no way of "asking" the kernel to allocate a 
>>>>>>>>>> hugepage on a specific socket - instead, any allocation will most 
>>>>>>>>>> likely happen on the socket from which the allocation request came. For 
>>>>>>>>>> example, if the user program's lcore is on socket 1, an allocation on 
>>>>>>>>>> socket 0 will actually allocate a page on socket 1.
>>>>>>>>>>
>>>>>>>>>> If we don't check for the page's NUMA node affinity (which is what 
>>>>>>>>>> currently happens) - we get performance degradation because we may 
>>>>>>>>>> unintentionally allocate memory on the wrong NUMA node. If we do check 
>>>>>>>>>> for this - then allocation of memory on socket 1 from an lcore on 
>>>>>>>>>> socket 0 will almost never succeed, because the kernel will always give 
>>>>>>>>>> us pages on socket 0.
>>>>>>>>>>
>>>>>>>>>> Put simply, there is no sane way to make this option work for the 
>>>>>>>>>> new memory subsystem - IMO it should be dropped, and libnuma should 
>>>>>>>>>> be made a hard dependency on Linux.
>>>>>>>>> I agree that the new memory model could not work without libnuma, 
>>>>>>>>> i.e. it will lead to unpredictable memory allocations with no 
>>>>>>>>> respect for the requested socket_ids. I also agree that 
>>>>>>>>> CONFIG_RTE_EAL_NUMA_AWARE_HUGEPAGES is only sane for the legacy memory 
>>>>>>>>> model.
>>>>>>>>> It looks like we have no other choice than to just drop the 
>>>>>>>>> option and make the code unconditional, i.e. have a hard dependency on 
>>>>>>>>> libnuma.
>>>>>>>>>
>>>>>>>> We, probably, could compile this code and have hard dependency 
>>>>>>>> only for platforms with 'RTE_MAX_NUMA_NODES > 1'.
>>>>>>> Well, as long as legacy mode stays supported, we have to keep the 
>>>>>>> option. The "drop" part was referring to supporting it under the new 
>>>>>>> memory system, not a literal drop from config files.
>>>>>> The option was introduced because we didn't want to introduce the 
>>>>>> new hard dependency. Since we'll have it anyway, I'm not sure if 
>>>>>> keeping the option for legacy mode makes any sense.
>>>>> Oh yes, you're right. Drop it is!
>>>>>
>>>>>>> As for using RTE_MAX_NUMA_NODES, i don't think it's merited. 
>>>>>>> Distributions cannot deliver different DPDK versions based on the 
>>>>>>> number of sockets on a particular machine - so it would have to be a 
>>>>>>> hard dependency for distributions anyway (does any distribution ship 
>>>>>>> DPDK without libnuma?).
>>>>>> At least ARMv7 builds commonly do not ship a libnuma package.
>>>>> Do you mean libnuma builds for ARMv7 are not available? Or do you mean 
>>>>> the libnuma package is not installed by default?
>>>>>
>>>>> If it's the latter, then i believe it's not installed by default 
>>>>> anywhere, but if using distribution version of DPDK, libnuma will be 
>>>>> taken care of via package manager. Presumably building from source can be 
>>>>> taken care of with pkg-config/meson.
>>>>>
>>>>> Or do you mean ARMv7 does not have libnuma for their arch at all, in any 
>>>>> distro?
>>>> libnuma builds for ARMv7 are not available in most of the distros. 
>>>> I didn't check all of them, but here are the results for Ubuntu:
>>>>
>>>> https://packages.ubuntu.com/search?suite=bionic&arch=armhf&searchon=names&keywords=libnuma
>>>>
>>>> You may see that Ubuntu 18.04 (bionic) has no libnuma package for 
>>>> 'armhf' and also 'powerpc' platforms.
>>>>
>>> That's a difficulty. Do these platforms support NUMA? In other words, could 
>>> we replace this flag with just outright disabling NUMA support?
>>
>> Many platforms don't support NUMA, so they don't really need libnuma.
>>
>> Mandating libnuma will also break several things:
>>
>>     - cross build for ARM on x86 - which is among the preferred 
>> build methods for many in the ARM community.
>>
>>     - many of the embedded SoCs are without NUMA support, and they use 
>> a smaller rootfs (e.g. Yocto).  It will be a burden to add libnuma there.
>>
> 
> OK, point taken.
> 
> So, the alternative would be to have the ability to outright disable NUMA 
> support (either with a new option, or reworking this one - i would prefer a 
> new one, since this one is confusingly named). Meaning, report all cores as 
> socket 0, report all hardware as socket 0, report all memory as socket 0 and 
> never care about NUMA nodes anywhere.
> 
> Would that work? E.g. by default, make libnuma a hard dependency on x86 Linux 
> (but allow disabling it), but disable it everywhere else?

I think you may just rename RTE_EAL_NUMA_AWARE_HUGEPAGES to something like 
RTE_EAL_NUMA_SUPPORT and keep all the defaults as is, i.e.
* globally disabled
* enabled for linux
* disabled for armv7a, dpaa, dpaa2 and stingray.
Meson could handle everything dynamically.
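
As a rough sketch of what such a flag could gate, following the "report 
everything as socket 0" idea quoted above: the macro name is the rename 
proposed here and does not exist today, and effective_socket_id() is a 
hypothetical helper for illustration, not an existing EAL function.

```c
#include <assert.h>

/*
 * Hypothetical build flag, named after the rename proposed above.
 * Uncomment (or pass -DRTE_EAL_NUMA_SUPPORT) to model a NUMA-enabled build.
 */
/* #define RTE_EAL_NUMA_SUPPORT 1 */

/*
 * Map the socket detected for a core, device or memory segment to the
 * socket that would be reported to the rest of the code.
 * Illustration only: this helper is not part of DPDK.
 */
int effective_socket_id(int detected_socket)
{
    assert(detected_socket >= 0);
#ifdef RTE_EAL_NUMA_SUPPORT
    /* NUMA support compiled in: report the real socket. */
    return detected_socket;
#else
    /* NUMA support compiled out: everything is reported as socket 0. */
    return 0;
#endif
}
```

Built without the flag, effective_socket_id(1) returns 0, so nothing 
downstream would ever see a non-zero socket; built with it, the detected 
socket is passed through unchanged.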

>>
>>>
>>>>>>> For those compiling from source - are there any supported 
>>>>>>> distributions which don't package libnuma? I don't see much 
>>>>>>> sense in keeping libnuma optional, IMO. This is of course up to 
>>>>>>> the tech board to decide, but the "without libnuma it's 
>>>>>>> basically broken" argument is very strong in my opinion :)
>>>>>>>
>>>>>
>>>
>>> --
>>> Thanks,
>>> Anatoly
> 
> 
