Hi Anatoly,

Thank you very much for the useful explanations!

Thanks,
Asaf

-----Original Message-----
From: Burakov, Anatoly <anatoly.bura...@intel.com> 
Sent: Monday, December 10, 2018 12:10 PM
To: Asaf Sinai <asa...@radware.com>; Ilya Maximets <i.maxim...@samsung.com>; 
Hemant Agrawal <hemant.agra...@nxp.com>; dev@dpdk.org; Thomas Monjalon 
<tho...@monjalon.net>
Cc: Ilia Ferdman <il...@radware.com>; Sasha Hodos <sas...@radware.com>
Subject: Re: [dpdk-dev] CONFIG_RTE_EAL_NUMA_AWARE_HUGEPAGES: no difference in 
memory pool allocations, when enabling/disabling this configuration

On 09-Dec-18 8:14 AM, Asaf Sinai wrote:
> Hi all,
> 
> Thanks for the detailed explanations!
> 
> So, what we understood from that is the following (please correct us if it is 
> wrong):
> Before 18.05 version:
> - Dividing huge pages between NUMA nodes was, by default, left to Linux's 
> goodwill.
> - Forcing Linux to divide huge pages between NUMA nodes required enabling the 
> configuration option "CONFIG_RTE_EAL_NUMA_AWARE_HUGEPAGES".
> - The enforcement was done via the "libnuma" library.
> 
> From 18.05 version:
> - The mentioned configuration option is ignored, so that by default, all huge 
> pages are allocated on NUMA node 0.
> - If the "libnuma" library exists in the system, then huge pages will be 
> divided between NUMA nodes, without any special configuration.
> - The above is relevant to architectures that support NUMA, e.g. X86 (which 
> we use).
> 
> Thanks,
> Asaf

Hi Asaf,

Before 18.05, the above description is correct.

Since 18.05, it's not _quite_ like that. There are two memory modes in
18.05 - default and legacy. Legacy mode pretty much behaves like
pre-18.05 code.

Default memory mode without the CONFIG_RTE_EAL_NUMA_AWARE_HUGEPAGES option 
should, for all intents and purposes, be considered unsupported for post-18.05 
code, and libnuma should be considered a hard dependency for non-legacy, 
NUMA-aware code. Without this option, EAL will disallow allocations on sockets 
other than 0, but on a NUMA-enabled system, you won't necessarily get memory 
from socket 0 - it will *say* it is on socket 0, but that may not *actually* be 
the case, because without libnuma we do not check where it was allocated.
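
Since EAL without libnuma does not check where a page was allocated, one way to 
look at it from outside DPDK is the kernel's /proc/<pid>/numa_maps file, which 
reports per-mapping page counts as N<node>=<pages> fields. Below is a minimal 
Python sketch of parsing such lines. The helper names, the "rtemap_" path hint 
and the sample lines are assumptions made for illustration only, not output 
captured from a real process.

```python
import re

# Matches the per-node page counters in a numa_maps line, e.g. "N0=3" or "N1=12".
_NODE_FIELD = re.compile(r"^N(\d+)=(\d+)$")


def pages_per_node(numa_maps_line):
    """Return {node_id: page_count} for one line of /proc/<pid>/numa_maps."""
    counts = {}
    for field in numa_maps_line.split():
        match = _NODE_FIELD.match(field)
        if match:
            node, pages = int(match.group(1)), int(match.group(2))
            counts[node] = counts.get(node, 0) + pages
    return counts


def hugepage_nodes(numa_maps_text, path_hint="rtemap_"):
    """Sum page counts per node over file-backed mappings matching path_hint.

    path_hint is an assumption: adjust it to however the hugepage files
    are named on the system being inspected.
    """
    totals = {}
    for line in numa_maps_text.splitlines():
        if "file=" in line and path_hint in line:
            for node, pages in pages_per_node(line).items():
                totals[node] = totals.get(node, 0) + pages
    return totals


# Illustrative input only (hypothetical addresses and paths).
SAMPLE = (
    "7f0000000000 default file=/dev/hugepages/rtemap_0 huge dirty=1 N1=1 kernelpagesize_kB=2048\n"
    "7f0000200000 default file=/dev/hugepages/rtemap_1 huge dirty=1 N0=1 kernelpagesize_kB=2048\n"
    "7f0000400000 default file=/dev/hugepages/rtemap_2 huge dirty=1 N1=1 kernelpagesize_kB=2048\n"
)

if __name__ == "__main__":
    # Shows how many hugepages of the sample mappings sit on each node.
    print(hugepage_nodes(SAMPLE))
```

If pages that EAL reports as socket 0 show up under N1 in such a listing, that 
is the mismatch described above.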

The reason for the above behavior is simple: legacy mem mode preallocates all 
memory in advance. This gives us an opportunity to figure out page socket 
affinity at initialization, and not worry about it afterwards. 
Non-legacy mode doesn't have the luxury of preallocating all memory in advance; 
instead, we allocate memory on the fly - which means that whenever an allocation 
is requested, we need memory not just anywhere (like in the legacy init case), 
but located on a specific socket - we cannot "sort it out later" like we do with 
legacy mem. Without libnuma, we cannot get this functionality.
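
The difference between the two modes can be illustrated with a toy model. This 
is only a sketch in Python: ToyKernel, legacy_init and dynamic_alloc are 
hypothetical names, and the "prefer the caller's node" rule is a simplified 
assumption about what happens without a NUMA policy. None of this is EAL code.

```python
class ToyKernel:
    """Hypothetical stand-in for the kernel's hugepage pool, not a real API."""

    def __init__(self, free_pages_per_node):
        self.free = dict(free_pages_per_node)

    def alloc_local(self, caller_node):
        """No NUMA policy: prefer the caller's node, else any node with pages."""
        order = [caller_node] + sorted(n for n in self.free if n != caller_node)
        for node in order:
            if self.free.get(node, 0) > 0:
                self.free[node] -= 1
                return node
        return None

    def alloc_on(self, wanted_node):
        """With a NUMA policy (what libnuma makes possible): that node or nothing."""
        if self.free.get(wanted_node, 0) > 0:
            self.free[wanted_node] -= 1
            return wanted_node
        return None


def legacy_init(kernel, total_pages, caller_node):
    """Legacy mode: grab everything up front, then sort pages by actual node."""
    pools = {}
    for _ in range(total_pages):
        node = kernel.alloc_local(caller_node)
        if node is None:
            break
        pools[node] = pools.get(node, 0) + 1
    return pools


def dynamic_alloc(kernel, wanted_node, caller_node, have_libnuma):
    """Non-legacy mode: one page, at request time, for a specific socket.

    Returns (node the caller asked for, node the page really came from),
    or None if no page could be allocated.
    """
    if have_libnuma:
        actual = kernel.alloc_on(wanted_node)
    else:
        actual = kernel.alloc_local(caller_node)
    if actual is None:
        return None
    return (wanted_node, actual)


if __name__ == "__main__":
    # Legacy: all pages are taken first, so their real nodes can be sorted out.
    print(legacy_init(ToyKernel({0: 2, 1: 2}), 4, caller_node=0))
    # Dynamic without a policy: asking for socket 1 from socket 0 misses.
    print(dynamic_alloc(ToyKernel({0: 2, 1: 2}), 1, 0, have_libnuma=False))
    # Dynamic with a policy: the page comes from the requested socket.
    print(dynamic_alloc(ToyKernel({0: 2, 1: 2}), 1, 0, have_libnuma=True))
```

In this model, legacy init ends up with correctly sorted per-node pools simply 
because it looks at every page after the fact, while the on-the-fly path only 
gets the requested node when it can ask for it explicitly.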

> 
> -----Original Message-----
> From: Ilya Maximets <i.maxim...@samsung.com>
> Sent: Tuesday, November 27, 2018 06:50 PM
> To: Burakov, Anatoly <anatoly.bura...@intel.com>; Hemant Agrawal 
> <hemant.agra...@nxp.com>; Asaf Sinai <asa...@radware.com>; 
> dev@dpdk.org; Thomas Monjalon <tho...@monjalon.net>
> Cc: Ilia Ferdman <il...@radware.com>; Sasha Hodos <sas...@radware.com>
> Subject: Re: [dpdk-dev] CONFIG_RTE_EAL_NUMA_AWARE_HUGEPAGES: no 
> difference in memory pool allocations, when enabling/disabling this 
> configuration
> 
> On 27.11.2018 13:33, Burakov, Anatoly wrote:
>> On 27-Nov-18 10:26 AM, Hemant Agrawal wrote:
>>>
>>> On 11/26/2018 8:55 PM, Asaf Sinai wrote:
>>>> +CC Ilia & Sasha.
>>>>
>>>> -----Original Message-----
>>>> From: Burakov, Anatoly <anatoly.bura...@intel.com>
>>>> Sent: Monday, November 26, 2018 04:57 PM
>>>> To: Ilya Maximets <i.maxim...@samsung.com>; Asaf Sinai 
>>>> <asa...@radware.com>; dev@dpdk.org; Thomas Monjalon 
>>>> <tho...@monjalon.net>
>>>> Subject: Re: [dpdk-dev] CONFIG_RTE_EAL_NUMA_AWARE_HUGEPAGES: no 
>>>> difference in memory pool allocations, when enabling/disabling this 
>>>> configuration
>>>>
>>>> On 26-Nov-18 2:32 PM, Ilya Maximets wrote:
>>>>> On 26.11.2018 17:21, Burakov, Anatoly wrote:
>>>>>> On 26-Nov-18 2:10 PM, Ilya Maximets wrote:
>>>>>>> On 26.11.2018 16:42, Burakov, Anatoly wrote:
>>>>>>>> On 26-Nov-18 1:20 PM, Ilya Maximets wrote:
>>>>>>>>> On 26.11.2018 16:16, Ilya Maximets wrote:
>>>>>>>>>> On 26.11.2018 15:50, Burakov, Anatoly wrote:
>>>>>>>>>>> On 26-Nov-18 11:43 AM, Burakov, Anatoly wrote:
>>>>>>>>>>>> On 26-Nov-18 11:33 AM, Asaf Sinai wrote:
>>>>>>>>>>>>> Hi Anatoly,
>>>>>>>>>>>>>
>>>>>>>>>>>>> We did not check it with "testpmd", only with our application.
>>>>>>>>>>>>> From the beginning, we did not enable this configuration 
>>>>>>>>>>>>> (look at attached files), and everything works fine.
>>>>>>>>>>>>> Of course we rebuild DPDK, when we change configuration.
>>>>>>>>>>>>> Please note that we use DPDK 17.11.3, maybe this is why it works 
>>>>>>>>>>>>> fine?
>>>>>>>>>>>> Just tested with DPDK 17.11, and yes, it does work the way you are 
>>>>>>>>>>>> describing. This is not intended behavior. I will look into it.
>>>>>>>>>>>>
>>>>>>>>>>> +CC author of commit introducing 
>>>>>>>>>>> CONFIG_RTE_EAL_NUMA_AWARE_HUGEPAGES.
>>>>>>>>>>>
>>>>>>>>>>> Looking at the code, i think this config option needs to be 
>>>>>>>>>>> reworked and we should clarify what we mean by this option. It 
>>>>>>>>>>> appears that i've misunderstood what this option was actually 
>>>>>>>>>>> intended to do, and i also think its naming could be improved 
>>>>>>>>>>> because it's confusing and misleading.
>>>>>>>>>>>
>>>>>>>>>>> In 17.11, this option does *not* prevent EAL from using NUMA - it 
>>>>>>>>>>> merely disables using libnuma to perform memory allocation. This 
>>>>>>>>>>> looks like intended (if counter-intuitive) behavior - disabling 
>>>>>>>>>>> this option will simply revert DPDK to working as it did before 
>>>>>>>>>>> this option was introduced (i.e. best-effort allocation). This is 
>>>>>>>>>>> why your code still works - because EAL still does allocate memory 
>>>>>>>>>>> on socket 1, and *knows* that it's socket 1 memory. It still 
>>>>>>>>>>> supports NUMA.
>>>>>>>>>>>
>>>>>>>>>>> The commit message for these changes states that the actual purpose 
>>>>>>>>>>> of this option is to enable "balanced" hugepage allocation. In case 
>>>>>>>>>>> of cgroups limitations, previously, DPDK would've exhausted all 
>>>>>>>>>>> hugepages on master core's socket before attempting to allocate 
>>>>>>>>>>> from other sockets, but by the time we've reached cgroups limits on 
>>>>>>>>>>> numbers of hugepages, we might not have reached socket 1 and thus 
>>>>>>>>>>> missed out on the pages we could've allocated, but didn't. Using 
>>>>>>>>>>> libnuma solves this issue, because now we can allocate pages on 
>>>>>>>>>>> sockets we want, instead of hoping we won't run out of hugepages 
>>>>>>>>>>> before we get the memory we need.
>>>>>>>>>>>
>>>>>>>>>>> In 18.05 onwards, this option works differently (and arguably 
>>>>>>>>>>> wrongly). More specifically, it disallows allocations on sockets 
>>>>>>>>>>> other than 0, and it also makes it so that EAL does not check which 
>>>>>>>>>>> socket the memory *actually* came from. So, not only is allocating 
>>>>>>>>>>> memory from socket 1 disabled, but allocating from socket 0 may 
>>>>>>>>>>> even get you memory from socket 1!
>>>>>>>>>> I'd consider this as a bug.
>>>>>>>>>>
>>>>>>>>>>> +CC Thomas
>>>>>>>>>>>
>>>>>>>>>>> The CONFIG_RTE_EAL_NUMA_AWARE_HUGEPAGES option is a misnomer, 
>>>>>>>>>>> because it makes it seem like this option disables NUMA support, 
>>>>>>>>>>> which is not the case.
>>>>>>>>>>>
>>>>>>>>>>> I would also argue that it is not relevant to 18.05+ memory 
>>>>>>>>>>> subsystem, and should only work in legacy mode, because it is 
>>>>>>>>>>> *impossible* to make it work right in the new memory subsystem, and 
>>>>>>>>>>> here's why:
>>>>>>>>>>>
>>>>>>>>>>> Without libnuma, we have no way of "asking" the kernel to allocate 
>>>>>>>>>>> a hugepage on a specific socket - instead, any allocation will most 
>>>>>>>>>>> likely happen on the socket from which the allocation request came. 
>>>>>>>>>>> For example, if user program's lcore is on socket 1, allocation on 
>>>>>>>>>>> socket 0 will actually allocate a page on socket 1.
>>>>>>>>>>>
>>>>>>>>>>> If we don't check for page's NUMA node affinity (which is what 
>>>>>>>>>>> currently happens) - we get performance degradation because we may 
>>>>>>>>>>> unintentionally allocate memory on wrong NUMA node. If we do check 
>>>>>>>>>>> for this - then allocation of memory on socket 1 from lcore on 
>>>>>>>>>>> socket 0 will almost never succeed, because kernel will always give 
>>>>>>>>>>> us pages on socket 0.
>>>>>>>>>>>
>>>>>>>>>>> Put simply, there is no sane way to make this option work for 
>>>>>>>>>>> the new memory subsystem - IMO it should be dropped, and libnuma 
>>>>>>>>>>> should be made a hard dependency on Linux.
>>>>>>>>>> I agree that the new memory model could not work without libnuma, 
>>>>>>>>>> i.e. it would lead to unpredictable memory allocations with no 
>>>>>>>>>> respect for the requested socket_ids. I also agree that 
>>>>>>>>>> CONFIG_RTE_EAL_NUMA_AWARE_HUGEPAGES is only sane for a legacy memory 
>>>>>>>>>> model.
>>>>>>>>>> It looks like we have no other choice than to just drop the 
>>>>>>>>>> option and make the code unconditional, i.e. have a hard dependency 
>>>>>>>>>> on libnuma.
>>>>>>>>>>
>>>>>>>>> We, probably, could compile this code and have hard dependency 
>>>>>>>>> only for platforms with 'RTE_MAX_NUMA_NODES > 1'.
>>>>>>>> Well, as long as legacy mode stays supported, we have to keep the 
>>>>>>>> option. The "drop" part was referring to supporting it under the new 
>>>>>>>> memory system, not a literal drop from config files.
>>>>>>> The option was introduced because we didn't want to introduce 
>>>>>>> the new hard dependency. Since we'll have it anyway, I'm not 
>>>>>>> sure if keeping the option for legacy mode makes any sense.
>>>>>> Oh yes, you're right. Drop it is!
>>>>>>
>>>>>>>> As for using RTE_MAX_NUMA_NODES, i don't think it's merited. 
>>>>>>>> Distributions cannot deliver different DPDK versions based on the 
>>>>>>>> number of sockets on a particular machine - so it would have to be a 
>>>>>>>> hard dependency for distributions anyway (does any distribution ship 
>>>>>>>> DPDK without libnuma?).
>>>>>>> At least ARMv7 builds commonly do not ship a libnuma package.
>>>>>> Do you mean libnuma builds for ARMv7 are not available? Or do you mean 
>>>>>> the libnuma package is not installed by default?
>>>>>>
>>>>>> If it's the latter, then i believe it's not installed by default 
>>>>>> anywhere, but if using distribution version of DPDK, libnuma will be 
>>>>>> taken care of via package manager. Presumably building from source can 
>>>>>> be taken care of with pkg-config/meson.
>>>>>>
>>>>>> Or do you mean ARMv7 does not have libnuma for their arch at all, in any 
>>>>>> distro?
>>>>> libnuma builds for ARMv7 are not available in most of the distros.
>>>>> I didn't check all, but here are the results for Ubuntu:
>>>>>
>>>>> https://emea01.safelinks.protection.outlook.com/?url=https%3A%2F%2Fpackages.ubuntu.com%2Fsearch%3Fsuite%3Dbionic%26arch%3Darmhf%26searchon%3Dnames%26keywords%3Dlibnuma&data=02%7C01%7CAsafSi%40radware.com%7Ca44f84bca42d4a52acac08d653af83b8%7C6ae4e000b5d04f48a766402d46119b76%7C0%7C0%7C636788410626179927&sdata=1pJ0WkAs6Y%2Bv3w%2BhKAELBw%2BjMraBnhiqqpsXkRv2ifI%3D&reserved=0
>>>>>
>>>>> You may see that Ubuntu 18.04 (bionic) has no libnuma package for 
>>>>> 'armhf' and also 'powerpc' platforms.
>>>>>
>>>> That's a difficulty. Do these platforms support NUMA? In other words, 
>>>> could we replace this flag with just outright disabling NUMA support?
>>>
>>> Many platforms don't support NUMA, so they don't really need libnuma.
>>>
>>> Mandating libnuma will also break several things:
>>>
>>>     - cross build for ARM on x86 - which is among the preferred 
>>> build methods for many in the ARM community.
>>>
>>>     - many of the embedded SoCs are without NUMA support, and they use 
>>> a smaller rootfs (e.g. Yocto).  It will be a burden to add libnuma there.
>>>
>>
>> OK, point taken.
>>
>> So, the alternative would be to have the ability to outright disable NUMA 
>> support (either with a new option, or reworking this one - i would prefer a 
>> new one, since this one is confusingly named). Meaning, report all cores as 
>> socket 0, report all hardware as socket 0, report all memory as socket 0 and 
>> never care about NUMA nodes anywhere.
>>
>> Would that work? E.g. by default, make libnuma a hard dependency on x86 
>> Linux (but allow disabling it), and disable it everywhere else?
> 
> I think, you may just rename the RTE_EAL_NUMA_AWARE_HUGEPAGES to something 
> like RTE_EAL_NUMA_SUPPORT and keep all the defaults as is, i.e.
> * globally disabled
> * enabled for linux
> * disabled for armv7a, dpaa, dpaa2 and stingray.
> Meson could handle everything dynamically.
> 
>>>
>>>>
>>>>>>>> For those compiling from source - are there any supported 
>>>>>>>> distributions which don't package libnuma? I don't see much 
>>>>>>>> sense in keeping libnuma optional. This is of course up to 
>>>>>>>> the tech board to decide, but IMO the "without libnuma it's 
>>>>>>>> basically broken" argument is very strong :)
>>>>>>>>
>>>>>>
>>>>
>>>> --
>>>> Thanks,
>>>> Anatoly
>>
>>


--
Thanks,
Anatoly
