+CC Ilia & Sasha.

-----Original Message-----
From: Burakov, Anatoly <anatoly.bura...@intel.com> 
Sent: Monday, November 26, 2018 04:57 PM
To: Ilya Maximets <i.maxim...@samsung.com>; Asaf Sinai <asa...@radware.com>; 
dev@dpdk.org; Thomas Monjalon <tho...@monjalon.net>
Subject: Re: [dpdk-dev] CONFIG_RTE_EAL_NUMA_AWARE_HUGEPAGES: no difference in 
memory pool allocations, when enabling/disabling this configuration

On 26-Nov-18 2:32 PM, Ilya Maximets wrote:
> On 26.11.2018 17:21, Burakov, Anatoly wrote:
>> On 26-Nov-18 2:10 PM, Ilya Maximets wrote:
>>> On 26.11.2018 16:42, Burakov, Anatoly wrote:
>>>> On 26-Nov-18 1:20 PM, Ilya Maximets wrote:
>>>>> On 26.11.2018 16:16, Ilya Maximets wrote:
>>>>>> On 26.11.2018 15:50, Burakov, Anatoly wrote:
>>>>>>> On 26-Nov-18 11:43 AM, Burakov, Anatoly wrote:
>>>>>>>> On 26-Nov-18 11:33 AM, Asaf Sinai wrote:
>>>>>>>>> Hi Anatoly,
>>>>>>>>>
>>>>>>>>> We did not check it with "testpmd", only with our application.
>>>>>>>>> From the beginning, we did not enable this configuration (look at 
>>>>>>>>> attached files), and everything works fine.
>>>>>>>>> Of course we rebuild DPDK, when we change configuration.
>>>>>>>>> Please note that we use DPDK 17.11.3, maybe this is why it works fine?
>>>>>>>>
>>>>>>>> Just tested with DPDK 17.11, and yes, it does work the way you are 
>>>>>>>> describing. This is not intended behavior. I will look into it.
>>>>>>>>
>>>>>>>
>>>>>>> +CC author of commit introducing CONFIG_RTE_EAL_NUMA_AWARE_HUGEPAGES.
>>>>>>>
>>>>>>> Looking at the code, i think this config option needs to be reworked 
>>>>>>> and we should clarify what we mean by this option. It appears that i've 
>>>>>>> misunderstood what this option was actually intended to do, and i also 
>>>>>>> think its naming could be improved because it's confusing and 
>>>>>>> misleading.
>>>>>>>
>>>>>>> In 17.11, this option does *not* prevent EAL from using NUMA - it 
>>>>>>> merely disables using libnuma to perform memory allocation. This looks 
>>>>>>> like intended (if counter-intuitive) behavior - disabling this option 
>>>>>>> will simply revert DPDK to working as it did before this option was 
>>>>>>> introduced (i.e. best-effort allocation). This is why your code still 
>>>>>>> works - because EAL still does allocate memory on socket 1, and *knows* 
>>>>>>> that it's socket 1 memory. It still supports NUMA.
>>>>>>>
>>>>>>> The commit message for these changes states that the actual purpose of 
>>>>>>> this option is to enable "balanced" hugepage allocation. In case of 
>>>>>>> cgroups limitations, previously, DPDK would've exhausted all hugepages 
>>>>>>> on master core's socket before attempting to allocate from other 
>>>>>>> sockets, but by the time we've reached cgroups limits on numbers of 
>>>>>>> hugepages, we might not have reached socket 1 and thus missed out on 
>>>>>>> the pages we could've allocated, but didn't. Using libnuma solves this 
>>>>>>> issue, because now we can allocate pages on sockets we want, instead of 
>>>>>>> hoping we won't run out of hugepages before we get the memory we need.
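
To make the cgroups scenario quoted above concrete, here is a minimal sketch that only simulates the two allocation orders. It does not call DPDK or libnuma; the names best_effort, balanced, free_pages and limit are hypothetical and exist only for this illustration, and the page counts are made up.

```python
def best_effort(free_pages, order, limit):
    """Take pages socket by socket in `order` until the cgroup `limit` is hit."""
    got = {socket: 0 for socket in free_pages}
    remaining = limit
    for socket in order:
        take = min(free_pages[socket], remaining)
        got[socket] = take
        remaining -= take
    return got


def balanced(free_pages, wanted, limit):
    """Take the requested count per socket, as per-socket binding would allow."""
    got = {}
    remaining = limit
    for socket, count in wanted.items():
        take = min(count, free_pages[socket], remaining)
        got[socket] = take
        remaining -= take
    return got


# Hypothetical numbers: 8 free hugepages per socket, cgroup limit of 4 pages.
free_pages = {0: 8, 1: 8}
limit = 4

print(best_effort(free_pages, order=[0, 1], limit=limit))      # {0: 4, 1: 0}
print(balanced(free_pages, wanted={0: 2, 1: 2}, limit=limit))  # {0: 2, 1: 2}
```

With the limit reached on socket 0, the best-effort order never gets to socket 1, which is the "missed out on the pages" case described above.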
>>>>>>>
>>>>>>> In 18.05 onwards, this option works differently (and arguably wrong). 
>>>>>>> More specifically, it disallows allocations on sockets other than 0, 
>>>>>>> and it also makes it so that EAL does not check which socket the memory 
>>>>>>> *actually* came from. So, not only is allocating memory from socket 1 
>>>>>>> disabled, but allocating from socket 0 may even get you memory from 
>>>>>>> socket 1!
>>>>>>
>>>>>> I'd consider this as a bug.
>>>>>>
>>>>>>>
>>>>>>> +CC Thomas
>>>>>>>
>>>>>>> The CONFIG_RTE_EAL_NUMA_AWARE_HUGEPAGES option is a misnomer, because 
>>>>>>> it makes it seem like this option disables NUMA support, which is not 
>>>>>>> the case.
>>>>>>>
>>>>>>> I would also argue that it is not relevant to 18.05+ memory subsystem, 
>>>>>>> and should only work in legacy mode, because it is *impossible* to make 
>>>>>>> it work right in the new memory subsystem, and here's why:
>>>>>>>
>>>>>>> Without libnuma, we have no way of "asking" the kernel to allocate a 
>>>>>>> hugepage on a specific socket - instead, any allocation will most 
>>>>>>> likely happen on the socket from which the allocation request came. For 
>>>>>>> example, if user program's lcore is on socket 1, allocation on socket 0 
>>>>>>> will actually allocate a page on socket 1.
>>>>>>>
>>>>>>> If we don't check for page's NUMA node affinity (which is what 
>>>>>>> currently happens) - we get performance degradation because we may 
>>>>>>> unintentionally allocate memory on the wrong NUMA node. If we do check for 
>>>>>>> this - then allocation of memory on socket 1 from lcore on socket 0 
>>>>>>> will almost never succeed, because kernel will always give us pages on 
>>>>>>> socket 0.
>>>>>>>
>>>>>>> Put simply, there is no sane way to make this option work for the 
>>>>>>> new memory subsystem - IMO it should be dropped, and libnuma should be 
>>>>>>> made a hard dependency on Linux.
>>>>>>
>>>>>> I agree that the new memory model cannot work without libnuma, 
>>>>>> i.e. it will lead to unpredictable memory allocations without any 
>>>>>> respect for the requested socket_ids. I also agree that 
>>>>>> CONFIG_RTE_EAL_NUMA_AWARE_HUGEPAGES is only sane for a legacy memory 
>>>>>> model.
>>>>>> It looks like we have no other choice than to just drop the option 
>>>>>> and make the code unconditional, i.e. have a hard dependency on libnuma.
>>>>>>
>>>>>
>>>>> We, probably, could compile this code and have hard dependency 
>>>>> only for platforms with 'RTE_MAX_NUMA_NODES > 1'.
>>>>
>>>> Well, as long as legacy mode stays supported, we have to keep the option. 
>>>> The "drop" part was referring to supporting it under the new memory 
>>>> system, not a literal drop from config files.
>>>
>>> The option was introduced because we didn't want to introduce the 
>>> new hard dependency. Since we'll have it anyway, I'm not sure if 
>>> keeping the option for legacy mode makes any sense.
>>
>> Oh yes, you're right. Drop it is!
>>
>>>
>>>>
>>>> As for using RTE_MAX_NUMA_NODES, i don't think it's merited. Distributions 
>>>> cannot deliver different DPDK versions based on the number of sockets on a 
>>>> particular machine - so it would have to be a hard dependency for 
>>>> distributions anyway (does any distribution ship DPDK without libnuma?).
>>>
>>> At least ARMv7 builds commonly do not ship a libnuma package.
>>
>> Do you mean libnuma builds for ARMv7 are not available? Or do you mean the 
>> libnuma package is not installed by default?
>>
>> If it's the latter, then i believe it's not installed by default anywhere, 
>> but if using distribution version of DPDK, libnuma will be taken care of via 
>> package manager. Presumably building from source can be taken care of with 
>> pkg-config/meson.
>>
>> Or do you mean ARMv7 does not have libnuma for their arch at all, in any 
>> distro?
> 
> libnuma builds for ARMv7 are not available in most of the distros. I 
> didn't check all, but here are the results for Ubuntu:
>      
> https://emea01.safelinks.protection.outlook.com/?url=https%3A%2F%2Fpac
> kages.ubuntu.com%2Fsearch%3Fsuite%3Dbionic%26arch%3Darmhf%26searchon%3
> Dnames%26keywords%3Dlibnuma&amp;data=02%7C01%7CAsafSi%40radware.com%7C
> a44f84bca42d4a52acac08d653af83b8%7C6ae4e000b5d04f48a766402d46119b76%7C
> 0%7C0%7C636788410626179927&amp;sdata=1pJ0WkAs6Y%2Bv3w%2BhKAELBw%2BjMra
> BnhiqqpsXkRv2ifI%3D&amp;reserved=0
> 
> You may see that Ubuntu 18.04 (bionic) has no libnuma package for 
> 'armhf' and also 'powerpc' platforms.
> 

That's a difficulty. Do these platforms support NUMA? In other words, could we 
replace this flag with just outright disabling NUMA support?
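
As a rough way to check whether a given Linux machine exposes more than one NUMA node at runtime, something like the sketch below could be used. It assumes the kernel lists nodes as nodeN directories under /sys/devices/system/node; the function name count_numa_nodes is hypothetical, and this is a runtime check only, not a replacement for the build-time flag being discussed.

```python
import os
import re


def count_numa_nodes(sysfs_dir="/sys/devices/system/node"):
    """Count nodeN entries under sysfs; report 1 if the directory is missing."""
    try:
        entries = os.listdir(sysfs_dir)
    except OSError:
        # No NUMA information exposed: treat the machine as a single node.
        return 1
    nodes = [entry for entry in entries if re.fullmatch(r"node\d+", entry)]
    return max(len(nodes), 1)


if __name__ == "__main__":
    print("NUMA nodes visible:", count_numa_nodes())
```

A result of 1 would suggest that per-socket allocation makes no difference on that machine.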

>>
>>>
>>>>
>>>> For those compiling from source - are there any supported 
>>>> distributions which don't package libnuma? I don't see much sense 
>>>> in keeping libnuma optional. This is of course up to the tech 
>>>> board to decide, but IMO the "without libnuma it's basically 
>>>> broken" argument is very strong :)
>>>>
>>>
>>
>>
> 


--
Thanks,
Anatoly
