On 26-Nov-18 1:20 PM, Ilya Maximets wrote:
On 26.11.2018 16:16, Ilya Maximets wrote:
On 26.11.2018 15:50, Burakov, Anatoly wrote:
On 26-Nov-18 11:43 AM, Burakov, Anatoly wrote:
On 26-Nov-18 11:33 AM, Asaf Sinai wrote:
Hi Anatoly,

We did not check it with "testpmd", only with our application.
From the beginning, we did not enable this configuration (see the attached
files), and everything works fine.
Of course we rebuild DPDK when we change the configuration.
Please note that we use DPDK 17.11.3, maybe this is why it works fine?

Just tested with DPDK 17.11, and yes, it does work the way you are describing. 
This is not intended behavior. I will look into it.


+CC author of commit introducing CONFIG_RTE_EAL_NUMA_AWARE_HUGEPAGES.

Looking at the code, I think this config option needs to be reworked, and we
should clarify what we mean by it. It appears that I misunderstood what this
option was actually intended to do, and I also think its naming could be
improved, because it's confusing and misleading.

In 17.11, this option does *not* prevent EAL from using NUMA - it merely 
disables using libnuma to perform memory allocation. This looks like intended 
(if counter-intuitive) behavior - disabling this option will simply revert DPDK 
to working as it did before this option was introduced (i.e. best-effort 
allocation). This is why your code still works - because EAL still does 
allocate memory on socket 1, and *knows* that it's socket 1 memory. It still 
supports NUMA.

The commit message for these changes states that the actual purpose of this option
is to enable "balanced" hugepage allocation. Previously, under cgroups limitations,
DPDK would have exhausted all hugepages on the master core's socket before
attempting to allocate from other sockets, so by the time we hit the cgroups limit
on the number of hugepages, we might never have reached socket 1 and thus missed
out on pages we could have allocated, but didn't. Using libnuma solves this issue,
because now we can allocate pages on the sockets we want, instead of hoping we
won't run out of hugepages before we get the memory we need.

From 18.05 onwards, this option works differently (and arguably incorrectly). More
specifically, disabling it disallows allocations on sockets other than 0, and it
also means that EAL does not check which socket the memory *actually* came from.
So not only is allocating memory from socket 1 disabled, but allocating from
socket 0 may even get you memory from socket 1!

I'd consider this a bug.


+CC Thomas

The CONFIG_RTE_EAL_NUMA_AWARE_HUGEPAGES option is a misnomer, because the name
makes it seem as though disabling it disables NUMA support, which is not the case.

I would also argue that it is not relevant to the 18.05+ memory subsystem, and
should only work in legacy mode, because it is *impossible* to make it work
right in the new memory subsystem, and here's why:

Without libnuma, we have no way of "asking" the kernel to allocate a hugepage
on a specific socket - instead, any allocation will most likely happen on the
socket the allocation request came from. For example, if the user program's lcore
is on socket 1, an allocation requested for socket 0 will actually allocate a
page on socket 1.
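
For reference, the "asking" that libnuma enables looks roughly like this (a
simplified, hypothetical sketch, not the actual EAL code; the helper name
alloc_hugepage_on_socket() is made up for illustration, it needs -lnuma to link,
and it assumes hugepages are reserved in the default pool):

#include <numa.h>        /* numa_available(), numa_set_preferred(), numa_set_localalloc() */
#include <string.h>
#include <sys/mman.h>    /* mmap(), MAP_HUGETLB */

/* Hypothetical helper: map one hugepage and try to place it on 'socket'.
 * Without the numa_set_preferred() hint, the kernel will simply pick the
 * node of the CPU that faults the page in first. */
static void *
alloc_hugepage_on_socket(size_t sz, int socket)
{
	void *va;

	if (numa_available() >= 0)
		numa_set_preferred(socket);   /* prefer allocations from this node */

	va = mmap(NULL, sz, PROT_READ | PROT_WRITE,
		  MAP_PRIVATE | MAP_ANONYMOUS | MAP_HUGETLB, -1, 0);
	if (va != MAP_FAILED)
		memset(va, 0, sz);            /* fault the page in under the hint */

	if (numa_available() >= 0)
		numa_set_localalloc();        /* restore the default policy */

	return va == MAP_FAILED ? NULL : va;
}

Without libnuma there is simply nothing to put in place of that
numa_set_preferred() call.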

If we don't check the page's NUMA node affinity (which is what currently
happens), we get performance degradation, because we may unintentionally
allocate memory on the wrong NUMA node. If we do check for this, then allocating
memory on socket 1 from an lcore on socket 0 will almost never succeed, because
the kernel will always give us pages on socket 0.
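
On Linux, the affinity check itself can be done with get_mempolicy(): called with
MPOL_F_NODE | MPOL_F_ADDR, it reports the node the page backing a given address
was actually allocated on. A hedged sketch of how such a check could look (not the
exact EAL code; the helper name check_page_socket() is made up for illustration):

#include <numaif.h>      /* get_mempolicy(), MPOL_F_NODE, MPOL_F_ADDR */
#include <stdio.h>

/* Hypothetical check: return 0 if the page backing 'addr' really lives on
 * 'expected_socket', -1 otherwise (or if the query fails). */
static int
check_page_socket(void *addr, int expected_socket)
{
	int node = -1;

	/* With MPOL_F_NODE | MPOL_F_ADDR, get_mempolicy() stores the NUMA
	 * node of the page containing 'addr' into 'node'. */
	if (get_mempolicy(&node, NULL, 0, addr, MPOL_F_NODE | MPOL_F_ADDR) < 0)
		return -1;

	if (node != expected_socket) {
		fprintf(stderr, "page at %p is on node %d, wanted %d\n",
			addr, node, expected_socket);
		return -1;
	}
	return 0;
}

So the check itself is easy enough - the problem is what to do when it fails:
without libnuma we cannot retry on the right node, we can only give up.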

Put simply, there is no sane way to make this option work for the new memory
subsystem - IMO it should be dropped, and libnuma should be made a hard
dependency on Linux.

I agree that the new memory model cannot work without libnuma, i.e. it will
lead to unpredictable memory allocations with no respect for the requested
socket_ids. I also agree that CONFIG_RTE_EAL_NUMA_AWARE_HUGEPAGES only makes
sense for the legacy memory model.
It looks like we have no other choice than to drop the option and make
the code unconditional, i.e. have a hard dependency on libnuma.


We could probably compile this code and have a hard dependency only for
platforms with 'RTE_MAX_NUMA_NODES > 1'.

Well, as long as legacy mode stays supported, we have to keep the option. The "drop" part referred to supporting it under the new memory subsystem, not a literal drop from the config files.

As for using RTE_MAX_NUMA_NODES, I don't think it's merited. Distributions cannot deliver different DPDK versions based on the number of sockets on a particular machine - so it would have to be a hard dependency for distributions anyway (does any distribution ship DPDK without libnuma?).

For those compiling from source - are there any supported distributions which don't package libnuma? I don't see much sense in keeping libnuma optional. This is of course up to the tech board to decide, but IMO the "without libnuma it's basically broken" argument is very strong :)

--
Thanks,
Anatoly
