On 27-Nov-18 10:26 AM, Hemant Agrawal wrote:
On 11/26/2018 8:55 PM, Asaf Sinai wrote:
+CC Ilia & Sasha.
-----Original Message-----
From: Burakov, Anatoly <anatoly.bura...@intel.com>
Sent: Monday, November 26, 2018 04:57 PM
To: Ilya Maximets <i.maxim...@samsung.com>; Asaf Sinai <asa...@radware.com>;
dev@dpdk.org; Thomas Monjalon <tho...@monjalon.net>
Subject: Re: [dpdk-dev] CONFIG_RTE_EAL_NUMA_AWARE_HUGEPAGES: no difference in
memory pool allocations, when enabling/disabling this configuration
On 26-Nov-18 2:32 PM, Ilya Maximets wrote:
On 26.11.2018 17:21, Burakov, Anatoly wrote:
On 26-Nov-18 2:10 PM, Ilya Maximets wrote:
On 26.11.2018 16:42, Burakov, Anatoly wrote:
On 26-Nov-18 1:20 PM, Ilya Maximets wrote:
On 26.11.2018 16:16, Ilya Maximets wrote:
On 26.11.2018 15:50, Burakov, Anatoly wrote:
On 26-Nov-18 11:43 AM, Burakov, Anatoly wrote:
On 26-Nov-18 11:33 AM, Asaf Sinai wrote:
Hi Anatoly,
We did not check it with "testpmd", only with our application.
From the beginning, we did not enable this configuration (look at
attached files), and everything works fine.
Of course we rebuild DPDK, when we change configuration.
Please note that we use DPDK 17.11.3, maybe this is why it works fine?
Just tested with DPDK 17.11, and yes, it does work the way you are describing.
This is not intended behavior. I will look into it.
+CC author of commit introducing CONFIG_RTE_EAL_NUMA_AWARE_HUGEPAGES.
Looking at the code, i think this config option needs to be reworked and we
should clarify what we mean by this option. It appears that i've misunderstood
what this option was actually intended to do, and i also think its naming could be
improved because it's confusing and misleading.
In 17.11, this option does *not* prevent EAL from using NUMA - it merely
disables using libnuma to perform memory allocation. This looks like intended
(if counter-intuitive) behavior - disabling this option will simply revert DPDK
to working as it did before this option was introduced (i.e. best-effort
allocation). This is why your code still works - because EAL still does
allocate memory on socket 1, and *knows* that it's socket 1 memory. It still
supports NUMA.
The commit message for these changes states that the actual purpose of this option is to
enable "balanced" hugepage allocation. In case of cgroups limitations,
previously, DPDK would've exhausted all hugepages on master core's socket before
attempting to allocate from other sockets, but by the time we've reached cgroups limits
on numbers of hugepages, we might not have reached socket 1 and thus missed out on the
pages we could've allocated, but didn't. Using libnuma solves this issue, because now we
can allocate pages on sockets we want, instead of hoping we won't run out of hugepages
before we get the memory we need.
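To make the difference concrete, here is a minimal sketch in plain C. It is a toy model only, not EAL code: the two-socket layout, the function names and the page counts are all made up for illustration. It compares "take pages socket by socket until the cgroup cap is hit" with "ask for a specific number of pages on each socket".

```c
#include <assert.h>
#include <stddef.h>

#define NUM_SOCKETS 2 /* hypothetical two-socket machine */

static unsigned min_u(unsigned a, unsigned b) { return a < b ? a : b; }

/* Best-effort model: consume pages starting at socket 0 until the
 * cgroup cap is exhausted. got[] receives the pages taken per socket. */
void alloc_sequential(const unsigned avail[NUM_SOCKETS], unsigned cap,
                      unsigned got[NUM_SOCKETS])
{
    assert(avail != NULL && got != NULL);
    for (size_t s = 0; s < NUM_SOCKETS; s++) {
        got[s] = min_u(avail[s], cap);
        cap -= got[s];
    }
}

/* Balanced model: request want[s] pages on each socket, which is what
 * being able to choose the socket makes possible. */
void alloc_targeted(const unsigned avail[NUM_SOCKETS],
                    const unsigned want[NUM_SOCKETS], unsigned cap,
                    unsigned got[NUM_SOCKETS])
{
    assert(avail != NULL && want != NULL && got != NULL);
    for (size_t s = 0; s < NUM_SOCKETS; s++) {
        got[s] = min_u(min_u(want[s], avail[s]), cap);
        cap -= got[s];
    }
}
```

With 8 free pages per socket and a cap of 4, the sequential model ends up with 4 pages on socket 0 and none on socket 1, while the targeted model with a request of 2 pages per socket gets 2 on each.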
In 18.05 onwards, this option works differently (and arguably wrong). More
specifically, it disallows allocations on sockets other than 0, and it also
makes it so that EAL does not check which socket the memory *actually* came
from. So, not only is allocating memory from socket 1 disabled, but allocating
from socket 0 may even get you memory from socket 1!
I'd consider this as a bug.
+CC Thomas
The CONFIG_RTE_EAL_NUMA_AWARE_HUGEPAGES option is a misnomer, because it makes
it seem like this option disables NUMA support, which is not the case.
I would also argue that it is not relevant to 18.05+ memory subsystem, and
should only work in legacy mode, because it is *impossible* to make it work
right in the new memory subsystem, and here's why:
Without libnuma, we have no way of "asking" the kernel to allocate a hugepage
on a specific socket - instead, any allocation will most likely happen on the
socket the request came from. For example, if the user program's lcore is on
socket 1, an allocation meant for socket 0 will actually allocate a page on socket 1.
If we don't check for page's NUMA node affinity (which is what currently
happens) - we get performance degradation because we may unintentionally
allocate memory on wrong NUMA node. If we do check for this - then allocation
of memory on socket 1 from lcore on socket 0 will almost never succeed, because
kernel will always give us pages on socket 0.
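The dilemma can be written down as a small toy model in C. This is only a sketch under a simplifying assumption, namely that without libnuma the kernel always places the page on the calling lcore's node; kernel_alloc_node(), alloc_page() and struct page_result are hypothetical names, not EAL or kernel APIs.

```c
#include <assert.h>
#include <stdbool.h>

/* Assumed kernel behaviour without libnuma: the page lands on the node
 * of the calling lcore, regardless of what the caller wanted. */
int kernel_alloc_node(int caller_socket)
{
    return caller_socket;
}

struct page_result {
    bool ok;       /* did the allocation succeed from the allocator's view */
    int  recorded; /* socket the allocator believes the page is on, -1 if none */
    int  actual;   /* socket the page is physically on */
};

/* verify_node == false: trust the request, never look at the real node.
 * verify_node == true:  reject pages that landed on the wrong node. */
struct page_result alloc_page(int caller_socket, int requested_socket,
                              bool verify_node)
{
    struct page_result r;

    assert(caller_socket >= 0 && requested_socket >= 0);
    r.actual = kernel_alloc_node(caller_socket);
    if (verify_node && r.actual != requested_socket) {
        r.ok = false;
        r.recorded = -1;
    } else {
        r.ok = true;
        r.recorded = requested_socket;
    }
    return r;
}
```

In this model, alloc_page(1, 0, false) succeeds but records socket 0 for a page that is physically on socket 1 (the silent performance problem), while alloc_page(0, 1, true) fails every time (the "almost never succeeds" case).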
Put simply, there is no sane way to make this option work for the new memory
subsystem - IMO it should be dropped, and libnuma should be made a hard
dependency on Linux.
I agree that the new memory model could not work without libnuma,
i.e. it would lead to unpredictable memory allocations with no
respect for the requested socket_ids. I also agree that
CONFIG_RTE_EAL_NUMA_AWARE_HUGEPAGES is only sane for a legacy memory model.
It looks like we have no other choice than just drop the option
and make the code unconditional, i.e. have hard dependency on libnuma.
We, probably, could compile this code and have hard dependency
only for platforms with 'RTE_MAX_NUMA_NODES > 1'.
Well, as long as legacy mode stays supported, we have to keep the option. The
"drop" part was referring to supporting it under the new memory system, not a
literal drop from config files.
The option was introduced because we didn't want to introduce the
new hard dependency. Since we'll have it anyway, I'm not sure if
keeping the option for legacy mode makes any sense.
Oh yes, you're right. Drop it is!
As for using RTE_MAX_NUMA_NODES, i don't think it's merited. Distributions
cannot deliver different DPDK versions based on the number of sockets on a
particular machine - so it would have to be a hard dependency for distributions
anyway (does any distribution ship DPDK without libnuma?).
At least ARMv7 builds commonly do not ship a libnuma package.
Do you mean libnuma builds for ARMv7 are not available? Or do you mean the
libnuma package is not installed by default?
If it's the latter, then i believe it's not installed by default anywhere, but
if using distribution version of DPDK, libnuma will be taken care of via
package manager. Presumably building from source can be taken care of with
pkg-config/meson.
Or do you mean ARMv7 does not have libnuma for their arch at all, in any distro?
libnuma builds for ARMv7 are not available in most of the distros. I
didn't check all, but here are the results for Ubuntu:
https://packages.ubuntu.com/search?suite=bionic&arch=armhf&searchon=names&keywords=libnuma
You may see that Ubuntu 18.04 (bionic) has no libnuma package for
'armhf' and also 'powerpc' platforms.
That's a difficulty. Do these platforms support NUMA? In other words, could we
replace this flag with just outright disabling NUMA support?
Many platforms don't support NUMA, so they don't really need libnuma.
Mandating libnuma will also break several things:
- cross build for ARM on x86 - which is among the preferred build methods
for many in the ARM community.
- many of the embedded SoCs are without NUMA support, and they use a smaller
rootfs (e.g. Yocto). It will be a burden to add libnuma there.
OK, point taken.
So, the alternative would be to have the ability to outright disable
NUMA support (either with a new option, or reworking this one - i would
prefer a new one, since this one is confusingly named). Meaning, report
all cores as socket 0, report all hardware as socket 0, report all
memory as socket 0 and never care about NUMA nodes anywhere.
Would that work? E.g. by default, make libnuma a hard dependency on x86
Linux (but allow disabling it), and disable it everywhere else?
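As a rough sketch of what "never care about NUMA nodes" could look like, assuming a single switch that every socket lookup goes through (reported_socket() and numa_enabled are hypothetical names; no option name was settled in this thread, and a real implementation would more likely be a build-time option than a runtime flag):

```c
#include <assert.h>
#include <stdbool.h>

/* hw_node is the node the hardware or kernel reports for a core, device
 * or memory segment. With NUMA support disabled, everything is reported
 * as socket 0, so no caller ever sees a non-zero socket id. */
int reported_socket(int hw_node, bool numa_enabled)
{
    assert(hw_node >= 0);
    return numa_enabled ? hw_node : 0;
}
```

With NUMA disabled, reported_socket(1, false) gives 0, so cores, hardware and memory all appear to live on the same socket; with NUMA enabled the real node is passed through unchanged.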
For those compiling from source - are there any supported
distributions which don't package libnuma? I don't see much sense
in keeping libnuma optional. This is of course up to the tech
board to decide, but IMO the "without libnuma it's basically
broken" argument is very strong :)
--
Thanks,
Anatoly