+CC Ilia & Sasha.

-----Original Message-----
From: Burakov, Anatoly <anatoly.bura...@intel.com>
Sent: Monday, November 26, 2018 04:57 PM
To: Ilya Maximets <i.maxim...@samsung.com>; Asaf Sinai <asa...@radware.com>; dev@dpdk.org; Thomas Monjalon <tho...@monjalon.net>
Subject: Re: [dpdk-dev] CONFIG_RTE_EAL_NUMA_AWARE_HUGEPAGES: no difference in memory pool allocations, when enabling/disabling this configuration

On 26-Nov-18 2:32 PM, Ilya Maximets wrote:
> On 26.11.2018 17:21, Burakov, Anatoly wrote:
>> On 26-Nov-18 2:10 PM, Ilya Maximets wrote:
>>> On 26.11.2018 16:42, Burakov, Anatoly wrote:
>>>> On 26-Nov-18 1:20 PM, Ilya Maximets wrote:
>>>>> On 26.11.2018 16:16, Ilya Maximets wrote:
>>>>>> On 26.11.2018 15:50, Burakov, Anatoly wrote:
>>>>>>> On 26-Nov-18 11:43 AM, Burakov, Anatoly wrote:
>>>>>>>> On 26-Nov-18 11:33 AM, Asaf Sinai wrote:
>>>>>>>>> Hi Anatoly,
>>>>>>>>>
>>>>>>>>> We did not check it with "testpmd", only with our application.
>>>>>>>>> From the beginning, we did not enable this configuration (look at
>>>>>>>>> attached files), and everything works fine.
>>>>>>>>> Of course we rebuild DPDK, when we change configuration.
>>>>>>>>> Please note that we use DPDK 17.11.3, maybe this is why it works fine?
>>>>>>>>
>>>>>>>> Just tested with DPDK 17.11, and yes, it does work the way you are
>>>>>>>> describing. This is not intended behavior. I will look into it.
>>>>>>>>
>>>>>>>
>>>>>>> +CC author of commit introducing CONFIG_RTE_EAL_NUMA_AWARE_HUGEPAGES.
>>>>>>>
>>>>>>> Looking at the code, I think this config option needs to be reworked
>>>>>>> and we should clarify what we mean by this option. It appears that I've
>>>>>>> misunderstood what this option was actually intended to do, and I also
>>>>>>> think its naming could be improved because it's confusing and
>>>>>>> misleading.
>>>>>>>
>>>>>>> In 17.11, this option does *not* prevent EAL from using NUMA - it
>>>>>>> merely disables using libnuma to perform memory allocation. This looks
>>>>>>> like intended (if counter-intuitive) behavior - disabling this option
>>>>>>> will simply revert DPDK to working as it did before this option was
>>>>>>> introduced (i.e. best-effort allocation). This is why your code still
>>>>>>> works - because EAL still does allocate memory on socket 1, and *knows*
>>>>>>> that it's socket 1 memory. It still supports NUMA.
>>>>>>>
>>>>>>> The commit message for these changes states that the actual purpose of
>>>>>>> this option is to enable "balanced" hugepage allocation. In case of
>>>>>>> cgroups limitations, previously, DPDK would've exhausted all hugepages
>>>>>>> on master core's socket before attempting to allocate from other
>>>>>>> sockets, but by the time we've reached cgroups limits on numbers of
>>>>>>> hugepages, we might not have reached socket 1 and thus missed out on
>>>>>>> the pages we could've allocated, but didn't. Using libnuma solves this
>>>>>>> issue, because now we can allocate pages on sockets we want, instead of
>>>>>>> hoping we won't run out of hugepages before we get the memory we need.
>>>>>>>
>>>>>>> In 18.05 onwards, this option works differently (and arguably
>>>>>>> incorrectly). More specifically, it disallows allocations on sockets
>>>>>>> other than 0, and it also makes it so that EAL does not check which
>>>>>>> socket the memory *actually* came from. So, not only is allocating
>>>>>>> memory from socket 1 disabled, but allocating from socket 0 may even
>>>>>>> get you memory from socket 1!
>>>>>>
>>>>>> I'd consider this as a bug.
>>>>>>
>>>>>>>
>>>>>>> +CC Thomas
>>>>>>>
>>>>>>> The CONFIG_RTE_EAL_NUMA_AWARE_HUGEPAGES option is a misnomer, because
>>>>>>> it makes it seem like turning this option off disables NUMA support,
>>>>>>> which is not the case.
>>>>>>>
>>>>>>> I would also argue that it is not relevant to 18.05+ memory subsystem,
>>>>>>> and should only work in legacy mode, because it is *impossible* to make
>>>>>>> it work right in the new memory subsystem, and here's why:
>>>>>>>
>>>>>>> Without libnuma, we have no way of "asking" the kernel to allocate a
>>>>>>> hugepage on a specific socket - instead, any allocation will most
>>>>>>> likely happen on the socket from which the allocation request was
>>>>>>> made. For example, if user program's lcore is on socket 1, allocation
>>>>>>> on socket 0 will actually allocate a page on socket 1.
>>>>>>>
>>>>>>> If we don't check for page's NUMA node affinity (which is what
>>>>>>> currently happens) - we get performance degradation because we may
>>>>>>> unintentionally allocate memory on wrong NUMA node. If we do check for
>>>>>>> this - then allocation of memory on socket 1 from lcore on socket 0
>>>>>>> will almost never succeed, because kernel will always give us pages on
>>>>>>> socket 0.
>>>>>>>
>>>>>>> Put simply, there is no sane way to make this option work for the
>>>>>>> new memory subsystem - IMO it should be dropped, and libnuma should be
>>>>>>> made a hard dependency on Linux.
>>>>>>
>>>>>> I agree that new memory model could not work without libnuma,
>>>>>> i.e. will lead to unpredictable memory allocations with no
>>>>>> respect for the requested socket_ids. I also agree that
>>>>>> CONFIG_RTE_EAL_NUMA_AWARE_HUGEPAGES is only sane for a legacy memory
>>>>>> model.
>>>>>> It looks like we have no other choice than to just drop the option
>>>>>> and make the code unconditional, i.e. have a hard dependency on libnuma.
>>>>>>
>>>>>
>>>>> We, probably, could compile this code and have a hard dependency
>>>>> only for platforms with 'RTE_MAX_NUMA_NODES > 1'.
>>>>
>>>> Well, as long as legacy mode stays supported, we have to keep the option.
>>>> The "drop" part was referring to supporting it under the new memory
>>>> system, not a literal drop from config files.
>>>
>>> The option was introduced because we didn't want to introduce the
>>> new hard dependency. Since we'll have it anyway, I'm not sure if
>>> keeping the option for legacy mode makes any sense.
>>
>> Oh yes, you're right. Drop it is!
>>
>>>
>>>>
>>>> As for using RTE_MAX_NUMA_NODES, I don't think it's merited. Distributions
>>>> cannot deliver different DPDK versions based on the number of sockets on a
>>>> particular machine - so it would have to be a hard dependency for
>>>> distributions anyway (does any distribution ship DPDK without libnuma?).
>>>
>>> At least ARMv7 builds commonly do not ship a libnuma package.
>>
>> Do you mean libnuma builds for ARMv7 are not available? Or do you mean the
>> libnuma package is not installed by default?
>>
>> If it's the latter, then I believe it's not installed by default anywhere,
>> but if using distribution version of DPDK, libnuma will be taken care of via
>> package manager. Presumably building from source can be taken care of with
>> pkg-config/meson.
>>
>> Or do you mean ARMv7 does not have libnuma for their arch at all, in any
>> distro?
>
> libnuma builds for ARMv7 are not available in most of the distros. I
> didn't check all, but here are the results for Ubuntu:
>
> https://packages.ubuntu.com/search?suite=bionic&arch=armhf&searchon=names&keywords=libnuma
>
> You may see that Ubuntu 18.04 (bionic) has no libnuma package for
> 'armhf' and also 'powerpc' platforms.
>

That's a difficulty. Do these platforms support NUMA? In other words, could we
replace this flag with just outright disabling NUMA support?

>>
>>>
>>>>
>>>> For those compiling from source - are there any supported
>>>> distributions which don't package libnuma? I don't see much sense
>>>> in keeping libnuma optional, IMO. This is of course up to the tech
>>>> board to decide, but the "without libnuma it's basically
>>>> broken" argument is very strong in my opinion :)
>>>>
>>>
>>
>>
>

--
Thanks,
Anatoly
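
For readers who want the cgroups scenario from the 15:50 message in concrete terms, here is a toy model in C. It is only a sketch under stated assumptions and is not EAL code: the function names (model_best_effort, model_balanced) and the page-counting model are hypothetical and exist only for illustration. It assumes that best-effort mapping drains one socket completely before moving to the next, and that the cgroup limit is a simple cap on the total number of mapped pages.

```c
#include <stddef.h>

/* Smaller of two page counts. */
static size_t min_sz(size_t a, size_t b)
{
    return a < b ? a : b;
}

/*
 * Hypothetical model of best-effort allocation (no libnuma).
 * Pages are mapped socket by socket, starting at the master core's
 * socket, until the cgroup limit is hit. Only afterwards do we learn
 * which socket each page belongs to and keep what was wanted there.
 */
void model_best_effort(const size_t *free_pages, const size_t *want,
                       size_t n_sockets, size_t start_socket,
                       size_t cgroup_limit, size_t *got)
{
    size_t budget = cgroup_limit;

    for (size_t i = 0; i < n_sockets; i++) {
        size_t s = (start_socket + i) % n_sockets;
        /* Everything free on this socket is mapped, budget permitting. */
        size_t mapped = min_sz(free_pages[s], budget);

        budget -= mapped;
        got[s] = min_sz(mapped, want[s]);
    }
}

/*
 * Hypothetical model of balanced allocation (with libnuma).
 * Each socket is asked only for the pages wanted there, so the
 * cgroup budget is not spent on pages that will be thrown away.
 */
void model_balanced(const size_t *free_pages, const size_t *want,
                    size_t n_sockets, size_t cgroup_limit, size_t *got)
{
    size_t budget = cgroup_limit;

    for (size_t s = 0; s < n_sockets; s++) {
        size_t mapped = min_sz(min_sz(want[s], free_pages[s]), budget);

        budget -= mapped;
        got[s] = mapped;
    }
}
```

With 8 free pages on each of two sockets, a cgroup limit of 4 pages, and 2 pages wanted per socket, the best-effort model starting on socket 0 ends up with 2 pages on socket 0 and none on socket 1, while the balanced model gets 2 pages on each socket.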