On 26.11.2018 16:16, Ilya Maximets wrote:
> On 26.11.2018 15:50, Burakov, Anatoly wrote:
>> On 26-Nov-18 11:43 AM, Burakov, Anatoly wrote:
>>> On 26-Nov-18 11:33 AM, Asaf Sinai wrote:
>>>> Hi Anatoly,
>>>>
>>>> We did not check it with "testpmd", only with our application.
>>>> From the beginning, we did not enable this configuration (look at
>>>> attached files), and everything works fine.
>>>> Of course, we rebuild DPDK when we change the configuration.
>>>> Please note that we use DPDK 17.11.3; maybe this is why it works fine?
>>>
>>> Just tested with DPDK 17.11, and yes, it does work the way you are
>>> describing. This is not intended behavior. I will look into it.
>>>
>>
>> +CC author of commit introducing CONFIG_RTE_EAL_NUMA_AWARE_HUGEPAGES.
>>
>> Looking at the code, I think this config option needs to be reworked and we
>> should clarify what we mean by this option. It appears that I've
>> misunderstood what this option was actually intended to do, and I also think
>> its naming could be improved because it's confusing and misleading.
>>
>> In 17.11, this option does *not* prevent EAL from using NUMA - it merely
>> disables using libnuma to perform memory allocation. This looks like
>> intended (if counter-intuitive) behavior - disabling this option will simply
>> revert DPDK to working as it did before this option was introduced (i.e.
>> best-effort allocation). This is why your code still works - because EAL
>> still does allocate memory on socket 1, and *knows* that it's socket 1
>> memory. It still supports NUMA.
>>
>> The commit message for these changes states that the actual purpose of this
>> option is to enable "balanced" hugepage allocation.
In case of cgroups >> limitations, previously, DPDK would've exhausted all hugepages on master >> core's socket before attempting to allocate from other sockets, but by the >> time we've reached cgroups limits on numbers of hugepages, we might not have >> reached socket 1 and thus missed out on the pages we could've allocated, but >> didn't. Using libnuma solves this issue, because now we can allocate pages >> on sockets we want, instead of hoping we won't run out of hugepages before >> we get the memory we need. >> >> In 18.05 onwards, this option works differently (and arguably wrong). More >> specifically, it disallows allocations on sockets other than 0, and it also >> makes it so that EAL does not check which socket the memory *actually* came >> from. So, not only allocating memory from socket 1 is disabled, but >> allocating from socket 0 may even get you memory from socket 1! > > I'd consider this as a bug. > >> >> +CC Thomas >> >> The CONFIG_RTE_EAL_NUMA_AWARE_HUGEPAGES option is a misnomer, because it >> makes it seem like this option disables NUMA support, which is not the case. >> >> I would also argue that it is not relevant to 18.05+ memory subsystem, and >> should only work in legacy mode, because it is *impossible* to make it work >> right in the new memory subsystem, and here's why: >> >> Without libnuma, we have no way of "asking" the kernel to allocate a >> hugepage on a specific socket - instead, any allocation will most likely >> happen on socket from which the allocation came from. For example, if user >> program's lcore is on socket 1, allocation on socket 0 will actually >> allocate a page on socket 1. >> >> If we don't check for page's NUMA node affinity (which is what currently >> happens) - we get performance degradation because we may unintentionally >> allocate memory on wrong NUMA node. 
>> If we do check for this, then allocating memory on socket 1 from an lcore
>> on socket 0 will almost never succeed, because the kernel will always give
>> us pages on socket 0.
>>
>> Put simply, there is no sane way to make this option work for the new
>> memory subsystem - IMO it should be dropped, and libnuma should be made a
>> hard dependency on Linux.
>
> I agree that the new memory model cannot work without libnuma, i.e. it
> would lead to unpredictable memory allocations with no respect to the
> requested socket_ids. I also agree that CONFIG_RTE_EAL_NUMA_AWARE_HUGEPAGES
> is only sane for the legacy memory model.
> It looks like we have no choice other than to drop the option and make
> the code unconditional, i.e. have a hard dependency on libnuma.
>
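The two operations this part of the thread turns on, asking the kernel to place a page on a given node and then checking which node the page actually landed on, can be sketched in C as follows. This is an illustration under stated assumptions, not EAL code: it issues the raw mbind(2) and get_mempolicy(2) system calls (the calls that libnuma's numaif.h functions wrap) on an ordinary anonymous page instead of a hugetlbfs-backed hugepage, and the SKETCH_* macros and helper names are hypothetical.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE                     /* for syscall() and MAP_ANONYMOUS */
#endif
#include <errno.h>
#include <stddef.h>
#include <string.h>
#include <sys/mman.h>
#include <sys/syscall.h>
#include <unistd.h>

/* Linux ABI values, normally taken from <numaif.h> or <linux/mempolicy.h>;
 * redefined here (assumption) so the sketch needs no libnuma headers. */
#define SKETCH_MPOL_BIND   2UL          /* allocate only from the given nodes */
#define SKETCH_MPOL_F_NODE (1UL << 0)   /* return a node id, not a policy */
#define SKETCH_MPOL_F_ADDR (1UL << 1)   /* ...for the page at the given address */

/* Which NUMA node backs the already-faulted-in page at addr?
 * Returns the node id, or -errno on failure. */
static int page_numa_node(void *addr)
{
    int node = -1;
    if (syscall(SYS_get_mempolicy, &node, (unsigned long *)NULL, 0UL, addr,
                SKETCH_MPOL_F_NODE | SKETCH_MPOL_F_ADDR) != 0)
        return -errno;
    return node;
}

/* Map one page, ask for it to be placed on want_node, fault it in and check
 * where it landed. Returns 0 if it is on want_node, 1 if it is elsewhere,
 * or -errno if the kernel refused (e.g. no such node, or EPERM in a sandbox). */
static int alloc_on_node_and_verify(int want_node)
{
    unsigned long mask;
    size_t len = (size_t)sysconf(_SC_PAGESIZE);
    int rc;

    /* this single-word nodemask can only describe nodes 0..62 */
    if (want_node < 0 || want_node >= (int)(8 * sizeof(mask)) - 1)
        return -EINVAL;

    void *p = mmap(NULL, len, PROT_READ | PROT_WRITE,
                   MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (p == MAP_FAILED)
        return -errno;

    mask = 1UL << want_node;
    /* The "asking" step: without a memory policy, the kernel would normally
     * place the page on the node of the CPU that first touches it. */
    if (syscall(SYS_mbind, p, (unsigned long)len, SKETCH_MPOL_BIND, &mask,
                (unsigned long)(8 * sizeof(mask)), 0UL) != 0) {
        rc = -errno;
    } else {
        memset(p, 0, len);              /* first touch allocates the page */
        int got = page_numa_node(p);    /* the "checking" step */
        rc = got < 0 ? got : (got == want_node ? 0 : 1);
    }
    munmap(p, len);
    return rc;
}
```

Removing the mbind() call leaves only the check, which reproduces the dilemma described above: the page is placed on the node of the CPU that first touches it, so a request for the other socket is then rejected. In a sandbox that blocks these system calls, the helpers simply return a negative errno.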
We could probably compile this code, and have the hard dependency on libnuma,
only for platforms with 'RTE_MAX_NUMA_NODES > 1'.

> Best regards, Ilya Maximets.
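The build-time split suggested above could look roughly like the following C preprocessor sketch. RTE_MAX_NUMA_NODES is the existing DPDK build setting; SKETCH_NEED_LIBNUMA and sketch_page_ok() are hypothetical names, and the fallback value of 8 is an assumption that only lets the snippet compile on its own.

```c
/* Normally supplied by the DPDK build configuration; this fallback is an
 * assumption that only lets the sketch compile on its own. */
#ifndef RTE_MAX_NUMA_NODES
#define RTE_MAX_NUMA_NODES 8
#endif

#if RTE_MAX_NUMA_NODES > 1
/* Multi-socket build: the libnuma-based allocation and page-affinity check
 * would be compiled in, making libnuma a hard dependency. */
#define SKETCH_NEED_LIBNUMA 1
#else
/* Single-socket build: there is only socket 0, so no NUMA code or libnuma. */
#define SKETCH_NEED_LIBNUMA 0
#endif

/* Hypothetical helper: accept a page for a request on requested_socket,
 * given the socket the kernel reports the page actually landed on. */
static int sketch_page_ok(int requested_socket, int actual_socket)
{
#if SKETCH_NEED_LIBNUMA
    return requested_socket == actual_socket;
#else
    (void)actual_socket;                /* nothing to query: all memory is socket 0 */
    return requested_socket == 0;
#endif
}
```

Compiling with -DRTE_MAX_NUMA_NODES=1 selects the single-socket branch, in which no NUMA query (and hence no libnuma) is needed.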