On 26.11.2018 16:16, Ilya Maximets wrote:
> On 26.11.2018 15:50, Burakov, Anatoly wrote:
>> On 26-Nov-18 11:43 AM, Burakov, Anatoly wrote:
>>> On 26-Nov-18 11:33 AM, Asaf Sinai wrote:
>>>> Hi Anatoly,
>>>>
>>>> We did not check it with "testpmd", only with our application.
>>>> From the beginning, we did not enable this configuration (look at 
>>>> attached files), and everything works fine.
>>>> Of course, we rebuild DPDK whenever we change the configuration.
>>>> Please note that we use DPDK 17.11.3, maybe this is why it works fine?
>>>
>>> Just tested with DPDK 17.11, and yes, it does work the way you are 
>>> describing. This is not intended behavior. I will look into it.
>>>
>>
>> +CC author of commit introducing CONFIG_RTE_EAL_NUMA_AWARE_HUGEPAGES.
>>
>> Looking at the code, I think this config option needs to be reworked and we 
>> should clarify what we mean by this option. It appears that I've 
>> misunderstood what this option was actually intended to do, and I also think 
>> its naming could be improved because it's confusing and misleading.
>>
>> In 17.11, this option does *not* prevent EAL from using NUMA - it merely 
>> disables using libnuma to perform memory allocation. This looks like 
>> intended (if counter-intuitive) behavior - disabling this option will simply 
>> revert DPDK to working as it did before this option was introduced (i.e. 
>> best-effort allocation). This is why your code still works - because EAL 
>> still does allocate memory on socket 1, and *knows* that it's socket 1 
>> memory. It still supports NUMA.
>>
>> The commit message for these changes states that the actual purpose of this 
>> option is to enable "balanced" hugepage allocation. In case of cgroups 
>> limitations, previously, DPDK would've exhausted all hugepages on master 
>> core's socket before attempting to allocate from other sockets, but by the 
>> time we've reached cgroups limits on numbers of hugepages, we might not have 
>> reached socket 1 and thus missed out on the pages we could've allocated, but 
>> didn't. Using libnuma solves this issue, because now we can allocate pages 
>> on sockets we want, instead of hoping we won't run out of hugepages before 
>> we get the memory we need.
>>
>> In 18.05 onwards, this option works differently (and arguably wrong). More 
>> specifically, it disallows allocations on sockets other than 0, and it also 
>> makes it so that EAL does not check which socket the memory *actually* came 
>> from. So, not only is allocating memory from socket 1 disabled, but 
>> allocating from socket 0 may even get you memory from socket 1!
> 
> I'd consider this a bug.
> 
>>
>> +CC Thomas
>>
>> The CONFIG_RTE_EAL_NUMA_AWARE_HUGEPAGES option is a misnomer, because it 
>> makes it seem like this option disables NUMA support, which is not the case.
>>
>> I would also argue that it is not relevant to 18.05+ memory subsystem, and 
>> should only work in legacy mode, because it is *impossible* to make it work 
>> right in the new memory subsystem, and here's why:
>>
>> Without libnuma, we have no way of "asking" the kernel to allocate a 
>> hugepage on a specific socket - instead, any allocation will most likely 
>> happen on the socket from which the allocation request came. For example, if 
>> the user program's lcore is on socket 1, allocation on socket 0 will actually 
>> allocate a page on socket 1.
>>
>> If we don't check for page's NUMA node affinity (which is what currently 
>> happens) - we get performance degradation because we may unintentionally 
>> allocate memory on wrong NUMA node. If we do check for this - then 
>> allocation of memory on socket 1 from lcore on socket 0 will almost never 
>> succeed, because kernel will always give us pages on socket 0.
>>
>> Put simply, there is no sane way to make this option work for the new 
>> memory subsystem - IMO it should be dropped, and libnuma should be made a 
>> hard dependency on Linux.
> 
> I agree that the new memory model cannot work without libnuma, i.e. it will
> lead to unpredictable memory allocations with no respect for the requested
> socket_id's. I also agree that CONFIG_RTE_EAL_NUMA_AWARE_HUGEPAGES is only
> sane for a legacy memory model.
> It looks like we have no other choice than to just drop the option and make
> the code unconditional, i.e. have a hard dependency on libnuma.
> 

We could probably compile this code, and have the hard dependency, only for
platforms with 'RTE_MAX_NUMA_NODES > 1'.
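
A rough sketch of what such compile-time gating might look like. This is
only an illustration, not the actual EAL code: EAL_USES_LIBNUMA and
eal_target_socket() are hypothetical names, the fallback value of
RTE_MAX_NUMA_NODES is an assumption for builds where the DPDK config is
not present, and the libnuma call itself is left as a comment so the
sketch builds without the library.

```c
#include <assert.h>

/* Assumption: normally supplied by the DPDK build configuration. */
#ifndef RTE_MAX_NUMA_NODES
#define RTE_MAX_NUMA_NODES 8
#endif

/* Hypothetical switch: require libnuma only on multi-node builds. */
#if RTE_MAX_NUMA_NODES > 1
#define EAL_USES_LIBNUMA 1
#else
#define EAL_USES_LIBNUMA 0
#endif

/*
 * Hypothetical helper: report which socket an allocation request would
 * be bound to. On a multi-node build the real code would set a memory
 * policy for the requested socket before mapping the hugepage; on a
 * single-node build there is nothing to choose, so socket 0 is used.
 */
static int
eal_target_socket(int requested_socket)
{
    assert(requested_socket >= 0 && requested_socket < RTE_MAX_NUMA_NODES);
#if EAL_USES_LIBNUMA
    /* The libnuma-based binding to requested_socket would go here. */
    return requested_socket;
#else
    return 0;
#endif
}
```

With this shape, a single-node platform would never reference libnuma at
all, while every multi-node build would always take the libnuma path.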

> Best regards, Ilya Maximets.
