Hi all. I wanted to ask, just to clarify the current status: will this patch be included in the current release (it is acked by the maintainer), with me upgrading it to the hybrid logic afterwards, or should I just prepare a v3 with the hybrid logic for 17.08?
Best regards, Ilya Maximets.

On 27.03.2017 17:43, Ilya Maximets wrote:
> On 27.03.2017 16:01, Sergio Gonzalez Monroy wrote:
>> On 09/03/2017 12:57, Ilya Maximets wrote:
>>> On 08.03.2017 16:46, Sergio Gonzalez Monroy wrote:
>>>> Hi Ilya,
>>>>
>>>> I have done similar tests and, as you already pointed out, 'numactl
>>>> --interleave' does not seem to work as expected.
>>>> I have also checked that the issue can be reproduced with a quota limit
>>>> on the hugetlbfs mount point.
>>>>
>>>> I would be inclined towards *adding libnuma as a dependency* to DPDK to
>>>> make memory allocation a bit more reliable.
>>>>
>>>> Currently, at a high level, hugepages per numa node are handled like this:
>>>> 1) Try to map all free hugepages. The total number of mapped hugepages
>>>> depends on whether there were any limits, such as cgroups or a quota on
>>>> the mount point.
>>>> 2) Find out the numa node of each hugepage.
>>>> 3) Check if we have enough hugepages for the requested memory in each
>>>> numa socket/node.
>>>>
>>>> Using libnuma, we could try to allocate hugepages per numa node:
>>>> 1) Try to map as many hugepages as possible from numa 0.
>>>> 2) Check if we have enough hugepages for the requested memory in numa 0.
>>>> 3) Try to map as many hugepages as possible from numa 1.
>>>> 4) Check if we have enough hugepages for the requested memory in numa 1.
>>>>
>>>> This approach would improve failing scenarios caused by limits, but it
>>>> would still not fix issues regarding non-contiguous hugepages (worst
>>>> case, each hugepage is a memseg).
>>>> The non-contiguous hugepage issues are not as critical now that mempools
>>>> can span multiple memsegs/hugepages, but it is still a problem for any
>>>> other library requiring big chunks of memory.
>>>>
>>>> Potentially, if we were to add an option such as 'iommu-only' for when
>>>> all devices are bound to vfio-pci, we could have a reliable way to
>>>> allocate hugepages by just requesting the number of pages from each
>>>> numa node.
>>>>
>>>> Thoughts?
>>> Hi Sergio,
>>>
>>> Thanks for your attention to this.
>>>
>>> For now, as we have some issues with non-contiguous
>>> hugepages, I'm thinking about the following hybrid schema:
>>> 1) Allocate essential hugepages:
>>>    1.1) Allocate only as many hugepages from numa N as needed
>>>         to fit the requested memory for this numa node.
>>>    1.2) Repeat 1.1 for all numa nodes.
>>> 2) Try to map all remaining free hugepages in a round-robin
>>>    fashion, like in this patch.
>>> 3) Sort the pages and choose the most suitable ones.
>>>
>>> This solution should decrease the number of issues connected with
>>> non-contiguous memory.
>>
>> Sorry for the late reply; I was hoping for more comments from the community.
>>
>> IMHO this should be the default behavior, which means no config option and
>> libnuma as an EAL dependency.
>> I think your proposal is good; could you consider implementing such an
>> approach in the next release?
>
> Sure, I can implement this for the 17.08 release.
>
>>>
>>>> On 06/03/2017 09:34, Ilya Maximets wrote:
>>>>> Hi all.
>>>>>
>>>>> So, what about this change?
>>>>>
>>>>> Best regards, Ilya Maximets.
>>>>>
>>>>> On 16.02.2017 16:01, Ilya Maximets wrote:
>>>>>> Currently, EAL allocates hugepages one by one, not paying
>>>>>> attention to which NUMA node the allocation was done from.
>>>>>>
>>>>>> Such behaviour leads to allocation failure if the number of
>>>>>> hugepages available to the application is limited by cgroups
>>>>>> or hugetlbfs and memory is requested not only from the first
>>>>>> socket.
>>>>>>
>>>>>> Example:
>>>>>> # 90 x 1GB hugepages available in the system
>>>>>>
>>>>>> cgcreate -g hugetlb:/test
>>>>>> # Limit to 32GB of hugepages
>>>>>> cgset -r hugetlb.1GB.limit_in_bytes=34359738368 test
>>>>>> # Request 4GB from each of 2 sockets
>>>>>> cgexec -g hugetlb:test testpmd --socket-mem=4096,4096 ...
>>>>>>
>>>>>> EAL: SIGBUS: Cannot mmap more hugepages of size 1024 MB
>>>>>> EAL: 32 not 90 hugepages of size 1024 MB allocated
>>>>>> EAL: Not enough memory available on socket 1!
>>>>>>      Requested: 4096MB, available: 0MB
>>>>>> PANIC in rte_eal_init():
>>>>>> Cannot init memory
>>>>>>
>>>>>> This happens because all allocated pages are on socket 0.
>>>>>>
>>>>>> Fix this issue by setting the mempolicy MPOL_PREFERRED for each
>>>>>> hugepage to one of the requested nodes in a round-robin fashion.
>>>>>> In this case all allocated pages will be fairly distributed
>>>>>> between all requested nodes.
>>>>>>
>>>>>> A new config option, RTE_LIBRTE_EAL_NUMA_AWARE_HUGEPAGES, is
>>>>>> introduced and disabled by default because of the external
>>>>>> dependency on libnuma.
>>>>>>
>>>>>> Cc: <sta...@dpdk.org>
>>>>>> Fixes: 77988fc08dc5 ("mem: fix allocating all free hugepages")
>>>>>>
>>>>>> Signed-off-by: Ilya Maximets <i.maxim...@samsung.com>
>>>>>> ---
>>>>>>  config/common_base                       |  1 +
>>>>>>  lib/librte_eal/Makefile                  |  4 ++
>>>>>>  lib/librte_eal/linuxapp/eal/eal_memory.c | 66 ++++++++++++++++++++++++++++++++
>>>>>>  mk/rte.app.mk                            |  3 ++
>>>>>>  4 files changed, 74 insertions(+)
>>
>> Acked-by: Sergio Gonzalez Monroy <sergio.gonzalez.mon...@intel.com>
>
> Thanks.
>
> Best regards, Ilya Maximets.
>
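
[Editor's note] For readers not familiar with the mechanism discussed above, the technique both in the commit message and in the per-node/hybrid proposals boils down to biasing each hugepage mapping toward a chosen NUMA node before the page is faulted in. Below is a minimal standalone sketch of that idea using libnuma's set_mempolicy() wrapper; it is not the DPDK patch itself. The file name, the helper map_hugepages_round_robin(), the /dev/hugepages mount point, the 1GB page size, and the example node list in main() are assumptions made purely for illustration.

/*
 * hp_round_robin.c - illustrative sketch only, NOT the actual DPDK patch.
 *
 * Prefer a different NUMA node (round-robin over the requested nodes)
 * for each hugepage mapping by setting MPOL_PREFERRED before the page
 * is faulted in.  Build with:  gcc -Wall hp_round_robin.c -lnuma
 *
 * Assumptions: a hugetlbfs mount at /dev/hugepages, 1GB pages,
 * node numbers below 64.
 */
#include <fcntl.h>
#include <numa.h>
#include <numaif.h>
#include <stdio.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

#define HUGEPAGE_SZ (1024ULL * 1024 * 1024)  /* 1GB pages */

static int map_hugepages_round_robin(const int *nodes, int n_nodes, int n_pages)
{
    int i, mapped = 0;

    if (numa_available() < 0) {
        fprintf(stderr, "NUMA is not available on this system\n");
        return -1;
    }

    for (i = 0; i < n_pages; i++) {
        int node = nodes[i % n_nodes];
        unsigned long nodemask = 1UL << node;
        char path[64];
        void *va;
        int fd;

        /* Prefer (but do not force) allocation from the chosen node. */
        if (set_mempolicy(MPOL_PREFERRED, &nodemask,
                          sizeof(nodemask) * 8) < 0) {
            perror("set_mempolicy");
            break;
        }

        snprintf(path, sizeof(path), "/dev/hugepages/rr_page_%d", i);
        fd = open(path, O_CREAT | O_RDWR, 0600);
        if (fd < 0) {
            perror("open");
            break;
        }

        va = mmap(NULL, HUGEPAGE_SZ, PROT_READ | PROT_WRITE,
                  MAP_SHARED, fd, 0);
        close(fd);
        if (va == MAP_FAILED) {
            /* Out of free pages, or a cgroup/quota limit was hit. */
            unlink(path);
            break;
        }

        /* Touch the page so it is actually allocated under the policy. */
        memset(va, 0, HUGEPAGE_SZ);
        mapped++;
    }

    /* Restore the default policy for the rest of the process. */
    set_mempolicy(MPOL_DEFAULT, NULL, 0);
    return mapped;
}

int main(void)
{
    int nodes[] = { 0, 1 };  /* sockets requested, e.g. via --socket-mem */
    int n;

    /* Try to spread 8 x 1GB pages fairly across nodes 0 and 1. */
    n = map_hugepages_round_robin(nodes, 2, 8);
    printf("mapped %d hugepages\n", n);
    return 0;
}

The hybrid schema described earlier would wrap a loop like this one: first map only as many pages per node as that node's requested memory needs, then map the remaining free pages round-robin and sort/select the most suitable ones.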