On 27.03.2017 16:01, Sergio Gonzalez Monroy wrote:
> On 09/03/2017 12:57, Ilya Maximets wrote:
>> On 08.03.2017 16:46, Sergio Gonzalez Monroy wrote:
>>> Hi Ilya,
>>>
>>> I have done similar tests and, as you already pointed out, 'numactl
>>> --interleave' does not seem to work as expected.
>>> I have also checked that the issue can be reproduced with a quota limit
>>> on the hugetlbfs mount point.
>>>
>>> I would be inclined towards *adding libnuma as a dependency* to DPDK
>>> to make memory allocation a bit more reliable.
>>>
>>> Currently, at a high level, hugepages are handled per NUMA node as
>>> follows:
>>> 1) Try to map all free hugepages. The total number of mapped hugepages
>>>    depends on whether there were any limits, such as cgroups or a
>>>    quota on the mount point.
>>> 2) Find out the NUMA node of each hugepage.
>>> 3) Check if we have enough hugepages for the requested memory on each
>>>    NUMA socket/node.
>>>
>>> Using libnuma we could instead try to allocate hugepages per NUMA node:
>>> 1) Try to map as many hugepages as possible from NUMA node 0.
>>> 2) Check if we have enough hugepages for the requested memory on
>>>    NUMA node 0.
>>> 3) Try to map as many hugepages as possible from NUMA node 1.
>>> 4) Check if we have enough hugepages for the requested memory on
>>>    NUMA node 1.
>>>
>>> This approach would improve the failing scenarios caused by limits,
>>> but it would still not fix the issues with non-contiguous hugepages
>>> (in the worst case each hugepage is a memseg).
>>> The non-contiguous hugepage issues are not as critical now that
>>> mempools can span multiple memsegs/hugepages, but they are still a
>>> problem for any other library requiring big chunks of memory.
>>>
>>> Potentially, if we were to add an option such as 'iommu-only' for when
>>> all devices are bound to vfio-pci, we could have a reliable way to
>>> allocate hugepages by just requesting the number of pages from each
>>> NUMA node.
>>>
>>> Thoughts?
>>
>> Hi Sergio,
>>
>> Thanks for your attention to this.
>>
>> For now, as we have some issues with non-contiguous hugepages, I'm
>> thinking about the following hybrid scheme:
>> 1) Allocate the essential hugepages:
>>    1.1) Allocate only as many hugepages from NUMA node N as are needed
>>         to fit the memory requested for that node.
>>    1.2) Repeat 1.1 for all NUMA nodes.
>> 2) Try to map all remaining free hugepages in a round-robin fashion,
>>    like in this patch.
>> 3) Sort the pages and choose the most suitable ones.
>>
>> This solution should decrease the number of issues connected with
>> non-contiguous memory.
>
> Sorry for the late reply, I was hoping for more comments from the
> community.
>
> IMHO this should be the default behavior, which means no config option
> and libnuma as an EAL dependency.
> I think your proposal is good. Could you consider implementing such an
> approach for the next release?
Sure, I can implement this for the 17.08 release.

>>
>>> On 06/03/2017 09:34, Ilya Maximets wrote:
>>>> Hi all.
>>>>
>>>> So, what about this change?
>>>>
>>>> Best regards, Ilya Maximets.
>>>>
>>>> On 16.02.2017 16:01, Ilya Maximets wrote:
>>>>> Currently EAL allocates hugepages one by one, not paying
>>>>> attention to which NUMA node the allocation was made from.
>>>>>
>>>>> Such behaviour leads to allocation failure if the number of
>>>>> hugepages available to the application is limited by cgroups
>>>>> or hugetlbfs and memory is requested not only from the first
>>>>> socket.
>>>>>
>>>>> Example:
>>>>>   # 90 x 1GB hugepages available in the system
>>>>>
>>>>>   cgcreate -g hugetlb:/test
>>>>>   # Limit to 32GB of hugepages
>>>>>   cgset -r hugetlb.1GB.limit_in_bytes=34359738368 test
>>>>>   # Request 4GB from each of 2 sockets
>>>>>   cgexec -g hugetlb:test testpmd --socket-mem=4096,4096 ...
>>>>>
>>>>>   EAL: SIGBUS: Cannot mmap more hugepages of size 1024 MB
>>>>>   EAL: 32 not 90 hugepages of size 1024 MB allocated
>>>>>   EAL: Not enough memory available on socket 1!
>>>>>        Requested: 4096MB, available: 0MB
>>>>>   PANIC in rte_eal_init():
>>>>>   Cannot init memory
>>>>>
>>>>> This happens because all allocated pages are on socket 0.
>>>>>
>>>>> Fix this issue by setting the mempolicy MPOL_PREFERRED for each
>>>>> hugepage to one of the requested nodes in a round-robin fashion.
>>>>> In this case, all allocated pages will be fairly distributed
>>>>> between all requested nodes.
>>>>>
>>>>> The new config option RTE_LIBRTE_EAL_NUMA_AWARE_HUGEPAGES is
>>>>> introduced and disabled by default because of the external
>>>>> dependency on libnuma.
>>>>>
>>>>> Cc: <sta...@dpdk.org>
>>>>> Fixes: 77988fc08dc5 ("mem: fix allocating all free hugepages")
>>>>>
>>>>> Signed-off-by: Ilya Maximets <i.maxim...@samsung.com>
>>>>> ---
>>>>>  config/common_base                       |  1 +
>>>>>  lib/librte_eal/Makefile                  |  4 ++
>>>>>  lib/librte_eal/linuxapp/eal/eal_memory.c | 66 ++++++++++++++++++++++++++++++++
>>>>>  mk/rte.app.mk                            |  3 ++
>>>>>  4 files changed, 74 insertions(+)
>
> Acked-by: Sergio Gonzalez Monroy <sergio.gonzalez.mon...@intel.com>

Thanks.

Best regards, Ilya Maximets.