On 10.04.2017 10:51, Sergio Gonzalez Monroy wrote:
> On 10/04/2017 08:11, Ilya Maximets wrote:
>> On 07.04.2017 18:44, Thomas Monjalon wrote:
>>> 2017-04-07 18:14, Ilya Maximets:
>>>> Hi All.
>>>>
>>>> I wanted to ask (just to clarify the current status):
>>>> Will this patch be included in the current release (acked by the
>>>> maintainer), after which I will upgrade it to the hybrid logic, or
>>>> should I just prepare v3 with the hybrid logic for 17.08?
>>> What is your preferred option Ilya?
>> I have no strong opinion on this. One thought is that it would be
>> nice if someone else could test this functionality with the current
>> release before it is enabled by default in 17.08.
>>
>> Tomorrow I'm going on vacation. So I'll post a rebased version today
>> (there are a few fuzzes with the current master) and you and Sergio
>> may decide what to do.
>>
>> Best regards, Ilya Maximets.
>>
>>> Sergio?
>
> I would be inclined towards v3 targeting v17.08. IMHO it would be
> cleaner this way.
OK. I've sent the rebased version just in case.

> Sergio
>
>>>
>>>> On 27.03.2017 17:43, Ilya Maximets wrote:
>>>>> On 27.03.2017 16:01, Sergio Gonzalez Monroy wrote:
>>>>>> On 09/03/2017 12:57, Ilya Maximets wrote:
>>>>>>> On 08.03.2017 16:46, Sergio Gonzalez Monroy wrote:
>>>>>>>> Hi Ilya,
>>>>>>>>
>>>>>>>> I have done similar tests and, as you already pointed out,
>>>>>>>> 'numactl --interleave' does not seem to work as expected.
>>>>>>>> I have also checked that the issue can be reproduced with a
>>>>>>>> quota limit on the hugetlbfs mount point.
>>>>>>>>
>>>>>>>> I would be inclined towards *adding libnuma as a dependency* to
>>>>>>>> DPDK to make memory allocation a bit more reliable.
>>>>>>>>
>>>>>>>> Currently, at a high level, regarding hugepages per NUMA node:
>>>>>>>> 1) Try to map all free hugepages. The total number of mapped
>>>>>>>>    hugepages depends on whether there were any limits, such as
>>>>>>>>    cgroups or a quota on the mount point.
>>>>>>>> 2) Find out the NUMA node of each hugepage.
>>>>>>>> 3) Check if we have enough hugepages for the requested memory
>>>>>>>>    on each NUMA socket/node.
>>>>>>>>
>>>>>>>> Using libnuma we could try to allocate hugepages per NUMA node:
>>>>>>>> 1) Try to map as many hugepages as possible from NUMA 0.
>>>>>>>> 2) Check if we have enough hugepages for the requested memory
>>>>>>>>    on NUMA 0.
>>>>>>>> 3) Try to map as many hugepages as possible from NUMA 1.
>>>>>>>> 4) Check if we have enough hugepages for the requested memory
>>>>>>>>    on NUMA 1.
>>>>>>>>
>>>>>>>> This approach would improve the failing scenarios caused by
>>>>>>>> limits, but it would still not fix the issues regarding
>>>>>>>> non-contiguous hugepages (worst case, each hugepage is a
>>>>>>>> memseg).
>>>>>>>> The non-contiguous hugepage issues are not as critical now that
>>>>>>>> mempools can span multiple memsegs/hugepages, but it is still a
>>>>>>>> problem for any other library requiring big chunks of memory.
>>>>>>>>
>>>>>>>> Potentially, if we were to add an option such as 'iommu-only'
>>>>>>>> when all devices are bound to vfio-pci, we could have a reliable
>>>>>>>> way to allocate hugepages by just requesting the number of pages
>>>>>>>> from each NUMA node.
>>>>>>>>
>>>>>>>> Thoughts?
>>>>>>> Hi Sergio,
>>>>>>>
>>>>>>> Thanks for your attention to this.
>>>>>>>
>>>>>>> For now, as we have some issues with non-contiguous
>>>>>>> hugepages, I'm thinking about the following hybrid scheme:
>>>>>>> 1) Allocate the essential hugepages:
>>>>>>>    1.1) Allocate only as many hugepages from NUMA node N as
>>>>>>>         are needed to fit the memory requested for this node.
>>>>>>>    1.2) Repeat 1.1 for all NUMA nodes.
>>>>>>> 2) Try to map all remaining free hugepages in a round-robin
>>>>>>>    fashion, as in this patch.
>>>>>>> 3) Sort the pages and choose the most suitable ones.
>>>>>>>
>>>>>>> This solution should decrease the number of issues connected
>>>>>>> with non-contiguous memory.
>>>>>> Sorry for the late reply, I was hoping for more comments from
>>>>>> the community.
>>>>>>
>>>>>> IMHO this should be the default behavior, which means no config
>>>>>> option and libnuma as an EAL dependency.
>>>>>> I think your proposal is good, could you consider implementing
>>>>>> such an approach in the next release?
>>>>> Sure, I can implement this for the 17.08 release.
>>>>>
>>>>>>>> On 06/03/2017 09:34, Ilya Maximets wrote:
>>>>>>>>> Hi all.
>>>>>>>>>
>>>>>>>>> So, what about this change?
>>>>>>>>>
>>>>>>>>> Best regards, Ilya Maximets.
>>>>>>>>>
>>>>>>>>> On 16.02.2017 16:01, Ilya Maximets wrote:
>>>>>>>>>> Currently EAL allocates hugepages one by one, not paying
>>>>>>>>>> attention to which NUMA node the allocation was made from.
>>>>>>>>>>
>>>>>>>>>> Such behaviour leads to allocation failure if the number of
>>>>>>>>>> hugepages available to the application is limited by cgroups
>>>>>>>>>> or hugetlbfs and memory is requested not only from the first
>>>>>>>>>> socket.
>>>>>>>>>>
>>>>>>>>>> Example:
>>>>>>>>>> # 90 x 1GB hugepages available in the system
>>>>>>>>>>
>>>>>>>>>> cgcreate -g hugetlb:/test
>>>>>>>>>> # Limit to 32GB of hugepages
>>>>>>>>>> cgset -r hugetlb.1GB.limit_in_bytes=34359738368 test
>>>>>>>>>> # Request 4GB from each of 2 sockets
>>>>>>>>>> cgexec -g hugetlb:test testpmd --socket-mem=4096,4096 ...
>>>>>>>>>>
>>>>>>>>>> EAL: SIGBUS: Cannot mmap more hugepages of size 1024 MB
>>>>>>>>>> EAL: 32 not 90 hugepages of size 1024 MB allocated
>>>>>>>>>> EAL: Not enough memory available on socket 1!
>>>>>>>>>>      Requested: 4096MB, available: 0MB
>>>>>>>>>> PANIC in rte_eal_init():
>>>>>>>>>> Cannot init memory
>>>>>>>>>>
>>>>>>>>>> This happens because all allocated pages are
>>>>>>>>>> on socket 0.
>>>>>>>>>>
>>>>>>>>>> Fix this issue by setting the mempolicy MPOL_PREFERRED for
>>>>>>>>>> each hugepage to one of the requested nodes in a round-robin
>>>>>>>>>> fashion. In this case all allocated pages will be fairly
>>>>>>>>>> distributed between all requested nodes.
>>>>>>>>>>
>>>>>>>>>> A new config option, RTE_LIBRTE_EAL_NUMA_AWARE_HUGEPAGES, is
>>>>>>>>>> introduced and disabled by default because of the external
>>>>>>>>>> dependency on libnuma.
>>>>>>>>>>
>>>>>>>>>> Cc: <sta...@dpdk.org>
>>>>>>>>>> Fixes: 77988fc08dc5 ("mem: fix allocating all free hugepages")
>>>>>>>>>>
>>>>>>>>>> Signed-off-by: Ilya Maximets <i.maxim...@samsung.com>
>>>>>>>>>> ---
>>>>>>>>>>  config/common_base                       |  1 +
>>>>>>>>>>  lib/librte_eal/Makefile                  |  4 ++
>>>>>>>>>>  lib/librte_eal/linuxapp/eal/eal_memory.c | 66 ++++++++++++++++++++++++++++++++
>>>>>>>>>>  mk/rte.app.mk                            |  3 ++
>>>>>>>>>>  4 files changed, 74 insertions(+)
>>>>>> Acked-by: Sergio Gonzalez Monroy <sergio.gonzalez.mon...@intel.com>
>>>>> Thanks.
>>>>>
>>>>> Best regards, Ilya Maximets.