Burakov, Anatoly <anatoly.bura...@intel.com> wrote on Sat, May 20, 2023 at 23:03:
>
> Hi,
>
> On 5/16/2023 1:21 PM, Fengnan Chang wrote:
> > Under legacy mode, if the number of contiguous memsegs is greater
> > than RTE_MAX_MEMSEG_PER_LIST, EAL init will fail even though
> > another memseg list is empty, because only one memseg list is used
> > for the check in remap_needed_hugepages.
> >
> > For example:
> > hugepage configuration:
> > 20480
> > 13370
> > 7110
> >
> > startup log:
> > EAL: Detected memory type: socket_id:0 hugepage_sz:2097152
> > EAL: Detected memory type: socket_id:1 hugepage_sz:2097152
> > EAL: Creating 4 segment lists: n_segs:8192 socket_id:0 hugepage_sz:2097152
> > EAL: Creating 4 segment lists: n_segs:8192 socket_id:1 hugepage_sz:2097152
> > EAL: Requesting 13370 pages of size 2MB from socket 0
> > EAL: Requesting 7110 pages of size 2MB from socket 1
> > EAL: Attempting to map 14220M on socket 1
> > EAL: Allocated 14220M on socket 1
> > EAL: Attempting to map 26740M on socket 0
> > EAL: Could not find space for memseg. Please increase 32768 and/or 65536 in
> > configuration.
>
> Unrelated, but this is probably a wrong message: it should have called
> out the config options to change, not their values. Sounds like a log
> message needs fixing somewhere...
In the older version, the log was:

EAL: Could not find space for memseg. Please increase CONFIG_RTE_MAX_MEMSEG_PER_TYPE and/or CONFIG_RTE_MAX_MEM_PER_TYPE in configuration.

Maybe that would be better?

> > EAL: Couldn't remap hugepage files into memseg lists
> > EAL: FATAL: Cannot init memory
> > EAL: Cannot init memory
> >
> > Signed-off-by: Fengnan Chang <changfeng...@bytedance.com>
> > Signed-off-by: Lin Li <lilint...@bytedance.com>
> > ---
> >  lib/eal/linux/eal_memory.c | 2 ++
> >  1 file changed, 2 insertions(+)
> >
> > diff --git a/lib/eal/linux/eal_memory.c b/lib/eal/linux/eal_memory.c
> > index 60fc8cc6ca..36b9e78f5f 100644
> > --- a/lib/eal/linux/eal_memory.c
> > +++ b/lib/eal/linux/eal_memory.c
> > @@ -1001,6 +1001,8 @@ remap_needed_hugepages(struct hugepage_file *hugepages, int n_pages)
> >  		if (cur->size == 0)
> >  			break;
> >
> > +		if (cur_page - seg_start_page >= RTE_MAX_MEMSEG_PER_LIST)
> > +			new_memseg = 1;
>
> I don't think this is quite right, because technically,
> `RTE_MAX_MEMSEG_PER_LIST` only applies to segment lists for smaller page
> sizes - segment lists for larger page sizes will hit their limits earlier.
> So, while this will work for 2MB pages, it won't work for page sizes
> whose segment list length is smaller than the maximum (such as 1GB pages).
>
> I think this solution could be improved upon by trying to break up the
> contiguous area instead. I suspect the core of the issue is not even the
> fact that we're exceeding the limits of one memseg list, but that we're
> always attempting to map exactly N pages in `remap_hugepages`, which
> results in us leaving large contiguous zones inside memseg lists unused,
> because we couldn't satisfy the current allocation request and skipped to
> a new memseg list.

Correct, I didn't consider the 1GB pages case; I get your point. Thanks.

> For example, let's suppose we found a large contiguous area that
> would've exceeded the limits of the current memseg list.
> Sooner or later, this
> contiguous area will end, and we'll attempt to remap this virtual area
> into a memseg list. Whenever that happens, we call into the remap code,
> which will start with the first segment list, attempt to find exactly N
> free spots, fail to do so, and skip to the next segment list.
>
> Thus, sooner or later, if we get contiguous areas that are large enough,
> we will not populate our memseg lists but instead skip through them, and
> start with a new memseg list every time we need a large contiguous area.
> We prioritize having a large contiguous area over using up all of our
> memory map.
>
> If, instead, we could break up the allocation - that is, use
> `rte_fbarray_find_biggest_free()` instead of
> `rte_fbarray_find_next_n_free()`, and keep doing it until we run out of
> segment lists - we would achieve the same result your patch does, but
> have it work for all page sizes, because now we would be targeting the
> actual issue (under-utilization of memseg lists), not its symptoms
> (exceeding segment list limits for large allocations).
>
> This logic could either live inside `remap_hugepages`, or we could just
> return the number of pages mapped from `remap_hugepages`, and have the
> calling code (`remap_needed_hugepages`) try again, this time with a
> different start segment, reflecting how many pages we actually mapped.
> IMO this would be easier to implement, as `remap_hugepages` is overly
> complex as it is!
>
> >  		if (cur_page == 0)
> >  			new_memseg = 1;
> >  		else if (cur->socket_id != prev->socket_id)
>
> --
> Thanks,
> Anatoly