>On 24-Jul-18 10:39 AM, Kumar, Ravi1 wrote:
>>>
>>>
>>> -----Original Message-----
>>> From: Burakov, Anatoly <anatoly.bura...@intel.com>
>>> Sent: Tuesday, July 24, 2018 2:33 PM
>>> To: Kumar, Ravi1 <ravi1.ku...@amd.com>; dev@dpdk.org
>>> Subject: Re: [dpdk-dev] DPDK 18.05 only works with up to 4 NUMAs
>>> systems
>>>
>>> On 24-Jul-18 9:09 AM, Kumar, Ravi1 wrote:
>>>>>
>>>>>
>>>>> -----Original Message-----
>>>>> From: Burakov, Anatoly <anatoly.bura...@intel.com>
>>>>> Sent: Monday, July 16, 2018 4:05 PM
>>>>> To: Kumar, Ravi1 <ravi1.ku...@amd.com>; dev@dpdk.org
>>>>> Subject: Re: [dpdk-dev] DPDK 18.05 only works with up to 4 NUMAs
>>>>> systems
>>>>>
>>>>> On 14-Jul-18 10:44 AM, Kumar, Ravi1 wrote:
>>>>>>
>>>>>> Memory setup with 2M pages works with the default configuration.
>>>>>>
>>>>>> With the default configuration and 2M hugepages:
>>>>>>
>>>>>> 1. The total amount of memory for each NUMA zone does not
>>>>>> exceed 128G (CONFIG_RTE_MAX_MEM_MB_PER_TYPE).
>>>>>>
>>>>>> 2. The total number of segments per NUMA zone is limited to
>>>>>> 32768 (CONFIG_RTE_MAX_MEMSEG_PER_TYPE). This constraint is met
>>>>>> for each NUMA zone and is the limiting factor for memory per
>>>>>> NUMA zone with 2M hugepages and the default configuration.
>>>>>>
>>>>>> 3. The data structures are capable of supporting 64G of
>>>>>> memory for each NUMA zone (32768 segments * 2M hugepage size).
>>>>>>
>>>>>> 4. 8 NUMA zones * 64G = 512G, so the total for all NUMA zones
>>>>>> does not exceed 512G (CONFIG_RTE_MAX_MEM_MB).
>>>>>>
>>>>>> 5. Resources are thus capable of allocating up to 64G per NUMA
>>>>>> zone. Things will work as long as there are enough 2M hugepages
>>>>>> to cover the memory needs of the DPDK applications AND no NUMA
>>>>>> zone needs more than 64G.
>>>>>>
>>>>>> With the default configuration and 1G hugepages:
>>>>>>
>>>>>> 1. The total amount of memory for each NUMA zone is limited to
>>>>>> 128G (CONFIG_RTE_MAX_MEM_MB_PER_TYPE). This constraint is hit for
>>>>>> each NUMA zone and is the limiting factor for memory per NUMA
>>>>>> zone.
>>>>>>
>>>>>> 2. The total number of segments per NUMA zone (128) does not
>>>>>> exceed 32768 (CONFIG_RTE_MAX_MEMSEG_PER_TYPE).
>>>>>>
>>>>>> 3. The data structures are capable of supporting 128G of
>>>>>> memory for each NUMA zone (128 segments * 1G hugepage size).
>>>>>> However, only the first four NUMA zones get initialized before
>>>>>> we hit CONFIG_RTE_MAX_MEM_MB (512G).
>>>>>>
>>>>>> 4. The total for all NUMA zones is limited to 512G
>>>>>> (CONFIG_RTE_MAX_MEM_MB). This limit is hit after configuring the
>>>>>> first four NUMA zones (4 x 128G = 512G). The remaining NUMA
>>>>>> zones cannot allocate memory.
>>>>>>
>>>>>> Apparently, the intent is to support a maximum of 8 NUMA nodes
>>>>>> by default (CONFIG_RTE_MAX_NUMA_NODES=8), but when 1G hugepages
>>>>>> are used, only up to 4 NUMA nodes can be supported.
>>>>>>
>>>>>> Possible workarounds when using 1G hugepages:
>>>>>>
>>>>>> 1. Decrease CONFIG_RTE_MAX_MEM_MB_PER_TYPE to 65536 (a limit of
>>>>>> 64G per NUMA zone). This is probably the best option unless you
>>>>>> need a lot of memory in any given NUMA zone.
>>>>>>
>>>>>> 2. Or, increase CONFIG_RTE_MAX_MEM_MB to 1048576.
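
For reference, both workarounds above come down to editing the build-time
configuration and recompiling DPDK. A minimal sketch, assuming the options
are set in config/common_base as in the 18.05 make-based build; values are
in megabytes, and the defaults shown are simply the 128G-per-type and
512G-total limits quoted in the analysis above:

    # config/common_base (defaults assumed for DPDK 18.05, in MB)
    CONFIG_RTE_MAX_NUMA_NODES=8
    CONFIG_RTE_MAX_MEMSEG_PER_TYPE=32768
    CONFIG_RTE_MAX_MEM_MB_PER_TYPE=131072   # 128G per page size/NUMA node
    CONFIG_RTE_MAX_MEM_MB=524288            # 512G total

    # Workaround 1: cap each NUMA node at 64G so all 8 nodes fit under 512G
    CONFIG_RTE_MAX_MEM_MB_PER_TYPE=65536

    # Workaround 2 (alternative): keep 128G per node and raise the global
    # cap instead (8 nodes * 128G = 1048576 MB)
    CONFIG_RTE_MAX_MEM_MB=1048576

Either change restores the condition that fails with the 1G-hugepage
defaults: the sum of the per-type limits across all NUMA nodes must fit
within CONFIG_RTE_MAX_MEM_MB.
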
>>>>>
>>>>> Hi Ravi,
>>>>>
>>>>> OK, this makes it much clearer, thanks!
>>>>>
>>>>> I think the first one should be done. I think 64G per NUMA node is
>>>>> still a reasonable amount of memory and it makes the default work
>>>>> (I think we can go as far as reducing this limit to 32G per type!),
>>>>> and whoever has issues with it can change
>>>>> CONFIG_RTE_MAX_MEM_MB_PER_TYPE or CONFIG_RTE_MAX_MEM_MB for their
>>>>> use case. That's what these options are there for :)
>>>>>
>>>>> --
>>>>> Thanks,
>>>>> Anatoly
>>>>>
>>>>
>>>> Hi Anatoly,
>>>>
>>>> Thanks a lot. Will the next release include this change?
>>>>
>>>> Regards,
>>>> Ravi
>>>>
>>>
>>> No one has submitted a patch for this, so not at this moment. I will
>>> do so now, but I cannot guarantee it getting merged in 18.08, since
>>> it's almost RC2 time and introducing such a change may be too big a
>>> risk.
>>>
>>> --
>>> Thanks,
>>> Anatoly
>>
>> Thanks Anatoly. I understand.
>>
>> Regards,
>> Ravi
>>
>
>Hi Ravi,
>
>In addition to the predefined limits not being suitable for platforms with
>a high number of sockets, the calculation method itself has had numerous
>bugs that prevented it from working when the limits were changed. I've come
>up with a patch that should fix the issue without the need to change the
>config:
>
>http://patches.dpdk.org/patch/46112/
>
>It would be great if you could test it!
>
>--
>Thanks,
>Anatoly
Hi Anatoly,

Thank you very much. This looks really good. I will test this and get back
to you.

Regards,
Ravi
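
One quick way to check whether memory really comes up on all eight nodes is
to ask the EAL for memory on every socket explicitly. A minimal sketch,
assuming a make-based build and 1G hugepages reserved on each node; the
core list, build path and per-socket sizes are illustrative only:

    # Request 4G on each of the 8 sockets. With the default 1G-hugepage
    # limits described above, sockets beyond the fourth cannot reserve
    # memory, so this request should only succeed once the limits are
    # adjusted or the patch is applied.
    ./build/app/testpmd -l 0-7 -n 4 \
        --socket-mem=4096,4096,4096,4096,4096,4096,4096,4096 \
        -- -i
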