"Chao Zhu" <chao...@linux.vnet.ibm.com> wrote on 11/09/2017 04:08:36 AM:
> From: "Chao Zhu" <chao...@linux.vnet.ibm.com>
> To: "'Jonas Pfefferle1'" <j...@zurich.ibm.com>
> Cc: "'Burakov, Anatoly'" <anatoly.bura...@intel.com>,
> <bruce.richard...@intel.com>, <dev@dpdk.org>
> Date: 11/09/2017 04:08 AM
> Subject: RE: [dpdk-dev] Huge mapping secondary process linux
>
>
>
> From: Jonas Pfefferle1 [mailto:j...@zurich.ibm.com]
> Sent: November 7, 2017 18:16
> To: Chao Zhu <chao...@linux.vnet.ibm.com>
> Cc: 'Burakov, Anatoly' <anatoly.bura...@intel.com>;
> bruce.richard...@intel.com; dev@dpdk.org
> Subject: RE: [dpdk-dev] Huge mapping secondary process linux
>
> "Chao Zhu" <chao...@linux.vnet.ibm.com> wrote on 11/07/2017 09:25:26 AM:
>
> > From: "Chao Zhu" <chao...@linux.vnet.ibm.com>
> > To: "'Jonas Pfefferle1'" <j...@zurich.ibm.com>, "'Burakov, Anatoly'"
> > <anatoly.bura...@intel.com>
> > Cc: <bruce.richard...@intel.com>, <dev@dpdk.org>
> > Date: 11/07/2017 11:00 AM
> > Subject: RE: [dpdk-dev] Huge mapping secondary process linux
> >
> >
> >
> > From: Jonas Pfefferle1 [mailto:j...@zurich.ibm.com]
> > Sent: October 28, 2017 3:23
> > To: Burakov, Anatoly <anatoly.bura...@intel.com>
> > Cc: bruce.richard...@intel.com; chao...@linux.vnet.ibm.com; dev@dpdk.org
> > Subject: Re: [dpdk-dev] Huge mapping secondary process linux
> >
> > "Burakov, Anatoly" <anatoly.bura...@intel.com> wrote on 27/10/2017 18:00:27:
> >
> > > From: "Burakov, Anatoly" <anatoly.bura...@intel.com>
> > > To: Jonas Pfefferle1 <j...@zurich.ibm.com>
> > > Cc: bruce.richard...@intel.com, chao...@linux.vnet.ibm.com, dev@dpdk.org
> > > Date: 27/10/2017 18:00
> > > Subject: Re: [dpdk-dev] Huge mapping secondary process linux
> > >
> > > On 27-Oct-17 4:16 PM, Jonas Pfefferle1 wrote:
> > > > "dev" <dev-boun...@dpdk.org> wrote on 10/27/2017 04:58:01 PM:
> > > >
> > > > > From: "Jonas Pfefferle1" <j...@zurich.ibm.com>
> > > > > To: "Burakov, Anatoly" <anatoly.bura...@intel.com>
> > > > > Cc: bruce.richard...@intel.com, chao...@linux.vnet.ibm.com, dev@dpdk.org
> > > > > Date: 10/27/2017 04:58 PM
> > > > > Subject: Re: [dpdk-dev] Huge mapping secondary process linux
> > > > > Sent by: "dev" <dev-boun...@dpdk.org>
> > > > >
> > > > >
> > > > > "Burakov, Anatoly" <anatoly.bura...@intel.com> wrote on 10/27/2017 04:44:52 PM:
> > > > >
> > > > > > From: "Burakov, Anatoly" <anatoly.bura...@intel.com>
> > > > > > To: Jonas Pfefferle1 <j...@zurich.ibm.com>
> > > > > > Cc: bruce.richard...@intel.com, chao...@linux.vnet.ibm.com, dev@dpdk.org
> > > > > > Date: 10/27/2017 04:45 PM
> > > > > > Subject: Re: [dpdk-dev] Huge mapping secondary process linux
> > > > > >
> > > > > > On 27-Oct-17 3:28 PM, Jonas Pfefferle1 wrote:
> > > > > > > "Burakov, Anatoly" <anatoly.bura...@intel.com> wrote on 10/27/2017 04:06:44 PM:
> > > > > > >
> > > > > > > > From: "Burakov, Anatoly" <anatoly.bura...@intel.com>
> > > > > > > > To: Jonas Pfefferle1 <j...@zurich.ibm.com>, dev@dpdk.org
> > > > > > > > Cc: chao...@linux.vnet.ibm.com, bruce.richard...@intel.com
> > > > > > > > Date: 10/27/2017 04:06 PM
> > > > > > > > Subject: Re: [dpdk-dev] Huge mapping secondary process linux
> > > > > > > >
> > > > > > > > On 27-Oct-17 1:43 PM, Jonas Pfefferle1 wrote:
> > > > > > > > >
> > > > > > > > >
> > > > > > > > > Hi @all,
> > > > > > > > >
> > > > > > > > > I'm trying to make sense of the hugepage memory mappings in
> > > > > > > > > librte_eal/linuxapp/eal/eal_memory.c:
> > > > > > > > > * In rte_eal_hugepage_attach (line 1347) when we try to do a private
> > > > > > > > > mapping on /dev/zero (line 1393) why do we not use MAP_FIXED if we need the
> > > > > > > > > addresses to be identical with the primary process?
> > > > > > > > > * On POWER we have this weird business going on where we use MAP_HUGETLB
> > > > > > > > > because according to this commit:
> > > > > > > > >
> > > > > > > > > commit 284ae3e9ff9a92575c28c858efd2c85c8de6d440
> > > > > > > > > Author: Chao Zhu <chao...@linux.vnet.ibm.com>
> > > > > > > > > Date:   Thu Apr 6 15:36:09 2017 +0530
> > > > > > > > >
> > > > > > > > >     eal/ppc: fix mmap for memory initialization
> > > > > > > > >
> > > > > > > > >     On IBM POWER platform, when mapping /dev/zero file to hugepage memory
> > > > > > > > >     space, mmap will not respect the requested address hint. This will cause
> > > > > > > > >     the memory initialization for the second process fails. This patch adds
> > > > > > > > >     the required mmap flags to make it work. Beside this, users need to set
> > > > > > > > >     the nr_overcommit_hugepages to expand the VA range. When
> > > > > > > > >     doing the initialization, users need to set both nr_hugepages and
> > > > > > > > >     nr_overcommit_hugepages to the same value, like 64, 128, etc.
> > > > > > > > >
> > > > > > > > > mmap address hints are not respected. Looking at the mmap code in the
> > > > > > > > > kernel this is not true entirely however under some circumstances the hint
> > > > > > > > > can be ignored (
> > > > > > > > > https://urldefense.proofpoint.com/v2/url?u=http-3A__elixir.free-2Delectrons.com_linux_latest_source_arch_powerpc_mm_mmap.c-23L103&d=DwICaQ&c=jf_iaSHvJObTbx-siA1ZOg&r=rOdXhRsgn8Iur7bDE0vgwvo6TC8OpoDN-pXjigIjRW0&m=cttQcHlAYixhsYS3lz-BAdEeg4dpbwGdPnj2R3I8Do0&s=Gp0TIjUtIed05Jgb7XnlocpCYZdFXZXiH0LqIWiNMhA&e=
> > > > > > > > > ). However I believe we can remove the extra case for PPC if we use
> > > > > > > > > MAP_FIXED when doing the secondary process mappings because we need them to
> > > > > > > > > be identical anyway. We could also use MAP_FIXED when doing the primary
> > > > > > > > > process mappings resp. get_virtual_area if we want to have any guarantees
> > > > > > > > > when specifying a base address. Any thoughts?
> > > > > > > > >
> > > > > > > > > Thanks,
> > > > > > > > > Jonas
> > > > > > > > >
> > > > > > > > hi Jonas,
> > > > > > > >
> > > > > > > > MAP_FIXED is not used because it's dangerous, it unmaps anything that is
> > > > > > > > already mapped into that space. We would rather know that we can't map
> > > > > > > > something than unwittingly unmap something that was mapped before.
> > > > > > >
> > > > > > > Ok, I see. Maybe we can add a check to the primary process's memory
> > > > > > > mappings whether the hint has been respected or not? At least warn if it
> > > > > > > hasn't.
> > > > > >
> > > > > > Hi Jonas,
> > > > > >
> > > > > > I'm unfamiliar with POWER platform, so i'm afraid you'd have to explain
> > > > > > a bit more what you mean by "hint has been respected" :)
> > > > >
> > > > > Hi Anatoly,
> > > > >
> > > > > What I meant was the mmap address hint:
> > > > >
> > > > > "If addr is not NULL, then the kernel takes it as a hint
> > > > >  about where to place the mapping; on Linux, the mapping will be
> > > > >  created at a nearby page boundary."
> > > > >
> > > > > This is actually not true on POWER. It can happen that the address hint is
> > > > > ignored and you get any address back that fits your mapping.
> > > > >
> > > > > Thanks,
> > > > > Jonas
> > > >
> > > > Actually looking through the kernel code this is also not guaranteed on x86.
> > > > (https://urldefense.proofpoint.com/v2/url?u=http-3A__elixir.free-2Delectrons.com_linux_latest_source_arch_x86_kernel_sys-5Fx86-5F64.c-23L165&d=DwID-g&c=jf_iaSHvJObTbx-siA1ZOg&r=rOdXhRsgn8Iur7bDE0vgwvo6TC8OpoDN-pXjigIjRW0&m=iqakzG7nSXLfvDHyS9IV5E9DWPnNcv19zcsl3MKMdvI&s=VqzZpcTaCUMmNieZ3WyUw-jsnNP-hAcW487Mumv6xPw&e=)
> > > >
> > > > So in any case the address hint can be ignored by the kernel and you get
> > > > any address that fits your mapping.
> > > > My suggestion is to check when we do the initial mapping in
> > > > get_virtual_area if the hint was respected or not, i.e. if the returned
> > > > address == PAGE_ALIGN(address_hint).
> > > >
> > >
> > > I'm not sure i see the issue here. So, just to make sure i understand
> > > things correctly:
> > >
> > > Whenever we don't request a specific base address through base_address
> > > EAL parameter, none of this matters - we always ask for memory in
> > > arbitrary memory locations, correct?
> > >
> > > It's also not an issue with secondary processes because we do check
> > > returned mmap address to see whether it's the same as we requested, correct?
> > >
> > > It's only whenever we *do* specify a base_address, we provide an address
> > > hint to mmap to, but we don't check if the address we got from mmap is
> > > one in the vicinity of our requested base address, correct? We don't
> > > check, and the kernel can ignore address hint, so we're not guaranteed
> > > to respect the base_address flag.
> > >
> > > I'm not sure this is a serious issue, because as far as i'm concerned,
> > > this flag is advisory - we only promise to *attempt* to map things at
> > > that particular address, not that it will succeed. If the kernel simply
> > > cannot find an address to satisfy our address hint, or ignores it for
> > > other reasons - well, tough, nothing we can do about that. I'm not sure
> > > putting a check like this, where we can't even predict an "expected"
> > > address is a good idea.
> > >
> > > Am i getting this right?
> >
> > The problem is when we specify a base address we want it to be used. If it is
> > not respected we basically end up with the case like we would have
> > never specified it.
> > This very likely leads to not being able to run a secondary process because
> > we will not be able to map the addresses from our primary process
> > and that is why we introduced the base address parameter in the first place.
> >
> > >
> > > --
> > > Thanks,
> > > Anatoly
> > >
> > The reason why I put the patch there is that when mapping hugepage
> > on POWER, the kernel will never respect the address hints when doing
> > mmap unless we expand the address space or unmap all the hugepages.
> > This is a big difference when compared with x86. And it affects the
> > mapping of the secondary process. I agree that the hints is
> > advisory. Just want to see if there are better solutions.
> >
> This is not true.
> I looked through the kernel code and the address
> hint is treated almost the same on both platforms:
>
> PPC: https://elixir.free-electrons.com/linux/latest/source/arch/powerpc/mm/mmap.c#L143
> Line 169/170
>
> x86: https://elixir.free-electrons.com/linux/latest/source/arch/x86/kernel/sys_x86_64.c#L165
> Line 189/190
>
> The only thing that might differ is the virtual address layout
> (e.g. due to different page size etc) and that might lead to the same
> value for base-virtaddr not working on both x86 and POWER.
> However I tested with different address hints and you easily can
> find addresses where the address hint is indeed respected.
> That is also why I send in a patch to remove the HUGETLB flags on
> the mmap.
>
> Thanks,
> Jonas
>
> You can take a look at this. https://bugzilla.linux.ibm.com/show_bug.cgi?id=141628
> It’s quite interesting.

Interesting indeed. I misunderstood the problem: I thought the mmap
address hint in get_virtual_area was not being respected, when the real
problem is the address hint used when mapping the hugepages. Still, I
hope we can find a better solution. Aside from that, I still believe
that warning when the address hint is not respected is a good idea.
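
To make the warning idea concrete, here is a minimal sketch of the kind of check I mean. It is not DPDK code: reserve_with_hint and align_up are hypothetical names, and an anonymous PROT_NONE mapping stands in for the real mapping done in get_virtual_area. It maps without MAP_FIXED, so nothing already mapped is ever replaced, and then compares the returned address with the page-aligned hint.

```c
#define _DEFAULT_SOURCE
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>
#include <sys/mman.h>
#include <unistd.h>

/* Round addr up to the next multiple of align (a power of two). */
static uintptr_t align_up(uintptr_t addr, uintptr_t align)
{
    assert(align != 0 && (align & (align - 1)) == 0);
    return (addr + align - 1) & ~(align - 1);
}

/*
 * Hypothetical helper (assumption, not part of the EAL): reserve len bytes
 * of address space near hint WITHOUT MAP_FIXED.
 * *respected is set to 1 if no hint was given or the kernel placed the
 * mapping exactly at the page-aligned hint, and to 0 otherwise, in which
 * case a warning is printed.
 * Returns the mapped address, or MAP_FAILED on error.
 */
static void *reserve_with_hint(void *hint, size_t len, int *respected)
{
    uintptr_t page = (uintptr_t)sysconf(_SC_PAGESIZE);
    void *want = (void *)align_up((uintptr_t)hint, page);
    void *got = mmap(want, len, PROT_NONE,
                     MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);

    if (got == MAP_FAILED)
        return MAP_FAILED;

    *respected = (hint == NULL) || (got == want);
    if (!*respected)
        fprintf(stderr,
                "WARNING: requested address %p, kernel returned %p\n",
                want, got);
    return got;
}
```

A caller that was given a base address could then decide whether to continue with the address it got or to fail early, instead of only finding out later when the secondary process cannot attach.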