Re: [dpdk-dev] [PATCH] vfio: fix sPAPR IOMMU DMA window size

2017-08-08 Thread Jonas Pfefferle1
Alexey Kardashevskiy  wrote on 08/08/2017 09:38:00 AM:

> From: Alexey Kardashevskiy 
> To: Jonas Pfefferle , anatoly.bura...@intel.com
> Cc: dev@dpdk.org
> Date: 08/08/2017 09:38 AM
> Subject: Re: [PATCH] vfio: fix sPAPR IOMMU DMA window size
>
> On 08/08/17 01:11, Jonas Pfefferle wrote:
> > DMA window size needs to be big enough to span all memory segment's
> > physical addresses. We do not need multiple levels of IOMMU tables
> > as we already span ~70TB of physical memory with 16MB hugepages.
> >
> > Signed-off-by: Jonas Pfefferle 
> > ---
> >  lib/librte_eal/linuxapp/eal/eal_vfio.c | 25 ++---
> >  1 file changed, 22 insertions(+), 3 deletions(-)
> >
> > diff --git a/lib/librte_eal/linuxapp/eal/eal_vfio.c b/lib/
> librte_eal/linuxapp/eal/eal_vfio.c
> > index 946df7e..8502216 100644
> > --- a/lib/librte_eal/linuxapp/eal/eal_vfio.c
> > +++ b/lib/librte_eal/linuxapp/eal/eal_vfio.c
> > @@ -722,6 +722,18 @@ vfio_type1_dma_map(int vfio_container_fd)
> > return 0;
> >  }
> >
> > +static uint64_t
> > +roundup_next_pow2(uint64_t n)
> > +{
> > +   uint32_t i;
> > +
> > +   n--;
> > +   for (i = 1; i < sizeof(n) * CHAR_BIT; i += i)
> > +  n |= n >> i;
> > +
> > +   return ++n;
> > +}
> > +
>
> wow :)
>
> QEMU does it using __builtin_ctzll() (used below for the page_shift)
> without a loop:
>
> https://git.qemu.org/gitweb.cgi?p=qemu.git;a=blob;f=include/qemu/
> host-
>
utils.h;h=95cf4f4163e50457cdf808263065ca5ef3f935da;hb=f22ab6cb0c47bd2a2785b7d58130949bd7d8d9af#l382

>
>
> Anyway, seems working.

Ok let me fix that :)

>
>
> Reviewed-by: Alexey Kardashevskiy 
>
>
>
>
> >  static int
> >  vfio_spapr_dma_map(int vfio_container_fd)
> >  {
> > @@ -759,10 +771,12 @@ vfio_spapr_dma_map(int vfio_container_fd)
> >return -1;
> > }
> >
> > -   /* calculate window size based on number of hugepages configured */
> > -   create.window_size = rte_eal_get_physmem_size();
> > +   /* physicaly pages are sorted descending i.e. ms[0].phys_addr is
max */
> > +   /* create DMA window from 0 to max(phys_addr + len) */
> > +   /* sPAPR requires window size to be a power of 2 */
> > +   create.window_size = roundup_next_pow2(ms[0].phys_addr + ms
[0].len);
> > create.page_shift = __builtin_ctzll(ms->hugepage_sz);
> > -   create.levels = 2;
> > +   create.levels = 1;
> >
> > ret = ioctl(vfio_container_fd, VFIO_IOMMU_SPAPR_TCE_CREATE,
&create);
> > if (ret) {
> > @@ -771,6 +785,11 @@ vfio_spapr_dma_map(int vfio_container_fd)
> >return -1;
> > }
> >
> > +   if (create.start_addr != 0) {
> > +  RTE_LOG(ERR, EAL, "  DMA window start address != 0\n");
> > +  return -1;
> > +   }
> > +
> > /* map all DPDK segments for DMA. use 1:1 PA to IOVA mapping */
> > for (i = 0; i < RTE_MAX_MEMSEG; i++) {
> >struct vfio_iommu_type1_dma_map dma_map;
> >
>
>
> --
> Alexey
>


Re: [dpdk-dev] [PATCH] vfio: fix sPAPR IOMMU DMA window size

2017-08-08 Thread Jonas Pfefferle1
"Ananyev, Konstantin"  wrote on 08/08/2017
10:27:28 AM:

> From: "Ananyev, Konstantin" 
> To: Alexey Kardashevskiy , Jonas Pfefferle
> , "Burakov, Anatoly" 
> Cc: "dev@dpdk.org" 
> Date: 08/08/2017 10:27 AM
> Subject: RE: [dpdk-dev] [PATCH] vfio: fix sPAPR IOMMU DMA window size
>
>
>
> > -Original Message-
> > From: dev [mailto:dev-boun...@dpdk.org] On Behalf Of Alexey
Kardashevskiy
> > Sent: Tuesday, August 8, 2017 10:38 AM
> > To: Jonas Pfefferle ; Burakov, Anatoly
> 
> > Cc: dev@dpdk.org
> > Subject: Re: [dpdk-dev] [PATCH] vfio: fix sPAPR IOMMU DMA window size
> >
> > On 08/08/17 01:11, Jonas Pfefferle wrote:
> > > DMA window size needs to be big enough to span all memory segment's
> > > physical addresses. We do not need multiple levels of IOMMU tables
> > > as we already span ~70TB of physical memory with 16MB hugepages.
> > >
> > > Signed-off-by: Jonas Pfefferle 
> > > ---
> > >  lib/librte_eal/linuxapp/eal/eal_vfio.c | 25 +
+---
> > >  1 file changed, 22 insertions(+), 3 deletions(-)
> > >
> > > diff --git a/lib/librte_eal/linuxapp/eal/eal_vfio.c b/lib/
> librte_eal/linuxapp/eal/eal_vfio.c
> > > index 946df7e..8502216 100644
> > > --- a/lib/librte_eal/linuxapp/eal/eal_vfio.c
> > > +++ b/lib/librte_eal/linuxapp/eal/eal_vfio.c
> > > @@ -722,6 +722,18 @@ vfio_type1_dma_map(int vfio_container_fd)
> > > return 0;
> > >  }
> > >
> > > +static uint64_t
> > > +roundup_next_pow2(uint64_t n)
> > > +{
> > > +   uint32_t i;
> > > +
> > > +   n--;
> > > +   for (i = 1; i < sizeof(n) * CHAR_BIT; i += i)
> > > +  n |= n >> i;
> > > +
> > > +   return ++n;
> > > +}
> > > +
> >
> > wow :)
> >
> > QEMU does it using __builtin_ctzll() (used below for the page_shift)
> > without a loop:
> >
> > https://git.qemu.org/gitweb.cgi?p=qemu.git;a=blob;f=include/qemu/host-
> >
>
utils.h;h=95cf4f4163e50457cdf808263065ca5ef3f935da;hb=f22ab6cb0c47bd2a2785b7d58130949bd7d8d9af#l382

> >
> >
> > Anyway, seems working.
>
> As I remember, there already exists rte_align64pow2().
> Konstantin

Thanks. Fixed it.

>
> >
> >
> > Reviewed-by: Alexey Kardashevskiy 
> >
> >
> >
> >
> > >  static int
> > >  vfio_spapr_dma_map(int vfio_container_fd)
> > >  {
> > > @@ -759,10 +771,12 @@ vfio_spapr_dma_map(int vfio_container_fd)
> > >return -1;
> > > }
> > >
> > > -   /* calculate window size based on number of hugepages configured
*/
> > > -   create.window_size = rte_eal_get_physmem_size();
> > > +   /* physicaly pages are sorted descending i.e. ms
> [0].phys_addr is max */
> > > +   /* create DMA window from 0 to max(phys_addr + len) */
> > > +   /* sPAPR requires window size to be a power of 2 */
> > > +   create.window_size = roundup_next_pow2(ms[0].phys_addr + ms
[0].len);
> > > create.page_shift = __builtin_ctzll(ms->hugepage_sz);
> > > -   create.levels = 2;
> > > +   create.levels = 1;
> > >
> > > ret = ioctl(vfio_container_fd, VFIO_IOMMU_SPAPR_TCE_CREATE,
&create);
> > > if (ret) {
> > > @@ -771,6 +785,11 @@ vfio_spapr_dma_map(int vfio_container_fd)
> > >return -1;
> > > }
> > >
> > > +   if (create.start_addr != 0) {
> > > +  RTE_LOG(ERR, EAL, "  DMA window start address != 0\n");
> > > +  return -1;
> > > +   }
> > > +
> > > /* map all DPDK segments for DMA. use 1:1 PA to IOVA mapping */
> > > for (i = 0; i < RTE_MAX_MEMSEG; i++) {
> > >struct vfio_iommu_type1_dma_map dma_map;
> > >
> >
> >
> > --
> > Alexey


Re: [dpdk-dev] [PATCH v3] vfio: fix sPAPR IOMMU DMA window size

2017-08-08 Thread Jonas Pfefferle1
"Burakov, Anatoly"  wrote on 08/08/2017 11:15:24
AM:

> From: "Burakov, Anatoly" 
> To: Jonas Pfefferle 
> Cc: "dev@dpdk.org" , "a...@ozlabs.ru" 
> Date: 08/08/2017 11:18 AM
> Subject: RE: [PATCH v3] vfio: fix sPAPR IOMMU DMA window size
>
> From: Jonas Pfefferle [mailto:j...@zurich.ibm.com]
> > Sent: Tuesday, August 8, 2017 9:41 AM
> > To: Burakov, Anatoly 
> > Cc: dev@dpdk.org; a...@ozlabs.ru; Jonas Pfefferle 
> > Subject: [PATCH v3] vfio: fix sPAPR IOMMU DMA window size
> >
> > DMA window size needs to be big enough to span all memory segment's
> > physical addresses. We do not need multiple levels of IOMMU tables
> > as we already span ~70TB of physical memory with 16MB hugepages.
> >
> > Signed-off-by: Jonas Pfefferle 
> > ---
> > v2:
> > * roundup to next power 2 function without loop.
> >
> > v3:
> > * Replace roundup_next_pow2 with rte_align64pow2
> >
> >  lib/librte_eal/linuxapp/eal/eal_vfio.c | 13 ++---
> >  1 file changed, 10 insertions(+), 3 deletions(-)
> >
> > diff --git a/lib/librte_eal/linuxapp/eal/eal_vfio.c
> > b/lib/librte_eal/linuxapp/eal/eal_vfio.c
> > index 946df7e..550c41c 100644
> > --- a/lib/librte_eal/linuxapp/eal/eal_vfio.c
> > +++ b/lib/librte_eal/linuxapp/eal/eal_vfio.c
> > @@ -759,10 +759,12 @@ vfio_spapr_dma_map(int vfio_container_fd)
> >return -1;
> > }
> >
> > -   /* calculate window size based on number of hugepages configured
> > */
> > -   create.window_size = rte_eal_get_physmem_size();
> > +   /* physicaly pages are sorted descending i.e. ms[0].phys_addr is
max
> > */
>
> Do we always expect that to be the case in the future? Maybe it
> would be safer to walk the memsegs list.
>
> Thanks,
> Anatoly

I had this loop in before but removed it in favor of simplicity.
If we believe that the ordering is going to change in the future
I'm happy to bring back the loop. Is there other code which is
relying on the fact that the memsegs are sorted by their physical
addresses?

>
> > +   /* create DMA window from 0 to max(phys_addr + len) */
> > +   /* sPAPR requires window size to be a power of 2 */
> > +   create.window_size = rte_align64pow2(ms[0].phys_addr +
> > ms[0].len);
> > create.page_shift = __builtin_ctzll(ms->hugepage_sz);
> > -   create.levels = 2;
> > +   create.levels = 1;
> >
> > ret = ioctl(vfio_container_fd, VFIO_IOMMU_SPAPR_TCE_CREATE,
> > &create);
> > if (ret) {
> > @@ -771,6 +773,11 @@ vfio_spapr_dma_map(int vfio_container_fd)
> >return -1;
> > }
> >
> > +   if (create.start_addr != 0) {
> > +  RTE_LOG(ERR, EAL, "  DMA window start address != 0\n");
> > +  return -1;
> > +   }
> > +
> > /* map all DPDK segments for DMA. use 1:1 PA to IOVA mapping */
> > for (i = 0; i < RTE_MAX_MEMSEG; i++) {
> >struct vfio_iommu_type1_dma_map dma_map;
> > --
> > 2.7.4
>


Re: [dpdk-dev] [PATCH v3] vfio: fix sPAPR IOMMU DMA window size

2017-08-08 Thread Jonas Pfefferle1

"Burakov, Anatoly"  wrote on 08/08/2017 11:43:43
AM:

> From: "Burakov, Anatoly" 
> To: Jonas Pfefferle1 
> Cc: "a...@ozlabs.ru" , "dev@dpdk.org" 
> Date: 08/08/2017 11:43 AM
> Subject: RE: [PATCH v3] vfio: fix sPAPR IOMMU DMA window size
>
> > From: Jonas Pfefferle1 [mailto:j...@zurich.ibm.com]
> > Sent: Tuesday, August 8, 2017 10:30 AM
> > To: Burakov, Anatoly 
> > Cc: a...@ozlabs.ru; dev@dpdk.org
> > Subject: RE: [PATCH v3] vfio: fix sPAPR IOMMU DMA window size
> >
> > "Burakov, Anatoly"  wrote on 08/08/2017
> > 11:15:24 AM:
> >
> > > From: "Burakov, Anatoly" 
> > > To: Jonas Pfefferle 
> > > Cc: "dev@dpdk.org" , "a...@ozlabs.ru" 
> > > Date: 08/08/2017 11:18 AM
> > > Subject: RE: [PATCH v3] vfio: fix sPAPR IOMMU DMA window size
> > >
> > > From: Jonas Pfefferle [mailto:j...@zurich.ibm.com]
> > > > Sent: Tuesday, August 8, 2017 9:41 AM
> > > > To: Burakov, Anatoly 
> > > > Cc: dev@dpdk.org; a...@ozlabs.ru; Jonas Pfefferle

> > > > Subject: [PATCH v3] vfio: fix sPAPR IOMMU DMA window size
> > > >
> > > > DMA window size needs to be big enough to span all memory segment's
> > > > physical addresses. We do not need multiple levels of IOMMU tables
> > > > as we already span ~70TB of physical memory with 16MB hugepages.
> > > >
> > > > Signed-off-by: Jonas Pfefferle 
> > > > ---
> > > > v2:
> > > > * roundup to next power 2 function without loop.
> > > >
> > > > v3:
> > > > * Replace roundup_next_pow2 with rte_align64pow2
> > > >
> > > >  lib/librte_eal/linuxapp/eal/eal_vfio.c | 13 ++---
> > > >  1 file changed, 10 insertions(+), 3 deletions(-)
> > > >
> > > > diff --git a/lib/librte_eal/linuxapp/eal/eal_vfio.c
> > > > b/lib/librte_eal/linuxapp/eal/eal_vfio.c
> > > > index 946df7e..550c41c 100644
> > > > --- a/lib/librte_eal/linuxapp/eal/eal_vfio.c
> > > > +++ b/lib/librte_eal/linuxapp/eal/eal_vfio.c
> > > > @@ -759,10 +759,12 @@ vfio_spapr_dma_map(int vfio_container_fd)
> > > >        return -1;
> > > >     }
> > > >
> > > > -   /* calculate window size based on number of hugepages
configured
> > > > */
> > > > -   create.window_size = rte_eal_get_physmem_size();
> > > > +   /* physicaly pages are sorted descending i.e. ms[0].phys_addr
is max
> > > > */
> > >
> > > Do we always expect that to be the case in the future? Maybe it
> > > would be safer to walk the memsegs list.
> > >
> > > Thanks,
> > > Anatoly
> >
> > I had this loop in before but removed it in favor of simplicity.
> > If we believe that the ordering is going to change in the future
> > I'm happy to bring back the loop. Is there other code which is
> > relying on the fact that the memsegs are sorted by their physical
> > addresses?
>
> I don't think there is. In any case, I think making assumptions
> about particulars of memseg organization is not a very good practice.
>
> I seem to recall us doing similar things in other places, so maybe
> down the line we could introduce a new API (or internal-only)
> function to get a memseg with min/max address. For now I think a
> loop will do.

Ok. Makes sense to me. Let me resubmit a new version with the loop.

>
> >
> > >
> > > > +   /* create DMA window from 0 to max(phys_addr + len) */
> > > > +   /* sPAPR requires window size to be a power of 2 */
> > > > +   create.window_size = rte_align64pow2(ms[0].phys_addr +
> > > > ms[0].len);
> > > >     create.page_shift = __builtin_ctzll(ms->hugepage_sz);
> > > > -   create.levels = 2;
> > > > +   create.levels = 1;
> > > >
> > > >     ret = ioctl(vfio_container_fd, VFIO_IOMMU_SPAPR_TCE_CREATE,
> > > > &create);
> > > >     if (ret) {
> > > > @@ -771,6 +773,11 @@ vfio_spapr_dma_map(int vfio_container_fd)
> > > >        return -1;
> > > >     }
> > > >
> > > > +   if (create.start_addr != 0) {
> > > > +      RTE_LOG(ERR, EAL, "  DMA window start address != 0\n");
> > > > +      return -1;
> > > > +   }
> > > > +
> > > >     /* map all DPDK segments for DMA. use 1:1 PA to IOVA mapping */
> > > >     for (i = 0; i < RTE_MAX_MEMSEG; i++) {
> > > >        struct vfio_iommu_type1_dma_map dma_map;
> > > > --
> > > > 2.7.4
> > >
>


Re: [dpdk-dev] [PATCH dpdk 5/5] RFC: vfio/ppc64/spapr: Use correct bus addresses for DMA map

2017-04-20 Thread Jonas Pfefferle1

Alexey Kardashevskiy  wrote on 20/04/2017 09:24:02:

> From: Alexey Kardashevskiy 
> To: dev@dpdk.org
> Cc: Alexey Kardashevskiy , j...@zurich.ibm.com,
> Gowrishankar Muthukrishnan 
> Date: 20/04/2017 09:24
> Subject: [PATCH dpdk 5/5] RFC: vfio/ppc64/spapr: Use correct bus
> addresses for DMA map
>
> VFIO_IOMMU_SPAPR_TCE_CREATE ioctl() returns the actual bus address for
> just created DMA window. It happens to start from zero because the
default
> window is removed (leaving no windows) and new window starts from zero.
> However this is not guaranteed and the new window may start from another
> address, this adds an error check.
>
> Another issue is that IOVA passed to VFIO_IOMMU_MAP_DMA should be a PCI
> bus address while in this case a physical address of a user page is used.
> This changes IOVA to start from zero in a hope that the rest of DPDK
> expects this.

This is not the case. DPDK expects a 1:1 mapping PA==IOVA. It will use the
phys_addr of the memory segment it got from /proc/self/pagemap cf.
librte_eal/linuxapp/eal/eal_memory.c. We could try setting it here to the
actual iova which basically makes the whole virtual to phyiscal mapping
with pagemap unnecessary which I believe should be the case for VFIO
anyway. Pagemap should only be needed when using pci_uio.

>
> Signed-off-by: Alexey Kardashevskiy 
> ---
>  lib/librte_eal/linuxapp/eal/eal_vfio.c | 12 ++--
>  1 file changed, 10 insertions(+), 2 deletions(-)
>
> diff --git a/lib/librte_eal/linuxapp/eal/eal_vfio.c b/lib/
> librte_eal/linuxapp/eal/eal_vfio.c
> index 46f951f4d..8b8e75c4f 100644
> --- a/lib/librte_eal/linuxapp/eal/eal_vfio.c
> +++ b/lib/librte_eal/linuxapp/eal/eal_vfio.c
> @@ -658,7 +658,7 @@ vfio_spapr_dma_map(int vfio_container_fd)
>  {
> const struct rte_memseg *ms = rte_eal_get_physmem_layout();
> int i, ret;
> -
> +   phys_addr_t io_offset;
> struct vfio_iommu_spapr_register_memory reg = {
>.argsz = sizeof(reg),
>.flags = 0
> @@ -702,6 +702,13 @@ vfio_spapr_dma_map(int vfio_container_fd)
>return -1;
> }
>
> +   io_offset = create.start_addr;
> +   if (io_offset) {
> +  RTE_LOG(ERR, EAL, "  DMA offsets other than zero is not supported,
"
> +"new window is created at %lx\n", io_offset);
> +  return -1;
> +   }
> +
> /* map all DPDK segments for DMA. use 1:1 PA to IOVA mapping */
> for (i = 0; i < RTE_MAX_MEMSEG; i++) {
>struct vfio_iommu_type1_dma_map dma_map;
> @@ -723,7 +730,7 @@ vfio_spapr_dma_map(int vfio_container_fd)
>dma_map.argsz = sizeof(struct vfio_iommu_type1_dma_map);
>dma_map.vaddr = ms[i].addr_64;
>dma_map.size = ms[i].len;
> -  dma_map.iova = ms[i].phys_addr;
> +  dma_map.iova = io_offset;
>dma_map.flags = VFIO_DMA_MAP_FLAG_READ |
>   VFIO_DMA_MAP_FLAG_WRITE;
>
> @@ -735,6 +742,7 @@ vfio_spapr_dma_map(int vfio_container_fd)
>   return -1;
>}
>
> +  io_offset += dma_map.size;
> }
>
> return 0;
> --
> 2.11.0
>


Re: [dpdk-dev] [PATCH dpdk 5/5] RFC: vfio/ppc64/spapr: Use correct bus addresses for DMA map

2017-04-20 Thread Jonas Pfefferle1
Alexey Kardashevskiy  wrote on 20/04/2017 16:22:01:

> From: Alexey Kardashevskiy 
> To: Jonas Pfefferle1 
> Cc: dev@dpdk.org, Gowrishankar Muthukrishnan
> , Adrian Schuepbach 
> Date: 20/04/2017 16:22
> Subject: Re: [PATCH dpdk 5/5] RFC: vfio/ppc64/spapr: Use correct bus
> addresses for DMA map
>
> On 20/04/17 23:25, Alexey Kardashevskiy wrote:
> > On 20/04/17 19:04, Jonas Pfefferle1 wrote:
> >> Alexey Kardashevskiy  wrote on 20/04/2017 09:24:02:
> >>
> >>> From: Alexey Kardashevskiy 
> >>> To: dev@dpdk.org
> >>> Cc: Alexey Kardashevskiy , j...@zurich.ibm.com,
> >>> Gowrishankar Muthukrishnan 
> >>> Date: 20/04/2017 09:24
> >>> Subject: [PATCH dpdk 5/5] RFC: vfio/ppc64/spapr: Use correct bus
> >>> addresses for DMA map
> >>>
> >>> VFIO_IOMMU_SPAPR_TCE_CREATE ioctl() returns the actual bus address
for
> >>> just created DMA window. It happens to start from zero because the
default
> >>> window is removed (leaving no windows) and new window starts from
zero.
> >>> However this is not guaranteed and the new window may start from
another
> >>> address, this adds an error check.
> >>>
> >>> Another issue is that IOVA passed to VFIO_IOMMU_MAP_DMA should be a
PCI
> >>> bus address while in this case a physical address of a user page is
used.
> >>> This changes IOVA to start from zero in a hope that the rest of DPDK
> >>> expects this.
> >>
> >> This is not the case. DPDK expects a 1:1 mapping PA==IOVA. It will use
the
> >> phys_addr of the memory segment it got from /proc/self/pagemap cf.
> >> librte_eal/linuxapp/eal/eal_memory.c. We could try setting it here to
the
> >> actual iova which basically makes the whole virtual to phyiscal
mapping
> >> with pagemap unnecessary which I believe should be the case for VFIO
> >> anyway. Pagemap should only be needed when using pci_uio.
> >
> >
> > Ah, ok, makes sense now. But it sure needs a big fat comment there as
it is
> > not obvious why host RAM address is used there as DMA window start is
not
> > guaranteed.
>
> Well, either way there is some bug - ms[i].phys_addr and ms[i].addr_64
both
> have exact same value, in my setup it is 3fffb33c which is a
userspace
> address - at least ms[i].phys_addr must be physical address.

This might be the case if you are not using hugetlbfs i.e. passing
"--no-huge" cf. eal_memory.c:980

/* hugetlbfs can be disabled */
if (internal_config.no_hugetlbfs) {
addr = mmap(NULL, internal_config.memory, PROT_READ |
PROT_WRITE,
MAP_PRIVATE | MAP_ANONYMOUS, 0, 0);
if (addr == MAP_FAILED) {
RTE_LOG(ERR, EAL, "%s: mmap() failed: %s\n", __func__,
strerror(errno));
return -1;
}
mcfg->memseg[0].phys_addr = (phys_addr_t)(uintptr_t)addr;
mcfg->memseg[0].addr = addr;
mcfg->memseg[0].hugepage_sz = RTE_PGSIZE_4K;
mcfg->memseg[0].len = internal_config.memory;
mcfg->memseg[0].socket_id = 0;
return 0;
}

If it fails to get the virt2phys mapping it actually assigns iovas starting
from 0 to the memory segments, cf. set_physaddrs eal_memory.c:263

>
>
> >
> >
> >>
> >>>
> >>> Signed-off-by: Alexey Kardashevskiy 
> >>> ---
> >>>  lib/librte_eal/linuxapp/eal/eal_vfio.c | 12 ++--
> >>>  1 file changed, 10 insertions(+), 2 deletions(-)
> >>>
> >>> diff --git a/lib/librte_eal/linuxapp/eal/eal_vfio.c b/lib/
> >>> librte_eal/linuxapp/eal/eal_vfio.c
> >>> index 46f951f4d..8b8e75c4f 100644
> >>> --- a/lib/librte_eal/linuxapp/eal/eal_vfio.c
> >>> +++ b/lib/librte_eal/linuxapp/eal/eal_vfio.c
> >>> @@ -658,7 +658,7 @@ vfio_spapr_dma_map(int vfio_container_fd)
> >>>  {
> >>> const struct rte_memseg *ms = rte_eal_get_physmem_layout();
> >>> int i, ret;
> >>> -
> >>> +   phys_addr_t io_offset;
> >>> struct vfio_iommu_spapr_register_memory reg = {
> >>>.argsz = sizeof(reg),
> >>>.flags = 0
> >>> @@ -702,6 +702,13 @@ vfio_spapr_dma_map(int vfio_container_fd)
> >>>return -1;
> >>> }
> >>>
> >>> +   io_offset = create.start_addr;
> >>> +   if (io_offset) {
> >>> +  RTE_LOG(ERR, EAL, "  DMA o

Re: [dpdk-dev] [PATCH] vfio: refactor PCI BAR mapping

2017-09-20 Thread Jonas Pfefferle1

Hi Anatoly,

"Burakov, Anatoly"  wrote on 09/19/2017 01:40:51
PM:

> From: "Burakov, Anatoly" 
> To: Jonas Pfefferle , dev@dpdk.org
> Date: 09/19/2017 01:41 PM
> Subject: Re: [dpdk-dev] [PATCH] vfio: refactor PCI BAR mapping
>
> Hi Jonas,
>
> On 17-Aug-17 12:35 PM, Jonas Pfefferle wrote:
> > Split pci_vfio_map_resource for primary and secondary processes.
> > Save all relevant mapping data in primary process to allow
> > the secondary process to perform mappings.
> >
> > Signed-off-by: Jonas Pfefferle 
> > ---
> >   lib/librte_eal/common/include/rte_pci.h|   7 +
> >   lib/librte_eal/linuxapp/eal/eal_pci_vfio.c | 447 +++
> ++
> >   2 files changed, 271 insertions(+), 183 deletions(-)
> >
> > diff --git a/lib/librte_eal/common/include/rte_pci.h b/lib/
> librte_eal/common/include/rte_pci.h
> > index 8b12339..0821af9 100644
> > --- a/lib/librte_eal/common/include/rte_pci.h
> > +++ b/lib/librte_eal/common/include/rte_pci.h
> > @@ -214,6 +214,12 @@ struct pci_map {
> >  uint64_t phaddr;
> >   };
> >
> > +struct pci_msix_table {
> > +   int bar_index;
> > +   uint32_t offset;
> > +   uint32_t size;
> > +};
> > +
> >   /**
> >* A structure describing a mapped PCI resource.
> >* For multi-process we need to reproduce all PCI mappings in
secondary
> > @@ -226,6 +232,7 @@ struct mapped_pci_resource {
> >  char path[PATH_MAX];
> >  int nb_maps;
> >  struct pci_map maps[PCI_MAX_RESOURCE];
> > +   struct pci_msix_table msix_table;
> >   };
> >
> >   /** mapped pci device list */
> > diff --git a/lib/librte_eal/linuxapp/eal/eal_pci_vfio.c b/lib/
> librte_eal/linuxapp/eal/eal_pci_vfio.c
> > index aa9d96e..f37552a 100644
> > --- a/lib/librte_eal/linuxapp/eal/eal_pci_vfio.c
> > +++ b/lib/librte_eal/linuxapp/eal/eal_pci_vfio.c
> > @@ -88,8 +88,7 @@ pci_vfio_write_config(const struct
> rte_intr_handle *intr_handle,
> >
> >   /* get PCI BAR number where MSI-X interrupts are */
> >   static int
> > -pci_vfio_get_msix_bar(int fd, int *msix_bar, uint32_t
*msix_table_offset,
> > -uint32_t *msix_table_size)
> > +pci_vfio_get_msix_bar(int fd, struct pci_msix_table *msix_table)
> >   {
> >  int ret;
> >  uint32_t reg;
> > @@ -161,9 +160,10 @@ pci_vfio_get_msix_bar(int fd, int *msix_bar,
> uint32_t *msix_table_offset,
> >   return -1;
> >}
> >
> > - *msix_bar = reg & RTE_PCI_MSIX_TABLE_BIR;
> > - *msix_table_offset = reg & RTE_PCI_MSIX_TABLE_OFFSET;
> > - *msix_table_size = 16 * (1 + (flags &
RTE_PCI_MSIX_FLAGS_QSIZE));
> > + msix_table->bar_index = reg & RTE_PCI_MSIX_TABLE_BIR;
> > + msix_table->offset = reg & RTE_PCI_MSIX_TABLE_OFFSET;
> > + msix_table->size =
> > +16 * (1 + (flags & RTE_PCI_MSIX_FLAGS_QSIZE));
> >
> >return 0;
> > }
> > @@ -300,25 +300,152 @@ pci_vfio_setup_interrupts(struct
> rte_pci_device *dev, int vfio_dev_fd)
> >  return -1;
> >   }
> >
> > -/*
> > - * map the PCI resources of a PCI device in virtual memory (VFIO
version).
> > - * primary and secondary processes follow almost exactly the same path
> > - */
> > -int
> > -pci_vfio_map_resource(struct rte_pci_device *dev)
> > +static int
> > +pci_vfio_is_ioport_bar(int vfio_dev_fd, int bar_index)
> > +{
> > +   uint32_t ioport_bar;
> > +   int ret;
> > +
> > +   ret = pread64(vfio_dev_fd, &ioport_bar, sizeof(ioport_bar),
> > +   VFIO_GET_REGION_ADDR(VFIO_PCI_CONFIG_REGION_INDEX)
> > +   + PCI_BASE_ADDRESS_0 + bar_index*4);
> > +   if (ret != sizeof(ioport_bar)) {
> > +  RTE_LOG(ERR, EAL, "Cannot read command (%x) from config
space!\n",
> > + PCI_BASE_ADDRESS_0 + bar_index*4);
> > +  return -1;
> > +   }
> > +
> > +   if (ioport_bar & PCI_BASE_ADDRESS_SPACE_IO) {
> > +  RTE_LOG(INFO, EAL, "Ignore mapping IO port bar(%d) addr: %x\n",
> > +  bar_index, ioport_bar);
>
> This log message should probably go to the "continue" portion of the
> calling code, it looks out of place here.

Agree. I will move it.

>
> > +  return 1;
> > +   }
> > +   return 0;
> > +}
> > +
> > +static int
> > +pci_vfio_setup_device(struct rte_pci_device *dev, int vfio_dev_fd)
> > +{
> > +   if (pci_vfio_setup_interrupts(dev, vfio_dev_fd) != 0) {
> > +  RTE_LOG(ERR, EAL, "Error setting up interrupts!\n");
> > +  return -1;
> > +   }
> > +
> > +   /* set bus mastering for the device */
> > +   if (pci_vfio_set_bus_master(vfio_dev_fd, true)) {
> > +  RTE_LOG(ERR, EAL, "Cannot set up bus mastering!\n");
> > +  return -1;
> > +   }
> > +
> > +   /* Reset the device */
> > +   ioctl(vfio_dev_fd, VFIO_DEVICE_RESET);
> > +
> > +   return 0;
> > +}
> > +
> > +static int
> > +pci_vfio_mmap_bar(int vfio_dev_fd, struct mapped_pci_resource
*vfio_res,
> > +  int bar_index, int additional_flags)
> > +{
> > +   struct memreg {
> > +  unsigned long offset, size;
> > +   } memreg[2] = {};
> > +   void *bar_addr;
> > +   struct pci_msix_table *msix_table = &vfio_

Re: [dpdk-dev] [PATCH v2] vfio: refactor PCI BAR mapping

2017-10-06 Thread Jonas Pfefferle1
Hi Anatoly,

"Burakov, Anatoly"  wrote on 10/04/2017 03:13:22
PM:

> From: "Burakov, Anatoly" 
> To: Jonas Pfefferle , dev@dpdk.org
> Date: 10/04/2017 03:14 PM
> Subject: Re: [PATCH v2] vfio: refactor PCI BAR mapping
>
> On 25-Sep-17 4:04 PM, Jonas Pfefferle wrote:
> > Split pci_vfio_map_resource for primary and secondary processes.
> > Save all relevant mapping data in primary process to allow
> > the secondary process to perform mappings.
> >
> > Signed-off-by: Jonas Pfefferle 
> > ---
> > v2:
> > * fix zero size and offset when trying to mmap non msix bar
> >
> >   lib/librte_eal/common/include/rte_pci.h|   7 +
> >   lib/librte_eal/linuxapp/eal/eal_pci_vfio.c | 446 +++
> ++
> >   2 files changed, 271 insertions(+), 182 deletions(-)
> >
> > diff --git a/lib/librte_eal/common/include/rte_pci.h b/lib/
> librte_eal/common/include/rte_pci.h
> > index 8b12339..0821af9 100644
> > --- a/lib/librte_eal/common/include/rte_pci.h
> > +++ b/lib/librte_eal/common/include/rte_pci.h
> > @@ -214,6 +214,12 @@ struct pci_map {
> >  uint64_t phaddr;
> >   };
> >
> > +struct pci_msix_table {
> > +   int bar_index;
> > +   uint32_t offset;
> > +   uint32_t size;
> > +};
> > +
> >   /**
> >* A structure describing a mapped PCI resource.
> >* For multi-process we need to reproduce all PCI mappings in
secondary
> > @@ -226,6 +232,7 @@ struct mapped_pci_resource {
> >  char path[PATH_MAX];
> >  int nb_maps;
> >  struct pci_map maps[PCI_MAX_RESOURCE];
> > +   struct pci_msix_table msix_table;
> >   };
> >
> >   /** mapped pci device list */
> > diff --git a/lib/librte_eal/linuxapp/eal/eal_pci_vfio.c b/lib/
> librte_eal/linuxapp/eal/eal_pci_vfio.c
> > index aa9d96e..6443bd5 100644
> > --- a/lib/librte_eal/linuxapp/eal/eal_pci_vfio.c
> > +++ b/lib/librte_eal/linuxapp/eal/eal_pci_vfio.c
> > @@ -88,8 +88,7 @@ pci_vfio_write_config(const struct
> rte_intr_handle *intr_handle,
> >
> >   /* get PCI BAR number where MSI-X interrupts are */
> >   static int
> > -pci_vfio_get_msix_bar(int fd, int *msix_bar, uint32_t
*msix_table_offset,
> > -uint32_t *msix_table_size)
> > +pci_vfio_get_msix_bar(int fd, struct pci_msix_table *msix_table)
> >   {
> >  int ret;
> >  uint32_t reg;
> > @@ -161,9 +160,10 @@ pci_vfio_get_msix_bar(int fd, int *msix_bar,
> uint32_t *msix_table_offset,
> >   return -1;
> >}
> >
> > - *msix_bar = reg & RTE_PCI_MSIX_TABLE_BIR;
> > - *msix_table_offset = reg & RTE_PCI_MSIX_TABLE_OFFSET;
> > - *msix_table_size = 16 * (1 + (flags &
RTE_PCI_MSIX_FLAGS_QSIZE));
> > + msix_table->bar_index = reg & RTE_PCI_MSIX_TABLE_BIR;
> > + msix_table->offset = reg & RTE_PCI_MSIX_TABLE_OFFSET;
> > + msix_table->size =
> > +16 * (1 + (flags & RTE_PCI_MSIX_FLAGS_QSIZE));
> >
> >return 0;
> > }
> > @@ -300,25 +300,150 @@ pci_vfio_setup_interrupts(struct
> rte_pci_device *dev, int vfio_dev_fd)
> >  return -1;
> >   }
> >
> > -/*
> > - * map the PCI resources of a PCI device in virtual memory (VFIO
version).
> > - * primary and secondary processes follow almost exactly the same path
> > - */
> > -int
> > -pci_vfio_map_resource(struct rte_pci_device *dev)
> > +static int
> > +pci_vfio_is_ioport_bar(int vfio_dev_fd, int bar_index)
> > +{
> > +   uint32_t ioport_bar;
> > +   int ret;
> > +
> > +   ret = pread64(vfio_dev_fd, &ioport_bar, sizeof(ioport_bar),
> > +   VFIO_GET_REGION_ADDR(VFIO_PCI_CONFIG_REGION_INDEX)
> > +   + PCI_BASE_ADDRESS_0 + bar_index*4);
> > +   if (ret != sizeof(ioport_bar)) {
> > +  RTE_LOG(ERR, EAL, "Cannot read command (%x) from config
space!\n",
> > + PCI_BASE_ADDRESS_0 + bar_index*4);
> > +  return -1;
> > +   }
> > +
> > +   return ioport_bar & PCI_BASE_ADDRESS_SPACE_IO;
>
> Not sure i like this. I think it's better to be explicit, e.g.
> return ioport_bar & PCI_BASE_ADDRESS_SPACE_IO != 0;
>
> Makes no difference (both because return value is non-zero and because
> PCI_BASE_ADDRESS_SPACE_IO is 0x01), but still, better to make intentions
> clear i think.

I agree. I wanted to change this anyway but forgot. V3 is going out in a
sec. Thanks for the ack ;)

>
> Otherwise, did a quick smoke-test and it works, so
>
> Acked-by: Anatoly Burakov 
>
> Keep the ack if you decide to submit a v3 :)
>


Re: [dpdk-dev] [PATCH v4] vfio: fix sPAPR IOMMU DMA window size

2017-10-10 Thread Jonas Pfefferle1
Hi Thomas,

Can you please apply this patch?

Thanks,
Jonas

Alexey Kardashevskiy  wrote on 08/10/2017 08:03:17 AM:

> From: Alexey Kardashevskiy 
> To: Jonas Pfefferle , anatoly.bura...@intel.com
> Cc: dev@dpdk.org
> Date: 08/10/2017 08:03 AM
> Subject: Re: [PATCH v4] vfio: fix sPAPR IOMMU DMA window size
>
> On 08/08/17 21:16, Jonas Pfefferle wrote:
> > DMA window size needs to be big enough to span all memory segment's
> > physical addresses. We do not need multiple levels of IOMMU tables
> > as we already span ~70TB of physical memory with 16MB hugepages.
> >
> > Signed-off-by: Jonas Pfefferle 
>
>
> Reviewed-by: Alexey Kardashevskiy 
>
>
>
> > ---
> > v2:
> > * roundup to next power 2 function without loop.
> >
> > v3:
> > * Replace roundup_next_pow2 with rte_align64pow2
> >
> > v4:
> > * do not assume ordering of physical addresses of memsegs
> >
> >  lib/librte_eal/linuxapp/eal/eal_vfio.c | 20 +---
> >  1 file changed, 17 insertions(+), 3 deletions(-)
> >
> > diff --git a/lib/librte_eal/linuxapp/eal/eal_vfio.c b/lib/
> librte_eal/linuxapp/eal/eal_vfio.c
> > index 946df7e..7d5d61d 100644
> > --- a/lib/librte_eal/linuxapp/eal/eal_vfio.c
> > +++ b/lib/librte_eal/linuxapp/eal/eal_vfio.c
> > @@ -759,10 +759,19 @@ vfio_spapr_dma_map(int vfio_container_fd)
> >return -1;
> > }
> >
> > -   /* calculate window size based on number of hugepages configured */
> > -   create.window_size = rte_eal_get_physmem_size();
> > +   /* create DMA window from 0 to max(phys_addr + len) */
> > +   for (i = 0; i < RTE_MAX_MEMSEG; i++) {
> > +  if (ms[i].addr == NULL)
> > + break;
> > +
> > +  create.window_size = RTE_MAX(create.window_size,
> > +ms[i].phys_addr + ms[i].len);
> > +   }
> > +
> > +   /* sPAPR requires window size to be a power of 2 */
> > +   create.window_size = rte_align64pow2(create.window_size);
> > create.page_shift = __builtin_ctzll(ms->hugepage_sz);
> > -   create.levels = 2;
> > +   create.levels = 1;
> >
> > ret = ioctl(vfio_container_fd, VFIO_IOMMU_SPAPR_TCE_CREATE,
&create);
> > if (ret) {
> > @@ -771,6 +780,11 @@ vfio_spapr_dma_map(int vfio_container_fd)
> >return -1;
> > }
> >
> > +   if (create.start_addr != 0) {
> > +  RTE_LOG(ERR, EAL, "  DMA window start address != 0\n");
> > +  return -1;
> > +   }
> > +
> > /* map all DPDK segments for DMA. use 1:1 PA to IOVA mapping */
> > for (i = 0; i < RTE_MAX_MEMSEG; i++) {
> >struct vfio_iommu_type1_dma_map dma_map;
> >
>
>
> --
> Alexey
>


Re: [dpdk-dev] [PATCH v7 7/9] linuxapp/eal_vfio: honor iova mode before mapping

2017-10-26 Thread Jonas Pfefferle1

Hi @all

I just stumbled upon this patch while testing on POWER. RTE_IOVA_VA will
not work for the sPAPR code since the dma window size is currently
determined by the physical address only. I'm preparing a patch to address
this.

Thanks,
Jonas

"dev"  wrote on 08/31/2017 05:26:16 AM:

> From: Santosh Shukla 
> To: dev@dpdk.org
> Cc: tho...@monjalon.net, jerin.ja...@caviumnetworks.com,
> hemant.agra...@nxp.com, olivier.m...@6wind.com,
> maxime.coque...@redhat.com, sergio.gonzalez.mon...@intel.com,
> bruce.richard...@intel.com, shreyansh.j...@nxp.com,
> gaetan.ri...@6wind.com, anatoly.bura...@intel.com,
> step...@networkplumber.org, acon...@redhat.com, Santosh Shukla
> 
> Date: 08/31/2017 05:28 AM
> Subject: [dpdk-dev] [PATCH v7 7/9] linuxapp/eal_vfio: honor iova
> mode before mapping
> Sent by: "dev" 
>
> Check iova mode and accordingly map iova to pa or va.
>
> Signed-off-by: Santosh Shukla 
> Signed-off-by: Jerin Jacob 
> Reviewed-by: Maxime Coquelin 
> ---
>  lib/librte_eal/linuxapp/eal/eal_vfio.c | 10 --
>  1 file changed, 8 insertions(+), 2 deletions(-)
>
> diff --git a/lib/librte_eal/linuxapp/eal/eal_vfio.c b/lib/
> librte_eal/linuxapp/eal/eal_vfio.c
> index c8a97b7e7..b32cd09a2 100644
> --- a/lib/librte_eal/linuxapp/eal/eal_vfio.c
> +++ b/lib/librte_eal/linuxapp/eal/eal_vfio.c
> @@ -706,7 +706,10 @@ vfio_type1_dma_map(int vfio_container_fd)
>dma_map.argsz = sizeof(struct vfio_iommu_type1_dma_map);
>dma_map.vaddr = ms[i].addr_64;
>dma_map.size = ms[i].len;
> -  dma_map.iova = ms[i].phys_addr;
> +  if (rte_eal_iova_mode() == RTE_IOVA_VA)
> + dma_map.iova = dma_map.vaddr;
> +  else
> + dma_map.iova = ms[i].phys_addr;
>dma_map.flags = VFIO_DMA_MAP_FLAG_READ | VFIO_DMA_MAP_FLAG_WRITE;
>
>ret = ioctl(vfio_container_fd, VFIO_IOMMU_MAP_DMA, &dma_map);
> @@ -792,7 +795,10 @@ vfio_spapr_dma_map(int vfio_container_fd)
>dma_map.argsz = sizeof(struct vfio_iommu_type1_dma_map);
>dma_map.vaddr = ms[i].addr_64;
>dma_map.size = ms[i].len;
> -  dma_map.iova = ms[i].phys_addr;
> +  if (rte_eal_iova_mode() == RTE_IOVA_VA)
> + dma_map.iova = dma_map.vaddr;
> +  else
> + dma_map.iova = ms[i].phys_addr;
>dma_map.flags = VFIO_DMA_MAP_FLAG_READ |
>   VFIO_DMA_MAP_FLAG_WRITE;
>
> --
> 2.13.0
>


[dpdk-dev] Huge mapping secondary process linux

2017-10-27 Thread Jonas Pfefferle1


Hi @all,

I'm trying to make sense of the hugepage memory mappings in
librte_eal/linuxapp/eal/eal_memory.c:
* In rte_eal_hugepage_attach (line 1347) when we try to do a private
mapping on /dev/zero (line 1393) why do we not use MAP_FIXED if we need the
addresses to be identical with the primary process?
* On POWER we have this weird business going on where we use MAP_HUGETLB
because according to this commit:

commit 284ae3e9ff9a92575c28c858efd2c85c8de6d440
Author: Chao Zhu 
Date:   Thu Apr 6 15:36:09 2017 +0530

eal/ppc: fix mmap for memory initialization

On IBM POWER platform, when mapping /dev/zero file to hugepage memory
space, mmap will not respect the requested address hint. This will
cause
the memory initialization for the second process fails. This patch adds
the required mmap flags to make it work. Beside this, users need to set
the nr_overcommit_hugepages to expand the VA range. When
doing the initialization, users need to set both nr_hugepages and
nr_overcommit_hugepages to the same value, like 64, 128, etc.

mmap address hints are not respected. Looking at the mmap code in the
kernel this is not true entirely however under some circumstances the hint
can be ignored (
http://elixir.free-electrons.com/linux/latest/source/arch/powerpc/mm/mmap.c#L103
). However I believe we can remove the extra case for PPC if we use
MAP_FIXED when doing the secondary process mappings because we need them to
be identical anyway. We could also use MAP_FIXED when doing the primary
process mappings resp. get_virtual_area if we want to have any guarantees
when specifying a base address. Any thoughts?

Thanks,
Jonas


Re: [dpdk-dev] Huge mapping secondary process linux

2017-10-27 Thread Jonas Pfefferle1
"Burakov, Anatoly"  wrote on 10/27/2017 04:06:44
PM:

> From: "Burakov, Anatoly" 
> To: Jonas Pfefferle1 , dev@dpdk.org
> Cc: chao...@linux.vnet.ibm.com, bruce.richard...@intel.com
> Date: 10/27/2017 04:06 PM
> Subject: Re: [dpdk-dev] Huge mapping secondary process linux
>
> On 27-Oct-17 1:43 PM, Jonas Pfefferle1 wrote:
> >
> >
> > Hi @all,
> >
> > I'm trying to make sense of the hugepage memory mappings in
> > librte_eal/linuxapp/eal/eal_memory.c:
> > * In rte_eal_hugepage_attach (line 1347) when we try to do a private
> > mapping on /dev/zero (line 1393) why do we not use MAP_FIXED if we need
the
> > addresses to be identical with the primary process?
> > * On POWER we have this weird business going on where we use
MAP_HUGETLB
> > because according to this commit:
> >
> > commit 284ae3e9ff9a92575c28c858efd2c85c8de6d440
> > Author: Chao Zhu 
> > Date:   Thu Apr 6 15:36:09 2017 +0530
> >
> >  eal/ppc: fix mmap for memory initialization
> >
> >  On IBM POWER platform, when mapping /dev/zero file to hugepage
memory
> >  space, mmap will not respect the requested address hint. This will
> > cause
> >  the memory initialization for the second process fails. This patch
adds
> >  the required mmap flags to make it work. Beside this, users need
to set
> >  the nr_overcommit_hugepages to expand the VA range. When
> >  doing the initialization, users need to set both nr_hugepages and
> >  nr_overcommit_hugepages to the same value, like 64, 128, etc.
> >
> > mmap address hints are not respected. Looking at the mmap code in the
> > kernel this is not true entirely however under some circumstances the
hint
> > can be ignored (
> > https://urldefense.proofpoint.com/v2/url?
>
u=http-3A__elixir.free-2Delectrons.com_linux_latest_source_arch_powerpc_mm_mmap.c-23L103&d=DwICaQ&c=jf_iaSHvJObTbx-

> siA1ZOg&r=rOdXhRsgn8Iur7bDE0vgwvo6TC8OpoDN-
> pXjigIjRW0&m=cttQcHlAYixhsYS3lz-
> BAdEeg4dpbwGdPnj2R3I8Do0&s=Gp0TIjUtIed05Jgb7XnlocpCYZdFXZXiH0LqIWiNMhA&e=
> > ). However I believe we can remove the extra case for PPC if we use
> > MAP_FIXED when doing the secondary process mappings because we need
them to
> > be identical anyway. We could also use MAP_FIXED when doing the primary
> > process mappings resp. get_virtual_area if we want to have any
guarantees
> > when specifying a base address. Any thoughts?
> >
> > Thanks,
> > Jonas
> >
> hi Jonas,
>
> MAP_FIXED is not used because it's dangerous, it unmaps anything that is
> already mapped into that space. We would rather know that we can't map
> something than unwittingly unmap something that was mapped before.

Ok, I see. Maybe we can add a check to the primary process's memory
mappings whether the hint has been respected or not? At least warn if it
hasn't.

>
> --
> Thanks,
> Anatoly
>


Re: [dpdk-dev] Huge mapping secondary process linux

2017-10-27 Thread Jonas Pfefferle1

"Burakov, Anatoly"  wrote on 10/27/2017 04:44:52
PM:

> From: "Burakov, Anatoly" 
> To: Jonas Pfefferle1 
> Cc: bruce.richard...@intel.com, chao...@linux.vnet.ibm.com, dev@dpdk.org
> Date: 10/27/2017 04:45 PM
> Subject: Re: [dpdk-dev] Huge mapping secondary process linux
>
> On 27-Oct-17 3:28 PM, Jonas Pfefferle1 wrote:
> > "Burakov, Anatoly"  wrote on 10/27/2017
> > 04:06:44 PM:
> >
> >  > From: "Burakov, Anatoly" 
> >  > To: Jonas Pfefferle1 , dev@dpdk.org
> >  > Cc: chao...@linux.vnet.ibm.com, bruce.richard...@intel.com
> >  > Date: 10/27/2017 04:06 PM
> >  > Subject: Re: [dpdk-dev] Huge mapping secondary process linux
> >  >
> >  > On 27-Oct-17 1:43 PM, Jonas Pfefferle1 wrote:
> >  > >
> >  > >
> >  > > Hi @all,
> >  > >
> >  > > I'm trying to make sense of the hugepage memory mappings in
> >  > > librte_eal/linuxapp/eal/eal_memory.c:
> >  > > * In rte_eal_hugepage_attach (line 1347) when we try to do a
private
> >  > > mapping on /dev/zero (line 1393) why do we not use MAP_FIXED if we

> > need the
> >  > > addresses to be identical with the primary process?
> >  > > * On POWER we have this weird business going on where we use
> > MAP_HUGETLB
> >  > > because according to this commit:
> >  > >
> >  > > commit 284ae3e9ff9a92575c28c858efd2c85c8de6d440
> >  > > Author: Chao Zhu 
> >  > > Date:   Thu Apr 6 15:36:09 2017 +0530
> >  > >
> >  > >      eal/ppc: fix mmap for memory initialization
> >  > >
> >  > >      On IBM POWER platform, when mapping /dev/zero file to
hugepage
> > memory
> >  > >      space, mmap will not respect the requested address hint.This
will
> >  > > cause
> >  > >      the memory initialization for the second process fails. This
> > patch adds
> >  > >      the required mmap flags to make it work. Beside this, users
> > need to set
> >  > >      the nr_overcommit_hugepages to expand the VA range. When
> >  > >      doing the initialization, users need to set both nr_hugepages
and
> >  > >      nr_overcommit_hugepages to the same value, like 64, 128, etc.
> >  > >
> >  > > mmap address hints are not respected. Looking at the mmap code in
the
> >  > > kernel this is not true entirely however under some circumstances
> > the hint
> >  > > can be ignored (
> >  > > https://urldefense.proofpoint.com/v2/url?
> >  >
> >
>
u=http-3A__elixir.free-2Delectrons.com_linux_latest_source_arch_powerpc_mm_mmap.c-23L103&d=DwICaQ&c=jf_iaSHvJObTbx-

> >  > siA1ZOg&r=rOdXhRsgn8Iur7bDE0vgwvo6TC8OpoDN-
> >  > pXjigIjRW0&m=cttQcHlAYixhsYS3lz-
> >  >
BAdEeg4dpbwGdPnj2R3I8Do0&s=Gp0TIjUtIed05Jgb7XnlocpCYZdFXZXiH0LqIWiNMhA&e=
> >  > > ). However I believe we can remove the extra case for PPC if we
use
> >  > > MAP_FIXED when doing the secondary process mappings because we
need
> > them to
> >  > > be identical anyway. We could also use MAP_FIXED when doing the
primary
> >  > > process mappings resp. get_virtual_area if we want to have any
> > guarantees
> >  > > when specifying a base address. Any thoughts?
> >  > >
> >  > > Thanks,
> >  > > Jonas
> >  > >
> >  > hi Jonas,
> >  >
> >  > MAP_FIXED is not used because it's dangerous, it unmaps anything
that is
> >  > already mapped into that space. We would rather know that we can't
map
> >  > something than unwittingly unmap something that was mapped before.
> >
> > Ok, I see. Maybe we can add a check to the primary process's memory
> > mappings whether the hint has been respected or not? At least warn if
it
> > hasn't.
>
> Hi Jonas,
>
> I'm unfamiliar with POWER platform, so i'm afraid you'd have to explain
> a bit more what you mean by "hint has been respected" :)

Hi Anatoly,

What I meant was the mmap address hint:

"If addr is not NULL, then the kernel takes it as a hint
 about where to place the mapping; on Linux, the mapping will be
 created at a nearby page boundary."

This is actually not true on POWER. It can happen that the address hint is
ignored and you get any address back that fits your mapping.

Thanks,
Jonas

>
>
> --
> Thanks,
> Anatoly
>


Re: [dpdk-dev] Huge mapping secondary process linux

2017-10-27 Thread Jonas Pfefferle1

"dev"  wrote on 10/27/2017 04:58:01 PM:

> From: "Jonas Pfefferle1" 
> To: "Burakov, Anatoly" 
> Cc: bruce.richard...@intel.com, chao...@linux.vnet.ibm.com, dev@dpdk.org
> Date: 10/27/2017 04:58 PM
> Subject: Re: [dpdk-dev] Huge mapping secondary process linux
> Sent by: "dev" 
>
>
> "Burakov, Anatoly"  wrote on 10/27/2017
04:44:52
> PM:
>
> > From: "Burakov, Anatoly" 
> > To: Jonas Pfefferle1 
> > Cc: bruce.richard...@intel.com, chao...@linux.vnet.ibm.com,
dev@dpdk.org
> > Date: 10/27/2017 04:45 PM
> > Subject: Re: [dpdk-dev] Huge mapping secondary process linux
> >
> > On 27-Oct-17 3:28 PM, Jonas Pfefferle1 wrote:
> > > "Burakov, Anatoly"  wrote on 10/27/2017
> > > 04:06:44 PM:
> > >
> > >  > From: "Burakov, Anatoly" 
> > >  > To: Jonas Pfefferle1 , dev@dpdk.org
> > >  > Cc: chao...@linux.vnet.ibm.com, bruce.richard...@intel.com
> > >  > Date: 10/27/2017 04:06 PM
> > >  > Subject: Re: [dpdk-dev] Huge mapping secondary process linux
> > >  >
> > >  > On 27-Oct-17 1:43 PM, Jonas Pfefferle1 wrote:
> > >  > >
> > >  > >
> > >  > > Hi @all,
> > >  > >
> > >  > > I'm trying to make sense of the hugepage memory mappings in
> > >  > > librte_eal/linuxapp/eal/eal_memory.c:
> > >  > > * In rte_eal_hugepage_attach (line 1347) when we try to do a
> private
> > >  > > mapping on /dev/zero (line 1393) why do we not use MAP_FIXED if
we
>
> > > need the
> > >  > > addresses to be identical with the primary process?
> > >  > > * On POWER we have this weird business going on where we use
> > > MAP_HUGETLB
> > >  > > because according to this commit:
> > >  > >
> > >  > > commit 284ae3e9ff9a92575c28c858efd2c85c8de6d440
> > >  > > Author: Chao Zhu 
> > >  > > Date:   Thu Apr 6 15:36:09 2017 +0530
> > >  > >
> > >  > >      eal/ppc: fix mmap for memory initialization
> > >  > >
> > >  > >      On IBM POWER platform, when mapping /dev/zero file to
> hugepage
> > > memory
> > >  > >      space, mmap will not respect the requested address
hint.This
> will
> > >  > > cause
> > >  > >      the memory initialization for the second process fails.
This
> > > patch adds
> > >  > >      the required mmap flags to make it work. Beside this, users
> > > need to set
> > >  > >      the nr_overcommit_hugepages to expand the VA range. When
> > >  > >      doing the initialization, users need to set both
nr_hugepages
> and
> > >  > >      nr_overcommit_hugepages to the same value, like 64, 128,
etc.
> > >  > >
> > >  > > mmap address hints are not respected. Looking at the mmap code
in
> the
> > >  > > kernel this is not true entirely however under some
circumstances
> > > the hint
> > >  > > can be ignored (
> > >  > > https://urldefense.proofpoint.com/v2/url?
> > >  >
> > >
> >
>
u=http-3A__elixir.free-2Delectrons.com_linux_latest_source_arch_powerpc_mm_mmap.c-23L103&d=DwICaQ&c=jf_iaSHvJObTbx-

>
> > >  > siA1ZOg&r=rOdXhRsgn8Iur7bDE0vgwvo6TC8OpoDN-
> > >  > pXjigIjRW0&m=cttQcHlAYixhsYS3lz-
> > >  >
> BAdEeg4dpbwGdPnj2R3I8Do0&s=Gp0TIjUtIed05Jgb7XnlocpCYZdFXZXiH0LqIWiNMhA&e=
> > >  > > ). However I believe we can remove the extra case for PPC if we
> use
> > >  > > MAP_FIXED when doing the secondary process mappings because we
> need
> > > them to
> > >  > > be identical anyway. We could also use MAP_FIXED when doing the
> primary
> > >  > > process mappings resp. get_virtual_area if we want to have any
> > > guarantees
> > >  > > when specifying a base address. Any thoughts?
> > >  > >
> > >  > > Thanks,
> > >  > > Jonas
> > >  > >
> > >  > hi Jonas,
> > >  >
> > >  > MAP_FIXED is not used because it's dangerous, it unmaps anything
> that is
> > >  > already mapped into that space. We would rather know that we can't
> map
> > >  > something than unwittingly unmap something that was mapped before.
> > >
> > > Ok, I see. Maybe we can add a check to the primary process's memory
> > > mappings whether the hint has been respected or not? At least warn if
> it
> > > hasn't.
> >
> > Hi Jonas,
> >
> > I'm unfamiliar with POWER platform, so i'm afraid you'd have to explain
> > a bit more what you mean by "hint has been respected" :)
>
> Hi Anatoly,
>
> What I meant was the mmap address hint:
>
> "If addr is not NULL, then the kernel takes it as a hint
>  about where to place the mapping; on Linux, the mapping will be
>  created at a nearby page boundary."
>
> This is actually not true on POWER. It can happen that the address hint
is
> ignored and you get any address back that fits your mapping.
>
> Thanks,
> Jonas

Actually looking through the kernel code this is also not guaranteed on
x86.
(
http://elixir.free-electrons.com/linux/latest/source/arch/x86/kernel/sys_x86_64.c#L165
)

So in any case the address hint can be ignored by the kernel and you get
any address that fits your mapping.
My suggestion is to check when we do the initial mapping in
get_virtual_area if the hint was respected or not, i.e. if the returned
address == PAGE_ALIGN(address_hint).

Thanks,
Jonas

>
> >
> >
> > --
> > Thanks,
> > Anatoly
> >
>


Re: [dpdk-dev] Huge mapping secondary process linux

2017-10-27 Thread Jonas Pfefferle1
"Burakov, Anatoly"  wrote on 27/10/2017 
18:00:27:

> From: "Burakov, Anatoly" 
> To: Jonas Pfefferle1 
> Cc: bruce.richard...@intel.com, chao...@linux.vnet.ibm.com, dev@dpdk.org
> Date: 27/10/2017 18:00
> Subject: Re: [dpdk-dev] Huge mapping secondary process linux
> 
> On 27-Oct-17 4:16 PM, Jonas Pfefferle1 wrote:
> > "dev"  wrote on 10/27/2017 04:58:01 PM:
> > 
> >  > From: "Jonas Pfefferle1" 
> >  > To: "Burakov, Anatoly" 
> >  > Cc: bruce.richard...@intel.com, chao...@linux.vnet.ibm.com, 
dev@dpdk.org
> >  > Date: 10/27/2017 04:58 PM
> >  > Subject: Re: [dpdk-dev] Huge mapping secondary process linux
> >  > Sent by: "dev" 
> >  >
> >  >
> >  > "Burakov, Anatoly"  wrote on 10/27/2017 
> > 04:44:52
> >  > PM:
> >  >
> >  > > From: "Burakov, Anatoly" 
> >  > > To: Jonas Pfefferle1 
> >  > > Cc: bruce.richard...@intel.com, chao...@linux.vnet.ibm.com, 
> > dev@dpdk.org
> >  > > Date: 10/27/2017 04:45 PM
> >  > > Subject: Re: [dpdk-dev] Huge mapping secondary process linux
> >  > >
> >  > > On 27-Oct-17 3:28 PM, Jonas Pfefferle1 wrote:
> >  > > > "Burakov, Anatoly"  wrote on 
10/27/2017
> >  > > > 04:06:44 PM:
> >  > > >
> >  > > > Â > From: "Burakov, Anatoly" 
> >  > > > Â > To: Jonas Pfefferle1 , dev@dpdk.org
> >  > > > Â > Cc: chao...@linux.vnet.ibm.com, bruce.richard...@intel.com
> >  > > > Â > Date: 10/27/2017 04:06 PM
> >  > > > Â > Subject: Re: [dpdk-dev] Huge mapping secondary process 
linux
> >  > > > Â >
> >  > > > Â > On 27-Oct-17 1:43 PM, Jonas Pfefferle1 wrote:
> >  > > > Â > >
> >  > > > Â > >
> >  > > > Â > > Hi @all,
> >  > > > Â > >
> >  > > > Â > > I'm trying to make sense of the hugepage memory mappings 
in
> >  > > > Â > > librte_eal/linuxapp/eal/eal_memory.c:
> >  > > > Â > > * In rte_eal_hugepage_attach (line 1347) when we try to 
do a
> >  > private
> >  > > > Â > > mapping on /dev/zero (line 1393) why do we not use 
MAP_FIXED 
> > if we
> >  >
> >  > > > need the
> >  > > > Â > > addresses to be identical with the primary process?
> >  > > > Â > > * On POWER we have this weird business going on where we 
use
> >  > > > MAP_HUGETLB
> >  > > > Â > > because according to this commit:
> >  > > > Â > >
> >  > > > Â > > commit 284ae3e9ff9a92575c28c858efd2c85c8de6d440
> >  > > > Â > > Author: Chao Zhu 
> >  > > > Â > > Date: Â  Thu Apr 6 15:36:09 2017 +0530
> >  > > > Â > >
> >  > > > Â > > Â  Â  Â eal/ppc: fix mmap for memory initialization
> >  > > > Â > >
> >  > > > Â > > Â  Â  Â On IBM POWER platform, when mapping /dev/zero 
file to
> >  > hugepage
> >  > > > memory
> >  > > > Â > > Â  Â  Â space, mmap will not respect the requested 
address 
> > hint.This
> >  > will
> >  > > > Â > > cause
> >  > > > Â > > Â  Â  Â the memory initialization for the second 
> process fails. 
> > This
> >  > > > patch adds
> >  > > > Â > > Â  Â  Â the required mmap flags to make it work. 
> Beside this, users
> >  > > > need to set
> >  > > > Â > > Â  Â  Â the nr_overcommit_hugepages to expand the VA 
> range. When
> >  > > > Â > > Â  Â  Â doing the initialization, users need to set both 
> > nr_hugepages
> >  > and
> >  > > > Â > > Â  Â  Â nr_overcommit_hugepages to the same value, like 
64, 
> > 128, etc.
> >  > > > Â > >
> >  > > > Â > > mmap address hints are not respected. Looking at the mmap 

> > code in
> >  > the
> >  > > > Â > > kernel this is not true entirely however under some 
> > circumstances
> >  > > > the hint
> >  > > > Â > > can be ignored (
> >  > > > Â > > https://urldefense.proofpoint.com/v2/url?
> >  > > > Â >
> >  > > >
> >  > >
> >  > 
> > 
> 
u=http-3A__elixir.free-2Delectrons.com_linux_l

Re: [dpdk-dev] [PATCH] bus/pci: fix vfio device reset

2017-10-28 Thread Jonas Pfefferle1
Jerin Jacob  wrote on 28/10/2017 08:22:55:

> From: Jerin Jacob 
> To: dev@dpdk.org
> Cc: tho...@monjalon.net, Jerin Jacob 
> , Jonas Pfefferle 
> , Anatoly Burakov 
> Date: 28/10/2017 08:23
> Subject: [dpdk-dev] [PATCH] bus/pci: fix vfio device reset
> 
> If the device is not capable of resetting, then Linux kernel updates
> the errno as EINVAL.
> https://urldefense.proofpoint.com/v2/url?
> u=http-3A__elixir.free-2Delectrons.com_linux_v4.
> 9_source_drivers_vfio_pci_vfio-5Fpci.c-23L887&d=DwIBAg&c=jf_iaSHvJObTbx-
> siA1ZOg&r=rOdXhRsgn8Iur7bDE0vgwvo6TC8OpoDN-pXjigIjRW0&m=V-
> sbOkvx7qxOMbwyk3n1Fb_1NCjAhl0io-
> hqldJ0r6M&s=OXV7lrxTpahVIsA3J3nCNUlLqW21nlMiQiYveAzyQhc&e=
> 
> Honor the EINVAL errno value to avoid pci vfio setup failure.
> 
> Fixes: f25f8f367644 ("bus/pci: check VFIO reset ioctl error")
> 
> Cc: Jonas Pfefferle 
> Cc: Anatoly Burakov 
> 
> Signed-off-by: Jerin Jacob 
> ---
>  drivers/bus/pci/linux/pci_vfio.c | 7 +--
>  1 file changed, 5 insertions(+), 2 deletions(-)
> 
> diff --git a/drivers/bus/pci/linux/pci_vfio.c b/drivers/bus/pci/
> linux/pci_vfio.c
> index 360eed380..11df64846 100644
> --- a/drivers/bus/pci/linux/pci_vfio.c
> +++ b/drivers/bus/pci/linux/pci_vfio.c
> @@ -338,8 +338,11 @@ pci_vfio_setup_device(struct rte_pci_device 
> *dev, int vfio_dev_fd)
>return -1;
> }
> 
> -   /* Reset the device */
> -   if (ioctl(vfio_dev_fd, VFIO_DEVICE_RESET)) {
> +   /*
> +* Reset the device. If the device is not capable of resetting,
> +* then it updates errno as EINVAL.
> +*/
> +   if (ioctl(vfio_dev_fd, VFIO_DEVICE_RESET) && errno != EINVAL) {
>RTE_LOG(ERR, EAL, "Unable to reset device! Error: %d (%s)\n",
>  errno, strerror(errno));
>return -1;
> -- 
> 2.14.3
> 

Looks good to me.

Reviewed-by: Jonas Pfefferle 

Thanks,
Jonas



Re: [dpdk-dev] [PATCH v7 7/9] linuxapp/eal_vfio: honor iova mode before mapping

2017-11-02 Thread Jonas Pfefferle1

Thomas Monjalon  wrote on 11/02/2017 11:17:10 AM:

> From: Thomas Monjalon 
> To: Jonas Pfefferle1 
> Cc: dev@dpdk.org, Santosh Shukla
> , jerin.ja...@caviumnetworks.com,
> hemant.agra...@nxp.com, olivier.m...@6wind.com,
> maxime.coque...@redhat.com, sergio.gonzalez.mon...@intel.com,
> bruce.richard...@intel.com, shreyansh.j...@nxp.com,
> gaetan.ri...@6wind.com, anatoly.bura...@intel.com,
> step...@networkplumber.org, acon...@redhat.com
> Date: 11/02/2017 11:17 AM
> Subject: Re: [dpdk-dev] [PATCH v7 7/9] linuxapp/eal_vfio: honor iova
> mode before mapping
>
> Hi
>
> 26/10/2017 14:57, Jonas Pfefferle1:
> >
> > Hi @all
> >
> > I just stumbled upon this patch while testing on POWER. RTE_IOVA_VA
will
> > not work for the sPAPR code since the dma window size is currently
> > determined by the physical address only.
>
> Is itaffecting POWER8?

It is.

>
> > I'm preparing a patch to address this.
>
> Any news?
> Can you use virtual addresses?

After a long discussion with Alexey (CC) we came to the conclusion that
with the current sPAPR iommu driver we cannot use virtual addresses since
the iova is restricted to lay in the DMA window which itself is restricted
to physical RAM addresses resp. with the current code 0 to hotplug memory
max. However, Alexey is working on a patch to lift this restriction on the
DMA window size which should allow us to do VA:VA mappings in the future.
For now we should fall back to PA in the dynamic iova mode check. I will
send an according patch later today.

>
>


Re: [dpdk-dev] [PATCH v7 7/9] linuxapp/eal_vfio: honor iova mode before mapping

2017-11-03 Thread Jonas Pfefferle1
"dev"  wrote on 11/02/2017 11:26:57 AM:

> From: "Jonas Pfefferle1" 
> To: Thomas Monjalon 
> Cc: acon...@redhat.com, anatoly.bura...@intel.com,
> bruce.richard...@intel.com, dev@dpdk.org, gaetan.ri...@6wind.com,
> hemant.agra...@nxp.com, jerin.ja...@caviumnetworks.com,
> maxime.coque...@redhat.com, olivier.m...@6wind.com, Santosh Shukla
> ,
> sergio.gonzalez.mon...@intel.com, shreyansh.j...@nxp.com,
> step...@networkplumber.org, "Alexey Kardashevskiy" 
> Date: 11/02/2017 11:27 AM
> Subject: Re: [dpdk-dev] [PATCH v7 7/9] linuxapp/eal_vfio: honor iova
> mode before mapping
> Sent by: "dev" 
>
>
> Thomas Monjalon  wrote on 11/02/2017 11:17:10 AM:
>
> > From: Thomas Monjalon 
> > To: Jonas Pfefferle1 
> > Cc: dev@dpdk.org, Santosh Shukla
> > , jerin.ja...@caviumnetworks.com,
> > hemant.agra...@nxp.com, olivier.m...@6wind.com,
> > maxime.coque...@redhat.com, sergio.gonzalez.mon...@intel.com,
> > bruce.richard...@intel.com, shreyansh.j...@nxp.com,
> > gaetan.ri...@6wind.com, anatoly.bura...@intel.com,
> > step...@networkplumber.org, acon...@redhat.com
> > Date: 11/02/2017 11:17 AM
> > Subject: Re: [dpdk-dev] [PATCH v7 7/9] linuxapp/eal_vfio: honor iova
> > mode before mapping
> >
> > Hi
> >
> > 26/10/2017 14:57, Jonas Pfefferle1:
> > >
> > > Hi @all
> > >
> > > I just stumbled upon this patch while testing on POWER. RTE_IOVA_VA
> will
> > > not work for the sPAPR code since the dma window size is currently
> > > determined by the physical address only.
> >
> > Is itaffecting POWER8?
>
> It is.
>
> >
> > > I'm preparing a patch to address this.
> >
> > Any news?
> > Can you use virtual addresses?
>
> After a long discussion with Alexey (CC) we came to the conclusion that
> with the current sPAPR iommu driver we cannot use virtual addresses since
> the iova is restricted to lay in the DMA window which itself is
restricted
> to physical RAM addresses resp. with the current code 0 to hotplug memory
> max. However, Alexey is working on a patch to lift this restriction on
the
> DMA window size which should allow us to do VA:VA mappings in the future.
> For now we should fall back to PA in the dynamic iova mode check. I will
> send an according patch later today.

I looked into this yesterday but I'm not sure what the right solution is
here.
At the time rte_pci_get_iommu_class is called we already know which IOMMU
types are supported because vfio_get_container_fd resp.
vfio_has_supported_extensions  has been called however we do not know which
one is going to be used (Decided later in vfio_setup_device resp.
vfio_set_iommu_type). We can choose a iova mode which is supported by all
types but if the modes are exclusive to the types we have to guess which
one is going to be used. Or let the user decide?

Thanks,
Jonas

>
> >
> >
>


Re: [dpdk-dev] [PATCH v7 7/9] linuxapp/eal_vfio: honor iova mode before mapping

2017-11-03 Thread Jonas Pfefferle1
Thomas Monjalon  wrote on 11/03/2017 11:28:10 AM:

> From: Thomas Monjalon 
> To: Jonas Pfefferle1 
> Cc: acon...@redhat.com, anatoly.bura...@intel.com,
> bruce.richard...@intel.com, dev@dpdk.org, gaetan.ri...@6wind.com,
> hemant.agra...@nxp.com, jerin.ja...@caviumnetworks.com,
> maxime.coque...@redhat.com, olivier.m...@6wind.com, Santosh Shukla
> ,
> sergio.gonzalez.mon...@intel.com, shreyansh.j...@nxp.com,
> step...@networkplumber.org, Alexey Kardashevskiy 
> Date: 11/03/2017 11:28 AM
> Subject: Re: [dpdk-dev] [PATCH v7 7/9] linuxapp/eal_vfio: honor iova
> mode before mapping
>
> 03/11/2017 10:56, Jonas Pfefferle1:
> > Thomas Monjalon  wrote on 11/02/2017 11:17:10 AM:
> > > > 26/10/2017 14:57, Jonas Pfefferle1:
> > > > >
> > > > > Hi @all
> > > > >
> > > > > I just stumbled upon this patch while testing on POWER.
RTE_IOVA_VA
> > > will
> > > > > not work for the sPAPR code since the dma window size is
currently
> > > > > determined by the physical address only.
> > > >
> > > > Is itaffecting POWER8?
> > >
> > > It is.
> > >
> > > >
> > > > > I'm preparing a patch to address this.
> > > >
> > > > Any news?
> > > > Can you use virtual addresses?
> > >
> > > After a long discussion with Alexey (CC) we came to the conclusion
that
> > > with the current sPAPR iommu driver we cannot use virtual addresses
since
> > > the iova is restricted to lay in the DMA window which itself is
> > restricted
> > > to physical RAM addresses resp. with the current code 0 to hotplug
memory
> > > max. However, Alexey is working on a patch to lift this restriction
on
> > the
> > > DMA window size which should allow us to do VA:VA mappings in the
future.
> > > For now we should fall back to PA in the dynamic iova mode check. I
will
> > > send an according patch later today.
> >
> > I looked into this yesterday but I'm not sure what the right solution
is
> > here.
> > At the time rte_pci_get_iommu_class is called we already know which
IOMMU
> > types are supported because vfio_get_container_fd resp.
> > vfio_has_supported_extensions  has been called however we do not know
which
> > one is going to be used (Decided later in vfio_setup_device resp.
> > vfio_set_iommu_type). We can choose a iova mode which is supported by
all
> > types but if the modes are exclusive to the types we have to guess
which
> > one is going to be used. Or let the user decide?
>
> You can keep the old behaviour, restricting to physical memory,
> until you support virtual addressing.
> It can be just a #ifdef RTE_ARCH_PPC_64.
>

Ok but we might want to refine this in the future. IMO It looks much
cleaner
to decide this on the iommu type plus this would also cover the noiommu
case without having this extra check reading the sysfs variable.


Re: [dpdk-dev] [PATCH v7 7/9] linuxapp/eal_vfio: honor iova mode before mapping

2017-11-03 Thread Jonas Pfefferle1
Thomas Monjalon  wrote on 11/03/2017 11:54:45 AM:

> From: Thomas Monjalon 
> To: Jonas Pfefferle1 
> Cc: acon...@redhat.com, Alexey Kardashevskiy ,
> anatoly.bura...@intel.com, bruce.richard...@intel.com, dev@dpdk.org,
> gaetan.ri...@6wind.com, hemant.agra...@nxp.com,
> jerin.ja...@caviumnetworks.com, maxime.coque...@redhat.com,
> olivier.m...@6wind.com, Santosh Shukla
> ,
> sergio.gonzalez.mon...@intel.com, shreyansh.j...@nxp.com,
> step...@networkplumber.org
> Date: 11/03/2017 11:55 AM
> Subject: Re: [dpdk-dev] [PATCH v7 7/9] linuxapp/eal_vfio: honor iova
> mode before mapping
>
> 03/11/2017 11:44, Jonas Pfefferle1:
> > Thomas Monjalon  wrote on 11/03/2017 11:28:10 AM:
> > > 03/11/2017 10:56, Jonas Pfefferle1:
> > > > Thomas Monjalon  wrote on 11/02/2017 11:17:10
AM:
> > > > > > 26/10/2017 14:57, Jonas Pfefferle1:
> > > > > > >
> > > > > > > Hi @all
> > > > > > >
> > > > > > > I just stumbled upon this patch while testing on POWER.
> > RTE_IOVA_VA
> > > > > will
> > > > > > > not work for the sPAPR code since the dma window size is
> > currently
> > > > > > > determined by the physical address only.
> > > > > >
> > > > > > Is itaffecting POWER8?
> > > > >
> > > > > It is.
> > > > >
> > > > > >
> > > > > > > I'm preparing a patch to address this.
> > > > > >
> > > > > > Any news?
> > > > > > Can you use virtual addresses?
> > > > >
> > > > > After a long discussion with Alexey (CC) we came to the
conclusion
> > that
> > > > > with the current sPAPR iommu driver we cannot use virtual
addresses
> > since
> > > > > the iova is restricted to lay in the DMA window which itself is
> > > > restricted
> > > > > to physical RAM addresses resp. with the current code 0 to
hotplug
> > memory
> > > > > max. However, Alexey is working on a patch to lift this
restriction
> > on
> > > > the
> > > > > DMA window size which should allow us to do VA:VA mappings in the
> > future.
> > > > > For now we should fall back to PA in the dynamic iova mode check.
I
> > will
> > > > > send an according patch later today.
> > > >
> > > > I looked into this yesterday but I'm not sure what the right
solution
> > is
> > > > here.
> > > > At the time rte_pci_get_iommu_class is called we already know which
> > IOMMU
> > > > types are supported because vfio_get_container_fd resp.
> > > > vfio_has_supported_extensions  has been called however we do not
know
> > which
> > > > one is going to be used (Decided later in vfio_setup_device resp.
> > > > vfio_set_iommu_type). We can choose a iova mode which is supported
by
> > all
> > > > types but if the modes are exclusive to the types we have to guess
> > which
> > > > one is going to be used. Or let the user decide?
> > >
> > > You can keep the old behaviour, restricting to physical memory,
> > > until you support virtual addressing.
> > > It can be just a #ifdef RTE_ARCH_PPC_64.
> > >
> >
> > Ok but we might want to refine this in the future. IMO It looks much
> > cleaner
> > to decide this on the iommu type plus this would also cover the noiommu
> > case without having this extra check reading the sysfs variable.
>
> You are using the word "this" too many times to help me understand :)

What I meant is a fix that decides which iova mode to use based on the
iommu types supported (determined by vfio_get_container_fd) instead of
another extra case for PPC much like the noiommu check. Both should
be covered by the supported types based check.
IMO much cleaner and simpler to support new iommu types.

>
> Anyway, please send a quick fix today for 17.11.
> The RC3 will be probably closed before Monday.
>

Will do.


Re: [dpdk-dev] [PATCH v3 4/6] eal/memory: rename memory API to iovatypes

2017-11-03 Thread Jonas Pfefferle1
"dev"  wrote on 11/03/2017 02:58:43 PM:

> From: Thomas Monjalon 
> To: santosh 
> Cc: dev@dpdk.org, olivier.m...@6wind.com,
> jerin.ja...@caviumnetworks.com, hemant.agra...@nxp.com,
> anatoly.bura...@intel.com
> Date: 11/03/2017 02:59 PM
> Subject: Re: [dpdk-dev] [PATCH v3 4/6] eal/memory: rename memory API
> to iova types
> Sent by: "dev" 
>
> 03/11/2017 12:35, santosh:
> > On Friday 03 November 2017 04:41 PM, Thomas Monjalon wrote:
> > > 20/10/2017 14:31, Santosh Shukla:
> > >> Renamed memory translational api to _iova types.
> > >> The following api renamed from:
> > >>
> > >> rte_mempool_populate_phys()
> > >> rte_mempool_populate_phys_tab()
> > > These functions still have "physical addresses" in their description.
> > > It is not consistent.
> > >
> > > Please provide ABI compatibility for mempool functions.
> > >
> > >> rte_eal_using_phys_addrs()
> > > Why renaming rte_eal_using_phys_addrs?
> > > I think we need to review how it is used.
> > > Maybe it requires a rework.
> > >
> > >> rte_mem_virt2phy()
> > >> rte_dump_physmem_layout()
> > >> rte_eal_get_physmem_layout()
> > >> rte_eal_get_physmem_size()
> > > Those 3 functions deal with physical memory layout.
> > > I don't see a need to rename them.
> >
> > In iova=va mode, those api will have va address.
>
> Yes addresses can be virtual.
> But it is still physically segmented.
>
> > > But the dump function needs a change to avoid printing
> > > "phys" even in VA case.
> > >
> > >> rte_malloc_virt2phy()
> > >> rte_mem_phy2mch()
> > > This last function was removed with Xen.
> > > It is wrong to rename it in the release notes.
> > > It should just be removed from the map file (I will send a patch).
> > >
> > >> To the following iova types api:
> > >>
> > >> rte_mempool_populate_iova()
> > >> rte_mempool_populate_iova_tab()
> > >> rte_eal_using_iova_addrs()
> > >> rte_mem_virt2iova()
> > > [...]
> > >> rte_malloc_virt2iova()
> > > I am not convinced by the names virt2iova.
> > > I sounds like "virt to virt".
> > > What about "virt2io" or "virt2io_addr"?
> >
> > no, iova can be _pa or _va, its an io_addr indeed but
> > I prefer virt2iova, same mentioned in deprecation notice (no
> strong opinion),
> >
> > More suggestion on naming pl.?
>
> I like virt2io_addr but virt2iova is not so bad.
> I agree with Santosh that we need more opinions.

virt2iova +1

Another suggestion va2iova to be consistent with the names

>
> > > As the ABI is broken in EAL 17.11, we do not care about
compatibility.
> > > But we must keep an alias to the old function name in order to allow
> > > a smooth API transition for applications.
> > > I suggest to add static inline functions with the old names and set
> > > the deprecated attribute.
> >
> > ok, Will address in 18.02.
>
> I would prefer these changes made in 17.11 as announced.
> As you are not willing to work on it, I am trying to send some
> updated patches.
>


Re: [dpdk-dev] [PATCH v2] vfio: noiommu check error handling

2017-11-07 Thread Jonas Pfefferle1
Thomas Monjalon  wrote on 11/06/2017 09:25:15 PM:

> From: Thomas Monjalon 
> To: Jonas Pfefferle , anatoly.bura...@intel.com
> Cc: dev@dpdk.org
> Date: 11/06/2017 09:55 PM
> Subject: Re: [dpdk-dev] [PATCH v2] vfio: noiommu check error handling
>
> 31/10/2017 16:59, Jonas Pfefferle:
> > Check and report errors on open/read in noiommu check.
> >
> > Signed-off-by: Jonas Pfefferle 
>
> I cannot decide to apply this patch as it does not explain what
> it is fixing, and as it is not reviewed.
>

This patch adds error handling and logging to the noiommu check.
Also, on older kernels when the noiommu_enable file does not exist it
assumes noiommu is not enabled instead of returning -1.
Note that in rte_pci_get_iommu_class (drivers/bus/pci/linux/pci.c)
is the only usage of the function and it assumes return == 1:
noiommu is enabled any other return value noiommu disabled, i.e.
my code change does not change the behavior of this function.
We might want to check for errors in rte_pci_get_iommu_class
as well since assuming it is not enabled when we cannot open
and read it might lead to iova == VA being used even if noiommu is
enabled.

All this comes back to what I proposed before: instead of
the noiommu and PPC64 check we should decide which iova mode
to use depending on the iommu types available.
The type should be already available at the point where we
decide on the iova type.
(iommu types supported is checked by vfio_get_container_fd)


Re: [dpdk-dev] [PATCH v2] vfio: noiommu check error handling

2017-11-07 Thread Jonas Pfefferle1
Thomas Monjalon  wrote on 11/07/2017 10:40:15 AM:

> From: Thomas Monjalon 
> To: Jonas Pfefferle1 , anatoly.bura...@intel.com
> Cc: dev@dpdk.org
> Date: 11/07/2017 10:40 AM
> Subject: Re: [dpdk-dev] [PATCH v2] vfio: noiommu check error handling
>
> 07/11/2017 10:05, Jonas Pfefferle1:
> > Thomas Monjalon  wrote on 11/06/2017 09:25:15 PM:
> >
> > > From: Thomas Monjalon 
> > > To: Jonas Pfefferle , anatoly.bura...@intel.com
> > > Cc: dev@dpdk.org
> > > Date: 11/06/2017 09:55 PM
> > > Subject: Re: [dpdk-dev] [PATCH v2] vfio: noiommu check error handling
> > >
> > > 31/10/2017 16:59, Jonas Pfefferle:
> > > > Check and report errors on open/read in noiommu check.
> > > >
> > > > Signed-off-by: Jonas Pfefferle 
> > >
> > > I cannot decide to apply this patch as it does not explain what
> > > it is fixing, and as it is not reviewed.
> > >
> >
> > This patch adds error handling and logging to the noiommu check.
> > Also, on older kernels when the noiommu_enable file does not exist it
> > assumes noiommu is not enabled instead of returning -1.
> > Note that in rte_pci_get_iommu_class (drivers/bus/pci/linux/pci.c)
> > is the only usage of the function and it assumes return == 1:
> > noiommu is enabled any other return value noiommu disabled, i.e.
> > my code change does not change the behavior of this function.
> > We might want to check for errors in rte_pci_get_iommu_class
> > as well since assuming it is not enabled when we cannot open
> > and read it might lead to iova == VA being used even if noiommu is
> > enabled.
> >
> > All this comes back to what I proposed before: instead of
> > the noiommu and PPC64 check we should decide which iova mode
> > to use depending on the iommu types available.
> > The type should be already available at the point where we
> > decide on the iova type.
> > (iommu types supported is checked by vfio_get_container_fd)
>
> Is there something urgent for 17.11?
> Or can it be refined in 18.02?

Nothing urgent. We can refine this for 18.02.


>
> Anatoly, any thought?
>


Re: [dpdk-dev] Huge mapping secondary process linux

2017-11-07 Thread Jonas Pfefferle1
"Chao Zhu"  wrote on 11/07/2017 09:25:26 AM:

> From: "Chao Zhu" 
> To: "'Jonas Pfefferle1'" , "'Burakov, Anatoly'"
> 
> Cc: , 
> Date: 11/07/2017 11:00 AM
> Subject: RE: [dpdk-dev] Huge mapping secondary process linux
>
>
>
> From: Jonas Pfefferle1 [mailto:j...@zurich.ibm.com]
> Sent: 2017年10月28日 3:23
> To: Burakov, Anatoly 
> Cc: bruce.richard...@intel.com; chao...@linux.vnet.ibm.com; dev@dpdk.org
> Subject: Re: [dpdk-dev] Huge mapping secondary process linux
>
> "Burakov, Anatoly"  wrote on 27/10/2017
18:00:27:
>
> > From: "Burakov, Anatoly" 
> > To: Jonas Pfefferle1 
> > Cc: bruce.richard...@intel.com, chao...@linux.vnet.ibm.com,
dev@dpdk.org
> > Date: 27/10/2017 18:00
> > Subject: Re: [dpdk-dev] Huge mapping secondary process linux
> >
> > On 27-Oct-17 4:16 PM, Jonas Pfefferle1 wrote:
> > > "dev"  wrote on 10/27/2017 04:58:01 PM:
> > >
> > >  > From: "Jonas Pfefferle1" 
> > >  > To: "Burakov, Anatoly" 
> > >  > Cc: bruce.richard...@intel.com, chao...@linux.vnet.ibm.com,
> dev@dpdk.org
> > >  > Date: 10/27/2017 04:58 PM
> > >  > Subject: Re: [dpdk-dev] Huge mapping secondary process linux
> > >  > Sent by: "dev" 
> > >  >
> > >  >
> > >  > "Burakov, Anatoly"  wrote on 10/27/2017

> > > 04:44:52
> > >  > PM:
> > >  >
> > >  > > From: "Burakov, Anatoly" 
> > >  > > To: Jonas Pfefferle1 
> > >  > > Cc: bruce.richard...@intel.com, chao...@linux.vnet.ibm.com,
> > > dev@dpdk.org
> > >  > > Date: 10/27/2017 04:45 PM
> > >  > > Subject: Re: [dpdk-dev] Huge mapping secondary process linux
> > >  > >
> > >  > > On 27-Oct-17 3:28 PM, Jonas Pfefferle1 wrote:
> > >  > > > "Burakov, Anatoly"  wrote on
10/27/2017
> > >  > > > 04:06:44 PM:
> > >  > > >
> > >  > > > Â > From: "Burakov, Anatoly" 
> > >  > > > Â > To: Jonas Pfefferle1 , dev@dpdk.org
> > >  > > > Â > Cc: chao...@linux.vnet.ibm.com, bruce.richard...@intel.com
> > >  > > > Â > Date: 10/27/2017 04:06 PM
> > >  > > > Â > Subject: Re: [dpdk-dev] Huge mapping secondary process
linux
> > >  > > > Â >
> > >  > > > Â > On 27-Oct-17 1:43 PM, Jonas Pfefferle1 wrote:
> > >  > > > Â > >
> > >  > > > Â > >
> > >  > > > Â > > Hi @all,
> > >  > > > Â > >
> > >  > > > Â > > I'm trying to make sense of the hugepage memory mappings
in
> > >  > > > Â > > librte_eal/linuxapp/eal/eal_memory.c:
> > >  > > > Â > > * In rte_eal_hugepage_attach (line 1347) when we try to
do a
> > >  > private
> > >  > > > Â > > mapping on /dev/zero (line 1393) why do we not use
MAP_FIXED
> > > if we
> > >  >
> > >  > > > need the
> > >  > > > Â > > addresses to be identical with the primary process?
> > >  > > > Â > > * On POWER we have this weird business going on where we
use
> > >  > > > MAP_HUGETLB
> > >  > > > Â > > because according to this commit:
> > >  > > > Â > >
> > >  > > > Â > > commit 284ae3e9ff9a92575c28c858efd2c85c8de6d440
> > >  > > > Â > > Author: Chao Zhu 
> > >  > > > Â > > Date: Â  Thu Apr 6 15:36:09 2017 +0530
> > >  > > > Â > >
> > >  > > > Â > > Â  Â  Â eal/ppc: fix mmap for memory initialization
> > >  > > > Â > >
> > >  > > > Â > > Â  Â  Â On IBM POWER platform, when mapping /dev/zero
file to
> > >  > hugepage
> > >  > > > memory
> > >  > > > Â > > Â  Â  Â space, mmap will not respect the requested
address
> > > hint.This
> > >  > will
> > >  > > > Â > > cause
> > >  > > > Â > > Â  Â  Â the memory initialization for the second
> > process fails.
> > > This
> > >  > > > patch adds
> > >  > > > Â > > Â  Â  Â the required mmap flags to make it work.
> > Beside this, users
> > >  > > > need to set
> > >  > > > Â > 

Re: [dpdk-dev] [PATCH] pci: get IOMMU class sPAPR iommu fix

2017-11-07 Thread Jonas Pfefferle1
Thomas Monjalon  wrote on 11/07/2017 12:38:11 AM:

> From: Thomas Monjalon 
> To: Jonas Pfefferle 
> Cc: dev@dpdk.org, anatoly.bura...@intel.com
> Date: 11/07/2017 12:38 AM
> Subject: Re: [dpdk-dev] [PATCH] pci: get IOMMU class sPAPR iommu fix
>
> 03/11/2017 13:05, Jonas Pfefferle:
> > PPC64 sPAPR iommu does not support iova as va.
> > Use pa mode instead.
> >
> > Signed-off-by: Jonas Pfefferle 
>
> Applied, thanks
>

Hi Thomas,

I just noticed that I send in an old patch by mistake.
It uses the wrong define RTE_ARCH_PPC64 instead of RTE_ARCH_PPC_64.
Sorry for this, I will send in a fix for this in the next hour.

Thanks,
Jonas


Re: [dpdk-dev] [PATCH] mem: warn if address hint is not respected

2017-11-08 Thread Jonas Pfefferle1
"Burakov, Anatoly"  wrote on 11/07/2017 02:54:24
PM:

> From: "Burakov, Anatoly" 
> To: Thomas Monjalon 
> Cc: dev@dpdk.org, Jonas Pfefferle ,
jianfeng@intel.com
> Date: 11/07/2017 02:54 PM
> Subject: Re: [dpdk-dev] [PATCH] mem: warn if address hint is not
respected
>
> On 06-Nov-17 8:26 PM, Thomas Monjalon wrote:
> > 31/10/2017 10:08, Jonas Pfefferle:
> >> Print a warning if the --base-virtaddr hint is not respected
> >> since this might lead to problems when mapping memory in
> >> the secondary process.
> >>
> >> Signed-off-by: Jonas Pfefferle 
> >
> > Anatoly, please review this patch.
> > It does not seem to fix something, so it is candidate for 18.02.
> >
>
> For some reason my Thunderbird ate the original email, so i'll reply to
> this one.
>
> One nitpick would be that we're calling get_virtual_area many times and
> it would probably be a good idea to make pagesize static and call
> sysconf only once. Otherwise,

We should address this in a separate patch and introduce a pagesize
function
for everyone to use. sysconf is used like this all over the place.

>
> Acked-by: Anatoly Burakov 
>
> --
> Thanks,
> Anatoly
>


Re: [dpdk-dev] Huge mapping secondary process linux

2017-11-09 Thread Jonas Pfefferle1
"Chao Zhu"  wrote on 11/09/2017 04:08:36 AM:

> From: "Chao Zhu" 
> To: "'Jonas Pfefferle1'" 
> Cc: "'Burakov, Anatoly'" ,
> , 
> Date: 11/09/2017 04:08 AM
> Subject: RE: [dpdk-dev] Huge mapping secondary process linux
>
>
>
> From: Jonas Pfefferle1 [mailto:j...@zurich.ibm.com]
> Sent: 2017年11月7日 18:16
> To: Chao Zhu 
> Cc: 'Burakov, Anatoly' ;
> bruce.richard...@intel.com; dev@dpdk.org
> Subject: RE: [dpdk-dev] Huge mapping secondary process linux
>
> "Chao Zhu"  wrote on 11/07/2017 09:25:26 AM:
>
> > From: "Chao Zhu" 
> > To: "'Jonas Pfefferle1'" , "'Burakov, Anatoly'"
> > 
> > Cc: , 
> > Date: 11/07/2017 11:00 AM
> > Subject: RE: [dpdk-dev] Huge mapping secondary process linux
> >
> >
> >
> > From: Jonas Pfefferle1 [mailto:j...@zurich.ibm.com]
> > Sent: 2017年10月28日 3:23
> > To: Burakov, Anatoly 
> > Cc: bruce.richard...@intel.com; chao...@linux.vnet.ibm.com;
dev@dpdk.org
> > Subject: Re: [dpdk-dev] Huge mapping secondary process linux
> >
> > "Burakov, Anatoly"  wrote on
27/10/201718:00:27:
> >
> > > From: "Burakov, Anatoly" 
> > > To: Jonas Pfefferle1 
> > > Cc: bruce.richard...@intel.com, chao...@linux.vnet.ibm.com,
dev@dpdk.org
> > > Date: 27/10/2017 18:00
> > > Subject: Re: [dpdk-dev] Huge mapping secondary process linux
> > >
> > > On 27-Oct-17 4:16 PM, Jonas Pfefferle1 wrote:
> > > > "dev"  wrote on 10/27/2017 04:58:01 PM:
> > > >
> > > >  > From: "Jonas Pfefferle1" 
> > > >  > To: "Burakov, Anatoly" 
> > > >  > Cc: bruce.richard...@intel.com, chao...@linux.vnet.ibm.com,
> > dev@dpdk.org
> > > >  > Date: 10/27/2017 04:58 PM
> > > >  > Subject: Re: [dpdk-dev] Huge mapping secondary process linux
> > > >  > Sent by: "dev" 
> > > >  >
> > > >  >
> > > >  > "Burakov, Anatoly"  wrote on
10/27/2017
> > > > 04:44:52
> > > >  > PM:
> > > >  >
> > > >  > > From: "Burakov, Anatoly" 
> > > >  > > To: Jonas Pfefferle1 
> > > >  > > Cc: bruce.richard...@intel.com, chao...@linux.vnet.ibm.com,
> > > > dev@dpdk.org
> > > >  > > Date: 10/27/2017 04:45 PM
> > > >  > > Subject: Re: [dpdk-dev] Huge mapping secondary process linux
> > > >  > >
> > > >  > > On 27-Oct-17 3:28 PM, Jonas Pfefferle1 wrote:
> > > >  > > > "Burakov, Anatoly"  wrote
on10/27/2017
> > > >  > > > 04:06:44 PM:
> > > >  > > >
> > > >  > > > Â > From: "Burakov, Anatoly" 
> > > >  > > > Â > To: Jonas Pfefferle1 , dev@dpdk.org
> > > >  > > > Â > Cc: chao...@linux.vnet.ibm.com,
bruce.richard...@intel.com
> > > >  > > > Â > Date: 10/27/2017 04:06 PM
> > > >  > > > Â > Subject: Re: [dpdk-dev] Huge mapping secondary process
linux
> > > >  > > > Â >
> > > >  > > > Â > On 27-Oct-17 1:43 PM, Jonas Pfefferle1 wrote:
> > > >  > > > Â > >
> > > >  > > > Â > >
> > > >  > > > Â > > Hi @all,
> > > >  > > > Â > >
> > > >  > > > Â > > I'm trying to make sense of the hugepage memory
mappings in
> > > >  > > > Â > > librte_eal/linuxapp/eal/eal_memory.c:
> > > >  > > > Â > > * In rte_eal_hugepage_attach (line 1347) when we
> try to do a
> > > >  > private
> > > >  > > > Â > > mapping on /dev/zero (line 1393) why do we not
> use MAP_FIXED
> > > > if we
> > > >  >
> > > >  > > > need the
> > > >  > > > Â > > addresses to be identical with the primary process?
> > > >  > > > Â > > * On POWER we have this weird business going on
> where we use
> > > >  > > > MAP_HUGETLB
> > > >  > > > Â > > because according to this commit:
> > > >  > > > Â > >
> > > >  > > > Â > > commit 284ae3e9ff9a92575c28c858efd2c85c8de6d440
> > > >  > > > Â > > Author: Chao Zhu 
> > > >  > > > Â > &g