[dpdk-dev] could not l2fwd in DOM0

2014-05-07 Thread Liu, Jijiang
Hi,

I checked the source code of the xen_create_contiguous_region() function in
kernel 3.14 and found that dma_handle must not be NULL, because the function
writes through it without a NULL check:

int xen_create_contiguous_region(phys_addr_t pstart, unsigned int order,
 unsigned int address_bits,
 dma_addr_t *dma_handle)
{
unsigned long *in_frames = discontig_frames, out_frame;
unsigned long  flags;
int success;
unsigned long vstart = (unsigned long)phys_to_virt(pstart);
...
*dma_handle = virt_to_machine(vstart).maddr;
return success ? 0 : -ENOMEM;
}
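
To illustrate why the NULL pointer in the original patch faults, here is a
minimal user-space sketch. fake_create_contiguous_region() is a hypothetical
stand-in that only mimics the write through dma_handle quoted above; it is not
the kernel function, and the value it stores is made up.

```c
#include <stddef.h>
#include <stdint.h>

typedef uint64_t dma_addr_t;   /* user-space stand-in for the kernel type */

/* Hypothetical stand-in: like the kernel function quoted above, it
 * writes through dma_handle without checking it for NULL. */
static int
fake_create_contiguous_region(uint64_t pstart, dma_addr_t *dma_handle)
{
	*dma_handle = pstart + 0x1000;   /* made-up "machine address" */
	return 0;
}

/* Correct caller: the out-parameter points at storage owned by the caller.
 * Declaring "dma_addr_t *dma_handle = NULL;" and passing that pointer
 * instead would make the callee dereference NULL. */
static int
caller_ok(dma_addr_t *out)
{
	dma_addr_t dma_handle;

	if (fake_create_contiguous_region(0x200000, &dma_handle) != 0)
		return -1;
	*out = dma_handle;
	return 0;
}
```

So the caller should declare a plain dma_addr_t and pass its address.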

Thanks
Frank Liu

-----Original Message-----
From: dev [mailto:dev-boun...@dpdk.org] On Behalf Of Samuel Monderer
Sent: Wednesday, April 30, 2014 1:54 AM
To: dev at dpdk.org
Cc: Shimon Zadok
Subject: [dpdk-dev] could not l2fwd in DOM0

Hi,

First, I ran into a compile error when building for DOM0 because the
prototype of xen_create_contiguous_region() has changed, so I made the
following changes:

diff --git a/lib/librte_eal/linuxapp/xen_dom0/dom0_mm_misc.c 
b/lib/librte_eal/linuxapp/xen index 87fa3e6..8addc21 100644
--- a/lib/librte_eal/linuxapp/xen_dom0/dom0_mm_misc.c
+++ b/lib/librte_eal/linuxapp/xen_dom0/dom0_mm_misc.c
@@ -64,6 +64,7 @@
 #include 
 #include 
 #include 
+//#include 

 #include 
 #include 
@@ -309,6 +310,7 @@ dom0_prepare_memsegs(struct memory_info* meminfo, struct 
dom0_mm_data
uint64_t pfn, vstart, vaddr;
uint32_t i, num_block, size;
int idx;
+   dma_addr_t *dma_handle = NULL;

/* Allocate 2M memory once */
num_block = meminfo->size / 2;
@@ -344,7 +346,7 @@ dom0_prepare_memsegs(struct memory_info* meminfo, struct 
dom0_mm_data
 * contiguous physical addresses, its maximum size is 2M.
 */
if 
(xen_create_contiguous_region(mm_data->block_info[i].vir_addr,
-   DOM0_CONTIG_NUM_ORDER, 0) == 0) {
+   DOM0_CONTIG_NUM_ORDER, 0,
+ dma_handle) == 0) {
mm_data->block_info[i].exchange_flag = 1;
mm_data->block_info[i].mfn =
pfn_to_mfn(mm_data->block_info[i].pfn);

After that I tried to run the l2fwd example and got a segmentation fault:

root at Smart:~/samuelm/dpdk/dpdk# modprobe uio
root at Smart:~/samuelm/dpdk/dpdk# insmod ./x86_64-default-linuxapp-gcc/kmod/igb_uio.ko
root at Smart:~/samuelm/dpdk/dpdk# insmod ./x86_64-default-linuxapp-gcc/kmod/rte_dom0_mm.ko
root at Smart:~/samuelm/dpdk/dpdk# cd examples/l2fwd/build/
root at Smart:~/samuelm/dpdk/dpdk/examples/l2fwd/build# echo 2048 > /sys/kernel/mm/dom0-mm/memsize-mB/memsize
root at Smart:~/samuelm/dpdk/dpdk/examples/l2fwd/build# ./l2fwd -c 3 -n 4 --xen-dom0 -- -q 1 -p 3
EAL: Detected lcore 0 as core 0 on socket 0
EAL: Detected lcore 1 as core 0 on socket 0
EAL: Setting up memory...
Segmentation fault
root at Smart:~/samuelm/dpdk/dpdk/examples/l2fwd/build#

Has anyone already encountered this problem?

Samuelm



[dpdk-dev] RTE Ring removing

2014-05-07 Thread Igor Ryzhov
Hello again.

I did some investigation on the code.
I learned that the RTE Ring creation function reserves its memory through
the RTE Memzone API (rte_memzone_reserve).
The documentation states that a memzone, once reserved, cannot be
unreserved. I decided to find out why.

I noticed that the Memzone implementation has a special global variable,
"free_memseg", which holds pointers to the free memory segments.
The memzone reserve function just finds the best segment for the allocation
in this "free_memseg" variable.

So I think it is possible to return already reserved memory to
"free_memseg", and that memory cannot be unreserved only because there is
no function for it, not because it is impossible in principle.
Am I right? Or are there any restrictions?

Best regards,
Igor Ryzhov

06.05.2014 13:05, Igor Ryzhov wrote:
> Hello.
>
> Why can't RTE Rings be removed once they are created?
> In my application I want to use many rings with different names, so I
> think memory may become a problem because of the many rings that are
> no longer in use but still allocated.
> Or does DPDK have a mechanism for reusing memory when rings are not in use?
>
> Best regards,
> Igor Ryzhov



[dpdk-dev] [PATCH 0/2] ring: allow to init a rte_ring outside of an rte_memzone

2014-05-07 Thread Olivier Matz
These 2 patches add 2 new functions that make it possible to initialize
and use a rte_ring anywhere in memory.

Before these patches, only rte_ring_create() was available. This function
allocates a rte_memzone (which cannot be freed) and initializes a ring
inside it.

This series makes the following possible:
  size = rte_ring_get_memsize(1024);
  r = malloc(size);
  rte_ring_init(r, "my_ring", 1024, 0);
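
For reference, the size computation behind rte_ring_get_memsize() in patch
1/2 can be sketched in plain C without DPDK. HDR_SIZE, CACHE_LINE and SZ_MASK
below are assumed values chosen for illustration; they are not the real
sizeof(struct rte_ring), CACHE_LINE_SIZE or RTE_RING_SZ_MASK, and
ring_memsize() is a hypothetical name.

```c
#include <errno.h>
#include <stddef.h>
#include <sys/types.h>   /* ssize_t */

#define HDR_SIZE    128u          /* assumed size of the ring header */
#define CACHE_LINE  64u           /* assumed cache line size */
#define SZ_MASK     0x0fffffffu   /* assumed upper bound on count */

/* true if x is a power of 2 (also true for 0, as with the macro in the patch) */
#define POWEROF2(x) ((((x) - 1) & (x)) == 0)

/* Bytes needed for the header plus 'count' object pointers, rounded up
 * to a multiple of the cache line size; -EINVAL for an invalid count. */
static ssize_t
ring_memsize(unsigned count)
{
	size_t sz;

	if (!POWEROF2(count) || count > SZ_MASK)
		return -EINVAL;

	sz = HDR_SIZE + (size_t)count * sizeof(void *);
	sz = (sz + CACHE_LINE - 1) & ~(size_t)(CACHE_LINE - 1);
	return (ssize_t)sz;
}
```

A caller would check for a negative result before passing the size to
malloc() or any other allocator.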


Olivier Matz (2):
  ring: introduce rte_ring_get_memsize()
  ring: introduce rte_ring_init()

 lib/librte_ring/rte_ring.c | 88 +-
 lib/librte_ring/rte_ring.h | 67 ---
 2 files changed, 118 insertions(+), 37 deletions(-)

-- 
1.9.2



[dpdk-dev] [PATCH 1/2] ring: introduce rte_ring_get_memsize()

2014-05-07 Thread Olivier Matz
Add a function that returns the amount of memory occupied by a rte_ring
structure and its object table. This commit prepares the next one, which
will allow a ring to be allocated dynamically.

Signed-off-by: Olivier Matz 
---
 lib/librte_ring/rte_ring.c | 29 ++---
 lib/librte_ring/rte_ring.h | 16 
 2 files changed, 38 insertions(+), 7 deletions(-)

diff --git a/lib/librte_ring/rte_ring.c b/lib/librte_ring/rte_ring.c
index 0d43a55..4aa500f 100644
--- a/lib/librte_ring/rte_ring.c
+++ b/lib/librte_ring/rte_ring.c
@@ -94,6 +94,24 @@ TAILQ_HEAD(rte_ring_list, rte_ring);
 /* true if x is a power of 2 */
 #define POWEROF2(x) ((((x)-1) & (x)) == 0)

+/* return the size of memory occupied by a ring */
+ssize_t rte_ring_get_memsize(unsigned count)
+{
+   ssize_t sz;
+
+   /* count must be a power of 2 */
+   if ((!POWEROF2(count)) || (count > RTE_RING_SZ_MASK )) {
+   RTE_LOG(ERR, RING,
+   "Requested size is invalid, must be power of 2, and "
+   "do not exceed the size limit %u\n", RTE_RING_SZ_MASK);
+   return -EINVAL;
+   }
+
+   sz = sizeof(struct rte_ring) + count * sizeof(void *);
+   sz = (sz + CACHE_LINE_MASK) & (~CACHE_LINE_MASK);
+   return sz;
+}
+
 /* create the ring */
 struct rte_ring *
 rte_ring_create(const char *name, unsigned count, int socket_id,
@@ -102,7 +120,7 @@ rte_ring_create(const char *name, unsigned count, int 
socket_id,
char mz_name[RTE_MEMZONE_NAMESIZE];
struct rte_ring *r;
const struct rte_memzone *mz;
-   size_t ring_size;
+   ssize_t ring_size;
int mz_flags = 0;
struct rte_ring_list* ring_list = NULL;

@@ -129,16 +147,13 @@ rte_ring_create(const char *name, unsigned count, int 
socket_id,
return NULL;
}

-   /* count must be a power of 2 */
-   if ((!POWEROF2(count)) || (count > RTE_RING_SZ_MASK )) {
-   rte_errno = EINVAL;
-   RTE_LOG(ERR, RING, "Requested size is invalid, must be power of 
2, and "
-   "do not exceed the size limit %u\n", 
RTE_RING_SZ_MASK);
+   ring_size = rte_ring_get_memsize(count);
+   if (ring_size < 0) {
+   rte_errno = ring_size;
return NULL;
}

rte_snprintf(mz_name, sizeof(mz_name), "%s%s", RTE_RING_MZ_PREFIX, 
name);
-   ring_size = count * sizeof(void *) + sizeof(struct rte_ring);

rte_rwlock_write_lock(RTE_EAL_TAILQ_RWLOCK);

diff --git a/lib/librte_ring/rte_ring.h b/lib/librte_ring/rte_ring.h
index 775ea79..e8493f2 100644
--- a/lib/librte_ring/rte_ring.h
+++ b/lib/librte_ring/rte_ring.h
@@ -199,6 +199,22 @@ struct rte_ring {
 #endif

 /**
+ * Calculate the memory size needed for a ring
+ *
+ * This function returns the number of bytes needed for a ring, given
+ * the number of elements in it. This value is the sum of the size of
+ * the structure rte_ring and the size of the memory needed by the
+ * objects pointers. The value is aligned to a cache line size.
+ *
+ * @param count
+ *   The number of elements in the ring (must be a power of 2).
+ * @return
+ *   - The memory size needed for the ring on success.
+ *   - -EINVAL if count is not a power of 2.
+ */
+ssize_t rte_ring_get_memsize(unsigned count);
+
+/**
  * Create a new ring named *name* in memory.
  *
  * This function uses ``memzone_reserve()`` to allocate memory. Its size is
-- 
1.9.2



[dpdk-dev] [PATCH 2/2] ring: introduce rte_ring_init()

2014-05-07 Thread Olivier Matz
Allow a ring to be initialized in already allocated memory. The
rte_ring_create() function, which allocates a ring in a rte_memzone, is still
available and now uses the new rte_ring_init() function to factorize the code.

Signed-off-by: Olivier Matz 
---
 lib/librte_ring/rte_ring.c | 63 ++
 lib/librte_ring/rte_ring.h | 51 +
 2 files changed, 82 insertions(+), 32 deletions(-)

diff --git a/lib/librte_ring/rte_ring.c b/lib/librte_ring/rte_ring.c
index 4aa500f..a65f33e 100644
--- a/lib/librte_ring/rte_ring.c
+++ b/lib/librte_ring/rte_ring.c
@@ -112,18 +112,10 @@ ssize_t rte_ring_get_memsize(unsigned count)
return sz;
 }

-/* create the ring */
-struct rte_ring *
-rte_ring_create(const char *name, unsigned count, int socket_id,
-   unsigned flags)
+int
+rte_ring_init(struct rte_ring *r, const char *name, unsigned count,
+   unsigned flags)
 {
-   char mz_name[RTE_MEMZONE_NAMESIZE];
-   struct rte_ring *r;
-   const struct rte_memzone *mz;
-   ssize_t ring_size;
-   int mz_flags = 0;
-   struct rte_ring_list* ring_list = NULL;
-
/* compilation-time checks */
RTE_BUILD_BUG_ON((sizeof(struct rte_ring) &
  CACHE_LINE_MASK) != 0);
@@ -140,11 +132,38 @@ rte_ring_create(const char *name, unsigned count, int 
socket_id,
  CACHE_LINE_MASK) != 0);
 #endif

+   /* init the ring structure */
+   memset(r, 0, sizeof(*r));
+   rte_snprintf(r->name, sizeof(r->name), "%s", name);
+   r->flags = flags;
+   r->prod.watermark = count;
+   r->prod.sp_enqueue = !!(flags & RING_F_SP_ENQ);
+   r->cons.sc_dequeue = !!(flags & RING_F_SC_DEQ);
+   r->prod.size = r->cons.size = count;
+   r->prod.mask = r->cons.mask = count-1;
+   r->prod.head = r->cons.head = 0;
+   r->prod.tail = r->cons.tail = 0;
+
+   return 0;
+}
+
+/* create the ring */
+struct rte_ring *
+rte_ring_create(const char *name, unsigned count, int socket_id,
+   unsigned flags)
+{
+   char mz_name[RTE_MEMZONE_NAMESIZE];
+   struct rte_ring *r;
+   const struct rte_memzone *mz;
+   ssize_t ring_size;
+   int mz_flags = 0;
+   struct rte_ring_list* ring_list = NULL;
+
/* check that we have an initialised tail queue */
-   if ((ring_list = 
+   if ((ring_list =
 RTE_TAILQ_LOOKUP_BY_IDX(RTE_TAILQ_RING, rte_ring_list)) == NULL) {
rte_errno = E_RTE_NO_TAILQ;
-   return NULL;
+   return NULL;
}

ring_size = rte_ring_get_memsize(count);
@@ -163,26 +182,16 @@ rte_ring_create(const char *name, unsigned count, int 
socket_id,
mz = rte_memzone_reserve(mz_name, ring_size, socket_id, mz_flags);
if (mz != NULL) {
r = mz->addr;
-
-   /* init the ring structure */
-   memset(r, 0, sizeof(*r));
-   rte_snprintf(r->name, sizeof(r->name), "%s", name);
-   r->flags = flags;
-   r->prod.watermark = count;
-   r->prod.sp_enqueue = !!(flags & RING_F_SP_ENQ);
-   r->cons.sc_dequeue = !!(flags & RING_F_SC_DEQ);
-   r->prod.size = r->cons.size = count;
-   r->prod.mask = r->cons.mask = count-1;
-   r->prod.head = r->cons.head = 0;
-   r->prod.tail = r->cons.tail = 0;
-
+   /* no need to check return value here, we already checked the
+* arguments above */
+   rte_ring_init(r, name, count, flags);
TAILQ_INSERT_TAIL(ring_list, r, next);
} else {
r = NULL;
RTE_LOG(ERR, RING, "Cannot reserve memory\n");
}
rte_rwlock_write_unlock(RTE_EAL_TAILQ_RWLOCK);
-   
+
return r;
 }

diff --git a/lib/librte_ring/rte_ring.h b/lib/librte_ring/rte_ring.h
index e8493f2..c62e7d7 100644
--- a/lib/librte_ring/rte_ring.h
+++ b/lib/librte_ring/rte_ring.h
@@ -215,13 +215,54 @@ struct rte_ring {
 ssize_t rte_ring_get_memsize(unsigned count);

 /**
+ * Initialize a ring structure.
+ *
+ * Initialize a ring structure in memory pointed by "r". The size of the
+ * memory area must be large enough to store the ring structure and the
+ * object table. It is advised to use rte_ring_get_memsize() to get the
+ * appropriate size.
+ *
+ * The ring size is set to *count*, which must be a power of two. Water
+ * marking is disabled by default. The real usable ring size is
+ * *count-1* instead of *count* to differentiate a free ring from an
+ * empty ring.
+ *
+ * The ring is not added in RTE_TAILQ_RING global list. Indeed, the
+ * memory given by the caller may not be shareable among dpdk
+ * processes.
+ *
+ * @param r
+ *   The pointer to the ring structure followed by the objects table.
+ * @param name
+ *   The size of the ring.
+ * @param count
+ *   The number of elements in the rin

[dpdk-dev] RTE Ring removing

2014-05-07 Thread Olivier MATZ
Hi Igor,

On 05/07/2014 09:54 AM, Igor Ryzhov wrote:
> I noticed that in Memzone realization there is a special global variable
> "free_memseg" containing pointers on free memory segments.
> An memzone reserve function just finst the best segment for allocation
> from this "free_memseg" variable.
>
> So I think there is a possibility to unreserve already reserved memory
> back to "free_memseg", and impossibility of unreserving memory is just
> because there is no function for that, not because it is impossible in
> principle.
> Am I right? Or there are any restrictions?

I think that implementing freeing of memory segments is feasible, but
it would require some work to properly merge freed zones to avoid memory
fragmentation.
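
As a rough illustration of the merging step, the sketch below keeps free
ranges in an array sorted by address and coalesces a freed range with its
neighbours. struct seg, struct free_list and seg_free() are hypothetical
names; this is not DPDK's free_memseg code, only one simple way to do it.

```c
#include <stddef.h>
#include <string.h>

#define MAX_SEGS 16

struct seg {                     /* hypothetical free range [addr, addr + len) */
	size_t addr;
	size_t len;
};

struct free_list {
	struct seg segs[MAX_SEGS];   /* sorted by addr, non-overlapping */
	unsigned n;
};

/* Return a range to the free list, merging it with adjacent free ranges.
 * Returns 0 on success, -1 if the list is full. */
static int
seg_free(struct free_list *fl, size_t addr, size_t len)
{
	unsigned i = 0;

	/* find the first free range that starts after the freed one */
	while (i < fl->n && fl->segs[i].addr < addr)
		i++;

	/* merge with the previous range if it ends where we start */
	if (i > 0 && fl->segs[i - 1].addr + fl->segs[i - 1].len == addr) {
		fl->segs[i - 1].len += len;
		/* the grown range may now touch the next one as well */
		if (i < fl->n &&
		    fl->segs[i - 1].addr + fl->segs[i - 1].len == fl->segs[i].addr) {
			fl->segs[i - 1].len += fl->segs[i].len;
			memmove(&fl->segs[i], &fl->segs[i + 1],
				(fl->n - i - 1) * sizeof(fl->segs[0]));
			fl->n--;
		}
		return 0;
	}

	/* merge with the next range if we end where it starts */
	if (i < fl->n && addr + len == fl->segs[i].addr) {
		fl->segs[i].addr = addr;
		fl->segs[i].len += len;
		return 0;
	}

	/* no neighbour to merge with: insert a new entry at position i */
	if (fl->n == MAX_SEGS)
		return -1;
	memmove(&fl->segs[i + 1], &fl->segs[i],
		(fl->n - i) * sizeof(fl->segs[0]));
	fl->segs[i].addr = addr;
	fl->segs[i].len = len;
	fl->n++;
	return 0;
}
```

Freeing [0,100), then [200,300), then [100,200) leaves a single free range
[0,300) instead of three fragments.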

Another solution is to allocate/free rings in standard memory (malloc
for instance) instead of rte_memzones. Let me know if the patches I've
just sent to the mailing list solve your issue.

By the way, I plan to do the same thing for mempools in the coming
weeks but there is much more work.

Regards,
Olivier



[dpdk-dev] [PATCH 1/2] ring: introduce rte_ring_get_memsize()

2014-05-07 Thread Ananyev, Konstantin
Hi Olivier,

2 nits from me:

1. ssize_t rte_ring_get_memsize(unsigned count)

Can you use the usual DPDK syntax for function definitions:

ssize_t
rte_ring_get_memsize(unsigned count)

2. sz = (sz + CACHE_LINE_MASK) & (~CACHE_LINE_MASK);

Use RTE_ALIGN(sz, CACHE_LINE_SIZE) instead?

Konstantin
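
A small sketch of why the two spellings in nit 2 give the same result for a
power-of-two alignment. ALIGN_CEIL is a local, hypothetical macro standing in
for RTE_ALIGN, and the line size of 64 is an assumption.

```c
#include <stddef.h>

#define LINE_SIZE 64u                 /* assumed cache line size */
#define LINE_MASK (LINE_SIZE - 1)

/* Local stand-in for an align-up macro; align must be a power of 2. */
#define ALIGN_CEIL(v, align) \
	(((v) + ((size_t)(align) - 1)) & ~((size_t)(align) - 1))

/* open-coded form, as in the patch */
static size_t
round_up_mask(size_t sz)
{
	return (sz + LINE_MASK) & ~(size_t)LINE_MASK;
}

/* the same operation expressed through the macro */
static size_t
round_up_macro(size_t sz)
{
	return ALIGN_CEIL(sz, LINE_SIZE);
}
```

Both functions round a size up to the next multiple of LINE_SIZE; the macro
form only makes the intent easier to read.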




[dpdk-dev] RTE Ring removing

2014-05-07 Thread Igor Ryzhov
It seems to be a good idea, thank you, Olivier!

But a few questions:
1. Will these changes affect performance?
2. In PATCH 2/2 you have a small bug:

In file rte_ring.h, in comments describing rte_ring_init function you have:

+ * @param name
+ *   The size of the ring.

But it is the name of the ring, not the size.

Best regards,
Igor Ryzhov




[dpdk-dev] could not l2fwd in DOM0

2014-05-07 Thread Samuel Monderer
> -Original Message-
> From: dev [mailto:dev-bounces at dpdk.org] On Behalf Of Liu, Jijiang
> Sent: Wednesday, May 07, 2014 7:36 AM
> To: dev at dpdk.org
> Subject: Re: [dpdk-dev] could not l2fwd in DOM0
> 
> Hi,
> 
> I have checked source codes of xen_create_contiguous_region function in
> kernel 3.14, and found the dma_handle cannot be NULL.
> 
> int xen_create_contiguous_region(phys_addr_t pstart, unsigned int order,
>  unsigned int address_bits,
>  dma_addr_t *dma_handle) {
> unsigned long *in_frames = discontig_frames, out_frame;
> unsigned long  flags;
> int success;
> unsigned long vstart = (unsigned long)phys_to_virt(pstart);
> ...
> *dma_handle = virt_to_machine(vstart).maddr;
> return success ? 0 : -ENOMEM;
> }
> 
> Thanks
> Frank Liu
> 

Thanks Frank,

I've changed the code as follows, but now the kernel module crashes.

diff --git a/lib/librte_eal/linuxapp/xen_dom0/dom0_mm_misc.c
b/lib/librte_eal/linuxapp/xen index 87fa3e6..8addc21 100644
--- a/lib/librte_eal/linuxapp/xen_dom0/dom0_mm_misc.c
+++ b/lib/librte_eal/linuxapp/xen_dom0/dom0_mm_misc.c
@@ -64,6 +64,7 @@
 #include 
 #include 
 #include 

 #include 
 #include 
@@ -309,6 +310,7 @@ dom0_prepare_memsegs(struct memory_info* meminfo, struct 
dom0_mm_data
uint64_t pfn, vstart, vaddr;
uint32_t i, num_block, size;
int idx;
+   dma_addr_t dma_handle;

/* Allocate 2M memory once */
num_block = meminfo->size / 2;
@@ -344,7 +346,7 @@ dom0_prepare_memsegs(struct memory_info* meminfo, struct 
dom0_mm_data
 * contiguous physical addresses, its maximum size is 2M.
 */
if 
(xen_create_contiguous_region(mm_data->block_info[i].vir_addr,
-   DOM0_CONTIG_NUM_ORDER, 0) == 0) {
+   DOM0_CONTIG_NUM_ORDER, 0,
+ &dma_handle) == 0) {
mm_data->block_info[i].exchange_flag = 1;
mm_data->block_info[i].mfn =
pfn_to_mfn(mm_data->block_info[i].pfn);

Samuel




[dpdk-dev] [PATCH 1/2] ring: introduce rte_ring_get_memsize()

2014-05-07 Thread Olivier MATZ
Hi Konstantin,

On 05/07/2014 02:35 PM, Ananyev, Konstantin wrote:
> 1. ssize_t rte_ring_get_memsize(unsigned count)
>
> Can you use usual syntax for functions definitions in DPDK:
>
> ssize_t
> rte_ring_get_memsize(unsigned count)
>
> 2. sz = (sz + CACHE_LINE_MASK) & (~CACHE_LINE_MASK);
>
> Use RTE_ALIGN(sz, CACHE_LINE_SIZE) instead?

Thank you for reviewing. I'll include these 2 changes in
a patch-v2.

Regards,
Olivier



[dpdk-dev] [PATCH] eal: parse args before any kinds of init

2014-05-07 Thread Thomas Monjalon
2014-05-05 17:50, Thomas Monjalon:
> 2014-04-15 11:03, Wang Sheng-Hui:
> > Parse args first, to resolve any invalid args and give out the usage
> > string. E.g './helloworld --invalid', the '--invalid' will be checked
> > before any init. After the options are checked, take any init actions.
> > 
> > Signed-off-by: Wang Sheng-Hui 
> 
> [...]
> > @@ -964,16 +969,16 @@ rte_eal_init(int argc, char **argv)
> > 
> > thread_id = pthread_self();
> > 
> > +   fctret = eal_parse_args(argc, argv);
> > +   if (fctret < 0)
> > +   exit(1);
> > +
> > 
> > if (rte_eal_log_early_init() < 0)
> > 
> > rte_panic("Cannot init early logs\n");
> > 
> > if (rte_eal_cpu_init() < 0)
> > 
> > rte_panic("Cannot detect lcores\n");
> > 
> > -   fctret = eal_parse_args(argc, argv);
> > -   if (fctret < 0)
> > -   exit(1);
> > -
> 
> You should move eal_parse_args() just after rte_eal_log_early_init() in
> order to have logs available.

While double-checking, I found this commit, which explains why
rte_eal_cpu_init() is called before eal_parse_args():
http://dpdk.org/browse/dpdk/commit/?id=f563a3727b5dba

If the goal is to move debug lines in cpu_init, you should split
rte_eal_log_early_init() into 2 functions: one to detect cores and one for
the debug summary.
By the way, these are debug logs, which should be disabled by default.

-- 
Thomas


[dpdk-dev] RTE Ring removing

2014-05-07 Thread Olivier MATZ
Hi Igor,

On 05/07/2014 02:42 PM, Igor Ryzhov wrote:
> But a few questions:
> 1. Will this changes affect performance?

It should not. If you have many rings, you may allocate them
in huge pages to avoid TLB misses.

> 2. In PATCH 2/2 you have a small bug:
>
> In file rte_ring.h, in comments describing rte_ring_init function you have:
>
> + * @param name
> + *   The size of the ring.
>
> But it is name of the ring, not size.

Thank you for this comment, I'll fix it in the v2.

Regards,
Olivier



[dpdk-dev] [PATCH] fix for jumbo frame issue with DPDK VF

2014-05-07 Thread Ivan Boule
On 05/06/2014 04:31 PM, Konstantin Ananyev wrote:
> When the latest Linux ixgbe PF is used together with a DPDK VF in a DPDK
> application, jumbo frames are not received.
> Also, if the Linux ixgbe PF has its MTU set to 1500 (the default),
> then normal-sized packets can be received by the DPDK VF.
> However, if the Linux PF has MTU > 1500, then the DPDK VF receives no
> packets (normal or jumbo).
> With ixgbe_mbox_api_10, ixgbe simply didn't allow setting a VF MTU > 1514
> for 82599.
> With ixgbe_mbox_api_11 it does, though now, if the PF uses jumbo frames,
> it simply disables RX for all VFs.
> So to work with a PF that is using jumbo frames, at startup each VF has to:
> 1. Negotiate mbox_api_11 with the PF.
> 2. Send the PF a SET_LPE message with the desired MTU.
> Note that if the PF already uses an MTU bigger than the one requested by
> the VF, then the PF won't take any action.
>
> Signed-off-by: Konstantin Ananyev 
> ---
>   lib/librte_pmd_e1000/igb_rxtx.c |5 +++
>   lib/librte_pmd_ixgbe/ixgbe_ethdev.c |   47 
> --
>   lib/librte_pmd_ixgbe/ixgbe_rxtx.c   |4 +++
>   3 files changed, 42 insertions(+), 14 deletions(-)
>
> diff --git a/lib/librte_pmd_e1000/igb_rxtx.c b/lib/librte_pmd_e1000/igb_rxtx.c
> index 4608595..6b454a5 100644
> --- a/lib/librte_pmd_e1000/igb_rxtx.c
> +++ b/lib/librte_pmd_e1000/igb_rxtx.c
> @@ -2077,6 +2077,11 @@ eth_igbvf_rx_init(struct rte_eth_dev *dev)
>   
>   hw = E1000_DEV_PRIVATE_TO_HW(dev->data->dev_private);
>   
> + /* setup MTU */
> + e1000_rlpml_set_vf(hw,
> + (uint16_t)(dev->data->dev_conf.rxmode.max_rx_pkt_len +
> + VLAN_TAG_SIZE));
> +
>   /* Configure and enable each RX queue. */
>   rctl_bsize = 0;
>   dev->rx_pkt_burst = eth_igb_recv_pkts;
> diff --git a/lib/librte_pmd_ixgbe/ixgbe_ethdev.c 
> b/lib/librte_pmd_ixgbe/ixgbe_ethdev.c
> index 89ab4aa..94dc3ec 100644
> --- a/lib/librte_pmd_ixgbe/ixgbe_ethdev.c
> +++ b/lib/librte_pmd_ixgbe/ixgbe_ethdev.c
> @@ -808,19 +808,30 @@ eth_ixgbe_dev_init(__attribute__((unused)) struct 
> eth_driver *eth_drv,
>   return 0;
>   }
>   
> -static void ixgbevf_get_queue_num(struct ixgbe_hw *hw)
> +
> +/*
> + * Negotiate mailbox API version with the PF.
> + * After reset API version is always set to the basic one 
> (ixgbe_mbox_api_10).
> + * Then we try to negotiate starting with the most recent one.
> + * If all negotiation attempts fail, then we will proceed with
> + * the default one (ixgbe_mbox_api_10).
> + */
> +static void
> +ixgbevf_negotiate_api(struct ixgbe_hw *hw)
>   {
> - /* Traffic classes are not supported by now */
> - unsigned int tcs, tc;
> + int32_t i;
>   
> - /*
> -  * Must let PF know we are at mailbox API version 1.1.
> -  * Otherwise PF won't answer properly.
> -  * In case that PF fails to provide Rx/Tx queue number,
> -  * max_tx_queues and max_rx_queues remain to be 1.
> -  */
> - if (!ixgbevf_negotiate_api_version(hw, ixgbe_mbox_api_11))
> - ixgbevf_get_queues(hw, &tcs, &tc);
> + /* start with highest supported, proceed down */
> + static const enum ixgbe_pfvf_api_rev sup_ver[] = {
> + ixgbe_mbox_api_11,
> + ixgbe_mbox_api_10,
> + };
> +
> + for (i = 0;
> + i != RTE_DIM(sup_ver) &&
> + ixgbevf_negotiate_api_version(hw, sup_ver[i]) != 0;
> + i++)
> + ;
>   }
>   
>   /*
> @@ -830,9 +841,11 @@ static int
>   eth_ixgbevf_dev_init(__attribute__((unused)) struct eth_driver *eth_drv,
>struct rte_eth_dev *eth_dev)
>   {
> - struct rte_pci_device *pci_dev;
> - struct ixgbe_hw *hw = 
> IXGBE_DEV_PRIVATE_TO_HW(eth_dev->data->dev_private);
>   int diag;
> + uint32_t tc, tcs;
> + struct rte_pci_device *pci_dev;
> + struct ixgbe_hw *hw =
> + IXGBE_DEV_PRIVATE_TO_HW(eth_dev->data->dev_private);
>   struct ixgbe_vfta * shadow_vfta =
>   IXGBE_DEV_PRIVATE_TO_VFTA(eth_dev->data->dev_private);
>   struct ixgbe_hwstrip *hwstrip =
> @@ -891,8 +904,11 @@ eth_ixgbevf_dev_init(__attribute__((unused)) struct 
> eth_driver *eth_drv,
>   return (diag);
>   }
>   
> + /* negotiate mailbox API version to use with the PF. */
> + ixgbevf_negotiate_api(hw);
> +
>   /* Get Rx/Tx queue count via mailbox, which is ready after reset_hw */
> - ixgbevf_get_queue_num(hw);
> + ixgbevf_get_queues(hw, &tcs, &tc);
>   
>   /* Allocate memory for storing MAC addresses */
>   eth_dev->data->mac_addrs = rte_zmalloc("ixgbevf", ETHER_ADDR_LEN *
> @@ -2518,6 +2534,9 @@ ixgbevf_dev_start(struct rte_eth_dev *dev)
>   
>   hw->mac.ops.reset_hw(hw);
>   
> + /* negotiate mailbox API version to use with the PF. */
> + ixgbevf_negotiate_api(hw);
> +
>   ixgbevf_dev_tx_init(dev);
>   
>   /* This can fail when allocating mbufs for descriptor rings */
> diff --git a/lib/librte_pmd_ixgbe/ixgbe_rxtx.c 
> b/lib/librte_pmd_ixgbe/ixgbe_rxtx.c
> index 
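
The "try the newest mailbox API first, then fall back" loop from the quoted
ixgbevf_negotiate_api() can be sketched without the driver. fake_negotiate()
and enum api_rev are hypothetical stand-ins for
ixgbevf_negotiate_api_version() and enum ixgbe_pfvf_api_rev; the pretend PF
here accepts every version up to pf_max.

```c
#include <stddef.h>

enum api_rev { API_10 = 0, API_11 = 1 };   /* hypothetical versions */

#define DIM(a) (sizeof(a) / sizeof((a)[0]))

/* Hypothetical PF: returns 0 if it accepts ver, non-zero otherwise. */
static int
fake_negotiate(enum api_rev pf_max, enum api_rev ver)
{
	return ver <= pf_max ? 0 : -1;
}

/* Try versions from newest to oldest and return the first accepted one.
 * If every attempt fails, fall back to the basic version, as the
 * comment in the quoted patch describes. */
static enum api_rev
negotiate_api(enum api_rev pf_max)
{
	static const enum api_rev sup_ver[] = { API_11, API_10 };
	size_t i;

	for (i = 0; i != DIM(sup_ver); i++) {
		if (fake_negotiate(pf_max, sup_ver[i]) == 0)
			return sup_ver[i];
	}
	return API_10;
}
```

Adding a newer version later only means putting it at the front of the
sup_ver[] table.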

[dpdk-dev] [PATCH] Use proper mac type for 82576 VF e1000_vfadapt type corresponds to 82576 VF devices, check e1000_set_mac_type() for more details.

2014-05-07 Thread Ivan Boule
On 05/06/2014 04:33 PM, Konstantin Ananyev wrote:
> Signed-off-by: Konstantin Ananyev 
> ---
>   lib/librte_pmd_e1000/igb_rxtx.c |2 +-
>   1 files changed, 1 insertions(+), 1 deletions(-)
>
> diff --git a/lib/librte_pmd_e1000/igb_rxtx.c b/lib/librte_pmd_e1000/igb_rxtx.c
> index 6b454a5..7fe1780 100644
> --- a/lib/librte_pmd_e1000/igb_rxtx.c
> +++ b/lib/librte_pmd_e1000/igb_rxtx.c
> @@ -2154,7 +2154,7 @@ eth_igbvf_rx_init(struct rte_eth_dev *dev)
>   rxdctl &= 0xFFF0;
>   rxdctl |= (rxq->pthresh & 0x1F);
>   rxdctl |= ((rxq->hthresh & 0x1F) << 8);
> - if (hw->mac.type == e1000_82576) {
> + if (hw->mac.type == e1000_vfadapt) {
>   /*
>* Workaround of 82576 VF Erratum
>* force set WTHRESH to 1

Acked.

Thanks.

-- 
Ivan Boule
6WIND Development Engineer



[dpdk-dev] RTE Ring removing

2014-05-07 Thread Venkatesan, Venky
Olivier, 

We should look at how to make the memseg capable of doing alloc/free (including 
re-assembly of fragments) after the 1.7 release. Is that something you are 
considering doing (or are there any other DPDKers considering this), or should 
I look at putting together a patch for that? 

Regards, 
-Venky

-Original Message-
From: dev [mailto:dev-boun...@dpdk.org] On Behalf Of Olivier MATZ
Sent: Wednesday, May 07, 2014 4:39 AM
To: Igor Ryzhov; dev at dpdk.org
Subject: Re: [dpdk-dev] RTE Ring removing

Hi Igor,

On 05/07/2014 09:54 AM, Igor Ryzhov wrote:
> I noticed that in the Memzone implementation there is a special global
> variable "free_memseg" containing pointers to free memory segments.
> A memzone reserve function just finds the best segment for allocation
> from this "free_memseg" variable.
>
> So I think there is a possibility to unreserve already reserved memory
> back to "free_memseg", and the impossibility of unreserving memory is just
> because there is no function for that, not because it is impossible in
> principle.
> Am I right? Or are there any restrictions?

I think that implementing a freeing of memory segment is feasible, but it would 
require some work to properly merge freed zones to avoid memory fragmentation.
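
[Editor's sketch] The merge step mentioned above can be illustrated roughly as follows. This is only a minimal sketch under assumptions: `struct free_seg` and `coalesce_free_segs()` are hypothetical names, not the real free_memseg layout or any DPDK API, and the free list is assumed to be kept sorted by address.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical free-segment descriptor; not the real rte_memseg. */
struct free_seg {
	uintptr_t addr;   /* start address of the free region */
	size_t len;       /* length of the free region in bytes */
};

/*
 * Merge adjacent entries of an address-sorted array of free segments
 * in place and return the new number of entries.
 */
static size_t
coalesce_free_segs(struct free_seg *segs, size_t n)
{
	size_t out = 0;
	size_t i;

	if (n == 0)
		return 0;
	for (i = 1; i < n; i++) {
		if (segs[out].addr + segs[out].len == segs[i].addr)
			segs[out].len += segs[i].len;   /* contiguous: extend */
		else
			segs[++out] = segs[i];          /* gap: keep as new entry */
	}
	return out + 1;
}
```

A free function would insert the released zone at its sorted position and then run a pass like this, so that neighbouring free regions become one larger region again.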

Another solution is to allocate/free rings in standard memory (malloc for 
instance) instead of rte_memzones. Let me know if the patches I've just sent to 
the mailing list solve your issue.

By the way, I plan to do the same thing for mempools in the coming weeks but 
there is much more work.

Regards,
Olivier



[dpdk-dev] DMAR errors when running testpmd

2014-05-07 Thread Tomasz K
Hi All

We're trying to run testpmd application on HP Proliant DL380P Gen 8 server.
We've enabled SR-IOV in BIOS and set appropriate flags when booting kernel
(iommu=pt intel_iommu=on)
The NIC we are using is 82599EB (2 ports, 10Gb SFP+)

When running the testpmd application we always encounter DMAR errors in dmesg

[  186.302866] dmar: DRHD: handling fault status reg 2
[  186.302872] dmar: DMAR:[DMA Read] Request device [07:00.1] fault addr 1f7322
[  186.302872] DMAR:[fault reason 06] PTE Read access is not set
[  186.302875] dmar: DMAR:[DMA Read] Request device [07:00.0] fault addr 1f7320
[  186.302875] DMAR:[fault reason 06] PTE Read access is not set
[  324.759520] dmar: DRHD: handling fault status reg 202
[  324.759525] dmar: DMAR:[DMA Read] Request device [07:00.1] fault addr 1f7322
[  324.759525] DMAR:[fault reason 06] PTE Read access is not set
[  324.759528] dmar: DMAR:[DMA Read] Request device [07:00.0] fault addr 1f7320
[  324.759528] DMAR:[fault reason 06] PTE Read access is not set

Has anyone encountered this issue?
Tried to search through gmane and google and the only solution was to
disable SR-IOV which we cannot do.

Thanks in advance
Tomasz

Host:
HP Proliant DL380p Gen 8 Server
Intel(R) Xeon(R) CPU E5-2695 v2 @ 2.40GHz

NIC:
82599EB 10-Gigabit SFI/SFP+ Network Connection (rev 01)


[dpdk-dev] DMAR errors when running testpmd

2014-05-07 Thread Burakov, Anatoly
Hi Tomasz

It looks like you have your kernel booted with iommu=on. Please check your 
/proc/cmdline to make sure you didn't accidentally select the wrong 
bootloader entry.

> We're trying to run testpmd application on HP Proliant DL380P Gen 8 server.
> We've enabled SR-IOV in BIOS and set appropriate flags when booting kernel
> (iommu=pt intel_iommu=on) The NIC we are using is 82599EB (2 ports, 10Gb
> SFP+)
> 
> When running testpmd application we always encounter DMAR error in dmesg
> 
> [  186.302866] dmar: DRHD: handling fault status reg 2 [  186.302872] dmar:
> DMAR:[DMA Read] Request device [07:00.1] fault addr
> 1f7322
> [  186.302872] DMAR:[fault reason 06] PTE Read access is not set [
> 186.302875] dmar: DMAR:[DMA Read] Request device [07:00.0] fault addr
> 1f7320
> [  186.302875] DMAR:[fault reason 06] PTE Read access is not set [
> 324.759520] dmar: DRHD: handling fault status reg 202 [  324.759525] dmar:
> DMAR:[DMA Read] Request device [07:00.1] fault addr
> 1f7322
> [  324.759525] DMAR:[fault reason 06] PTE Read access is not set [
> 324.759528] dmar: DMAR:[DMA Read] Request device [07:00.0] fault addr
> 1f7320
> [  324.759528] DMAR:[fault reason 06] PTE Read access is not set

Best regards,
Anatoly Burakov
DPDK SW Engineer





[dpdk-dev] DMAR errors when running testpmd

2014-05-07 Thread Tomasz K
Hi Anatoly.

Yes, I do have iommu=on set when booting the kernel. This is one of the
prerequisites for running SR-IOV.
Below is the cmdline output:

cat /proc/cmdline
BOOT_IMAGE=/boot/vmlinuz-3.11.0-20-generic
root=UUID=c70fa456-ee10-43e5-9f07-4dbb372dcee3 ro quiet splash iommu=pt
intel_iommu=on default_hugepagesz=1G hugepagesz=1G hugepages=4 vt.handoff=7
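
[Editor's sketch] Because "intel_iommu=on" contains the substring "iommu=on", the two settings are easy to confuse when reading this line by eye. The following is a minimal sketch of a whole-token check; `cmdline_has_token()` is a hypothetical helper written for illustration, not part of DPDK or the kernel, and it assumes the contents of /proc/cmdline have already been read into a string.

```c
#include <stdbool.h>
#include <string.h>

/*
 * Return true if the space-separated kernel command line contains
 * the exact token tok (e.g. "iommu=pt"), false otherwise.
 */
static bool
cmdline_has_token(const char *cmdline, const char *tok)
{
	size_t tlen = strlen(tok);
	const char *p = cmdline;

	if (tlen == 0)
		return false;
	while ((p = strstr(p, tok)) != NULL) {
		/* a token must start the line or follow a space ... */
		bool starts = (p == cmdline) || p[-1] == ' ';
		/* ... and must be followed by a space or the end of the line */
		bool ends = p[tlen] == '\0' || p[tlen] == ' ' || p[tlen] == '\n';

		if (starts && ends)
			return true;
		p += tlen;
	}
	return false;
}
```

On the command line shown above, such a check would report "iommu=pt" and "intel_iommu=on" as present and "iommu=on" as absent.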

Thanks
Tomasz


2014-05-07 16:07 GMT+02:00 Burakov, Anatoly :

> Hi Tomasz
>
> It looks like you have your kernel booted with iommu=on. Please check your
> /proc/cmdline to make sure you didn't accidentally select the wrong
> bootloader entry.
>
> > We're trying to run testpmd application on HP Proliant DL380P Gen 8
> server.
> > We've enabled SR-IOV in BIOS and set appropriate flags when booting
> kernel
> > (iommu=pt intel_iommu=on) The NIC we are using is 82599EB (2 ports, 10Gb
> > SFP+)
> >
> > When running testpmd application we always encounter DMAR error in dmesg
> >
> > [ 186.302866] dmar: DRHD: handling fault status reg 2 [  186.302872]
> dmar:
> > DMAR:[DMA Read] Request device [07:00.1] fault addr
> > 1f7322
> > [  186.302872] DMAR:[fault reason 06] PTE Read access is not set [
> > 186.302875] dmar: DMAR:[DMA Read] Request device [07:00.0] fault addr
> > 1f7320
> > [  186.302875] DMAR:[fault reason 06] PTE Read access is not set [
> > 324.759520] dmar: DRHD: handling fault status reg 202 [  324.759525]
> dmar:
> > DMAR:[DMA Read] Request device [07:00.1] fault addr
> > 1f7322
> > [  324.759525] DMAR:[fault reason 06] PTE Read access is not set [
> > 324.759528] dmar: DMAR:[DMA Read] Request device [07:00.0] fault addr
> > 1f7320
> > [  324.759528] DMAR:[fault reason 06] PTE Read access is not set
>
> Best regards,
> Anatoly Burakov
> DPDK SW Engineer
>
>
>
>


[dpdk-dev] DMAR errors when running testpmd

2014-05-07 Thread Burakov, Anatoly
Hi Tomasz,

Your words:

> We've enabled SR-IOV in BIOS and set appropriate flags when booting kernel
> (iommu=pt intel_iommu=on)

Your other words:

> Yes I do have iommu=on set when booting kernel.

Here lies your mistake :-) Boot your kernel with iommu set to "pt" (iommu=pt 
intel_iommu=on) and everything will work. The "pt" option enables IOMMU only 
for VMs, while "on" sets up your whole system to work through IOMMU (including 
host devices). However, both of these options enable SR-IOV. 

Best regards,
Anatoly Burakov
DPDK SW Engineer





[dpdk-dev] DMAR errors when running testpmd

2014-05-07 Thread Burakov, Anatoly
Hi Tomasz, 

> 
> Here lies your mistake :-) Boot your kernel with iommu set to "pt" (iommu=pt
> intel_iommu=on) and everything will work. The "pt" option enables IOMMU
> only for VMs, while "on" sets up your whole system to work through IOMMU
> (including host devices). However, both of these options enable SR-IOV.
> 

Apologies, I misread your command-line output. It seems like you have 
everything set up correctly. I'm not sure what is wrong here, but the messages 
you're seeing are usually related to incorrect IOMMU boot parameters. Which 
DPDK version are you using?

Best regards,
Anatoly Burakov
DPDK SW Engineer





[dpdk-dev] [PATCH RFC] eal: change default per socket memory allocation

2014-05-07 Thread Venkatesan, Venky
David,

Sorry for the late response. Yes, your suggestion would work. Let's implement 
it.

Regards,
-Venky

From: David Marchand [mailto:david.march...@6wind.com]
Sent: Monday, May 05, 2014 2:26 AM
To: Venkatesan, Venky
Cc: Burakov, Anatoly; dev at dpdk.org
Subject: Re: [dpdk-dev] [PATCH RFC] eal: change default per socket memory 
allocation

Hello Venky, Anatoly,

On Fri, May 2, 2014 at 11:05 AM, Venkatesan, Venky <venky.venkatesan at intel.com> wrote:
Agree with Anatoly - I would much rather not change legacy option behaviour 
that has existed for a while, especially when --socket-mem is available to do 
exactly what is needed.

-Venky

-Original Message-
From: dev [mailto:dev-bounces at dpdk.org] On 
Behalf Of Burakov, Anatoly
Sent: Friday, May 02, 2014 1:54 AM
To: Burakov, Anatoly; David Marchand; dev at dpdk.org
Subject: Re: [dpdk-dev] [PATCH RFC] eal: change default per socket memory 
allocation

Sorry for spamming, but now that I think of it, I don't believe this change 
makes much sense. If the user wants memory on specific sockets, there's already 
the --socket-mem option. If the user doesn't care, there's the -m option, which 
gives the user memory from whichever sockets have it available. With this change 
applied, DPDK will fail when run with the -m switch under certain circumstances 
(e.g. cores from socket 0 present in the coremask but no memory left on socket 
0), which is quite the opposite of the simple "give me n megs, I don't care 
where it comes from" behaviour that -m provides.

Actually, if we don't care where memory is allocated, we can at least try to 
have the more common setup work properly (i.e. spread memory allocations based 
on used cores).
I can see no usual setup where you want to use cores on a socket while having 
all memory on another socket but still expect performance to be good.

So here is another approach for Didier's patch.
We can try to spread memory across NUMA sockets; if this fails, we default to 
the previous behavior but leave a trace with a warning log "Could not spread 
memory on numa sockets".

What do you think about this ?


I would also take into account Anatoly's comments (multi line comments + ensure 
we won't try to get more memory than asked by user).

--
David Marchand


[dpdk-dev] DMAR errors when running testpmd

2014-05-07 Thread Tomasz K
Hi Anatoly

I'm using dpdk-1.6.0r2 on Ubuntu 12.04 LTS with kernel 3.11.0-20-generic
#35~precise1-Ubuntu SMP

As for further investigation:
1. It doesn't matter whether DPDK uses 1GB or 2MB hugepages.
2. dmesg messages appear only when I invoke "start tx_first" in testpmd app
(so only when I try to send some packets)

The setup is very simple: 2 NICs connected to each other (port0 on NIC1 to
port0 on NIC2, same for port1). Each NIC is on a different server.

Thanks
Tomasz


2014-05-07 16:45 GMT+02:00 Burakov, Anatoly :

> Hi Tomasz,
>
> >
> > Here lies your mistake :-) Boot your kernel with iommu set to "pt" (iommu=pt
> > intel_iommu=on) and everything will work. The "pt" option enables IOMMU
> > only for VMs, while "on" sets up your whole system to work through IOMMU
> > (including host devices). However, both of these options enable SR-IOV.
> >
>
> Apologies, I misread your command-line output. It seems like you have
> everything set up correctly. I'm not sure what is wrong here, but the
> messages you're seeing are usually related to incorrect IOMMU boot
> parameters. Which DPDK version are you using?
>
> Best regards,
> Anatoly Burakov
> DPDK SW Engineer
>
>
>
>


[dpdk-dev] RTE Ring removing

2014-05-07 Thread Olivier MATZ
Hi Venky,

On 05/07/2014 04:01 PM, Venkatesan, Venky wrote:
> We should look at how to make the memseg capable of doing alloc/free
> (including re-assembly of fragments) after the 1.7 release. Is that
> something you are considering doing (or are there any other DPDKers
> considering this), or should I look at putting together a patch for
> that?

No, that's not something I'm working on today.

On this topic, I have some work in progress in the rte_mempool code.
I'll submit it here as soon as it is ready, but I'm not sure it will be
finished before the end of the 1.7.0 integration window.

Regards,
Olivier



[dpdk-dev] DMAR errors when running testpmd

2014-05-07 Thread Burakov, Anatoly
Hi Tomasz,

> 2. dmesg messages appear only when I invoke "start tx_first" in testpmd app 
> (so only when I try to send some packets)

Does receiving packets work? I would assume it doesn't, but just making sure.

Best regards,
Anatoly Burakov
DPDK SW Engineer





[dpdk-dev] DMAR errors when running testpmd

2014-05-07 Thread Tomasz K
Hi Anatoly

You guessed right...it doesn't
"show port stats all" always shows zeros.


Thanks
Tomasz


2014-05-07 17:08 GMT+02:00 Burakov, Anatoly :

> Hi Tomasz,
>
> > 2. dmesg messages appear only when I invoke "start tx_first" in testpmd
> app (so only when I try to send some packets)
>
> Does receiving packets work? I would assume it doesn't, but just making
> sure.
>
> Best regards,
> Anatoly Burakov
> DPDK SW Engineer
>
>
>
>


[dpdk-dev] RTE Ring removing

2014-05-07 Thread Ananyev, Konstantin
Hi Olivier,
Just to clarify about mempool - I suppose you are talking about the ability to 
place the internal ring and mempool metadata inside externally allocated memory?
It is already possible to keep mempool elements inside externally allocated 
memory (rte_mempool_xmem_create()). 
Konstantin

-Original Message-
From: dev [mailto:dev-boun...@dpdk.org] On Behalf Of Olivier MATZ
Sent: Wednesday, May 07, 2014 4:10 PM
To: Venkatesan, Venky; Igor Ryzhov; dev at dpdk.org
Subject: Re: [dpdk-dev] RTE Ring removing

Hi Venky,

On 05/07/2014 04:01 PM, Venkatesan, Venky wrote:
> We should look at how to make the memseg capable of doing alloc/free
> (including re-assembly of fragments) after the 1.7 release. Is that
> something you are considering doing (or are there any other DPDKers
> considering this), or should I look at putting together a patch for
> that?

No, that's not something I'm working on today.

On this topic, I have some work in progress in the rte_mempool code.
I'll submit it here as soon as it is ready, I'm not sure it will be
finished before the end of the 1.7.0 integration window.

Regards,
Olivier



[dpdk-dev] DMAR errors when running testpmd

2014-05-07 Thread Burakov, Anatoly
Hi Tomasz,

> You guessed right...it doesn't
> "show port stats all" always shows zeros.

As I said earlier, such errors are usually related to errors in boot 
parameters, but your kernel cmdline looks perfectly fine, so unless there's 
something really odd happening, I can't see the boot parameters being at fault.

Another (rather unlikely, but I'll still mention it just in case) reason could 
be that you're using a really old igb_uio module (I don't remember when 
igb_uio gained IOMMU support, probably 1.4.x), which is why I asked about the 
DPDK version, but since you're using 1.6.0 you should be fine.

Other than that, I'm afraid I can't think of any reasons why this could be 
happening. Did you try this on another board with the same OS? 

Best regards,
Anatoly Burakov
DPDK SW Engineer





[dpdk-dev] RTE Ring removing

2014-05-07 Thread Rogers, Gerald
Venky,

This also applies to mbuf pools.  Inside of the openvswitch.org patches we
allocate mbuf pools for a port, but we are unable to free them back when
the port is removed.

One other request (maybe it is there, and I'm unaware), is the ability to
dynamically add / remove a physical port to DPDK.  Basically we should be
able to reassign on the fly a port from the kernel to DPDK, and vice versa
(of course with the caveat that all structures be released in both
environments and a port reinitialized).

Gerald

On 5/7/14, 7:01 AM, "Venkatesan, Venky"  wrote:

>Olivier, 
>
>We should look at how to make the memseg capable of doing alloc/free
>(including re-assembly of fragments) after the 1.7 release. Is that
>something you are considering doing (or are there any other DPDKers
>considering this), or should I look at putting together a patch for that?
>
>Regards, 
>-Venky
>
>-Original Message-
>From: dev [mailto:dev-bounces at dpdk.org] On Behalf Of Olivier MATZ
>Sent: Wednesday, May 07, 2014 4:39 AM
>To: Igor Ryzhov; dev at dpdk.org
>Subject: Re: [dpdk-dev] RTE Ring removing
>
>Hi Igor,
>
>On 05/07/2014 09:54 AM, Igor Ryzhov wrote:
>> I noticed that in Memzone realization there is a special global
>> variable "free_memseg" containing pointers on free memory segments.
>> A memzone reserve function just finds the best segment for allocation
>> from this "free_memseg" variable.
>>
>> So I think there is a possibility to unreserve already reserved memory
>> back to "free_memseg", and impossibility of unreserving memory is just
>> because there is no function for that, not because it is impossible in
>> principle.
>> Am I right? Or there are any restrictions?
>
>I think that implementing a freeing of memory segment is feasible, but it
>would require some work to properly merge freed zones to avoid memory
>fragmentation.
>
>Another solution is to allocate/free rings in standard memory (malloc for
>instance) instead of rte_memzones. Let me know if the patches I've just
>sent on the mailing list solves your issue.
>
>By the way, I plan to do the same thing for mempools in the coming weeks
>but there is much more work.
>
>Regards,
>Olivier
>



[dpdk-dev] RTE Ring removing

2014-05-07 Thread Olivier MATZ
Hi Konstantin,

On 05/07/2014 05:19 PM, Ananyev, Konstantin wrote:
> Just to clarify about mempool - I suppose you are talking about
> ability to place internal ring and mempool metadata inside externally
> allocated memory?

Yes, exactly.

> It is already possible to keep mempool elements inside externally
> allocated memory (rte_mempool_xmem_create()).

You are right, but I think the current API is a bit too complex.
For instance the function rte_mempool_xmem_create() has 15 arguments,
which is probably too many.

Anyway, as soon as I have a patch to show, I'll send it to the
mailing list so we can discuss it.

Regards,
Olivier



[dpdk-dev] [PATCH 4/5] memzone: add iterator function

2014-05-07 Thread Stephen Hemminger
On Tue, 6 May 2014 09:17:46 +
"Burakov, Anatoly"  wrote:

> Hi Stephen,
> 
> > When writing diagnostic functions, it is useful to have the ability to
> > iterate over all memzones.
> > 
> 
> You can already access all memzones through 
> rte_eal_get_configuration()->mem_config.memzone[idx].
> 
> Best regards,
> Anatoly Burakov
> DPDK SW Engineer
> 

Yes, you can use memzone[idx] to look at individual zones, but there
is no thread-safe way to iterate.
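
[Editor's sketch] The point about thread safety can be illustrated with a walk function that holds a lock for the whole iteration and hands each entry to a callback. This is a minimal sketch only: `struct zone`, `struct zone_table`, `zone_walk()` and `sum_len()` are hypothetical stand-ins, not the iterator proposed in the patch and not the real memzone structures, and a plain C11 spinlock is assumed in place of whatever lock the EAL would use.

```c
#include <stdatomic.h>
#include <stddef.h>

#define MAX_ZONES 4   /* hypothetical table size for this sketch */

struct zone {             /* hypothetical stand-in for a memzone entry */
	const char *name;     /* NULL marks an unused slot */
	size_t len;           /* zone length in bytes */
};

struct zone_table {
	atomic_flag lock;     /* simple spinlock guarding the table */
	struct zone zones[MAX_ZONES];
};

typedef void (*zone_walk_fn)(const struct zone *z, void *arg);

/* Call fn on every used zone while holding the table lock. */
static void
zone_walk(struct zone_table *t, zone_walk_fn fn, void *arg)
{
	size_t i;

	while (atomic_flag_test_and_set_explicit(&t->lock,
						 memory_order_acquire))
		;   /* spin until the lock is free */
	for (i = 0; i < MAX_ZONES; i++) {
		if (t->zones[i].name != NULL)
			fn(&t->zones[i], arg);
	}
	atomic_flag_clear_explicit(&t->lock, memory_order_release);
}

/* Example callback: accumulate the total length of all used zones. */
static void
sum_len(const struct zone *z, void *arg)
{
	*(size_t *)arg += z->len;
}
```

Indexing memzone[idx] directly gives no such guarantee, because another thread may reserve a zone between two reads; a walk function keeps the lock handling in one place, provided that reservations take the same lock.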


[dpdk-dev] packet loss: multi-queue (RSS enabled)

2014-05-07 Thread Daniel Kaminsky
Hi Hamid,

I didn't see any attachment but I think there is a solution. My first
question is: when you created the mempool, how large did you make the
per-CPU memory cache? In the example program this number is fairly small
(32) and I think that increasing it to something much bigger (e.g. 512)
will significantly improve the CPU scalability.

Regards,
Daniel Kaminsky


On Wed, Apr 30, 2014 at 7:56 AM, Jayakumar, Muthurajan <
muthurajan.jayakumar at intel.com> wrote:

> Hi,
> Please find the attached paper http://kfall.net/ucbpage/papers/snc.pdf
> Figures 4 and 5 show the degradation when the number of queues is
> increased.
> It identifies 2 to 4 queues as the sweet spot.
>
> Have you verified this with a smaller number of queues?
>
> Thanks,
>
>
> -Original Message-
> From: dev [mailto:dev-bounces at dpdk.org] On Behalf Of Hamid Ramazani
> Sent: Tuesday, April 29, 2014 9:42 PM
> To: Thomas Monjalon; dev at dpdk.org
> Subject: [dpdk-dev] packet loss: multi-queue (RSS enabled)
>
> Hi,
> I tried a lot (more than a week) to solve the problem myself and not to
> bother the list, but I didn't succeed. Maybe other people have the same
> problem.
>
> I have a simple program attached. It is intended for simple packet
> capturing; captures from interface and writes to memory, and frees the
> memory in the next loop.
>
> I have a 10G 82599EB Intel SFI/SFP+ network interface for capturing
> packets.
> As you may know, this network card supports up to 128 RSS queues.
>
> This is just a test, so the packets are being sent at 820 Kpps (kilo packets
> per second). Each packet is 1500B (fixed size); it is 9.16 Gbit per second.
> Of course when the packet rate goes up and the packet size goes down
> (e.g. 400B per packet), it gets much worse.
>
> When using one queue to receive, I receive all the packets, with no loss.
> When I use more than one queue (e.g. 8 queues), with each thread running
> on a dedicated core, I have a considerable amount of loss.
>
> Please note that:
> 1. The computer has 12 * 2.67GHz cores, and it does nothing else but
> capturing packets. The CPU is Intel Xeon X5650.
> 2. The operating system is Ubuntu 12.04.3 LTS
>
> Attached file includes:
> main.h
> main.c
> Makefile
> ./run.sh
>
> It is configured to be run with 8 queues.
>
> If you want to change the number of receive queues, please:
> 1. in main.c, change the value assigned to nb_rx_q_of_dev to the desired
> value.
> 2. change the core mask in the run.sh file (since there is SKIP_MASTER, you
> should give a core mask containing one more CPU than the number of queues).
>
> I think there might be the following problems:
> 1. the port configuration is not right.
> 2. freeing memory has a considerable amount of overhead, and maybe I
> shouldn't do that. But if I don't, the pool will fill up, won't it? Is there
> any other way?
>
> Please help.
> Thanks a lot in advance for your help and comments.
>
> All the Best,
> Hamid
>
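
[Editor's sketch] The traffic figures quoted in the message above (820 Kpps of 1500 B packets) can be checked with a little arithmetic. The helpers below are hypothetical names written for this note. The line-rate estimate assumes that the 1500 B figure is the full Ethernet frame and adds 20 bytes per frame for preamble and inter-frame gap.

```c
#include <stdint.h>

/* Bit rate in bit/s for a given packet rate and packet size. */
static uint64_t
bit_rate(uint64_t pps, uint64_t pkt_bytes)
{
	return pps * pkt_bytes * 8;
}

/*
 * Maximum frames per second on an Ethernet link, counting 20 bytes of
 * per-frame overhead on the wire (8 bytes preamble + 12 bytes gap).
 */
static uint64_t
line_rate_pps(uint64_t link_bps, uint64_t frame_bytes)
{
	return link_bps / ((frame_bytes + 20) * 8);
}
```

bit_rate(820000, 1500) gives 9,840,000,000 bit/s, i.e. 9.84 Gbit/s with decimal prefixes, or about 9.16 when divided by 2^30, which matches the number in the message. Under the stated assumption, line_rate_pps(10000000000, 1500) is 822368, so the test traffic is very close to 10G line rate.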