[dpdk-dev] Surprisingly high TCP ACK packets drop counter

2013-11-05 Thread Alexander Belyakov
Hello,

On Mon, Nov 4, 2013 at 7:06 AM, Prashant Upadhyaya <
prashant.upadhyaya at aricent.com> wrote:

> Hi Alexander,
>
> Please confirm if the patch works for you.
>

Disabling RSC (DPDK 1.3) indeed brings ACK flood forwarding performance up to
14.5+ Mpps. No negative side effects have been discovered so far, but we're
still testing.
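For scale, here is a back-of-the-envelope line-rate figure. It assumes a single 10 GbE port, which the thread does not state. It also assumes minimum-size 64-byte frames, which is what payload-less ACKs become after padding. Each frame also costs 8 bytes of preamble/SFD and a 12-byte inter-frame gap on the wire:

```latex
\frac{10 \times 10^{9}\ \text{bit/s}}{(64 + 8 + 12)\ \text{byte} \times 8\ \text{bit/byte}}
  = \frac{10^{10}}{672}\ \text{frames/s} \approx 14.88 \times 10^{6}\ \text{frames/s}
```

Under those assumptions, 14.5 Mpps is roughly 97% of the theoretical maximum for minimum-size frames.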


>
> @Wang, are you saying that without the patch the NIC does not fan out the
> messages properly on all the receive queues?
> So what exactly happens?
>
>
The patch deals with RSC (receive side coalescing), not RSS (receive side
scaling).


> Regards
> -Prashant
>
>
> -----Original Message-----
> From: dev [mailto:dev-bounces at dpdk.org] On Behalf Of Alexander Belyakov
> Sent: Monday, November 04, 2013 1:51 AM
> To: Wang, Shawn
> Cc: dev at dpdk.org
> Subject: Re: [dpdk-dev] Surprisingly high TCP ACK packets drop counter
>
> Hi,
>
> Thanks for the patch and the explanation. We have tried DPDK 1.3 and 1.5;
> both have the same issue.
>
> Regards,
> Alexander
>
>


[dpdk-dev] Surprisingly high TCP ACK packets drop counter

2013-11-05 Thread Olivier MATZ
Hi,

 > Disabling RSC (DPDK 1.3) indeed brings ACK flood forwarding
 > performance up to 14.5+ Mpps. No negative side effects have been discovered
 > so far, but we're still testing.

The role of RSC is to reassemble input TCP segments, so it is possible
that the number of TCP packets sent to the DPDK is lower but some
packets may contain more data. Can you confirm that?

In my opinion, this mechanism should be disabled by default because it
could break PMTU discovery on a router. However it could be useful for
somebody doing TCP termination only.

Regards,
Olivier



[dpdk-dev] preallocation of void ** obj_p of rte_ring_dequeue

2013-11-05 Thread Jose Gavine Cueto
Hi,

When using *static int rte_ring_dequeue(struct rte_ring *r, void **obj_p)*,
is the user expected to allocate obj_p, or does this function allocate
obj_p?

Cheers,
Pepe

-- 
To stop learning is like to stop loving.


[dpdk-dev] preallocation of void ** obj_p of rte_ring_dequeue

2013-11-05 Thread Cyril Cressent
On Tue, Nov 05, 2013 at 06:15:01PM +0800, Jose Gavine Cueto wrote:
> 
> When using *static int rte_ring_dequeue(struct rte_ring *r, void **obj_p)*,
> is the user expected to allocate obj_p, or does this function allocate
> obj_p?

This function doesn't allocate anything; you have to allocate the object
you want to fill yourself.
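The following sketch illustrates the calling convention. The caller only provides storage for one pointer (a void * variable) and passes its address. Dequeue then writes into it the pointer that was enqueued earlier, and nothing is allocated. The code uses a hypothetical single-threaded toy_ring, not DPDK's rte_ring. It only mimics the "0 on success, negative on failure" return convention:

```c
#include <stddef.h>

/* Hypothetical single-threaded stand-in for a ring of pointers.
 * NOT DPDK's rte_ring: it only mimics the calling convention. */
#define TOY_RING_SIZE 4u

struct toy_ring {
    void *slots[TOY_RING_SIZE]; /* the ring stores pointers only */
    unsigned head;              /* next slot to write */
    unsigned tail;              /* next slot to read */
};

static int toy_ring_enqueue(struct toy_ring *r, void *obj)
{
    if (r->head - r->tail == TOY_RING_SIZE)
        return -1;                               /* full */
    r->slots[r->head % TOY_RING_SIZE] = obj;     /* copies the pointer, not the object */
    r->head++;
    return 0;
}

/* obj_p must point to caller-owned storage for ONE pointer.
 * Nothing is allocated: the stored pointer is written into *obj_p. */
static int toy_ring_dequeue(struct toy_ring *r, void **obj_p)
{
    if (r->head == r->tail)
        return -1;                               /* empty */
    *obj_p = r->slots[r->tail % TOY_RING_SIZE];
    r->tail++;
    return 0;
}
```

With the real API the same pattern is "void *msg; if (rte_ring_dequeue(ring, &msg) < 0) ...", as in the simple_mp code quoted further down this thread.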

You can find more details about how to work with rings there:

- DPDK Programmer's Guide (Chapter 5 - Ring Library)
- DPDK API Documentation for rte_ring.h

Both of which are accessible at http://dpdk.org/doc

Quite a few sample applications (in the examples/ directory of the DPDK
distribution) also make use of rings, notably quota_watermark, so they
are also a good place to look.

On a side note, it looks like the API reference page for rte_ring.h is
broken? It's missing a lot of functions. I'll look into it if I get a
chance.
http://dpdk.org/doc/api/rte__ring_8h.html#func-members

Cyril


[dpdk-dev] preallocation of void ** obj_p of rte_ring_dequeue

2013-11-05 Thread Jose Gavine Cueto
Thank you,

I've actually read the code & guide, but I wanted to make sure that what I
understood was correct.

Cheers,
Pepe




[dpdk-dev] Thread preemption and rte_ring

2013-11-05 Thread Dmitry Vyal
Hello,

Documentation for rte_ring says: the ring implementation is not 
preemptable. A lcore must not be interrupted by another task that uses 
the same ring. What does it precisely mean? Must all the producers and 
consumers be non-preemptive? Can we relax that restriction somehow? Say, 
can I have multiple non-preemptive writers running on dedicated cores 
and a single reader running as a regular Linux thread?

Thanks,
Dmitry


[dpdk-dev] [PATCH] doc: fix doxygen parsing of __attribute__

2013-11-05 Thread Thomas Monjalon
Ignore __attribute__ because it was wrongly parsed as an identifier.

Signed-off-by: Thomas Monjalon 
---
 doc/doxy-api.conf |    4 ++++
 1 file changed, 4 insertions(+)

diff --git a/doc/doxy-api.conf b/doc/doxy-api.conf
index 7ea692a..749db78 100644
--- a/doc/doxy-api.conf
+++ b/doc/doxy-api.conf
@@ -47,9 +47,13 @@ INPUT   = doc/doxy-api-index.md \
 FILE_PATTERNS   = rte_*.h \
   cmdline.h
 PREDEFINED  = __DOXYGEN__ \
+  __attribute__(x)= \
   RTE_MBUF_SCATTER_GATHER

 OPTIMIZE_OUTPUT_FOR_C   = YES
+ENABLE_PREPROCESSING= YES
+MACRO_EXPANSION = YES
+EXPAND_ONLY_PREDEF  = YES
 EXTRACT_STATIC  = YES
 HIDE_UNDOC_MEMBERS  = YES
 HIDE_UNDOC_CLASSES  = YES
-- 
1.7.10.4



[dpdk-dev] preallocation of void ** obj_p of rte_ring_dequeue

2013-11-05 Thread Thomas Monjalon
05/11/2013 11:33, Cyril Cressent :
> On a side note, it looks like the API reference page for rte_ring.h is
> broken? It's missing a lot of functions. I'll look into it if I get a
> chance.
> http://dpdk.org/doc/api/rte__ring_8h.html#func-members

It is fixed by the patch I just sent. Thanks for reporting.
-- 
Thomas


[dpdk-dev] Surprisingly high TCP ACK packets drop counter

2013-11-05 Thread Alexander Belyakov
Hello,

> The role of RSC is to reassemble input TCP segments, so it is possible
> that the number of TCP packets sent to the DPDK is lower but some
> packets may contain more data. Can you confirm that?
>
>
I don't think our test case can answer your question, because all generated
TCP ACK packets were as small as possible (no TCP payload at all). Source
IPs and ports were picked at random for each packet, so most adjacent
packets belong to different TCP sessions.
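The following is a deliberately simplified model of why this traffic gives RSC little to merge. Segments are only combined when they belong to the same flow, carry payload, and continue each other's byte stream. This is a hypothetical illustration, not how the 82599 implements RSC. The real hardware has further conditions and limits:

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, highly simplified model of receive side coalescing. */
struct seg {
    uint32_t src_ip, dst_ip;
    uint16_t src_port, dst_port;
    uint32_t seq;   /* TCP sequence number of the first payload byte */
    uint32_t len;   /* TCP payload length in bytes */
};

static int same_flow(const struct seg *a, const struct seg *b)
{
    return a->src_ip == b->src_ip && a->dst_ip == b->dst_ip &&
           a->src_port == b->src_port && a->dst_port == b->dst_port;
}

/* Merge a segment into the previous output packet when it is the same
 * flow, both carry payload, and the byte stream continues exactly.
 * Returns the number of packets that would reach the application. */
static size_t coalesce(const struct seg *in, size_t n)
{
    size_t out = 0;
    struct seg cur = {0};
    for (size_t i = 0; i < n; i++) {
        if (out > 0 && in[i].len > 0 && cur.len > 0 &&
            same_flow(&cur, &in[i]) && cur.seq + cur.len == in[i].seq) {
            cur.len += in[i].len;   /* grow the aggregated packet */
            continue;
        }
        cur = in[i];                /* start a new output packet */
        out++;
    }
    return out;
}
```

With payload-less ACKs from random 4-tuples, every segment starts a new output packet. The packet count therefore stays the same with or without coalescing.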


> In my opinion, this mechanism should be disabled by default because it
> could break PMTU discovery on a router. However it could be useful for
> somebody doing TCP termination only.
>
>
I was thinking about a new rte_eth_rxmode structure option:

@@ -280,6 +280,7 @@ struct rte_eth_rxmode {
         hw_vlan_strip    : 1, /**< VLAN strip enable. */
         hw_vlan_extend   : 1, /**< Extended VLAN enable. */
         jumbo_frame      : 1, /**< Jumbo Frame Receipt enable. */
+        disable_rsc      : 1, /**< Disable RSC (receive side coalescing). */
         hw_strip_crc     : 1; /**< Enable CRC stripping by hardware. */
  };


Regards,
Alexander


[dpdk-dev] pci_unbind.py failure

2013-11-05 Thread Jyotiswarup Raiturkar
Hello

I'm trying to install DPDK on my laptop.

I have an 82579LM NIC, which I'm trying to bind to the igb_uio driver. I get
the following error:

# ./tools/pci_unbind.py --status

Network devices using IGB_UIO driver



Network devices using kernel driver
===
:00:19.0 '82579LM Gigabit Network Connection' if=eth0 drv=e1000e
unused=

Other network devices
=



# ./tools/pci_unbind.py --force --bind=igb_uio 00:19.0
Error: bind failed for :00:19.0 - Cannot bind to driver igb_uio
Error: unbind failed for :00:19.0 - Cannot open
/sys/bus/pci/drivers//unbind

After this, the --status shows this :

# ./tools/pci_unbind.py --status

Network devices using IGB_UIO driver



Network devices using kernel driver
===


Other network devices
=
:00:19.0 '82579LM Gigabit Network Connection' unused=e1000e


My kernel version is 3.5.0-23-generic (Ubuntu 12.04.2 LTS). I heard about
a UIO bug in 3.10; is this the same bug I'm hitting?

Thanks
Jyoti


[dpdk-dev] pci_unbind.py failure

2013-11-05 Thread Cyril Cressent
On Tue, Nov 05, 2013 at 05:41:17PM +0530, Jyotiswarup Raiturkar wrote:
> 
> I have an 82579LM NIC, which I'm trying to bind to the igb_uio driver. I get
> the following error:

I can't find the 82579LM listed on
http://dpdk.org/doc/nics
or in
lib/librte_eal/common/include/rte_pci_dev_ids.h

My guess is that your NIC is not supported.

> My kernel version is 3.5.0-23-generic (Ubuntu 12.04.2 LTS). I heard about
> a UIO bug in 3.10; is this the same bug I'm hitting?

No, the bug you mention prevented the correct registration of the second
and subsequent ports bound to igb_uio. The first port bound to igb_uio
was working fine.

Cyril


[dpdk-dev] pci_unbind.py failure

2013-11-05 Thread Jyotiswarup Raiturkar
Thanks for the quick reply. I saw some definitions of e1000_phy_82579, hence
I thought (hoped) the NIC would be supported. I will try to run my DPDK app
inside a VM with an emulated e1000 NIC (just to test the code).

Thanks







[dpdk-dev] Surprisingly high TCP ACK packets drop counter

2013-11-05 Thread Prashant Upadhyaya
Hi Alexander,

I am wondering the same thing as Olivier. Yours is a nice test case and setup, hence
I am requesting the information below instead of spending a lot of time reinventing
the test case at my end.
If you have the time, it would be interesting to know the number of packets per
second received inside your application on each of your 4 queues individually,
in both use cases: with and without RSC.

Since your throughput drops by almost exactly 50%, I am wondering whether your
randomization of packets is not random enough, so that with RSC enabled the
packets arrive on only two queues, or the distribution is otherwise uneven.
Or it may well be that the NIC gets overwhelmed by RSC processing, and that brings
down the throughput.

Either way, it would be very interesting to get per-queue packets-per-second
statistics in both use cases.
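The following is one way to produce the requested numbers, sketched under assumptions. The application keeps its own per-queue packet counters, for example by summing what rte_eth_rx_burst() returns for each queue. It samples them twice and derives packets per second plus a simple imbalance ratio. The helper names are hypothetical, and the code itself does not call DPDK:

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: packets/s per queue from two counter snapshots
 * taken interval_s seconds apart. */
static void per_queue_pps(const uint64_t *before, const uint64_t *after,
                          size_t nb_queues, double interval_s, double *pps)
{
    for (size_t q = 0; q < nb_queues; q++)
        pps[q] = (double)(after[q] - before[q]) / interval_s;
}

/* Busiest queue divided by the average queue: 1.0 is a perfectly even
 * spread; with 4 queues and traffic on only 2 of them it is 2.0. */
static double imbalance(const double *pps, size_t nb_queues)
{
    double total = 0.0, max = 0.0;
    for (size_t q = 0; q < nb_queues; q++) {
        total += pps[q];
        if (pps[q] > max)
            max = pps[q];
    }
    return total > 0.0 ? max / (total / (double)nb_queues) : 0.0;
}
```

An imbalance near 2.0 on 4 queues would be consistent with the "only two queues" hypothesis. A value near 1.0 would point elsewhere, for example at RSC processing cost in the NIC.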

Regards
-Prashant






===
Please refer to http://www.aricent.com/legal/email_disclaimer.html
for important disclosures regarding this electronic communication.
===


[dpdk-dev] Thread preemption and rte_ring

2013-11-05 Thread Olivier MATZ
Hello Dmitry,

 > Documentation for rte_ring says: the ring implementation is not
 > preemptable. A lcore must not be interrupted by another task that uses
 > the same ring. What does it precisely mean? Must all the producers and
 > consumers be non-preemptive?

The "non-preemptive" constraint means:

- a pthread doing multi-producer enqueues on a given ring must not
   be preempted by another pthread doing a multi-producer enqueue on
   the same ring.
- a pthread doing multi-consumer dequeues on a given ring must not
   be preempted by another pthread doing a multi-consumer dequeue on
   the same ring.

Bypassing this constraint may cause the second pthread to spin until the
first one is scheduled again.
Moreover, if the first pthread is preempted by a context that has a
higher priority (for instance a kernel thread), it can even cause
a deadlock.

 > Can we relax that restriction somehow? Say,
 > can I have multiple non-preemptive writers running on dedicated cores
 > and a single reader running as a regular Linux thread?

Yes, this should work.

Regards,
Olivier



[dpdk-dev] Unable to compile DPDK 1.5 on Debian GNU/Linux: lib/librte_eal/linuxapp/igb_uio

2013-11-05 Thread Cyril Cressent
Hi Marc,

On Mon, Nov 04, 2013 at 09:53:29PM +0100, Marc Sune wrote:
> 
> I think it is not this variable. When the folder
> /lib/modules/$(shell uname -r)/build does not exist, the Makefile
> properly warns you (I manually created it, since it was not existing
> during the first compilation attempt).

build should be a symlink to the corresponding kernel sources or
headers, usually in /usr/src/.

> marc at bisdn-dev:~/BISDN/dpdk$ ls /lib/modules/`uname -r`/build
> marc at bisdn-dev:~/BISDN/dpdk$ ls /lib/modules/`uname -r`/
> build   modules.alias  modules.builtin  modules.dep
> modules.devname  modules.softdep  modules.symbols.bin
> kernel  modules.alias.bin  modules.builtin.bin  modules.dep.bin
> modules.ordermodules.symbols  source
> marc at bisdn-dev:~/BISDN/dpdk$ ls /lib/modules/`uname -r`/build -la
> total 8
> drwxr-xr-x 2 root root 4096 jul 31 16:41 .
> drwxr-xr-x 4 root root 4096 nov  4 16:43 ..

That output shows that "build" is not a symlink to the kernel
sources/headers. Make it a symlink to /usr/src/linux-headers-`uname -r`,
and double-check that you have the kernel headers there.

> Concerning kernel headers, the kernel headers for the running kernel
> were already installed (via apt-get install linux-headers-`uname
> -r`), and no custom kernel is installed in the system.

That's weird; the symlink should have been created properly if you
used apt.

> Actually, this seems to me more of a variable definition problem,
> like the $(wildcard $(RTE_KERNELDIR)) but somehow related to the
> DPDK target folders, rather than an issue with the headers/gcc,
> since it is 'make' which is not able to find the existing file. But
> I could be wrong..

Yes, the target folder is /lib/modules/`uname -r`/build, which should contain
a Makefile. Yours is empty because it's not the expected symlink, so
make complains that there is no Makefile there.


[dpdk-dev] Unable to compile DPDK 1.5 on Debian GNU/Linux: lib/librte_eal/linuxapp/igb_uio

2013-11-05 Thread Marc Sune
Dear Thomas,

Thank you, that really was the problem. I am still puzzled why it
happened, since the headers were installed before. I will update the rest
of the installations.

It would probably be slightly better to check whether the build folder
contains the Makefiles and scripts that are needed, and to print an explicit
error if it does not; otherwise the output of make is misleading.
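Marc's suggestion is to give a clearer error when the build directory is not a usable kernel tree. It could look like the following pre-flight check. This is a sketch assuming a Linux/POSIX system (uname(2), access(2)). The helper names are hypothetical, and this is not DPDK's own check:

```c
#define _POSIX_C_SOURCE 200809L
#include <stdio.h>
#include <sys/utsname.h>
#include <unistd.h>

/* Returns 1 if <dir>/Makefile is readable, 0 otherwise. */
static int has_makefile(const char *dir)
{
    char path[4096];
    int n = snprintf(path, sizeof(path), "%s/Makefile", dir);
    if (n < 0 || (size_t)n >= sizeof(path))
        return 0;                       /* path too long */
    return access(path, R_OK) == 0;
}

/* Writes "/lib/modules/<release>/build" for the running kernel into buf,
 * i.e. the same directory as /lib/modules/`uname -r`/build.
 * Returns 0 on success, -1 on error or truncation. */
static int kernel_build_dir(char *buf, size_t len)
{
    struct utsname u;
    if (uname(&u) != 0)
        return -1;
    int n = snprintf(buf, len, "/lib/modules/%s/build", u.release);
    return (n < 0 || (size_t)n >= len) ? -1 : 0;
}
```

A tool could then report something like "kernel build directory has no Makefile; is it a symlink to the kernel headers?" when has_makefile() returns 0 for the path from kernel_build_dir(). That is more informative than make's own message in the empty-directory case shown above.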

As I said, thank you, and regards,
marc




[dpdk-dev] pci_unbind.py failure

2013-11-05 Thread Cyril Cressent
On Tue, Nov 05, 2013 at 08:01:06PM +0530, Jyotiswarup Raiturkar wrote:

> Thanks for the quick reply. I saw some definitions of e1000_phy_82579, hence
> I thought (hoped) the NIC would be supported. I will try to run my DPDK app
> inside a VM with an emulated e1000 NIC (just to test the code).

As a general rule, even if you find references to a NIC in the poll mode
drivers, if it's not listed in lib/librte_eal/common/include/rte_pci_dev_ids.h
then consider the NIC as not supported.
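Cyril's rule of thumb amounts to a vendor/device ID lookup. In this minimal sketch, you parse the hex IDs the kernel exposes in sysfs (the vendor and device files under /sys/bus/pci/devices/<address>/) and look them up in a table. The table used in the test is hypothetical and illustrative only. The authoritative list is the one in rte_pci_dev_ids.h for the DPDK version in use:

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

struct pci_id {
    uint16_t vendor;
    uint16_t device;
};

/* Parse a sysfs-style hex ID such as "0x8086\n". Returns -1 on error. */
static long parse_sysfs_id(const char *s)
{
    char *end;
    unsigned long v = strtoul(s, &end, 16);   /* accepts the 0x prefix */
    if (end == s || v > 0xFFFFul)
        return -1;
    return (long)v;
}

/* Returns 1 if (vendor, device) appears in the table, 0 otherwise. */
static int id_in_table(const struct pci_id *tbl, size_t n,
                       uint16_t vendor, uint16_t device)
{
    for (size_t i = 0; i < n; i++)
        if (tbl[i].vendor == vendor && tbl[i].device == device)
            return 1;
    return 0;
}
```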

Good luck with the VM,

Cyril


[dpdk-dev] Surprisingly high TCP ACK packets drop counter

2013-11-05 Thread Wang, Shawn
My test is almost the same as Alexander's, but we only use one RX queue.




[dpdk-dev] preallocation of void ** obj_p of rte_ring_dequeue

2013-11-05 Thread Cyril Cressent
On Wed, Nov 06, 2013 at 12:47:13AM +0800, Jose Gavine Cueto wrote:

> You're welcome. By the way, the multi-process example simple_mp seems
> confusing here:
> 
> static int
> lcore_recv(__attribute__((unused)) void *arg)
> {
>     unsigned lcore_id = rte_lcore_id();
>
>     printf("Starting core %u\n", lcore_id);
>     while (!quit) {
>         void *msg;
>         if (rte_ring_dequeue(recv_ring, &msg) < 0) {
>             usleep(5);
>             continue;
>         }
>         printf("core %u: Received '%s'\n", lcore_id, (char *)msg);
>         rte_mempool_put(message_pool, msg);
>     }
>
>     return 0;
> }
> 
> It seems that it isn't allocating msg here, or maybe I'm just missing 
> something

I understand your question better now, and in that light I think my
previous answer was confusing. Let me try to clarify:

A ring only holds *pointers* to objects. You enqueue pointers, and
dequeue those pointers later, somewhere else, usually in another thread.
The allocation/deallocation of the actual objects is not the concern of
the ring and its enqueue/dequeue operations.

If we take the simple_mp example, the msg dequeued by the lcore_recv()
thread is created in mp_command.c, and a pointer to that message is
enqueued on "send_ring". If you look carefully at how the rings are created,
you'll understand how "send_ring" and "recv_ring" relate to each other.
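The ownership split described above can be sketched with hypothetical single-threaded stand-ins for a mempool and a ring; toy_pool and toy_ring are not DPDK types. The pool owns the buffers. The sender gets a buffer, fills it, and enqueues only its address. The receiver dequeues the address into a local void * variable, uses the data, and returns the buffer to the pool. That last step is what lcore_recv() does with rte_mempool_put():

```c
#include <stddef.h>
#include <stdio.h>

#define NB_MSG   4
#define MSG_SIZE 32
#define RING_SZ  4u

/* Hypothetical stand-ins: the pool owns buffers, the ring carries pointers. */
struct toy_pool { char bufs[NB_MSG][MSG_SIZE]; void *free_list[NB_MSG]; int nb_free; };
struct toy_ring { void *slots[RING_SZ]; unsigned head, tail; };

static void pool_init(struct toy_pool *p)
{
    for (int i = 0; i < NB_MSG; i++)
        p->free_list[i] = p->bufs[i];
    p->nb_free = NB_MSG;
}

static int pool_get(struct toy_pool *p, void **obj_p)   /* cf. rte_mempool_get() */
{
    if (p->nb_free == 0)
        return -1;
    *obj_p = p->free_list[--p->nb_free];
    return 0;
}

static void pool_put(struct toy_pool *p, void *obj)     /* cf. rte_mempool_put() */
{
    p->free_list[p->nb_free++] = obj;
}

static int ring_enqueue(struct toy_ring *r, void *obj)
{
    if (r->head - r->tail == RING_SZ)
        return -1;                                       /* full */
    r->slots[r->head++ % RING_SZ] = obj;
    return 0;
}

static int ring_dequeue(struct toy_ring *r, void **obj_p)
{
    if (r->head == r->tail)
        return -1;                                       /* empty */
    *obj_p = r->slots[r->tail++ % RING_SZ];
    return 0;
}

/* Sender side: take a buffer from the pool, fill it, enqueue its address. */
static int send_msg(struct toy_pool *p, struct toy_ring *r, const char *text)
{
    void *msg;
    if (pool_get(p, &msg) < 0)
        return -1;
    snprintf(msg, MSG_SIZE, "%s", text);
    if (ring_enqueue(r, msg) < 0) {
        pool_put(p, msg);                                /* don't leak the buffer */
        return -1;
    }
    return 0;
}

/* Receiver side: dequeue the pointer, copy the text out, free the buffer. */
static int recv_msg(struct toy_pool *p, struct toy_ring *r, char *out, size_t len)
{
    void *msg;                    /* storage for ONE pointer, nothing else */
    if (ring_dequeue(r, &msg) < 0)
        return -1;
    snprintf(out, len, "%s", (char *)msg);
    pool_put(p, msg);
    return 0;
}
```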

I hope this is a bit clearer,

Cyril