[dpdk-dev] Surprisingly high TCP ACK packets drop counter
Hello,

On Mon, Nov 4, 2013 at 7:06 AM, Prashant Upadhyaya <prashant.upadhyaya at aricent.com> wrote:
> Hi Alexander,
>
> Please confirm if the patch works for you.

Disabling RSC (DPDK 1.3) indeed brings ACK flood forwarding performance to 14.5+ Mpps. No negative side effects have been discovered so far, but we're still testing.

> @Wang, are you saying that without the patch the NIC does not fan out the
> messages properly on all the receive queues?
> So what exactly happens?

The patch deals with RSC (receive side coalescing), not RSS (receive side scaling).

> Regards
> -Prashant
>
> -----Original Message-----
> From: dev [mailto:dev-bounces at dpdk.org] On Behalf Of Alexander Belyakov
> Sent: Monday, November 04, 2013 1:51 AM
> To: Wang, Shawn
> Cc: dev at dpdk.org
> Subject: Re: [dpdk-dev] Surprisingly high TCP ACK packets drop counter
>
> Hi,
>
> thanks for the patch and explanation. We have tried DPDK 1.3 and 1.5 -
> both have the same issue.
>
> Regards,
> Alexander
[dpdk-dev] Surprisingly high TCP ACK packets drop counter
Hi,

> Disabling RSC (DPDK 1.3) indeed brings ACK flood forwarding
> performance to 14.5+ Mpps. No negative side effects were discovered
> so far, but we're still testing.

The role of RSC is to reassemble input TCP segments, so it is possible that fewer TCP packets are delivered to the DPDK while some of them carry more data. Can you confirm that?

In my opinion, this mechanism should be disabled by default because it could break PMTU discovery on a router. However, it could be useful for somebody doing TCP termination only.

Regards,
Olivier
[dpdk-dev] preallocation of void ** obj_p of rte_ring_dequeue
Hi,

When using *static int rte_ring_dequeue(struct rte_ring *r, void **obj_p)*, is the user expected to allocate obj_p, or does this function allocate it?

Cheers,
Pepe

--
To stop learning is like to stop loving.
[dpdk-dev] preallocation of void ** obj_p of rte_ring_dequeue
On Tue, Nov 05, 2013 at 06:15:01PM +0800, Jose Gavine Cueto wrote:
>
> When using *static int rte_ring_dequeue(struct rte_ring *r, void **
> obj_p)*, is the user expected to allocate obj_p, or does this function
> allocate it?

This function doesn't allocate anything; you have to allocate the object you want to fill yourself.

You can find more details about how to work with rings here:

- DPDK Programmer's Guide (Chapter 5 - Ring Library)
- DPDK API Documentation for rte_ring.h

Both are accessible at http://dpdk.org/doc

Quite a few sample applications (examples/ directory in the DPDK distribution) also make use of rings, notably the quota_watermark one, so that is also a good place to look.

On a side note, it looks like the API reference page for rte_ring.h is broken? It's missing a lot of functions. I'll look into it if I get a chance.
http://dpdk.org/doc/api/rte__ring_8h.html#func-members

Cyril
[dpdk-dev] preallocation of void ** obj_p of rte_ring_dequeue
Thank you, I've actually read the code & guide, but I wanted to make sure that what I understood was correct.

Cheers,
Pepe

--
To stop learning is like to stop loving.
[dpdk-dev] Thread preemption and rte_ring
Hello,

Documentation for rte_ring says:

    the ring implementation is not preemptable. A lcore must not be
    interrupted by another task that uses the same ring.

What precisely does it mean? Must all the producers and consumers be non-preemptive? Can we relax that restriction somehow? Say, can I have multiple non-preemptive writers running on dedicated cores and a single reader running as a regular Linux thread?

Thanks,
Dmitry
[dpdk-dev] [PATCH] doc: fix doxygen parsing of __attribute__
Ignore __attribute__ because it was wrongly parsed as an identifier.

Signed-off-by: Thomas Monjalon
---
 doc/doxy-api.conf |    4 ++++
 1 file changed, 4 insertions(+)

diff --git a/doc/doxy-api.conf b/doc/doxy-api.conf
index 7ea692a..749db78 100644
--- a/doc/doxy-api.conf
+++ b/doc/doxy-api.conf
@@ -47,9 +47,13 @@ INPUT = doc/doxy-api-index.md \
 FILE_PATTERNS = rte_*.h \
                 cmdline.h
 PREDEFINED = __DOXYGEN__ \
+             __attribute__(x)= \
              RTE_MBUF_SCATTER_GATHER
 OPTIMIZE_OUTPUT_FOR_C = YES
+ENABLE_PREPROCESSING= YES
+MACRO_EXPANSION = YES
+EXPAND_ONLY_PREDEF = YES
 EXTRACT_STATIC = YES
 HIDE_UNDOC_MEMBERS = YES
 HIDE_UNDOC_CLASSES = YES
--
1.7.10.4
[dpdk-dev] preallocation of void ** obj_p of rte_ring_dequeue
05/11/2013 11:33, Cyril Cressent :
> On a side note, it looks like the API reference page for rte_ring.h is
> broken? It's missing a lot of functions. I'll look into it if I get a
> chance.
> http://dpdk.org/doc/api/rte__ring_8h.html#func-members

It is fixed by the patch I just sent.
Thanks for reporting.

--
Thomas
[dpdk-dev] Surprisingly high TCP ACK packets drop counter
Hello,

> The role of RSC is to reassemble input TCP segments, so it is possible
> that the number of TCP packets sent to the DPDK is lower but some
> packets may contain more data. Can you confirm that?

I don't think our test case can answer your question, because all generated TCP ACK packets were as small as possible (no TCP payload at all). Source IPs and ports were picked at random for each packet, so most adjacent packets belong to different TCP sessions.

> In my opinion, this mechanism should be disabled by default because it
> could break PMTU discovery on a router. However it could be useful for
> somebody doing TCP termination only.

I was thinking about a new rte_eth_rxmode structure option:

@@ -280,6 +280,7 @@ struct rte_eth_rxmode {
 		hw_vlan_strip    : 1, /**< VLAN strip enable. */
 		hw_vlan_extend   : 1, /**< Extended VLAN enable. */
 		jumbo_frame      : 1, /**< Jumbo Frame Receipt enable. */
+		disable_rsc      : 1, /**< Disable RSC (receive side coalescing). */
 		hw_strip_crc     : 1; /**< Enable CRC stripping by hardware. */
 };

Regards,
Alexander
[dpdk-dev] pci_unbind.py failure
Hello,

I'm trying to install DPDK on my laptop. I have an 82579LM NIC, which I'm trying to bind to the igb_uio driver. I get the following error:

# ./tools/pci_unbind.py --status
Network devices using IGB_UIO driver
Network devices using kernel driver
===
:00:19.0 '82579LM Gigabit Network Connection' if=eth0 drv=e1000e unused=
Other network devices
=

# ./tools/pci_unbind.py --force --bind=igb_uio 00:19.0
Error: bind failed for :00:19.0 - Cannot bind to driver igb_uio
Error: unbind failed for :00:19.0 - Cannot open /sys/bus/pci/drivers//unbind

After this, --status shows:

# ./tools/pci_unbind.py --status
Network devices using IGB_UIO driver
Network devices using kernel driver
===
Other network devices
=
:00:19.0 '82579LM Gigabit Network Connection' unused=e1000e

My kernel version is 3.5.0-23-generic (Ubuntu 12.04.2 LTS). I heard about a UIO bug in 3.10; is this the same bug I'm hitting?

Thanks,
Jyoti
[dpdk-dev] pci_unbind.py failure
On Tue, Nov 05, 2013 at 05:41:17PM +0530, Jyotiswarup Raiturkar wrote:
>
> I have an 82579LM NIC, which I'm trying to bind to the igb_uio driver. I get
> the following error

I can't find the 82579LM listed on
http://dpdk.org/doc/nics
or in
lib/librte_eal/common/include/rte_pci_dev_ids.h

My guess is that your NIC is not supported.

> My kernel version is 3.5.0-23-generic (Ubuntu 12.04.2 LTS). I heard about
> a UIO bug in 3.10; is this the same bug I'm hitting?

No, the bug you mention prevented the correct registration of the second and subsequent ports bound to igb_uio. The first port bound to igb_uio was working fine.

Cyril
[dpdk-dev] pci_unbind.py failure
Thanks for the quick reply. I saw some definitions of e1000_phy_82579, hence I thought (hoped) the NIC would be supported. I will try to run my DPDK app inside a VM with an emulated e1000 NIC (just to test the code).

Thanks
[dpdk-dev] Surprisingly high TCP ACK packets drop counter
Hi Alexander,

Like Olivier, I am also wondering. Yours is a nice test case and setup, hence I am requesting the information below instead of spending a lot of time reinventing the test case at my end.

If you have the time on your side, it would be interesting to know the number of packets per second received inside your application on each of your 4 queues individually in both use cases, with and without RSC.

Since your throughput goes down almost exactly 50%, I am wondering whether your apparent randomization of packets is really random enough, and whether with RSC enabled the packets arrive on two queues only or with an uneven distribution. Or it may well be that the NIC gets overwhelmed with RSC processing and that brings down the throughput.

Either way, it would be very interesting to get stats for packets per second on each queue in both use cases.

Regards
-Prashant
[dpdk-dev] Thread preemption and rte_ring
Hello Dmitry,

> Documentation for rte_ring says: the ring implementation is not
> preemptable. A lcore must not be interrupted by another task that uses
> the same ring. What does it precisely mean? Must all the producers and
> consumers be non-preemptive?

The "non-preemptive" constraint means:

- a pthread doing multi-producer enqueues on a given ring must not be
  preempted by another pthread doing a multi-producer enqueue on the
  same ring.
- a pthread doing multi-consumer dequeues on a given ring must not be
  preempted by another pthread doing a multi-consumer dequeue on the
  same ring.

Violating these constraints may cause the 2nd pthread to spin until the 1st one is scheduled again. Moreover, if the 1st pthread is preempted by a context that has a higher priority (for instance a kernel thread), it can even cause a deadlock.

> Can we relax that restriction somehow? Say,
> can I have multiple non-preemptive writers running on dedicated cores
> and a single reader running as a regular Linux thread?

Yes, this should work.

Regards,
Olivier
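The spinning described in this reply can be illustrated with a simplified, single-threaded model of a multi-producer enqueue. This is only a sketch, not the rte_ring code: the names model_ring, mp_reserve and mp_try_commit are made up for illustration. It assumes the usual scheme in which a producer first reserves slots by advancing a shared producer head, and later publishes them by advancing the producer tail, which it may only do once every earlier reservation has been published.

```c
#include <stdbool.h>

/* Hypothetical model of the producer side of a multi-producer ring. */
struct model_ring {
    unsigned prod_head; /* slots reserved so far */
    unsigned prod_tail; /* slots published to consumers so far */
};

/* Step 1: reserve n slots and return the start of the reservation.
 * (A real lock-free ring would do this with an atomic compare-and-swap.) */
static unsigned mp_reserve(struct model_ring *r, unsigned n)
{
    unsigned start = r->prod_head;

    r->prod_head = start + n;
    return start;
}

/* Step 2: publish a reservation. This succeeds only once every earlier
 * reservation has been published; a real producer would spin until then. */
static bool mp_try_commit(struct model_ring *r, unsigned start, unsigned n)
{
    if (r->prod_tail != start)
        return false; /* an earlier producer has not committed yet */
    r->prod_tail = start + n;
    return true;
}
```

If producer A reserves a slot and is then preempted, producer B can reserve the next slot, but its mp_try_commit() keeps returning false until A runs again and commits. If B has a higher priority than A on the same core, A may never run, which is the deadlock case mentioned above.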
[dpdk-dev] Unable to compile DPDK 1.5 on Debian GNU/Linux: lib/librte_eal/linuxapp/igb_uio
Hi Marc,

On Mon, Nov 04, 2013 at 09:53:29PM +0100, Marc Sune wrote:
>
> I think it is not this variable. When the folder
> /lib/modules/$(shell uname -r)/build does not exist, the Makefile
> properly warns you (I manually created it, since it was not existing
> during the first compilation attempt).

build should be a symlink to the corresponding kernel sources or headers, usually in /usr/src/.

> marc at bisdn-dev:~/BISDN/dpdk$ ls /lib/modules/`uname -r`/build
> marc at bisdn-dev:~/BISDN/dpdk$ ls /lib/modules/`uname -r`/
> build modules.alias modules.builtin modules.dep
> modules.devname modules.softdep modules.symbols.bin
> kernel modules.alias.bin modules.builtin.bin modules.dep.bin
> modules.order modules.symbols source
> marc at bisdn-dev:~/BISDN/dpdk$ ls /lib/modules/`uname -r`/build -la
> total 8
> drwxr-xr-x 2 root root 4096 jul 31 16:41 .
> drwxr-xr-x 4 root root 4096 nov 4 16:43 ..

That output shows that "build" is not a symlink to the kernel sources/headers. Make it a symlink to /usr/src/linux-headers-`uname -r`. And double check you have the kernel headers there...

> Concerning kernel headers, the kernel headers for the running kernel
> were already installed (via apt-get install linux-headers-`uname
> -r`), and no custom kernel is installed in the system.

That's weird; you should have had the symlink properly created if you used apt...

> Actually, this seems to me more of a variable definition problem,
> like the $(wildcard $(RTE_KERNELDIR)) but somehow related to the
> DPDK target folders, rather than an issue with the headers/gcc,
> since it is 'make' which is not able to find the existing file. But
> I could be wrong..

Yes, the target folder is /lib/modules/`uname -r`/build, which normally contains a Makefile. Yours is empty because it's not the expected symlink, and make then complains because there is no Makefile there.
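The distinction made above, a real (empty) directory versus a symlink to the kernel headers, can also be checked programmatically. The following is a minimal sketch assuming a POSIX system; path_kind is a hypothetical helper and not part of the DPDK build system.

```c
#include <sys/stat.h>

/* Hypothetical helper: classify a path without following symlinks.
 * Returns 'l' for a symlink, 'd' for a real directory, 'o' for anything
 * else, and '?' if the path cannot be examined (e.g. it does not exist). */
static char path_kind(const char *path)
{
    struct stat st;

    if (lstat(path, &st) != 0)
        return '?';
    if (S_ISLNK(st.st_mode))
        return 'l';
    if (S_ISDIR(st.st_mode))
        return 'd';
    return 'o';
}
```

Called on the build path under /lib/modules for the running kernel, a result of 'd' would correspond to the situation in this thread (a manually created directory), while 'l' is what a correctly installed headers package is expected to give.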
[dpdk-dev] Unable to compile DPDK 1.5 on Debian GNU/Linux: lib/librte_eal/linuxapp/igb_uio
Dear Thomas,

Thank you, that really was the problem. I am still puzzled why it happened, since the headers were installed before. I will update the rest of the installations.

It would probably be slightly better to check whether the build folder contains the Makefiles and scripts needed, and to print a proper error if it does not; otherwise the output of make is misleading.

As I said, thank you and regards
marc
[dpdk-dev] pci_unbind.py failure
On Tue, Nov 05, 2013 at 08:01:06PM +0530, Jyotiswarup Raiturkar wrote:
> Thanks for the quick reply. I saw some definitions of e1000_phy_82579 hence
> I thought (hoped) the NIC would be supported. I will try to run my dpdk app
> inside a VM with an emulated e1000 NIC (just to test the code ..).

As a general rule, even if you find references to a NIC in the poll mode drivers, if it's not listed in lib/librte_eal/common/include/rte_pci_dev_ids.h then consider the NIC as not supported.

Good luck with the VM,
Cyril
[dpdk-dev] Surprisingly high TCP ACK packets drop counter
My test is almost the same as Alexander's, but we only use one RX queue.
[dpdk-dev] preallocation of void ** obj_p of rte_ring_dequeue
On Wed, Nov 06, 2013 at 12:47:13AM +0800, Jose Gavine Cueto wrote:
> You're welcome, and by the way the multiprocess example of simple_mp seems
> confusing here:
>
> static int
> lcore_recv(__attribute__((unused)) void *arg)
> {
>     unsigned lcore_id = rte_lcore_id();
>
>     printf("Starting core %u\n", lcore_id);
>     while (!quit){
>         void *msg;
>         if (rte_ring_dequeue(recv_ring, &msg) < 0){
>             usleep(5);
>             continue;
>         }
>         printf("core %u: Received '%s'\n", lcore_id, (char *)msg);
>         rte_mempool_put(message_pool, msg);
>     }
>
>     return 0;
> }
>
> It seems that it isn't allocating msg here, or maybe I'm just missing
> something

I understand your question better now, and in that light I think my previous answer was confusing. Let me try to clarify:

A ring only holds *pointers* to objects. You enqueue pointers, and dequeue those pointers later, somewhere else, usually in another thread. The allocation/deallocation of the actual objects is not the concern of the ring or of its enqueue/dequeue operations.

If we take the simple_mp example, the msg dequeued by the lcore_recv() thread is created in mp_command.c and a pointer to that message is enqueued on "send_ring". If you read carefully how the rings are created you'll understand how "send_ring" and "recv_ring" relate to each other.

I hope this is a bit clearer,
Cyril
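To make the "a ring only holds pointers" point concrete, here is a minimal, self-contained sketch in plain C. It does not use DPDK: toy_ring, toy_enqueue and toy_dequeue are hypothetical, single-threaded stand-ins that only mimic the calling convention seen in the quoted code (a negative return value on failure, and a void ** output argument).

```c
#include <stddef.h>

#define TOY_RING_SIZE 8u /* must be a power of two */

/* Hypothetical single-threaded stand-in for a ring: it stores pointers only. */
struct toy_ring {
    void *slots[TOY_RING_SIZE];
    unsigned head; /* next slot to write */
    unsigned tail; /* next slot to read */
};

/* Store the pointer value itself; the object it points to is not copied. */
static int toy_enqueue(struct toy_ring *r, void *obj)
{
    if (r->head - r->tail == TOY_RING_SIZE)
        return -1; /* full */
    r->slots[r->head++ & (TOY_RING_SIZE - 1)] = obj;
    return 0;
}

/* Write the stored pointer into *obj_p. The caller only provides a
 * "void *" variable to receive it; nothing is allocated here. */
static int toy_dequeue(struct toy_ring *r, void **obj_p)
{
    if (r->head == r->tail)
        return -1; /* empty */
    *obj_p = r->slots[r->tail++ & (TOY_RING_SIZE - 1)];
    return 0;
}
```

In this model, a receiver declares "void *msg;" and passes "&msg" to toy_dequeue(), just as lcore_recv() does with rte_ring_dequeue(). After a successful call, msg holds the same address the sender enqueued, so the object itself must have been allocated by the sender (in simple_mp, from the message pool).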