[dpdk-dev] [PATCH v6] Add toeplitz hash algorithm used by RSS

2015-06-30 Thread Vladimir Medvedkin
Software implementation of the Toeplitz hash function used by RSS.
Can be used either for packet distribution on a single-queue NIC
or for simulating RSS computation on a specific NIC (for example,
after GRE header decapsulation).
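
For readers unfamiliar with the algorithm, here is a minimal bit-serial
sketch of the Toeplitz hash (illustrative only; it does not use the
rte_thash.h API added by this patch, and it assumes the key is at least
4 bytes longer than the input):

#include <stdint.h>
#include <stddef.h>

/* Bit-serial Toeplitz hash: for every set bit of the input (MSB first),
 * XOR in the 32-bit window of the key that starts at that bit position. */
static uint32_t
toeplitz_hash(const uint8_t *key, const uint8_t *data, size_t len)
{
	/* Initial window: key bits 0..31, big-endian byte order. */
	uint32_t window = ((uint32_t)key[0] << 24) | ((uint32_t)key[1] << 16) |
			  ((uint32_t)key[2] << 8) | key[3];
	uint32_t hash = 0;
	size_t i;
	int b;

	for (i = 0; i < len; i++) {
		for (b = 7; b >= 0; b--) {
			if (data[i] & (1u << b))
				hash ^= window;
			/* Slide the window one bit; the new LSB is the key bit
			 * 32 positions ahead of the data bit just processed. */
			window = (window << 1) | ((key[i + 4] >> b) & 1u);
		}
	}
	return hash;
}

With the default 40-byte RSS key and a tuple laid out as raw big-endian
bytes (addresses first, then ports), as in the 82599 RSS verification
suite, this should reproduce the reference hashes used by the unit test
posted below.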

v6 changes
- Fix compilation error
- Rename some defines and function

v5 changes
- Fix errors reported by checkpatch.pl

v4 changes
- Fix copyright
- rename bswap_mask constant, add rte_ prefix
- change rte_ipv[46]_tuple struct
- change rte_thash_load_v6_addr prototype

v3 changes
- Rework API to be more generic
- Add sctp_tag into tuple

v2 changes
- Add ipv6 support
- Various style fixes

Signed-off-by: Vladimir Medvedkin 
---
 lib/librte_hash/Makefile|   1 +
 lib/librte_hash/rte_thash.h | 231 
 2 files changed, 232 insertions(+)
 create mode 100644 lib/librte_hash/rte_thash.h

diff --git a/lib/librte_hash/Makefile b/lib/librte_hash/Makefile
index 3696cb1..981230b 100644
--- a/lib/librte_hash/Makefile
+++ b/lib/librte_hash/Makefile
@@ -49,6 +49,7 @@ SRCS-$(CONFIG_RTE_LIBRTE_HASH) += rte_fbk_hash.c
 SYMLINK-$(CONFIG_RTE_LIBRTE_HASH)-include := rte_hash.h
 SYMLINK-$(CONFIG_RTE_LIBRTE_HASH)-include += rte_hash_crc.h
 SYMLINK-$(CONFIG_RTE_LIBRTE_HASH)-include += rte_jhash.h
+SYMLINK-$(CONFIG_RTE_LIBRTE_HASH)-include += rte_thash.h
 SYMLINK-$(CONFIG_RTE_LIBRTE_HASH)-include += rte_fbk_hash.h

 # this lib needs eal
diff --git a/lib/librte_hash/rte_thash.h b/lib/librte_hash/rte_thash.h
new file mode 100644
index 000..1808f47
--- /dev/null
+++ b/lib/librte_hash/rte_thash.h
@@ -0,0 +1,231 @@
+/*-
+ *   BSD LICENSE
+ *
+ *   Copyright(c) 2015 Vladimir Medvedkin 
+ *   All rights reserved.
+ *
+ *   Redistribution and use in source and binary forms, with or without
+ *   modification, are permitted provided that the following conditions
+ *   are met:
+ *
+ * * Redistributions of source code must retain the above copyright
+ *   notice, this list of conditions and the following disclaimer.
+ * * Redistributions in binary form must reproduce the above copyright
+ *   notice, this list of conditions and the following disclaimer in
+ *   the documentation and/or other materials provided with the
+ *   distribution.
+ * * Neither the name of Intel Corporation nor the names of its
+ *   contributors may be used to endorse or promote products derived
+ *   from this software without specific prior written permission.
+ *
+ *   THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+ *   "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+ *   LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
+ *   A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
+ *   OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
+ *   SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
+ *   LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
+ *   DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
+ *   THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+ *   (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
+ *   OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+ */
+
+#ifndef _RTE_THASH_H
+#define _RTE_THASH_H
+
+/**
+ * @file
+ *
+ * Toeplitz hash functions.
+ */
+
+#ifdef __cplusplus
+extern "C" {
+#endif
+
+/**
+ * Software implementation of the Toeplitz hash function used by RSS.
+ * Can be used either for packet distribution on a single-queue NIC
+ * or for simulating RSS computation on a specific NIC (for example,
+ * after GRE header decapsulation)
+ */
+
+#include 
+#include 
+#include 
+#include 
+
+/* Byte swap mask used for converting IPv6 address
+ * 4-byte chunks to CPU byte order
+ */
+static const __m128i rte_thash_ipv6_bswap_mask = {
+   0x0405060700010203, 0x0C0D0E0F08090A0B};
+
+/**
+ * length in dwords of input tuple to
+ * calculate hash of ipv4 header only
+ */
+#define RTE_THASH_V4_L3_LEN ((sizeof(struct rte_ipv4_tuple) -   \
+   sizeof(((struct rte_ipv4_tuple *)0)->sctp_tag)) / 4)
+
+/**
+ * length in dwords of input tuple to
+ * calculate hash of ipv4 header +
+ * transport header
+ */
+#define RTE_THASH_V4_L4_LEN ((sizeof(struct rte_ipv4_tuple)) / 4)
+
+/**
+ * length in dwords of input tuple to
+ * calculate hash of ipv6 header only
+ */
+#define RTE_THASH_V6_L3_LEN ((sizeof(struct rte_ipv6_tuple) -   \
+   sizeof(((struct rte_ipv6_tuple *)0)->sctp_tag)) / 4)
+
+/**
+ * length in dwords of input tuple to
+ * calculate hash of ipv6 header +
+ * transport header
+ */
+#define RTE_THASH_V6_L4_LEN ((sizeof(struct rte_ipv6_tuple)) / 4)
+
+/**
+ * IPv4 tuple
+ * addresses and ports/sctp_tag must be in CPU byte order
+ */
+struct rte_ipv4_tuple {
+   uint32_t src_addr;
+   uint32_t dst_addr;
+   union {
+   struct {
+

[dpdk-dev] [PATCH v4] Add unit test for thash library

2015-06-30 Thread Vladimir Medvedkin
Add unit test for thash library

v4 changes
- Reflect rte_thash.h changes

v3 changes
- Fix checkpatch errors

v2 changes
- fix typo
- remove unnecessary comments

Signed-off-by: Vladimir Medvedkin 
---
 app/test/Makefile |   1 +
 app/test/test_thash.c | 176 ++
 2 files changed, 177 insertions(+)
 create mode 100644 app/test/test_thash.c

diff --git a/app/test/Makefile b/app/test/Makefile
index 2e2758c..caa359c 100644
--- a/app/test/Makefile
+++ b/app/test/Makefile
@@ -82,6 +82,7 @@ SRCS-y += test_memcpy.c
 SRCS-y += test_memcpy_perf.c

 SRCS-$(CONFIG_RTE_LIBRTE_HASH) += test_hash.c
+SRCS-$(CONFIG_RTE_LIBRTE_HASH) += test_thash.c
 SRCS-$(CONFIG_RTE_LIBRTE_HASH) += test_hash_perf.c
 SRCS-$(CONFIG_RTE_LIBRTE_HASH) += test_hash_functions.c
 SRCS-$(CONFIG_RTE_LIBRTE_HASH) += test_hash_scaling.c
diff --git a/app/test/test_thash.c b/app/test/test_thash.c
new file mode 100644
index 000..8e9dca0
--- /dev/null
+++ b/app/test/test_thash.c
@@ -0,0 +1,176 @@
+/*-
+ *   BSD LICENSE
+ *
+ *   Copyright(c) 2015 Vladimir Medvedkin 
+ *   All rights reserved.
+ *
+ *   Redistribution and use in source and binary forms, with or without
+ *   modification, are permitted provided that the following conditions
+ *   are met:
+ *
+ * * Redistributions of source code must retain the above copyright
+ *   notice, this list of conditions and the following disclaimer.
+ * * Redistributions in binary form must reproduce the above copyright
+ *   notice, this list of conditions and the following disclaimer in
+ *   the documentation and/or other materials provided with the
+ *   distribution.
+ * * Neither the name of Intel Corporation nor the names of its
+ *   contributors may be used to endorse or promote products derived
+ *   from this software without specific prior written permission.
+ *
+ *   THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+ *   "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+ *   LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
+ *   A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
+ *   OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
+ *   SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
+ *   LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
+ *   DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
+ *   THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+ *   (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
+ *   OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+ */
+
+#include 
+#include 
+#include 
+
+#include "test.h"
+
+#include 
+
+struct test_thash_v4 {
+   uint32_t dst_ip;
+   uint32_t src_ip;
+   uint16_t dst_port;
+   uint16_t src_port;
+   uint32_t hash_l3;
+   uint32_t hash_l3l4;
+};
+
+struct test_thash_v6 {
+   uint8_t dst_ip[16];
+   uint8_t src_ip[16];
+   uint16_t dst_port;
+   uint16_t src_port;
+   uint32_t hash_l3;
+   uint32_t hash_l3l4;
+};
+
+/*From 82599 Datasheet 7.1.2.8.3 RSS Verification Suite*/
+struct test_thash_v4 v4_tbl[] = {
+{IPv4(161, 142, 100, 80), IPv4(66, 9, 149, 187),
+   1766, 2794, 0x323e8fc2, 0x51ccc178},
+{IPv4(65, 69, 140, 83), IPv4(199, 92, 111, 2),
+   4739, 14230, 0xd718262a, 0xc626b0ea},
+{IPv4(12, 22, 207, 184), IPv4(24, 19, 198, 95),
+   38024, 12898, 0xd2d0a5de, 0x5c2b394a},
+{IPv4(209, 142, 163, 6), IPv4(38, 27, 205, 30),
+   2217, 48228, 0x82989176, 0xafc7327f},
+{IPv4(202, 188, 127, 2), IPv4(153, 39, 163, 191),
+   1303, 44251, 0x5d1809c5, 0x10e828a2},
+};
+
+struct test_thash_v6 v6_tbl[] = {
+/*3ffe:2501:200:3::1*/
+{{0x3f, 0xfe, 0x25, 0x01, 0x02, 0x00, 0x00, 0x03,
+0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x01,},
+/*3ffe:2501:200:1fff::7*/
+{0x3f, 0xfe, 0x25, 0x01, 0x02, 0x00, 0x1f, 0xff,
+0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x07,},
+1766, 2794, 0x2cc18cd5, 0x40207d3d},
+/*ff02::1*/
+{{0xff, 0x02, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,
+0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x01,},
+/*3ffe:501:8::260:97ff:fe40:efab*/
+{0x3f, 0xfe, 0x05, 0x01, 0x00, 0x08, 0x00, 0x00,
+0x02, 0x60, 0x97, 0xff, 0xfe, 0x40, 0xef, 0xab,},
+4739, 14230, 0x0f0c461c, 0xdde51bbf},
+/*fe80::200:f8ff:fe21:67cf*/
+{{0xfe, 0x80, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,
+0x02, 0x00, 0xf8, 0xff, 0xfe, 0x21, 0x67, 0xcf,},
+/*3ffe:1900:4545:3:200:f8ff:fe21:67cf*/
+{0x3f, 0xfe, 0x19, 0x00, 0x45, 0x45, 0x00, 0x03,
+0x02, 0x00, 0xf8, 0xff, 0xfe, 0x21, 0x67, 0xcf,},
+38024, 44251, 0x4b61e985, 0x02d1feef},
+};
+
+uint8_t default_rss_key[] = {
+0x6d, 0x5a, 0x56, 0xda, 0x25, 0x5b, 0x0e, 0xc2,
+0x41, 0x67, 0x25, 0x3d, 0x43, 0xa3, 0x8f, 0xb0,
+0xd0, 0xca, 0x2b, 0xcb, 0xae, 0x7b, 0x30, 0xb4,
+0x77, 0xcb, 0x2d, 0xa3, 0x80, 0x30, 0xf2, 0x0c,
+

[dpdk-dev] dpdk-2.0.0: crash in ixgbe_recv_scattered_pkts_vec->_recv_raw_pkts_vec->desc_to_olflags_v

2015-06-30 Thread Gopakumar Choorakkot Edakkunni
So, an update on this. Summary: it's purely my fault; apologies for
prematurely suspecting the wrong areas. Details below.

1. My AWS box had an eth0 interface without DPDK. I enabled DPDK,
AND created a KNI interface, AND named the KNI interface eth0.

2. Ubuntu started its DHCP client on that interface, but my app
doesn't really do anything to read the DHCP (renews) from the KNI and
send them out the physical port, or vice versa. The KNI was just
sitting there, not doing much Rx/Tx.

3. Now my l2fwd-equivalent code started working fine. After a few
minutes, the DHCP client on Ubuntu gave up attempting the DHCP renew
(eth0 already had an IP) and attempted to remove the IP from eth0.

4. At this point the standard KNI example callbacks in DPDK, which I
had registered, ended up being invoked - and the examples have a
port_stop() and a port_start() in them - and exactly at this point my
app crashed.

So my bad! I just no-oped the callbacks for now and changed the AWS
eth0 from DHCP to a static IP, and things are fine now! My system has
been up for a long time with no issues.
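
For anyone hitting the same thing, a minimal sketch of what "no-oping
the callbacks" can look like (assuming the rte_kni_ops callback
interface used by the KNI sample app; the stub names are mine):

#include <rte_common.h>
#include <rte_kni.h>

/* Stub callbacks: acknowledge MTU changes and interface up/down requests
 * coming from the kernel side without stopping/restarting the port
 * underneath the datapath (which is what triggered the crash above). */
static int
kni_change_mtu_noop(uint8_t port_id, unsigned int new_mtu)
{
	RTE_SET_USED(port_id);
	RTE_SET_USED(new_mtu);
	return 0;
}

static int
kni_config_network_if_noop(uint8_t port_id, uint8_t if_up)
{
	RTE_SET_USED(port_id);
	RTE_SET_USED(if_up);
	return 0;
}

static struct rte_kni_ops kni_ops_noop = {
	.change_mtu = kni_change_mtu_noop,
	.config_network_if = kni_config_network_if_noop,
};

/* ...later passed as the ops argument to rte_kni_alloc(). */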

Thanks again Thomas and Bruce for the quick response and suggestions

Rgds,
Gopa.

On Tue, Jun 30, 2015 at 11:28 AM, Gopakumar Choorakkot Edakkunni
 wrote:
> Hi Thomas, Bruce,
>
> Thanks for the responses. Please find my answers as below.
>
> Thomas>> "You mean you are using SR-IOV from Amazon, right? Do you
> have more hardware details?"
>
> That is correct. I am attaching three files cpuinfo.txt lcpci.txt and
> portconf.txt (just the port config that I am using, nothing special,
> yanked off of l2fwd example). The two 82599 VF interfaces seen in
> lspci output are the ones of interest - I use one of them in dpdk
> mode.
>
> Thomas>> Did you try to disable CONFIG_RTE_IXGBE_INC_VECTOR?
>
> Thanks for the suggestion, I made that change and was giving it some
> time. Now the result of that is not entirely black and white:
> previously (in vector mode) my app used to Rx/Tx packets nicely
> without any hiccups, but would crash in 10 minutes :). Now with this
> suggested change, it's been running for a while and doesn't crash, but
> the Tx latency and Tx loss are so high (around 10% Tx loss) that the
> app is not doing a great job - but that might just be something that I
> need to adapt to when using non-vector mode? I will experiment on
> that a bit more. So I "think" it's fair to say that with the vector
> disabled, there's no crash, but I need to chase this latency/loss now.
>
> Thomas>> Not needed. A DPDK application is fast enough to do the job
> in 10 minutes ;)
>
> Haha, good one :). Thats where I want to get to eventually, but right
> now some distance from it.
>
> Bruce>> Can you perhaps isolate the root cause of the issue any
> further? For example, does it only occur when you get three packets as
> the receive ring wraps back around to zero?
>
> I will try some more experiments and will read and understand this Rx
> code a bit more, to be able to answer the question about whether the
> ring wraps around when the problem happens, etc.
>
> Rgds,
> Gopa.
>
>
> On Tue, Jun 30, 2015 at 9:08 AM, Thomas Monjalon
>  wrote:
>> 2015-06-30 08:49, Gopakumar Choorakkot Edakkunni:
>>> I am starting to tryout dpdk-2.0.0 with a simple Rx routine very
>>> similar to the l2fwd example - I am running this on a c3.8xlarge aws
>>> sr-iov enabled vpc instance (inside the vm it uses ixgbevf driver).
>>
>> You mean you are using SR-IOV from Amazon, right?
>> Do you have more hardware details?
>>
>>> Once every 10 minutes my application crashes in the receive path.
>>> And whenever I check the crash reason, it's because it always has three
>>> packets in the burst array (I have provided an array size of 32) instead
>>> of the four that it tries to collect in one bunch. And inside
>>> desc_to_olflags_v(), there's the assumption that there are four
>>> packets, and obviously it crashes trying to access the fourth buffer.
>>
>> Did you try to disable CONFIG_RTE_IXGBE_INC_VECTOR?
>>
>>> With a brief look at the code, I really can't make out how it's
>>> guaranteed that we will always have four descriptors fully populated.
>>> After the first iteration, the loop does break out if (likely(var !=
>>> RTE_IXGBE_DESCS_PER_LOOP)), but what about the very first iteration,
>>> where we might not have four?
>>>
>>> Any thoughts will be helpful here, trying to get my app working for
>>> more than 10 minutes :)
>>
>> Not needed. A DPDK application is fast enough to do the job in 10 minutes ;)
>>


[dpdk-dev] RTM instruction compile failure for XABORT when AVX is active

2015-06-30 Thread Matthew Hall

On Jun 29, 2015, at 3:19 AM, Thomas Monjalon  
wrote:
> There is no such bug with my compiler:
>   clang version 3.6.1 (tags/RELEASE_361/final)
>   Target: x86_64-unknown-linux-gnu
> 
> Matthew, which version are you using?

Hi Thomas and Roman,

It seems to happen if I have set -mavx in CFLAGS with clang 1:3.4-1ubuntu3.

I get a different issue that only shows up at runtime in clang 
3.6.2-svn240577-1~exp1:

ERROR: This system does not support "FSGSBASE".
Please check that RTE_MACHINE is set correctly.

It appears I probably need to learn how to do a better job on my EXTRA_CFLAGS.
Do we have some recommendations on what should be used on the different Intel
CPUs to avoid build issues but still get the best performance? This would help a lot.
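
Before tuning EXTRA_CFLAGS, it can help to check what the (possibly
virtualized) CPU actually advertises; a small standalone sketch, assuming
gcc/clang's <cpuid.h> macros (FSGSBASE is CPUID leaf 7, EBX bit 0):

#include <cpuid.h>
#include <stdio.h>

int main(void)
{
	unsigned int a = 0, b = 0, c = 0, d = 0;

	__cpuid(0, a, b, c, d);             /* a = highest supported leaf */
	if (a < 7) {
		printf("CPUID leaf 7 not supported\n");
		return 1;
	}
	__cpuid_count(7, 0, a, b, c, d);    /* structured extended features */
	printf("FSGSBASE: %s\n", (b & 1) ? "yes" : "no");
	printf("AVX2:     %s\n", (b & (1 << 5)) ? "yes" : "no");
	return 0;
}

On a VM, the hypervisor can mask bits such as FSGSBASE even when
-march=native makes the compiler define __FSGSBASE__ on the build host,
which is one way to end up with the EAL runtime error above.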

Matthew.


[dpdk-dev] RTM instruction compile failure for XABORT when AVX is active

2015-06-30 Thread Matthew Hall
To be a bit more specific, this is what I had to do to fix it for clang 3.6 SVN 
snapshot release.

I am not sure if there is a better way of handling this situation. I'd love to 
know where I could improve it.

Matthew.

diff --git a/mk/rte.cpuflags.mk b/mk/rte.cpuflags.mk
index f595cd0..8c883ee 100644
--- a/mk/rte.cpuflags.mk
+++ b/mk/rte.cpuflags.mk
@@ -77,13 +77,13 @@ ifneq ($(filter $(AUTO_CPUFLAGS),__RDRND__),)
 CPUFLAGS += RDRAND
 endif

-ifneq ($(filter $(AUTO_CPUFLAGS),__FSGSBASE__),)
-CPUFLAGS += FSGSBASE
-endif
+#ifneq ($(filter $(AUTO_CPUFLAGS),__FSGSBASE__),)
+#CPUFLAGS += FSGSBASE
+#endif

-ifneq ($(filter $(AUTO_CPUFLAGS),__F16C__),)
-CPUFLAGS += F16C
-endif
+#ifneq ($(filter $(AUTO_CPUFLAGS),__F16C__),)
+#CPUFLAGS += F16C
+#endif

 ifneq ($(filter $(AUTO_CPUFLAGS),__AVX2__),)
 CPUFLAGS += AVX2


[dpdk-dev] RTM instruction compile failure for XABORT when AVX is active

2015-06-30 Thread Matthew Hall
With those two items commented out, and these CFLAGS:

"-g -O0 -fPIC -msse4.2"

it looks like I can reproduce the issue in the clang 3.6 series:

/vagrant/external/dpdk/build/include/rte_rtm.h:56:15: error: invalid operand 
for inline asm constraint 'i'
asm volatile(".byte 0xc6,0xf8,%P0" :: "i" (status) : "memory");

So there are definitely some corner cases that seem to be able to trigger it.
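
For reference, a standalone sketch of the pattern that trips the error
(this mirrors the shape of rte_xabort() in rte_rtm.h but is not copied
verbatim). The "i" constraint requires the operand to be a compile-time
immediate; when the wrapper is not inlined with the constant folded in
(which depends on compiler and optimization level), the constraint
cannot be satisfied and the build fails with exactly this error.

/* XABORT outside a transaction is a no-op on RTM-capable CPUs; the
 * point here is only the compile-time behaviour of the 'i' constraint. */
static inline __attribute__((always_inline))
void my_xabort(const unsigned int status)
{
	/* 0xC6 0xF8 imm8 is the XABORT encoding; %P0 emits the raw constant. */
	asm volatile(".byte 0xc6,0xf8,%P0" :: "i" (status) : "memory");
}

int main(void)
{
	my_xabort(1); /* the argument must fold to an immediate */
	return 0;
}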

On Jun 30, 2015, at 10:17 PM, Matthew Hall  wrote:

> To be a bit more specific, this is what I had to do to fix it for clang 3.6 
> SVN snapshot release.
> 
> I am not sure if there is a better way of handling this situation. I'd love 
> to know where I could improve it.
> 
> Matthew.
> 
> diff --git a/mk/rte.cpuflags.mk b/mk/rte.cpuflags.mk
> index f595cd0..8c883ee 100644
> --- a/mk/rte.cpuflags.mk
> +++ b/mk/rte.cpuflags.mk
> @@ -77,13 +77,13 @@ ifneq ($(filter $(AUTO_CPUFLAGS),__RDRND__),)
> CPUFLAGS += RDRAND
> endif
> 
> -ifneq ($(filter $(AUTO_CPUFLAGS),__FSGSBASE__),)
> -CPUFLAGS += FSGSBASE
> -endif
> +#ifneq ($(filter $(AUTO_CPUFLAGS),__FSGSBASE__),)
> +#CPUFLAGS += FSGSBASE
> +#endif
> 
> -ifneq ($(filter $(AUTO_CPUFLAGS),__F16C__),)
> -CPUFLAGS += F16C
> -endif
> +#ifneq ($(filter $(AUTO_CPUFLAGS),__F16C__),)
> +#CPUFLAGS += F16C
> +#endif
> 
> ifneq ($(filter $(AUTO_CPUFLAGS),__AVX2__),)
> CPUFLAGS += AVX2



[dpdk-dev] rte_lpm4 with expanded next hop support now available

2015-06-30 Thread Matthew Hall
Hello,

Based on the wonderful assistance from Vladimir and Stephen, and a close friend
of mine who is a hypervisor developer and helped me reverse engineer and
rewrite rte_lpm_lookupx4, I have got a known-working version of rte_lpm4 with
expanded 24-bit next hop support available here:

https://github.com/megahall/dpdk_mhall/tree/megahall/lpm-expansion

I'm going to be working on rte_lpm6 next. It seems to take a whole ton of
memory to run the self-test; if anybody knows how much, that would help, as it
ran out of memory when I tried it.

Sadly, this change is neither ABI compatible nor performance compatible with the
original rte_lpm, because I had to change the bitwise layout to fit more data
in there, and it will run maybe 50% slower because it has to access more
memory.
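
To illustrate the ABI and footprint point, a purely hypothetical sketch
(not the actual rte_lpm entry layout): widening the next hop from 8 to
24 bits grows each tbl24 entry from 2 to 4 bytes, which roughly doubles
the table's working set and is consistent with the extra memory traffic
mentioned above.

#include <stdint.h>

/* Hypothetical 2-byte entry in the spirit of the original 8-bit next hop. */
struct lpm_tbl24_entry_8bit {
	uint8_t next_hop;        /* 8-bit next hop / tbl8 group index  */
	uint8_t valid     :1;
	uint8_t ext_entry :1;    /* entry points into a tbl8 extension */
	uint8_t depth     :6;
};

/* Hypothetical 4-byte entry once the next hop is widened to 24 bits. */
struct lpm_tbl24_entry_24bit {
	uint32_t next_hop  :24;  /* expanded next hop                  */
	uint32_t valid     :1;
	uint32_t ext_entry :1;
	uint32_t depth     :6;
};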

Despite all this, I'd really like to do the right thing and find a way to contribute
it back, perhaps as a second kind of rte_lpm, so I wouldn't be the only person
using it and forking the code, since I have already met several others who needed it.
I could use some ideas on how to handle the situation.

Matthew.


[dpdk-dev] Could not achieve wire speed for 40GE with any DPDK version on XL710 NIC's

2015-06-30 Thread Keunhong Lee
I have not used XL710 or i40e.
I have no opinion for those NICs.

Keunhong.
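
For reference on the wire-speed discussion quoted below: the theoretical
40GbE packet rate at the 64-byte minimum frame size works out to roughly
59.5 Mpps (a back-of-envelope sketch assuming the standard 20 bytes of
preamble plus inter-frame gap per frame), so 40 Mpps is well short of
line rate.

#include <stdio.h>

int main(void)
{
	const double link_bps   = 40e9;           /* 40 Gbit/s            */
	const double frame_bits = (64 + 20) * 8;  /* frame + preamble/IFG */

	/* prints ~59.52 Mpps for 64-byte frames on 40GbE */
	printf("64B line rate: %.2f Mpps\n", link_bps / frame_bits / 1e6);
	return 0;
}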

2015-06-29 15:59 GMT+09:00 Pavel Odintsov :

> Hello!
>
> Lee, thank you so much for sharing your experience! What do you think
> about 40GE version of 82599?
>
> On Mon, Jun 29, 2015 at 2:35 AM, Keunhong Lee  wrote:
> > DISCLAIMER: This information is not verified. This is truly my personal
> > opinion.
> >
> > As far as I know, the Intel 82599 is the only 10G NIC which supports
> > line rate with minimum-sized packets (64 bytes).
> > According to our internal tests, Mellanox's 40G NICs support even less
> > than 30 Mpps.
> > I think 40 Mpps is the hardware capacity.
> >
> > Keunhong.
> >
> >
> >
> > 2015-06-28 19:34 GMT+09:00 Pavel Odintsov :
> >>
> >> Hello, folks!
> >>
> >> We have executed a bunch of tests receiving data with the Intel XL710 40GE
> >> NIC. We want to achieve wire speed on this platform for traffic
> >> capture.
> >>
> >> But we definitely can't do it. We tried with different versions of
> >> DPDK: 1.4, 1.6, 1.8, 2.0. And had no success.
> >>
> >> We achieved only 40 Mpps and could not do more.
> >>
> >> Could anybody help us with this issue? Looks like these NICs cannot
> >> work at wire speed :(
> >>
> >> Platform: Intel Xeon E5 e5 2670 + XL 710.
> >>
> >> --
> >> Sincerely yours, Pavel Odintsov
> >
> >
>
>
>
> --
> Sincerely yours, Pavel Odintsov
>


[dpdk-dev] Receiving packets on only one port

2015-06-30 Thread Keunhong Lee
Actually I have no idea why this happens.
If your switch configuration is correct, your program should work just like
Wireshark does.
I wonder whether your program uses RSS.
In my opinion, a program bug is the most probable reason.

Keunhong.
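
To illustrate the RSS point: with a port configuration along these lines
(a generic l3fwd-style sketch, not the poster's actual code), received
flows are hashed across all configured RX queues, so an application that
only drains queue 0 will miss part of the traffic.

#include <rte_ethdev.h>

/* RSS spreads incoming flows across nb_rx_queues queues; every one of
 * them has to be polled, otherwise part of the traffic never reaches
 * the application even though the NIC counters keep increasing. */
static const struct rte_eth_conf port_conf = {
	.rxmode = {
		.mq_mode = ETH_MQ_RX_RSS,
	},
	.rx_adv_conf = {
		.rss_conf = {
			.rss_key = NULL,        /* use the driver's default key */
			.rss_hf  = ETH_RSS_IP,  /* hash on IP src/dst addresses */
		},
	},
};

static int
configure_port(uint8_t port, uint16_t nb_rx_queues, uint16_t nb_tx_queues)
{
	return rte_eth_dev_configure(port, nb_rx_queues, nb_tx_queues,
				     &port_conf);
}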


2015-06-29 23:32 GMT+09:00 Daeyoung Kim :

> Hi Keunhong,
>
> Thank you for your help. Here is the network topology.
>
> DNS Client - Switch 1 - Switch 2 - Switch 3 - DNS Server
>                 |          |
>               port 0     port 1
>
> DPDK port 0 receives packets from Switch 1 using port mirroring, and
> port 1 receives packets from Switch 2 using port mirroring as well. As
> I already said, when I send DNS packets, Wireshark simultaneously gets
> all the packets on both ports. I'm sorry, what I told you before was
> incorrect. Using my program with promiscuous mode, port 0 receives only
> DNS queries from the DNS client, but port 1 receives only DNS replies
> from the DNS server. I'd like to know why this happens.
>
> Thank you very much!
>
> Regards,
> Daeyoung
>
> 2015-06-28 20:22 GMT-04:00 Keunhong Lee :
>
>> I don't know your situation exactly, but here are possible problems.
>>
>> 1. Your switch learned MAC addresses of two ports.
>> 2. Your program bug.
>> 3. l3fwd itself contains some bug.
>> 4. You did not set all ports in promiscuous mode.
>>
>> You'd better try 'pktgen' application to test your environment.
>>
>> Keunhong.
>>
>>
>>
>> 2015-06-27 0:45 GMT+09:00 Daeyoung Kim :
>>
>>> Hello,
>>>
>>> I'm writing a packet capture program from the l3fwd. When I send DNS
>>> packets, the wireshark simultaneously gets all the packets on two ports.
>>> However, using my program with promiscuous mode, one port receives all
>>> the
>>> packets, but the other port gets nothing. Do you know why it happens?
>>> Could
>>> it be network topology problem using DPDK, a DPDK design like forwarding
>>> mechanism, or just my program bugs? Any comments would be appreciated.
>>>
>>> Thanks,
>>> Daeyoung
>>>
>>
>>
>


[dpdk-dev] Receiving packets on only one port

2015-06-30 Thread Keunhong Lee
Check whether you are polling all the RX queues in your program.
You said that your configuration works well with Wireshark,
so I don't think that is the problem.

You can check your port statistics for the number of received packets.
If the statistics show 2 packets but you received only 1 packet, then your
program might have a mistake.

Keunhong.
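
A hedged sketch of both checks (poll every configured RX queue and
compare with the NIC counters from rte_eth_stats_get(); the names and
structure are mine, not from the poster's program):

#include <inttypes.h>
#include <stdio.h>

#include <rte_ethdev.h>
#include <rte_mbuf.h>

#define BURST_SZ 32

/* Drain every RX queue of the port; packets hashed to an unpolled RSS
 * queue would otherwise be invisible to the application. */
static uint64_t
poll_all_rx_queues(uint8_t port, uint16_t nb_rx_queues)
{
	struct rte_mbuf *bufs[BURST_SZ];
	uint64_t received = 0;
	uint16_t q, i, nb;

	for (q = 0; q < nb_rx_queues; q++) {
		nb = rte_eth_rx_burst(port, q, bufs, BURST_SZ);
		received += nb;
		for (i = 0; i < nb; i++)
			rte_pktmbuf_free(bufs[i]);
	}
	return received;
}

/* Compare what the application saw against what the hardware counted. */
static void
print_port_counters(uint8_t port, uint64_t seen_by_app)
{
	struct rte_eth_stats stats;

	rte_eth_stats_get(port, &stats);
	printf("port %u: hw ipackets=%" PRIu64 " imissed=%" PRIu64
	       " app=%" PRIu64 "\n",
	       port, stats.ipackets, stats.imissed, seen_by_app);
}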


2015-06-30 0:39 GMT+09:00 Daeyoung Kim :

> OK, I see. My program uses RSS. Is it related to my problem? If the
> testpmd application does not work, the switch configuration might be wrong.
> Am I right?
>
> Thank you for your answer.
>
> Regards,
> Daeyoung
>
> 2015-06-29 11:28 GMT-04:00 Keunhong Lee :
>
>> Actually I have no idea why this happens.
>> If your switch configuration is correct, your program should work just
>> like Wireshark does.
>> I wander whether your program uses RSS.
>> In my opinion, program bug is the most probable reason.
>>
>> Keunhong.
>>
>>
>> 2015-06-29 23:32 GMT+09:00 Daeyoung Kim :
>>
>>> Hi Keunhong,
>>>
>>> Thank you for your help. Here is the network topology.
>>>
>>> DNS Client - Switch 1 - Switch 2 - Switch 3 - DNS Server
>>>  |  |
>>>  port 0  port 1
>>>
>>> DPDK port 0 receives packets using from the Switch 1 port mirroring and
>>> port 1 receives packets from the Switch 2 using port mirroring as well. As
>>> I already said, when I send DNS packets, the wireshark simultaneously gets
>>> all the packets on two ports. I'm sorry what I told you is incorrect. Using
>>> my program with promiscuous mode, the port 0 receives only DNS queries from
>>> the DNS client, but the port 1 receives only DNS replies from the DNS
>>> server. I'd like to know why it happens.
>>>
>>> Thank you very much!
>>>
>>> Regards,
>>> Daeyoung
>>>
>>> 2015-06-28 20:22 GMT-04:00 Keunhong Lee :
>>>
 I don't know your situation exactly, but here are possible problems.

 1. Your switch learned MAC addresses of two ports.
 2. Your program bug.
 3. l3fwd itself contains some bug.
 4. You did not set all ports in promiscuous mode.

 You'd better try 'pktgen' application to test your environment.

 Keunhong.



 2015-06-27 0:45 GMT+09:00 Daeyoung Kim :

> Hello,
>
> I'm writing a packet capture program from the l3fwd. When I send DNS
> packets, the wireshark simultaneously gets all the packets on two
> ports.
> However, using my program with promiscuous mode, one port receives all
> the
> packets, but the other port gets nothing. Do you know why it happens?
> Could
> it be network topology problem using DPDK, a DPDK design like
> forwarding
> mechanism, or just my program bugs? Any comments would be appreciated.
>
> Thanks,
> Daeyoung
>


>>>
>>
>


[dpdk-dev] [PATCH v3 3/9] cxgbe: add device configuration and RX support for cxgbe PMD.

2015-06-30 Thread Rahul Lakkireddy
On Sun, Jun 28, 2015 at 21:34:28 +0200, Thomas Monjalon wrote:
> 2015-06-18 17:47, Rahul Lakkireddy:
> > Adds RX support for the cxgbe poll mode driver.  This patch:
> > 
> > 1. Adds rx queue related eth_dev_ops.
> > 2. Adds RSS support.
> > 3. Adds dev_configure() and dev_infos_get() eth_dev_ops.
> > 4. Adds rx_pkt_burst for receiving packets.
> > 
> > Signed-off-by: Rahul Lakkireddy 
> > Signed-off-by: Kumar Sanghvi 
> 
> This patch doesn't build with 32-bit GCC because of some printf args (%lu).

I have fixed it in v4. Will post it soon.
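
For context, the usual portable fix for such %lu warnings on 32-bit
builds is the <inttypes.h> format macros; a generic sketch, not
necessarily the exact change made in v4:

#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>

int main(void)
{
	uint64_t dma_addr = 0x12345678ULL;

	/* "%lu" is wrong on 32-bit targets, where uint64_t is typically
	 * 'unsigned long long'; PRIu64/PRIx64 expand to the correct
	 * conversion specifier on both 32-bit and 64-bit builds. */
	printf("addr = 0x%" PRIx64 " (%" PRIu64 ")\n", dma_addr, dma_addr);
	return 0;
}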

Thanks,
Rahul.



[dpdk-dev] [PATCH v3 2/9] cxgbe: add cxgbe poll mode driver.

2015-06-30 Thread Rahul Lakkireddy
Hi Thomas,

On Sun, Jun 28, 2015 at 21:32:32 +0200, Thomas Monjalon wrote:
> 2015-06-18 17:47, Rahul Lakkireddy:
> > +Chelsio cxgbe
> > +M: Rahul Lakkireddy 
> > +F: drivers/net/cxgbe/
> > +F: doc/guides/nics/cxgbe.rst
> 
> Just a detail: the doc file is added in a later patch.
> For consistency, this line should be added later.

I have fixed it in v4. Will post it soon.

> 
> [...]
> > --- a/config/common_linuxapp
> > +++ b/config/common_linuxapp
> > @@ -208,6 +208,16 @@ CONFIG_RTE_LIBRTE_MLX4_TX_MP_CACHE=8
> >  CONFIG_RTE_LIBRTE_MLX4_SOFT_COUNTERS=1
> >  
> >  #
> > +# Compile burst-oriented Chelsio Terminator 10GbE/40GbE (CXGBE) PMD
> > +#
> > +CONFIG_RTE_LIBRTE_CXGBE_PMD=y
> > +CONFIG_RTE_LIBRTE_CXGBE_DEBUG=n
> > +CONFIG_RTE_LIBRTE_CXGBE_DEBUG_REG=n
> > +CONFIG_RTE_LIBRTE_CXGBE_DEBUG_MBOX=n
> > +CONFIG_RTE_LIBRTE_CXGBE_DEBUG_TX=n
> > +CONFIG_RTE_LIBRTE_CXGBE_DEBUG_RX=n
> 
> What is the status of the driver with FreeBSD?

We have compilation fixes in the pipeline waiting for internal QA.
We are currently mid-way through testing Linux VFIO, after which we will pick
up FreeBSD support.

> Could we, at least, add the options in common_bsdapp even disabled?

We tried compiling with it kept disabled in common_bsdapp, and the
compilation goes fine.

However, if it's OK with you, we would like to carry out some basic testing
once our QA picks it up before adding this to common_bsdapp.

Thanks,
Rahul


[dpdk-dev] [PATCH v4 0/9] Chelsio Terminator 5 (T5) 10G/40G Poll Mode Driver

2015-06-30 Thread Rahul Lakkireddy
This series of patches adds CXGBE Poll Mode Driver support for the Chelsio
Terminator 5 series of 10G/40G adapters.  The CXGBE PMD is split into multiple
patches.  The first patch adds the hardware-specific API for all supported
Chelsio T5 adapters, and patches 2 to 8 add the actual DPDK CXGBE PMD.

Also, the CXGBE PMD is enabled for compilation and linking by patch 2.
MAINTAINERS file is also updated by patch 2 to claim responsibility for the
CXGBE PMD.

More information on the CXGBE PMD can be found in the documentation added by
patch 9.  

v4:
- Fix 32-bit and clang compilation.
- Moved cxgbe doc entry in MAINTAINERS from patch 2 to patch 9 for consistency.

v3:
- Merge patches 10 and 11 with patch 2.
- Add rte_pmd_cxgbe_version.map and add EXPORT_MAP and LIBABIVER to cxgbe
  Makefile.
- Use RTE_DIM macro for calculating ARRAY_SIZE.

v2:
- Move the driver to drivers/net directory and update all config files and
  commit logs.  Also update MAINTAINERS.
- Break the second patch into more patches; incrementally, adding features to
  the cxgbe poll mode driver.
- Replace bitwise operations in finding last (most significant) bit set with
  gcc's __builtin_clz.
- Fix the return value returned by link update eth_dev operation.
- Few bug fixes and code cleanup.

Rahul Lakkireddy (9):
  cxgbe: add hardware specific api for all supported Chelsio T5 series
adapters.
  cxgbe: add cxgbe poll mode driver.
  cxgbe: add device configuration and RX support for cxgbe PMD.
  cxgbe: add TX support for cxgbe PMD.
  cxgbe: add device related operations for cxgbe PMD.
  cxgbe: add port statistics for cxgbe PMD.
  cxgbe: add link related functions for cxgbe PMD.
  cxgbe: add flow control functions for cxgbe PMD.
  doc: add cxgbe PMD documentation under doc/guides/nics/cxgbe.rst

 MAINTAINERS |5 +
 config/common_linuxapp  |   10 +
 doc/guides/nics/cxgbe.rst   |  209 +++
 doc/guides/nics/index.rst   |1 +
 doc/guides/prog_guide/source_org.rst|1 +
 drivers/net/Makefile|1 +
 drivers/net/cxgbe/Makefile  |   78 +
 drivers/net/cxgbe/base/adapter.h|  565 ++
 drivers/net/cxgbe/base/common.h |  401 
 drivers/net/cxgbe/base/t4_chip_type.h   |   79 +
 drivers/net/cxgbe/base/t4_hw.c  | 2686 +++
 drivers/net/cxgbe/base/t4_hw.h  |  149 ++
 drivers/net/cxgbe/base/t4_msg.h |  345 
 drivers/net/cxgbe/base/t4_pci_id_tbl.h  |  148 ++
 drivers/net/cxgbe/base/t4_regs.h|  779 
 drivers/net/cxgbe/base/t4_regs_values.h |  168 ++
 drivers/net/cxgbe/base/t4fw_interface.h | 1730 +
 drivers/net/cxgbe/cxgbe.h   |   60 +
 drivers/net/cxgbe/cxgbe_compat.h|  266 +++
 drivers/net/cxgbe/cxgbe_ethdev.c|  802 
 drivers/net/cxgbe/cxgbe_main.c  | 1207 
 drivers/net/cxgbe/rte_pmd_cxgbe_version.map |4 +
 drivers/net/cxgbe/sge.c | 2241 ++
 mk/rte.app.mk   |1 +
 24 files changed, 11936 insertions(+)
 create mode 100644 doc/guides/nics/cxgbe.rst
 create mode 100644 drivers/net/cxgbe/Makefile
 create mode 100644 drivers/net/cxgbe/base/adapter.h
 create mode 100644 drivers/net/cxgbe/base/common.h
 create mode 100644 drivers/net/cxgbe/base/t4_chip_type.h
 create mode 100644 drivers/net/cxgbe/base/t4_hw.c
 create mode 100644 drivers/net/cxgbe/base/t4_hw.h
 create mode 100644 drivers/net/cxgbe/base/t4_msg.h
 create mode 100644 drivers/net/cxgbe/base/t4_pci_id_tbl.h
 create mode 100644 drivers/net/cxgbe/base/t4_regs.h
 create mode 100644 drivers/net/cxgbe/base/t4_regs_values.h
 create mode 100644 drivers/net/cxgbe/base/t4fw_interface.h
 create mode 100644 drivers/net/cxgbe/cxgbe.h
 create mode 100644 drivers/net/cxgbe/cxgbe_compat.h
 create mode 100644 drivers/net/cxgbe/cxgbe_ethdev.c
 create mode 100644 drivers/net/cxgbe/cxgbe_main.c
 create mode 100644 drivers/net/cxgbe/rte_pmd_cxgbe_version.map
 create mode 100644 drivers/net/cxgbe/sge.c

-- 
2.4.1



[dpdk-dev] [PATCH v4 1/9] cxgbe: add hardware specific api for all supported Chelsio T5 series adapters.

2015-06-30 Thread Rahul Lakkireddy
Adds the hardware-specific API for all the Chelsio T5 adapters under the
drivers/net/cxgbe/base directory.

Signed-off-by: Rahul Lakkireddy 
Signed-off-by: Kumar Sanghvi 
---
v4:
- No changes

v3:
- No changes

v2:
- Move files to new directory under drivers/net/cxgbe/base and update commit
  log.
- Few bug fixes related to tx.

 drivers/net/cxgbe/base/adapter.h|  565 +++
 drivers/net/cxgbe/base/common.h |  401 +
 drivers/net/cxgbe/base/t4_chip_type.h   |   79 +
 drivers/net/cxgbe/base/t4_hw.c  | 2686 +++
 drivers/net/cxgbe/base/t4_hw.h  |  149 ++
 drivers/net/cxgbe/base/t4_msg.h |  345 
 drivers/net/cxgbe/base/t4_pci_id_tbl.h  |  148 ++
 drivers/net/cxgbe/base/t4_regs.h|  779 +
 drivers/net/cxgbe/base/t4_regs_values.h |  168 ++
 drivers/net/cxgbe/base/t4fw_interface.h | 1730 
 10 files changed, 7050 insertions(+)
 create mode 100644 drivers/net/cxgbe/base/adapter.h
 create mode 100644 drivers/net/cxgbe/base/common.h
 create mode 100644 drivers/net/cxgbe/base/t4_chip_type.h
 create mode 100644 drivers/net/cxgbe/base/t4_hw.c
 create mode 100644 drivers/net/cxgbe/base/t4_hw.h
 create mode 100644 drivers/net/cxgbe/base/t4_msg.h
 create mode 100644 drivers/net/cxgbe/base/t4_pci_id_tbl.h
 create mode 100644 drivers/net/cxgbe/base/t4_regs.h
 create mode 100644 drivers/net/cxgbe/base/t4_regs_values.h
 create mode 100644 drivers/net/cxgbe/base/t4fw_interface.h

diff --git a/drivers/net/cxgbe/base/adapter.h b/drivers/net/cxgbe/base/adapter.h
new file mode 100644
index 000..0ea1c95
--- /dev/null
+++ b/drivers/net/cxgbe/base/adapter.h
@@ -0,0 +1,565 @@
+/*-
+ *   BSD LICENSE
+ *
+ *   Copyright(c) 2014-2015 Chelsio Communications.
+ *   All rights reserved.
+ *
+ *   Redistribution and use in source and binary forms, with or without
+ *   modification, are permitted provided that the following conditions
+ *   are met:
+ *
+ * * Redistributions of source code must retain the above copyright
+ *   notice, this list of conditions and the following disclaimer.
+ * * Redistributions in binary form must reproduce the above copyright
+ *   notice, this list of conditions and the following disclaimer in
+ *   the documentation and/or other materials provided with the
+ *   distribution.
+ * * Neither the name of Chelsio Communications nor the names of its
+ *   contributors may be used to endorse or promote products derived
+ *   from this software without specific prior written permission.
+ *
+ *   THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+ *   "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+ *   LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
+ *   A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
+ *   OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
+ *   SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
+ *   LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
+ *   DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
+ *   THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+ *   (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
+ *   OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+ */
+
+/* This file should not be included directly.  Include common.h instead. */
+
+#ifndef __T4_ADAPTER_H__
+#define __T4_ADAPTER_H__
+
+#include 
+
+#include "cxgbe_compat.h"
+#include "t4_regs_values.h"
+
+enum {
+   MAX_ETH_QSETS = 64,   /* # of Ethernet Tx/Rx queue sets */
+};
+
+struct adapter;
+struct sge_rspq;
+
+enum {
+   PORT_RSS_DONE = (1 << 0),
+};
+
+struct port_info {
+   struct adapter *adapter;/* adapter that this port belongs to */
+   struct rte_eth_dev *eth_dev;/* associated rte eth device */
+   struct port_stats stats_base;   /* port statistics base */
+   struct link_config link_cfg;/* link configuration info */
+
+   unsigned long flags;/* port related flags */
+   short int xact_addr_filt;   /* index of exact MAC address filter */
+
+   u16 viid;   /* associated virtual interface id */
+   s8 mdio_addr;   /* address of the PHY */
+   u8 port_type;   /* firmware port type */
+   u8 mod_type;/* firmware module type */
+   u8 port_id; /* physical port ID */
+   u8 tx_chan; /* associated channel */
+
+   u8 n_rx_qsets;  /* # of rx qsets */
+   u8 n_tx_qsets;  /* # of tx qsets */
+   u8 first_qset;  /* index of first qset */
+
+   u16 *rss;   /* rss table */
+   u8 rss_mode;/* rss mode */
+   u16 rss_size;   /* size of VI's RSS tab

[dpdk-dev] [PATCH v4 2/9] cxgbe: add cxgbe poll mode driver.

2015-06-30 Thread Rahul Lakkireddy
Adds the cxgbe poll mode driver for DPDK under the drivers/net/cxgbe directory.
This patch:

1. Adds the Makefile to compile cxgbe pmd.
2. Registers and initializes the cxgbe pmd driver.

Enable cxgbe PMD for compilation and linking with changes to:
1. config/common_linuxapp to add macros for cxgbe pmd.
2. drivers/net/Makefile to add cxgbe pmd to the compile list.
3. mk/rte.app.mk to add cxgbe pmd to link.

Update MAINTAINERS file to claim responsibility for the cxgbe PMD.

Signed-off-by: Rahul Lakkireddy 
Signed-off-by: Kumar Sanghvi 
---
v4:
- Moved cxgbe doc entry in MAINTAINERS from patch 2 to patch 9 for consistency.

v3:
- Merge patches 10 and 11 with this patch to enable compilation and update
  MAINTAINERS.  Also, update commit log.
- Add rte_pmd_cxgbe_version.map and add EXPORT_MAP and LIBABIVER to cxgbe
  Makefile.
- Use RTE_DIM macro for calculating ARRAY_SIZE.

v2:
- Move files to new directory under drivers/net/cxgbe and update commit log.
- Move eth_dev_ops to separate patches.
- Update cxgbe Makefile to use relative rather than absolute path of pmd.
- Replace bitwise operations in fls() with gcc's __builtin_clz().
- Fix an issue in adap_init0_tweaks() related to rx_dma_offset.

 MAINTAINERS |   4 +
 config/common_linuxapp  |  10 +
 drivers/net/Makefile|   1 +
 drivers/net/cxgbe/Makefile  |  78 +++
 drivers/net/cxgbe/cxgbe.h   |  48 ++
 drivers/net/cxgbe/cxgbe_compat.h| 266 +++
 drivers/net/cxgbe/cxgbe_ethdev.c| 169 +++
 drivers/net/cxgbe/cxgbe_main.c  | 706 
 drivers/net/cxgbe/rte_pmd_cxgbe_version.map |   4 +
 drivers/net/cxgbe/sge.c | 311 
 mk/rte.app.mk   |   1 +
 11 files changed, 1598 insertions(+)
 create mode 100644 drivers/net/cxgbe/Makefile
 create mode 100644 drivers/net/cxgbe/cxgbe.h
 create mode 100644 drivers/net/cxgbe/cxgbe_compat.h
 create mode 100644 drivers/net/cxgbe/cxgbe_ethdev.c
 create mode 100644 drivers/net/cxgbe/cxgbe_main.c
 create mode 100644 drivers/net/cxgbe/rte_pmd_cxgbe_version.map
 create mode 100644 drivers/net/cxgbe/sge.c

diff --git a/MAINTAINERS b/MAINTAINERS
index 54f0973..ba99f4b 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -210,6 +210,10 @@ Linux AF_PACKET
 M: John W. Linville 
 F: drivers/net/af_packet/

+Chelsio cxgbe
+M: Rahul Lakkireddy 
+F: drivers/net/cxgbe/
+
 Cisco enic
 F: drivers/net/enic/

diff --git a/config/common_linuxapp b/config/common_linuxapp
index aae22f4..5a16214 100644
--- a/config/common_linuxapp
+++ b/config/common_linuxapp
@@ -213,6 +213,16 @@ CONFIG_RTE_LIBRTE_MLX4_TX_MP_CACHE=8
 CONFIG_RTE_LIBRTE_MLX4_SOFT_COUNTERS=1

 #
+# Compile burst-oriented Chelsio Terminator 10GbE/40GbE (CXGBE) PMD
+#
+CONFIG_RTE_LIBRTE_CXGBE_PMD=y
+CONFIG_RTE_LIBRTE_CXGBE_DEBUG=n
+CONFIG_RTE_LIBRTE_CXGBE_DEBUG_REG=n
+CONFIG_RTE_LIBRTE_CXGBE_DEBUG_MBOX=n
+CONFIG_RTE_LIBRTE_CXGBE_DEBUG_TX=n
+CONFIG_RTE_LIBRTE_CXGBE_DEBUG_RX=n
+
+#
 # Compile burst-oriented Cisco ENIC PMD driver
 #
 CONFIG_RTE_LIBRTE_ENIC_PMD=y
diff --git a/drivers/net/Makefile b/drivers/net/Makefile
index 1e6648a..644cacb 100644
--- a/drivers/net/Makefile
+++ b/drivers/net/Makefile
@@ -33,6 +33,7 @@ include $(RTE_SDK)/mk/rte.vars.mk

 DIRS-$(CONFIG_RTE_LIBRTE_PMD_AF_PACKET) += af_packet
 DIRS-$(CONFIG_RTE_LIBRTE_PMD_BOND) += bonding
+DIRS-$(CONFIG_RTE_LIBRTE_CXGBE_PMD) += cxgbe
 DIRS-$(CONFIG_RTE_LIBRTE_E1000_PMD) += e1000
 DIRS-$(CONFIG_RTE_LIBRTE_ENIC_PMD) += enic
 DIRS-$(CONFIG_RTE_LIBRTE_FM10K_PMD) += fm10k
diff --git a/drivers/net/cxgbe/Makefile b/drivers/net/cxgbe/Makefile
new file mode 100644
index 000..4dfc6b0
--- /dev/null
+++ b/drivers/net/cxgbe/Makefile
@@ -0,0 +1,78 @@
+#   BSD LICENSE
+#
+#   Copyright(c) 2014-2015 Chelsio Communications.
+#   All rights reserved.
+#
+#   Redistribution and use in source and binary forms, with or without
+#   modification, are permitted provided that the following conditions
+#   are met:
+#
+# * Redistributions of source code must retain the above copyright
+#   notice, this list of conditions and the following disclaimer.
+# * Redistributions in binary form must reproduce the above copyright
+#   notice, this list of conditions and the following disclaimer in
+#   the documentation and/or other materials provided with the
+#   distribution.
+# * Neither the name of Chelsio Communications nor the names of its
+#   contributors may be used to endorse or promote products derived
+#   from this software without specific prior written permission.
+#
+#   THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+#   "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+#   LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
+#   A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
+#   OWNER OR CONTRIBUTORS BE LIABLE F

[dpdk-dev] [PATCH v4 3/9] cxgbe: add device configuration and RX support for cxgbe PMD.

2015-06-30 Thread Rahul Lakkireddy
Adds RX support for the cxgbe poll mode driver.  This patch:

1. Adds rx queue related eth_dev_ops.
2. Adds RSS support.
3. Adds dev_configure() and dev_infos_get() eth_dev_ops.
4. Adds rx_pkt_burst for receiving packets.

Signed-off-by: Rahul Lakkireddy 
Signed-off-by: Kumar Sanghvi 
---
v4:
- Fix 32-bit compilation.

v3:
- No changes.

v2:
- This patch is a subset of patch 2/5 submitted in v1.
- Cleanup some RX related macros and code.

 drivers/net/cxgbe/cxgbe.h|   6 +
 drivers/net/cxgbe/cxgbe_ethdev.c | 183 
 drivers/net/cxgbe/cxgbe_main.c   | 350 +++
 drivers/net/cxgbe/sge.c  | 915 +++
 4 files changed, 1454 insertions(+)

diff --git a/drivers/net/cxgbe/cxgbe.h b/drivers/net/cxgbe/cxgbe.h
index 44d48dc..90d1db0 100644
--- a/drivers/net/cxgbe/cxgbe.h
+++ b/drivers/net/cxgbe/cxgbe.h
@@ -44,5 +44,11 @@
 #define CXGBE_DEFAULT_RX_DESC_SIZE 1024 /* Default RX ring size */

 int cxgbe_probe(struct adapter *adapter);
+void init_rspq(struct adapter *adap, struct sge_rspq *q, unsigned int us,
+  unsigned int cnt, unsigned int size, unsigned int iqe_size);
+int setup_sge_fwevtq(struct adapter *adapter);
+void cfg_queues(struct rte_eth_dev *eth_dev);
+int cfg_queue_count(struct rte_eth_dev *eth_dev);
+int setup_rss(struct port_info *pi);

 #endif /* _CXGBE_H_ */
diff --git a/drivers/net/cxgbe/cxgbe_ethdev.c b/drivers/net/cxgbe/cxgbe_ethdev.c
index 30d39b4..1c69973 100644
--- a/drivers/net/cxgbe/cxgbe_ethdev.c
+++ b/drivers/net/cxgbe/cxgbe_ethdev.c
@@ -85,7 +85,189 @@
  */
 #include "t4_pci_id_tbl.h"

+static uint16_t cxgbe_recv_pkts(void *rx_queue, struct rte_mbuf **rx_pkts,
+   uint16_t nb_pkts)
+{
+   struct sge_eth_rxq *rxq = (struct sge_eth_rxq *)rx_queue;
+   unsigned int work_done;
+
+   CXGBE_DEBUG_RX(adapter, "%s: rxq->rspq.cntxt_id = %u; nb_pkts = %d\n",
+  __func__, rxq->rspq.cntxt_id, nb_pkts);
+
+   if (cxgbe_poll(&rxq->rspq, rx_pkts, (unsigned int)nb_pkts, &work_done))
+   dev_err(adapter, "error in cxgbe poll\n");
+
+   CXGBE_DEBUG_RX(adapter, "%s: work_done = %u\n", __func__, work_done);
+   return work_done;
+}
+
+static void cxgbe_dev_info_get(struct rte_eth_dev *eth_dev,
+  struct rte_eth_dev_info *device_info)
+{
+   struct port_info *pi = (struct port_info *)(eth_dev->data->dev_private);
+   struct adapter *adapter = pi->adapter;
+   int max_queues = adapter->sge.max_ethqsets / adapter->params.nports;
+
+   device_info->min_rx_bufsize = 68; /* XXX: Smallest pkt size */
+   device_info->max_rx_pktlen = 1500; /* XXX: For now we support mtu */
+   device_info->max_rx_queues = max_queues;
+   device_info->max_tx_queues = max_queues;
+   device_info->max_mac_addrs = 1;
+   /* XXX: For now we support one MAC/port */
+   device_info->max_vfs = adapter->params.arch.vfcount;
+   device_info->max_vmdq_pools = 0; /* XXX: For now no support for VMDQ */
+
+   device_info->rx_offload_capa = DEV_RX_OFFLOAD_VLAN_STRIP |
+  DEV_RX_OFFLOAD_IPV4_CKSUM |
+  DEV_RX_OFFLOAD_UDP_CKSUM |
+  DEV_RX_OFFLOAD_TCP_CKSUM;
+
+   device_info->tx_offload_capa = DEV_TX_OFFLOAD_VLAN_INSERT |
+  DEV_TX_OFFLOAD_IPV4_CKSUM |
+  DEV_TX_OFFLOAD_UDP_CKSUM |
+  DEV_TX_OFFLOAD_TCP_CKSUM |
+  DEV_TX_OFFLOAD_TCP_TSO;
+
+   device_info->reta_size = pi->rss_size;
+}
+
+static int cxgbe_dev_rx_queue_start(struct rte_eth_dev *eth_dev,
+   uint16_t tx_queue_id);
+static void cxgbe_dev_rx_queue_release(void *q);
+
+static int cxgbe_dev_configure(struct rte_eth_dev *eth_dev)
+{
+   struct port_info *pi = (struct port_info *)(eth_dev->data->dev_private);
+   struct adapter *adapter = pi->adapter;
+   int err;
+
+   CXGBE_FUNC_TRACE();
+
+   if (!(adapter->flags & FW_QUEUE_BOUND)) {
+   err = setup_sge_fwevtq(adapter);
+   if (err)
+   return err;
+   adapter->flags |= FW_QUEUE_BOUND;
+   }
+
+   err = cfg_queue_count(eth_dev);
+   if (err)
+   return err;
+
+   return 0;
+}
+
+static int cxgbe_dev_rx_queue_start(struct rte_eth_dev *eth_dev,
+   uint16_t rx_queue_id)
+{
+   struct port_info *pi = (struct port_info *)(eth_dev->data->dev_private);
+   struct adapter *adap = pi->adapter;
+   struct sge_rspq *q;
+
+   dev_debug(adapter, "%s: pi->port_id = %d; rx_queue_id = %d\n",
+ __func__, pi->port_id, rx_queue_id);
+
+   q = eth_dev->data->rx_queues[rx_queue_id];
+   return t4_sge_eth_rxq_start(adap, q);
+}
+
+static int cxgbe_dev_rx_queue_stop(s

[dpdk-dev] [PATCH v4 4/9] cxgbe: add TX support for cxgbe PMD.

2015-06-30 Thread Rahul Lakkireddy
Adds TX support for the cxgbe poll mode driver.  This patch:

1. Adds tx queue related eth_dev_ops.
2. Adds tx_pkt_burst for transmitting packets.

Signed-off-by: Rahul Lakkireddy 
Signed-off-by: Kumar Sanghvi 
---
v4:
- Remove unused code to fix clang compilation.

v3:
- No changes.

v2:
- This patch is a subset of patch 2/5 submitted in v1.
- Few bug fixes for tx path.

 drivers/net/cxgbe/cxgbe_ethdev.c | 133 ++
 drivers/net/cxgbe/cxgbe_main.c   |   1 +
 drivers/net/cxgbe/sge.c  | 957 +++
 3 files changed, 1091 insertions(+)

diff --git a/drivers/net/cxgbe/cxgbe_ethdev.c b/drivers/net/cxgbe/cxgbe_ethdev.c
index 1c69973..b6e17e4 100644
--- a/drivers/net/cxgbe/cxgbe_ethdev.c
+++ b/drivers/net/cxgbe/cxgbe_ethdev.c
@@ -85,6 +85,39 @@
  */
 #include "t4_pci_id_tbl.h"

+static uint16_t cxgbe_xmit_pkts(void *tx_queue, struct rte_mbuf **tx_pkts,
+   uint16_t nb_pkts)
+{
+   struct sge_eth_txq *txq = (struct sge_eth_txq *)tx_queue;
+   uint16_t pkts_sent, pkts_remain;
+   uint16_t total_sent = 0;
+   int ret = 0;
+
+   CXGBE_DEBUG_TX(adapter, "%s: txq = %p; tx_pkts = %p; nb_pkts = %d\n",
+  __func__, txq, tx_pkts, nb_pkts);
+
+   t4_os_lock(&txq->txq_lock);
+   /* free up desc from already completed tx */
+   reclaim_completed_tx(&txq->q);
+   while (total_sent < nb_pkts) {
+   pkts_remain = nb_pkts - total_sent;
+
+   for (pkts_sent = 0; pkts_sent < pkts_remain; pkts_sent++) {
+   ret = t4_eth_xmit(txq, tx_pkts[total_sent + pkts_sent]);
+   if (ret < 0)
+   break;
+   }
+   if (!pkts_sent)
+   break;
+   total_sent += pkts_sent;
+   /* reclaim as much as possible */
+   reclaim_completed_tx(&txq->q);
+   }
+
+   t4_os_unlock(&txq->txq_lock);
+   return total_sent;
+}
+
 static uint16_t cxgbe_recv_pkts(void *rx_queue, struct rte_mbuf **rx_pkts,
uint16_t nb_pkts)
 {
@@ -131,8 +164,11 @@ static void cxgbe_dev_info_get(struct rte_eth_dev *eth_dev,
device_info->reta_size = pi->rss_size;
 }

+static int cxgbe_dev_tx_queue_start(struct rte_eth_dev *eth_dev,
+   uint16_t tx_queue_id);
 static int cxgbe_dev_rx_queue_start(struct rte_eth_dev *eth_dev,
uint16_t tx_queue_id);
+static void cxgbe_dev_tx_queue_release(void *q);
 static void cxgbe_dev_rx_queue_release(void *q);

 static int cxgbe_dev_configure(struct rte_eth_dev *eth_dev)
@@ -157,6 +193,98 @@ static int cxgbe_dev_configure(struct rte_eth_dev *eth_dev)
return 0;
 }

+static int cxgbe_dev_tx_queue_start(struct rte_eth_dev *eth_dev,
+   uint16_t tx_queue_id)
+{
+   struct sge_eth_txq *txq = (struct sge_eth_txq *)
+ (eth_dev->data->tx_queues[tx_queue_id]);
+
+   dev_debug(NULL, "%s: tx_queue_id = %d\n", __func__, tx_queue_id);
+
+   return t4_sge_eth_txq_start(txq);
+}
+
+static int cxgbe_dev_tx_queue_stop(struct rte_eth_dev *eth_dev,
+  uint16_t tx_queue_id)
+{
+   struct sge_eth_txq *txq = (struct sge_eth_txq *)
+ (eth_dev->data->tx_queues[tx_queue_id]);
+
+   dev_debug(NULL, "%s: tx_queue_id = %d\n", __func__, tx_queue_id);
+
+   return t4_sge_eth_txq_stop(txq);
+}
+
+static int cxgbe_dev_tx_queue_setup(struct rte_eth_dev *eth_dev,
+   uint16_t queue_idx, uint16_t nb_desc,
+   unsigned int socket_id,
+   const struct rte_eth_txconf *tx_conf)
+{
+   struct port_info *pi = (struct port_info *)(eth_dev->data->dev_private);
+   struct adapter *adapter = pi->adapter;
+   struct sge *s = &adapter->sge;
+   struct sge_eth_txq *txq = &s->ethtxq[pi->first_qset + queue_idx];
+   int err = 0;
+   unsigned int temp_nb_desc;
+
+   RTE_SET_USED(tx_conf);
+
+   dev_debug(adapter, "%s: eth_dev->data->nb_tx_queues = %d; queue_idx = 
%d; nb_desc = %d; socket_id = %d; pi->first_qset = %u\n",
+ __func__, eth_dev->data->nb_tx_queues, queue_idx, nb_desc,
+ socket_id, pi->first_qset);
+
+   /*  Free up the existing queue  */
+   if (eth_dev->data->tx_queues[queue_idx]) {
+   cxgbe_dev_tx_queue_release(eth_dev->data->tx_queues[queue_idx]);
+   eth_dev->data->tx_queues[queue_idx] = NULL;
+   }
+
+   eth_dev->data->tx_queues[queue_idx] = (void *)txq;
+
+   /* Sanity Checking
+*
+* nb_desc should be > 1023 and <= CXGBE_MAX_RING_DESC_SIZE
+*/
+   temp_nb_desc = nb_desc;
+   if (nb_desc < CXGBE_MIN_RING_DESC_SIZE) {
+   dev_warn(adapter, "%s: number of descriptors must be >=

[dpdk-dev] [PATCH v4 5/9] cxgbe: add device related operations for cxgbe PMD.

2015-06-30 Thread Rahul Lakkireddy
Adds dev_start(), dev_stop(), and dev_close() eth_dev_ops for cxgbe poll
mode driver.

Signed-off-by: Rahul Lakkireddy 
Signed-off-by: Kumar Sanghvi 
---
v4:
- No changes.

v3:
- No changes.

v2:
- This patch is a subset of patch 2/5 submitted in v1.
- Few changes related to tx bug fixes.

 drivers/net/cxgbe/cxgbe.h|   4 ++
 drivers/net/cxgbe/cxgbe_ethdev.c | 114 
 drivers/net/cxgbe/cxgbe_main.c   | 139 +++
 drivers/net/cxgbe/sge.c  |  58 
 4 files changed, 315 insertions(+)

diff --git a/drivers/net/cxgbe/cxgbe.h b/drivers/net/cxgbe/cxgbe.h
index 90d1db0..bf08baf 100644
--- a/drivers/net/cxgbe/cxgbe.h
+++ b/drivers/net/cxgbe/cxgbe.h
@@ -44,6 +44,10 @@
 #define CXGBE_DEFAULT_RX_DESC_SIZE 1024 /* Default RX ring size */

 int cxgbe_probe(struct adapter *adapter);
+int cxgbe_up(struct adapter *adap);
+int cxgbe_down(struct port_info *pi);
+void cxgbe_close(struct adapter *adapter);
+int link_start(struct port_info *pi);
 void init_rspq(struct adapter *adap, struct sge_rspq *q, unsigned int us,
   unsigned int cnt, unsigned int size, unsigned int iqe_size);
 int setup_sge_fwevtq(struct adapter *adapter);
diff --git a/drivers/net/cxgbe/cxgbe_ethdev.c b/drivers/net/cxgbe/cxgbe_ethdev.c
index b6e17e4..cb100fc 100644
--- a/drivers/net/cxgbe/cxgbe_ethdev.c
+++ b/drivers/net/cxgbe/cxgbe_ethdev.c
@@ -171,6 +171,117 @@ static int cxgbe_dev_rx_queue_start(struct rte_eth_dev 
*eth_dev,
 static void cxgbe_dev_tx_queue_release(void *q);
 static void cxgbe_dev_rx_queue_release(void *q);

+/*
+ * Stop device.
+ */
+static void cxgbe_dev_close(struct rte_eth_dev *eth_dev)
+{
+   struct port_info *pi = (struct port_info *)(eth_dev->data->dev_private);
+   struct adapter *adapter = pi->adapter;
+   int i, dev_down = 0;
+
+   CXGBE_FUNC_TRACE();
+
+   if (!(adapter->flags & FULL_INIT_DONE))
+   return;
+
+   cxgbe_down(pi);
+
+   /*
+*  We clear queues only if both tx and rx path of the port
+*  have been disabled
+*/
+   t4_sge_eth_clear_queues(pi);
+
+   /*  See if all ports are down */
+   for_each_port(adapter, i) {
+   pi = adap2pinfo(adapter, i);
+   /*
+* Skip first port of the adapter since it will be closed
+* by DPDK
+*/
+   if (i == 0)
+   continue;
+   dev_down += (pi->eth_dev->data->dev_started == 0) ? 1 : 0;
+   }
+
+   /* If rest of the ports are stopped, then free up resources */
+   if (dev_down == (adapter->params.nports - 1))
+   cxgbe_close(adapter);
+}
+
+/* Start the device.
+ * It returns 0 on success.
+ */
+static int cxgbe_dev_start(struct rte_eth_dev *eth_dev)
+{
+   struct port_info *pi = (struct port_info *)(eth_dev->data->dev_private);
+   struct adapter *adapter = pi->adapter;
+   int err = 0, i;
+
+   CXGBE_FUNC_TRACE();
+
+   /*
+* If we don't have a connection to the firmware there's nothing we
+* can do.
+*/
+   if (!(adapter->flags & FW_OK)) {
+   err = -ENXIO;
+   goto out;
+   }
+
+   if (!(adapter->flags & FULL_INIT_DONE)) {
+   err = cxgbe_up(adapter);
+   if (err < 0)
+   goto out;
+   }
+
+   err = setup_rss(pi);
+   if (err)
+   goto out;
+
+   for (i = 0; i < pi->n_tx_qsets; i++) {
+   err = cxgbe_dev_tx_queue_start(eth_dev, i);
+   if (err)
+   goto out;
+   }
+
+   for (i = 0; i < pi->n_rx_qsets; i++) {
+   err = cxgbe_dev_rx_queue_start(eth_dev, i);
+   if (err)
+   goto out;
+   }
+
+   err = link_start(pi);
+   if (err)
+   goto out;
+
+out:
+   return err;
+}
+
+/*
+ * Stop device: disable rx and tx functions to allow for reconfiguring.
+ */
+static void cxgbe_dev_stop(struct rte_eth_dev *eth_dev)
+{
+   struct port_info *pi = (struct port_info *)(eth_dev->data->dev_private);
+   struct adapter *adapter = pi->adapter;
+
+   CXGBE_FUNC_TRACE();
+
+   if (!(adapter->flags & FULL_INIT_DONE))
+   return;
+
+   cxgbe_down(pi);
+
+   /*
+*  We clear queues only if both tx and rx path of the port
+*  have been disabled
+*/
+   t4_sge_eth_clear_queues(pi);
+}
+
 static int cxgbe_dev_configure(struct rte_eth_dev *eth_dev)
 {
struct port_info *pi = (struct port_info *)(eth_dev->data->dev_private);
@@ -390,6 +501,9 @@ static void cxgbe_dev_rx_queue_release(void *q)
 }

 static struct eth_dev_ops cxgbe_eth_dev_ops = {
+   .dev_start  = cxgbe_dev_start,
+   .dev_stop   = cxgbe_dev_stop,
+   .dev_close  = cxgbe_dev_close,
.dev_configure  = cxgbe_dev_configure,
   

[dpdk-dev] [PATCH v4 6/9] cxgbe: add port statistics for cxgbe PMD.

2015-06-30 Thread Rahul Lakkireddy
Adds stats_get() and stats_reset() eth_dev_ops for cxgbe poll mode driver.

Signed-off-by: Rahul Lakkireddy 
Signed-off-by: Kumar Sanghvi 
---
v4:
- No changes.

v3:
- No changes.

v2:
- This patch is a subset of patch 2/5 submitted in v1.

 drivers/net/cxgbe/cxgbe.h|  2 +
 drivers/net/cxgbe/cxgbe_ethdev.c | 83 
 drivers/net/cxgbe/cxgbe_main.c   | 11 ++
 3 files changed, 96 insertions(+)

diff --git a/drivers/net/cxgbe/cxgbe.h b/drivers/net/cxgbe/cxgbe.h
index bf08baf..97c37d2 100644
--- a/drivers/net/cxgbe/cxgbe.h
+++ b/drivers/net/cxgbe/cxgbe.h
@@ -47,6 +47,8 @@ int cxgbe_probe(struct adapter *adapter);
 int cxgbe_up(struct adapter *adap);
 int cxgbe_down(struct port_info *pi);
 void cxgbe_close(struct adapter *adapter);
+void cxgbe_stats_get(struct port_info *pi, struct port_stats *stats);
+void cxgbe_stats_reset(struct port_info *pi);
 int link_start(struct port_info *pi);
 void init_rspq(struct adapter *adap, struct sge_rspq *q, unsigned int us,
   unsigned int cnt, unsigned int size, unsigned int iqe_size);
diff --git a/drivers/net/cxgbe/cxgbe_ethdev.c b/drivers/net/cxgbe/cxgbe_ethdev.c
index cb100fc..600a16c 100644
--- a/drivers/net/cxgbe/cxgbe_ethdev.c
+++ b/drivers/net/cxgbe/cxgbe_ethdev.c
@@ -500,6 +500,87 @@ static void cxgbe_dev_rx_queue_release(void *q)
}
 }

+/*
+ * Get port statistics.
+ */
+static void cxgbe_dev_stats_get(struct rte_eth_dev *eth_dev,
+   struct rte_eth_stats *eth_stats)
+{
+   struct port_info *pi = (struct port_info *)(eth_dev->data->dev_private);
+   struct adapter *adapter = pi->adapter;
+   struct sge *s = &adapter->sge;
+   struct port_stats ps;
+   unsigned int i;
+
+   cxgbe_stats_get(pi, &ps);
+
+   /* RX Stats */
+   eth_stats->ipackets = ps.rx_frames;
+   eth_stats->ibytes   = ps.rx_octets;
+   eth_stats->imcasts  = ps.rx_mcast_frames;
+   eth_stats->imissed  = ps.rx_ovflow0 + ps.rx_ovflow1 +
+ ps.rx_ovflow2 + ps.rx_ovflow3 +
+ ps.rx_trunc0 + ps.rx_trunc1 +
+ ps.rx_trunc2 + ps.rx_trunc3;
+   eth_stats->ibadcrc  = ps.rx_fcs_err;
+   eth_stats->ibadlen  = ps.rx_jabber + ps.rx_too_long + ps.rx_runt;
+   eth_stats->ierrors  = ps.rx_symbol_err + eth_stats->ibadcrc +
+ eth_stats->ibadlen + ps.rx_len_err +
+ eth_stats->imissed;
+   eth_stats->rx_pause_xon  = ps.rx_pause;
+
+   /* TX Stats */
+   eth_stats->opackets = ps.tx_frames;
+   eth_stats->obytes   = ps.tx_octets;
+   eth_stats->oerrors  = ps.tx_error_frames;
+   eth_stats->tx_pause_xon  = ps.tx_pause;
+
+   for (i = 0; i < pi->n_rx_qsets; i++) {
+   struct sge_eth_rxq *rxq =
+   &s->ethrxq[pi->first_qset + i];
+
+   eth_stats->q_ipackets[i] = rxq->stats.pkts;
+   eth_stats->q_ibytes[i] = rxq->stats.rx_bytes;
+   }
+
+   for (i = 0; i < pi->n_tx_qsets; i++) {
+   struct sge_eth_txq *txq =
+   &s->ethtxq[pi->first_qset + i];
+
+   eth_stats->q_opackets[i] = txq->stats.pkts;
+   eth_stats->q_obytes[i] = txq->stats.tx_bytes;
+   eth_stats->q_errors[i] = txq->stats.mapping_err;
+   }
+}
+
+/*
+ * Reset port statistics.
+ */
+static void cxgbe_dev_stats_reset(struct rte_eth_dev *eth_dev)
+{
+   struct port_info *pi = (struct port_info *)(eth_dev->data->dev_private);
+   struct adapter *adapter = pi->adapter;
+   struct sge *s = &adapter->sge;
+   unsigned int i;
+
+   cxgbe_stats_reset(pi);
+   for (i = 0; i < pi->n_rx_qsets; i++) {
+   struct sge_eth_rxq *rxq =
+   &s->ethrxq[pi->first_qset + i];
+
+   rxq->stats.pkts = 0;
+   rxq->stats.rx_bytes = 0;
+   }
+   for (i = 0; i < pi->n_tx_qsets; i++) {
+   struct sge_eth_txq *txq =
+   &s->ethtxq[pi->first_qset + i];
+
+   txq->stats.pkts = 0;
+   txq->stats.tx_bytes = 0;
+   txq->stats.mapping_err = 0;
+   }
+}
+
 static struct eth_dev_ops cxgbe_eth_dev_ops = {
.dev_start  = cxgbe_dev_start,
.dev_stop   = cxgbe_dev_stop,
@@ -514,6 +595,8 @@ static struct eth_dev_ops cxgbe_eth_dev_ops = {
.rx_queue_start = cxgbe_dev_rx_queue_start,
.rx_queue_stop  = cxgbe_dev_rx_queue_stop,
.rx_queue_release   = cxgbe_dev_rx_queue_release,
+   .stats_get  = cxgbe_dev_stats_get,
+   .stats_reset= cxgbe_dev_stats_reset,
 };

 /*
diff --git a/drivers/net/cxgbe/cxgbe_main.c b/drivers/net/cxgbe/cxgbe_main.c
index 7995d3c..dad0a98 100644
--- a/drivers/net/cxgbe/cxgbe_main.c
+++ b/drivers/net/cxgbe/cxgbe_main.c
@@ -310,6 +310,17 @@ void cfg_queues(struct rte_eth_dev *eth_de

[dpdk-dev] [PATCH v4 7/9] cxgbe: add link related functions for cxgbe PMD.

2015-06-30 Thread Rahul Lakkireddy
Adds link update, promiscuous and multicast related eth_dev_ops for cxgbe poll
mode driver.
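
For reference, a minimal application-side sketch of how these ops are driven
through the generic ethdev API (illustration only, not part of the patch; the port
id is a placeholder):

#include <stdio.h>
#include <rte_ethdev.h>

/* Enable promiscuous/all-multicast reception and poll the link state. */
static void set_rxmode_and_check_link(uint8_t port_id)
{
        struct rte_eth_link link;

        rte_eth_promiscuous_enable(port_id);   /* -> .promiscuous_enable  */
        rte_eth_allmulticast_enable(port_id);  /* -> .allmulticast_enable */

        /* -> .link_update, with wait_to_complete == 0 */
        rte_eth_link_get_nowait(port_id, &link);
        printf("port %u link %s, %u Mbps\n", (unsigned)port_id,
               link.link_status ? "up" : "down",
               (unsigned)link.link_speed);
}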

Signed-off-by: Rahul Lakkireddy 
Signed-off-by: Kumar Sanghvi 
---
v4:
- No changes.

v3:
- No changes.

v2:
- This patch is a subset of patch 2/5 submitted in v1.
- Update cxgbe_dev_link_update() to return correct value.

 drivers/net/cxgbe/cxgbe_ethdev.c | 66 
 1 file changed, 66 insertions(+)

diff --git a/drivers/net/cxgbe/cxgbe_ethdev.c b/drivers/net/cxgbe/cxgbe_ethdev.c
index 600a16c..c0dd5f3 100644
--- a/drivers/net/cxgbe/cxgbe_ethdev.c
+++ b/drivers/net/cxgbe/cxgbe_ethdev.c
@@ -164,6 +164,67 @@ static void cxgbe_dev_info_get(struct rte_eth_dev *eth_dev,
device_info->reta_size = pi->rss_size;
 }

+static void cxgbe_dev_promiscuous_enable(struct rte_eth_dev *eth_dev)
+{
+   struct port_info *pi = (struct port_info *)(eth_dev->data->dev_private);
+   struct adapter *adapter = pi->adapter;
+
+   t4_set_rxmode(adapter, adapter->mbox, pi->viid, -1,
+ 1, -1, 1, -1, false);
+}
+
+static void cxgbe_dev_promiscuous_disable(struct rte_eth_dev *eth_dev)
+{
+   struct port_info *pi = (struct port_info *)(eth_dev->data->dev_private);
+   struct adapter *adapter = pi->adapter;
+
+   t4_set_rxmode(adapter, adapter->mbox, pi->viid, -1,
+ 0, -1, 1, -1, false);
+}
+
+static void cxgbe_dev_allmulticast_enable(struct rte_eth_dev *eth_dev)
+{
+   struct port_info *pi = (struct port_info *)(eth_dev->data->dev_private);
+   struct adapter *adapter = pi->adapter;
+
+   /* TODO: address filters ?? */
+
+   t4_set_rxmode(adapter, adapter->mbox, pi->viid, -1,
+ -1, 1, 1, -1, false);
+}
+
+static void cxgbe_dev_allmulticast_disable(struct rte_eth_dev *eth_dev)
+{
+   struct port_info *pi = (struct port_info *)(eth_dev->data->dev_private);
+   struct adapter *adapter = pi->adapter;
+
+   /* TODO: address filters ?? */
+
+   t4_set_rxmode(adapter, adapter->mbox, pi->viid, -1,
+ -1, 0, 1, -1, false);
+}
+
+static int cxgbe_dev_link_update(struct rte_eth_dev *eth_dev,
+__rte_unused int wait_to_complete)
+{
+   struct port_info *pi = (struct port_info *)(eth_dev->data->dev_private);
+   struct adapter *adapter = pi->adapter;
+   struct sge *s = &adapter->sge;
+   struct rte_eth_link *old_link = ð_dev->data->dev_link;
+   unsigned int work_done, budget = 4;
+
+   cxgbe_poll(&s->fw_evtq, NULL, budget, &work_done);
+   if (old_link->link_status == pi->link_cfg.link_ok)
+   return -1;  /* link not changed */
+
+   eth_dev->data->dev_link.link_status = pi->link_cfg.link_ok;
+   eth_dev->data->dev_link.link_duplex = ETH_LINK_FULL_DUPLEX;
+   eth_dev->data->dev_link.link_speed = pi->link_cfg.speed;
+
+   /* link has changed */
+   return 0;
+}
+
 static int cxgbe_dev_tx_queue_start(struct rte_eth_dev *eth_dev,
uint16_t tx_queue_id);
 static int cxgbe_dev_rx_queue_start(struct rte_eth_dev *eth_dev,
@@ -585,8 +646,13 @@ static struct eth_dev_ops cxgbe_eth_dev_ops = {
.dev_start  = cxgbe_dev_start,
.dev_stop   = cxgbe_dev_stop,
.dev_close  = cxgbe_dev_close,
+   .promiscuous_enable = cxgbe_dev_promiscuous_enable,
+   .promiscuous_disable= cxgbe_dev_promiscuous_disable,
+   .allmulticast_enable= cxgbe_dev_allmulticast_enable,
+   .allmulticast_disable   = cxgbe_dev_allmulticast_disable,
.dev_configure  = cxgbe_dev_configure,
.dev_infos_get  = cxgbe_dev_info_get,
+   .link_update= cxgbe_dev_link_update,
.tx_queue_setup = cxgbe_dev_tx_queue_setup,
.tx_queue_start = cxgbe_dev_tx_queue_start,
.tx_queue_stop  = cxgbe_dev_tx_queue_stop,
-- 
2.4.1



[dpdk-dev] [PATCH v4 8/9] cxgbe: add flow control functions for cxgbe PMD.

2015-06-30 Thread Rahul Lakkireddy
Adds flow control related eth_dev_ops for cxgbe poll mode driver.
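
For reference, a minimal application-side sketch of how these ops are exercised
through the generic ethdev API (illustration only, not part of the patch; the port
id is a placeholder and error handling is trimmed):

#include <rte_ethdev.h>

/* Read the current pause settings, then request full flow control. */
static int force_full_flow_ctrl(uint8_t port_id)
{
        struct rte_eth_fc_conf fc_conf;
        int ret;

        ret = rte_eth_dev_flow_ctrl_get(port_id, &fc_conf); /* -> .flow_ctrl_get */
        if (ret != 0)
                return ret;

        fc_conf.mode = RTE_FC_FULL; /* ask for both RX and TX pause */
        return rte_eth_dev_flow_ctrl_set(port_id, &fc_conf); /* -> .flow_ctrl_set */
}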

Signed-off-by: Rahul Lakkireddy 
Signed-off-by: Kumar Sanghvi 
---
v4:
- No changes.

v3:
- No changes.

v2:
- This patch is a subset of patch 2/5 submitted in v1.

 drivers/net/cxgbe/cxgbe_ethdev.c | 54 
 1 file changed, 54 insertions(+)

diff --git a/drivers/net/cxgbe/cxgbe_ethdev.c b/drivers/net/cxgbe/cxgbe_ethdev.c
index c0dd5f3..478051a 100644
--- a/drivers/net/cxgbe/cxgbe_ethdev.c
+++ b/drivers/net/cxgbe/cxgbe_ethdev.c
@@ -642,6 +642,58 @@ static void cxgbe_dev_stats_reset(struct rte_eth_dev 
*eth_dev)
}
 }

+static int cxgbe_flow_ctrl_get(struct rte_eth_dev *eth_dev,
+  struct rte_eth_fc_conf *fc_conf)
+{
+   struct port_info *pi = (struct port_info *)(eth_dev->data->dev_private);
+   struct link_config *lc = &pi->link_cfg;
+   int rx_pause, tx_pause;
+
+   fc_conf->autoneg = lc->fc & PAUSE_AUTONEG;
+   rx_pause = lc->fc & PAUSE_RX;
+   tx_pause = lc->fc & PAUSE_TX;
+
+   if (rx_pause && tx_pause)
+   fc_conf->mode = RTE_FC_FULL;
+   else if (rx_pause)
+   fc_conf->mode = RTE_FC_RX_PAUSE;
+   else if (tx_pause)
+   fc_conf->mode = RTE_FC_TX_PAUSE;
+   else
+   fc_conf->mode = RTE_FC_NONE;
+   return 0;
+}
+
+static int cxgbe_flow_ctrl_set(struct rte_eth_dev *eth_dev,
+  struct rte_eth_fc_conf *fc_conf)
+{
+   struct port_info *pi = (struct port_info *)(eth_dev->data->dev_private);
+   struct adapter *adapter = pi->adapter;
+   struct link_config *lc = &pi->link_cfg;
+
+   if (lc->supported & FW_PORT_CAP_ANEG) {
+   if (fc_conf->autoneg)
+   lc->requested_fc |= PAUSE_AUTONEG;
+   else
+   lc->requested_fc &= ~PAUSE_AUTONEG;
+   }
+
+   if (((fc_conf->mode & RTE_FC_FULL) == RTE_FC_FULL) ||
+   (fc_conf->mode & RTE_FC_RX_PAUSE))
+   lc->requested_fc |= PAUSE_RX;
+   else
+   lc->requested_fc &= ~PAUSE_RX;
+
+   if (((fc_conf->mode & RTE_FC_FULL) == RTE_FC_FULL) ||
+   (fc_conf->mode & RTE_FC_TX_PAUSE))
+   lc->requested_fc |= PAUSE_TX;
+   else
+   lc->requested_fc &= ~PAUSE_TX;
+
+   return t4_link_l1cfg(adapter, adapter->mbox, pi->tx_chan,
+&pi->link_cfg);
+}
+
 static struct eth_dev_ops cxgbe_eth_dev_ops = {
.dev_start  = cxgbe_dev_start,
.dev_stop   = cxgbe_dev_stop,
@@ -663,6 +715,8 @@ static struct eth_dev_ops cxgbe_eth_dev_ops = {
.rx_queue_release   = cxgbe_dev_rx_queue_release,
.stats_get  = cxgbe_dev_stats_get,
.stats_reset= cxgbe_dev_stats_reset,
+   .flow_ctrl_get  = cxgbe_flow_ctrl_get,
+   .flow_ctrl_set  = cxgbe_flow_ctrl_set,
 };

 /*
-- 
2.4.1



[dpdk-dev] [PATCH v4 9/9] doc: add cxgbe PMD documentation under doc/guides/nics/cxgbe.rst

2015-06-30 Thread Rahul Lakkireddy
Adds cxgbe poll mode driver documentation under the usual doc/guides/nics/
directory with the rest of the drivers.  The documentation covers cxgbe
implementation details, features and limitations, prerequisites, configuration,
and sample application usage.

Signed-off-by: Rahul Lakkireddy 
Signed-off-by: Kumar Sanghvi 
---
v4:
- Add cxgbe doc entry to MAINTAINERS.

v3:
- No changes.

v2:
- Move cxgbe entry to Drivers section in source_org.rst.
- Order the cxgbe entry alphabetically in index.rst.

 MAINTAINERS  |   1 +
 doc/guides/nics/cxgbe.rst| 209 +++
 doc/guides/nics/index.rst|   1 +
 doc/guides/prog_guide/source_org.rst |   1 +
 4 files changed, 212 insertions(+)
 create mode 100644 doc/guides/nics/cxgbe.rst

diff --git a/MAINTAINERS b/MAINTAINERS
index ba99f4b..659672e 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -213,6 +213,7 @@ F: drivers/net/af_packet/
 Chelsio cxgbe
 M: Rahul Lakkireddy 
 F: drivers/net/cxgbe/
+F: doc/guides/nics/cxgbe.rst

 Cisco enic
 F: drivers/net/enic/
diff --git a/doc/guides/nics/cxgbe.rst b/doc/guides/nics/cxgbe.rst
new file mode 100644
index 000..54a019d
--- /dev/null
+++ b/doc/guides/nics/cxgbe.rst
@@ -0,0 +1,209 @@
+..  BSD LICENSE
+Copyright 2015 Chelsio Communications.
+All rights reserved.
+
+Redistribution and use in source and binary forms, with or without
+modification, are permitted provided that the following conditions
+are met:
+
+* Redistributions of source code must retain the above copyright
+notice, this list of conditions and the following disclaimer.
+* Redistributions in binary form must reproduce the above copyright
+notice, this list of conditions and the following disclaimer in
+the documentation and/or other materials provided with the
+distribution.
+* Neither the name of Chelsio Communications nor the names of its
+contributors may be used to endorse or promote products derived
+from this software without specific prior written permission.
+
+THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+"AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
+A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
+OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
+SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
+LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
+DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
+THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+(INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
+OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+
+CXGBE Poll Mode Driver
+==
+
+The CXGBE PMD (**librte_pmd_cxgbe**) provides poll mode driver support
+for **Chelsio T5** 10/40 Gbps family of adapters.
+
+More information can be found at `Chelsio Communications
+`_.
+
+Features
+
+
+CXGBE PMD has the support for:
+
+- Multiple queues for TX and RX.
+- Receiver Side Steering (RSS).
+- VLAN filtering.
+- Checksum offload.
+- Promiscuous mode.
+- All multicast mode.
+- Port hardware statistics.
+
+Limitations
+---
+
+The Chelsio T5 devices provide two/four ports but expose a single PCI bus
+address, thus, librte_pmd_cxgbe registers itself as a
+PCI driver that allocates one Ethernet device per detected port.
+
+For this reason, one cannot white/blacklist a single port without also
+white/blacklisting the others on the same device.
+
+Configuration
+-
+
+Compiling CXGBE PMD
+~~~
+
+These options can be modified in the ``.config`` file.
+
+- ``CONFIG_RTE_LIBRTE_CXGBE_PMD`` (default **y**)
+
+  Toggle compilation of librte_pmd_cxgbe driver.
+
+- ``CONFIG_RTE_LIBRTE_CXGBE_DEBUG`` (default **n**)
+
+  Toggle debugging code. Enabling this option adds additional generic debugging
+  messages at the cost of lower performance.
+
+- ``CONFIG_RTE_LIBRTE_CXGBE_DEBUG_REG`` (default **n**)
+
+  Toggle debugging code. Enabling this option adds additional registers related
+  run-time checks and debugging messages at the cost of lower performance.
+
+- ``CONFIG_RTE_LIBRTE_CXGBE_DEBUG_MBOX`` (default **n**)
+
+  Toggle debugging code. Enabling this option adds additional firmware mailbox
+  related run-time checks and debugging messages at the cost of lower
+  performance.
+
+- ``CONFIG_RTE_LIBRTE_CXGBE_DEBUG_TX`` (default **n**)
+
+  Toggle debugging code. Enabling this option adds additional transmission data
+  path run-time checks and debugging messages at the cost of lower performance.
+
+- ``CONFIG_RTE_LIBRTE_CXGBE_DEBUG_RX`` (default **n**)
+
+  Toggle debugging code. Enabling this option adds additional receiving data
+  path run-time checks and debugging messages at the cost of lower performance.

[dpdk-dev] [PATCH v2] librte_ether: release memory in uninit function.

2015-06-30 Thread Qiu, Michael
On 6/30/2015 12:42 AM, Iremonger, Bernard wrote:
>
>> -Original Message-
>> From: Qiu, Michael
>> Sent: Monday, June 29, 2015 4:22 PM
>> To: Iremonger, Bernard; dev at dpdk.org
>> Cc: Zhang, Helin; Ananyev, Konstantin; mukawa at igel.co.jp; Stephen
>> Hemminger
>> Subject: Re: [PATCH v2] librte_ether: release memory in uninit function.
>>
>> On 2015/6/29 18:20, Iremonger, Bernard wrote:
 -Original Message-
 From: Qiu, Michael
 Sent: Monday, June 29, 2015 9:55 AM
 To: Iremonger, Bernard; dev at dpdk.org
 Cc: Zhang, Helin; Ananyev, Konstantin; mukawa at igel.co.jp; Stephen
 Hemminger
 Subject: Re: [PATCH v2] librte_ether: release memory in uninit function.

 On 6/26/2015 5:32 PM, Iremonger, Bernard wrote:
> Changes in v2:
> do not free mac_addrs and hash_mac_addrs here.
>
> Signed-off-by: Bernard Iremonger 
> ---
>  lib/librte_ether/rte_ethdev.c |6 +-
>  1 files changed, 5 insertions(+), 1 deletions(-)
>
> diff --git a/lib/librte_ether/rte_ethdev.c
> b/lib/librte_ether/rte_ethdev.c index e13fde5..7ae101a 100644
> --- a/lib/librte_ether/rte_ethdev.c
> +++ b/lib/librte_ether/rte_ethdev.c
> @@ -369,8 +369,12 @@ rte_eth_dev_uninit(struct rte_pci_device
 *pci_dev)
>   /* free ether device */
>   rte_eth_dev_release_port(eth_dev);
>
> - if (rte_eal_process_type() == RTE_PROC_PRIMARY)
> + if (rte_eal_process_type() == RTE_PROC_PRIMARY) {
> + rte_free(eth_dev->data->rx_queues);
> + rte_free(eth_dev->data->tx_queues);
>   rte_free(eth_dev->data->dev_private);
> + memset(eth_dev->data, 0, sizeof(struct
 rte_eth_dev_data));
> + }
>
>   eth_dev->pci_dev = NULL;
>   eth_dev->driver = NULL;
 Actually, This could be put in rte_eth_dev_close() becasue queues
 should be released when closed.

 Also before free dev->data->rx_queues you should make sure
 dev->data->rx_queues[i] has been freed in PMD close() function, So
 dev->data->this
 two should be better done at the same time, ether in
 rte_eth_dev_close() or in PMD close() function. For hotplug in fm10k,
 I put it in PMD close() function.

 Thanks,
 Michael
>>> Hi Michael,
>>>
>>> The consensus is that the rx_queue and tx_queue memory should not be
>> released in the PMD as it is not allocated by the PMD. The memory is
>> allocated in rte_eth_dev_rx_queue_config() and
>> rte_eth_dev_tx_queue_config(), which are both called from
>> rte_eth_dev_configure() which is called by the application (for example
>> test_pmd). So it seems to make sense to free this memory  in
>> rte_eth_dev_uninit().
>>
>> It really makes sense to free memory at the rte_ether level, but what about
>> closing a port without detaching it? In a flow like stop() --> close() -->
>> quit(), the memory will not be released :)
>>
> In the above scenario lots of memory will not be released.
>
> This is why the detach() and the underlying dev_uninit() functions were 
> introduced.

First, detach is only for hotplug; for *users who do not use hotplug*, that
scenario is the normal flow. So "lots of memory will not be released" is an
issue that needs to be fixed; actually, in the fm10k driver, lots of memory
has already been released.

> The dev_uninit() functions currently call dev_close()  which in turn calls 
> dev_stop() which calls dev_clear_queues(). 

Users who do hotplug must call stop() --> close() --> dev_uninit(), and that
works fine. But do you think it makes sense to release the memory when
close() is called?

> The dev_clear_queues()  function does not release the queue_memory or the 
> queue array memory. The queue memory is now released in the dev_uninit() and 
> the  queue array memory is released in the rte_eth_dev_uninit() function.

That's your implementation, but keep in mind that not all users will detach a
device, while the right action must always include close(). Do you agree?

>
> If the queue array memory is released in rte_eth_dev_close() then the release 
> of the queue_memory will have to be moved to the dev_close() functions from 
> the dev_uninit() functions. This will impact all the existing  PMD hotplug 
> patches.   It will also change the existing dev_close() functionality.

Why would it impact them? Actually it works fine with the fm10k driver. What
I am concerned about is that *when users do not use hotplug*, a lot of memory
will never be released, which is unacceptable. Moving the release action to
rte_eth_dev_close() is just a suggestion from me; I think *the solution
should cover both scenarios*, am I right?


>
> My preference is to leave the existing dev_close() functions unchanged as far 
> as possible and to do what needs to be done in the dev_uninit() functions.
>
> We probably need the view of the maintainers as to whether this should be 
> done in the close() or uninit() functions.  
>
> Regards,
>
> Bernard.
>
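
To make the flows under discussion concrete, a minimal application-side sketch
follows (port id, queue counts and configuration are placeholders; the comments
mark the point of disagreement in this thread):

#include <rte_ethdev.h>

static void port_lifecycle(uint8_t port_id)
{
        struct rte_eth_conf port_conf = { 0 };  /* placeholder configuration */

        /* rte_eth_dev_configure() allocates dev->data->rx_queues/tx_queues
         * via rte_eth_dev_rx_queue_config()/rte_eth_dev_tx_queue_config() */
        rte_eth_dev_configure(port_id, 1, 1, &port_conf);

        /* ... queue setup, rte_eth_dev_start(), traffic ... */

        rte_eth_dev_stop(port_id);
        rte_eth_dev_close(port_id);
        /* without a later hotplug detach (and thus rte_eth_dev_uninit()),
         * the queue arrays allocated above are never freed -- the leak
         * Michael is pointing at */
}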



[dpdk-dev] Why pktgen-dpdk will receive more packet than sent

2015-06-30 Thread Jiang, Yunhong
Hi, all
I'm trying to test the l2fwd throughput using 
https://github.com/pktgen/Pktgen-DPDK/blob/master/dpdk/examples/pktgen/scripts/rfc2544.lua
 .  However, the pktgen will receive more packets than it sent out like " Total 
sent 14786446 recv 14786597, delta 151". I'm really confused why this will 
happen. I have the pktgen machine and l2fwd machine connected to each other 
directly and bind the igb_uio to the NIC devices.

Pktgen> *** Test packet size 64 at rate 100%
Total sent 133856261 recv 131517412, delta -2338849
*** Test packet size 64 at rate 50%
Total sent 40208555 recv 40207420, delta -1135
*** Test packet size 64 at rate 25%
PMD: ixgbe_set_rx_function(): Vector rx enabled, please make sure RX burst size 
no less than 32.
Total sent 14786446 recv 14786597, delta 151
*** Test packet size 64 at rate 37%
PMD: ixgbe_set_rx_function(): Vector rx enabled, please make sure RX burst size 
no less than 32.
Total sent 26824256 recv 26823587, delta -669

Any hints will be really helpful, thanks!

Thanks
--jyh


[dpdk-dev] [PATCH v3 1/8] eal: Add pci_uio_alloc_uio_resource()

2015-06-30 Thread Tetsuya Mukawa
On 2015/06/29 22:24, Iremonger, Bernard wrote:
>
>> -Original Message-
>> From: Tetsuya Mukawa [mailto:mukawa at igel.co.jp]
>> Sent: Monday, June 29, 2015 3:57 AM
>> To: dev at dpdk.org
>> Cc: Iremonger, Bernard; david.marchand at 6wind.com; Tetsuya.Mukawa
>> Subject: [PATCH v3 1/8] eal: Add pci_uio_alloc_uio_resource()
>>
>> From: "Tetsuya.Mukawa" 
>>
>> This patch adds a new function called pci_uio_alloc_uio_resource().
>> The function hides how to prepare uio resource in linuxapp and bsdapp.
>> With the function, pci_uio_map_resource() will be more abstracted.
>>
>> Signed-off-by: Tetsuya Mukawa 
> Hi Tetsuya,
>
> There are two comments inline below.
>
>> ---
>>  lib/librte_eal/bsdapp/eal/eal_pci.c   | 70 +++-
>>  lib/librte_eal/linuxapp/eal/eal_pci_uio.c | 77 ++---
>> --
>>  2 files changed, 104 insertions(+), 43 deletions(-)
>>
>> diff --git a/lib/librte_eal/bsdapp/eal/eal_pci.c
>> b/lib/librte_eal/bsdapp/eal/eal_pci.c
>> index 06c564f..2d9f3a5 100644
>> --- a/lib/librte_eal/bsdapp/eal/eal_pci.c
>> +++ b/lib/librte_eal/bsdapp/eal/eal_pci.c
>> @@ -189,28 +189,17 @@ pci_uio_map_secondary(struct rte_pci_device
>> *dev)
>>  return 1;
>>  }
>>
>> -/* map the PCI resource of a PCI device in virtual memory */  static int -
>> pci_uio_map_resource(struct rte_pci_device *dev)
>> +pci_uio_alloc_uio_resource(struct rte_pci_device *dev,
>> +struct mapped_pci_resource **uio_res)
> The name of this function is a bit longwinded,  pci_uio_alloc_resource() 
> might be better.

Hi Bernard,

Thanks for comments.
Sure I will fix it.

>>
>> -/* map the PCI resource of a PCI device in virtual memory */ -int -
>> pci_uio_map_resource(struct rte_pci_device *dev)
>> +static int
>> +pci_uio_alloc_uio_resource(struct rte_pci_device *dev,
>> +struct mapped_pci_resource **uio_res)
> The name of this function is a bit longwinded,  pci_uio_alloc_resource() 
> might be better.

Also, I will fix above.

Regards,
Tetsuya


[dpdk-dev] [PATCH v3 2/8] eal: Add pci_uio_map_uio_resource_by_index()

2015-06-30 Thread Tetsuya Mukawa
On 2015/06/29 22:36, Iremonger, Bernard wrote:
>
>> -Original Message-
>> From: Tetsuya Mukawa [mailto:mukawa at igel.co.jp]
>> Sent: Monday, June 29, 2015 3:57 AM
>> To: dev at dpdk.org
>> Cc: Iremonger, Bernard; david.marchand at 6wind.com; Tetsuya.Mukawa
>> Subject: [PATCH v3 2/8] eal: Add pci_uio_map_uio_resource_by_index()
>>
>> From: "Tetsuya.Mukawa" 
>>
>> This patch adds a new function called pci_uio_map_resource_by_index().
>> The function hides how to map uio resource in linuxapp and bsdapp.
>> With the function, pci_uio_map_resource() will be more abstracted.
>>
>> Signed-off-by: Tetsuya Mukawa 
> Hi Tetsuya,
>
> There are two comments inline below.
>
>> ---
>>  lib/librte_eal/bsdapp/eal/eal_pci.c   | 107 +++---
>>  lib/librte_eal/linuxapp/eal/eal_pci_uio.c | 124 +-
>> 
>>  2 files changed, 133 insertions(+), 98 deletions(-)
>>
>> diff --git a/lib/librte_eal/bsdapp/eal/eal_pci.c
>> b/lib/librte_eal/bsdapp/eal/eal_pci.c
>> index 2d9f3a5..61d1fe5 100644
>> --- a/lib/librte_eal/bsdapp/eal/eal_pci.c
>> +++ b/lib/librte_eal/bsdapp/eal/eal_pci.c
>> @@ -240,20 +240,73 @@ close_fd:
>>  return -1;
>>  }
>>
>> +static int
>> +pci_uio_map_uio_resource_by_index(struct rte_pci_device *dev, int
>> res_idx,
>> +struct mapped_pci_resource *uio_res, int map_idx) {
> The name of this function is a bit long winded, 
> pci_uio_map_resource_by_index() might be better.
>
>> a/lib/librte_eal/linuxapp/eal/eal_pci_uio.c
>> b/lib/librte_eal/linuxapp/eal/eal_pci_uio.c
>> index 9e0b617..7da4543 100644
>> --- a/lib/librte_eal/linuxapp/eal/eal_pci_uio.c
>> +++ b/lib/librte_eal/linuxapp/eal/eal_pci_uio.c
>> @@ -333,19 +333,82 @@ close_fd:
>>  return -1;
>>  }
>>
>> +static int
>> +pci_uio_map_uio_resource_by_index(struct rte_pci_device *dev, int
>> res_idx,
>> +struct mapped_pci_resource *uio_res, int map_idx) {
> The name of this function is a bit long winded, 
> pci_uio_map_resource_by_index() might be better.

I will fix above 2 issues.

Regards,
Tetsuya


[dpdk-dev] [PATCH v3 7/8] eal: Consolidate pci uio functions of linuxapp and bsdapp

2015-06-30 Thread Tetsuya Mukawa
On 2015/06/29 23:03, Iremonger, Bernard wrote:
>
>> -Original Message-
>> From: Tetsuya Mukawa [mailto:mukawa at igel.co.jp]
>> Sent: Monday, June 29, 2015 3:57 AM
>> To: dev at dpdk.org
>> Cc: Iremonger, Bernard; david.marchand at 6wind.com; Tetsuya.Mukawa
>> Subject: [PATCH v3 7/8] eal: Consolidate pci uio functions of linuxapp and
>> bsdapp
>>
>> From: "Tetsuya.Mukawa" 
>>
>> The patch consolidates below functions, and implement these in
>> eal_common_pci_uio.c.
>>  - pci_uio_map_secondary()
>>  - pci_uio_map_resource()
>>  - pci_uio_unmap()
>>  - pci_uio_find_resource()
>>  - pci_uio_unmap_resource()
>>
>> Signed-off-by: Tetsuya Mukawa 
> Hi Tetsuya,
>
> The copyrights of all files in this patch set should be updated to 2015.

Hi Bernard,

Could I confirm what this comment means?
Is it okay to change the Intel copyright like below?
- Copyright(c) 2010-2014 Intel Corporation. All rights reserved.
+ Copyright(c) 2010-2015 Intel Corporation. All rights reserved.

Or do you suggest it's better to add my company copyright below Intel?

Regards,
Tetsuya



[dpdk-dev] Why pktgen-dpdk will receive more packet than sent

2015-06-30 Thread Wiles, Keith


On 6/29/15, 9:01 PM, "dev on behalf of Jiang, Yunhong"
 wrote:

>Hi, all
>   I'm trying to test the l2fwd throughput using
>https://github.com/pktgen/Pktgen-DPDK/blob/master/dpdk/examples/pktgen/scr
>ipts/rfc2544.lua .  However, the pktgen will receive more packets than it
>sent out like " Total sent 14786446 recv 14786597, delta 151". I'm really
>confused why this will happen. I have the pktgen machine and l2fwd
>machine connected to each other directly and bind the igb_uio to the NIC
>devices.
>
>Pktgen> *** Test packet size 64 at rate 100%
>Total sent 133856261 recv 131517412, delta -2338849
>*** Test packet size 64 at rate 50%
>Total sent 40208555 recv 40207420, delta -1135
>*** Test packet size 64 at rate 25%
>PMD: ixgbe_set_rx_function(): Vector rx enabled, please make sure RX
>burst size no less than 32.
>Total sent 14786446 recv 14786597, delta 151
>*** Test packet size 64 at rate 37%
>PMD: ixgbe_set_rx_function(): Vector rx enabled, please make sure RX
>burst size no less than 32.
>Total sent 26824256 recv 26823587, delta -669

The numbers of packets sent/received are read from the NIC counters, and
Pktgen does not count the packets in any way.

The script could also be in error. I was just looking at the script, and
the timing between stopping the transmit and reading the counters may be the
issue. You could try putting a delay before the counter read to make sure
all of the packets have been sent and received before the counters are read.

I do not have a setup to test it at this time.

Regards,
++Keith
>
>Any hints will be really helpful, thanks!
>
>Thanks
>--jyh



[dpdk-dev] [PATCH] fm10k: support XEN domain0

2015-06-30 Thread He, Shaopeng
Hi Thomas, Stephen,

> -Original Message-
> From: He, Shaopeng
> Sent: Tuesday, June 23, 2015 9:21 AM
> To: Thomas Monjalon
> Cc: Liu, Jijiang; dev at dpdk.org; Stephen Hemminger
> Subject: RE: [dpdk-dev] [PATCH] fm10k: support XEN domain0
> 
> Hi Thomas,
> 
> > -Original Message-
> > From: Liu, Jijiang
> > Sent: Friday, June 05, 2015 11:18 AM
> > To: dev at dpdk.org
> > Cc: He, Shaopeng
> > Subject: RE: [dpdk-dev] [PATCH] fm10k: support XEN domain0
> >
> >
> > Acked-by: Jijiang Liu 
> >
> > I think this patch could be merged before Stephen's following patch[1] is
> > merged, then Stephen should rework the patch[1].
> > Thanks.
> >
> > [1]http://dpdk.org/ml/archives/dev/2015-March/014992.html
> 
> Do you think we can accept this patch in its current not-so-elegant form, so
> users can use XEN with fm10k from release 2.1, or is it better to wait for
> Stephen's patch?
> Thank you in advance for your attention to this matter.
> 
> Best Regards,
> --Shaopeng

This patch is necessary for fm10k to be usable in a XEN environment with DPDK.
How can we move forward? Could you please kindly give some advice?

Thanks,
--Shaopeng

> 
> > > -Original Message-
> > > From: dev [mailto:dev-bounces at dpdk.org] On Behalf Of Shaopeng He
> > > Sent: Friday, May 15, 2015 4:56 PM
> > > To: dev at dpdk.org
> > > Cc: He, Shaopeng
> > > Subject: [dpdk-dev] [PATCH] fm10k: support XEN domain0
> > >
> > > fm10k was failing to run in XEN domain0, as the physical memory for DMA
> > > should be allocated and translated in a different way for XEN domain0. So
> > > rte_memzone_reserve_bounded() should be used for DMA memory
> > > allocation, and rte_mem_phy2mch() should be used for DMA memory
> > > address translation to support running fm10k PMD in XEN domain0.
> > >
> > > Signed-off-by: Shaopeng He 
> > > ---
> > >  lib/librte_pmd_fm10k/fm10k_ethdev.c | 8 
> > >  1 file changed, 8 insertions(+)
> > >
> > > diff --git a/lib/librte_pmd_fm10k/fm10k_ethdev.c
> > > b/lib/librte_pmd_fm10k/fm10k_ethdev.c
> > > index 275c19c..c85c856 100644
> > > --- a/lib/librte_pmd_fm10k/fm10k_ethdev.c
> > > +++ b/lib/librte_pmd_fm10k/fm10k_ethdev.c
> > > @@ -1004,7 +1004,11 @@ fm10k_rx_queue_setup(struct rte_eth_dev
> > *dev,
> > > uint16_t queue_id,
> > >   return (-ENOMEM);
> > >   }
> > >   q->hw_ring = mz->addr;
> > > +#ifdef RTE_LIBRTE_XEN_DOM0
> > > + q->hw_ring_phys_addr = rte_mem_phy2mch(mz->memseg_id, mz-
> > > >phys_addr);
> > > +#else
> > >   q->hw_ring_phys_addr = mz->phys_addr;
> > > +#endif
> > >
> > >   dev->data->rx_queues[queue_id] = q;
> > >   return 0;
> > > @@ -1150,7 +1154,11 @@ fm10k_tx_queue_setup(struct rte_eth_dev
> > *dev,
> > > uint16_t queue_id,
> > >   return (-ENOMEM);
> > >   }
> > >   q->hw_ring = mz->addr;
> > > +#ifdef RTE_LIBRTE_XEN_DOM0
> > > + q->hw_ring_phys_addr = rte_mem_phy2mch(mz->memseg_id, mz-
> > > >phys_addr);
> > > +#else
> > >   q->hw_ring_phys_addr = mz->phys_addr;
> > > +#endif
> > >
> > >   /*
> > >* allocate memory for the RS bit tracker. Enough slots to hold the
> > > --
> > > 1.9.3
> >



[dpdk-dev] [PATCH v3 1/2] vhost: vhost unix domain socket cleanup

2015-06-30 Thread Xie, Huawei
On 6/30/2015 5:04 AM, Thomas Monjalon wrote:
> Huawei,
> I don't understand this reply. You forgot quoting, you didn't remove useless 
> lines,
> and you seem to reply to yourself.
> Should this patch be applied?
>
Thomas:
Oh, here I have removed the useless lines. I am sending a new patch to fix a
potential issue.



[dpdk-dev] [PATCH v3 2/7] mbuf: use the reserved 16 bits for double vlan

2015-06-30 Thread Olivier MATZ
Hi,

On 06/28/2015 10:36 PM, Thomas Monjalon wrote:
> Neil, Olivier,
> Your opinions are requested here.
> Thanks
>
> 2015-06-25 08:31, Zhang, Helin:
>> Hi Neil
> [...]
>>> -279,7 +285,7 @@ struct rte_mbuf {
>>> uint16_t data_len;/**< Amount of data in segment buffer. */
>>> uint32_t pkt_len; /**< Total pkt len: sum of all segments. */
>>> uint16_t vlan_tci;/**< VLAN Tag Control Identifier (CPU order) 
>>> */
>>> -   uint16_t reserved;
>>> +   uint16_t vlan_tci_outer;  /**< Outer VLAN Tag Control Identifier (CPU
>>> +order) */
>> Do you think there is an ABI break here or not? It just uses the reserved 16
>> bits, which were intended for the second_vlan_tag. Thanks in advance!
>> I did not see any "Incompatible" reported by validate_abi.sh.

I don't feel there's any ABI break here. I think an application
should not use the "reserved" fields.


Regards,
Olivier
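
As an aside, with the rename applied an application would read both tags roughly
like this (sketch only; field names as quoted in the diff above):

#include <stdio.h>
#include <rte_mbuf.h>

/* Print inner and outer VLAN TCIs of a received QinQ mbuf. */
static void print_vlan_tags(const struct rte_mbuf *m)
{
        printf("inner TCI 0x%04x, outer TCI 0x%04x\n",
               (unsigned)m->vlan_tci,         /* existing inner-tag field */
               (unsigned)m->vlan_tci_outer);  /* the former 16-bit 'reserved' field */
}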



[dpdk-dev] [PATCH v3 4/8] eal: Consolidate rte_eal_pci_probe/close_one_driver() of linuxapp and bsdapp

2015-06-30 Thread Tetsuya Mukawa
On 2015/06/30 0:28, Bruce Richardson wrote:
> On Mon, Jun 29, 2015 at 11:56:46AM +0900, Tetsuya Mukawa wrote:
>> From: "Tetsuya.Mukawa" 
>>
>> This patch consolidates below functions, and implements these in common
>> eal code.
>>  - rte_eal_pci_probe_one_driver()
>>  - rte_eal_pci_close_one_driver()
>>
>> Because pci_map_device() is only implemented in linuxapp, the patch
>> implements it in bsdapp too. This implemented function will be merged to
>> linuxapp one with later patch.
>>
>> Signed-off-by: Tetsuya Mukawa 
>> ---
>>  lib/librte_eal/bsdapp/eal/eal_pci.c|  74 ++---
>>  lib/librte_eal/common/eal_common_pci.c | 129 
>>  lib/librte_eal/common/eal_private.h|  21 ++---
>>  lib/librte_eal/linuxapp/eal/eal_pci.c  | 148 
>> ++---
>>  4 files changed, 153 insertions(+), 219 deletions(-)
>>
>> diff --git a/lib/librte_eal/bsdapp/eal/eal_pci.c 
>> b/lib/librte_eal/bsdapp/eal/eal_pci.c
>> index c7017eb..2a623e3 100644
>> --- a/lib/librte_eal/bsdapp/eal/eal_pci.c
>> +++ b/lib/librte_eal/bsdapp/eal/eal_pci.c
>> @@ -88,7 +88,7 @@ static struct rte_tailq_elem rte_uio_tailq = {
>>  EAL_REGISTER_TAILQ(rte_uio_tailq)
>>  
>>  /* unbind kernel driver for this device */
>> -static int
>> +int
>>  pci_unbind_kernel_driver(struct rte_pci_device *dev __rte_unused)
>>  {
>>  RTE_LOG(ERR, EAL, "RTE_PCI_DRV_FORCE_UNBIND flag is not implemented "
>> @@ -430,6 +430,13 @@ skipdev:
>>  return 0;
>>  }
>>  
>> +/* Map pci device */
>> +int
>> +pci_map_device(struct rte_pci_device *dev)
>> +{
>> +return pci_uio_map_resource(dev);
>> +}
>> +
> These lines are added here, but removed again in the next patch in the series.
> Though not wrong per se, it just seems untidy. Perhaps the patchset order 
> needs
> to be changed somewhat?
>
> /Bruce

Hi Bruce,

I appreciate your comment.
Sure, I will change the order of these patches.
Could you please check patches I will send later?

Regards,
Tetsuya





[dpdk-dev] [PATCH v3 0/8] Add Port Hotplug support to BSD

2015-06-30 Thread Tetsuya Mukawa
On 2015/06/30 0:30, Bruce Richardson wrote:
> On Mon, Jun 29, 2015 at 11:56:42AM +0900, Tetsuya Mukawa wrote:
>> This patch series adds port hotplug support to BSD.
>> Before applying, following patches should be applied.
>>  - [PATCH v6 1/5] eal: Fix coding style of eal_pci.c and eal_pci_uio.c
>>  - [PATCH v6 2/5] eal: Close file descriptor of uio configuration
>>  - [PATCH v6 3/5] eal: Fix memory leaks and needless increment of 
>> pci_map_addr
>>  - [PATCH v6 4/5] eal/bsdapp: Change names of pci related data structure
>>  - [PATCH v6 5/5] eal: Fix uio mapping differences between linuxapp and 
>> bsdapp
>>
>> Some functions will be consolidated after applying the patches, because
>> these functions are implemented in both Linux and BSD code.
>>
>> PATCH v2 changes:
>>  - Fix license of eal_common_pci_uio.c
>>
>> PATCH v1 changes:
>>  - Rebase to below latest patch series.
>>- [PATCH v6] Clean up pci uio implementations
>>
>>
> The majority of patches in this set seem to be more concerned with cleaning up
> common code between linux and FreeBSD rather than adding hotplug support. Are 
> those
> changes better made as part of the previous patchset listed as a requirement 
> of
> this?
>
> /Bruce

Hi Bruce,

I will move almost all of the patches to the previous patch series.
Also, when I move them, I will just add the patches to the previous series for reviewers.
(I will not change the previous patches at all.)

Regards,
Tetsuya


[dpdk-dev] [PATCH v7 00/12] Clean up pci uio implementations

2015-06-30 Thread Tetsuya Mukawa
Currently, the Linux implementation and the BSD implementation have almost the
same pci uio code. This patch series cleans it up.

PATCH v7 changes:
 - Add the patches below. Also, the order of the patches has changed.
   - eal: Add pci_uio_alloc_resource()
   - eal: Add pci_uio_map_resource_by_index()
   - eal: Consolidate pci_map and mapped_pci_resource of linuxapp and bsdapp
   - eal: Consolidate pci_map/unmap_resource() of linuxapp and bsdapp
   - eal: Consolidate pci uio functions of linuxapp and bsdapp
   - eal: Consolidate pci_map/unmap_device() of linuxapp and bsdapp
   - eal: Consolidate rte_eal_pci_probe/close_one_driver() of linuxapp and 
bsdapp
   (Thanks to Bruce Richardson)
 - While adding above, below patches are not changed at all.
   - eal: Fix coding style of eal_pci.c and eal_pci_uio.c
   - eal: Close file descriptor of uio configuration
   - eal: Fix memory leaks and needless increment of pci_map_addr
   - eal/bsdapp: Change names of pci related data structure
   - eal: Fix uio mapping differences between linuxapp and bsdapp
 - some function names are changed like below.
   - pci_uio_alloc_uio_resource() to pci_uio_alloc_resource().
   - pci_uio_map_uio_resource_by_index() to pci_uio_map_resource_by_index().
   (Thanks to Iremonger, Bernard)

PATCH v6 changes:
 - Free mapped resources in pci_uio_map_resource().
 - Fix error handling in pci_uio_map_resource().
   (Thanks to David, Marchand)

PATCH v5 changes:
 - Rebase to latest master branch.

PATCH v4 changes:
 - Rebase to latest master branch.
 - Fix bug in pci_uio_map_resource() of BSD code. 'maps[i].path' shouldn't be 
freed.
 Fixed in below patch:
 [PATCH 3/5] eal: Fix memory leaks and needless increment of pci_map_addr
 - 'path' member of 'struct mapped_pci_resource' should not be removed because 
it will be used in BSD code.
 Fixed in below patch:
 [PATCH 5/5] eal: Fix uio mapping differences between linuxapp and bsdapp

PATCH v3 changes:
 - Squash patches related with pci_map_resource().
 - Free maps[].path to easy to understand.
   (Thanks to Iremonger, Bernard)
 - Close fds opened in this function.
 - Remove unused path variable from mapped_pci_resource structure.

PATCH v2 changes:
 - Move 'if-condition' to later patch series.
 - Fix memory leaks of path.
 - Fix typos.
   (Thanks to David Marchand)
 - Fix commit title and body.
 - Fix pci_map_resource() to handle MAP_FAILED.
   (Thanks to Iremonger, Bernard)

Changes:
 - This patch set is derived from below.
   "[PATCH v2] eal: Port Hotplug support for BSD"
 - Set cfg_fd as -1, when cfg_fd is closed.
   (Thanks to Iremonger, Bernard)
 - Remove needless coding style fixings.
 - Fix coding style of if-else condition.
   (Thanks to Richardson, Bruce)



Tetsuya.Mukawa (12):
  eal: Fix coding style of eal_pci.c and eal_pci_uio.c
  eal: Close file descriptor of uio configuration
  eal: Fix memory leaks and needless increment of pci_map_addr
  eal/bsdapp: Change names of pci related data structure
  eal: Fix uio mapping differences between linuxapp and bsdapp
  eal: Add pci_uio_alloc_resource()
  eal: Add pci_uio_map_resource_by_index()
  eal: Consolidate pci_map and mapped_pci_resource of linuxapp and
bsdapp
  eal: Consolidate pci_map/unmap_resource() of linuxapp and bsdapp
  eal: Consolidate pci uio functions of linuxapp and bsdapp
  eal: Consolidate pci_map/unmap_device() of linuxapp and bsdapp
  eal: Consolidate rte_eal_pci_probe/close_one_driver() of linuxapp and
bsdapp

 lib/librte_eal/bsdapp/eal/Makefile |   1 +
 lib/librte_eal/bsdapp/eal/eal_pci.c| 283 ++--
 .../bsdapp/eal/include/exec-env/rte_interrupts.h   |   1 +
 lib/librte_eal/common/eal_common_pci.c | 225 
 lib/librte_eal/common/eal_common_pci_uio.c | 270 +++
 lib/librte_eal/common/eal_private.h|  59 -
 lib/librte_eal/common/include/rte_pci.h|  40 +++
 lib/librte_eal/linuxapp/eal/Makefile   |   1 +
 lib/librte_eal/linuxapp/eal/eal_pci.c  | 236 +
 lib/librte_eal/linuxapp/eal/eal_pci_init.h |  39 +--
 lib/librte_eal/linuxapp/eal/eal_pci_uio.c  | 290 ++---
 lib/librte_ether/rte_ethdev.c  |   1 +
 12 files changed, 753 insertions(+), 693 deletions(-)
 create mode 100644 lib/librte_eal/common/eal_common_pci_uio.c

-- 
2.1.4



[dpdk-dev] [PATCH v7 01/12] eal: Fix coding style of eal_pci.c and eal_pci_uio.c

2015-06-30 Thread Tetsuya Mukawa
From: "Tetsuya.Mukawa" 

This patch fixes coding style of below files in linuxapp and bsdapp.
 - eal_pci.c
 - eal_pci_uio.c

Signed-off-by: Tetsuya Mukawa 
Acked-by: Stephen Hemminger 
---
 lib/librte_eal/bsdapp/eal/eal_pci.c   | 12 +++-
 lib/librte_eal/linuxapp/eal/eal_pci_uio.c | 12 
 2 files changed, 15 insertions(+), 9 deletions(-)

diff --git a/lib/librte_eal/bsdapp/eal/eal_pci.c 
b/lib/librte_eal/bsdapp/eal/eal_pci.c
index 2df5c1c..8e24fd1 100644
--- a/lib/librte_eal/bsdapp/eal/eal_pci.c
+++ b/lib/librte_eal/bsdapp/eal/eal_pci.c
@@ -161,9 +161,10 @@ fail:
 static int
 pci_uio_map_secondary(struct rte_pci_device *dev)
 {
-size_t i;
-struct uio_resource *uio_res;
-   struct uio_res_list *uio_res_list = RTE_TAILQ_CAST(rte_uio_tailq.head, 
uio_res_list);
+   size_t i;
+   struct uio_resource *uio_res;
+   struct uio_res_list *uio_res_list =
+   RTE_TAILQ_CAST(rte_uio_tailq.head, uio_res_list);

TAILQ_FOREACH(uio_res, uio_res_list, next) {

@@ -201,7 +202,8 @@ pci_uio_map_resource(struct rte_pci_device *dev)
uint64_t pagesz;
struct rte_pci_addr *loc = &dev->addr;
struct uio_resource *uio_res;
-   struct uio_res_list *uio_res_list = RTE_TAILQ_CAST(rte_uio_tailq.head, 
uio_res_list);
+   struct uio_res_list *uio_res_list =
+   RTE_TAILQ_CAST(rte_uio_tailq.head, uio_res_list);
struct uio_map *maps;

dev->intr_handle.fd = -1;
@@ -311,7 +313,7 @@ pci_scan_one(int dev_pci_fd, struct pci_conf *conf)
/* FreeBSD has no NUMA support (yet) */
dev->numa_node = 0;

-/* parse resources */
+   /* parse resources */
switch (conf->pc_hdr & PCIM_HDRTYPE) {
case PCIM_HDRTYPE_NORMAL:
max = PCIR_MAX_BAR_0;
diff --git a/lib/librte_eal/linuxapp/eal/eal_pci_uio.c 
b/lib/librte_eal/linuxapp/eal/eal_pci_uio.c
index b5116a7..5d3354d 100644
--- a/lib/librte_eal/linuxapp/eal/eal_pci_uio.c
+++ b/lib/librte_eal/linuxapp/eal/eal_pci_uio.c
@@ -92,7 +92,8 @@ pci_uio_map_secondary(struct rte_pci_device *dev)
 {
int fd, i;
struct mapped_pci_resource *uio_res;
-   struct mapped_pci_res_list *uio_res_list = 
RTE_TAILQ_CAST(rte_uio_tailq.head, mapped_pci_res_list);
+   struct mapped_pci_res_list *uio_res_list =
+   RTE_TAILQ_CAST(rte_uio_tailq.head, mapped_pci_res_list);

TAILQ_FOREACH(uio_res, uio_res_list, next) {

@@ -272,7 +273,8 @@ pci_uio_map_resource(struct rte_pci_device *dev)
uint64_t phaddr;
struct rte_pci_addr *loc = &dev->addr;
struct mapped_pci_resource *uio_res;
-   struct mapped_pci_res_list *uio_res_list = 
RTE_TAILQ_CAST(rte_uio_tailq.head, mapped_pci_res_list);
+   struct mapped_pci_res_list *uio_res_list =
+   RTE_TAILQ_CAST(rte_uio_tailq.head, mapped_pci_res_list);
struct pci_map *maps;

dev->intr_handle.fd = -1;
@@ -417,7 +419,8 @@ static struct mapped_pci_resource *
 pci_uio_find_resource(struct rte_pci_device *dev)
 {
struct mapped_pci_resource *uio_res;
-   struct mapped_pci_res_list *uio_res_list = 
RTE_TAILQ_CAST(rte_uio_tailq.head, mapped_pci_res_list);
+   struct mapped_pci_res_list *uio_res_list =
+   RTE_TAILQ_CAST(rte_uio_tailq.head, mapped_pci_res_list);

if (dev == NULL)
return NULL;
@@ -436,7 +439,8 @@ void
 pci_uio_unmap_resource(struct rte_pci_device *dev)
 {
struct mapped_pci_resource *uio_res;
-   struct mapped_pci_res_list *uio_res_list = 
RTE_TAILQ_CAST(rte_uio_tailq.head, mapped_pci_res_list);
+   struct mapped_pci_res_list *uio_res_list =
+   RTE_TAILQ_CAST(rte_uio_tailq.head, mapped_pci_res_list);

if (dev == NULL)
return;
-- 
2.1.4



[dpdk-dev] [PATCH v7 02/12] eal: Close file descriptor of uio configuration

2015-06-30 Thread Tetsuya Mukawa
From: "Tetsuya.Mukawa" 

When pci_uio_unmap_resource() is called, a file descriptor that is used
for uio configuration should be closed.

Signed-off-by: Tetsuya Mukawa 
Acked-by: Stephen Hemminger 
---
 lib/librte_eal/linuxapp/eal/eal_pci_uio.c | 6 +-
 1 file changed, 5 insertions(+), 1 deletion(-)

diff --git a/lib/librte_eal/linuxapp/eal/eal_pci_uio.c 
b/lib/librte_eal/linuxapp/eal/eal_pci_uio.c
index 5d3354d..34316b6 100644
--- a/lib/librte_eal/linuxapp/eal/eal_pci_uio.c
+++ b/lib/librte_eal/linuxapp/eal/eal_pci_uio.c
@@ -464,8 +464,12 @@ pci_uio_unmap_resource(struct rte_pci_device *dev)

/* close fd if in primary process */
close(dev->intr_handle.fd);
-
dev->intr_handle.fd = -1;
+
+   /* close cfg_fd if in primary process */
+   close(dev->intr_handle.uio_cfg_fd);
+   dev->intr_handle.uio_cfg_fd = -1;
+
dev->intr_handle.type = RTE_INTR_HANDLE_UNKNOWN;
 }
 #endif /* RTE_LIBRTE_EAL_HOTPLUG */
-- 
2.1.4



[dpdk-dev] [PATCH v7 03/12] eal: Fix memory leaks and needless increment of pci_map_addr

2015-06-30 Thread Tetsuya Mukawa
From: "Tetsuya.Mukawa" 

This patch fixes the following memory leaks.
- When open() fails, uio_res and fds won't be freed in
  pci_uio_map_resource().
- When pci_map_resource() fails but path is allocated correctly,
  path and fds won't be freed in pci_uio_map_resource().
  Also, some mapped resources should be freed.
- When pci_uio_unmap() is called, path should be freed.

Also, this fixes the following.
- When pci_map_resource() fails, mapaddr will be MAP_FAILED.
  In this case, pci_map_addr should not be incremented in
  pci_uio_map_resource().
- To shrink the code, move close().
- Remove the fail variable.

Signed-off-by: Tetsuya Mukawa 
---
 lib/librte_eal/bsdapp/eal/eal_pci.c   | 14 ++--
 lib/librte_eal/linuxapp/eal/eal_pci_uio.c | 56 ---
 2 files changed, 48 insertions(+), 22 deletions(-)

diff --git a/lib/librte_eal/bsdapp/eal/eal_pci.c 
b/lib/librte_eal/bsdapp/eal/eal_pci.c
index 8e24fd1..b071f07 100644
--- a/lib/librte_eal/bsdapp/eal/eal_pci.c
+++ b/lib/librte_eal/bsdapp/eal/eal_pci.c
@@ -235,7 +235,7 @@ pci_uio_map_resource(struct rte_pci_device *dev)
if ((uio_res = rte_zmalloc("UIO_RES", sizeof (*uio_res), 0)) == NULL) {
RTE_LOG(ERR, EAL,
"%s(): cannot store uio mmap details\n", __func__);
-   return -1;
+   goto close_fd;
}

snprintf(uio_res->path, sizeof(uio_res->path), "%s", devname);
@@ -262,8 +262,7 @@ pci_uio_map_resource(struct rte_pci_device *dev)
(mapaddr = pci_map_resource(NULL, devname, (off_t)offset,
(size_t)maps[j].size)
) == NULL) {
-   rte_free(uio_res);
-   return -1;
+   goto free_uio_res;
}

maps[j].addr = mapaddr;
@@ -274,6 +273,15 @@ pci_uio_map_resource(struct rte_pci_device *dev)
TAILQ_INSERT_TAIL(uio_res_list, uio_res, next);

return 0;
+
+free_uio_res:
+   rte_free(uio_res);
+close_fd:
+   close(dev->intr_handle.fd);
+   dev->intr_handle.fd = -1;
+   dev->intr_handle.type = RTE_INTR_HANDLE_UNKNOWN;
+
+   return -1;
 }

 /* Scan one pci sysfs entry, and fill the devices list from it. */
diff --git a/lib/librte_eal/linuxapp/eal/eal_pci_uio.c 
b/lib/librte_eal/linuxapp/eal/eal_pci_uio.c
index 34316b6..c3b259b 100644
--- a/lib/librte_eal/linuxapp/eal/eal_pci_uio.c
+++ b/lib/librte_eal/linuxapp/eal/eal_pci_uio.c
@@ -308,7 +308,7 @@ pci_uio_map_resource(struct rte_pci_device *dev)
if (dev->intr_handle.uio_cfg_fd < 0) {
RTE_LOG(ERR, EAL, "Cannot open %s: %s\n",
cfgname, strerror(errno));
-   return -1;
+   goto close_fd;
}

if (dev->kdrv == RTE_KDRV_IGB_UIO)
@@ -319,7 +319,7 @@ pci_uio_map_resource(struct rte_pci_device *dev)
/* set bus master that is not done by uio_pci_generic */
if (pci_uio_set_bus_master(dev->intr_handle.uio_cfg_fd)) {
RTE_LOG(ERR, EAL, "Cannot set up bus mastering!\n");
-   return -1;
+   goto close_fd;
}
}

@@ -328,7 +328,7 @@ pci_uio_map_resource(struct rte_pci_device *dev)
if (uio_res == NULL) {
RTE_LOG(ERR, EAL,
"%s(): cannot store uio mmap details\n", __func__);
-   return -1;
+   goto close_fd;
}

snprintf(uio_res->path, sizeof(uio_res->path), "%s", devname);
@@ -338,7 +338,6 @@ pci_uio_map_resource(struct rte_pci_device *dev)
maps = uio_res->maps;
for (i = 0, map_idx = 0; i != PCI_MAX_RESOURCE; i++) {
int fd;
-   int fail = 0;

/* skip empty BAR */
phaddr = dev->mem_resource[i].phys_addr;
@@ -352,6 +351,11 @@ pci_uio_map_resource(struct rte_pci_device *dev)
loc->domain, loc->bus, loc->devid, 
loc->function,
i);

+   /* allocate memory to keep path */
+   maps[map_idx].path = rte_malloc(NULL, strlen(devname) + 1, 0);
+   if (maps[map_idx].path == NULL)
+   goto free_uio_res;
+
/*
 * open resource file, to mmap it
 */
@@ -359,7 +363,8 @@ pci_uio_map_resource(struct rte_pci_device *dev)
if (fd < 0) {
RTE_LOG(ERR, EAL, "Cannot open %s: %s\n",
devname, strerror(errno));
-   return -1;
+   rte_free(maps[map_idx].path);
+   goto free_uio_res;
}

/* try mapping somewhere close to the end of hugepages */
@@ -368,23 +373,15 @@ pci_uio_map_resource(struct rte_pci_device *dev)

mapaddr = pci_map_resource(pci_map_addr, fd, 0,
 

[dpdk-dev] [PATCH v7 04/12] eal/bsdapp: Change names of pci related data structure

2015-06-30 Thread Tetsuya Mukawa
From: "Tetsuya.Mukawa" 

To merge pci code of linuxapp and bsdapp, this patch changes names
like below.
 - uio_map to pci_map
 - uio_resource to mapped_pci_resource
 - uio_res_list to mapped_pci_res_list

Signed-off-by: Tetsuya Mukawa 
---
 lib/librte_eal/bsdapp/eal/eal_pci.c | 24 
 1 file changed, 12 insertions(+), 12 deletions(-)

diff --git a/lib/librte_eal/bsdapp/eal/eal_pci.c 
b/lib/librte_eal/bsdapp/eal/eal_pci.c
index b071f07..8261e09 100644
--- a/lib/librte_eal/bsdapp/eal/eal_pci.c
+++ b/lib/librte_eal/bsdapp/eal/eal_pci.c
@@ -83,7 +83,7 @@
  * enabling bus master.
  */

-struct uio_map {
+struct pci_map {
void *addr;
uint64_t offset;
uint64_t size;
@@ -94,16 +94,16 @@ struct uio_map {
  * For multi-process we need to reproduce all PCI mappings in secondary
  * processes, so save them in a tailq.
  */
-struct uio_resource {
-   TAILQ_ENTRY(uio_resource) next;
+struct mapped_pci_resource {
+   TAILQ_ENTRY(mapped_pci_resource) next;

struct rte_pci_addr pci_addr;
char path[PATH_MAX];
size_t nb_maps;
-   struct uio_map maps[PCI_MAX_RESOURCE];
+   struct pci_map maps[PCI_MAX_RESOURCE];
 };

-TAILQ_HEAD(uio_res_list, uio_resource);
+TAILQ_HEAD(mapped_pci_res_list, mapped_pci_resource);

 static struct rte_tailq_elem rte_uio_tailq = {
.name = "UIO_RESOURCE_LIST",
@@ -162,9 +162,9 @@ static int
 pci_uio_map_secondary(struct rte_pci_device *dev)
 {
size_t i;
-   struct uio_resource *uio_res;
-   struct uio_res_list *uio_res_list =
-   RTE_TAILQ_CAST(rte_uio_tailq.head, uio_res_list);
+   struct mapped_pci_resource *uio_res;
+   struct mapped_pci_res_list *uio_res_list =
+   RTE_TAILQ_CAST(rte_uio_tailq.head, mapped_pci_res_list);

TAILQ_FOREACH(uio_res, uio_res_list, next) {

@@ -201,10 +201,10 @@ pci_uio_map_resource(struct rte_pci_device *dev)
uint64_t offset;
uint64_t pagesz;
struct rte_pci_addr *loc = &dev->addr;
-   struct uio_resource *uio_res;
-   struct uio_res_list *uio_res_list =
-   RTE_TAILQ_CAST(rte_uio_tailq.head, uio_res_list);
-   struct uio_map *maps;
+   struct mapped_pci_resource *uio_res;
+   struct mapped_pci_res_list *uio_res_list =
+   RTE_TAILQ_CAST(rte_uio_tailq.head, mapped_pci_res_list);
+   struct pci_map *maps;

dev->intr_handle.fd = -1;
dev->intr_handle.type = RTE_INTR_HANDLE_UNKNOWN;
-- 
2.1.4



[dpdk-dev] [PATCH v7 05/12] eal: Fix uio mapping differences between linuxapp and bsdapp

2015-06-30 Thread Tetsuya Mukawa
From: "Tetsuya.Mukawa" 

This patch fixes the following.
- bsdapp
 - Use map_id in pci_uio_map_resource().
 - Fix interface of pci_map_resource().
 - Move path variable of mapped_pci_resource structure to pci_map.
- linuxapp
 - Remove redundant error message of linuxapp.

'pci_uio_map_resource()' is implemented in both linuxapp and bsdapp,
but the interfaces differ. The patch changes the bsdapp function to do the
same as the linuxapp one. After applying it, the file descriptor should be
opened and closed outside of pci_map_resource().

Signed-off-by: Tetsuya Mukawa 
---
 lib/librte_eal/bsdapp/eal/eal_pci.c   | 118 ++
 lib/librte_eal/linuxapp/eal/eal_pci_uio.c |  21 +++---
 2 files changed, 80 insertions(+), 59 deletions(-)

diff --git a/lib/librte_eal/bsdapp/eal/eal_pci.c 
b/lib/librte_eal/bsdapp/eal/eal_pci.c
index 8261e09..06c564f 100644
--- a/lib/librte_eal/bsdapp/eal/eal_pci.c
+++ b/lib/librte_eal/bsdapp/eal/eal_pci.c
@@ -85,6 +85,7 @@

 struct pci_map {
void *addr;
+   char *path;
uint64_t offset;
uint64_t size;
uint64_t phaddr;
@@ -99,7 +100,7 @@ struct mapped_pci_resource {

struct rte_pci_addr pci_addr;
char path[PATH_MAX];
-   size_t nb_maps;
+   int nb_maps;
struct pci_map maps[PCI_MAX_RESOURCE];
 };

@@ -121,47 +122,30 @@ pci_unbind_kernel_driver(struct rte_pci_device *dev 
__rte_unused)

 /* map a particular resource from a file */
 static void *
-pci_map_resource(void *requested_addr, const char *devname, off_t offset,
-size_t size)
+pci_map_resource(void *requested_addr, int fd, off_t offset, size_t size,
+int additional_flags)
 {
-   int fd;
void *mapaddr;

-   /*
-* open devname, to mmap it
-*/
-   fd = open(devname, O_RDWR);
-   if (fd < 0) {
-   RTE_LOG(ERR, EAL, "Cannot open %s: %s\n",
-   devname, strerror(errno));
-   goto fail;
-   }
-
/* Map the PCI memory resource of device */
mapaddr = mmap(requested_addr, size, PROT_READ | PROT_WRITE,
-   MAP_SHARED, fd, offset);
-   close(fd);
-   if (mapaddr == MAP_FAILED ||
-   (requested_addr != NULL && mapaddr != requested_addr)) {
-   RTE_LOG(ERR, EAL, "%s(): cannot mmap(%s(%d), %p, 0x%lx, 0x%lx):"
-   " %s (%p)\n", __func__, devname, fd, requested_addr,
+   MAP_SHARED | additional_flags, fd, offset);
+   if (mapaddr == MAP_FAILED) {
+   RTE_LOG(ERR, EAL,
+   "%s(): cannot mmap(%d, %p, 0x%lx, 0x%lx): %s (%p)\n",
+   __func__, fd, requested_addr,
(unsigned long)size, (unsigned long)offset,
strerror(errno), mapaddr);
-   goto fail;
-   }
-
-   RTE_LOG(DEBUG, EAL, "  PCI memory mapped at %p\n", mapaddr);
+   } else
+   RTE_LOG(DEBUG, EAL, "  PCI memory mapped at %p\n", mapaddr);

return mapaddr;
-
-fail:
-   return NULL;
 }

 static int
 pci_uio_map_secondary(struct rte_pci_device *dev)
 {
-   size_t i;
+   int i, fd;
struct mapped_pci_resource *uio_res;
struct mapped_pci_res_list *uio_res_list =
RTE_TAILQ_CAST(rte_uio_tailq.head, mapped_pci_res_list);
@@ -169,19 +153,34 @@ pci_uio_map_secondary(struct rte_pci_device *dev)
TAILQ_FOREACH(uio_res, uio_res_list, next) {

/* skip this element if it doesn't match our PCI address */
-   if (memcmp(&uio_res->pci_addr, &dev->addr, sizeof(dev->addr)))
+   if (rte_eal_compare_pci_addr(&uio_res->pci_addr, &dev->addr))
continue;

for (i = 0; i != uio_res->nb_maps; i++) {
-   if (pci_map_resource(uio_res->maps[i].addr,
-uio_res->path,
-(off_t)uio_res->maps[i].offset,
-(size_t)uio_res->maps[i].size)
-   != uio_res->maps[i].addr) {
+   /*
+* open devname, to mmap it
+*/
+   fd = open(uio_res->maps[i].path, O_RDWR);
+   if (fd < 0) {
+   RTE_LOG(ERR, EAL, "Cannot open %s: %s\n",
+   uio_res->maps[i].path, strerror(errno));
+   return -1;
+   }
+
+   void *mapaddr = pci_map_resource(uio_res->maps[i].addr,
+   fd, (off_t)uio_res->maps[i].offset,
+   (size_t)uio_res->maps[i].size, 0);
+   if (mapaddr != uio_res->maps[i].addr) {
RTE_LOG(ERR, EAL,
-   "Cannot mmap

[dpdk-dev] [PATCH v7 06/12] eal: Add pci_uio_alloc_resource()

2015-06-30 Thread Tetsuya Mukawa
From: "Tetsuya.Mukawa" 

This patch adds a new function called pci_uio_alloc_resource().
The function hides how to prepare uio resource in linuxapp and bsdapp.
With the function, pci_uio_map_resource() will be more abstracted.

Signed-off-by: Tetsuya Mukawa 
---
 lib/librte_eal/bsdapp/eal/eal_pci.c   | 70 +++-
 lib/librte_eal/linuxapp/eal/eal_pci_uio.c | 77 ++-
 2 files changed, 104 insertions(+), 43 deletions(-)

diff --git a/lib/librte_eal/bsdapp/eal/eal_pci.c 
b/lib/librte_eal/bsdapp/eal/eal_pci.c
index 06c564f..7d2f8b5 100644
--- a/lib/librte_eal/bsdapp/eal/eal_pci.c
+++ b/lib/librte_eal/bsdapp/eal/eal_pci.c
@@ -189,28 +189,17 @@ pci_uio_map_secondary(struct rte_pci_device *dev)
return 1;
 }

-/* map the PCI resource of a PCI device in virtual memory */
 static int
-pci_uio_map_resource(struct rte_pci_device *dev)
+pci_uio_alloc_resource(struct rte_pci_device *dev,
+   struct mapped_pci_resource **uio_res)
 {
-   int i, map_idx;
char devname[PATH_MAX]; /* contains the /dev/uioX */
-   void *mapaddr;
-   uint64_t phaddr;
-   uint64_t offset;
-   uint64_t pagesz;
-   struct rte_pci_addr *loc = &dev->addr;
-   struct mapped_pci_resource *uio_res;
-   struct mapped_pci_res_list *uio_res_list =
-   RTE_TAILQ_CAST(rte_uio_tailq.head, mapped_pci_res_list);
-   struct pci_map *maps;
+   struct rte_pci_addr *loc;

-   dev->intr_handle.fd = -1;
-   dev->intr_handle.type = RTE_INTR_HANDLE_UNKNOWN;
+   if ((dev == NULL) || (uio_res == NULL))
+   return -1;

-   /* secondary processes - use already recorded details */
-   if (rte_eal_process_type() != RTE_PROC_PRIMARY)
-   return pci_uio_map_secondary(dev);
+   loc = &dev->addr;

snprintf(devname, sizeof(devname), "/dev/uio at pci:%u:%u:%u",
dev->addr.bus, dev->addr.devid, dev->addr.function);
@@ -231,18 +220,56 @@ pci_uio_map_resource(struct rte_pci_device *dev)
dev->intr_handle.type = RTE_INTR_HANDLE_UIO;

/* allocate the mapping details for secondary processes*/
-   if ((uio_res = rte_zmalloc("UIO_RES", sizeof (*uio_res), 0)) == NULL) {
+   *uio_res = rte_zmalloc("UIO_RES", sizeof(**uio_res), 0);
+   if (*uio_res == NULL) {
RTE_LOG(ERR, EAL,
"%s(): cannot store uio mmap details\n", __func__);
goto close_fd;
}

-   snprintf(uio_res->path, sizeof(uio_res->path), "%s", devname);
-   memcpy(&uio_res->pci_addr, &dev->addr, sizeof(uio_res->pci_addr));
+   snprintf((*uio_res)->path, sizeof((*uio_res)->path), "%s", devname);
+   memcpy(&(*uio_res)->pci_addr, &dev->addr, sizeof((*uio_res)->pci_addr));

+   return 0;
+
+close_fd:
+   close(dev->intr_handle.fd);
+   dev->intr_handle.fd = -1;
+   dev->intr_handle.type = RTE_INTR_HANDLE_UNKNOWN;
+
+   return -1;
+}
+
+/* map the PCI resource of a PCI device in virtual memory */
+static int
+pci_uio_map_resource(struct rte_pci_device *dev)
+{
+   int i, map_idx, ret;
+   char *devname;
+   void *mapaddr;
+   uint64_t phaddr;
+   uint64_t offset;
+   uint64_t pagesz;
+   struct mapped_pci_resource *uio_res = NULL;
+   struct mapped_pci_res_list *uio_res_list =
+   RTE_TAILQ_CAST(rte_uio_tailq.head, mapped_pci_res_list);
+   struct pci_map *maps;
+
+   dev->intr_handle.fd = -1;
+   dev->intr_handle.type = RTE_INTR_HANDLE_UNKNOWN;
+
+   /* secondary processes - use already recorded details */
+   if (rte_eal_process_type() != RTE_PROC_PRIMARY)
+   return pci_uio_map_secondary(dev);
+
+   /* allocate uio resource */
+   ret = pci_uio_alloc_resource(dev, &uio_res);
+   if ((ret != 0) || (uio_res == NULL))
+   return ret;

/* Map all BARs */
pagesz = sysconf(_SC_PAGESIZE);
+   devname = uio_res->path;

maps = uio_res->maps;
for (i = 0, map_idx = 0; i != PCI_MAX_RESOURCE; i++) {
@@ -300,7 +327,8 @@ free_uio_res:
for (i = 0; i < map_idx; i++)
rte_free(maps[i].path);
rte_free(uio_res);
-close_fd:
+
+   /* close fd opened by pci_uio_alloc_resource() */
close(dev->intr_handle.fd);
dev->intr_handle.fd = -1;
dev->intr_handle.type = RTE_INTR_HANDLE_UNKNOWN;
diff --git a/lib/librte_eal/linuxapp/eal/eal_pci_uio.c 
b/lib/librte_eal/linuxapp/eal/eal_pci_uio.c
index 19620fe..9483667 100644
--- a/lib/librte_eal/linuxapp/eal/eal_pci_uio.c
+++ b/lib/librte_eal/linuxapp/eal/eal_pci_uio.c
@@ -254,30 +254,20 @@ pci_get_uio_dev(struct rte_pci_device *dev, char *dstbuf,
return uio_num;
 }

-/* map the PCI resource of a PCI device in virtual memory */
-int
-pci_uio_map_resource(struct rte_pci_device *dev)
+static int
+pci_uio_alloc_resource(struct rte_pci_device *dev,
+   struct mapped_pci_resour

[dpdk-dev] [PATCH v7 07/12] eal: Add pci_uio_map_resource_by_index()

2015-06-30 Thread Tetsuya Mukawa
From: "Tetsuya.Mukawa" 

This patch adds a new function called pci_uio_map_resource_by_index().
The function hides how a UIO resource is mapped in linuxapp and bsdapp.
With it, pci_uio_map_resource() becomes more abstract.
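A condensed sketch of the per-BAR loop in pci_uio_map_resource() as it looks
after this change, assembled from the hunks below (the map_idx bookkeeping is
an assumption, since that part of the hunk is truncated here):

	/* Sketch: map every non-empty BAR through the new helper. */
	for (i = 0, map_idx = 0; i != PCI_MAX_RESOURCE; i++) {
		/* skip empty BAR */
		if (dev->mem_resource[i].phys_addr == 0)
			continue;

		/* map BAR i and record it in uio_res->maps[map_idx] */
		if (pci_uio_map_resource_by_index(dev, i, uio_res, map_idx) != 0)
			goto free_uio_res;

		map_idx++;	/* assumed: advance to the next map slot */
	}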

Signed-off-by: Tetsuya Mukawa 
---
 lib/librte_eal/bsdapp/eal/eal_pci.c   | 107 +++---
 lib/librte_eal/linuxapp/eal/eal_pci_uio.c | 124 +-
 2 files changed, 133 insertions(+), 98 deletions(-)

diff --git a/lib/librte_eal/bsdapp/eal/eal_pci.c 
b/lib/librte_eal/bsdapp/eal/eal_pci.c
index 7d2f8b5..da81d76 100644
--- a/lib/librte_eal/bsdapp/eal/eal_pci.c
+++ b/lib/librte_eal/bsdapp/eal/eal_pci.c
@@ -240,20 +240,73 @@ close_fd:
return -1;
 }

+static int
+pci_uio_map_resource_by_index(struct rte_pci_device *dev, int res_idx,
+   struct mapped_pci_resource *uio_res, int map_idx)
+{
+   int fd;
+   char *devname;
+   void *mapaddr;
+   uint64_t offset;
+   uint64_t pagesz;
+   struct pci_map *maps;
+
+   if ((dev == NULL) || (uio_res == NULL) || (uio_res->path == NULL))
+   return -1;
+
+   maps = uio_res->maps;
+   devname = uio_res->path;
+   pagesz = sysconf(_SC_PAGESIZE);
+
+   /* allocate memory to keep path */
+   maps[map_idx].path = rte_malloc(NULL, strlen(devname) + 1, 0);
+   if (maps[map_idx].path == NULL) {
+   RTE_LOG(ERR, EAL, "Cannot allocate memory for "
+   "path: %s\n", strerror(errno));
+   return -1;
+   }
+
+   /*
+* open resource file, to mmap it
+*/
+   fd = open(devname, O_RDWR);
+   if (fd < 0) {
+   RTE_LOG(ERR, EAL, "Cannot open %s: %s\n",
+   devname, strerror(errno));
+   goto fail;
+   }
+
+   /* if matching map is found, then use it */
+   offset = res_idx * pagesz;
+   mapaddr = pci_map_resource(NULL, fd, (off_t)offset,
+   (size_t)dev->mem_resource[res_idx].len, 0);
+   close(fd);
+   if (mapaddr == MAP_FAILED)
+   goto fail;
+
+   maps[map_idx].phaddr = dev->mem_resource[res_idx].phys_addr;
+   maps[map_idx].size = dev->mem_resource[res_idx].len;
+   maps[map_idx].addr = mapaddr;
+   maps[map_idx].offset = offset;
+   strcpy(maps[map_idx].path, devname);
+   dev->mem_resource[res_idx].addr = mapaddr;
+
+   return 0;
+
+fail:
+   rte_free(maps[map_idx].path);
+   return -1;
+}
+
 /* map the PCI resource of a PCI device in virtual memory */
 static int
 pci_uio_map_resource(struct rte_pci_device *dev)
 {
int i, map_idx, ret;
-   char *devname;
-   void *mapaddr;
uint64_t phaddr;
-   uint64_t offset;
-   uint64_t pagesz;
struct mapped_pci_resource *uio_res = NULL;
struct mapped_pci_res_list *uio_res_list =
RTE_TAILQ_CAST(rte_uio_tailq.head, mapped_pci_res_list);
-   struct pci_map *maps;

dev->intr_handle.fd = -1;
dev->intr_handle.type = RTE_INTR_HANDLE_UNKNOWN;
@@ -268,53 +321,17 @@ pci_uio_map_resource(struct rte_pci_device *dev)
return ret;

/* Map all BARs */
-   pagesz = sysconf(_SC_PAGESIZE);
-   devname = uio_res->path;
-
-   maps = uio_res->maps;
for (i = 0, map_idx = 0; i != PCI_MAX_RESOURCE; i++) {
-   int fd;
-
/* skip empty BAR */
if ((phaddr = dev->mem_resource[i].phys_addr) == 0)
continue;

-   /* allocate memory to keep path */
-   maps[map_idx].path = rte_malloc(NULL, strlen(devname) + 1, 0);
-   if (maps[map_idx].path == NULL) {
-   RTE_LOG(ERR, EAL, "Cannot allocate memory for "
-   "path: %s\n", strerror(errno));
+   ret = pci_uio_map_resource_by_index(dev, i,
+   uio_res, map_idx);
+   if (ret != 0)
goto free_uio_res;
-   }

-   /*
-* open resource file, to mmap it
-*/
-   fd = open(devname, O_RDWR);
-   if (fd < 0) {
-   RTE_LOG(ERR, EAL, "Cannot open %s: %s\n",
-   devname, strerror(errno));
-   rte_free(maps[map_idx].path);
-   goto free_uio_res;
-   }
-
-   /* if matching map is found, then use it */
-   offset = i * pagesz;
-   mapaddr = pci_map_resource(NULL, fd, (off_t)offset,
-   (size_t)dev->mem_resource[i].len, 0);
-   close(fd);
-   if (mapaddr == MAP_FAILED) {
-   rte_free(maps[map_idx].path);
-   goto free_uio_res;
-   }
-
-   maps[map_idx].phaddr = dev->mem_resource[i

[dpdk-dev] [PATCH v7 08/12] eal: Consolidate pci_map and mapped_pci_resource of linuxapp and bsdapp

2015-06-30 Thread Tetsuya Mukawa
From: "Tetsuya.Mukawa" 

This patch consolidates the structures below and defines them in common code;
a short lookup sketch over the shared tailq follows the list.
 - struct pci_map
 - struct mapped_pci_resource
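With the structures now shared through rte_pci.h, both EALs can walk the
recorded mappings the same way. A minimal lookup sketch, mirroring the
existing pci_uio_map_secondary() code in this series:

	/* Sketch: return the recorded mapping for a device from the shared
	 * tailq, or NULL if none (types and macros come from this series). */
	static struct mapped_pci_resource *
	find_uio_res(struct rte_pci_device *dev)
	{
		struct mapped_pci_resource *uio_res;
		struct mapped_pci_res_list *uio_res_list =
			RTE_TAILQ_CAST(rte_uio_tailq.head, mapped_pci_res_list);

		TAILQ_FOREACH(uio_res, uio_res_list, next) {
			/* skip entries that do not match our PCI address */
			if (rte_eal_compare_pci_addr(&uio_res->pci_addr,
					&dev->addr))
				continue;
			return uio_res;	/* describes nb_maps mapped BARs */
		}
		return NULL;
	}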

Signed-off-by: Tetsuya Mukawa 
---
 lib/librte_eal/bsdapp/eal/eal_pci.c| 24 
 lib/librte_eal/common/include/rte_pci.h| 28 
 lib/librte_eal/linuxapp/eal/eal_pci_init.h | 23 ---
 3 files changed, 28 insertions(+), 47 deletions(-)

diff --git a/lib/librte_eal/bsdapp/eal/eal_pci.c 
b/lib/librte_eal/bsdapp/eal/eal_pci.c
index da81d76..c045674 100644
--- a/lib/librte_eal/bsdapp/eal/eal_pci.c
+++ b/lib/librte_eal/bsdapp/eal/eal_pci.c
@@ -82,30 +82,6 @@
  * network card, only providing access to PCI BAR to applications, and
  * enabling bus master.
  */
-
-struct pci_map {
-   void *addr;
-   char *path;
-   uint64_t offset;
-   uint64_t size;
-   uint64_t phaddr;
-};
-
-/*
- * For multi-process we need to reproduce all PCI mappings in secondary
- * processes, so save them in a tailq.
- */
-struct mapped_pci_resource {
-   TAILQ_ENTRY(mapped_pci_resource) next;
-
-   struct rte_pci_addr pci_addr;
-   char path[PATH_MAX];
-   int nb_maps;
-   struct pci_map maps[PCI_MAX_RESOURCE];
-};
-
-TAILQ_HEAD(mapped_pci_res_list, mapped_pci_resource);
-
 static struct rte_tailq_elem rte_uio_tailq = {
.name = "UIO_RESOURCE_LIST",
 };
diff --git a/lib/librte_eal/common/include/rte_pci.h 
b/lib/librte_eal/common/include/rte_pci.h
index 7801fa0..0a2ef09 100644
--- a/lib/librte_eal/common/include/rte_pci.h
+++ b/lib/librte_eal/common/include/rte_pci.h
@@ -220,6 +220,34 @@ struct rte_pci_driver {
 /** Device driver supports detaching capability */
 #define RTE_PCI_DRV_DETACHABLE 0x0010

+/**
+ * A structure describing a PCI mapping.
+ */
+struct pci_map {
+   void *addr;
+   char *path;
+   uint64_t offset;
+   uint64_t size;
+   uint64_t phaddr;
+};
+
+/**
+ * A structure describing a mapped PCI resource.
+ * For multi-process we need to reproduce all PCI mappings in secondary
+ * processes, so save them in a tailq.
+ */
+struct mapped_pci_resource {
+   TAILQ_ENTRY(mapped_pci_resource) next;
+
+   struct rte_pci_addr pci_addr;
+   char path[PATH_MAX];
+   int nb_maps;
+   struct pci_map maps[PCI_MAX_RESOURCE];
+};
+
+/** mapped pci device list */
+TAILQ_HEAD(mapped_pci_res_list, mapped_pci_resource);
+
 /**< Internal use only - Macro used by pci addr parsing functions **/
 #define GET_PCIADDR_FIELD(in, fd, lim, dlm)   \
 do {   \
diff --git a/lib/librte_eal/linuxapp/eal/eal_pci_init.h 
b/lib/librte_eal/linuxapp/eal/eal_pci_init.h
index aa7b755..d9d1878 100644
--- a/lib/librte_eal/linuxapp/eal/eal_pci_init.h
+++ b/lib/librte_eal/linuxapp/eal/eal_pci_init.h
@@ -36,29 +36,6 @@

 #include "eal_vfio.h"

-struct pci_map {
-   void *addr;
-   char *path;
-   uint64_t offset;
-   uint64_t size;
-   uint64_t phaddr;
-};
-
-/*
- * For multi-process we need to reproduce all PCI mappings in secondary
- * processes, so save them in a tailq.
- */
-struct mapped_pci_resource {
-   TAILQ_ENTRY(mapped_pci_resource) next;
-
-   struct rte_pci_addr pci_addr;
-   char path[PATH_MAX];
-   int nb_maps;
-   struct pci_map maps[PCI_MAX_RESOURCE];
-};
-
-TAILQ_HEAD(mapped_pci_res_list, mapped_pci_resource);
-
 /*
  * Helper function to map PCI resources right after hugepages in virtual memory
  */
-- 
2.1.4



[dpdk-dev] [PATCH v7 09/12] eal: Consolidate pci_map/unmap_resource() of linuxapp and bsdapp

2015-06-30 Thread Tetsuya Mukawa
From: "Tetsuya.Mukawa" 

The patch consolidates the functions below and implements them in common
eal code; a short usage sketch follows the list.
 - pci_map_resource()
 - pci_unmap_resource()
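A minimal usage sketch of the now-common helpers; fd, offset and len are
placeholders for values taken from the resource file and BAR being mapped:

	/* Sketch (needs <sys/mman.h> and rte_pci.h): map one region from an
	 * already-open resource file, use it, then release it. */
	static int
	map_one_bar(int fd, off_t offset, size_t len)
	{
		void *addr;

		addr = pci_map_resource(NULL, fd, offset, len, 0);
		if (addr == MAP_FAILED)
			return -1;

		/* ... access the BAR through addr ... */

		pci_unmap_resource(addr, len);
		return 0;
	}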

Signed-off-by: Tetsuya Mukawa 
---
 lib/librte_eal/bsdapp/eal/eal_pci.c| 22 
 lib/librte_eal/common/eal_common_pci.c | 39 
 lib/librte_eal/common/include/rte_pci.h| 11 
 lib/librte_eal/linuxapp/eal/eal_pci.c  | 41 --
 lib/librte_eal/linuxapp/eal/eal_pci_init.h |  5 
 5 files changed, 50 insertions(+), 68 deletions(-)

diff --git a/lib/librte_eal/bsdapp/eal/eal_pci.c 
b/lib/librte_eal/bsdapp/eal/eal_pci.c
index c045674..aac4826 100644
--- a/lib/librte_eal/bsdapp/eal/eal_pci.c
+++ b/lib/librte_eal/bsdapp/eal/eal_pci.c
@@ -96,28 +96,6 @@ pci_unbind_kernel_driver(struct rte_pci_device *dev 
__rte_unused)
return -ENOTSUP;
 }

-/* map a particular resource from a file */
-static void *
-pci_map_resource(void *requested_addr, int fd, off_t offset, size_t size,
-int additional_flags)
-{
-   void *mapaddr;
-
-   /* Map the PCI memory resource of device */
-   mapaddr = mmap(requested_addr, size, PROT_READ | PROT_WRITE,
-   MAP_SHARED | additional_flags, fd, offset);
-   if (mapaddr == MAP_FAILED) {
-   RTE_LOG(ERR, EAL,
-   "%s(): cannot mmap(%d, %p, 0x%lx, 0x%lx): %s (%p)\n",
-   __func__, fd, requested_addr,
-   (unsigned long)size, (unsigned long)offset,
-   strerror(errno), mapaddr);
-   } else
-   RTE_LOG(DEBUG, EAL, "  PCI memory mapped at %p\n", mapaddr);
-
-   return mapaddr;
-}
-
 static int
 pci_uio_map_secondary(struct rte_pci_device *dev)
 {
diff --git a/lib/librte_eal/common/eal_common_pci.c 
b/lib/librte_eal/common/eal_common_pci.c
index 4229aaf..81b8fd6 100644
--- a/lib/librte_eal/common/eal_common_pci.c
+++ b/lib/librte_eal/common/eal_common_pci.c
@@ -67,6 +67,7 @@
 #include 
 #include 
 #include 
+#include 

 #include 
 #include 
@@ -98,6 +99,44 @@ static struct rte_devargs *pci_devargs_lookup(struct 
rte_pci_device *dev)
return NULL;
 }

+/* map a particular resource from a file */
+void *
+pci_map_resource(void *requested_addr, int fd, off_t offset, size_t size,
+int additional_flags)
+{
+   void *mapaddr;
+
+   /* Map the PCI memory resource of device */
+   mapaddr = mmap(requested_addr, size, PROT_READ | PROT_WRITE,
+   MAP_SHARED | additional_flags, fd, offset);
+   if (mapaddr == MAP_FAILED) {
+   RTE_LOG(ERR, EAL, "%s(): cannot mmap(%d, %p, 0x%lx, 0x%lx): %s 
(%p)\n",
+   __func__, fd, requested_addr,
+   (unsigned long)size, (unsigned long)offset,
+   strerror(errno), mapaddr);
+   } else
+   RTE_LOG(DEBUG, EAL, "  PCI memory mapped at %p\n", mapaddr);
+
+   return mapaddr;
+}
+
+/* unmap a particular resource */
+void
+pci_unmap_resource(void *requested_addr, size_t size)
+{
+   if (requested_addr == NULL)
+   return;
+
+   /* Unmap the PCI memory resource of device */
+   if (munmap(requested_addr, size)) {
+   RTE_LOG(ERR, EAL, "%s(): cannot munmap(%p, 0x%lx): %s\n",
+   __func__, requested_addr, (unsigned long)size,
+   strerror(errno));
+   } else
+   RTE_LOG(DEBUG, EAL, "  PCI memory unmapped at %p\n",
+   requested_addr);
+}
+
 /*
  * If vendor/device ID match, call the devinit() function of all
  * registered driver for the given device. Return -1 if initialization
diff --git a/lib/librte_eal/common/include/rte_pci.h 
b/lib/librte_eal/common/include/rte_pci.h
index 0a2ef09..56dcb46 100644
--- a/lib/librte_eal/common/include/rte_pci.h
+++ b/lib/librte_eal/common/include/rte_pci.h
@@ -364,6 +364,17 @@ int rte_eal_pci_scan(void);
  */
 int rte_eal_pci_probe(void);

+/**
+ * Map PCI resource.
+ */
+void *pci_map_resource(void *requested_addr, int fd, off_t offset,
+   size_t size, int additional_flags);
+
+/**
+ * Unmap PCI resource.
+ */
+void pci_unmap_resource(void *requested_addr, size_t size);
+
 #ifdef RTE_LIBRTE_EAL_HOTPLUG
 /**
  * Probe the single PCI device.
diff --git a/lib/librte_eal/linuxapp/eal/eal_pci.c 
b/lib/librte_eal/linuxapp/eal/eal_pci.c
index d2adc66..fc99eaa 100644
--- a/lib/librte_eal/linuxapp/eal/eal_pci.c
+++ b/lib/librte_eal/linuxapp/eal/eal_pci.c
@@ -33,7 +33,6 @@

 #include 
 #include 
-#include 

 #include 
 #include 
@@ -142,46 +141,6 @@ pci_find_max_end_va(void)
return RTE_PTR_ADD(last->addr, last->len);
 }

-
-/* map a particular resource from a file */
-void *
-pci_map_resource(void *requested_addr, int fd, off_t offset, size_t size,
-int additional_flags)
-{
-   void *mapaddr;
-
-   /* Map the PCI memory resource of device */
-  

[dpdk-dev] [PATCH v7 10/12] eal: Consolidate pci uio functions of linuxapp and bsdapp

2015-06-30 Thread Tetsuya Mukawa
From: "Tetsuya.Mukawa" 

The patch consolidates the functions below and implements them in
eal_common_pci_uio.c; a sketch of the resulting common/per-OS split follows
the list.
 - pci_uio_map_secondary()
 - pci_uio_map_resource()
 - pci_uio_unmap()
 - pci_uio_find_resource()
 - pci_uio_unmap_resource()
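Conceptually, the common file now owns the UIO flow and each OS only exports
two hooks (prototypes taken from the hunks in this series); the rest of the
block is a sketch of the intended split:

	/* Per-OS hooks, exported by bsdapp eal_pci.c and linuxapp
	 * eal_pci_uio.c (no longer static): */
	int pci_uio_alloc_resource(struct rte_pci_device *dev,
			struct mapped_pci_resource **uio_res);
	int pci_uio_map_resource_by_index(struct rte_pci_device *dev,
			int res_idx, struct mapped_pci_resource *uio_res,
			int map_idx);

	/* Common code in eal_common_pci_uio.c then implements
	 * pci_uio_map_resource(), pci_uio_map_secondary(), pci_uio_unmap(),
	 * pci_uio_find_resource() and pci_uio_unmap_resource() once,
	 * calling the two hooks above where OS specifics are needed. */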

Signed-off-by: Tetsuya Mukawa 
---
 lib/librte_eal/bsdapp/eal/Makefile |   1 +
 lib/librte_eal/bsdapp/eal/eal_pci.c| 110 +
 .../bsdapp/eal/include/exec-env/rte_interrupts.h   |   1 +
 lib/librte_eal/common/eal_common_pci_uio.c | 270 +
 lib/librte_eal/common/eal_private.h|  55 +
 lib/librte_eal/linuxapp/eal/Makefile   |   1 +
 lib/librte_eal/linuxapp/eal/eal_pci_init.h |  11 +-
 lib/librte_eal/linuxapp/eal/eal_pci_uio.c  | 196 +--
 8 files changed, 336 insertions(+), 309 deletions(-)
 create mode 100644 lib/librte_eal/common/eal_common_pci_uio.c

diff --git a/lib/librte_eal/bsdapp/eal/Makefile 
b/lib/librte_eal/bsdapp/eal/Makefile
index c73ffb6..40ec648 100644
--- a/lib/librte_eal/bsdapp/eal/Makefile
+++ b/lib/librte_eal/bsdapp/eal/Makefile
@@ -68,6 +68,7 @@ SRCS-$(CONFIG_RTE_LIBRTE_EAL_BSDAPP) += eal_common_memzone.c
 SRCS-$(CONFIG_RTE_LIBRTE_EAL_BSDAPP) += eal_common_log.c
 SRCS-$(CONFIG_RTE_LIBRTE_EAL_BSDAPP) += eal_common_launch.c
 SRCS-$(CONFIG_RTE_LIBRTE_EAL_BSDAPP) += eal_common_pci.c
+SRCS-$(CONFIG_RTE_LIBRTE_EAL_BSDAPP) += eal_common_pci_uio.c
 SRCS-$(CONFIG_RTE_LIBRTE_EAL_BSDAPP) += eal_common_memory.c
 SRCS-$(CONFIG_RTE_LIBRTE_EAL_BSDAPP) += eal_common_tailqs.c
 SRCS-$(CONFIG_RTE_LIBRTE_EAL_BSDAPP) += eal_common_errno.c
diff --git a/lib/librte_eal/bsdapp/eal/eal_pci.c 
b/lib/librte_eal/bsdapp/eal/eal_pci.c
index aac4826..329c268 100644
--- a/lib/librte_eal/bsdapp/eal/eal_pci.c
+++ b/lib/librte_eal/bsdapp/eal/eal_pci.c
@@ -82,10 +82,6 @@
  * network card, only providing access to PCI BAR to applications, and
  * enabling bus master.
  */
-static struct rte_tailq_elem rte_uio_tailq = {
-   .name = "UIO_RESOURCE_LIST",
-};
-EAL_REGISTER_TAILQ(rte_uio_tailq)

 /* unbind kernel driver for this device */
 static int
@@ -96,54 +92,7 @@ pci_unbind_kernel_driver(struct rte_pci_device *dev 
__rte_unused)
return -ENOTSUP;
 }

-static int
-pci_uio_map_secondary(struct rte_pci_device *dev)
-{
-   int i, fd;
-   struct mapped_pci_resource *uio_res;
-   struct mapped_pci_res_list *uio_res_list =
-   RTE_TAILQ_CAST(rte_uio_tailq.head, mapped_pci_res_list);
-
-   TAILQ_FOREACH(uio_res, uio_res_list, next) {
-
-   /* skip this element if it doesn't match our PCI address */
-   if (rte_eal_compare_pci_addr(&uio_res->pci_addr, &dev->addr))
-   continue;
-
-   for (i = 0; i != uio_res->nb_maps; i++) {
-   /*
-* open devname, to mmap it
-*/
-   fd = open(uio_res->maps[i].path, O_RDWR);
-   if (fd < 0) {
-   RTE_LOG(ERR, EAL, "Cannot open %s: %s\n",
-   uio_res->maps[i].path, strerror(errno));
-   return -1;
-   }
-
-   void *mapaddr = pci_map_resource(uio_res->maps[i].addr,
-   fd, (off_t)uio_res->maps[i].offset,
-   (size_t)uio_res->maps[i].size, 0);
-   if (mapaddr != uio_res->maps[i].addr) {
-   RTE_LOG(ERR, EAL,
-   "Cannot mmap device resource "
-   "file %s to address: %p\n",
-   uio_res->maps[i].path,
-   uio_res->maps[i].addr);
-   close(fd);
-   return -1;
-   }
-   /* fd is not needed in slave process, close it */
-   close(fd);
-   }
-   return 0;
-   }
-
-   RTE_LOG(ERR, EAL, "Cannot find resource for device\n");
-   return 1;
-}
-
-static int
+int
 pci_uio_alloc_resource(struct rte_pci_device *dev,
struct mapped_pci_resource **uio_res)
 {
@@ -194,7 +143,7 @@ close_fd:
return -1;
 }

-static int
+int
 pci_uio_map_resource_by_index(struct rte_pci_device *dev, int res_idx,
struct mapped_pci_resource *uio_res, int map_idx)
 {
@@ -252,61 +201,6 @@ fail:
return -1;
 }

-/* map the PCI resource of a PCI device in virtual memory */
-static int
-pci_uio_map_resource(struct rte_pci_device *dev)
-{
-   int i, map_idx, ret;
-   uint64_t phaddr;
-   struct mapped_pci_resource *uio_res = NULL;
-   struct mapped_pci_res_list *uio_res_list =
-   RTE_TAILQ_CAST(rte_uio_tailq.head, mapped_pci_res_list);
-
-  

[dpdk-dev] [PATCH v7 11/12] eal: Consolidate pci_map/unmap_device() of linuxapp and bsdapp

2015-06-30 Thread Tetsuya Mukawa
From: "Tetsuya.Mukawa" 

The patch consolidates the functions below and implements them in common
eal code; a brief usage sketch follows the list.
 - pci_map_device()
 - pci_unmap_device()
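A brief sketch of the expected call sites; gating the probe path on
RTE_PCI_DRV_NEED_MAPPING is an assumption carried over from the existing
linuxapp code:

	/* Sketch: probe path maps the device, hotplug close path unmaps. */
	if (dr->drv_flags & RTE_PCI_DRV_NEED_MAPPING) {
		ret = pci_map_device(dev);	/* dispatches on dev->kdrv */
		if (ret != 0)
			return ret;
	}

	/* ... later, on detach (RTE_LIBRTE_EAL_HOTPLUG): ... */
	pci_unmap_device(dev);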

Signed-off-by: Tetsuya Mukawa 
---
 lib/librte_eal/bsdapp/eal/eal_pci.c |  3 ++
 lib/librte_eal/common/eal_common_pci.c  | 57 +
 lib/librte_eal/common/eal_private.h | 19 +++
 lib/librte_eal/common/include/rte_pci.h |  1 +
 lib/librte_eal/linuxapp/eal/eal_pci.c   | 53 --
 lib/librte_ether/rte_ethdev.c   |  1 +
 6 files changed, 81 insertions(+), 53 deletions(-)

diff --git a/lib/librte_eal/bsdapp/eal/eal_pci.c 
b/lib/librte_eal/bsdapp/eal/eal_pci.c
index 329c268..c057f6a 100644
--- a/lib/librte_eal/bsdapp/eal/eal_pci.c
+++ b/lib/librte_eal/bsdapp/eal/eal_pci.c
@@ -238,6 +238,9 @@ pci_scan_one(int dev_pci_fd, struct pci_conf *conf)
/* FreeBSD has no NUMA support (yet) */
dev->numa_node = 0;

+   /* FreeBSD has only one pass through driver */
+   dev->kdrv = RTE_KDRV_NIC_UIO;
+
/* parse resources */
switch (conf->pc_hdr & PCIM_HDRTYPE) {
case PCIM_HDRTYPE_NORMAL:
diff --git a/lib/librte_eal/common/eal_common_pci.c 
b/lib/librte_eal/common/eal_common_pci.c
index 81b8fd6..c0be292 100644
--- a/lib/librte_eal/common/eal_common_pci.c
+++ b/lib/librte_eal/common/eal_common_pci.c
@@ -137,6 +137,63 @@ pci_unmap_resource(void *requested_addr, size_t size)
requested_addr);
 }

+/* Map pci device */
+int
+pci_map_device(struct rte_pci_device *dev)
+{
+   int ret = -1;
+
+   /* try mapping the NIC resources using VFIO if it exists */
+   switch (dev->kdrv) {
+   case RTE_KDRV_VFIO:
+#ifdef VFIO_PRESENT
+   if (pci_vfio_is_enabled())
+   ret = pci_vfio_map_resource(dev);
+#endif
+   break;
+   case RTE_KDRV_IGB_UIO:
+   case RTE_KDRV_UIO_GENERIC:
+   case RTE_KDRV_NIC_UIO:
+   /* map resources for devices that use uio */
+   ret = pci_uio_map_resource(dev);
+   break;
+   default:
+   RTE_LOG(DEBUG, EAL, "  Not managed by a supported kernel 
driver,"
+   " skipped\n");
+   ret = 1;
+   break;
+   }
+
+   return ret;
+}
+
+#ifdef RTE_LIBRTE_EAL_HOTPLUG
+/* Unmap pci device */
+void
+pci_unmap_device(struct rte_pci_device *dev)
+{
+   if (dev == NULL)
+   return;
+
+   /* try unmapping the NIC resources using VFIO if it exists */
+   switch (dev->kdrv) {
+   case RTE_KDRV_VFIO:
+   RTE_LOG(ERR, EAL, "Hotplug doesn't support vfio yet\n");
+   break;
+   case RTE_KDRV_IGB_UIO:
+   case RTE_KDRV_UIO_GENERIC:
+   case RTE_KDRV_NIC_UIO:
+   /* unmap resources for devices that use uio */
+   pci_uio_unmap_resource(dev);
+   break;
+   default:
+   RTE_LOG(DEBUG, EAL, "  Not managed by a supported kernel 
driver,"
+   " skipped\n");
+   break;
+   }
+}
+#endif /* RTE_LIBRTE_EAL_HOTPLUG */
+
 /*
  * If vendor/device ID match, call the devinit() function of all
  * registered driver for the given device. Return -1 if initialization
diff --git a/lib/librte_eal/common/eal_private.h 
b/lib/librte_eal/common/eal_private.h
index c3a3fe4..eec396c 100644
--- a/lib/librte_eal/common/eal_private.h
+++ b/lib/librte_eal/common/eal_private.h
@@ -162,6 +162,25 @@ struct rte_pci_device;
  * @return
  *   0 on success, negative on error
  */
+int pci_map_device(struct rte_pci_device *dev);
+
+#ifdef RTE_LIBRTE_EAL_HOTPLUG
+/**
+ * Unmap this device
+ *
+ * This function is private to EAL.
+ */
+void pci_unmap_device(struct rte_pci_device *dev);
+#endif /* RTE_LIBRTE_EAL_HOTPLUG */
+
+/**
+ * Map this device
+ *
+ * This function is private to EAL.
+ *
+ * @return
+ *   0 on success, negative on error
+ */
 int pci_uio_map_resource(struct rte_pci_device *dev);

 #ifdef RTE_LIBRTE_EAL_HOTPLUG
diff --git a/lib/librte_eal/common/include/rte_pci.h 
b/lib/librte_eal/common/include/rte_pci.h
index 56dcb46..475d2dc 100644
--- a/lib/librte_eal/common/include/rte_pci.h
+++ b/lib/librte_eal/common/include/rte_pci.h
@@ -148,6 +148,7 @@ enum rte_kernel_driver {
RTE_KDRV_IGB_UIO,
RTE_KDRV_VFIO,
RTE_KDRV_UIO_GENERIC,
+   RTE_KDRV_NIC_UIO,
 };

 /**
diff --git a/lib/librte_eal/linuxapp/eal/eal_pci.c 
b/lib/librte_eal/linuxapp/eal/eal_pci.c
index fc99eaa..7e8df7d 100644
--- a/lib/librte_eal/linuxapp/eal/eal_pci.c
+++ b/lib/librte_eal/linuxapp/eal/eal_pci.c
@@ -522,59 +522,6 @@ pci_config_space_set(struct rte_pci_device *dev)
 }
 #endif

-static int
-pci_map_device(struct rte_pci_device *dev)
-{
-   int ret = -1;
-
-   /* try mapping the NIC resources using VFIO if it exists */
-   switch (dev->kdrv) {
-   case RTE_KDRV_VFIO:
-#ifdef VFIO_PRESENT
-   if (pci_vfio_is_enabled())
-

[dpdk-dev] [PATCH v7 12/12] eal: Consolidate rte_eal_pci_probe/close_one_driver() of linuxapp and bsdapp

2015-06-30 Thread Tetsuya Mukawa
From: "Tetsuya.Mukawa" 

This patch consolidates the functions below and implements them in common
eal code.
 - rte_eal_pci_probe_one_driver()
 - rte_eal_pci_close_one_driver()

Because pci_map_device() is only implemented in linuxapp, the patch
implements it in bsdapp too. This implementation will be merged with the
linuxapp one in a later patch.

Signed-off-by: Tetsuya Mukawa 
---
 lib/librte_eal/bsdapp/eal/eal_pci.c|  67 +---
 lib/librte_eal/common/eal_common_pci.c | 133 +-
 lib/librte_eal/common/eal_private.h|  39 +
 lib/librte_eal/linuxapp/eal/eal_pci.c  | 142 +
 4 files changed, 135 insertions(+), 246 deletions(-)

diff --git a/lib/librte_eal/bsdapp/eal/eal_pci.c 
b/lib/librte_eal/bsdapp/eal/eal_pci.c
index c057f6a..508cfa7 100644
--- a/lib/librte_eal/bsdapp/eal/eal_pci.c
+++ b/lib/librte_eal/bsdapp/eal/eal_pci.c
@@ -84,7 +84,7 @@
  */

 /* unbind kernel driver for this device */
-static int
+int
 pci_unbind_kernel_driver(struct rte_pci_device *dev __rte_unused)
 {
RTE_LOG(ERR, EAL, "RTE_PCI_DRV_FORCE_UNBIND flag is not implemented "
@@ -355,71 +355,6 @@ error:
return -1;
 }

-/*
- * If vendor/device ID match, call the devinit() function of the
- * driver.
- */
-int
-rte_eal_pci_probe_one_driver(struct rte_pci_driver *dr, struct rte_pci_device 
*dev)
-{
-   const struct rte_pci_id *id_table;
-   int ret;
-
-   for (id_table = dr->id_table ; id_table->vendor_id != 0; id_table++) {
-
-   /* check if device's identifiers match the driver's ones */
-   if (id_table->vendor_id != dev->id.vendor_id &&
-   id_table->vendor_id != PCI_ANY_ID)
-   continue;
-   if (id_table->device_id != dev->id.device_id &&
-   id_table->device_id != PCI_ANY_ID)
-   continue;
-   if (id_table->subsystem_vendor_id != 
dev->id.subsystem_vendor_id &&
-   id_table->subsystem_vendor_id != PCI_ANY_ID)
-   continue;
-   if (id_table->subsystem_device_id != 
dev->id.subsystem_device_id &&
-   id_table->subsystem_device_id != PCI_ANY_ID)
-   continue;
-
-   struct rte_pci_addr *loc = &dev->addr;
-
-   RTE_LOG(DEBUG, EAL, "PCI device "PCI_PRI_FMT" on NUMA socket 
%i\n",
-   loc->domain, loc->bus, loc->devid, 
loc->function,
-   dev->numa_node);
-
-   RTE_LOG(DEBUG, EAL, "  probe driver: %x:%x %s\n", 
dev->id.vendor_id,
-   dev->id.device_id, dr->name);
-
-   /* no initialization when blacklisted, return without error */
-   if (dev->devargs != NULL &&
-   dev->devargs->type == RTE_DEVTYPE_BLACKLISTED_PCI) {
-
-   RTE_LOG(DEBUG, EAL, "  Device is blacklisted, not 
initializing\n");
-   return 0;
-   }
-
-   if (dr->drv_flags & RTE_PCI_DRV_NEED_MAPPING) {
-   /* map resources for devices that use igb_uio */
-   ret = pci_uio_map_resource(dev);
-   if (ret != 0)
-   return ret;
-   } else if (dr->drv_flags & RTE_PCI_DRV_FORCE_UNBIND &&
-  rte_eal_process_type() == RTE_PROC_PRIMARY) {
-   /* unbind current driver */
-   if (pci_unbind_kernel_driver(dev) < 0)
-   return -1;
-   }
-
-   /* reference driver structure */
-   dev->driver = dr;
-
-   /* call the driver devinit() function */
-   return dr->devinit(dr, dev);
-   }
-   /* return positive value if driver is not found */
-   return 1;
-}
-
 /* Init the PCI EAL subsystem */
 int
 rte_eal_pci_init(void)
diff --git a/lib/librte_eal/common/eal_common_pci.c 
b/lib/librte_eal/common/eal_common_pci.c
index c0be292..8ef8057 100644
--- a/lib/librte_eal/common/eal_common_pci.c
+++ b/lib/librte_eal/common/eal_common_pci.c
@@ -138,7 +138,7 @@ pci_unmap_resource(void *requested_addr, size_t size)
 }

 /* Map pci device */
-int
+static int
 pci_map_device(struct rte_pci_device *dev)
 {
int ret = -1;
@@ -169,7 +169,7 @@ pci_map_device(struct rte_pci_device *dev)

 #ifdef RTE_LIBRTE_EAL_HOTPLUG
 /* Unmap pci device */
-void
+static void
 pci_unmap_device(struct rte_pci_device *dev)
 {
if (dev == NULL)
@@ -195,6 +195,135 @@ pci_unmap_device(struct rte_pci_device *dev)
 #endif /* RTE_LIBRTE_EAL_HOTPLUG */

 /*
+ * If vendor/device ID match, call the devinit() function of the
+ * driver.
+ */
+static int
+rte_eal_pci_probe_one_driver(struct rte_pci_driver *dr, struct rte_pci_device 
*dev)
+{
+   int ret;
+   const struct rte_pci_id *id_t

[dpdk-dev] [PATCH v4] Add Port Hotplug support to BSD

2015-06-30 Thread Tetsuya Mukawa
This patch adds port hotplug support to BSD.
Before applying it, the following patches should be applied first:
 - [PATCH v7 01/12] eal: Fix coding style of eal_pci.c and eal_pci_uio.c
 - [PATCH v7 02/12] eal: Close file descriptor of uio configuration
 - [PATCH v7 03/12] eal: Fix memory leaks and needless increment of pci_map_addr
 - [PATCH v7 04/12] eal/bsdapp: Change names of pci related data structure
 - [PATCH v7 05/12] eal: Fix uio mapping differences between linuxapp and bsdapp
 - [PATCH v7 06/12] eal: Add pci_uio_alloc_resource()
 - [PATCH v7 07/12] eal: Add pci_uio_map_resource_by_index()
 - [PATCH v7 08/12] eal: Consolidate pci_map and mapped_pci_resource of 
linuxapp and bsdapp
 - [PATCH v7 09/12] eal: Consolidate pci_map/unmap_resource() of linuxapp and 
bsdapp
 - [PATCH v7 10/12] eal: Consolidate pci uio functions of linuxapp and bsdapp
 - [PATCH v7 11/12] eal: Consolidate pci_map/unmap_device() of linuxapp and 
bsdapp
 - [PATCH v7 12/12] eal: Consolidate rte_eal_pci_probe/close_one_driver() of 
linuxapp and bsdapp

PATCH v3 changes:
 - Below patches are removed.
   - eal: Add pci_uio_alloc_resource()
   - eal: Add pci_uio_map_resource_by_index()
   - eal: Consolidate pci_map and mapped_pci_resource of linuxapp and bsdapp
   - eal: Consolidate pci_map/unmap_resource() of linuxapp and bsdapp
   - eal: Consolidate pci uio functions of linuxapp and bsdapp
   - eal: Consolidate pci_map/unmap_device() of linuxapp and bsdapp
   - eal: Consolidate rte_eal_pci_probe/close_one_driver() of linuxapp and 
bsdapp
   (Thanks to Bruce Richardson)

PATCH v2 changes:
 - Fix license of eal_common_pci_uio.c

PATCH v1 changes:
 - Rebase to below latest patch series.
   - [PATCH v6] Clean up pci uio implementations

Tetsuya.Mukawa (1):
  eal: Enable Port Hotplug as default in Linux and BSD

 config/common_bsdapp  |  6 --
 config/common_linuxapp|  5 -
 lib/librte_eal/bsdapp/eal/eal_pci.c   |  6 +++---
 lib/librte_eal/bsdapp/eal/rte_eal_version.map |  6 ++
 lib/librte_eal/common/eal_common_dev.c|  2 --
 lib/librte_eal/common/eal_common_pci.c|  6 --
 lib/librte_eal/common/eal_common_pci_uio.c|  2 --
 lib/librte_eal/common/eal_private.h   |  2 --
 lib/librte_eal/common/include/rte_pci.h   |  2 --
 lib/librte_ether/rte_ethdev.c | 21 -
 10 files changed, 9 insertions(+), 49 deletions(-)

-- 
2.1.4



[dpdk-dev] [PATCH v4] eal: Enable Port Hotplug as default in Linux and BSD

2015-06-30 Thread Tetsuya Mukawa
From: "Tetsuya.Mukawa" 

This patch removes the CONFIG_RTE_LIBRTE_EAL_HOTPLUG option and enables
hotplug by default in both Linux and BSD.
Also, to support port hotplug, rte_eal_pci_scan() and the missing symbols
below must be exported to the ethdev library; a hedged usage sketch follows
the list.
 - rte_eal_parse_devargs_str()
 - rte_eal_pci_close_one()
 - rte_eal_pci_probe_one()
 - rte_eal_pci_scan()
 - rte_eal_vdev_init()
 - rte_eal_vdev_uninit()
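For illustration only, a sketch of how an attach path might use the newly
exported PCI symbols. The rte_eal_pci_probe_one()/rte_eal_pci_close_one()
prototypes (taking a struct rte_pci_addr *) are assumed, since they are not
shown in this patch:

	/* Sketch only: rescan the bus, then probe one hot-added device. */
	static int
	attach_port(void)
	{
		struct rte_pci_addr addr = {
			.domain = 0, .bus = 5, .devid = 0, .function = 0 };

		if (rte_eal_pci_scan() < 0)		/* exported here */
			return -1;
		if (rte_eal_pci_probe_one(&addr) != 0)	/* assumed prototype */
			return -1;

		/* ... and on detach: rte_eal_pci_close_one(&addr); */
		return 0;
	}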

Signed-off-by: Tetsuya Mukawa 
---
 config/common_bsdapp  |  6 --
 config/common_linuxapp|  5 -
 lib/librte_eal/bsdapp/eal/eal_pci.c   |  6 +++---
 lib/librte_eal/bsdapp/eal/rte_eal_version.map |  6 ++
 lib/librte_eal/common/eal_common_dev.c|  2 --
 lib/librte_eal/common/eal_common_pci.c|  6 --
 lib/librte_eal/common/eal_common_pci_uio.c|  2 --
 lib/librte_eal/common/eal_private.h   |  2 --
 lib/librte_eal/common/include/rte_pci.h   |  2 --
 lib/librte_ether/rte_ethdev.c | 21 -
 10 files changed, 9 insertions(+), 49 deletions(-)

diff --git a/config/common_bsdapp b/config/common_bsdapp
index 464250b..c6e6e9c 100644
--- a/config/common_bsdapp
+++ b/config/common_bsdapp
@@ -121,12 +121,6 @@ CONFIG_RTE_LIBRTE_EAL_BSDAPP=y
 CONFIG_RTE_LIBRTE_EAL_LINUXAPP=n

 #
-# Compile Environment Abstraction Layer to support hotplug
-# So far, Hotplug functions only support linux
-#
-CONFIG_RTE_LIBRTE_EAL_HOTPLUG=n
-
-#
 # Compile Environment Abstraction Layer to support Vmware TSC map
 #
 CONFIG_RTE_LIBRTE_EAL_VMWARE_TSC_MAP_SUPPORT=y
diff --git a/config/common_linuxapp b/config/common_linuxapp
index aae22f4..c33a6fe 100644
--- a/config/common_linuxapp
+++ b/config/common_linuxapp
@@ -119,11 +119,6 @@ CONFIG_RTE_PCI_MAX_READ_REQUEST_SIZE=0
 CONFIG_RTE_LIBRTE_EAL_LINUXAPP=y

 #
-# Compile Environment Abstraction Layer to support hotplug
-#
-CONFIG_RTE_LIBRTE_EAL_HOTPLUG=y
-
-#
 # Compile Environment Abstraction Layer to support Vmware TSC map
 #
 CONFIG_RTE_LIBRTE_EAL_VMWARE_TSC_MAP_SUPPORT=y
diff --git a/lib/librte_eal/bsdapp/eal/eal_pci.c 
b/lib/librte_eal/bsdapp/eal/eal_pci.c
index 508cfa7..0724c45 100644
--- a/lib/librte_eal/bsdapp/eal/eal_pci.c
+++ b/lib/librte_eal/bsdapp/eal/eal_pci.c
@@ -309,8 +309,8 @@ skipdev:
  * Scan the content of the PCI bus, and add the devices in the devices
  * list. Call pci_scan_one() for each pci entry found.
  */
-static int
-pci_scan(void)
+int
+rte_eal_pci_scan(void)
 {
int fd;
unsigned dev_count = 0;
@@ -366,7 +366,7 @@ rte_eal_pci_init(void)
if (internal_config.no_pci)
return 0;

-   if (pci_scan() < 0) {
+   if (rte_eal_pci_scan() < 0) {
RTE_LOG(ERR, EAL, "%s(): Cannot scan PCI bus\n", __func__);
return -1;
}
diff --git a/lib/librte_eal/bsdapp/eal/rte_eal_version.map 
b/lib/librte_eal/bsdapp/eal/rte_eal_version.map
index 67b6a6c..7e850a9 100644
--- a/lib/librte_eal/bsdapp/eal/rte_eal_version.map
+++ b/lib/librte_eal/bsdapp/eal/rte_eal_version.map
@@ -37,14 +37,20 @@ DPDK_2.0 {
rte_eal_lcore_role;
rte_eal_mp_remote_launch;
rte_eal_mp_wait_lcore;
+   rte_eal_parse_devargs_str;
+   rte_eal_pci_close_one;
rte_eal_pci_dump;
rte_eal_pci_probe;
+   rte_eal_pci_probe_one;
rte_eal_pci_register;
+   rte_eal_pci_scan;
rte_eal_pci_unregister;
rte_eal_process_type;
rte_eal_remote_launch;
rte_eal_tailq_lookup;
rte_eal_tailq_register;
+   rte_eal_vdev_init;
+   rte_eal_vdev_uninit;
rte_eal_wait_lcore;
rte_exit;
rte_get_hpet_cycles;
diff --git a/lib/librte_eal/common/eal_common_dev.c 
b/lib/librte_eal/common/eal_common_dev.c
index 92a5a94..4089d66 100644
--- a/lib/librte_eal/common/eal_common_dev.c
+++ b/lib/librte_eal/common/eal_common_dev.c
@@ -125,7 +125,6 @@ rte_eal_dev_init(void)
return 0;
 }

-#ifdef RTE_LIBRTE_EAL_HOTPLUG
 int
 rte_eal_vdev_uninit(const char *name)
 {
@@ -151,4 +150,3 @@ rte_eal_vdev_uninit(const char *name)
RTE_LOG(ERR, EAL, "no driver found for %s\n", name);
return -EINVAL;
 }
-#endif /* RTE_LIBRTE_EAL_HOTPLUG */
diff --git a/lib/librte_eal/common/eal_common_pci.c 
b/lib/librte_eal/common/eal_common_pci.c
index 8ef8057..3805aed 100644
--- a/lib/librte_eal/common/eal_common_pci.c
+++ b/lib/librte_eal/common/eal_common_pci.c
@@ -167,7 +167,6 @@ pci_map_device(struct rte_pci_device *dev)
return ret;
 }

-#ifdef RTE_LIBRTE_EAL_HOTPLUG
 /* Unmap pci device */
 static void
 pci_unmap_device(struct rte_pci_device *dev)
@@ -192,7 +191,6 @@ pci_unmap_device(struct rte_pci_device *dev)
break;
}
 }
-#endif /* RTE_LIBRTE_EAL_HOTPLUG */

 /*
  * If vendor/device ID match, call the devinit() function of the
@@ -265,7 +263,6 @@ rte_eal_pci_probe_one_driver(struct rte_pci_driver *dr, 
struct rte_pci_device *d
return 1;
 }

-#ifdef RTE_LIBRTE_EAL_HOTPLUG
 /*
  * If vendor/devi

[dpdk-dev] [PATCH v3 7/8] eal: Consolidate pci uio functions of linuxapp and bsdapp

2015-06-30 Thread Tetsuya Mukawa
On 2015/06/30 11:43, Tetsuya Mukawa wrote:
> On 2015/06/29 23:03, Iremonger, Bernard wrote:
>>> -Original Message-
>>> From: Tetsuya Mukawa [mailto:mukawa at igel.co.jp]
>>> Sent: Monday, June 29, 2015 3:57 AM
>>> To: dev at dpdk.org
>>> Cc: Iremonger, Bernard; david.marchand at 6wind.com; Tetsuya.Mukawa
>>> Subject: [PATCH v3 7/8] eal: Consolidate pci uio functions of linuxapp and
>>> bsdapp
>>>
>>> From: "Tetsuya.Mukawa" 
>>>
>>> The patch consolidates below functions, and implement these in
>>> eal_common_pci_uio.c.
>>>  - pci_uio_map_secondary()
>>>  - pci_uio_map_resource()
>>>  - pci_uio_unmap()
>>>  - pci_uio_find_resource()
>>>  - pci_uio_unmap_resource()
>>>
>>> Signed-off-by: Tetsuya Mukawa 
>> Hi Tetsuya,
>>
>> The copyrights of all files in this patch set should be updated to 2015.
> Hi Bernard,
>
> Could I make sure this comment?
> Is it okay to change copyright of Intel like below?
> - Copyright(c) 2010-2014 Intel Corporation. All rights reserved.
> + Copyright(c) 2010-2015 Intel Corporation. All rights reserved.
>
> Or do you suggest it's better to add my company copyright below Intel?
>
> Regards,
> Tetsuya
>

Hi Bernard,

I haven't included this fix in the latest patch series.
(The deadline is probably coming soon, but my patches still have some open
comments, so I will submit patches that address those issues.)
Even if we need to change the copyright, it will be very easy. In that case,
I will send a new version of the patch series to fix it.

Regards,
Tetsuya


[dpdk-dev] [PATCH v8 03/18] mbuf: add definitions of unified packet types

2015-06-30 Thread Olivier MATZ
Hi Helin,

This is very well documented, thanks!
Please find a small comment below.

On 06/23/2015 03:50 AM, Helin Zhang wrote:
> As there are only 6 bit flags in ol_flags for indicating packet
> types, which is not enough to describe all the possible packet
> types hardware can recognize. For example, i40e hardware can
> recognize more than 150 packet types. Unified packet type is
> composed of L2 type, L3 type, L4 type, tunnel type, inner L2 type,
> inner L3 type and inner L4 type fields, and can be stored in
> 'struct rte_mbuf' of 32 bits field 'packet_type'.
> To avoid breaking ABI compatibility, all the changes would be
> enabled by RTE_NEXT_ABI, which is disabled by default.
>
> [...]
> diff --git a/lib/librte_mbuf/rte_mbuf.h b/lib/librte_mbuf/rte_mbuf.h
> index 0315561..0ee0c55 100644
> --- a/lib/librte_mbuf/rte_mbuf.h
> +++ b/lib/librte_mbuf/rte_mbuf.h
> @@ -201,6 +201,493 @@ extern "C" {
>   /* Use final bit of flags to indicate a control mbuf */
>   #define CTRL_MBUF_FLAG   (1ULL << 63) /**< Mbuf contains control data */
>
> +#ifdef RTE_NEXT_ABI
> +/*
> + * 32 bits are divided into several fields to mark packet types. Note that
> + * each field is indexical.
> + * - Bit 3:0 is for L2 types.
> + * - Bit 7:4 is for L3 or outer L3 (for tunneling case) types.
> + * - Bit 11:8 is for L4 or outer L4 (for tunneling case) types.
> + * - Bit 15:12 is for tunnel types.
> + * - Bit 19:16 is for inner L2 types.
> + * - Bit 23:20 is for inner L3 types.
> + * - Bit 27:24 is for inner L4 types.
> + * - Bit 31:28 is reserved.
> + *
> + * To be compatible with Vector PMD, RTE_PTYPE_L3_IPV4, 
> RTE_PTYPE_L3_IPV4_EXT,
> + * RTE_PTYPE_L3_IPV6, RTE_PTYPE_L3_IPV6_EXT, RTE_PTYPE_L4_TCP, 
> RTE_PTYPE_L4_UDP
> + * and RTE_PTYPE_L4_SCTP should be kept as below in a contiguous 7 bits.
> + *
> + * Note that L3 types values are selected for checking IPV4/IPV6 header from
> + * performance point of view. Reading annotations of RTE_ETH_IS_IPV4_HDR and
> + * RTE_ETH_IS_IPV6_HDR is needed for any future changes of L3 type values.
> + *
> + * Note that the packet types of the same packet recognized by different
> + * hardware may be different, as different hardware may have different
> + * capability of packet type recognition.
> + *
> + * examples:
> + * <'ether type'=0x0800
> + * | 'version'=4, 'protocol'=0x29
> + * | 'version'=6, 'next header'=0x3A
> + * | 'ICMPv6 header'>
> + * will be recognized on i40e hardware as packet type combination of,
> + * RTE_PTYPE_L2_MAC |
> + * RTE_PTYPE_L3_IPV4_EXT_UNKNOWN |
> + * RTE_PTYPE_TUNNEL_IP |
> + * RTE_PTYPE_INNER_L3_IPV6_EXT_UNKNOWN |
> + * RTE_PTYPE_INNER_L4_ICMP.
> + *
> + * <'ether type'=0x86DD
> + * | 'version'=6, 'next header'=0x2F
> + * | 'GRE header'
> + * | 'version'=6, 'next header'=0x11
> + * | 'UDP header'>
> + * will be recognized on i40e hardware as packet type combination of,
> + * RTE_PTYPE_L2_MAC |
> + * RTE_PTYPE_L3_IPV6_EXT_UNKNOWN |
> + * RTE_PTYPE_TUNNEL_GRENAT |
> + * RTE_PTYPE_INNER_L3_IPV6_EXT_UNKNOWN |
> + * RTE_PTYPE_INNER_L4_UDP.
> + */
> +#define RTE_PTYPE_UNKNOWN   0x
> +/**
> + * MAC (Media Access Control) packet type.
> + * It is used for outer packet for tunneling cases.
> + *
> + * Packet format:
> + * <'ether type'=[0x0800|0x86DD|others]>
> + */
> +#define RTE_PTYPE_L2_MAC0x0001

I'm wondering if RTE_PTYPE_L2_ETHER is not a better name?


> +/**
> + * MAC (Media Access Control) packet type for time sync.
> + *
> + * Packet format:
> + * <'ether type'=0x88F7>
> + */
> +#define RTE_PTYPE_L2_MAC_TIMESYNC   0x0002
> +/**
> + * ARP (Address Resolution Protocol) packet type.
> + *
> + * Packet format:
> + * <'ether type'=0x0806>
> + */
> +#define RTE_PTYPE_L2_ARP0x0003
> +/**
> + * LLDP (Link Layer Discovery Protocol) packet type.
> + *
> + * Packet format:
> + * <'ether type'=0x88CC>
> + */
> +#define RTE_PTYPE_L2_LLDP   0x0004

Maybe ETHER should appear in these names too, what do you think?




> +/**
> + * Mask of layer 2 packet types.
> + * It is used for outer packet for tunneling cases.
> + */
> +#define RTE_PTYPE_L2_MASK   0x000f
> +/**
> + * IP (Internet Protocol) version 4 packet type.
> + * It is used for outer packet for tunneling cases, and does not contain any
> + * header option.
> + *
> + * Packet format:
> + * <'ether type'=0x0800
> + * | 'version'=4, 'ihl'=5>
> + */
> +#define RTE_PTYPE_L3_IPV4   0x0010
> +/**
> + * IP (Internet Protocol) version 4 packet type.
> + * It is used for outer packet for tunneling cases, and contains header
> + * options.
> + *
> + * Packet format:
> + * <'ether type'=0x0800
> + * | 'version'=4, 'ihl'=[6-15], 'options'>
> + */
> +#define RTE_PTYPE_L3_IPV4_EXT   0x0030
> +/**
> + * IP (Internet Protocol) version 6 packet type.
> + * It is used for outer packet for tunneling cases, and does not contain any
> + * extension header.
> + *
> + * Packet format:

[dpdk-dev] [PATCH v4 0/4] vhost: vhost unix domain socket cleanup

2015-06-30 Thread Huawei Xie
vhost user can register multiple unix domain socket servers and uses the path
to identify the virtio device connecting to each one.
rte_vhost_driver_unregister() cleans up the unix domain socket for the
specified path.
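A minimal usage sketch of the register/unregister pair; the socket path is a
placeholder and my_ops stands for the application's virtio_net_device_ops
callbacks:

	/* Sketch (needs rte_virtio_net.h): bring a vhost-user server up on
	 * a path, later tear it down. my_ops is the application-defined
	 * callback table (definition omitted in this sketch). */
	extern const struct virtio_net_device_ops my_ops;

	static int
	start_and_stop_vhost(void)
	{
		rte_vhost_driver_callback_register(&my_ops);
		if (rte_vhost_driver_register("/tmp/vhost-user-0") < 0)
			return -1;

		/* ... run the session loop, serve the device ... */

		rte_vhost_driver_unregister("/tmp/vhost-user-0");
		return 0;
	}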

v2 changes:
-minor code style fix, remove unnecessary new line

v3 changes:
update version map file

v4 changes:
-add comment for potential unwanted callback on listenfds
-call fdset_del_slot to remove connection fd 

Huawei Xie (4):
  fdset_del_slot
  vhost socket cleanup
  version map file update
  add comment for potential unwanted call on listenfds

 lib/librte_vhost/rte_vhost_version.map   |  8 
 lib/librte_vhost/rte_virtio_net.h|  3 ++
 lib/librte_vhost/vhost_cuse/vhost-net-cdev.c |  9 
 lib/librte_vhost/vhost_user/fd_man.c | 34 +-
 lib/librte_vhost/vhost_user/vhost-net-user.c | 68 +++-
 lib/librte_vhost/vhost_user/vhost-net-user.h |  2 +-
 6 files changed, 110 insertions(+), 14 deletions(-)

-- 
1.8.1.4



[dpdk-dev] [PATCH v4 1/4] vhost: call fdset_del_slot to remove connection fd

2015-06-30 Thread Huawei Xie
In the event handler of a connection fd, the connection fd may get closed.
The event dispatch loop would then try to remove the fd from the fdset.
Between these two actions, another thread might register a new listenfd that
reuses the value of the just-closed fd, so calling fdset_del would wrongly
clean up the new listenfd. For example, if the callback closes fd 10 and
another thread then opens a listen socket that is also assigned fd 10,
fdset_del(pfdset, 10) would remove the new entry instead of the stale one.
A new function, fdset_del_slot, is provided to clean up the fd at the
specified slot.

v4 changes:
- call fdset_del_slot to remove connection fd

Signed-off-by: Huawei Xie 
---
 lib/librte_vhost/vhost_user/fd_man.c | 27 ++-
 1 file changed, 26 insertions(+), 1 deletion(-)

diff --git a/lib/librte_vhost/vhost_user/fd_man.c 
b/lib/librte_vhost/vhost_user/fd_man.c
index 831c9c1..bd30f8d 100644
--- a/lib/librte_vhost/vhost_user/fd_man.c
+++ b/lib/librte_vhost/vhost_user/fd_man.c
@@ -188,6 +188,24 @@ fdset_del(struct fdset *pfdset, int fd)
 }

 /**
+ *  Unregister the fd at the specified slot from the fdset.
+ */
+static void
+fdset_del_slot(struct fdset *pfdset, int index)
+{
+   if (pfdset == NULL || index < 0 || index >= MAX_FDS)
+   return;
+
+   pthread_mutex_lock(&pfdset->fd_mutex);
+
+   pfdset->fd[index].fd = -1;
+   pfdset->fd[index].rcb = pfdset->fd[index].wcb = NULL;
+   pfdset->num--;
+
+   pthread_mutex_unlock(&pfdset->fd_mutex);
+}
+
+/**
  * This functions runs in infinite blocking loop until there is no fd in
  * pfdset. It calls corresponding r/w handler if there is event on the fd.
  *
@@ -248,8 +266,15 @@ fdset_event_dispatch(struct fdset *pfdset)
 * We don't allow fdset_del to be called in callback
 * directly.
 */
+   /*
+    * When we are about to clean up the fd from the fdset here,
+    * the fd has already been closed in the callback, and its
+    * old value could have been reused by a new listen fd
+    * created in another thread, so we cannot call fdset_del
+    * at this point.
+    */
if (remove1 || remove2)
-   fdset_del(pfdset, fd);
+   fdset_del_slot(pfdset, i);
}
}
 }
-- 
1.8.1.4



[dpdk-dev] [PATCH v4 2/4] vhost: vhost unix domain socket cleanup

2015-06-30 Thread Huawei Xie
The rte_vhost_driver_unregister API removes the listenfd from the event list
and then closes it.

v2 changes:
-minor code style fix, remove unnecessary new line

Signed-off-by: Huawei Xie 
Signed-off-by: Peng Sun 
---
 lib/librte_vhost/rte_virtio_net.h|  3 ++
 lib/librte_vhost/vhost_cuse/vhost-net-cdev.c |  9 
 lib/librte_vhost/vhost_user/vhost-net-user.c | 68 +++-
 lib/librte_vhost/vhost_user/vhost-net-user.h |  2 +-
 4 files changed, 69 insertions(+), 13 deletions(-)

diff --git a/lib/librte_vhost/rte_virtio_net.h 
b/lib/librte_vhost/rte_virtio_net.h
index 5d38185..5630fbc 100644
--- a/lib/librte_vhost/rte_virtio_net.h
+++ b/lib/librte_vhost/rte_virtio_net.h
@@ -188,6 +188,9 @@ int rte_vhost_enable_guest_notification(struct virtio_net 
*dev, uint16_t queue_i
 /* Register vhost driver. dev_name could be different for multiple instance 
support. */
 int rte_vhost_driver_register(const char *dev_name);

+/* Unregister vhost driver. This is only meaningful to vhost user. */
+int rte_vhost_driver_unregister(const char *dev_name);
+
 /* Register callbacks. */
 int rte_vhost_driver_callback_register(struct virtio_net_device_ops const * 
const);
 /* Start vhost driver session blocking loop. */
diff --git a/lib/librte_vhost/vhost_cuse/vhost-net-cdev.c 
b/lib/librte_vhost/vhost_cuse/vhost-net-cdev.c
index 6b68abf..1ae7c49 100644
--- a/lib/librte_vhost/vhost_cuse/vhost-net-cdev.c
+++ b/lib/librte_vhost/vhost_cuse/vhost-net-cdev.c
@@ -405,6 +405,15 @@ rte_vhost_driver_register(const char *dev_name)
 }

 /**
+ * An empty function for unregister
+ */
+int
+rte_vhost_driver_unregister(const char *dev_name __rte_unused)
+{
+   return 0;
+}
+
+/**
  * The CUSE session is launched allowing the application to receive open,
  * release and ioctl calls.
  */
diff --git a/lib/librte_vhost/vhost_user/vhost-net-user.c 
b/lib/librte_vhost/vhost_user/vhost-net-user.c
index 31f1215..87a4711 100644
--- a/lib/librte_vhost/vhost_user/vhost-net-user.c
+++ b/lib/librte_vhost/vhost_user/vhost-net-user.c
@@ -66,6 +66,8 @@ struct connfd_ctx {
 struct _vhost_server {
struct vhost_server *server[MAX_VHOST_SERVER];
struct fdset fdset;
+   int vserver_cnt;
+   pthread_mutex_t server_mutex;
 };

 static struct _vhost_server g_vhost_server = {
@@ -74,10 +76,10 @@ static struct _vhost_server g_vhost_server = {
.fd_mutex = PTHREAD_MUTEX_INITIALIZER,
.num = 0
},
+   .vserver_cnt = 0,
+   .server_mutex = PTHREAD_MUTEX_INITIALIZER,
 };

-static int vserver_idx;
-
 static const char *vhost_message_str[VHOST_USER_MAX] = {
[VHOST_USER_NONE] = "VHOST_USER_NONE",
[VHOST_USER_GET_FEATURES] = "VHOST_USER_GET_FEATURES",
@@ -427,7 +429,6 @@ vserver_message_handler(int connfd, void *dat, int *remove)
}
 }

-
 /**
  * Creates and initialise the vhost server.
  */
@@ -436,34 +437,77 @@ rte_vhost_driver_register(const char *path)
 {
struct vhost_server *vserver;

-   if (vserver_idx == 0)
+   pthread_mutex_lock(&g_vhost_server.server_mutex);
+   if (ops == NULL)
ops = get_virtio_net_callbacks();
-   if (vserver_idx == MAX_VHOST_SERVER)
+
+   if (g_vhost_server.vserver_cnt == MAX_VHOST_SERVER) {
+   RTE_LOG(ERR, VHOST_CONFIG,
+   "error: the number of servers reaches maximum\n");
+   pthread_mutex_unlock(&g_vhost_server.server_mutex);
return -1;
+   }

vserver = calloc(sizeof(struct vhost_server), 1);
-   if (vserver == NULL)
+   if (vserver == NULL) {
+   pthread_mutex_unlock(&g_vhost_server.server_mutex);
return -1;
-
-   unlink(path);
+   }

vserver->listenfd = uds_socket(path);
if (vserver->listenfd < 0) {
free(vserver);
+   pthread_mutex_unlock(&g_vhost_server.server_mutex);
return -1;
}
-   vserver->path = path;
+
+   vserver->path = strdup(path);

fdset_add(&g_vhost_server.fdset, vserver->listenfd,
-   vserver_new_vq_conn, NULL,
-   vserver);
+   vserver_new_vq_conn, NULL, vserver);

-   g_vhost_server.server[vserver_idx++] = vserver;
+   g_vhost_server.server[g_vhost_server.vserver_cnt++] = vserver;
+   pthread_mutex_unlock(&g_vhost_server.server_mutex);

return 0;
 }


+/**
+ * Unregister the specified vhost server
+ */
+int
+rte_vhost_driver_unregister(const char *path)
+{
+   int i;
+   int count;
+
+   pthread_mutex_lock(&g_vhost_server.server_mutex);
+
+   for (i = 0; i < g_vhost_server.vserver_cnt; i++) {
+   if (!strcmp(g_vhost_server.server[i]->path, path)) {
+   fdset_del(&g_vhost_server.fdset,
+   g_vhost_server.server[i]->listenfd);
+
+   close(g_vhost_server.server[i]->listenfd);
+   free(g_vhost_serve

[dpdk-dev] [PATCH v4 3/4] vhost: version map file update

2015-06-30 Thread Huawei Xie
Update the version map file for the rte_vhost_driver_unregister API.

v3 changes:
update version map file

Signed-off-by: Huawei Xie 
---
 lib/librte_vhost/rte_vhost_version.map | 8 
 1 file changed, 8 insertions(+)

diff --git a/lib/librte_vhost/rte_vhost_version.map 
b/lib/librte_vhost/rte_vhost_version.map
index 163dde0..fb6bb9e 100644
--- a/lib/librte_vhost/rte_vhost_version.map
+++ b/lib/librte_vhost/rte_vhost_version.map
@@ -13,3 +13,11 @@ DPDK_2.0 {

local: *;
 };
+
+DPDK_2.1 {
+   global:
+
+   rte_vhost_driver_unregister;
+
+   local: *;
+} DPDK_2.0;
-- 
1.8.1.4



[dpdk-dev] [PATCH v4 4/4] vhost: add comment for potential unwanted callback on listenfds

2015-06-30 Thread Huawei Xie
add comment for potential unwanted callback on listenfds

v4 changes:
add comment for potential unwanted callback on listenfds

Signed-off-by: Huawei Xie 
---
 lib/librte_vhost/vhost_user/fd_man.c | 7 +++
 1 file changed, 7 insertions(+)

diff --git a/lib/librte_vhost/vhost_user/fd_man.c 
b/lib/librte_vhost/vhost_user/fd_man.c
index bd30f8d..d68b270 100644
--- a/lib/librte_vhost/vhost_user/fd_man.c
+++ b/lib/librte_vhost/vhost_user/fd_man.c
@@ -242,6 +242,13 @@ fdset_event_dispatch(struct fdset *pfdset)

pthread_mutex_unlock(&pfdset->fd_mutex);

+   /*
+    * While select is blocked, other threads might unregister
+    * listenfds from, and register new listenfds into, the fdset.
+    * When select returns, the fdset entries for those listenfds
+    * might have been updated. It is ok if an unwanted callback
+    * is invoked for a newly registered listenfd.
+    */
ret = select(maxfds + 1, &rfds, &wfds, NULL, &tv);
if (ret <= 0)
continue;
-- 
1.8.1.4



[dpdk-dev] [PATCH v2 00/23] mlx4: MOFED 3.0 support, bugfixes and enhancements

2015-06-30 Thread Adrien Mazarguil
This patchset adds compatibility with the upcoming Mellanox OFED 3.0
release (new kernel drivers and userland support libraries), which supports
new features such as L3/L4 checksum validation offloads and addresses
several bugs and limitations at the same time.

v2:
 - Bugfix for a possible crash when allocating mbufs.
 - Several API changes following the release of Mellanox OFED 3.0.
 - Performance improvements made possible by the new API.
 - Add TX checksum offloads.
 - Update documentation to reflect the changes.

Adrien Mazarguil (6):
  mlx4: fix possible crash on scattered mbuf allocation failure
  mlx4: add MOFED 3.0 compatibility to interfaces names retrieval
  mlx4: use MOFED 3.0 fast verbs interface for TX operations
  mlx4: move scattered TX processing to helper function
  mlx4: add L2 tunnel (VXLAN) checksum offload support
  doc: update mlx4 documentation following MOFED 3.0 changes

Alex Rosenbaum (8):
  mlx4: avoid looking up WR ID to improve RX performance
  mlx4: merge RX queue setup functions
  mlx4: use MOFED 3.0 extended flow steering API
  mlx4: use MOFED 3.0 fast verbs interface for RX operations
  mlx4: improve performance by requesting TX completion events less
often
  mlx4: shrink TX queue elements for better performance
  mlx4: prefetch completed TX mbufs before releasing them
  mlx4: associate resource domain with CQs and QPs to enhance
performance

Gilad Berman (1):
  mlx4: add L3 and L4 checksum offload support

Olga Shern (5):
  mlx4: make sure experimental device query function is implemented
  mlx4: allow applications to partially use fork()
  mlx4: improve accuracy of link status information
  mlx4: fix support for multiple VLAN filters
  mlx4: disable multicast echo when device is not VF

Or Ami (3):
  mlx4: fix error message for invalid number of descriptors
  mlx4: remove provision for flow creation failure in DMFS A0 mode
  mlx4: query netdevice to get initial MAC address

 doc/guides/nics/mlx4.rst  |   29 +-
 drivers/net/mlx4/Makefile |9 +-
 drivers/net/mlx4/mlx4.c   | 1433 -
 drivers/net/mlx4/mlx4.h   |3 +
 4 files changed, 921 insertions(+), 553 deletions(-)

-- 
2.1.0



[dpdk-dev] [PATCH v2 01/23] mlx4: fix possible crash on scattered mbuf allocation failure

2015-06-30 Thread Adrien Mazarguil
When failing to allocate a segment, mlx4_rx_burst_sp() may call
rte_pktmbuf_free() on an incomplete scattered mbuf whose next pointer in the
last segment is not set.

Signed-off-by: Adrien Mazarguil 
---
 drivers/net/mlx4/mlx4.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/drivers/net/mlx4/mlx4.c b/drivers/net/mlx4/mlx4.c
index 5391b7a..d1166b2 100644
--- a/drivers/net/mlx4/mlx4.c
+++ b/drivers/net/mlx4/mlx4.c
@@ -2370,8 +2370,10 @@ mlx4_rx_burst_sp(void *dpdk_rxq, struct rte_mbuf **pkts, 
uint16_t pkts_n)
DEBUG("rxq=%p, wr_id=%" PRIu64 ":"
  " can't allocate a new mbuf",
  (void *)rxq, wr_id);
-   if (pkt_buf != NULL)
+   if (pkt_buf != NULL) {
+   *pkt_buf_next = NULL;
rte_pktmbuf_free(pkt_buf);
+   }
/* Increase out of memory counters. */
++rxq->stats.rx_nombuf;
++rxq->priv->dev->data->rx_mbuf_alloc_failed;
-- 
2.1.0



[dpdk-dev] [PATCH v2 02/23] mlx4: add MOFED 3.0 compatibility to interfaces names retrieval

2015-06-30 Thread Adrien Mazarguil
Since Mellanox OFED 3.0 and Linux 3.15, interface port numbers are stored
in dev_port instead of dev_id sysfs files.

Signed-off-by: Or Ami 
Signed-off-by: Nitzan Weller 
Signed-off-by: Adrien Mazarguil 
---
 drivers/net/mlx4/mlx4.c | 51 +
 1 file changed, 39 insertions(+), 12 deletions(-)

diff --git a/drivers/net/mlx4/mlx4.c b/drivers/net/mlx4/mlx4.c
index d1166b2..ad37e01 100644
--- a/drivers/net/mlx4/mlx4.c
+++ b/drivers/net/mlx4/mlx4.c
@@ -337,9 +337,11 @@ priv_unlock(struct priv *priv)
 static int
 priv_get_ifname(const struct priv *priv, char (*ifname)[IF_NAMESIZE])
 {
-   int ret = -1;
DIR *dir;
struct dirent *dent;
+   unsigned int dev_type = 0;
+   unsigned int dev_port_prev = ~0u;
+   char match[IF_NAMESIZE] = "";

{
MKSTR(path, "%s/device/net", priv->ctx->device->ibdev_path);
@@ -351,7 +353,7 @@ priv_get_ifname(const struct priv *priv, char 
(*ifname)[IF_NAMESIZE])
while ((dent = readdir(dir)) != NULL) {
char *name = dent->d_name;
FILE *file;
-   unsigned int dev_id;
+   unsigned int dev_port;
int r;

if ((name[0] == '.') &&
@@ -359,22 +361,47 @@ priv_get_ifname(const struct priv *priv, char 
(*ifname)[IF_NAMESIZE])
 ((name[1] == '.') && (name[2] == '\0'
continue;

-   MKSTR(path, "%s/device/net/%s/dev_id",
- priv->ctx->device->ibdev_path, name);
+   MKSTR(path, "%s/device/net/%s/%s",
+ priv->ctx->device->ibdev_path, name,
+ (dev_type ? "dev_id" : "dev_port"));

file = fopen(path, "rb");
-   if (file == NULL)
+   if (file == NULL) {
+   if (errno != ENOENT)
+   continue;
+   /*
+* Switch to dev_id when dev_port does not exist as
+* is the case with Linux kernel versions < 3.15.
+*/
+try_dev_id:
+   match[0] = '\0';
+   if (dev_type)
+   break;
+   dev_type = 1;
+   dev_port_prev = ~0u;
+   rewinddir(dir);
continue;
-   r = fscanf(file, "%x", &dev_id);
-   fclose(file);
-   if ((r == 1) && (dev_id == (priv->port - 1u))) {
-   snprintf(*ifname, sizeof(*ifname), "%s", name);
-   ret = 0;
-   break;
}
+   r = fscanf(file, (dev_type ? "%x" : "%u"), &dev_port);
+   fclose(file);
+   if (r != 1)
+   continue;
+   /*
+* Switch to dev_id when dev_port returns the same value for
+* all ports. May happen when using a MOFED release older than
+* 3.0 with a Linux kernel >= 3.15.
+*/
+   if (dev_port == dev_port_prev)
+   goto try_dev_id;
+   dev_port_prev = dev_port;
+   if (dev_port == (priv->port - 1u))
+   snprintf(match, sizeof(match), "%s", name);
}
closedir(dir);
-   return ret;
+   if (match[0] == '\0')
+   return -1;
+   strncpy(*ifname, match, sizeof(*ifname));
+   return 0;
 }

 /**
-- 
2.1.0



[dpdk-dev] [PATCH v2 03/23] mlx4: make sure experimental device query function is implemented

2015-06-30 Thread Adrien Mazarguil
From: Olga Shern 

HAVE_EXP_QUERY_DEVICE is used to check whether ibv_exp_query_device() can be
used. RSS and inline receive features depend on it.

Signed-off-by: Olga Shern 
Signed-off-by: Adrien Mazarguil 
---
 drivers/net/mlx4/Makefile |  4 
 drivers/net/mlx4/mlx4.c   | 17 ++---
 2 files changed, 14 insertions(+), 7 deletions(-)

diff --git a/drivers/net/mlx4/Makefile b/drivers/net/mlx4/Makefile
index 97b364a..ce1f2b0 100644
--- a/drivers/net/mlx4/Makefile
+++ b/drivers/net/mlx4/Makefile
@@ -112,6 +112,10 @@ mlx4_autoconf.h: $(RTE_SDK)/scripts/auto-config-h.sh
SEND_RAW_WR_SUPPORT \
infiniband/verbs.h \
type 'struct ibv_send_wr_raw' $(AUTOCONF_OUTPUT)
+   $Q sh -- '$<' '$@' \
+   HAVE_EXP_QUERY_DEVICE \
+   infiniband/verbs.h \
+   type 'struct ibv_exp_device_attr' $(AUTOCONF_OUTPUT)

 mlx4.o: mlx4_autoconf.h

diff --git a/drivers/net/mlx4/mlx4.c b/drivers/net/mlx4/mlx4.c
index ad37e01..bd20569 100644
--- a/drivers/net/mlx4/mlx4.c
+++ b/drivers/net/mlx4/mlx4.c
@@ -4458,17 +4458,18 @@ mlx4_pci_devinit(struct rte_pci_driver *pci_drv, struct 
rte_pci_device *pci_dev)
struct ibv_pd *pd = NULL;
struct priv *priv = NULL;
struct rte_eth_dev *eth_dev;
-#if defined(INLINE_RECV) || defined(RSS_SUPPORT)
+#ifdef HAVE_EXP_QUERY_DEVICE
struct ibv_exp_device_attr exp_device_attr;
-#endif
+#endif /* HAVE_EXP_QUERY_DEVICE */
struct ether_addr mac;
union ibv_gid temp_gid;

+#ifdef HAVE_EXP_QUERY_DEVICE
+   exp_device_attr.comp_mask = IBV_EXP_DEVICE_ATTR_EXP_CAP_FLAGS;
 #ifdef RSS_SUPPORT
-   exp_device_attr.comp_mask =
-   (IBV_EXP_DEVICE_ATTR_EXP_CAP_FLAGS |
-IBV_EXP_DEVICE_ATTR_RSS_TBL_SZ);
+   exp_device_attr.comp_mask |= IBV_EXP_DEVICE_ATTR_RSS_TBL_SZ;
 #endif /* RSS_SUPPORT */
+#endif /* HAVE_EXP_QUERY_DEVICE */

DEBUG("using port %u (%08" PRIx32 ")", port, test);

@@ -4513,11 +4514,12 @@ mlx4_pci_devinit(struct rte_pci_driver *pci_drv, struct 
rte_pci_device *pci_dev)
priv->port = port;
priv->pd = pd;
priv->mtu = ETHER_MTU;
-#ifdef RSS_SUPPORT
+#ifdef HAVE_EXP_QUERY_DEVICE
if (ibv_exp_query_device(ctx, &exp_device_attr)) {
-   INFO("experimental ibv_exp_query_device");
+   ERROR("ibv_exp_query_device() failed");
goto port_error;
}
+#ifdef RSS_SUPPORT
if ((exp_device_attr.exp_device_cap_flags &
 IBV_EXP_DEVICE_QPG) &&
(exp_device_attr.exp_device_cap_flags &
@@ -4569,6 +4571,7 @@ mlx4_pci_devinit(struct rte_pci_driver *pci_drv, struct 
rte_pci_device *pci_dev)
 priv->inl_recv_size);
}
 #endif /* INLINE_RECV */
+#endif /* HAVE_EXP_QUERY_DEVICE */

(void)mlx4_getenv_int;
priv->vf = vf;
-- 
2.1.0



[dpdk-dev] [PATCH v2 04/23] mlx4: avoid looking up WR ID to improve RX performance

2015-06-30 Thread Adrien Mazarguil
From: Alex Rosenbaum 

This is done by storing the current index in the RX queue structure.

Signed-off-by: Alex Rosenbaum 
Signed-off-by: Adrien Mazarguil 
---
 drivers/net/mlx4/mlx4.c | 24 ++--
 1 file changed, 22 insertions(+), 2 deletions(-)

diff --git a/drivers/net/mlx4/mlx4.c b/drivers/net/mlx4/mlx4.c
index bd20569..08b1b81 100644
--- a/drivers/net/mlx4/mlx4.c
+++ b/drivers/net/mlx4/mlx4.c
@@ -200,6 +200,7 @@ struct rxq {
struct ibv_exp_flow *allmulti_flow; /* Multicast flow. */
unsigned int port_id; /* Port ID for incoming packets. */
unsigned int elts_n; /* (*elts)[] length. */
+   unsigned int elts_head; /* Current index in (*elts)[]. */
union {
struct rxq_elt_sp (*sp)[]; /* Scattered RX elements. */
struct rxq_elt (*no_sp)[]; /* RX elements. */
@@ -1640,6 +1641,7 @@ rxq_alloc_elts_sp(struct rxq *rxq, unsigned int elts_n,
DEBUG("%p: allocated and configured %u WRs (%zu segments)",
  (void *)rxq, elts_n, (elts_n * elemof((*elts)[0].sges)));
rxq->elts_n = elts_n;
+   rxq->elts_head = 0;
rxq->elts.sp = elts;
assert(ret == 0);
return 0;
@@ -1785,6 +1787,7 @@ rxq_alloc_elts(struct rxq *rxq, unsigned int elts_n, 
struct rte_mbuf **pool)
DEBUG("%p: allocated and configured %u single-segment WRs",
  (void *)rxq, elts_n);
rxq->elts_n = elts_n;
+   rxq->elts_head = 0;
rxq->elts.no_sp = elts;
assert(ret == 0);
return 0;
@@ -2320,6 +2323,8 @@ mlx4_rx_burst_sp(void *dpdk_rxq, struct rte_mbuf **pkts, 
uint16_t pkts_n)
 {
struct rxq *rxq = (struct rxq *)dpdk_rxq;
struct rxq_elt_sp (*elts)[rxq->elts_n] = rxq->elts.sp;
+   const unsigned int elts_n = rxq->elts_n;
+   unsigned int elts_head = rxq->elts_head;
struct ibv_wc wcs[pkts_n];
struct ibv_recv_wr head;
struct ibv_recv_wr **next = &head.next;
@@ -2346,7 +2351,7 @@ mlx4_rx_burst_sp(void *dpdk_rxq, struct rte_mbuf **pkts, 
uint16_t pkts_n)
struct ibv_wc *wc = &wcs[i];
uint64_t wr_id = wc->wr_id;
uint32_t len = wc->byte_len;
-   struct rxq_elt_sp *elt = &(*elts)[wr_id];
+   struct rxq_elt_sp *elt = &(*elts)[elts_head];
struct ibv_recv_wr *wr = &elt->wr;
struct rte_mbuf *pkt_buf = NULL; /* Buffer returned in pkts. */
struct rte_mbuf **pkt_buf_next = &pkt_buf;
@@ -2354,10 +2359,15 @@ mlx4_rx_burst_sp(void *dpdk_rxq, struct rte_mbuf 
**pkts, uint16_t pkts_n)
unsigned int j = 0;

/* Sanity checks. */
+#ifdef NDEBUG
+   (void)wr_id;
+#endif
assert(wr_id < rxq->elts_n);
assert(wr_id == wr->wr_id);
assert(wr->sg_list == elt->sges);
assert(wr->num_sge == elemof(elt->sges));
+   assert(elts_head < rxq->elts_n);
+   assert(rxq->elts_head < rxq->elts_n);
/* Link completed WRs together for repost. */
*next = wr;
next = &wr->next;
@@ -2468,6 +2478,8 @@ mlx4_rx_burst_sp(void *dpdk_rxq, struct rte_mbuf **pkts, 
uint16_t pkts_n)
rxq->stats.ibytes += wc->byte_len;
 #endif
 repost:
+   if (++elts_head >= elts_n)
+   elts_head = 0;
continue;
}
*next = NULL;
@@ -2485,6 +2497,7 @@ repost:
  strerror(i));
abort();
}
+   rxq->elts_head = elts_head;
 #ifdef MLX4_PMD_SOFT_COUNTERS
/* Increase packets counter. */
rxq->stats.ipackets += ret;
@@ -2514,6 +2527,8 @@ mlx4_rx_burst(void *dpdk_rxq, struct rte_mbuf **pkts, 
uint16_t pkts_n)
 {
struct rxq *rxq = (struct rxq *)dpdk_rxq;
struct rxq_elt (*elts)[rxq->elts_n] = rxq->elts.no_sp;
+   const unsigned int elts_n = rxq->elts_n;
+   unsigned int elts_head = rxq->elts_head;
struct ibv_wc wcs[pkts_n];
struct ibv_recv_wr head;
struct ibv_recv_wr **next = &head.next;
@@ -2538,7 +2553,7 @@ mlx4_rx_burst(void *dpdk_rxq, struct rte_mbuf **pkts, 
uint16_t pkts_n)
struct ibv_wc *wc = &wcs[i];
uint64_t wr_id = wc->wr_id;
uint32_t len = wc->byte_len;
-   struct rxq_elt *elt = &(*elts)[WR_ID(wr_id).id];
+   struct rxq_elt *elt = &(*elts)[elts_head];
struct ibv_recv_wr *wr = &elt->wr;
struct rte_mbuf *seg = (void *)((uintptr_t)elt->sge.addr -
WR_ID(wr_id).offset);
@@ -2549,6 +2564,8 @@ mlx4_rx_burst(void *dpdk_rxq, struct rte_mbuf **pkts, 
uint16_t pkts_n)
assert(wr_id == wr->wr_id);
assert(wr->sg_list == &elt->sge);
assert(wr->num_sge == 1);
+   assert(elts_head < rxq->elts_n);
+   assert(rxq->elts_head < rxq->elts_n);
   

[dpdk-dev] [PATCH v2 05/23] mlx4: merge RX queue setup functions

2015-06-30 Thread Adrien Mazarguil
From: Alex Rosenbaum 

Make rxq_setup_qp() handle inline support like rxq_setup_qp_rss() instead of
having two separate functions.

Signed-off-by: Alex Rosenbaum 
Signed-off-by: Adrien Mazarguil 
---
 drivers/net/mlx4/mlx4.c | 61 -
 1 file changed, 9 insertions(+), 52 deletions(-)

diff --git a/drivers/net/mlx4/mlx4.c b/drivers/net/mlx4/mlx4.c
index 08b1b81..8be1574 100644
--- a/drivers/net/mlx4/mlx4.c
+++ b/drivers/net/mlx4/mlx4.c
@@ -2653,10 +2653,9 @@ repost:
return ret;
 }

-#ifdef INLINE_RECV
-
 /**
- * Allocate a Queue Pair in case inline receive is supported.
+ * Allocate a Queue Pair.
+ * Optionally setup inline receive if supported.
  *
  * @param priv
  *   Pointer to private structure.
@@ -2676,7 +2675,6 @@ rxq_setup_qp(struct priv *priv, struct ibv_cq *cq, 
uint16_t desc)
.send_cq = cq,
/* CQ to be associated with the receive queue. */
.recv_cq = cq,
-   .max_inl_recv = priv->inl_recv_size,
.cap = {
/* Max number of outstanding WRs. */
.max_recv_wr = ((priv->device_attr.max_qp_wr < desc) ?
@@ -2689,61 +2687,22 @@ rxq_setup_qp(struct priv *priv, struct ibv_cq *cq, 
uint16_t desc)
 MLX4_PMD_SGE_WR_N),
},
.qp_type = IBV_QPT_RAW_PACKET,
-   .pd = priv->pd
+   .comp_mask = IBV_EXP_QP_INIT_ATTR_PD,
+   .pd = priv->pd,
};

-   attr.comp_mask = IBV_EXP_QP_INIT_ATTR_PD;
+#ifdef INLINE_RECV
+   attr.max_inl_recv = priv->inl_recv_size;
attr.comp_mask |= IBV_EXP_QP_INIT_ATTR_INL_RECV;
-
+#endif
return ibv_exp_create_qp(priv->ctx, &attr);
 }

-#else /* INLINE_RECV */
-
-/**
- * Allocate a Queue Pair.
- *
- * @param priv
- *   Pointer to private structure.
- * @param cq
- *   Completion queue to associate with QP.
- * @param desc
- *   Number of descriptors in QP (hint only).
- *
- * @return
- *   QP pointer or NULL in case of error.
- */
-static struct ibv_qp *
-rxq_setup_qp(struct priv *priv, struct ibv_cq *cq, uint16_t desc)
-{
-   struct ibv_qp_init_attr attr = {
-   /* CQ to be associated with the send queue. */
-   .send_cq = cq,
-   /* CQ to be associated with the receive queue. */
-   .recv_cq = cq,
-   .cap = {
-   /* Max number of outstanding WRs. */
-   .max_recv_wr = ((priv->device_attr.max_qp_wr < desc) ?
-   priv->device_attr.max_qp_wr :
-   desc),
-   /* Max number of scatter/gather elements in a WR. */
-   .max_recv_sge = ((priv->device_attr.max_sge <
- MLX4_PMD_SGE_WR_N) ?
-priv->device_attr.max_sge :
-MLX4_PMD_SGE_WR_N),
-   },
-   .qp_type = IBV_QPT_RAW_PACKET
-   };
-
-   return ibv_create_qp(priv->pd, &attr);
-}
-
-#endif /* INLINE_RECV */
-
 #ifdef RSS_SUPPORT

 /**
  * Allocate a RSS Queue Pair.
+ * Optionally setup inline receive if supported.
  *
  * @param priv
  *   Pointer to private structure.
@@ -2766,9 +2725,6 @@ rxq_setup_qp_rss(struct priv *priv, struct ibv_cq *cq, 
uint16_t desc,
.send_cq = cq,
/* CQ to be associated with the receive queue. */
.recv_cq = cq,
-#ifdef INLINE_RECV
-   .max_inl_recv = priv->inl_recv_size,
-#endif
.cap = {
/* Max number of outstanding WRs. */
.max_recv_wr = ((priv->device_attr.max_qp_wr < desc) ?
@@ -2787,6 +2743,7 @@ rxq_setup_qp_rss(struct priv *priv, struct ibv_cq *cq, 
uint16_t desc,
};

 #ifdef INLINE_RECV
+   attr.max_inl_recv = priv->inl_recv_size,
attr.comp_mask |= IBV_EXP_QP_INIT_ATTR_INL_RECV;
 #endif
if (parent) {
-- 
2.1.0



[dpdk-dev] [PATCH v2 06/23] mlx4: allow applications to partially use fork()

2015-06-30 Thread Adrien Mazarguil
From: Olga Shern 

Although using the PMD from a forked process is still unsupported, this
commit makes Verbs safe enough for applications to call fork() for other
purposes.
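
Roughly what this enables on the application side (illustrative sketch; only
setenv(), ibv_fork_init() and fork() matter here, the exec'd command is
arbitrary):

#include <stdlib.h>
#include <unistd.h>
#include <sys/types.h>
#include <infiniband/verbs.h>

int
main(void)
{
	pid_t pid;

	/* Must run before any Verbs resources are created. */
	setenv("RDMAV_HUGEPAGES_SAFE", "1", 1);
	if (ibv_fork_init())
		return 1;
	/* ... PMD/Verbs initialization would happen here in the parent ... */
	pid = fork();
	if (pid == 0) {
		/* Child: must not touch the PMD, but may exec() helpers. */
		execlp("true", "true", (char *)NULL);
		_exit(1);
	}
	return 0;
}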

Signed-off-by: Olga Shern 
Signed-off-by: Adrien Mazarguil 
---
 drivers/net/mlx4/mlx4.c | 8 
 1 file changed, 8 insertions(+)

diff --git a/drivers/net/mlx4/mlx4.c b/drivers/net/mlx4/mlx4.c
index 8be1574..ed68beb 100644
--- a/drivers/net/mlx4/mlx4.c
+++ b/drivers/net/mlx4/mlx4.c
@@ -4686,6 +4686,14 @@ rte_mlx4_pmd_init(const char *name, const char *args)
 {
(void)name;
(void)args;
+   /*
+* RDMAV_HUGEPAGES_SAFE tells ibv_fork_init() we intend to use
+* huge pages. Calling ibv_fork_init() during init allows
+* applications to use fork() safely for purposes other than
+* using this PMD, which is not supported in forked processes.
+*/
+   setenv("RDMAV_HUGEPAGES_SAFE", "1", 1);
+   ibv_fork_init();
rte_eal_pci_register(&mlx4_driver.pci_drv);
return 0;
 }
-- 
2.1.0



[dpdk-dev] [PATCH v2 07/23] mlx4: improve accuracy of link status information

2015-06-30 Thread Adrien Mazarguil
From: Olga Shern 

Query interface properties using the ethtool API instead of Verbs
through ibv_query_port(). The returned information is more accurate for
Ethernet links since several link speeds cannot be mapped to Verbs
semantics.
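
A standalone sketch of the ethtool query this relies on (error paths trimmed;
the interface name is an assumption):

#include <stdio.h>
#include <string.h>
#include <unistd.h>
#include <sys/ioctl.h>
#include <sys/socket.h>
#include <net/if.h>
#include <linux/ethtool.h>
#include <linux/sockios.h>

int
main(void)
{
	struct ethtool_cmd edata = { .cmd = ETHTOOL_GSET };
	struct ifreq ifr;
	int fd = socket(AF_INET, SOCK_DGRAM, 0);

	if (fd == -1)
		return 1;
	memset(&ifr, 0, sizeof(ifr));
	strncpy(ifr.ifr_name, "eth0", sizeof(ifr.ifr_name) - 1); /* assumed */
	ifr.ifr_data = (void *)&edata;
	if (ioctl(fd, SIOCETHTOOL, &ifr) == -1) {
		perror("SIOCETHTOOL");
		close(fd);
		return 1;
	}
	printf("speed: %u Mb/s, duplex: %s\n", ethtool_cmd_speed(&edata),
	       (edata.duplex == DUPLEX_HALF) ? "half" : "full");
	close(fd);
	return 0;
}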

Signed-off-by: Olga Shern 
Signed-off-by: Alex Rosenbaum 
Signed-off-by: Adrien Mazarguil 
---
 drivers/net/mlx4/mlx4.c | 44 +---
 1 file changed, 25 insertions(+), 19 deletions(-)

diff --git a/drivers/net/mlx4/mlx4.c b/drivers/net/mlx4/mlx4.c
index ed68beb..02dd894 100644
--- a/drivers/net/mlx4/mlx4.c
+++ b/drivers/net/mlx4/mlx4.c
@@ -254,7 +254,6 @@ struct priv {
struct rte_eth_dev *dev; /* Ethernet device. */
struct ibv_context *ctx; /* Verbs context. */
struct ibv_device_attr device_attr; /* Device properties. */
-   struct ibv_port_attr port_attr; /* Physical port properties. */
struct ibv_pd *pd; /* Protection Domain. */
/*
 * MAC addresses array and configuration bit-field.
@@ -3820,29 +3819,37 @@ static int
 mlx4_link_update_unlocked(struct rte_eth_dev *dev, int wait_to_complete)
 {
struct priv *priv = dev->data->dev_private;
-   struct ibv_port_attr port_attr;
-   static const uint8_t width_mult[] = {
-   /* Multiplier values taken from devinfo.c in libibverbs. */
-   0, 1, 4, 0, 8, 0, 0, 0, 12, 0
+   struct ethtool_cmd edata = {
+   .cmd = ETHTOOL_GSET
};
+   struct ifreq ifr;
+   struct rte_eth_link dev_link;
+   int link_speed = 0;

(void)wait_to_complete;
-   errno = ibv_query_port(priv->ctx, priv->port, &port_attr);
-   if (errno) {
-   WARN("port query failed: %s", strerror(errno));
+   if (priv_ifreq(priv, SIOCGIFFLAGS, &ifr)) {
+   WARN("ioctl(SIOCGIFFLAGS) failed: %s", strerror(errno));
return -1;
}
-   dev->data->dev_link = (struct rte_eth_link){
-   .link_speed = (ibv_rate_to_mbps(mult_to_ibv_rate
-   (port_attr.active_speed)) *
-  width_mult[(port_attr.active_width %
-  sizeof(width_mult))]),
-   .link_duplex = ETH_LINK_FULL_DUPLEX,
-   .link_status = (port_attr.state == IBV_PORT_ACTIVE)
-   };
-   if (memcmp(&port_attr, &priv->port_attr, sizeof(port_attr))) {
+   memset(&dev_link, 0, sizeof(dev_link));
+   dev_link.link_status = ((ifr.ifr_flags & IFF_UP) &&
+   (ifr.ifr_flags & IFF_RUNNING));
+   ifr.ifr_data = &edata;
+   if (priv_ifreq(priv, SIOCETHTOOL, &ifr)) {
+   WARN("ioctl(SIOCETHTOOL, ETHTOOL_GSET) failed: %s",
+strerror(errno));
+   return -1;
+   }
+   link_speed = ethtool_cmd_speed(&edata);
+   if (link_speed == -1)
+   dev_link.link_speed = 0;
+   else
+   dev_link.link_speed = link_speed;
+   dev_link.link_duplex = ((edata.duplex == DUPLEX_HALF) ?
+   ETH_LINK_HALF_DUPLEX : ETH_LINK_FULL_DUPLEX);
+   if (memcmp(&dev_link, &dev->data->dev_link, sizeof(dev_link))) {
/* Link status changed. */
-   priv->port_attr = port_attr;
+   dev->data->dev_link = dev_link;
return 0;
}
/* Link status is still the same. */
@@ -4487,7 +4494,6 @@ mlx4_pci_devinit(struct rte_pci_driver *pci_drv, struct 
rte_pci_device *pci_dev)

priv->ctx = ctx;
priv->device_attr = device_attr;
-   priv->port_attr = port_attr;
priv->port = port;
priv->pd = pd;
priv->mtu = ETHER_MTU;
-- 
2.1.0



[dpdk-dev] [PATCH v2 08/23] mlx4: use MOFED 3.0 extended flow steering API

2015-06-30 Thread Adrien Mazarguil
From: Alex Rosenbaum 

This commit drops "exp" from related function and type names to stop using
the experimental API.
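
For reference, the general shape of the non-experimental flow steering API (a
hedged sketch; values and the packed attr+spec layout follow common libibverbs
usage, not this driver verbatim):

#include <stdint.h>
#include <string.h>
#include <infiniband/verbs.h>

/* Attach a unicast destination MAC rule to a QP; NULL on failure. */
static struct ibv_flow *
attach_mac_rule(struct ibv_qp *qp, uint8_t port, const uint8_t mac[6])
{
	struct {
		struct ibv_flow_attr attr;
		struct ibv_flow_spec_eth eth;
	} __attribute__((packed)) rule = {
		.attr = {
			.type = IBV_FLOW_ATTR_NORMAL,
			.size = sizeof(rule),
			.num_of_specs = 1,
			.port = port,
		},
		.eth = {
			.type = IBV_FLOW_SPEC_ETH,
			.size = sizeof(struct ibv_flow_spec_eth),
		},
	};

	memcpy(rule.eth.val.dst_mac, mac, 6);
	memset(rule.eth.mask.dst_mac, 0xff, 6);
	return ibv_create_flow(qp, &rule.attr);
}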

Signed-off-by: Olga Shern 
Signed-off-by: Alex Rosenbaum 
Signed-off-by: Adrien Mazarguil 
---
 drivers/net/mlx4/mlx4.c | 54 -
 1 file changed, 27 insertions(+), 27 deletions(-)

diff --git a/drivers/net/mlx4/mlx4.c b/drivers/net/mlx4/mlx4.c
index 02dd894..028e455 100644
--- a/drivers/net/mlx4/mlx4.c
+++ b/drivers/net/mlx4/mlx4.c
@@ -195,9 +195,9 @@ struct rxq {
 * may contain several specifications, one per configured VLAN ID.
 */
BITFIELD_DECLARE(mac_configured, uint32_t, MLX4_MAX_MAC_ADDRESSES);
-   struct ibv_exp_flow *mac_flow[MLX4_MAX_MAC_ADDRESSES];
-   struct ibv_exp_flow *promisc_flow; /* Promiscuous flow. */
-   struct ibv_exp_flow *allmulti_flow; /* Multicast flow. */
+   struct ibv_flow *mac_flow[MLX4_MAX_MAC_ADDRESSES];
+   struct ibv_flow *promisc_flow; /* Promiscuous flow. */
+   struct ibv_flow *allmulti_flow; /* Multicast flow. */
unsigned int port_id; /* Port ID for incoming packets. */
unsigned int elts_n; /* (*elts)[] length. */
unsigned int elts_head; /* Current index in (*elts)[]. */
@@ -1872,7 +1872,7 @@ rxq_mac_addr_del(struct rxq *rxq, unsigned int mac_index)
  (*mac)[0], (*mac)[1], (*mac)[2], (*mac)[3], (*mac)[4], (*mac)[5],
  mac_index);
assert(rxq->mac_flow[mac_index] != NULL);
-   claim_zero(ibv_exp_destroy_flow(rxq->mac_flow[mac_index]));
+   claim_zero(ibv_destroy_flow(rxq->mac_flow[mac_index]));
rxq->mac_flow[mac_index] = NULL;
BITFIELD_RESET(rxq->mac_configured, mac_index);
 }
@@ -1917,7 +1917,7 @@ rxq_mac_addr_add(struct rxq *rxq, unsigned int mac_index)
unsigned int vlans = 0;
unsigned int specs = 0;
unsigned int i, j;
-   struct ibv_exp_flow *flow;
+   struct ibv_flow *flow;

assert(mac_index < elemof(priv->mac));
if (BITFIELD_ISSET(rxq->mac_configured, mac_index))
@@ -1929,28 +1929,28 @@ rxq_mac_addr_add(struct rxq *rxq, unsigned int 
mac_index)
specs = (vlans ? vlans : 1);

/* Allocate flow specification on the stack. */
-   struct ibv_exp_flow_attr data
+   struct ibv_flow_attr data
[1 +
-(sizeof(struct ibv_exp_flow_spec_eth[specs]) /
- sizeof(struct ibv_exp_flow_attr)) +
-!!(sizeof(struct ibv_exp_flow_spec_eth[specs]) %
-   sizeof(struct ibv_exp_flow_attr))];
-   struct ibv_exp_flow_attr *attr = (void *)&data[0];
-   struct ibv_exp_flow_spec_eth *spec = (void *)&data[1];
+(sizeof(struct ibv_flow_spec_eth[specs]) /
+ sizeof(struct ibv_flow_attr)) +
+!!(sizeof(struct ibv_flow_spec_eth[specs]) %
+   sizeof(struct ibv_flow_attr))];
+   struct ibv_flow_attr *attr = (void *)&data[0];
+   struct ibv_flow_spec_eth *spec = (void *)&data[1];

/*
 * No padding must be inserted by the compiler between attr and spec.
 * This layout is expected by libibverbs.
 */
assert(((uint8_t *)attr + sizeof(*attr)) == (uint8_t *)spec);
-   *attr = (struct ibv_exp_flow_attr){
-   .type = IBV_EXP_FLOW_ATTR_NORMAL,
+   *attr = (struct ibv_flow_attr){
+   .type = IBV_FLOW_ATTR_NORMAL,
.num_of_specs = specs,
.port = priv->port,
.flags = 0
};
-   *spec = (struct ibv_exp_flow_spec_eth){
-   .type = IBV_EXP_FLOW_SPEC_ETH,
+   *spec = (struct ibv_flow_spec_eth){
+   .type = IBV_FLOW_SPEC_ETH,
.size = sizeof(*spec),
.val = {
.dst_mac = {
@@ -1981,7 +1981,7 @@ rxq_mac_addr_add(struct rxq *rxq, unsigned int mac_index)
  vlans);
/* Create related flow. */
errno = 0;
-   flow = ibv_exp_create_flow(rxq->qp, attr);
+   flow = ibv_create_flow(rxq->qp, attr);
if (flow == NULL) {
int err = errno;

@@ -2168,9 +2168,9 @@ end:
 static int
 rxq_allmulticast_enable(struct rxq *rxq)
 {
-   struct ibv_exp_flow *flow;
-   struct ibv_exp_flow_attr attr = {
-   .type = IBV_EXP_FLOW_ATTR_MC_DEFAULT,
+   struct ibv_flow *flow;
+   struct ibv_flow_attr attr = {
+   .type = IBV_FLOW_ATTR_MC_DEFAULT,
.num_of_specs = 0,
.port = rxq->priv->port,
.flags = 0
@@ -2180,7 +2180,7 @@ rxq_allmulticast_enable(struct rxq *rxq)
if (rxq->allmulti_flow != NULL)
return EBUSY;
errno = 0;
-   flow = ibv_exp_create_flow(rxq->qp, &attr);
+   flow = ibv_create_flow(rxq->qp, &attr);
if (flow == NULL) {
/* It's not clear whether errno is always set in this case. */
ERROR("%p: flow configuratio

[dpdk-dev] [PATCH v2 09/23] mlx4: fix error message for invalid number of descriptors

2015-06-30 Thread Adrien Mazarguil
From: Or Ami 

The number of descriptors must be a multiple of MLX4_PMD_SGE_WR_N.

Signed-off-by: Or Ami 
Signed-off-by: Olga Shern 
Signed-off-by: Adrien Mazarguil 
---
 drivers/net/mlx4/mlx4.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/net/mlx4/mlx4.c b/drivers/net/mlx4/mlx4.c
index 028e455..c87facb 100644
--- a/drivers/net/mlx4/mlx4.c
+++ b/drivers/net/mlx4/mlx4.c
@@ -1353,7 +1353,7 @@ txq_setup(struct rte_eth_dev *dev, struct txq *txq, 
uint16_t desc,
(void)conf; /* Thresholds configuration (ignored). */
if ((desc == 0) || (desc % MLX4_PMD_SGE_WR_N)) {
ERROR("%p: invalid number of TX descriptors (must be a"
- " multiple of %d)", (void *)dev, desc);
+ " multiple of %d)", (void *)dev, MLX4_PMD_SGE_WR_N);
return EINVAL;
}
desc /= MLX4_PMD_SGE_WR_N;
@@ -3002,7 +3002,7 @@ rxq_setup(struct rte_eth_dev *dev, struct rxq *rxq, 
uint16_t desc,
}
if ((desc == 0) || (desc % MLX4_PMD_SGE_WR_N)) {
ERROR("%p: invalid number of RX descriptors (must be a"
- " multiple of %d)", (void *)dev, desc);
+ " multiple of %d)", (void *)dev, MLX4_PMD_SGE_WR_N);
return EINVAL;
}
/* Get mbuf length. */
-- 
2.1.0



[dpdk-dev] [PATCH v2 10/23] mlx4: remove provision for flow creation failure in DMFS A0 mode

2015-06-30 Thread Adrien Mazarguil
From: Or Ami 

Starting from MLNX_OFED 3.0 with firmware 2.34.5000, QPs can be attached to
the port's MAC address when optimized steering mode (-7) is used, so this
check is no longer needed.

Signed-off-by: Or Ami 
Signed-off-by: Olga Shern 
Signed-off-by: Adrien Mazarguil 
---
 drivers/net/mlx4/mlx4.c | 20 
 1 file changed, 20 deletions(-)

diff --git a/drivers/net/mlx4/mlx4.c b/drivers/net/mlx4/mlx4.c
index c87facb..8da21cd 100644
--- a/drivers/net/mlx4/mlx4.c
+++ b/drivers/net/mlx4/mlx4.c
@@ -272,7 +272,6 @@ struct priv {
uint8_t port; /* Physical port number. */
unsigned int started:1; /* Device started, flows enabled. */
unsigned int promisc:1; /* Device in promiscuous mode. */
-   unsigned int promisc_ok:1; /* Promiscuous flow is supported. */
unsigned int allmulti:1; /* Device receives all multicast packets. */
unsigned int hw_qpg:1; /* QP groups are supported. */
unsigned int hw_tss:1; /* TSS is supported. */
@@ -1983,25 +1982,6 @@ rxq_mac_addr_add(struct rxq *rxq, unsigned int mac_index)
errno = 0;
flow = ibv_create_flow(rxq->qp, attr);
if (flow == NULL) {
-   int err = errno;
-
-   /* Flow creation failure is not fatal when in DMFS A0 mode.
-* Ignore error if promiscuity is already enabled or can be
-* enabled. */
-   if (priv->promisc_ok)
-   return 0;
-   if ((rxq->promisc_flow != NULL) ||
-   (rxq_promiscuous_enable(rxq) == 0)) {
-   if (rxq->promisc_flow != NULL)
-   rxq_promiscuous_disable(rxq);
-   WARN("cannot configure normal flow but promiscuous"
-" mode is fine, assuming promiscuous optimization"
-" is enabled"
-" (options mlx4_core log_num_mgm_entry_size=-7)");
-   priv->promisc_ok = 1;
-   return 0;
-   }
-   errno = err;
/* It's not clear whether errno is always set in this case. */
ERROR("%p: flow configuration failed, errno=%d: %s",
  (void *)rxq, errno,
-- 
2.1.0



[dpdk-dev] [PATCH v2 11/23] mlx4: fix support for multiple VLAN filters

2015-06-30 Thread Adrien Mazarguil
From: Olga Shern 

This commit fixes the "Multiple RX VLAN filters can be configured, but only
the first one works" bug. Since a single flow specification cannot contain
several VLAN definitions, the flows table is extended with MLX4_MAX_VLAN_IDS
possible specifications per configured MAC address.
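
Conceptually, the per-queue flow table becomes a two-dimensional array indexed
by MAC and VLAN; an illustrative sketch (sizes and names are made up, verbs
calls reduced to comments):

#define MAX_MAC_ADDRESSES 128 /* illustrative */
#define MAX_VLAN_IDS      127 /* illustrative */

struct ibv_flow; /* Opaque verbs handle. */

struct rx_flows {
	struct ibv_flow *mac_flow[MAX_MAC_ADDRESSES][MAX_VLAN_IDS];
};

/* Dropping a MAC address now means dropping one rule per enabled VLAN,
 * or the single VLAN-less rule at index 0 when no filter is configured. */
static void
del_mac_rules(struct rx_flows *f, unsigned int mac_index,
	      const int vlan_enabled[MAX_VLAN_IDS])
{
	unsigned int i;
	unsigned int vlans = 0;

	for (i = 0; i != MAX_VLAN_IDS; ++i) {
		if (!vlan_enabled[i])
			continue;
		/* ibv_destroy_flow(f->mac_flow[mac_index][i]) goes here. */
		f->mac_flow[mac_index][i] = NULL;
		++vlans;
	}
	if (!vlans)
		f->mac_flow[mac_index][0] = NULL;
}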

Signed-off-by: Olga Shern 
Signed-off-by: Or Ami 
Signed-off-by: Adrien Mazarguil 
---
 drivers/net/mlx4/mlx4.c | 174 
 1 file changed, 115 insertions(+), 59 deletions(-)

diff --git a/drivers/net/mlx4/mlx4.c b/drivers/net/mlx4/mlx4.c
index 8da21cd..37aca55 100644
--- a/drivers/net/mlx4/mlx4.c
+++ b/drivers/net/mlx4/mlx4.c
@@ -33,8 +33,6 @@

 /*
  * Known limitations:
- * - Multiple RX VLAN filters can be configured, but only the first one
- *   works properly.
  * - RSS hash key and options cannot be modified.
  * - Hardware counters aren't implemented.
  */
@@ -191,11 +189,10 @@ struct rxq {
struct ibv_cq *cq; /* Completion Queue. */
struct ibv_qp *qp; /* Queue Pair. */
/*
-* There is exactly one flow configured per MAC address. Each flow
-* may contain several specifications, one per configured VLAN ID.
+* Each VLAN ID requires a separate flow steering rule.
 */
BITFIELD_DECLARE(mac_configured, uint32_t, MLX4_MAX_MAC_ADDRESSES);
-   struct ibv_flow *mac_flow[MLX4_MAX_MAC_ADDRESSES];
+   struct ibv_flow *mac_flow[MLX4_MAX_MAC_ADDRESSES][MLX4_MAX_VLAN_IDS];
struct ibv_flow *promisc_flow; /* Promiscuous flow. */
struct ibv_flow *allmulti_flow; /* Multicast flow. */
unsigned int port_id; /* Port ID for incoming packets. */
@@ -1843,15 +1840,17 @@ rxq_free_elts(struct rxq *rxq)
 }

 /**
- * Unregister a MAC address from a RX queue.
+ * Delete flow steering rule.
  *
  * @param rxq
  *   Pointer to RX queue structure.
  * @param mac_index
  *   MAC address index.
+ * @param vlan_index
+ *   VLAN index.
  */
 static void
-rxq_mac_addr_del(struct rxq *rxq, unsigned int mac_index)
+rxq_del_flow(struct rxq *rxq, unsigned int mac_index, unsigned int vlan_index)
 {
 #ifndef NDEBUG
struct priv *priv = rxq->priv;
@@ -1859,20 +1858,43 @@ rxq_mac_addr_del(struct rxq *rxq, unsigned int 
mac_index)
(const uint8_t (*)[ETHER_ADDR_LEN])
priv->mac[mac_index].addr_bytes;
 #endif
+   assert(rxq->mac_flow[mac_index][vlan_index] != NULL);
+   DEBUG("%p: removing MAC address %02x:%02x:%02x:%02x:%02x:%02x index %u"
+ " (VLAN ID %" PRIu16 ")",
+ (void *)rxq,
+ (*mac)[0], (*mac)[1], (*mac)[2], (*mac)[3], (*mac)[4], (*mac)[5],
+ mac_index, priv->vlan_filter[vlan_index].id);
+   claim_zero(ibv_destroy_flow(rxq->mac_flow[mac_index][vlan_index]));
+   rxq->mac_flow[mac_index][vlan_index] = NULL;
+}
+
+/**
+ * Unregister a MAC address from a RX queue.
+ *
+ * @param rxq
+ *   Pointer to RX queue structure.
+ * @param mac_index
+ *   MAC address index.
+ */
+static void
+rxq_mac_addr_del(struct rxq *rxq, unsigned int mac_index)
+{
+   struct priv *priv = rxq->priv;
+   unsigned int i;
+   unsigned int vlans = 0;

assert(mac_index < elemof(priv->mac));
-   if (!BITFIELD_ISSET(rxq->mac_configured, mac_index)) {
-   assert(rxq->mac_flow[mac_index] == NULL);
+   if (!BITFIELD_ISSET(rxq->mac_configured, mac_index))
return;
+   for (i = 0; (i != elemof(priv->vlan_filter)); ++i) {
+   if (!priv->vlan_filter[i].enabled)
+   continue;
+   rxq_del_flow(rxq, mac_index, i);
+   vlans++;
+   }
+   if (!vlans) {
+   rxq_del_flow(rxq, mac_index, 0);
}
-   DEBUG("%p: removing MAC address %02x:%02x:%02x:%02x:%02x:%02x"
- " index %u",
- (void *)rxq,
- (*mac)[0], (*mac)[1], (*mac)[2], (*mac)[3], (*mac)[4], (*mac)[5],
- mac_index);
-   assert(rxq->mac_flow[mac_index] != NULL);
-   claim_zero(ibv_destroy_flow(rxq->mac_flow[mac_index]));
-   rxq->mac_flow[mac_index] = NULL;
BITFIELD_RESET(rxq->mac_configured, mac_index);
 }

@@ -1896,47 +1918,37 @@ static int rxq_promiscuous_enable(struct rxq *);
 static void rxq_promiscuous_disable(struct rxq *);

 /**
- * Register a MAC address in a RX queue.
+ * Add single flow steering rule.
  *
  * @param rxq
  *   Pointer to RX queue structure.
  * @param mac_index
  *   MAC address index to register.
+ * @param vlan_index
+ *   VLAN index. Use -1 for a flow without VLAN.
  *
  * @return
  *   0 on success, errno value on failure.
  */
 static int
-rxq_mac_addr_add(struct rxq *rxq, unsigned int mac_index)
+rxq_add_flow(struct rxq *rxq, unsigned int mac_index, unsigned int vlan_index)
 {
+   struct ibv_flow *flow;
struct priv *priv = rxq->priv;
const uint8_t (*mac)[ETHER_ADDR_LEN] =
-   (const uint8_t (*)[ETHER_ADDR_LEN])
-   pri

[dpdk-dev] [PATCH v2 12/23] mlx4: query netdevice to get initial MAC address

2015-06-30 Thread Adrien Mazarguil
From: Or Ami 

Querying the netdevice instead of deriving the port's MAC address from its
GID is less prone to errors: there is no guarantee that the GID will always
embed the MAC address, nor that the derivation algorithm won't change.
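
The query itself boils down to a SIOCGIFHWADDR ioctl; a self-contained sketch
(interface name assumed):

#include <stdio.h>
#include <string.h>
#include <unistd.h>
#include <sys/ioctl.h>
#include <sys/socket.h>
#include <net/if.h>

int
main(void)
{
	struct ifreq ifr;
	unsigned char mac[6];
	int fd = socket(AF_INET, SOCK_DGRAM, 0);

	if (fd == -1)
		return 1;
	memset(&ifr, 0, sizeof(ifr));
	strncpy(ifr.ifr_name, "eth0", sizeof(ifr.ifr_name) - 1); /* assumed */
	if (ioctl(fd, SIOCGIFHWADDR, &ifr) == -1) {
		perror("SIOCGIFHWADDR");
		close(fd);
		return 1;
	}
	memcpy(mac, ifr.ifr_hwaddr.sa_data, 6);
	printf("%02x:%02x:%02x:%02x:%02x:%02x\n",
	       mac[0], mac[1], mac[2], mac[3], mac[4], mac[5]);
	close(fd);
	return 0;
}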

Signed-off-by: Or Ami 
Signed-off-by: Olga Shern 
Signed-off-by: Adrien Mazarguil 
---
 drivers/net/mlx4/mlx4.c | 34 ++
 1 file changed, 18 insertions(+), 16 deletions(-)

diff --git a/drivers/net/mlx4/mlx4.c b/drivers/net/mlx4/mlx4.c
index 37aca55..cdc679a 100644
--- a/drivers/net/mlx4/mlx4.c
+++ b/drivers/net/mlx4/mlx4.c
@@ -4305,22 +4305,25 @@ mlx4_ibv_device_to_pci_addr(const struct ibv_device 
*device,
 }

 /**
- * Derive MAC address from port GID.
+ * Get MAC address by querying netdevice.
  *
+ * @param[in] priv
+ *   struct priv for the requested device.
  * @param[out] mac
  *   MAC address output buffer.
- * @param port
- *   Physical port number.
- * @param[in] gid
- *   Port GID.
+ *
+ * @return
+ *   0 on success, -1 on failure and errno is set.
  */
-static void
-mac_from_gid(uint8_t (*mac)[ETHER_ADDR_LEN], uint32_t port, uint8_t *gid)
+static int
+priv_get_mac(struct priv *priv, uint8_t (*mac)[ETHER_ADDR_LEN])
 {
-   memcpy(&(*mac)[0], gid + 8, 3);
-   memcpy(&(*mac)[3], gid + 13, 3);
-   if (port == 1)
-   (*mac)[0] ^= 2;
+   struct ifreq request;
+
+   if (priv_ifreq(priv, SIOCGIFHWADDR, &request))
+   return -1;
+   memcpy(mac, request.ifr_hwaddr.sa_data, ETHER_ADDR_LEN);
+   return 0;
 }

 /* Support up to 32 adapters. */
@@ -4482,7 +4485,6 @@ mlx4_pci_devinit(struct rte_pci_driver *pci_drv, struct 
rte_pci_device *pci_dev)
struct ibv_exp_device_attr exp_device_attr;
 #endif /* HAVE_EXP_QUERY_DEVICE */
struct ether_addr mac;
-   union ibv_gid temp_gid;

 #ifdef HAVE_EXP_QUERY_DEVICE
exp_device_attr.comp_mask = IBV_EXP_DEVICE_ATTR_EXP_CAP_FLAGS;
@@ -4594,12 +4596,12 @@ mlx4_pci_devinit(struct rte_pci_driver *pci_drv, struct 
rte_pci_device *pci_dev)

(void)mlx4_getenv_int;
priv->vf = vf;
-   if (ibv_query_gid(ctx, port, 0, &temp_gid)) {
-   ERROR("ibv_query_gid() failure");
+   /* Configure the first MAC address by default. */
+   if (priv_get_mac(priv, &mac.addr_bytes)) {
+   ERROR("cannot get MAC address, is mlx4_en loaded?"
+ " (errno: %s)", strerror(errno));
goto port_error;
}
-   /* Configure the first MAC address by default. */
-   mac_from_gid(&mac.addr_bytes, port, temp_gid.raw);
INFO("port %u MAC address is %02x:%02x:%02x:%02x:%02x:%02x",
 priv->port,
 mac.addr_bytes[0], mac.addr_bytes[1],
-- 
2.1.0



[dpdk-dev] [PATCH v2 13/23] mlx4: use MOFED 3.0 fast verbs interface for RX operations

2015-06-30 Thread Adrien Mazarguil
From: Alex Rosenbaum 

This commit replaces the CQ polling and QP posting functions
(mlx4_rx_burst() only) with a new low-level interface to improve
performance.
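
The pattern behind it: query a burst family once at setup time, then call its
function pointers in the datapath instead of ibv_poll_cq()/ibv_post_recv(). A
hedged sketch of the setup step, based on the ibv_exp_query_intf() parameter
and status types used later in this series (enum values follow MOFED 3.0 and
may differ):

#include <infiniband/verbs.h>

static struct ibv_exp_cq_family *
get_cq_interface(struct ibv_context *ctx, struct ibv_cq *cq)
{
	struct ibv_exp_query_intf_params params = {
		.intf_scope = IBV_EXP_INTF_GLOBAL,
		.intf = IBV_EXP_INTF_CQ,
		.obj = cq,
	};
	enum ibv_exp_query_intf_status status;

	/* Returns a table of fast-path callbacks, e.g. poll_length(). */
	return ibv_exp_query_intf(ctx, &params, &status);
}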

Signed-off-by: Alex Rosenbaum 
Signed-off-by: Gilad Berman 
Signed-off-by: Adrien Mazarguil 
---
 drivers/net/mlx4/mlx4.c | 249 +++-
 1 file changed, 162 insertions(+), 87 deletions(-)

diff --git a/drivers/net/mlx4/mlx4.c b/drivers/net/mlx4/mlx4.c
index cdc679a..1881f5b 100644
--- a/drivers/net/mlx4/mlx4.c
+++ b/drivers/net/mlx4/mlx4.c
@@ -188,6 +188,8 @@ struct rxq {
struct ibv_mr *mr; /* Memory Region (for mp). */
struct ibv_cq *cq; /* Completion Queue. */
struct ibv_qp *qp; /* Queue Pair. */
+   struct ibv_exp_qp_burst_family *if_qp; /* QP burst interface. */
+   struct ibv_exp_cq_family *if_cq; /* CQ interface. */
/*
 * Each VLAN ID requires a separate flow steering rule.
 */
@@ -2319,11 +2321,35 @@ rxq_promiscuous_disable(struct rxq *rxq)
 static void
 rxq_cleanup(struct rxq *rxq)
 {
+   struct ibv_exp_release_intf_params params;
+
DEBUG("cleaning up %p", (void *)rxq);
if (rxq->sp)
rxq_free_elts_sp(rxq);
else
rxq_free_elts(rxq);
+   if (rxq->if_qp != NULL) {
+   assert(rxq->priv != NULL);
+   assert(rxq->priv->ctx != NULL);
+   assert(rxq->qp != NULL);
+   params = (struct ibv_exp_release_intf_params){
+   .comp_mask = 0,
+   };
+   claim_zero(ibv_exp_release_intf(rxq->priv->ctx,
+   rxq->if_qp,
+   &params));
+   }
+   if (rxq->if_cq != NULL) {
+   assert(rxq->priv != NULL);
+   assert(rxq->priv->ctx != NULL);
+   assert(rxq->cq != NULL);
+   params = (struct ibv_exp_release_intf_params){
+   .comp_mask = 0,
+   };
+   claim_zero(ibv_exp_release_intf(rxq->priv->ctx,
+   rxq->if_cq,
+   &params));
+   }
if (rxq->qp != NULL) {
rxq_promiscuous_disable(rxq);
rxq_allmulticast_disable(rxq);
@@ -2360,34 +2386,23 @@ mlx4_rx_burst_sp(void *dpdk_rxq, struct rte_mbuf 
**pkts, uint16_t pkts_n)
struct rxq_elt_sp (*elts)[rxq->elts_n] = rxq->elts.sp;
const unsigned int elts_n = rxq->elts_n;
unsigned int elts_head = rxq->elts_head;
-   struct ibv_wc wcs[pkts_n];
struct ibv_recv_wr head;
struct ibv_recv_wr **next = &head.next;
struct ibv_recv_wr *bad_wr;
-   int ret = 0;
-   int wcs_n;
-   int i;
+   unsigned int i;
+   unsigned int pkts_ret = 0;
+   int ret;

if (unlikely(!rxq->sp))
return mlx4_rx_burst(dpdk_rxq, pkts, pkts_n);
if (unlikely(elts == NULL)) /* See RTE_DEV_CMD_SET_MTU. */
return 0;
-   wcs_n = ibv_poll_cq(rxq->cq, pkts_n, wcs);
-   if (unlikely(wcs_n == 0))
-   return 0;
-   if (unlikely(wcs_n < 0)) {
-   DEBUG("rxq=%p, ibv_poll_cq() failed (wc_n=%d)",
- (void *)rxq, wcs_n);
-   return 0;
-   }
-   assert(wcs_n <= (int)pkts_n);
-   /* For each work completion. */
-   for (i = 0; (i != wcs_n); ++i) {
-   struct ibv_wc *wc = &wcs[i];
-   uint64_t wr_id = wc->wr_id;
-   uint32_t len = wc->byte_len;
+   for (i = 0; (i != pkts_n); ++i) {
struct rxq_elt_sp *elt = &(*elts)[elts_head];
struct ibv_recv_wr *wr = &elt->wr;
+   uint64_t wr_id = wr->wr_id;
+   unsigned int len;
+   unsigned int pkt_buf_len;
struct rte_mbuf *pkt_buf = NULL; /* Buffer returned in pkts. */
struct rte_mbuf **pkt_buf_next = &pkt_buf;
unsigned int seg_headroom = RTE_PKTMBUF_HEADROOM;
@@ -2398,26 +2413,51 @@ mlx4_rx_burst_sp(void *dpdk_rxq, struct rte_mbuf 
**pkts, uint16_t pkts_n)
(void)wr_id;
 #endif
assert(wr_id < rxq->elts_n);
-   assert(wr_id == wr->wr_id);
assert(wr->sg_list == elt->sges);
assert(wr->num_sge == elemof(elt->sges));
assert(elts_head < rxq->elts_n);
assert(rxq->elts_head < rxq->elts_n);
-   /* Link completed WRs together for repost. */
-   *next = wr;
-   next = &wr->next;
-   if (unlikely(wc->status != IBV_WC_SUCCESS)) {
-   /* Whatever, just repost the offending WR. */
-   DEBUG("rxq=%p, wr_id=%" PRIu64 ": bad work completion"
- " status (%d): %s",
- (void *)rxq, wc->wr_id, wc->status,
-

[dpdk-dev] [PATCH v2 14/23] mlx4: improve performance by requesting TX completion events less often

2015-06-30 Thread Adrien Mazarguil
From: Alex Rosenbaum 

Instead of requesting a completion event for each TX burst, request it on a
fixed schedule once every MLX4_PMD_TX_PER_COMP_REQ (currently 64) packets to
improve performance.
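
The bookkeeping reduces to a countdown kept in the TX queue (simplified
sketch; cd must be initialized to a non-zero value such as
MLX4_PMD_TX_PER_COMP_REQ):

struct tx_comp_state {
	unsigned int cd;      /* Packets left before next completion request. */
	unsigned int cd_init; /* Reload value, e.g. 64. */
};

/* Returns 1 when the current WR should carry IBV_SEND_SIGNALED. */
static int
tx_needs_completion(struct tx_comp_state *s)
{
	if (--s->cd == 0) {
		s->cd = s->cd_init;
		return 1;
	}
	return 0;
}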

Signed-off-by: Alex Rosenbaum 
Signed-off-by: Adrien Mazarguil 
---
 drivers/net/mlx4/mlx4.c | 54 -
 drivers/net/mlx4/mlx4.h |  3 +++
 2 files changed, 39 insertions(+), 18 deletions(-)

diff --git a/drivers/net/mlx4/mlx4.c b/drivers/net/mlx4/mlx4.c
index 1881f5b..f76f415 100644
--- a/drivers/net/mlx4/mlx4.c
+++ b/drivers/net/mlx4/mlx4.c
@@ -243,6 +243,8 @@ struct txq {
unsigned int elts_head; /* Current index in (*elts)[]. */
unsigned int elts_tail; /* First element awaiting completion. */
unsigned int elts_comp; /* Number of completion requests. */
+   unsigned int elts_comp_cd; /* Countdown for next completion request. */
+   unsigned int elts_comp_cd_init; /* Initial value for countdown. */
struct mlx4_txq_stats stats; /* TX queue counters. */
linear_t (*elts_linear)[]; /* Linearized buffers. */
struct ibv_mr *mr_linear; /* Memory Region for linearized buffers. */
@@ -810,6 +812,12 @@ txq_alloc_elts(struct txq *txq, unsigned int elts_n)
txq->elts_head = 0;
txq->elts_tail = 0;
txq->elts_comp = 0;
+   /* Request send completion every MLX4_PMD_TX_PER_COMP_REQ packets or
+* at least 4 times per ring. */
+   txq->elts_comp_cd_init =
+   ((MLX4_PMD_TX_PER_COMP_REQ < (elts_n / 4)) ?
+MLX4_PMD_TX_PER_COMP_REQ : (elts_n / 4));
+   txq->elts_comp_cd = txq->elts_comp_cd_init;
txq->elts_linear = elts_linear;
txq->mr_linear = mr_linear;
assert(ret == 0);
@@ -896,9 +904,9 @@ txq_cleanup(struct txq *txq)
  * Manage TX completions.
  *
  * When sending a burst, mlx4_tx_burst() posts several WRs.
- * To improve performance, a completion event is only required for the last of
- * them. Doing so discards completion information for other WRs, but this
- * information would not be used anyway.
+ * To improve performance, a completion event is only required once every
+ * MLX4_PMD_TX_PER_COMP_REQ sends. Doing so discards completion information
+ * for other WRs, but this information would not be used anyway.
  *
  * @param txq
  *   Pointer to TX queue structure.
@@ -910,7 +918,7 @@ static int
 txq_complete(struct txq *txq)
 {
unsigned int elts_comp = txq->elts_comp;
-   unsigned int elts_tail;
+   unsigned int elts_tail = txq->elts_tail;
const unsigned int elts_n = txq->elts_n;
struct ibv_wc wcs[elts_comp];
int wcs_n;
@@ -932,17 +940,12 @@ txq_complete(struct txq *txq)
elts_comp -= wcs_n;
assert(elts_comp <= txq->elts_comp);
/*
-* Work Completion ID contains the associated element index in
-* (*txq->elts)[]. Since WCs are returned in order, we only need to
-* look at the last WC to clear older Work Requests.
-*
 * Assume WC status is successful as nothing can be done about it
 * anyway.
 */
-   elts_tail = WR_ID(wcs[wcs_n - 1].wr_id).id;
-   /* Consume the last WC. */
-   if (++elts_tail >= elts_n)
-   elts_tail = 0;
+   elts_tail += wcs_n * txq->elts_comp_cd_init;
+   if (elts_tail >= elts_n)
+   elts_tail -= elts_n;
txq->elts_tail = elts_tail;
txq->elts_comp = elts_comp;
return 0;
@@ -1062,10 +1065,13 @@ mlx4_tx_burst(void *dpdk_txq, struct rte_mbuf **pkts, 
uint16_t pkts_n)
unsigned int elts_head = txq->elts_head;
const unsigned int elts_tail = txq->elts_tail;
const unsigned int elts_n = txq->elts_n;
+   unsigned int elts_comp_cd = txq->elts_comp_cd;
+   unsigned int elts_comp = 0;
unsigned int i;
unsigned int max;
int err;

+   assert(elts_comp_cd != 0);
txq_complete(txq);
max = (elts_n - (elts_head - elts_tail));
if (max > elts_n)
@@ -1243,6 +1249,12 @@ mlx4_tx_burst(void *dpdk_txq, struct rte_mbuf **pkts, 
uint16_t pkts_n)
else
 #endif
wr->send_flags = 0;
+   /* Request TX completion. */
+   if (unlikely(--elts_comp_cd == 0)) {
+   elts_comp_cd = txq->elts_comp_cd_init;
+   ++elts_comp;
+   wr->send_flags |= IBV_SEND_SIGNALED;
+   }
if (++elts_head >= elts_n)
elts_head = 0;
 #ifdef MLX4_PMD_SOFT_COUNTERS
@@ -1259,14 +1271,11 @@ stop:
txq->stats.opackets += i;
 #endif
*wr_next = NULL;
-   /* The last WR is the only one asking for a completion event. */
-   containerof(wr_next, mlx4_send_wr_t, next)->
-   send_flags |= IBV_SEND_SIGNALED;
err = mlx4_post_send(txq->qp, head.next, &bad_wr);
if (unlikely(err)) {
unsigned int u

[dpdk-dev] [PATCH v2 15/23] mlx4: use MOFED 3.0 fast verbs interface for TX operations

2015-06-30 Thread Adrien Mazarguil
The "raw" post send interface was experimental and has been deprecated. This
commit replaces it with a new low-level interface that dissociates post and
flush (doorbell) operations for improved QP performance.

The CQ polling function is updated as well.
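
The shape of the resulting TX path, with the burst-family callbacks reduced to
a generic abstraction (illustrative only; the real send_pending*()/flush
callbacks and their signatures come from the MOFED headers):

#include <stddef.h>

struct qp_burst_ops {
	int (*send_pending)(void *qp, const void *addr, size_t len);
	int (*send_flush)(void *qp);
};

/* Queue up to n packets without ringing the doorbell, then flush once. */
static unsigned int
tx_burst(void *qp, const struct qp_burst_ops *ops,
	 const void *bufs[], const size_t lens[], unsigned int n)
{
	unsigned int i;

	for (i = 0; i != n; ++i)
		if (ops->send_pending(qp, bufs[i], lens[i]))
			break;               /* Queue full, stop early. */
	if (i > 0)
		ops->send_flush(qp);         /* One doorbell per burst. */
	return i;                            /* Packets actually queued. */
}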

Signed-off-by: Alex Rosenbaum 
Signed-off-by: Adrien Mazarguil 
---
 drivers/net/mlx4/Makefile |   4 --
 drivers/net/mlx4/mlx4.c   | 167 +++---
 2 files changed, 85 insertions(+), 86 deletions(-)

diff --git a/drivers/net/mlx4/Makefile b/drivers/net/mlx4/Makefile
index ce1f2b0..fd74dc8 100644
--- a/drivers/net/mlx4/Makefile
+++ b/drivers/net/mlx4/Makefile
@@ -109,10 +109,6 @@ mlx4_autoconf.h: $(RTE_SDK)/scripts/auto-config-h.sh
infiniband/verbs.h \
enum IBV_EXP_DEVICE_ATTR_INLINE_RECV_SZ $(AUTOCONF_OUTPUT)
$Q sh -- '$<' '$@' \
-   SEND_RAW_WR_SUPPORT \
-   infiniband/verbs.h \
-   type 'struct ibv_send_wr_raw' $(AUTOCONF_OUTPUT)
-   $Q sh -- '$<' '$@' \
HAVE_EXP_QUERY_DEVICE \
infiniband/verbs.h \
type 'struct ibv_exp_device_attr' $(AUTOCONF_OUTPUT)
diff --git a/drivers/net/mlx4/mlx4.c b/drivers/net/mlx4/mlx4.c
index f76f415..3dff64d 100644
--- a/drivers/net/mlx4/mlx4.c
+++ b/drivers/net/mlx4/mlx4.c
@@ -139,15 +139,6 @@ static inline void wr_id_t_check(void)
(void)wr_id_t_check;
 }

-/* If raw send operations are available, use them since they are faster. */
-#ifdef SEND_RAW_WR_SUPPORT
-typedef struct ibv_send_wr_raw mlx4_send_wr_t;
-#define mlx4_post_send ibv_post_send_raw
-#else
-typedef struct ibv_send_wr mlx4_send_wr_t;
-#define mlx4_post_send ibv_post_send
-#endif
-
 struct mlx4_rxq_stats {
unsigned int idx; /**< Mapping index. */
 #ifdef MLX4_PMD_SOFT_COUNTERS
@@ -212,7 +203,7 @@ struct rxq {

 /* TX element. */
 struct txq_elt {
-   mlx4_send_wr_t wr; /* Work Request. */
+   struct ibv_send_wr wr; /* Work Request. */
struct ibv_sge sges[MLX4_PMD_SGE_WR_N]; /* Scatter/Gather Elements. */
/* mbuf pointer is derived from WR_ID(wr.wr_id).offset. */
 };
@@ -235,6 +226,8 @@ struct txq {
} mp2mr[MLX4_PMD_TX_MP_CACHE]; /* MP to MR translation table. */
struct ibv_cq *cq; /* Completion Queue. */
struct ibv_qp *qp; /* Queue Pair. */
+   struct ibv_exp_qp_burst_family *if_qp; /* QP burst interface. */
+   struct ibv_exp_cq_family *if_cq; /* CQ interface. */
 #if MLX4_PMD_MAX_INLINE > 0
uint32_t max_inline; /* Max inline send size <= MLX4_PMD_MAX_INLINE. */
 #endif
@@ -797,7 +790,7 @@ txq_alloc_elts(struct txq *txq, unsigned int elts_n)
}
for (i = 0; (i != elts_n); ++i) {
struct txq_elt *elt = &(*elts)[i];
-   mlx4_send_wr_t *wr = &elt->wr;
+   struct ibv_send_wr *wr = &elt->wr;

/* Configure WR. */
WR_ID(wr->wr_id).id = i;
@@ -883,10 +876,33 @@ txq_free_elts(struct txq *txq)
 static void
 txq_cleanup(struct txq *txq)
 {
+   struct ibv_exp_release_intf_params params;
size_t i;

DEBUG("cleaning up %p", (void *)txq);
txq_free_elts(txq);
+   if (txq->if_qp != NULL) {
+   assert(txq->priv != NULL);
+   assert(txq->priv->ctx != NULL);
+   assert(txq->qp != NULL);
+   params = (struct ibv_exp_release_intf_params){
+   .comp_mask = 0,
+   };
+   claim_zero(ibv_exp_release_intf(txq->priv->ctx,
+   txq->if_qp,
+   &params));
+   }
+   if (txq->if_cq != NULL) {
+   assert(txq->priv != NULL);
+   assert(txq->priv->ctx != NULL);
+   assert(txq->cq != NULL);
+   params = (struct ibv_exp_release_intf_params){
+   .comp_mask = 0,
+   };
+   claim_zero(ibv_exp_release_intf(txq->priv->ctx,
+   txq->if_cq,
+   &params));
+   }
if (txq->qp != NULL)
claim_zero(ibv_destroy_qp(txq->qp));
if (txq->cq != NULL)
@@ -920,7 +936,6 @@ txq_complete(struct txq *txq)
unsigned int elts_comp = txq->elts_comp;
unsigned int elts_tail = txq->elts_tail;
const unsigned int elts_n = txq->elts_n;
-   struct ibv_wc wcs[elts_comp];
int wcs_n;

if (unlikely(elts_comp == 0))
@@ -929,7 +944,7 @@ txq_complete(struct txq *txq)
DEBUG("%p: processing %u work requests completions",
  (void *)txq, elts_comp);
 #endif
-   wcs_n = ibv_poll_cq(txq->cq, elts_comp, wcs);
+   wcs_n = txq->if_cq->poll_cnt(txq->cq, elts_comp);
if (unlikely(wcs_n == 0))
return 0;
if (unlikely(wcs_n < 0)) {
@@ -1059,9 +1074,8 @@ static uint16_t
 mlx4_tx_burst(void *dpdk_txq, struct rte_

[dpdk-dev] [PATCH v2 16/23] mlx4: move scattered TX processing to helper function

2015-06-30 Thread Adrien Mazarguil
This commit makes scattered TX support entirely optional by moving it to a
separate function that is only available when MLX4_PMD_SGE_WR_N > 1.

This improves performance when scattered support is not needed.

Signed-off-by: Alex Rosenbaum 
Signed-off-by: Adrien Mazarguil 
---
 drivers/net/mlx4/mlx4.c | 248 +---
 1 file changed, 170 insertions(+), 78 deletions(-)

diff --git a/drivers/net/mlx4/mlx4.c b/drivers/net/mlx4/mlx4.c
index 3dff64d..acf1290 100644
--- a/drivers/net/mlx4/mlx4.c
+++ b/drivers/net/mlx4/mlx4.c
@@ -1025,6 +1025,8 @@ txq_mp2mr(struct txq *txq, struct rte_mempool *mp)
return txq->mp2mr[i].lkey;
 }

+#if MLX4_PMD_SGE_WR_N > 1
+
 /**
  * Copy scattered mbuf contents to a single linear buffer.
  *
@@ -1058,6 +1060,146 @@ linearize_mbuf(linear_t *linear, struct rte_mbuf *buf)
 }

 /**
+ * Handle scattered buffers for mlx4_tx_burst().
+ *
+ * @param txq
+ *   TX queue structure.
+ * @param segs
+ *   Number of segments in buf.
+ * @param elt
+ *   TX queue element to fill.
+ * @param[in] buf
+ *   Buffer to process.
+ * @param elts_head
+ *   Index of the linear buffer to use if necessary (normally txq->elts_head).
+ *
+ * @return
+ *   Processed packet size in bytes or (unsigned int)-1 in case of failure.
+ */
+static unsigned int
+tx_burst_sg(struct txq *txq, unsigned int segs, struct txq_elt *elt,
+   struct rte_mbuf *buf, unsigned int elts_head)
+{
+   struct ibv_send_wr *wr = &elt->wr;
+   unsigned int sent_size = 0;
+   unsigned int j;
+   int linearize = 0;
+
+   /* When there are too many segments, extra segments are
+* linearized in the last SGE. */
+   if (unlikely(segs > elemof(elt->sges))) {
+   segs = (elemof(elt->sges) - 1);
+   linearize = 1;
+   }
+   /* Set WR fields. */
+   assert((rte_pktmbuf_mtod(buf, uintptr_t) -
+   (uintptr_t)buf) <= 0x);
+   WR_ID(wr->wr_id).offset =
+   (rte_pktmbuf_mtod(buf, uintptr_t) -
+(uintptr_t)buf);
+   wr->num_sge = segs;
+   /* Register segments as SGEs. */
+   for (j = 0; (j != segs); ++j) {
+   struct ibv_sge *sge = &elt->sges[j];
+   uint32_t lkey;
+
+   /* Retrieve Memory Region key for this memory pool. */
+   lkey = txq_mp2mr(txq, buf->pool);
+   if (unlikely(lkey == (uint32_t)-1)) {
+   /* MR does not exist. */
+   DEBUG("%p: unable to get MP <-> MR association",
+ (void *)txq);
+   /* Clean up TX element. */
+   WR_ID(elt->wr.wr_id).offset = 0;
+#ifndef NDEBUG
+   /* For assert(). */
+   while (j) {
+   --j;
+   --sge;
+   sge->addr = 0;
+   sge->length = 0;
+   sge->lkey = 0;
+   }
+   wr->num_sge = 0;
+#endif
+   goto stop;
+   }
+   /* Sanity checks, only relevant with debugging enabled. */
+   assert(sge->addr == 0);
+   assert(sge->length == 0);
+   assert(sge->lkey == 0);
+   /* Update SGE. */
+   sge->addr = rte_pktmbuf_mtod(buf, uintptr_t);
+   if (txq->priv->vf)
+   rte_prefetch0((volatile void *)
+ (uintptr_t)sge->addr);
+   sge->length = DATA_LEN(buf);
+   sge->lkey = lkey;
+   sent_size += sge->length;
+   buf = NEXT(buf);
+   }
+   /* If buf is not NULL here and is not going to be linearized,
+* nb_segs is not valid. */
+   assert(j == segs);
+   assert((buf == NULL) || (linearize));
+   /* Linearize extra segments. */
+   if (linearize) {
+   struct ibv_sge *sge = &elt->sges[segs];
+   linear_t *linear = &(*txq->elts_linear)[elts_head];
+   unsigned int size = linearize_mbuf(linear, buf);
+
+   assert(segs == (elemof(elt->sges) - 1));
+   if (size == 0) {
+   /* Invalid packet. */
+   DEBUG("%p: packet too large to be linearized.",
+ (void *)txq);
+   /* Clean up TX element. */
+   WR_ID(elt->wr.wr_id).offset = 0;
+#ifndef NDEBUG
+   /* For assert(). */
+   while (j) {
+   --j;
+   --sge;
+   sge->addr = 0;
+   sge->length = 0;
+   sge->lkey = 0;
+   }
+   wr->num_sge = 0;
+#endif
+   goto stop;
+   }
+

[dpdk-dev] [PATCH v2 17/23] mlx4: shrink TX queue elements for better performance

2015-06-30 Thread Adrien Mazarguil
From: Alex Rosenbaum 

TX queue elements (struct txq_elt) contain WR and SGE structures required by
ibv_post_send(). This commit replaces them with a single pointer to the
related TX mbuf considering that:

- There is no need to keep these structures around forever since the
  hardware doesn't access them after ibv_post_send() and send_pending*()
  have returned.

- The TX queue index stored in the WR ID field is not used for completions
  anymore since they use a separate counter (elts_comp_cd).

- The WR structure itself was only useful for ibv_post_send(), it is
  currently only used to store the mbuf data address and an offset to the
  mbuf structure in the WR ID field. send_pending*() callbacks only require
  SGEs or buffer pointers.

Therefore for single segment mbufs, send_pending() or send_pending_inline()
can be used directly without involving SGEs. For scattered mbufs, SGEs are
allocated on the stack and passed to send_pending_sg_list().
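
The data structure change itself is small; roughly (sketch, array size
illustrative):

#include <infiniband/verbs.h>

struct rte_mbuf; /* DPDK mbuf, opaque here. */

/* Before: one WR plus a fixed SGE array per TX element. */
struct txq_elt_old {
	struct ibv_send_wr wr;
	struct ibv_sge sges[4]; /* MLX4_PMD_SGE_WR_N */
};

/* After: only the mbuf pointer, freed once its completion arrives. */
struct txq_elt_new {
	struct rte_mbuf *buf;
};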

Signed-off-by: Alex Rosenbaum 
Signed-off-by: Adrien Mazarguil 
---
 drivers/net/mlx4/mlx4.c | 244 +---
 1 file changed, 84 insertions(+), 160 deletions(-)

diff --git a/drivers/net/mlx4/mlx4.c b/drivers/net/mlx4/mlx4.c
index acf1290..f251eb4 100644
--- a/drivers/net/mlx4/mlx4.c
+++ b/drivers/net/mlx4/mlx4.c
@@ -203,9 +203,7 @@ struct rxq {

 /* TX element. */
 struct txq_elt {
-   struct ibv_send_wr wr; /* Work Request. */
-   struct ibv_sge sges[MLX4_PMD_SGE_WR_N]; /* Scatter/Gather Elements. */
-   /* mbuf pointer is derived from WR_ID(wr.wr_id).offset. */
+   struct rte_mbuf *buf;
 };

 /* Linear buffer type. It is used when transmitting buffers with too many
@@ -790,14 +788,8 @@ txq_alloc_elts(struct txq *txq, unsigned int elts_n)
}
for (i = 0; (i != elts_n); ++i) {
struct txq_elt *elt = &(*elts)[i];
-   struct ibv_send_wr *wr = &elt->wr;

-   /* Configure WR. */
-   WR_ID(wr->wr_id).id = i;
-   WR_ID(wr->wr_id).offset = 0;
-   wr->sg_list = &elt->sges[0];
-   wr->opcode = IBV_WR_SEND;
-   /* Other fields are updated during TX. */
+   elt->buf = NULL;
}
DEBUG("%p: allocated and configured %u WRs", (void *)txq, elts_n);
txq->elts_n = elts_n;
@@ -856,10 +848,9 @@ txq_free_elts(struct txq *txq)
for (i = 0; (i != elemof(*elts)); ++i) {
struct txq_elt *elt = &(*elts)[i];

-   if (WR_ID(elt->wr.wr_id).offset == 0)
+   if (elt->buf == NULL)
continue;
-   rte_pktmbuf_free((void *)((uintptr_t)elt->sges[0].addr -
-   WR_ID(elt->wr.wr_id).offset));
+   rte_pktmbuf_free(elt->buf);
}
rte_free(elts);
 }
@@ -1072,35 +1063,37 @@ linearize_mbuf(linear_t *linear, struct rte_mbuf *buf)
  *   Buffer to process.
  * @param elts_head
  *   Index of the linear buffer to use if necessary (normally txq->elts_head).
+ * @param[out] sges
+ *   Array filled with SGEs on success.
  *
  * @return
- *   Processed packet size in bytes or (unsigned int)-1 in case of failure.
+ *   A structure containing the processed packet size in bytes and the
+ *   number of SGEs. Both fields are set to (unsigned int)-1 in case of
+ *   failure.
  */
-static unsigned int
+static struct tx_burst_sg_ret {
+   unsigned int length;
+   unsigned int num;
+}
 tx_burst_sg(struct txq *txq, unsigned int segs, struct txq_elt *elt,
-   struct rte_mbuf *buf, unsigned int elts_head)
+   struct rte_mbuf *buf, unsigned int elts_head,
+   struct ibv_sge (*sges)[MLX4_PMD_SGE_WR_N])
 {
-   struct ibv_send_wr *wr = &elt->wr;
unsigned int sent_size = 0;
unsigned int j;
int linearize = 0;

/* When there are too many segments, extra segments are
 * linearized in the last SGE. */
-   if (unlikely(segs > elemof(elt->sges))) {
-   segs = (elemof(elt->sges) - 1);
+   if (unlikely(segs > elemof(*sges))) {
+   segs = (elemof(*sges) - 1);
linearize = 1;
}
-   /* Set WR fields. */
-   assert((rte_pktmbuf_mtod(buf, uintptr_t) -
-   (uintptr_t)buf) <= 0x);
-   WR_ID(wr->wr_id).offset =
-   (rte_pktmbuf_mtod(buf, uintptr_t) -
-(uintptr_t)buf);
-   wr->num_sge = segs;
+   /* Update element. */
+   elt->buf = buf;
/* Register segments as SGEs. */
for (j = 0; (j != segs); ++j) {
-   struct ibv_sge *sge = &elt->sges[j];
+   struct ibv_sge *sge = &(*sges)[j];
uint32_t lkey;

/* Retrieve Memory Region key for this memory pool. */
@@ -1110,24 +1103,9 @@ tx_burst_sg(struct txq *txq, unsigned int segs, struct 
txq_elt *elt,
DEBUG("%p: unable to get MP <-> MR association",
  (void *)txq);
 

[dpdk-dev] [PATCH v2 18/23] mlx4: prefetch completed TX mbufs before releasing them

2015-06-30 Thread Adrien Mazarguil
From: Alex Rosenbaum 

Signed-off-by: Alex Rosenbaum 
Signed-off-by: Adrien Mazarguil 
---
 drivers/net/mlx4/mlx4.c | 8 ++--
 1 file changed, 6 insertions(+), 2 deletions(-)

diff --git a/drivers/net/mlx4/mlx4.c b/drivers/net/mlx4/mlx4.c
index f251eb4..52f3fbb 100644
--- a/drivers/net/mlx4/mlx4.c
+++ b/drivers/net/mlx4/mlx4.c
@@ -1205,6 +1205,9 @@ mlx4_tx_burst(void *dpdk_txq, struct rte_mbuf **pkts, 
uint16_t pkts_n)
max = pkts_n;
for (i = 0; (i != max); ++i) {
struct rte_mbuf *buf = pkts[i];
+   unsigned int elts_head_next =
+   (((elts_head + 1) == elts_n) ? 0 : elts_head + 1);
+   struct txq_elt *elt_next = &(*txq->elts)[elts_head_next];
struct txq_elt *elt = &(*txq->elts)[elts_head];
unsigned int segs = NB_SEGS(buf);
 #ifdef MLX4_PMD_SOFT_COUNTERS
@@ -1253,6 +1256,7 @@ mlx4_tx_burst(void *dpdk_txq, struct rte_mbuf **pkts, 
uint16_t pkts_n)
if (txq->priv->vf)
rte_prefetch0((volatile void *)
  (uintptr_t)addr);
+   RTE_MBUF_PREFETCH_TO_FREE(elt_next->buf);
/* Put packet into send queue. */
 #if MLX4_PMD_MAX_INLINE > 0
if (length <= txq->max_inline)
@@ -1283,6 +1287,7 @@ mlx4_tx_burst(void *dpdk_txq, struct rte_mbuf **pkts, 
uint16_t pkts_n)
  &sges);
if (ret.length == (unsigned int)-1)
goto stop;
+   RTE_MBUF_PREFETCH_TO_FREE(elt_next->buf);
/* Put SG list into send queue. */
err = txq->if_qp->send_pending_sg_list
(txq->qp,
@@ -1300,8 +1305,7 @@ mlx4_tx_burst(void *dpdk_txq, struct rte_mbuf **pkts, 
uint16_t pkts_n)
goto stop;
 #endif /* MLX4_PMD_SGE_WR_N > 1 */
}
-   if (++elts_head >= elts_n)
-   elts_head = 0;
+   elts_head = elts_head_next;
 #ifdef MLX4_PMD_SOFT_COUNTERS
/* Increment sent bytes counter. */
txq->stats.obytes += sent_size;
-- 
2.1.0



[dpdk-dev] [PATCH v2 19/23] mlx4: add L3 and L4 checksum offload support

2015-06-30 Thread Adrien Mazarguil
From: Gilad Berman 

Mellanox ConnectX-3 adapters can handle L3 (IPv4) and L4 (TCP, UDP, TCP6,
UDP6) RX checksum validation and TX checksum generation, with and without
802.1Q (VLAN) headers.
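
Completion flags are translated to mbuf ol_flags with a small bit-scaling
macro (TRANSPOSE() in the diff below); a standalone illustration with made-up
flag values:

#include <assert.h>
#include <stdint.h>

/* Move a single flag bit from one position to another by scaling. */
#define TRANSPOSE(val, from, to) \
	(((from) >= (to)) ? \
	 (((val) & (from)) / ((from) / (to))) : \
	 (((val) & (from)) * ((to) / (from))))

#define HW_IP_OK   (UINT32_C(1) << 3) /* made-up hardware flag */
#define SW_IP_GOOD (UINT32_C(1) << 7) /* made-up mbuf flag */

int
main(void)
{
	uint32_t hw_flags = HW_IP_OK;
	uint32_t ol_flags = TRANSPOSE(hw_flags, HW_IP_OK, SW_IP_GOOD);

	assert(ol_flags == SW_IP_GOOD);
	return 0;
}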

Signed-off-by: Gilad Berman 
Signed-off-by: Olga Shern 
Signed-off-by: Adrien Mazarguil 
---
 drivers/net/mlx4/mlx4.c | 82 ++---
 1 file changed, 78 insertions(+), 4 deletions(-)

diff --git a/drivers/net/mlx4/mlx4.c b/drivers/net/mlx4/mlx4.c
index 52f3fbb..fa9216f 100644
--- a/drivers/net/mlx4/mlx4.c
+++ b/drivers/net/mlx4/mlx4.c
@@ -139,6 +139,12 @@ static inline void wr_id_t_check(void)
(void)wr_id_t_check;
 }

+/* Transpose flags. Useful to convert IBV to DPDK flags. */
+#define TRANSPOSE(val, from, to) \
+   (((from) >= (to)) ? \
+(((val) & (from)) / ((from) / (to))) : \
+(((val) & (from)) * ((to) / (from
+
 struct mlx4_rxq_stats {
unsigned int idx; /**< Mapping index. */
 #ifdef MLX4_PMD_SOFT_COUNTERS
@@ -196,6 +202,7 @@ struct rxq {
struct rxq_elt (*no_sp)[]; /* RX elements. */
} elts;
unsigned int sp:1; /* Use scattered RX elements. */
+   unsigned int csum:1; /* Enable checksum offloading. */
uint32_t mb_len; /* Length of a mp-issued mbuf. */
struct mlx4_rxq_stats stats; /* RX queue counters. */
unsigned int socket; /* CPU socket ID for allocations. */
@@ -268,6 +275,7 @@ struct priv {
unsigned int hw_qpg:1; /* QP groups are supported. */
unsigned int hw_tss:1; /* TSS is supported. */
unsigned int hw_rss:1; /* RSS is supported. */
+   unsigned int hw_csum:1; /* Checksum offload is supported. */
unsigned int rss:1; /* RSS is enabled. */
unsigned int vf:1; /* This is a VF device. */
 #ifdef INLINE_RECV
@@ -1233,6 +1241,10 @@ mlx4_tx_burst(void *dpdk_txq, struct rte_mbuf **pkts, 
uint16_t pkts_n)
++elts_comp;
send_flags |= IBV_EXP_QP_BURST_SIGNALED;
}
+   /* Should we enable HW CKSUM offload */
+   if (buf->ol_flags &
+   (PKT_TX_IP_CKSUM | PKT_TX_TCP_CKSUM | PKT_TX_UDP_CKSUM))
+   send_flags |= IBV_EXP_QP_BURST_IP_CSUM;
if (likely(segs == 1)) {
uintptr_t addr;
uint32_t length;
@@ -2404,6 +2416,36 @@ rxq_cleanup(struct rxq *rxq)
memset(rxq, 0, sizeof(*rxq));
 }

+/**
+ * Translate RX completion flags to offload flags.
+ *
+ * @param[in] rxq
+ *   Pointer to RX queue structure.
+ * @param flags
+ *   RX completion flags returned by poll_length_flags().
+ *
+ * @return
+ *   Offload flags (ol_flags) for struct rte_mbuf.
+ */
+static inline uint32_t
+rxq_cq_to_ol_flags(const struct rxq *rxq, uint32_t flags)
+{
+   uint32_t ol_flags;
+
+   ol_flags =
+   TRANSPOSE(flags, IBV_EXP_CQ_RX_IPV4_PACKET, PKT_RX_IPV4_HDR) |
+   TRANSPOSE(flags, IBV_EXP_CQ_RX_IPV6_PACKET, PKT_RX_IPV6_HDR);
+   if (rxq->csum)
+   ol_flags |=
+   TRANSPOSE(~flags,
+ IBV_EXP_CQ_RX_IP_CSUM_OK,
+ PKT_RX_IP_CKSUM_BAD) |
+   TRANSPOSE(~flags,
+ IBV_EXP_CQ_RX_TCP_UDP_CSUM_OK,
+ PKT_RX_L4_CKSUM_BAD);
+   return ol_flags;
+}
+
 static uint16_t
 mlx4_rx_burst(void *dpdk_rxq, struct rte_mbuf **pkts, uint16_t pkts_n);

@@ -2448,6 +2490,7 @@ mlx4_rx_burst_sp(void *dpdk_rxq, struct rte_mbuf **pkts, 
uint16_t pkts_n)
struct rte_mbuf **pkt_buf_next = &pkt_buf;
unsigned int seg_headroom = RTE_PKTMBUF_HEADROOM;
unsigned int j = 0;
+   uint32_t flags;

/* Sanity checks. */
 #ifdef NDEBUG
@@ -2458,7 +2501,8 @@ mlx4_rx_burst_sp(void *dpdk_rxq, struct rte_mbuf **pkts, 
uint16_t pkts_n)
assert(wr->num_sge == elemof(elt->sges));
assert(elts_head < rxq->elts_n);
assert(rxq->elts_head < rxq->elts_n);
-   ret = rxq->if_cq->poll_length(rxq->cq, NULL, NULL);
+   ret = rxq->if_cq->poll_length_flags(rxq->cq, NULL, NULL,
+   &flags);
if (unlikely(ret < 0)) {
struct ibv_wc wc;
int wcs_n;
@@ -2584,7 +2628,7 @@ mlx4_rx_burst_sp(void *dpdk_rxq, struct rte_mbuf **pkts, 
uint16_t pkts_n)
NB_SEGS(pkt_buf) = j;
PORT(pkt_buf) = rxq->port_id;
PKT_LEN(pkt_buf) = pkt_buf_len;
-   pkt_buf->ol_flags = 0;
+   pkt_buf->ol_flags = rxq_cq_to_ol_flags(rxq, flags);

/* Return packet. */
*(pkts++) = pkt_buf;
@@ -2661,6 +2705,7 @@ mlx4_rx_burst(void *dpdk_rxq, struct rte_mbuf **pkts, 
uint16_t pkts_n)
struct rte_mbuf *seg = (vo

[dpdk-dev] [PATCH v2 20/23] mlx4: add L2 tunnel (VXLAN) checksum offload support

2015-06-30 Thread Adrien Mazarguil
Depending on adapter features and VXLAN support in the kernel, VXLAN frames
can be automatically recognized, in which case checksum validation and
generation occurs on inner and outer L3 and L4.

Signed-off-by: Olga Shern 
Signed-off-by: Adrien Mazarguil 
---
 drivers/net/mlx4/mlx4.c | 49 -
 1 file changed, 48 insertions(+), 1 deletion(-)

diff --git a/drivers/net/mlx4/mlx4.c b/drivers/net/mlx4/mlx4.c
index fa9216f..3c72235 100644
--- a/drivers/net/mlx4/mlx4.c
+++ b/drivers/net/mlx4/mlx4.c
@@ -203,6 +203,7 @@ struct rxq {
} elts;
unsigned int sp:1; /* Use scattered RX elements. */
unsigned int csum:1; /* Enable checksum offloading. */
+   unsigned int csum_l2tun:1; /* Same for L2 tunnels. */
uint32_t mb_len; /* Length of a mp-issued mbuf. */
struct mlx4_rxq_stats stats; /* RX queue counters. */
unsigned int socket; /* CPU socket ID for allocations. */
@@ -276,6 +277,7 @@ struct priv {
unsigned int hw_tss:1; /* TSS is supported. */
unsigned int hw_rss:1; /* RSS is supported. */
unsigned int hw_csum:1; /* Checksum offload is supported. */
+   unsigned int hw_csum_l2tun:1; /* Same for L2 tunnels. */
unsigned int rss:1; /* RSS is enabled. */
unsigned int vf:1; /* This is a VF device. */
 #ifdef INLINE_RECV
@@ -1243,8 +1245,21 @@ mlx4_tx_burst(void *dpdk_txq, struct rte_mbuf **pkts, 
uint16_t pkts_n)
}
/* Should we enable HW CKSUM offload */
if (buf->ol_flags &
-   (PKT_TX_IP_CKSUM | PKT_TX_TCP_CKSUM | PKT_TX_UDP_CKSUM))
+   (PKT_TX_IP_CKSUM | PKT_TX_TCP_CKSUM | PKT_TX_UDP_CKSUM)) {
send_flags |= IBV_EXP_QP_BURST_IP_CSUM;
+   /* HW does not support checksum offloads at arbitrary
+* offsets but automatically recognizes the packet
+* type. For inner L3/L4 checksums, only VXLAN (UDP)
+* tunnels are currently supported.
+*
+* FIXME: since PKT_TX_UDP_TUNNEL_PKT has been removed,
+* the outer packet type is unknown. All we know is
+* that the L2 header is of unusual length (not
+* ETHER_HDR_LEN with or without 802.1Q header). */
+   if ((buf->l2_len != ETHER_HDR_LEN) &&
+   (buf->l2_len != (ETHER_HDR_LEN + 4)))
+   send_flags |= IBV_EXP_QP_BURST_TUNNEL;
+   }
if (likely(segs == 1)) {
uintptr_t addr;
uint32_t length;
@@ -2443,6 +2458,25 @@ rxq_cq_to_ol_flags(const struct rxq *rxq, uint32_t flags)
TRANSPOSE(~flags,
  IBV_EXP_CQ_RX_TCP_UDP_CSUM_OK,
  PKT_RX_L4_CKSUM_BAD);
+   /*
+* PKT_RX_IP_CKSUM_BAD and PKT_RX_L4_CKSUM_BAD are used in place
+* of PKT_RX_EIP_CKSUM_BAD because the latter is not functional
+* (its value is 0).
+*/
+   if ((flags & IBV_EXP_CQ_RX_TUNNEL_PACKET) && (rxq->csum_l2tun))
+   ol_flags |=
+   TRANSPOSE(flags,
+ IBV_EXP_CQ_RX_OUTER_IPV4_PACKET,
+ PKT_RX_TUNNEL_IPV4_HDR) |
+   TRANSPOSE(flags,
+ IBV_EXP_CQ_RX_OUTER_IPV6_PACKET,
+ PKT_RX_TUNNEL_IPV6_HDR) |
+   TRANSPOSE(~flags,
+ IBV_EXP_CQ_RX_OUTER_IP_CSUM_OK,
+ PKT_RX_IP_CKSUM_BAD) |
+   TRANSPOSE(~flags,
+ IBV_EXP_CQ_RX_OUTER_TCP_UDP_CSUM_OK,
+ PKT_RX_L4_CKSUM_BAD);
return ol_flags;
 }

@@ -2976,6 +3010,10 @@ rxq_rehash(struct rte_eth_dev *dev, struct rxq *rxq)
tmpl.csum = !!dev->data->dev_conf.rxmode.hw_ip_checksum;
rxq->csum = tmpl.csum;
}
+   if (priv->hw_csum_l2tun) {
+   tmpl.csum_l2tun = !!dev->data->dev_conf.rxmode.hw_ip_checksum;
+   rxq->csum_l2tun = tmpl.csum_l2tun;
+   }
/* Enable scattered packets support for this queue if necessary. */
if ((dev->data->dev_conf.rxmode.jumbo_frame) &&
(dev->data->dev_conf.rxmode.max_rx_pkt_len >
@@ -3200,6 +3238,8 @@ rxq_setup(struct rte_eth_dev *dev, struct rxq *rxq, 
uint16_t desc,
/* Toggle RX checksum offload if hardware supports it. */
if (priv->hw_csum)
tmpl.csum = !!dev->data->dev_conf.rxmode.hw_ip_checksum;
+   if (priv->hw_csum_l2tun)
+   tmpl.csum_l2tun = !!dev->data->dev_conf.rxmode.hw_ip_checksum;
/* Enable scattered packets support for this queue if necessary. */
 

[dpdk-dev] [PATCH v2 21/23] mlx4: associate resource domain with CQs and QPs to enhance performance

2015-06-30 Thread Adrien Mazarguil
From: Alex Rosenbaum 

RDs are a new feature of MOFED 3.0 that makes Verbs aware of how CQ and QP
resources are used, which enables internal performance tuning.
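
The creation pattern, also visible in the diff below (hedged sketch against
the MOFED 3.0 experimental verbs):

#include <infiniband/verbs.h>

/* Create a resource domain tuned for a single-threaded, high-bandwidth
 * datapath; it is then referenced from CQ/QP creation attributes. */
static struct ibv_exp_res_domain *
create_rd(struct ibv_context *ctx)
{
	struct ibv_exp_res_domain_init_attr attr = {
		.comp_mask = (IBV_EXP_RES_DOMAIN_THREAD_MODEL |
			      IBV_EXP_RES_DOMAIN_MSG_MODEL),
		.thread_model = IBV_EXP_THREAD_SINGLE,
		.msg_model = IBV_EXP_MSG_HIGH_BW,
	};

	return ibv_exp_create_res_domain(ctx, &attr);
}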

Signed-off-by: Alex Rosenbaum 
Signed-off-by: Olga Shern 
Signed-off-by: Adrien Mazarguil 
---
 drivers/net/mlx4/mlx4.c | 96 ++---
 1 file changed, 84 insertions(+), 12 deletions(-)

diff --git a/drivers/net/mlx4/mlx4.c b/drivers/net/mlx4/mlx4.c
index 3c72235..631ab02 100644
--- a/drivers/net/mlx4/mlx4.c
+++ b/drivers/net/mlx4/mlx4.c
@@ -207,6 +207,7 @@ struct rxq {
uint32_t mb_len; /* Length of a mp-issued mbuf. */
struct mlx4_rxq_stats stats; /* RX queue counters. */
unsigned int socket; /* CPU socket ID for allocations. */
+   struct ibv_exp_res_domain *rd; /* Resource Domain. */
 };

 /* TX element. */
@@ -248,6 +249,7 @@ struct txq {
linear_t (*elts_linear)[]; /* Linearized buffers. */
struct ibv_mr *mr_linear; /* Memory Region for linearized buffers. */
unsigned int socket; /* CPU socket ID for allocations. */
+   struct ibv_exp_res_domain *rd; /* Resource Domain. */
 };

 struct priv {
@@ -908,6 +910,17 @@ txq_cleanup(struct txq *txq)
claim_zero(ibv_destroy_qp(txq->qp));
if (txq->cq != NULL)
claim_zero(ibv_destroy_cq(txq->cq));
+   if (txq->rd != NULL) {
+   struct ibv_exp_destroy_res_domain_attr attr = {
+   .comp_mask = 0,
+   };
+
+   assert(txq->priv != NULL);
+   assert(txq->priv->ctx != NULL);
+   claim_zero(ibv_exp_destroy_res_domain(txq->priv->ctx,
+ txq->rd,
+ &attr));
+   }
for (i = 0; (i != elemof(txq->mp2mr)); ++i) {
if (txq->mp2mr[i].mp == NULL)
break;
@@ -1388,7 +1401,9 @@ txq_setup(struct rte_eth_dev *dev, struct txq *txq, 
uint16_t desc,
};
union {
struct ibv_exp_query_intf_params params;
-   struct ibv_qp_init_attr init;
+   struct ibv_exp_qp_init_attr init;
+   struct ibv_exp_res_domain_init_attr rd;
+   struct ibv_exp_cq_init_attr cq;
struct ibv_exp_qp_attr mod;
} attr;
enum ibv_exp_query_intf_status status;
@@ -1402,7 +1417,24 @@ txq_setup(struct rte_eth_dev *dev, struct txq *txq, 
uint16_t desc,
}
desc /= MLX4_PMD_SGE_WR_N;
/* MRs will be registered in mp2mr[] later. */
-   tmpl.cq = ibv_create_cq(priv->ctx, desc, NULL, NULL, 0);
+   attr.rd = (struct ibv_exp_res_domain_init_attr){
+   .comp_mask = (IBV_EXP_RES_DOMAIN_THREAD_MODEL |
+ IBV_EXP_RES_DOMAIN_MSG_MODEL),
+   .thread_model = IBV_EXP_THREAD_SINGLE,
+   .msg_model = IBV_EXP_MSG_HIGH_BW,
+   };
+   tmpl.rd = ibv_exp_create_res_domain(priv->ctx, &attr.rd);
+   if (tmpl.rd == NULL) {
+   ret = ENOMEM;
+   ERROR("%p: RD creation failure: %s",
+ (void *)dev, strerror(ret));
+   goto error;
+   }
+   attr.cq = (struct ibv_exp_cq_init_attr){
+   .comp_mask = IBV_EXP_CQ_INIT_ATTR_RES_DOMAIN,
+   .res_domain = tmpl.rd,
+   };
+   tmpl.cq = ibv_exp_create_cq(priv->ctx, desc, NULL, NULL, 0, &attr.cq);
if (tmpl.cq == NULL) {
ret = ENOMEM;
ERROR("%p: CQ creation failure: %s",
@@ -1413,7 +1445,7 @@ txq_setup(struct rte_eth_dev *dev, struct txq *txq, 
uint16_t desc,
  priv->device_attr.max_qp_wr);
DEBUG("priv->device_attr.max_sge is %d",
  priv->device_attr.max_sge);
-   attr.init = (struct ibv_qp_init_attr){
+   attr.init = (struct ibv_exp_qp_init_attr){
/* CQ to be associated with the send queue. */
.send_cq = tmpl.cq,
/* CQ to be associated with the receive queue. */
@@ -1435,9 +1467,13 @@ txq_setup(struct rte_eth_dev *dev, struct txq *txq, 
uint16_t desc,
.qp_type = IBV_QPT_RAW_PACKET,
/* Do *NOT* enable this, completions events are managed per
 * TX burst. */
-   .sq_sig_all = 0
+   .sq_sig_all = 0,
+   .pd = priv->pd,
+   .res_domain = tmpl.rd,
+   .comp_mask = (IBV_EXP_QP_INIT_ATTR_PD |
+ IBV_EXP_QP_INIT_ATTR_RES_DOMAIN),
};
-   tmpl.qp = ibv_create_qp(priv->pd, &attr.init);
+   tmpl.qp = ibv_exp_create_qp(priv->ctx, &attr.init);
if (tmpl.qp == NULL) {
ret = (errno ? errno : EINVAL);
ERROR("%p: QP creation failure: %s",
@@ -2426,6 +2462,17 @@ rxq_cleanup(struct rxq *rxq)
}
if (rxq->cq != NULL)
claim_zero(ibv_destroy_cq(rxq->cq));
+   if (rxq->

[dpdk-dev] [PATCH v2 22/23] mlx4: disable multicast echo when device is not VF

2015-06-30 Thread Adrien Mazarguil
From: Olga Shern 

Multicast loopback must be disabled on PF devices to prevent the adapter
from sending frames back. Required with MOFED 3.0.

Signed-off-by: Olga Shern 
Signed-off-by: Adrien Mazarguil 
---
 drivers/net/mlx4/Makefile | 5 +
 drivers/net/mlx4/mlx4.c   | 7 +++
 2 files changed, 12 insertions(+)

diff --git a/drivers/net/mlx4/Makefile b/drivers/net/mlx4/Makefile
index fd74dc8..725717f 100644
--- a/drivers/net/mlx4/Makefile
+++ b/drivers/net/mlx4/Makefile
@@ -112,6 +112,11 @@ mlx4_autoconf.h: $(RTE_SDK)/scripts/auto-config-h.sh
HAVE_EXP_QUERY_DEVICE \
infiniband/verbs.h \
type 'struct ibv_exp_device_attr' $(AUTOCONF_OUTPUT)
+   $Q sh -- '$<' '$@' \
+   HAVE_EXP_QP_BURST_CREATE_DISABLE_ETH_LOOPBACK \
+   infiniband/verbs.h \
+   enum IBV_EXP_QP_BURST_CREATE_DISABLE_ETH_LOOPBACK \
+   $(AUTOCONF_OUTPUT)

 mlx4.o: mlx4_autoconf.h

diff --git a/drivers/net/mlx4/mlx4.c b/drivers/net/mlx4/mlx4.c
index 631ab02..f4491e7 100644
--- a/drivers/net/mlx4/mlx4.c
+++ b/drivers/net/mlx4/mlx4.c
@@ -1534,6 +1534,13 @@ txq_setup(struct rte_eth_dev *dev, struct txq *txq, 
uint16_t desc,
.intf_scope = IBV_EXP_INTF_GLOBAL,
.intf = IBV_EXP_INTF_QP_BURST,
.obj = tmpl.qp,
+#ifdef HAVE_EXP_QP_BURST_CREATE_DISABLE_ETH_LOOPBACK
+   /* MC loopback must be disabled when not using a VF. */
+   .family_flags =
+   (!priv->vf ?
+IBV_EXP_QP_BURST_CREATE_DISABLE_ETH_LOOPBACK :
+0),
+#endif
};
tmpl.if_qp = ibv_exp_query_intf(priv->ctx, &attr.params, &status);
if (tmpl.if_qp == NULL) {
-- 
2.1.0



[dpdk-dev] [PATCH v2 23/23] doc: update mlx4 documentation following MOFED 3.0 changes

2015-06-30 Thread Adrien Mazarguil
- Add RX/TX L3/L4 checksum offloading and validation.
- Update kernel module parameters section.
- Update prerequisites for MOFED and firmware versions.
- Remove optimized external libraries section. MOFED now provides enhanced
  support directly without having to install modified libraries.

Signed-off-by: Adrien Mazarguil 
---
 doc/guides/nics/mlx4.rst | 29 ++---
 1 file changed, 6 insertions(+), 23 deletions(-)

diff --git a/doc/guides/nics/mlx4.rst b/doc/guides/nics/mlx4.rst
index ac2dd56..c33aa38 100644
--- a/doc/guides/nics/mlx4.rst
+++ b/doc/guides/nics/mlx4.rst
@@ -84,12 +84,13 @@ Features and limitations
 - All multicast mode is supported.
 - Multiple MAC addresses (unicast, multicast) can be configured.
 - Scattered packets are supported for TX and RX.
+- Inner L3/L4 (IP, TCP and UDP) TX/RX checksum offloading and validation.
+- Outer L3 (IP) TX/RX checksum offloading and validation for VXLAN frames.

 .. break

 - RSS hash key cannot be modified.
 - Hardware counters are not implemented (they are software counters).
-- Checksum offloads are not supported yet.

 Configuration
 -
@@ -175,9 +176,8 @@ below.

   - **-1**: force device-managed flow steering (DMFS).
   - **-7**: configure optimized steering mode to improve performance with the
-following limitation: Ethernet frames with the port MAC address as the
-destination cannot be received, even in promiscuous mode. Additional MAC
-addresses can still be set by ``rte_eth_dev_mac_addr_addr()``.
+following limitation: VLAN filtering is not supported with this mode.
+This is the recommended mode in case VLAN filter is not needed.

 Prerequisites
 -
@@ -232,8 +232,8 @@ DPDK and must be installed separately:

 Currently supported by DPDK:

-- Mellanox OFED **2.4-1**.
-- Firmware version **2.33.5000** and higher.
+- Mellanox OFED **3.0**.
+- Firmware version **2.34.5000** and higher.

 Getting Mellanox OFED
 ~
@@ -255,23 +255,6 @@ required from that distribution.
this DPDK release was developed and tested against is strongly
recommended. Please check the `prerequisites`_.

-Getting libibverbs and libmlx4 from DPDK.org
-
-
-Based on Mellanox OFED, optimized libibverbs and libmlx4 versions can be
-optionally downloaded from DPDK.org:
-
-``_
-
-Some enhancements are done for better performance with DPDK applications and
-are not merged upstream yet.
-
-Since it is partly achieved by tuning compilation options to disable features
-not needed by DPDK, linking these libraries statically and avoid system-wide
-installation is the preferred method.
-
-Installation documentation is available from the above link.
-
 Usage example
 -

-- 
2.1.0



[dpdk-dev] [PATCH] add rx and tx byte counter statistics for PCAP PMD

2015-06-30 Thread Klaus Degner
The PCAP PMD vdev accounts only for rx and tx packet statistics.
This patch adds support for rx and tx byte statistics.

Signed-off-by: Klaus Degner 
---
 drivers/net/pcap/rte_eth_pcap.c | 21 +++--
 1 file changed, 19 insertions(+), 2 deletions(-)

diff --git a/drivers/net/pcap/rte_eth_pcap.c b/drivers/net/pcap/rte_eth_pcap.c
index a6ed5bd..cca75ec 100644
--- a/drivers/net/pcap/rte_eth_pcap.c
+++ b/drivers/net/pcap/rte_eth_pcap.c
@@ -68,6 +68,7 @@ struct pcap_rx_queue {
uint8_t in_port;
struct rte_mempool *mb_pool;
volatile unsigned long rx_pkts;
+   volatile unsigned long rx_bytes;
volatile unsigned long err_pkts;
char name[PATH_MAX];
char type[ETH_PCAP_ARG_MAXLEN];
@@ -77,6 +78,7 @@ struct pcap_tx_queue {
pcap_dumper_t *dumper;
pcap_t *pcap;
volatile unsigned long tx_pkts;
+   volatile unsigned long tx_bytes;
volatile unsigned long err_pkts;
char name[PATH_MAX];
char type[ETH_PCAP_ARG_MAXLEN];
@@ -140,6 +142,7 @@ eth_pcap_rx(void *queue,
struct pcap_rx_queue *pcap_q = queue;
uint16_t num_rx = 0;
uint16_t buf_size;
+   uint32_t bytes_rx = 0;

if (unlikely(pcap_q->pcap == NULL || nb_pkts == 0))
return 0;
@@ -170,6 +173,7 @@ eth_pcap_rx(void *queue,
mbuf->port = pcap_q->in_port;
bufs[num_rx] = mbuf;
num_rx++;
+   bytes_rx += header.len;
} else {
/* pcap packet will not fit in the mbuf, so drop packet 
*/
RTE_LOG(ERR, PMD,
@@ -179,6 +183,7 @@ eth_pcap_rx(void *queue,
}
}
pcap_q->rx_pkts += num_rx;
+   pcap_q->rx_bytes += bytes_rx;
return num_rx;
 }

@@ -205,6 +210,7 @@ eth_pcap_tx_dumper(void *queue,
struct rte_mbuf *mbuf;
struct pcap_tx_queue *dumper_q = queue;
uint16_t num_tx = 0;
+   uint32_t bytes_tx = 0;
struct pcap_pkthdr header;

if (dumper_q->dumper == NULL || nb_pkts == 0)
@@ -220,6 +226,7 @@ eth_pcap_tx_dumper(void *queue,
rte_pktmbuf_mtod(mbuf, void*));
rte_pktmbuf_free(mbuf);
num_tx++;
+   bytes_tx += header.len;
}

/*
@@ -229,6 +236,7 @@ eth_pcap_tx_dumper(void *queue,
 */
pcap_dump_flush(dumper_q->dumper);
dumper_q->tx_pkts += num_tx;
+   dumper_q->tx_bytes += bytes_tx;
dumper_q->err_pkts += nb_pkts - num_tx;
return num_tx;
 }
@@ -402,25 +410,31 @@ eth_stats_get(struct rte_eth_dev *dev,
struct rte_eth_stats *igb_stats)
 {
unsigned i;
-   unsigned long rx_total = 0, tx_total = 0, tx_err_total = 0;
+   unsigned long rx_total = 0, tx_total = 0, tx_err_total = 0, rx_b_total 
= 0, tx_b_total = 0;
const struct pmd_internals *internal = dev->data->dev_private;

for (i = 0; i < RTE_ETHDEV_QUEUE_STAT_CNTRS && i < 
internal->nb_rx_queues;
i++) {
igb_stats->q_ipackets[i] = internal->rx_queue[i].rx_pkts;
+   igb_stats->q_ibytes[i] = internal->rx_queue[i].rx_bytes;
rx_total += igb_stats->q_ipackets[i];
+   rx_b_total += igb_stats->q_ibytes[i];
}

for (i = 0; i < RTE_ETHDEV_QUEUE_STAT_CNTRS && i < 
internal->nb_tx_queues;
i++) {
igb_stats->q_opackets[i] = internal->tx_queue[i].tx_pkts;
+   igb_stats->q_obytes[i] = internal->tx_queue[i].tx_bytes;
igb_stats->q_errors[i] = internal->tx_queue[i].err_pkts;
tx_total += igb_stats->q_opackets[i];
+   tx_b_total += igb_stats->q_obytes[i];
tx_err_total += igb_stats->q_errors[i];
}

igb_stats->ipackets = rx_total;
+   igb_stats->ibytes = rx_b_total;
igb_stats->opackets = tx_total;
+   igb_stats->obytes = tx_b_total;
igb_stats->oerrors = tx_err_total;
 }

@@ -429,10 +443,13 @@ eth_stats_reset(struct rte_eth_dev *dev)
 {
unsigned i;
struct pmd_internals *internal = dev->data->dev_private;
-   for (i = 0; i < internal->nb_rx_queues; i++)
+   for (i = 0; i < internal->nb_rx_queues; i++) {
internal->rx_queue[i].rx_pkts = 0;
+   internal->rx_queue[i].rx_bytes = 0;
+   }
for (i = 0; i < internal->nb_tx_queues; i++) {
internal->tx_queue[i].tx_pkts = 0;
+   internal->tx_queue[i].tx_bytes = 0;
internal->tx_queue[i].err_pkts = 0;
}
 }
-- 
2.4.5
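Once applied, the new byte counters surface through the regular ethdev
statistics call. A minimal sketch of reading them (assuming the DPDK 2.x
ethdev API, a configured and started port, and that queue 0 exists):

#include <inttypes.h>
#include <stdio.h>
#include <rte_ethdev.h>

/* Print the aggregate and queue-0 byte counters exposed by the PMD. */
static void
print_pcap_byte_stats(uint8_t port_id)
{
	struct rte_eth_stats stats;

	rte_eth_stats_get(port_id, &stats);
	printf("port %u: ibytes=%" PRIu64 " obytes=%" PRIu64 "\n",
	       (unsigned int)port_id, stats.ibytes, stats.obytes);
	printf("  rxq0 ibytes=%" PRIu64 " txq0 obytes=%" PRIu64 "\n",
	       stats.q_ibytes[0], stats.q_obytes[0]);
}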



[dpdk-dev] [PATCH] mempool: improbe cache search

2015-06-30 Thread Olivier MATZ
Hi Zoltan,

On 06/25/2015 08:48 PM, Zoltan Kiss wrote:
> The current way has a few problems:
>
> - if cache->len < n, we copy our elements into the cache first, then
>into obj_table, that's unnecessary
> - if n >= cache_size (or the backfill fails), and we can't fulfil the
>request from the ring alone, we don't try to combine with the cache
> - if refill fails, we don't return anything, even if the ring has enough
>for our request
>
> This patch rewrites it severely:
> - at the first part of the function we only try the cache if cache->len < n
> - otherwise take our elements straight from the ring
> - if that fails but we have something in the cache, try to combine them
> - the refill happens at the end, and its failure doesn't modify our return
>value

Indeed, it looks easier to read that way. I checked the performance with
"mempool_perf_autotest" of app/test and it shows that there is no
regression (it is even slightly better in some test cases).

There is a small typo in the title: s/improbe/improve
Please see also a comment below.

>
> Signed-off-by: Zoltan Kiss 
> ---
>   lib/librte_mempool/rte_mempool.h | 63 
> +---
>   1 file changed, 39 insertions(+), 24 deletions(-)
>
> diff --git a/lib/librte_mempool/rte_mempool.h 
> b/lib/librte_mempool/rte_mempool.h
> index a8054e1..896946c 100644
> --- a/lib/librte_mempool/rte_mempool.h
> +++ b/lib/librte_mempool/rte_mempool.h
> @@ -948,34 +948,14 @@ __mempool_get_bulk(struct rte_mempool *mp, void 
> **obj_table,
>   unsigned lcore_id = rte_lcore_id();
>   uint32_t cache_size = mp->cache_size;
>
> - /* cache is not enabled or single consumer */
> + cache = &mp->local_cache[lcore_id];
> + /* cache is not enabled or single consumer or not enough */
>   if (unlikely(cache_size == 0 || is_mc == 0 ||
> -  n >= cache_size || lcore_id >= RTE_MAX_LCORE))
> +  cache->len < n || lcore_id >= RTE_MAX_LCORE))
>   goto ring_dequeue;
>
> - cache = &mp->local_cache[lcore_id];
>   cache_objs = cache->objs;
>
> - /* Can this be satisfied from the cache? */
> - if (cache->len < n) {
> - /* No. Backfill the cache first, and then fill from it */
> - uint32_t req = n + (cache_size - cache->len);
> -
> - /* How many do we require i.e. number to fill the cache + the 
> request */
> - ret = rte_ring_mc_dequeue_bulk(mp->ring, 
> &cache->objs[cache->len], req);
> - if (unlikely(ret < 0)) {
> - /*
> -  * In the offchance that we are buffer constrained,
> -  * where we are not able to allocate cache + n, go to
> -  * the ring directly. If that fails, we are truly out of
> -  * buffers.
> -  */
> - goto ring_dequeue;
> - }
> -
> - cache->len += req;
> - }
> -
>   /* Now fill in the response ... */
>   for (index = 0, len = cache->len - 1; index < n; ++index, len--, 
> obj_table++)
>   *obj_table = cache_objs[len];
> @@ -984,7 +964,8 @@ __mempool_get_bulk(struct rte_mempool *mp, void 
> **obj_table,
>
>   __MEMPOOL_STAT_ADD(mp, get_success, n);
>
> - return 0;
> + ret = 0;
> + goto cache_refill;
>
>   ring_dequeue:
>   #endif /* RTE_MEMPOOL_CACHE_MAX_SIZE > 0 */
> @@ -995,11 +976,45 @@ ring_dequeue:
>   else
>   ret = rte_ring_sc_dequeue_bulk(mp->ring, obj_table, n);
>
> +#if RTE_MEMPOOL_CACHE_MAX_SIZE > 0
> + if (ret < 0 && is_mc == 1 && cache->len > 0) {

if (unlikely(ret < 0 && is_mc == 1 && cache->len > 0))  ?


> + uint32_t req = n - cache->len;
> +
> + ret = rte_ring_mc_dequeue_bulk(mp->ring, obj_table, req);
> + if (ret == 0) {
> + cache_objs = cache->objs;
> + obj_table += req;
> + for (index = 0; index < cache->len;
> +  ++index, ++obj_table)
> + *obj_table = cache_objs[index];
> + cache->len = 0;
> + }
> + }
> +#endif /* RTE_MEMPOOL_CACHE_MAX_SIZE > 0 */
> +
>   if (ret < 0)
>   __MEMPOOL_STAT_ADD(mp, get_fail, n);
>   else
>   __MEMPOOL_STAT_ADD(mp, get_success, n);
>
> +#if RTE_MEMPOOL_CACHE_MAX_SIZE > 0
> +cache_refill:
> + /* If previous dequeue was OK and we have less than n, start refill */
> + if (ret == 0 && cache_size > 0 && cache->len < n) {

Not sure it's likely or unlikely there. I'll tend to say unlikely
as the cache size is probably big compared to n most of the time.

I don't know if it would have a real performance impact, but
I think it won't hurt.


Regards,
Olivier


> + uint32_t req = cache_size - cache->len;
> +
> + cache_objs = cache->objs;
> + ret = rte_ring_mc_dequeue_bulk(mp->ring,

[dpdk-dev] [PATCH v5] Add toeplitz hash algorithm used by RSS

2015-06-30 Thread Vladimir Medvedkin
Hi Bruce,

2015-06-29 15:40 GMT+03:00 Bruce Richardson :

> On Fri, Jun 19, 2015 at 01:31:13PM -0400, Vladimir Medvedkin wrote:
> > Software implementation of the Toeplitz hash function used by RSS.
> > Can be used either for packet distribution on single queue NIC
> > or for simulating of RSS computation on specific NIC (for example
> > after GRE header decapsulating).
> >
> > v5 changes
> > - Fix errors reported by checkpatch.pl
> >
> > v4 changes
> > - Fix copyright
> > - rename bswap_mask constant, add rte_ prefix
> > - change rte_ipv[46]_tuple struct
> > - change rte_thash_load_v6_addr prototype
> >
> > v3 changes
> > - Rework API to be more generic
> > - Add sctp_tag into tuple
> >
> > v2 changes
> > - Add ipv6 support
> > - Various style fixes
> >
> > Signed-off-by: Vladimir Medvedkin 
> > ---
> >  lib/librte_hash/Makefile|   1 +
> >  lib/librte_hash/rte_thash.h | 207
> 
> >  2 files changed, 208 insertions(+)
> >  create mode 100644 lib/librte_hash/rte_thash.h
> >
> 
> > +static const __m128i rte_thash_ipv6_bswap_mask = {
> > + 0x0405060700010203, 0x0C0D0E0F08090A0B};
> > +
> > +#define RTE_THASH_V4_L3   2  /*calculate hash of ipv4 header
> only*/
> > +#define RTE_THASH_V4_L4   3  /*calculate hash of ipv4 +
> transport headers*/
> > +#define RTE_THASH_V6_L3   8  /*calculate hash of ipv6 header
> only */
> > +#define RTE_THASH_V6_L4   9  /*calculate hash of ipv6 +
> transport headers */
> > +
>
> Comment as on V4 patch - add LEN to name to make it clear they are lengths
> in quadwords.
>
Agree, but the lengths are in dwords.

>
> > +/**
> > + * IPv4 tuple
> > + * addreses and ports/sctp_tag have to be CPU byte order
> > + */
> > +struct rte_ipv4_tuple {
> > + uint32_tsrc_addr;
> > + uint32_tdst_addr;
> > + union {
> > + struct {
> > + uint16_t dport;
> > + uint16_t sport;
> > + };
> > + uint32_tsctp_tag;
> > + };
> > +};
> > +
> > +/**
> > + * IPv6 tuple
> > + * Addresses have to be filled by rte_thash_load_v6_addr()
> > + * ports/sctp_tag have to be CPU byte order
> > + */
> > +struct rte_ipv6_tuple {
> > + uint8_t src_addr[16];
> > + uint8_t dst_addr[16];
> > + union {
> > + struct {
> > + uint16_t dport;
> > + uint16_t sport;
> > + };
> > + uint32_tsctp_tag;
> > + };
> > +};
> > +
> > +union rte_thash_tuple {
> > + struct rte_ipv4_tuple   v4;
> > + struct rte_ipv6_tuple   v6;
> > +} __aligned(size);
> > +
> This throws an error on compilation [with unit test patch also applied].
>
I will never copy paste
I will never copy paste
I will never... =)

>
> In file included from /home/bruce/dpdk.org/app/test/test_thash.c:40:0:
> /home/bruce/dpdk.org/x86_64-native-linuxapp-gcc/include/rte_thash.h:106:1:
> error: parameter names (without types) in function declaration [-Werror]
>  } __aligned(size);
>   ^
>
> +/**
> > + * Prepare special converted key to use with rte_softrss_be()
> > + * @param orig
> > + *   pointer to original RSS key
> > + * @param targ
> > + *   pointer to target RSS key
> > + * @param len
> > + *   RSS key length
> > + */
> > +static inline void
> > +rte_convert_rss_key(const uint32_t *orig, uint32_t *targ, int len)
> > +{
> > + int i;
> > +
> > + for (i = 0; i < (len >> 2); i++)
> > + targ[i] = rte_be_to_cpu_32(orig[i]);
> > +}
> > +
> > +/**
> > + * Prepare and load IPv6 address
> > + * @param orig
> > + *   Pointer to ipv6 header of the original packet
> > + * @param targ
> > + *   Pointer to rte_ipv6_tuple structure
> > + */
> > +static inline void
> > +rte_thash_load_v6_addr(const struct ipv6_hdr *orig, union
> rte_thash_tuple *targ)
> > +{
> > + __m128i ipv6 = _mm_loadu_si128((const __m128i *)orig->src_addr);
> > + *(__m128i *)targ->v6.src_addr =
> > + _mm_shuffle_epi8(ipv6, rte_thash_ipv6_bswap_mask);
> > + ipv6 = _mm_loadu_si128((const __m128i *)orig->dst_addr);
> > + *(__m128i *)targ->v6.dst_addr =
> > + _mm_shuffle_epi8(ipv6, rte_thash_ipv6_bswap_mask);
> > +}
>
> I think the function name needs to be pluralized, and the comment updated
> to make
> it clear that it is not just one IPv6 address that is loaded, but rather
> both
> source and destination.
>
I think no need to make too long function name like
rte_thash_load_v6_addresses(), instead I'll make comment more clear.
Is it enough?

>
> Regards,
> /Bruce
>
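For readers following the thread, here is a hedged usage sketch of the API
under review. The tuple layout and the rte_convert_rss_key() call match the
declarations quoted above; the rte_softrss_be() invocation is only an
assumption (its signature does not appear in the quoted hunks) and the sketch
also presumes the __aligned() compile error reported by Bruce is fixed.

#include <stdint.h>
#include <rte_thash.h>

static uint32_t rss_key_orig[10];	/* the NIC's 40-byte RSS key */
static uint32_t rss_key_be[10];		/* converted copy for rte_softrss_be() */

static void
prepare_ipv4_hash_input(uint32_t sip, uint32_t dip,
			uint16_t sport, uint16_t dport)
{
	union rte_thash_tuple tuple;

	/* Addresses and ports must be in CPU byte order. */
	tuple.v4.src_addr = sip;
	tuple.v4.dst_addr = dip;
	tuple.v4.sport = sport;
	tuple.v4.dport = dport;

	/* One-time key conversion; normally done at initialization. */
	rte_convert_rss_key(rss_key_orig, rss_key_be, sizeof(rss_key_orig));

	/*
	 * Final step, with an assumed (not quoted) signature:
	 * uint32_t hash = rte_softrss_be((uint32_t *)&tuple,
	 *                                RTE_THASH_V4_L4, rss_key_be);
	 */
	(void)tuple;
}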


[dpdk-dev] [PATCH v7 04/12] eal/bsdapp: Change names of pci related data structure

2015-06-30 Thread Iremonger, Bernard
> -Original Message-
> From: Tetsuya Mukawa [mailto:mukawa at igel.co.jp]
> Sent: Tuesday, June 30, 2015 9:24 AM
> To: dev at dpdk.org
> Cc: Iremonger, Bernard; david.marchand at 6wind.com; Richardson, Bruce;
> Tetsuya.Mukawa
> Subject: [PATCH v7 04/12] eal/bsdapp: Change names of pci related data
> structure
> 
> From: "Tetsuya.Mukawa" 
> 
> To merge pci code of linuxapp and bsdapp, this patch changes names like
> below.
>  - uio_map to pci_map
>  - uio_resource to mapped_pci_resource
>  - uio_res_list to mapped_pci_res_list
> 
> Signed-off-by: Tetsuya Mukawa 
Acked-by: Bernard Iremonger 


[dpdk-dev] [PATCH v7 03/12] eal: Fix memory leaks and needless increment of pci_map_addr

2015-06-30 Thread Iremonger, Bernard
> -Original Message-
> From: Tetsuya Mukawa [mailto:mukawa at igel.co.jp]
> Sent: Tuesday, June 30, 2015 9:24 AM
> To: dev at dpdk.org
> Cc: Iremonger, Bernard; david.marchand at 6wind.com; Richardson, Bruce;
> Tetsuya.Mukawa
> Subject: [PATCH v7 03/12] eal: Fix memory leaks and needless increment of
> pci_map_addr
> 
> From: "Tetsuya.Mukawa" 
> 
> This patch fixes following memory leaks.
> - When open() is failed, uio_res and fds won't be freed in
>   pci_uio_map_resource().
> - When pci_map_resource() is failed but path is allocated correctly,
>   path and fds won't be freed in pci_uio_map_recource().
>   Also, some mapped resources should be freed.
> - When pci_uio_unmap() is called, path should be freed.
> 
> Also, fixes below.
> - When pci_map_resource() is failed, mapaddr will be MAP_FAILED.
>   In this case, pci_map_addr should not be incremented in
>   pci_uio_map_resource().
> - To shrink code, move close().
> - Remove fail variable.
> 
> Signed-off-by: Tetsuya Mukawa 
Acked-by:  Bernard Iremonger 



[dpdk-dev] DPDK Issues with OVS

2015-06-30 Thread Sundar Ramakrishnan
Hello,
I have openvswitch compiled with dpdk. The way I installed it is
posted on GitHub at sundar-ramki/ovs-dpdk (Openvswitch with DPDK).



I have the following questions:
1. How do I figure out if my VM is really using Openvswitch with DPDK?
2. Are there any counters for packets entering and leaving the switch and/or 
dpdk?
3. Also, I am unable to use the ofctl commands. The following is what I get:
[root at test1 ~]# ovs-ofctl show br23
ovs-ofctl: br23 is not a bridge or a socket
[root at test1 ~]# ovs-vsctl show
d8cfc890-a4fc-4398-869c-59c5412a3e79
    Bridge "br23"
        Port "eth3"
            Interface "eth3"
        Port "br23"
            Interface "br23"
                type: internal
        Port "eth2"
            Interface "eth2"
4. I don't see my interfaces after I bind them with the dpdk drivers. How should I
attach my NICs to virt-manager in order to start a VM?
Any input will be appreciated.
Thank you,
SRK


[dpdk-dev] [PATCH v5] Add toeplitz hash algorithm used by RSS

2015-06-30 Thread Bruce Richardson
On Tue, Jun 30, 2015 at 03:14:55PM +0300, Vladimir Medvedkin wrote:
> Hi Bruce,
> 
> 2015-06-29 15:40 GMT+03:00 Bruce Richardson :
> 
> > On Fri, Jun 19, 2015 at 01:31:13PM -0400, Vladimir Medvedkin wrote:
> > > Software implementation of the Toeplitz hash function used by RSS.
> > > Can be used either for packet distribution on single queue NIC
> > > or for simulating of RSS computation on specific NIC (for example
> > > after GRE header decapsulating).
> > >
> > > v5 changes
> > > - Fix errors reported by checkpatch.pl
> > >
> > > v4 changes
> > > - Fix copyright
> > > - rename bswap_mask constant, add rte_ prefix
> > > - change rte_ipv[46]_tuple struct
> > > - change rte_thash_load_v6_addr prototype
> > >
> > > v3 changes
> > > - Rework API to be more generic
> > > - Add sctp_tag into tuple
> > >
> > > v2 changes
> > > - Add ipv6 support
> > > - Various style fixes
> > >
> > > Signed-off-by: Vladimir Medvedkin 
> > > ---
> > >  lib/librte_hash/Makefile|   1 +
> > >  lib/librte_hash/rte_thash.h | 207
> > 
> > >  2 files changed, 208 insertions(+)
> > >  create mode 100644 lib/librte_hash/rte_thash.h
> > >
> > 
> > > +static const __m128i rte_thash_ipv6_bswap_mask = {
> > > + 0x0405060700010203, 0x0C0D0E0F08090A0B};
> > > +
> > > +#define RTE_THASH_V4_L3   2  /*calculate hash of ipv4 header
> > only*/
> > > +#define RTE_THASH_V4_L4   3  /*calculate hash of ipv4 +
> > transport headers*/
> > > +#define RTE_THASH_V6_L3   8  /*calculate hash of ipv6 header
> > only */
> > > +#define RTE_THASH_V6_L4   9  /*calculate hash of ipv6 +
> > transport headers */
> > > +
> >
> > Comment as on V4 patch - add LEN to name to make it clear they are lengths
> > in quadwords.
> >
> Agree, but length dwords
> 
Yes, my mistake.

> >
> > > +/**
> > > + * IPv4 tuple
> > > + * addreses and ports/sctp_tag have to be CPU byte order
> > > + */
> > > +struct rte_ipv4_tuple {
> > > + uint32_tsrc_addr;
> > > + uint32_tdst_addr;
> > > + union {
> > > + struct {
> > > + uint16_t dport;
> > > + uint16_t sport;
> > > + };
> > > + uint32_tsctp_tag;
> > > + };
> > > +};
> > > +
> > > +/**
> > > + * IPv6 tuple
> > > + * Addresses have to be filled by rte_thash_load_v6_addr()
> > > + * ports/sctp_tag have to be CPU byte order
> > > + */
> > > +struct rte_ipv6_tuple {
> > > + uint8_t src_addr[16];
> > > + uint8_t dst_addr[16];
> > > + union {
> > > + struct {
> > > + uint16_t dport;
> > > + uint16_t sport;
> > > + };
> > > + uint32_tsctp_tag;
> > > + };
> > > +};
> > > +
> > > +union rte_thash_tuple {
> > > + struct rte_ipv4_tuple   v4;
> > > + struct rte_ipv6_tuple   v6;
> > > +} __aligned(size);
> > > +
> > This throws an error on compilation [with unit test patch also applied].
> >
> I will never copy paste
> I will never copy paste
> I will never... =)
> 
> >
> > In file included from /home/bruce/dpdk.org/app/test/test_thash.c:40:0:
> > /home/bruce/dpdk.org/x86_64-native-linuxapp-gcc/include/rte_thash.h:106:1:
> > error: parameter names (without types) in function declaration [-Werror]
> >  } __aligned(size);
> >   ^
> >
> > +/**
> > > + * Prepare special converted key to use with rte_softrss_be()
> > > + * @param orig
> > > + *   pointer to original RSS key
> > > + * @param targ
> > > + *   pointer to target RSS key
> > > + * @param len
> > > + *   RSS key length
> > > + */
> > > +static inline void
> > > +rte_convert_rss_key(const uint32_t *orig, uint32_t *targ, int len)
> > > +{
> > > + int i;
> > > +
> > > + for (i = 0; i < (len >> 2); i++)
> > > + targ[i] = rte_be_to_cpu_32(orig[i]);
> > > +}
> > > +
> > > +/**
> > > + * Prepare and load IPv6 address
> > > + * @param orig
> > > + *   Pointer to ipv6 header of the original packet
> > > + * @param targ
> > > + *   Pointer to rte_ipv6_tuple structure
> > > + */
> > > +static inline void
> > > +rte_thash_load_v6_addr(const struct ipv6_hdr *orig, union
> > rte_thash_tuple *targ)
> > > +{
> > > + __m128i ipv6 = _mm_loadu_si128((const __m128i *)orig->src_addr);
> > > + *(__m128i *)targ->v6.src_addr =
> > > + _mm_shuffle_epi8(ipv6, rte_thash_ipv6_bswap_mask);
> > > + ipv6 = _mm_loadu_si128((const __m128i *)orig->dst_addr);
> > > + *(__m128i *)targ->v6.dst_addr =
> > > + _mm_shuffle_epi8(ipv6, rte_thash_ipv6_bswap_mask);
> > > +}
> >
> > I think the function name needs to be pluralized, and the comment updated
> > to make
> > it clear that it is not just one IPv6 address that is loaded, but rather
> > both
> > source and destination.
> >
> I think no need to make too long function name like
> rte_thash_load_v6_addresses(), instead I'll make comment more clear.
> Is it enough?

I was actually 

[dpdk-dev] [PATCH v3 4/8] eal: Consolidate rte_eal_pci_probe/close_one_driver() of linuxapp and bsdapp

2015-06-30 Thread Bruce Richardson
On Tue, Jun 30, 2015 at 05:08:41PM +0900, Tetsuya Mukawa wrote:
> On 2015/06/30 0:28, Bruce Richardson wrote:
> > On Mon, Jun 29, 2015 at 11:56:46AM +0900, Tetsuya Mukawa wrote:
> >> From: "Tetsuya.Mukawa" 
> >>
> >> This patch consolidates below functions, and implements these in common
> >> eal code.
> >>  - rte_eal_pci_probe_one_driver()
> >>  - rte_eal_pci_close_one_driver()
> >>
> >> Because pci_map_device() is only implemented in linuxapp, the patch
> >> implements it in bsdapp too. This implemented function will be merged to
> >> linuxapp one with later patch.
> >>
> >> Signed-off-by: Tetsuya Mukawa 
> >> ---
> >>  lib/librte_eal/bsdapp/eal/eal_pci.c|  74 ++---
> >>  lib/librte_eal/common/eal_common_pci.c | 129 
> >>  lib/librte_eal/common/eal_private.h|  21 ++---
> >>  lib/librte_eal/linuxapp/eal/eal_pci.c  | 148 
> >> ++---
> >>  4 files changed, 153 insertions(+), 219 deletions(-)
> >>
> >> diff --git a/lib/librte_eal/bsdapp/eal/eal_pci.c 
> >> b/lib/librte_eal/bsdapp/eal/eal_pci.c
> >> index c7017eb..2a623e3 100644
> >> --- a/lib/librte_eal/bsdapp/eal/eal_pci.c
> >> +++ b/lib/librte_eal/bsdapp/eal/eal_pci.c
> >> @@ -88,7 +88,7 @@ static struct rte_tailq_elem rte_uio_tailq = {
> >>  EAL_REGISTER_TAILQ(rte_uio_tailq)
> >>  
> >>  /* unbind kernel driver for this device */
> >> -static int
> >> +int
> >>  pci_unbind_kernel_driver(struct rte_pci_device *dev __rte_unused)
> >>  {
> >>RTE_LOG(ERR, EAL, "RTE_PCI_DRV_FORCE_UNBIND flag is not implemented "
> >> @@ -430,6 +430,13 @@ skipdev:
> >>return 0;
> >>  }
> >>  
> >> +/* Map pci device */
> >> +int
> >> +pci_map_device(struct rte_pci_device *dev)
> >> +{
> >> +  return pci_uio_map_resource(dev);
> >> +}
> >> +
> > These lines are added here, but removed again in the next patch in the 
> > series.
> > Though not wrong, per-se, it just seems untidy. Perhaps the patchset order 
> > needs
> > to be changed somewhat?
> >
> > /Bruce
> 
> Hi Bruce,
> 
> I appreciate your comment.
> Sure, I will change the order of these patches.
> Could you please check patches I will send later?
>
Yes, I'll test them on FreeBSD shortly.

/Bruce


[dpdk-dev] [PATCH v7 05/12] eal: Fix uio mapping differences between linuxapp and bsdapp

2015-06-30 Thread Iremonger, Bernard
> -Original Message-
> From: Tetsuya Mukawa [mailto:mukawa at igel.co.jp]
> Sent: Tuesday, June 30, 2015 9:24 AM
> To: dev at dpdk.org
> Cc: Iremonger, Bernard; david.marchand at 6wind.com; Richardson, Bruce;
> Tetsuya.Mukawa
> Subject: [PATCH v7 05/12] eal: Fix uio mapping differences between linuxapp
> and bsdapp
> 
> From: "Tetsuya.Mukawa" 
> 
> This patch fixes below.
> - bsdapp
>  - Use map_id in pci_uio_map_resource().
>  - Fix interface of pci_map_resource().
>  - Move path variable of mapped_pci_resource structure to pci_map.
> - linuxapp
>  - Remove redundant error message of linuxapp.
> 
> 'pci_uio_map_resource()' is implemented in both linuxapp and bsdapp, but
> interface is different. The patch fixes the function of bsdapp to do same as
> linuxapp. After applying it, file descriptor should be opened and closed out 
> of
> pci_map_resource().
> 
> Signed-off-by: Tetsuya Mukawa 
Acked-by: Bernard Iremonger 


[dpdk-dev] [PATCH] mempool: improbe cache search

2015-06-30 Thread Zoltan Kiss


On 30/06/15 12:58, Olivier MATZ wrote:
> Hi Zoltan,
>
> On 06/25/2015 08:48 PM, Zoltan Kiss wrote:
>> The current way has a few problems:
>>
>> - if cache->len < n, we copy our elements into the cache first, then
>>into obj_table, that's unnecessary
>> - if n >= cache_size (or the backfill fails), and we can't fulfil the
>>request from the ring alone, we don't try to combine with the cache
>> - if refill fails, we don't return anything, even if the ring has enough
>>for our request
>>
>> This patch rewrites it severely:
>> - at the first part of the function we only try the cache if
>> cache->len < n
>> - otherwise take our elements straight from the ring
>> - if that fails but we have something in the cache, try to combine them
>> - the refill happens at the end, and its failure doesn't modify our
>> return
>>value
>
> Indeed, it looks easier to read that way. I checked the performance with
> "mempool_perf_autotest" of app/test and it shows that there is no
> regression (it is even slightly better in some test cases).
>
> There is a small typo in the title: s/improbe/improve

Yes, I'll fix that.


> Please see also a comment below.
>
>>
>> Signed-off-by: Zoltan Kiss 
>> ---
>>   lib/librte_mempool/rte_mempool.h | 63
>> +---
>>   1 file changed, 39 insertions(+), 24 deletions(-)
>>
>> diff --git a/lib/librte_mempool/rte_mempool.h
>> b/lib/librte_mempool/rte_mempool.h
>> index a8054e1..896946c 100644
>> --- a/lib/librte_mempool/rte_mempool.h
>> +++ b/lib/librte_mempool/rte_mempool.h
>> @@ -948,34 +948,14 @@ __mempool_get_bulk(struct rte_mempool *mp, void
>> **obj_table,
>>   unsigned lcore_id = rte_lcore_id();
>>   uint32_t cache_size = mp->cache_size;
>>
>> -/* cache is not enabled or single consumer */
>> +cache = &mp->local_cache[lcore_id];
>> +/* cache is not enabled or single consumer or not enough */
>>   if (unlikely(cache_size == 0 || is_mc == 0 ||
>> - n >= cache_size || lcore_id >= RTE_MAX_LCORE))
>> + cache->len < n || lcore_id >= RTE_MAX_LCORE))
>>   goto ring_dequeue;
>>
>> -cache = &mp->local_cache[lcore_id];
>>   cache_objs = cache->objs;
>>
>> -/* Can this be satisfied from the cache? */
>> -if (cache->len < n) {
>> -/* No. Backfill the cache first, and then fill from it */
>> -uint32_t req = n + (cache_size - cache->len);
>> -
>> -/* How many do we require i.e. number to fill the cache + the
>> request */
>> -ret = rte_ring_mc_dequeue_bulk(mp->ring,
>> &cache->objs[cache->len], req);
>> -if (unlikely(ret < 0)) {
>> -/*
>> - * In the offchance that we are buffer constrained,
>> - * where we are not able to allocate cache + n, go to
>> - * the ring directly. If that fails, we are truly out of
>> - * buffers.
>> - */
>> -goto ring_dequeue;
>> -}
>> -
>> -cache->len += req;
>> -}
>> -
>>   /* Now fill in the response ... */
>>   for (index = 0, len = cache->len - 1; index < n; ++index, len--,
>> obj_table++)
>>   *obj_table = cache_objs[len];
>> @@ -984,7 +964,8 @@ __mempool_get_bulk(struct rte_mempool *mp, void
>> **obj_table,
>>
>>   __MEMPOOL_STAT_ADD(mp, get_success, n);
>>
>> -return 0;
>> +ret = 0;
>> +goto cache_refill;
>>
>>   ring_dequeue:
>>   #endif /* RTE_MEMPOOL_CACHE_MAX_SIZE > 0 */
>> @@ -995,11 +976,45 @@ ring_dequeue:
>>   else
>>   ret = rte_ring_sc_dequeue_bulk(mp->ring, obj_table, n);
>>
>> +#if RTE_MEMPOOL_CACHE_MAX_SIZE > 0
>> +if (ret < 0 && is_mc == 1 && cache->len > 0) {
>
> if (unlikely(ret < 0 && is_mc == 1 && cache->len > 0))  ?

Ok

>
>
>> +uint32_t req = n - cache->len;
>> +
>> +ret = rte_ring_mc_dequeue_bulk(mp->ring, obj_table, req);
>> +if (ret == 0) {
>> +cache_objs = cache->objs;
>> +obj_table += req;
>> +for (index = 0; index < cache->len;
>> + ++index, ++obj_table)
>> +*obj_table = cache_objs[index];
>> +cache->len = 0;
>> +}
>> +}
>> +#endif /* RTE_MEMPOOL_CACHE_MAX_SIZE > 0 */
>> +
>>   if (ret < 0)
>>   __MEMPOOL_STAT_ADD(mp, get_fail, n);
>>   else
>>   __MEMPOOL_STAT_ADD(mp, get_success, n);
>>
>> +#if RTE_MEMPOOL_CACHE_MAX_SIZE > 0
>> +cache_refill:
>> +/* If previous dequeue was OK and we have less than n, start
>> refill */
>> +if (ret == 0 && cache_size > 0 && cache->len < n) {
>
> Not sure it's likely or unlikely there. I'll tend to say unlikely
> as the cache size is probably big compared to n most of the time.
>
> I don't know if it would have a real performance impact, but
> I think it won't hurt.

I think it's not obvious here which one should happen more often on the 
hot path. I think it's better to follow the rule of thumb: if you are 
not confident about the likelihood, just d

[dpdk-dev] [PATCH v7 08/12] eal: Consolidate pci_map and mapped_pci_resource of linuxapp and bsdapp

2015-06-30 Thread Iremonger, Bernard


> -Original Message-
> From: Tetsuya Mukawa [mailto:mukawa at igel.co.jp]
> Sent: Tuesday, June 30, 2015 9:24 AM
> To: dev at dpdk.org
> Cc: Iremonger, Bernard; david.marchand at 6wind.com; Richardson, Bruce;
> Tetsuya.Mukawa
> Subject: [PATCH v7 08/12] eal: Consolidate pci_map and
> mapped_pci_resource of linuxapp and bsdapp
> 
> From: "Tetsuya.Mukawa" 
> 
> This patch consolidates below structures, and defines them in common code.
>  - struct pci_map
>  - strucy mapped_pci_resouces
> 
> Signed-off-by: Tetsuya Mukawa 
Acked-by: Bernard Iremonger 


[dpdk-dev] [PATCH v7 09/12] eal: Consolidate pci_map/unmap_resource() of linuxapp and bsdapp

2015-06-30 Thread Iremonger, Bernard
> -Original Message-
> From: Tetsuya Mukawa [mailto:mukawa at igel.co.jp]
> Sent: Tuesday, June 30, 2015 9:24 AM
> To: dev at dpdk.org
> Cc: Iremonger, Bernard; david.marchand at 6wind.com; Richardson, Bruce;
> Tetsuya.Mukawa
> Subject: [PATCH v7 09/12] eal: Consolidate pci_map/unmap_resource() of
> linuxapp and bsdapp
> 
> From: "Tetsuya.Mukawa" 
> 
> The patch consolidates below functions, and implemented in common eal
> code.
>  - pci_map_resource()
>  - pci_unmap_resource()
> 
> Signed-off-by: Tetsuya Mukawa 
Acked-by: Bernard Iremonger 



[dpdk-dev] [PATCH v7 11/12] eal: Consolidate pci_map/unmap_device() of linuxapp and bsdapp

2015-06-30 Thread Iremonger, Bernard


> -Original Message-
> From: Tetsuya Mukawa [mailto:mukawa at igel.co.jp]
> Sent: Tuesday, June 30, 2015 9:24 AM
> To: dev at dpdk.org
> Cc: Iremonger, Bernard; david.marchand at 6wind.com; Richardson, Bruce;
> Tetsuya.Mukawa
> Subject: [PATCH v7 11/12] eal: Consolidate pci_map/unmap_device() of
> linuxapp and bsdapp
> 
> From: "Tetsuya.Mukawa" 
> 
> The patch consolidates below functions, and implemented in common eal
> code.
>  - pci_map_device()
>  - pci_unmap_device()
> 
> Signed-off-by: Tetsuya Mukawa 
Acked-by: Bernard Iremonger 


[dpdk-dev] [PATCH v7 12/12] eal: Consolidate rte_eal_pci_probe/close_one_driver() of linuxapp and bsdapp

2015-06-30 Thread Iremonger, Bernard
> -Original Message-
> From: Tetsuya Mukawa [mailto:mukawa at igel.co.jp]
> Sent: Tuesday, June 30, 2015 9:24 AM
> To: dev at dpdk.org
> Cc: Iremonger, Bernard; david.marchand at 6wind.com; Richardson, Bruce;
> Tetsuya.Mukawa
> Subject: [PATCH v7 12/12] eal: Consolidate
> rte_eal_pci_probe/close_one_driver() of linuxapp and bsdapp
> 
> From: "Tetsuya.Mukawa" 
> 
> This patch consolidates below functions, and implements these in common
> eal code.
>  - rte_eal_pci_probe_one_driver()
>  - rte_eal_pci_close_one_driver()
> 
> Because pci_map_device() is only implemented in linuxapp, the patch
> implements it in bsdapp too. This implemented function will be merged to
> linuxapp one with later patch.

Hi Tetsuya,

The description lines above seem to be out of date now, as pci_map_device() is
no longer implemented in the bsdapp.

Regards,

Bernard.

> 
> Signed-off-by: Tetsuya Mukawa 
> ---
>  lib/librte_eal/bsdapp/eal/eal_pci.c|  67 +---
>  lib/librte_eal/common/eal_common_pci.c | 133
> +-
>  lib/librte_eal/common/eal_private.h|  39 +
>  lib/librte_eal/linuxapp/eal/eal_pci.c  | 142 
> +
>  4 files changed, 135 insertions(+), 246 deletions(-)
> 
> diff --git a/lib/librte_eal/bsdapp/eal/eal_pci.c
> b/lib/librte_eal/bsdapp/eal/eal_pci.c
> index c057f6a..508cfa7 100644
> --- a/lib/librte_eal/bsdapp/eal/eal_pci.c
> +++ b/lib/librte_eal/bsdapp/eal/eal_pci.c
> @@ -84,7 +84,7 @@
>   */
> 
>  /* unbind kernel driver for this device */ -static int
> +int
>  pci_unbind_kernel_driver(struct rte_pci_device *dev __rte_unused)  {
>   RTE_LOG(ERR, EAL, "RTE_PCI_DRV_FORCE_UNBIND flag is not
> implemented "
> @@ -355,71 +355,6 @@ error:
>   return -1;
>  }
> 
> -/*
> - * If vendor/device ID match, call the devinit() function of the
> - * driver.
> - */
> -int
> -rte_eal_pci_probe_one_driver(struct rte_pci_driver *dr, struct
> rte_pci_device *dev) -{
> - const struct rte_pci_id *id_table;
> - int ret;
> -
> - for (id_table = dr->id_table ; id_table->vendor_id != 0; id_table++) {
> -
> - /* check if device's identifiers match the driver's ones */
> - if (id_table->vendor_id != dev->id.vendor_id &&
> - id_table->vendor_id != PCI_ANY_ID)
> - continue;
> - if (id_table->device_id != dev->id.device_id &&
> - id_table->device_id != PCI_ANY_ID)
> - continue;
> - if (id_table->subsystem_vendor_id != dev-
> >id.subsystem_vendor_id &&
> - id_table->subsystem_vendor_id !=
> PCI_ANY_ID)
> - continue;
> - if (id_table->subsystem_device_id != dev-
> >id.subsystem_device_id &&
> - id_table->subsystem_device_id !=
> PCI_ANY_ID)
> - continue;
> -
> - struct rte_pci_addr *loc = &dev->addr;
> -
> - RTE_LOG(DEBUG, EAL, "PCI device "PCI_PRI_FMT" on NUMA
> socket %i\n",
> - loc->domain, loc->bus, loc->devid, loc-
> >function,
> - dev->numa_node);
> -
> - RTE_LOG(DEBUG, EAL, "  probe driver: %x:%x %s\n", dev-
> >id.vendor_id,
> - dev->id.device_id, dr->name);
> -
> - /* no initialization when blacklisted, return without error */
> - if (dev->devargs != NULL &&
> - dev->devargs->type ==
> RTE_DEVTYPE_BLACKLISTED_PCI) {
> -
> - RTE_LOG(DEBUG, EAL, "  Device is blacklisted, not
> initializing\n");
> - return 0;
> - }
> -
> - if (dr->drv_flags & RTE_PCI_DRV_NEED_MAPPING) {
> - /* map resources for devices that use igb_uio */
> - ret = pci_uio_map_resource(dev);
> - if (ret != 0)
> - return ret;
> - } else if (dr->drv_flags & RTE_PCI_DRV_FORCE_UNBIND &&
> -rte_eal_process_type() == RTE_PROC_PRIMARY) {
> - /* unbind current driver */
> - if (pci_unbind_kernel_driver(dev) < 0)
> - return -1;
> - }
> -
> - /* reference driver structure */
> - dev->driver = dr;
> -
> - /* call the driver devinit() function */
> - return dr->devinit(dr, dev);
> - }
> - /* return positive value if driver is not found */
> - return 1;
> -}
> -
>  /* Init the PCI EAL subsystem */
>  int
>  rte_eal_pci_init(void)
> diff --git a/lib/librte_eal/common/eal_common_pci.c
> b/lib/librte_eal/common/eal_common_pci.c
> index c0be292..8ef8057 100644
> --- a/lib/librte_eal/common/eal_common_pci.c
> +++ b/lib/librte_eal/common/eal_common_pci.c
> @@ -138,7 +138,7 @@ pci_unmap_resource(void *requested_addr, size_t
> size)  }
> 
>  /* Map pci device */
> -i

[dpdk-dev] [PATCH v4] eal: Enable Port Hotplug as default in Linux and BSD

2015-06-30 Thread Iremonger, Bernard

> -Original Message-
> From: Tetsuya Mukawa [mailto:mukawa at igel.co.jp]
> Sent: Tuesday, June 30, 2015 9:27 AM
> To: dev at dpdk.org
> Cc: Iremonger, Bernard; david.marchand at 6wind.com; Richardson, Bruce;
> Tetsuya.Mukawa
> Subject: [PATCH v4] eal: Enable Port Hotplug as default in Linux and BSD
> 
> From: "Tetsuya.Mukawa" 
> 
> This patch removes CONFIG_RTE_LIBRTE_EAL_HOTPLUG option, and enables
> it as default in both Linux and BSD.
> Also, to support port hotplug, rte_eal_pci_scan() and below missing symbols
> should be exported to ethdev library.
>  - rte_eal_parse_devargs_str()
>  - rte_eal_pci_close_one()
>  - rte_eal_pci_probe_one()
>  - rte_eal_pci_scan()
>  - rte_eal_vdev_init()
>  - rte_eal_vdev_uninit()
> 
> Signed-off-by: Tetsuya Mukawa 

Hi Tetsuya,

Would it be cleaner to add this patch to the  [PATCH  v7 12/12] patch set as  
patch 13 rather than having it as a separate patch?

Regards,

Bernard.

> ---
>  config/common_bsdapp  |  6 --
>  config/common_linuxapp|  5 -
>  lib/librte_eal/bsdapp/eal/eal_pci.c   |  6 +++---
>  lib/librte_eal/bsdapp/eal/rte_eal_version.map |  6 ++
>  lib/librte_eal/common/eal_common_dev.c|  2 --
>  lib/librte_eal/common/eal_common_pci.c|  6 --
>  lib/librte_eal/common/eal_common_pci_uio.c|  2 --
>  lib/librte_eal/common/eal_private.h   |  2 --
>  lib/librte_eal/common/include/rte_pci.h   |  2 --
>  lib/librte_ether/rte_ethdev.c | 21 -
>  10 files changed, 9 insertions(+), 49 deletions(-)
> 
> diff --git a/config/common_bsdapp b/config/common_bsdapp index
> 464250b..c6e6e9c 100644
> --- a/config/common_bsdapp
> +++ b/config/common_bsdapp
> @@ -121,12 +121,6 @@ CONFIG_RTE_LIBRTE_EAL_BSDAPP=y
> CONFIG_RTE_LIBRTE_EAL_LINUXAPP=n
> 
>  #
> -# Compile Environment Abstraction Layer to support hotplug -# So far,
> Hotplug functions only support linux -# -
> CONFIG_RTE_LIBRTE_EAL_HOTPLUG=n
> -
> -#
>  # Compile Environment Abstraction Layer to support Vmware TSC map  #
> CONFIG_RTE_LIBRTE_EAL_VMWARE_TSC_MAP_SUPPORT=y
> diff --git a/config/common_linuxapp b/config/common_linuxapp index
> aae22f4..c33a6fe 100644
> --- a/config/common_linuxapp
> +++ b/config/common_linuxapp
> @@ -119,11 +119,6 @@ CONFIG_RTE_PCI_MAX_READ_REQUEST_SIZE=0
>  CONFIG_RTE_LIBRTE_EAL_LINUXAPP=y
> 
>  #
> -# Compile Environment Abstraction Layer to support hotplug -# -
> CONFIG_RTE_LIBRTE_EAL_HOTPLUG=y
> -
> -#
>  # Compile Environment Abstraction Layer to support Vmware TSC map  #
> CONFIG_RTE_LIBRTE_EAL_VMWARE_TSC_MAP_SUPPORT=y
> diff --git a/lib/librte_eal/bsdapp/eal/eal_pci.c
> b/lib/librte_eal/bsdapp/eal/eal_pci.c
> index 508cfa7..0724c45 100644
> --- a/lib/librte_eal/bsdapp/eal/eal_pci.c
> +++ b/lib/librte_eal/bsdapp/eal/eal_pci.c
> @@ -309,8 +309,8 @@ skipdev:
>   * Scan the content of the PCI bus, and add the devices in the devices
>   * list. Call pci_scan_one() for each pci entry found.
>   */
> -static int
> -pci_scan(void)
> +int
> +rte_eal_pci_scan(void)
>  {
>   int fd;
>   unsigned dev_count = 0;
> @@ -366,7 +366,7 @@ rte_eal_pci_init(void)
>   if (internal_config.no_pci)
>   return 0;
> 
> - if (pci_scan() < 0) {
> + if (rte_eal_pci_scan() < 0) {
>   RTE_LOG(ERR, EAL, "%s(): Cannot scan PCI bus\n",
> __func__);
>   return -1;
>   }
> diff --git a/lib/librte_eal/bsdapp/eal/rte_eal_version.map
> b/lib/librte_eal/bsdapp/eal/rte_eal_version.map
> index 67b6a6c..7e850a9 100644
> --- a/lib/librte_eal/bsdapp/eal/rte_eal_version.map
> +++ b/lib/librte_eal/bsdapp/eal/rte_eal_version.map
> @@ -37,14 +37,20 @@ DPDK_2.0 {
>   rte_eal_lcore_role;
>   rte_eal_mp_remote_launch;
>   rte_eal_mp_wait_lcore;
> + rte_eal_parse_devargs_str;
> + rte_eal_pci_close_one;
>   rte_eal_pci_dump;
>   rte_eal_pci_probe;
> + rte_eal_pci_probe_one;
>   rte_eal_pci_register;
> + rte_eal_pci_scan;
>   rte_eal_pci_unregister;
>   rte_eal_process_type;
>   rte_eal_remote_launch;
>   rte_eal_tailq_lookup;
>   rte_eal_tailq_register;
> + rte_eal_vdev_init;
> + rte_eal_vdev_uninit;
>   rte_eal_wait_lcore;
>   rte_exit;
>   rte_get_hpet_cycles;
> diff --git a/lib/librte_eal/common/eal_common_dev.c
> b/lib/librte_eal/common/eal_common_dev.c
> index 92a5a94..4089d66 100644
> --- a/lib/librte_eal/common/eal_common_dev.c
> +++ b/lib/librte_eal/common/eal_common_dev.c
> @@ -125,7 +125,6 @@ rte_eal_dev_init(void)
>   return 0;
>  }
> 
> -#ifdef RTE_LIBRTE_EAL_HOTPLUG
>  int
>  rte_eal_vdev_uninit(const char *name)
>  {
> @@ -151,4 +150,3 @@ rte_eal_vdev_uninit(const char *name)
>   RTE_LOG(ERR, EAL, "no driver found for %s\n", name);
>   return -EINVAL;
>  }
> -#endif /* RTE_LIBRTE_EAL_HOTPLUG */
> diff --git a/lib/librte_eal/common/eal_common_pci.c
> b/lib/librte_eal/common/eal_common_pci.c
> index 8ef8057..3805aed 100644
> --- a/l

[dpdk-dev] Strange behavior with dpdk-2.0.0/examples/rxtx_callbacks

2015-06-30 Thread Abhishek Verma
Hi,

I am trying to understand DPDK and I
modified dpdk-2.0.0/examples/rxtx_callbacks so that I get a message
whenever I RX and TX a frame.

I added a printf in add_timestamps() and calc_latency() to see whenever I
am able to RX and TX a packet, along with the number of packets I get
(by printing nb_pkts).

When I run it, I see that I am continuously getting packets that I try to TX
out. This is odd, because I was expecting to only see the frames that I
would specifically send to the DPDK-enabled port.

Is this the expected behavior?

It would be great if you could tell me in case I am missing something.

Console output:

akabra at VirtualBox:~/dpdk/dpdk-2.0.0/examples/rxtx_callbacks$
dpdk_nic_bind.py --status
Network devices using DPDK-compatible driver

:00:08.0 '82545EM Gigabit Ethernet Controller (Copper)' drv=igb_uio
unused=uio_pci_generic
:00:09.0 '82545EM Gigabit Ethernet Controller (Copper)' drv=igb_uio
unused=uio_pci_generic

Network devices using kernel driver
===
:00:03.0 '82540EM Gigabit Ethernet Controller' if=eth1 drv=e1000
unused=igb_uio,uio_pci_generic *Active*

Other network devices
=

akabra at VirtualBox:~/dpdk/dpdk-2.0.0/examples/rxtx_callbacks$


akabra at VirtualBox:~/dpdk/dpdk-2.0.0/examples/rxtx_callbacks$ sudo
build/rxtx_callbacks -c 1 -n 4
EAL: Detected lcore 0 as core 0 on socket 0
EAL: Detected lcore 1 as core 1 on socket 0
EAL: Detected lcore 2 as core 2 on socket 0
EAL: Support maximum 128 logical core(s) by configuration.
EAL: Detected 3 lcore(s)
EAL: VFIO modules not all loaded, skip VFIO support...
EAL: Setting up memory...
EAL: Ask a virtual area of 0x20 bytes
EAL: Virtual area found at 0x7f7ec8e0 (size = 0x20)
EAL: Ask a virtual area of 0xa40 bytes
EAL: Virtual area found at 0x7f7ebe80 (size = 0xa40)
EAL: Ask a virtual area of 0x20 bytes
EAL: Virtual area found at 0x7f7ebe40 (size = 0x20)
EAL: Ask a virtual area of 0x20 bytes
EAL: Virtual area found at 0x7f7ebe00 (size = 0x20)
EAL: Ask a virtual area of 0x1540 bytes
EAL: Virtual area found at 0x7f7ea8a0 (size = 0x1540)
EAL: Ask a virtual area of 0x20 bytes
EAL: Virtual area found at 0x7f7ea860 (size = 0x20)
EAL: Requesting 256 pages of size 2MB from socket 0
EAL: TSC frequency is ~2594691 KHz
EAL: WARNING: cpu flags constant_tsc=yes nonstop_tsc=no -> using unreliable
clock cycles !
EAL: Master lcore 0 is ready (tid=ca023900;cpuset=[0])
PMD: ENICPMD trace: rte_enic_pmd_init
EAL: PCI device :00:03.0 on NUMA socket -1
EAL:   probe driver: 8086:100e rte_em_pmd
EAL:   Not managed by a supported kernel driver, skipped
EAL: PCI device :00:08.0 on NUMA socket -1
EAL:   probe driver: 8086:100f rte_em_pmd
EAL:   PCI memory mapped at 0x7f7ec900
PMD: eth_em_dev_init(): port_id 0 vendorID=0x8086 deviceID=0x100f
EAL: PCI device :00:09.0 on NUMA socket -1
EAL:   probe driver: 8086:100f rte_em_pmd
EAL:   PCI memory mapped at 0x7f7ec9fcd000
PMD: eth_em_dev_init(): port_id 1 vendorID=0x8086 deviceID=0x100f
PMD: eth_em_rx_queue_setup(): sw_ring=0x7f7ebf2f3a80 hw_ring=0x7f7ec8e41500
dma_addr=0xbf841500
PMD: eth_em_tx_queue_setup(): sw_ring=0x7f7ebf2f1980 hw_ring=0x7f7ec8e51500
dma_addr=0xbf851500
PMD: eth_em_start(): <<
Port 0 MAC: 08 00 27 f7 13 f5
PMD: eth_em_rx_queue_setup(): sw_ring=0x7f7ebf2f1280 hw_ring=0x7f7ec8e61500
dma_addr=0xbf861500
PMD: eth_em_tx_queue_setup(): sw_ring=0x7f7ebf2ef180 hw_ring=0x7f7ec8e71500
dma_addr=0xbf871500
PMD: eth_em_start(): <<
Port 1 MAC: 08 00 27 7b 2d b2

Core 0 forwarding packets. [Ctrl+C to quit]
Getting a packet 32!
Txing Packets 32
Getting a packet 32!
Txing Packets 32
Getting a packet 32!
Txing Packets 32
Getting a packet 32!
Txing Packets 32
Getting a packet 32!
Txing Packets 32
Getting a packet 32!
Txing Packets 32
Getting a packet 32!
Txing Packets 32
Getting a packet 32!
Txing Packets 32
Getting a packet 32!
Txing Packets 32

Thanks, Abhishek


[dpdk-dev] [PATCH v4] eal: Enable Port Hotplug as default in Linux and BSD

2015-06-30 Thread Bruce Richardson
On Tue, Jun 30, 2015 at 04:08:08PM +0100, Iremonger, Bernard wrote:
> 
> > -Original Message-
> > From: Tetsuya Mukawa [mailto:mukawa at igel.co.jp]
> > Sent: Tuesday, June 30, 2015 9:27 AM
> > To: dev at dpdk.org
> > Cc: Iremonger, Bernard; david.marchand at 6wind.com; Richardson, Bruce;
> > Tetsuya.Mukawa
> > Subject: [PATCH v4] eal: Enable Port Hotplug as default in Linux and BSD
> > 
> > From: "Tetsuya.Mukawa" 
> > 
> > This patch removes CONFIG_RTE_LIBRTE_EAL_HOTPLUG option, and enables
> > it as default in both Linux and BSD.
> > Also, to support port hotplug, rte_eal_pci_scan() and below missing symbols
> > should be exported to ethdev library.
> >  - rte_eal_parse_devargs_str()
> >  - rte_eal_pci_close_one()
> >  - rte_eal_pci_probe_one()
> >  - rte_eal_pci_scan()
> >  - rte_eal_vdev_init()
> >  - rte_eal_vdev_uninit()
> > 
> > Signed-off-by: Tetsuya Mukawa 
> 
> Hi Tetsuya,
> 
> Would it be cleaner to add this patch to the  [PATCH  v7 12/12] patch set as  
> patch 13 rather than having it as a separate patch?
>
The other patch set is cleanup, merging BSD and Linuxapp code, so I think it's
best kept as a separate set. New features I'd suggest keeping separate from
cleanup. 
That being said, it is only one patch, so it probably doesn't matter much either
way. :-)

/Bruce


[dpdk-dev] dpdk-2.0.0: crash in ixgbe_recv_scattered_pkts_vec->_recv_raw_pkts_vec->desc_to_olflags_v

2015-06-30 Thread Gopakumar Choorakkot Edakkunni
Hi,

I am starting to try out dpdk-2.0.0 with a simple Rx routine very
similar to the l2fwd example - I am running this on a c3.8xlarge AWS
SR-IOV enabled VPC instance (inside the VM it uses the ixgbevf driver).

About once every 10 minutes my application crashes in the receive path.
Whenever I check the crash reason, it's because it always has three
packets in the burst array (I have provided an array size of 32) instead
of the four that it tries to collect in one bunch. Inside
desc_to_olflags_v(), there's an assumption that there are four
packets, and obviously it crashes trying to access the fourth buffer.

With a brief look at the code, I really can't make out how it's
guaranteed that we will always have four descriptors fully populated.
After the first iteration, the loop does break out if (likely(var !=
RTE_IXGBE_DESCS_PER_LOOP)), but what about the very first iteration
where we might not have four?

Any thoughts will be helpful here, trying to get my app working for
more than 10 minutes :)

#0  0x004c8c58 in desc_to_olflags_v (rx_pkts=0x7f1cb0ff17a0,
descs=)
at /home/gopa/dpdk-2.0.0/lib/librte_pmd_ixgbe/ixgbe_rxtx_vec.c:173
#1  _recv_raw_pkts_vec (rxq=0x7f1d364fcbc0, rx_pkts=,
nb_pkts=, split_packet=0x7f1cb0ff16d0 "")
at /home/gopa/dpdk-2.0.0/lib/librte_pmd_ixgbe/ixgbe_rxtx_vec.c:305
#2  0x004c9cea in ixgbe_recv_scattered_pkts_vec
(rx_queue=0x7f1d364fcbc0, rx_pkts=0x7f1cb0ff17a0,
nb_pkts=) at
/home/gopa/dpdk-2.0.0/lib/librte_pmd_ixgbe/ixgbe_rxtx_vec.c:475
#3  0x006b9bd8 in rte_eth_rx_burst (port_id=0 '\000',
queue_id=0, rx_pkts=0x7f1cb0ff17a0, nb_pkts=32)
at /home/gopa/usr/dpdk/include/rte_ethdev.h:2417

---

(gdb) frame 0
#0  0x004c8c58 in desc_to_olflags_v (rx_pkts=0x7f1cb0ff17a0,
descs=)
at /home/gopa/dpdk-2.0.0/lib/librte_pmd_ixgbe/ixgbe_rxtx_vec.c:173
173/home/gopa/dpdk-2.0.0/lib/librte_pmd_ixgbe/ixgbe_rxtx_vec.c: No
such file or directory.
(gdb) p rx_pkts[0]
$9 = (struct rte_mbuf *) 0x7f1cf8284640
(gdb) p rx_pkts[1]
$10 = (struct rte_mbuf *) 0x7f1d3991061c
(gdb) p rx_pkts[2]
$11 = (struct rte_mbuf *) 0x7f1d364fc740
(gdb) p rx_pkts[3]
$12 = (struct rte_mbuf *) 0x0
(gdb)


[dpdk-dev] [PATCH] lib: remove redundant definition of local symbols

2015-06-30 Thread Thomas Monjalon
2015-06-29 18:35, Thomas Monjalon:
> The new version nodes inherit from the previous ones which
> already include a default catch-all line for not exported symbols.
> 
> Reported-by: Helin Zhang 
> Signed-off-by: Thomas Monjalon 

Applied


[dpdk-dev] [PATCH v4 0/4] vhost: vhost unix domain socket cleanup

2015-06-30 Thread Thomas Monjalon
2015-06-30 17:20, Huawei Xie:
> vhost user could register multiple unix domain socket server, and use the path
> to identify the virtio device connecting to it. rte_vhost_driver_unregister
> will clean up the unix domain socket for the specified path.
> 
> v2 changes:
> -minor code style fix, remove unnecessary new line
> 
> v3 changes:
> update version map file
> 
> v4 changes:
> -add comment for potential unwanted callback on listenfds
> -call fdset_del_slot to remove connection fd 
> 
> Huawei Xie (4):
>   fdset_del_slot
>   vhost socket cleanup
>   version map file update
>   add comment for potential unwanted call on listenfds

Applied, thanks


[dpdk-dev] dpdk-2.0.0: crash in ixgbe_recv_scattered_pkts_vec->_recv_raw_pkts_vec->desc_to_olflags_v

2015-06-30 Thread Bruce Richardson
On Tue, Jun 30, 2015 at 08:49:32AM -0700, Gopakumar Choorakkot Edakkunni wrote:
> Hi,
> 
> I am starting to tryout dpdk-2.0.0 with a simple Rx routine very
> similar to the l2fwd example - I am running this on a c3.8xlarge aws
> sr-iov enabled vpc instance (inside the vm it uses ixgbevf driver).
> 
> Once in every 10 minutes my application crashes in the recieve path.
> And whenever I check the crash reason its because it always has three
> packets in the burst array (I have provided array size of 32) instead
> of the four that it tries to collect in one bunch. And inside
> desc_to_olflags_v(), theres the assumption that there are four
> packets, and obviously it crashes trying to access the fourth buffer.
> 
> With a brief look at the code, I really cant make out how its
> guaranteed that we will always have four descriptors fully populated ?
> After the first iteration, the loop does break out if (likely(var !=
> RTE_IXGBE_DESCS_PER_LOOP)), but how about the very first iteration
> where we might not have four ?
> 
> Any thoughts will be helpful here, trying to get my app working for
> more than 10 minutes :)
> 

The code is designed to work off the fact that it will always process four
descriptors at a time, and fill in the contents of four mbufs. The main loop
will always do the work to receive four packets, and then subsequently make
a decision as to how many of the four are actually valid packets. If the 4th
descriptor processed has not actually been written back by the NIC, then we
in effect just "throw away" the work we have done for that packet. The mbuf
that was just filled in by the receive, will be filled in again later when
the descriptor has actually been written back. This way we can get linear code
without branching for each packet, at the cost of some additional instructions
for packets that are not yet ready. [And if packets are not yet ready, then
the software is working faster than packets are arriving so we have spare cycles
to spend doing the extra bit of work].

As for the specific problem you are seeing, I'll have to try and reproduce it.
My initial test [with a 10G NIC on the host, not a VM - they use the same RX
path], sending in 3 packets at a time, did not show any issues. Can you perhaps
isolate the root cause of the issue any further? For example, does it only
occur when you get three packets as the receive ring wraps back around to zero?

Regards,
/Bruce


