[dpdk-dev] FW: [PATCH v8 10/10] app/testpmd:test VxLAN Tx checksum offload

2014-11-11 Thread Yong Wang

On 11/7/14, 9:16 AM, "Olivier MATZ"  wrote:

>Hello Yong,
>
>On 11/07/2014 01:43 AM, Yong Wang wrote:
>>>> As to HW TX checksum offload, do you have special requirement for
>>>> implementing TSO?
>>
>>> Yes. TSO implies TX TCP and IP checksum offload.
>>
>> Is this a general requirement or something specific to ixgbe/i40e? FWIW,
>> vmxnet3 device does not support tx IP checksum offload but does support
>> TSO.  In that case, we cannot leave IP checksum field as 0 (the correct
>> checksum needs to be filled in the header) before passing it to the NIC
>> when TSO is enabled.
>
>This is a good question because we need to define the proper API that
>will work on other PMDs in the future.
>
>Indeed, there is a hardware specificity in ixgbe: when TSO is enabled,
>the IP checksum flag must also be passed to the driver if it's IPv4.
>From 82599 datasheets (7.2.3.2.4 Advanced Transmit Data Descriptor):
>
>IXSM (bit 0) - Insert IP Checksum: This field indicates that IP
>checksum must be inserted. In IPv6 mode, it must be reset to 0b.
>If DCMD.TSE and TUCMD.IPV4 are set, IXSM must be set as well.
>If this bit is set, the packet should at least contain an
>IP header.
>
>If we allow the user to give the TSO flag without the IP checksum
>flag in mbuf flags, the ixgbe driver would have to set the IP checksum
>flag in hardware descriptors if the packet is IPv4. The driver would
>have to parse the IP header: this is not a problem as we already need
>it for TCP checksum.
>
>To summarize, I think we have 3 options when transmitting a packet to be
>segmented using TSO:
>
>- set IP checksum to 0 in the application: in this case, it would
>  require additional work in virtual drivers if the peer expects
>  to receive a packet with a valid IP checksum. But I'm wondering
>  what is the need for calculating a checksum when transmitting on
>  a virtual device (the peer receiving the packet knows that the
>  packet is not corrupted as it comes from memory). Moreover, if the
>  device advertises TSO, I assume it can also advertise IP checksum
>  offload.

Checksum is still needed if the packet has to be transmitted over the wire.

The device is capable of IP checksum but for various reasons, it is
designed to only support TSO and TCP/UDP checksum. So I guess we still
have to deal with this discrepancy.

>
>- calculate the IP checksum in the application. It would take additional
>  cycles although it may not be needed as the driver probably knows
>  how to calculate it.
>
>- if the driver supports both TSO and IP checksum, the 2 flags MUST
>  be given to the driver and the IP checksum must be set to 0 and the
>  checksum cannot be calculated in software. If the driver only
>  supports TSO, the checksum has to be calculated in software.
>
>Currently, I chose the first solution, but I'm open to changing the
>design. Maybe the 3rd one is also a good solution.

I think option (3) is cleaner and can accommodate device differences
without requiring a new API.  But I don't really have a strong preference
here and I am fine with option (1) or a new API (dev_prep_tx()) as long as
the assumptions/requirements are clearly documented.
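
Option (3) can be sketched in plain C as follows. This is only an illustration under stated assumptions, not DPDK code: the capability bits `DEMO_CAP_TSO` and `DEMO_CAP_IP_CKSUM` and both function names are hypothetical. The checksum routine is the standard RFC 1071 one's-complement sum.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical capability bits, for illustration only (not DPDK names). */
#define DEMO_CAP_TSO      (1u << 0)
#define DEMO_CAP_IP_CKSUM (1u << 1)

/* RFC 1071 checksum over an IPv4 header whose checksum field (bytes 10-11)
 * is already zero. The result is returned in host order. */
static uint16_t demo_ipv4_cksum(const uint8_t *hdr, size_t len)
{
    uint32_t sum = 0;
    size_t i;

    for (i = 0; i + 1 < len; i += 2)
        sum += ((uint32_t)hdr[i] << 8) | hdr[i + 1];
    while (sum >> 16)
        sum = (sum & 0xffff) + (sum >> 16);
    return (uint16_t)~sum;
}

/* Option (3): leave the field at 0 when the device offloads the IP checksum,
 * otherwise fill it in software. Returns 1 if the caller must also set the
 * IP checksum offload flag, 0 if the checksum was computed here. */
static int demo_prepare_ipv4_cksum(uint8_t *hdr, size_t len, unsigned caps)
{
    uint16_t ck;

    hdr[10] = 0;
    hdr[11] = 0;
    if (caps & DEMO_CAP_IP_CKSUM)
        return 1;
    ck = demo_ipv4_cksum(hdr, len);
    hdr[10] = (uint8_t)(ck >> 8);
    hdr[11] = (uint8_t)(ck & 0xff);
    return 0;
}
```

For a vmxnet3-like device (TSO but no IP checksum offload) the caller would pass only `DEMO_CAP_TSO` and get a filled-in header back. For an ixgbe-like device it would pass both bits and set the offload flag itself.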

Thanks,
Yong

>
>By the way, we had the same kind of discussion with Konstantin [1]
>about what to do with the TCP checksum. My feeling is that setting it
>to the pseudo-header checksum is the best we can do:
> - linux does that
> - much hardware requires that (this is not the case for ixgbe, which
>   needs a pshdr checksum without the IP len)
> - it can be reused if received by a virtual device and sent to a
>   physical device supporting TSO
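
A minimal sketch of the pseudo-header checksum mentioned above, in plain C. The function name `demo_ipv4_phdr_sum` is hypothetical. All values are in host byte order to keep the arithmetic readable, so real code would still have to convert to network order before writing the TCP header. The sum is folded but not complemented, which as far as I know is the form checksum-offload hardware usually expects. Passing `l4_len = 0` gives the variant without the length.

```c
#include <stdint.h>

/* Sum of the IPv4 TCP pseudo-header, folded to 16 bits and NOT complemented.
 * All arguments and the result are in host byte order. Pass l4_len = 0 to
 * leave the length out of the sum. */
static uint16_t demo_ipv4_phdr_sum(uint32_t src, uint32_t dst,
                                   uint8_t proto, uint16_t l4_len)
{
    uint32_t sum = 0;

    sum += (src >> 16) + (src & 0xffff);
    sum += (dst >> 16) + (dst & 0xffff);
    sum += proto;   /* zero byte followed by the protocol number */
    sum += l4_len;  /* TCP header plus payload length */
    while (sum >> 16)
        sum = (sum & 0xffff) + (sum >> 16);
    return (uint16_t)sum;
}
```

With source 192.168.0.1, destination 192.168.0.199, protocol 6 and a 20-byte TCP segment, the sum is 0x8233; without the length it is 0x821f.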
>
>Best regards,
>Olivier
>
>
>[1] http://dpdk.org/ml/archives/dev/2014-May/002766.html




[dpdk-dev] Performance impact with QoS

2014-11-11 Thread satish
Hi,
I need comments on performance impact with DPDK-QoS.

We are working on developing an application based on DPDK.
Our application supports IPv4 forwarding with and without QoS.

Without QoS, we are achieving almost full wire rate (bi-directional
traffic) with 128, 256 and 512 byte packets.
But when we enabled QoS, performance dropped to half for 128 and 256 byte
packets.
For 512 byte packets, we didn't observe any drop even after enabling QoS
(achieving full wire rate).
The traffic used in both cases is the same (one stream whose QoS
classification matches the first queue in traffic class 0).

In our application, we are using memory buffer pools to receive the packet
bursts (Ring buffer is not used).
The same buffer is used during packet processing and TX (enqueue and dequeue).
All of the above is handled on the same core.

For normal forwarding (without QoS), we are using rte_eth_tx_burst() for TX.

For forwarding with QoS, we call rte_sched_port_pkt_write(),
rte_sched_port_enqueue() and rte_sched_port_dequeue()
before rte_eth_tx_burst().

We understand that the performance dip for 128 and 256 byte packets is
because more packets have to be processed than with 512 byte packets.
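
To put numbers on that, here is a small C sketch of the packet rate needed to fill a link at a given frame size. The post does not state the link speed, so the 10 Gb/s used in the note below is an assumption. The helper name `demo_line_rate_pps` is hypothetical. The 20 bytes of per-frame overhead are the Ethernet preamble, start-of-frame delimiter and inter-frame gap.

```c
#include <stdint.h>

/* Per-frame overhead on the wire: 7B preamble + 1B SFD + 12B inter-frame gap. */
#define DEMO_WIRE_OVERHEAD 20u

/* Packets per second in one direction at line rate; frame_len includes CRC. */
static uint64_t demo_line_rate_pps(uint64_t link_bps, uint32_t frame_len)
{
    return link_bps / ((uint64_t)(frame_len + DEMO_WIRE_OVERHEAD) * 8u);
}
```

On an assumed 10 Gb/s link this gives about 8.4 Mpps for 128-byte frames, 4.5 Mpps for 256-byte frames and 2.3 Mpps for 512-byte frames, per direction. The per-packet cost of the scheduler calls therefore has roughly 3.6 times less headroom at 128 bytes than at 512 bytes.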

Can someone comment on the performance dip in my case with QoS enabled?
[1] Can this be because of inefficient use of RTE calls for QoS?
[2] Is it poor buffer management?
[3] Any other comments?

To achieve good performance in the QoS case, is it a must to use a worker thread
(running on a different core) with a ring buffer?

Please provide your comments.

Thanks in advance.

Regards,
Satish Babu


[dpdk-dev] [PATCH 07/12] mbuf: generic support for TCP segmentation offload

2014-11-11 Thread Liu, Jijiang


> -Original Message-
> From: Olivier Matz [mailto:olivier.matz at 6wind.com]
> Sent: Monday, November 10, 2014 11:59 PM
> To: dev at dpdk.org
> Cc: olivier.matz at 6wind.com; Walukiewicz, Miroslaw; Liu, Jijiang; Liu, Yong;
> jigsaw at gmail.com; Richardson, Bruce; Ananyev, Konstantin
> Subject: [PATCH 07/12] mbuf: generic support for TCP segmentation offload
> 
> Some of the NICs supported by DPDK have a possibility to accelerate TCP traffic
> by using segmentation offload. The application prepares a packet with valid TCP
> header with size up to 64K and delegates the segmentation to the NIC.
> 
> Implement the generic part of TCP segmentation offload in rte_mbuf. It
> introduces 2 new fields in rte_mbuf: l4_len (length of L4 header in bytes) and
> tso_segsz (MSS of packets).
> 
> To delegate the TCP segmentation to the hardware, the user has to:
> 
> - set the PKT_TX_TCP_SEG flag in mbuf->ol_flags (this flag implies
>   PKT_TX_TCP_CKSUM)
> - set PKT_TX_IP_CKSUM if it's IPv4, and set the IP checksum to 0 in
>   the packet
> - fill the mbuf offload information: l2_len, l3_len, l4_len, tso_segsz
> - calculate the pseudo header checksum and set it in the TCP header,
>   as required when doing hardware TCP checksum offload
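
The steps listed above can be sketched as follows. This is a hedged illustration only: `struct demo_mbuf` is a stand-in with just the fields named in the commit message, not the real `rte_mbuf` layout. `DEMO_TX_TCP_SEG` reuses the bit value that the patch gives `PKT_TX_TCP_SEG`, while `DEMO_TX_IP_CKSUM` is a placeholder bit chosen for the example.

```c
#include <stdint.h>

/* Stand-in for the few rte_mbuf fields the patch uses; NOT the real layout. */
struct demo_mbuf {
    uint64_t ol_flags;
    uint16_t l2_len, l3_len, l4_len, tso_segsz;
};

#define DEMO_TX_TCP_SEG  (1ULL << 49) /* value of PKT_TX_TCP_SEG in the patch */
#define DEMO_TX_IP_CKSUM (1ULL << 1)  /* placeholder bit, not the DPDK value */

/* Fill the offload metadata for one TCP packet to be segmented by the NIC.
 * The caller still has to zero the IPv4 checksum and write the pseudo-header
 * checksum into the TCP header. */
static void demo_request_tso(struct demo_mbuf *m, int is_ipv4,
                             uint16_t l2, uint16_t l3, uint16_t l4,
                             uint16_t mss)
{
    m->ol_flags |= DEMO_TX_TCP_SEG;
    if (is_ipv4)
        m->ol_flags |= DEMO_TX_IP_CKSUM;
    m->l2_len = l2;
    m->l3_len = l3;
    m->l4_len = l4;
    m->tso_segsz = mss;
}
```

A typical call for an untagged Ethernet/IPv4/TCP packet without options would be `demo_request_tso(&m, 1, 14, 20, 20, 1460)`.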
> 
> The API is inspired by ixgbe hardware (the next commit adds the support for
> ixgbe), but it seems generic enough to be used for other hw/drivers in the
> future.
> 
> This commit also reworks the way l2_len and l3_len are used in igb and ixgbe
> drivers as the l2_l3_len is not available anymore in mbuf.
> 
> Signed-off-by: Mirek Walukiewicz 
> Signed-off-by: Olivier Matz 
> ---
>  app/test-pmd/testpmd.c|  3 ++-
>  examples/ipv4_multicast/main.c|  3 ++-
>  lib/librte_mbuf/rte_mbuf.h| 44 +++
>  lib/librte_pmd_e1000/igb_rxtx.c   | 11 +-
>  lib/librte_pmd_ixgbe/ixgbe_rxtx.c | 11 +-
>  5 files changed, 50 insertions(+), 22 deletions(-)
> 
> diff --git a/app/test-pmd/testpmd.c b/app/test-pmd/testpmd.c
> index 12adafa..a831e31 100644
> --- a/app/test-pmd/testpmd.c
> +++ b/app/test-pmd/testpmd.c
> @@ -408,7 +408,8 @@ testpmd_mbuf_ctor(struct rte_mempool *mp,
>   mb->ol_flags = 0;
>   mb->data_off = RTE_PKTMBUF_HEADROOM;
>   mb->nb_segs  = 1;
> - mb->l2_l3_len   = 0;
> + mb->l2_len   = 0;
> + mb->l3_len   = 0;

mb->inner_l2_len and mb->inner_l3_len are missing here; I can also add
them later.

>   mb->vlan_tci = 0;
>   mb->hash.rss = 0;
>  }
> diff --git a/examples/ipv4_multicast/main.c b/examples/ipv4_multicast/main.c
> index de5e6be..a31d43d 100644
> --- a/examples/ipv4_multicast/main.c
> +++ b/examples/ipv4_multicast/main.c
> @@ -302,7 +302,8 @@ mcast_out_pkt(struct rte_mbuf *pkt, int use_clone)
>   /* copy metadata from source packet*/
>   hdr->port = pkt->port;
>   hdr->vlan_tci = pkt->vlan_tci;
> - hdr->l2_l3_len = pkt->l2_l3_len;
> + hdr->l2_len = pkt->l2_len;
> + hdr->l3_len = pkt->l3_len;

mb->inner_l2_len and mb->inner_l3_len are missing here, too.

>   hdr->hash = pkt->hash;
> 
>   hdr->ol_flags = pkt->ol_flags;
> diff --git a/lib/librte_mbuf/rte_mbuf.h b/lib/librte_mbuf/rte_mbuf.h
> index bcd8996..f76b768 100644
> --- a/lib/librte_mbuf/rte_mbuf.h
> +++ b/lib/librte_mbuf/rte_mbuf.h
> @@ -126,6 +126,19 @@ extern "C" {
> 
>  #define PKT_TX_VXLAN_CKSUM   (1ULL << 50) /**< TX checksum of VXLAN computed by NIC */
> 
> +/**
> + * TCP segmentation offload. To enable this offload feature for a
> + * packet to be transmitted on hardware supporting TSO:
> + *  - set the PKT_TX_TCP_SEG flag in mbuf->ol_flags (this flag implies
> + *    PKT_TX_TCP_CKSUM)
> + *  - if it's IPv4, set the PKT_TX_IP_CKSUM flag and write the IP checksum
> + *    to 0 in the packet
> + *  - fill the mbuf offload information: l2_len, l3_len, l4_len, tso_segsz
> + *  - calculate the pseudo header checksum and set it in the TCP header,
> + *    as required when doing hardware TCP checksum offload
> + */
> +#define PKT_TX_TCP_SEG   (1ULL << 49)
> +
>  /* Use final bit of flags to indicate a control mbuf */
>  #define CTRL_MBUF_FLAG   (1ULL << 63) /**< Mbuf contains control data */
> 
> @@ -185,6 +198,7 @@ static inline const char *rte_get_tx_ol_flag_name(uint64_t mask)
>   case PKT_TX_UDP_CKSUM: return "PKT_TX_UDP_CKSUM";
>   case PKT_TX_IEEE1588_TMST: return "PKT_TX_IEEE1588_TMST";
>   case PKT_TX_VXLAN_CKSUM: return "PKT_TX_VXLAN_CKSUM";
> + case PKT_TX_TCP_SEG: return "PKT_TX_TCP_SEG";
>   default: return NULL;
>   }
>  }
> @@ -264,22 +278,18 @@ struct rte_mbuf {
> 
>   /* fields to support TX offloads */
>   union {
> - uint16_t l2_l3_len; /**< combined l2/l3 lengths as single var */
> + uint64_t tx_offload;   /**< combined for easy fetch */
>   struct {
> - uint16_t l3_len:9;  /**< L3 (IP) Header Length. */

[dpdk-dev] building shared library

2014-11-11 Thread Chi, Xiaobo (NSN - CN/Hangzhou)
Hi,
I am using a DPDK-based shared lib, but have never met such problems. Can you
please share the result of "ldd x.so" and check whether all the libs it depends
on are available?

brgs,
chi xiaobo

-Original Message-
From: dev [mailto:dev-boun...@dpdk.org] On Behalf Of ext Newman Poborsky
Sent: Monday, November 10, 2014 10:23 PM
To: dev at dpdk.org
Subject: [dpdk-dev] building shared library

Hi,

is it possible to build a  dpdk app as a shared library?

I tried to put 'include $(RTE_SDK)/mk/rte.extshared.mk' in my Makefile (and
define SHARED) and it builds .so lib, but all rte_* symbols are undefined.

After that I tried adding:
LDLIBS += -lrte_eal -lrte_mbuf -lrte_cmdline -lrte_timer  -lrte_mempool
-lrte_ring  -lrte_pmd_ring -lethdev -lrte_malloc

And now almost all symbols in .so file are defined (missing only
rte_hexdump).

I thought this was gonna be it. But after using this library, pci probe-ing
fails since I don't have any pmd drivers registered, and
rte_eth_dev_count() returns 0.

But how are drivers supposed to be registered?

When I use gdb with regular dpdk app (not shared library), I can see this:
#0  0x0046fab0 in rte_eal_driver_register ()
#1  0x00418fb7 in devinitfn_bond_drv ()
#2  0x004f15ed in __libc_csu_init ()
#3  0x76efee55 in __libc_start_main (main=0x41ee65 , argc=1,
argv=0x7fffe4f8, init=0x4f15a0 <__libc_csu_init>, fini=,
rtld_fini=, stack_end=0x7fffe4e8) at
libc-start.c:246
#4  0x0041953c in _start ()


Ok, if I'm not mistaken, it seems driver registration is called before
main. How is this accomplished? In the shared library build, this does not
happen before main(), so after rte_eal_init() everything else fails (since
the driver list is empty).
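
One common way to run code before main() with GCC or Clang is a function marked `__attribute__((constructor))`. The `devinitfn_bond_drv` frame in the backtrace above is consistent with that mechanism, but the sketch below is only an illustration with hypothetical names (`demo_driver`, `demo_driver_register`, `demo_init_bond_drv`), not the EAL implementation.

```c
#include <stddef.h>

/* Hypothetical miniature driver registry, not the EAL one. */
struct demo_driver {
    const char *name;
    struct demo_driver *next;
};

static struct demo_driver *demo_driver_list;

static void demo_driver_register(struct demo_driver *d)
{
    d->next = demo_driver_list;
    demo_driver_list = d;
}

static struct demo_driver demo_bond_drv = { "demo_bond", NULL };

/* GCC/Clang run constructor functions when the object is loaded: before
 * main() for an executable, or at load time for a shared library. */
static void __attribute__((constructor, used)) demo_init_bond_drv(void)
{
    demo_driver_register(&demo_bond_drv);
}

/* Number of drivers registered so far. */
static int demo_driver_count(void)
{
    int n = 0;
    const struct demo_driver *d;

    for (d = demo_driver_list; d != NULL; d = d->next)
        n++;
    return n;
}
```

A constructor only runs if the object file that contains it ends up in the final binary. If the PMD objects live in static archives and nothing references them, the linker may leave them out, so the driver list stays empty. With GNU ld, wrapping the PMD archives in `--whole-archive` is one thing to try.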

Any suggestions please? I'd really appreciate it...

BR,
Newman P.


[dpdk-dev] [PATCH v5 2/5] ethdev: add enum type and relevant structures for hash filter control

2014-11-11 Thread Zhang, Helin


> -Original Message-
> From: Thomas Monjalon [mailto:thomas.monjalon at 6wind.com]
> Sent: Monday, November 3, 2014 3:57 PM
> To: Zhang, Helin
> Cc: dev at dpdk.org
> Subject: Re: [dpdk-dev] [PATCH v5 2/5] ethdev: add enum type and relevant
> structures for hash filter control
> 
> 2014-10-21 11:14, Helin Zhang:
> > +enum rte_eth_hash_filter_info_type {
> > +   RTE_ETH_HASH_FILTER_INFO_TYPE_UNKNOWN = 0,
> > +   RTE_ETH_HASH_FILTER_INFO_TYPE_SYM_HASH_ENA_PER_PCTYPE,
> 
> PCTYPE is an unknown word in the API layer.
> Could you replace it by something more generic?
In ethdev layer, the idea of 'flow type' will be used for generic purpose.

> 
> > +   RTE_ETH_HASH_FILTER_INFO_TYPE_SYM_HASH_ENA_PER_PORT,
> > +   RTE_ETH_HASH_FILTER_INFO_TYPE_FILTER_SWAP,
> > +   RTE_ETH_HASH_FILTER_INFO_TYPE_HASH_FUNCTION,
> > +   RTE_ETH_HASH_FILTER_INFO_TYPE_MAX,
> > +};
> 
> You should comment each constant.
Yes, agree.

> 
> > +struct rte_eth_sym_hash_ena_info {
> > +   /**< packet classification type, defined in rte_ethdev.h */
> > +   uint8_t pctype;
> 
> No, PCTYPE is not anymore defined in ethdev.
No pctype will be there on ethdev layer.

> 
> > +/**
> > + * A structure used to set or get filter swap information, to support
> > + * 'RTE_ETH_FILTER_HASH', 'RTE_ETH_FILTER_GET/RTE_ETH_FILTER_SET',
> > + * with information type 'RTE_ETH_HASH_FILTER_INFO_TYPE_FILTER_SWAP'.
> > + */
> > +struct rte_eth_filter_swap_info {
> > +   /**< Packet classification type, defined in rte_ethdev.h */
> > +   uint8_t pctype;
> > +   /**< Offset of the 1st field of the 1st couple to be swapped. */
> > +   uint8_t off0_src0;
> > +   /**< Offset of the 2nd field of the 1st couple to be swapped. */
> > +   uint8_t off0_src1;
> > +   /**< Field length of the first couple. */
> > +   uint8_t len0;
> > +   /**< Offset of the 1st field of the 2nd couple to be swapped. */
> > +   uint8_t off1_src0;
> > +   /**< Offset of the 2nd field of the 2nd couple to be swapped. */
> > +   uint8_t off1_src1;
> > +   /**< Field length of the second couple. */
> > +   uint8_t len1;
> > +};
> 
> I guess it would be easier to understand if
> RTE_ETH_HASH_FILTER_INFO_TYPE_FILTER_SWAP was defined previously.
> 
> --
> Thomas

Regards,
Helin


[dpdk-dev] [PATCH v5 1/5] i40e: Use constant random hash keys

2014-11-11 Thread Zhang, Helin


> -Original Message-
> From: Thomas Monjalon [mailto:thomas.monjalon at 6wind.com]
> Sent: Monday, November 3, 2014 4:59 PM
> To: Zhang, Helin
> Cc: dev at dpdk.org
> Subject: Re: [dpdk-dev] [PATCH v5 1/5] i40e: Use constant random hash keys
> 
> 2014-11-03 08:18, Zhang, Helin:
> > From: Thomas Monjalon [mailto:thomas.monjalon at 6wind.com]
> > > The title is a bit surprising:
> > > - it should be about RSS
> >
> > RSS makes use of hash function to route received packets, though hash
> > function can be used for other cases, e.g. Flow director.
> 
> Yes but this patch is only changing rss_key_default so I guess it's only
> related to RSS, right?
Yes, it is currently for rss only.

> 
> > > - a constant cannot be really random ;)
The comments could be re-worded.

> >
> > The hash keys are generated by libc random function.
> > It is preparatory to avoid calling random function for each port.
> 
> Here, you replace the call to rte_rand with a constant value.
No need to calculate it every time, like what Linux i40e driver does.

> 
> > > 2014-10-21 11:14, Helin Zhang:
> > > > To be simpler, and remove the race condition, it uses prepared
> > > > constant random hash keys to replace runtime generating the hash keys.
> > >
> > > Could you explain what is the role of rss_key_default?
> >
> > The hash function needs to be configured with keys. Before end users
> > configure it with specific keys, we need to provide default keys,
> > which are generated by the libc random function.
> > The random keys let the hash function distribute the received
> > packets evenly across all the queues.

Regards,
Helin


[dpdk-dev] [PATCH v7] eal: map PCI memory resources after hugepages

2014-11-11 Thread XU Liang
I have finished some tests. The patch works fine. My tests included:
* single process + uio + vfio
* single process + uio + vfio + base-virtaddr
* multiple processes + uio + vfio
* multiple processes + uio + vfio + base-virtaddr

My unlucky multiple process application still got an error without
base-virtaddr when initializing hugepages. See the attachments: primary.txt
and secondary.txt.

With base-virtaddr the patch worked: both hugepages and PCI resources were
mapped into base-virtaddr, and my application is happy. See the attachments:
base-virtaddr_primary.txt and base-virtaddr_secondary.txt.

--
From: Burakov, Anatoly
Time: 2014 Nov 10 (Mon) 21:34
To: Burakov, Anatoly, dev at dpdk.org
Subject: Re: [dpdk-dev] [PATCH v7] eal: map PCI memory resources after hugepages

Nak, there are issues with the patch. There is another patch already, but I'll 
submit it whenever Liang verifies it works with his setup.

Thanks,
Anatoly

-Original Message-
From: dev [mailto:dev-boun...@dpdk.org] On Behalf Of Anatoly Burakov
Sent: Monday, November 10, 2014 11:35 AM
To: dev at dpdk.org
Subject: [dpdk-dev] [PATCH v7] eal: map PCI memory resources after hugepages

Multi-process DPDK application must mmap hugepages and pci resources
into the same virtual address space. By default the virtual addresses
are chosen by the primary process automatically when calling the mmap.
But sometimes the chosen virtual addresses aren't usable in secondary
process - for example, secondary process is linked with more libraries
than primary process, and the library occupies the same address space
that the primary process has requested for PCI mappings.

This patch makes EAL map PCI BARs right after the hugepages (instead of
location chosen by mmap) in virtual memory.
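
The idea of finding the end of the highest hugepage segment can be sketched on its own in plain C. `struct demo_memseg` is a hypothetical stand-in with only the two fields needed here, and `demo_find_max_end_va` is an illustrative name, so this is not the code from the patch.

```c
#include <stddef.h>
#include <stdint.h>

/* Stand-in for struct rte_memseg with only the fields used here. */
struct demo_memseg {
    void *addr;  /* start virtual address; NULL marks the end of the table */
    size_t len;  /* length in bytes */
};

/* Return the first address past the highest-placed segment, or NULL if the
 * table is empty. */
static void *demo_find_max_end_va(const struct demo_memseg *segs, size_t n)
{
    const struct demo_memseg *last = NULL;
    size_t i;

    for (i = 0; i < n && segs[i].addr != NULL; i++) {
        if (last == NULL ||
            (uintptr_t)segs[i].addr > (uintptr_t)last->addr)
            last = &segs[i];
    }
    if (last == NULL)
        return NULL;
    return (void *)((uintptr_t)last->addr + last->len);
}
```

Each BAR would then be requested at the returned address, and the address advanced by the BAR size before the next mapping.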

Signed-off-by: Anatoly Burakov 
Signed-off-by: Liang Xu 
---
 lib/librte_eal/linuxapp/eal/eal_pci.c  | 19 +++
 lib/librte_eal/linuxapp/eal/eal_pci_uio.c  |  9 -
 lib/librte_eal/linuxapp/eal/eal_pci_vfio.c | 13 +++--
 lib/librte_eal/linuxapp/eal/include/eal_pci_init.h |  6 ++
 4 files changed, 44 insertions(+), 3 deletions(-)

diff --git a/lib/librte_eal/linuxapp/eal/eal_pci.c b/lib/librte_eal/linuxapp/eal/eal_pci.c
index 5fe3961..dae8739 100644
--- a/lib/librte_eal/linuxapp/eal/eal_pci.c
+++ b/lib/librte_eal/linuxapp/eal/eal_pci.c
@@ -97,6 +97,25 @@ error:
return -1;
 }

+void *
+pci_find_max_end_va(void)
+{
+    const struct rte_memseg *seg = rte_eal_get_physmem_layout();
+    const struct rte_memseg *last = seg;
+    unsigned i = 0;
+
+    for (i = 0; i < RTE_MAX_MEMSEG; i++, seg++) {
+        if (seg->addr == NULL)
+            break;
+
+        if (seg->addr > last->addr)
+            last = seg;
+
+    }
+    return RTE_PTR_ADD(last->addr, last->len);
+}
+
+
 /* map a particular resource from a file */
 void *
 pci_map_resource(void *requested_addr, int fd, off_t offset, size_t size)
diff --git a/lib/librte_eal/linuxapp/eal/eal_pci_uio.c b/lib/librte_eal/linuxapp/eal/eal_pci_uio.c
index 7e62266..5090bf1 100644
--- a/lib/librte_eal/linuxapp/eal/eal_pci_uio.c
+++ b/lib/librte_eal/linuxapp/eal/eal_pci_uio.c
@@ -48,6 +48,8 @@

 static int pci_parse_sysfs_value(const char *filename, uint64_t *val);

+void *pci_map_addr = NULL;
+

 #define OFF_MAX  ((uint64_t)(off_t)-1)
 static int
@@ -371,10 +373,15 @@ pci_uio_map_resource(struct rte_pci_device *dev)
             if (maps[j].addr != NULL)
                 fail = 1;
             else {
-                mapaddr = pci_map_resource(NULL, fd, (off_t)offset,
+                if (pci_map_addr == NULL)
+                    pci_map_addr = pci_find_max_end_va();
+
+                mapaddr = pci_map_resource(pci_map_addr, fd, (off_t)offset,
                         (size_t)maps[j].size);
                 if (mapaddr == NULL)
                     fail = 1;
+
+                pci_map_addr = RTE_PTR_ADD(pci_map_addr, maps[j].size);
             }
 
             if (fail) {
diff --git a/lib/librte_eal/linuxapp/eal/eal_pci_vfio.c b/lib/librte_eal/linuxapp/eal/eal_pci_vfio.c
index c776ddc..fb6ee7a 100644
--- a/lib/librte_eal/linuxapp/eal/eal_pci_vfio.c
+++ b/lib/librte_eal/linuxapp/eal/eal_pci_vfio.c
@@ -720,8 +720,17 @@ pci_vfio_map_resource(struct rte_pci_device *dev)
if (i == msix_bar)
continue;

-   bar_addr = pci_map_resource(maps[i].addr, vfio_dev_fd, reg.offset,
-           reg.size);
+   if (internal_config.process_type == RTE_PROC_PRIMARY) {
+   if (pci_map_addr == NULL)
+   pci_map_addr = pci_find_max_end_va(

[dpdk-dev] [PATCH v8 10/10] app/testpmd:test VxLAN Tx checksum offload

2014-11-11 Thread Liu, Jijiang


> -Original Message-
> From: Olivier MATZ [mailto:olivier.matz at 6wind.com]
> Sent: Tuesday, November 11, 2014 12:17 AM
> To: Liu, Jijiang
> Cc: dev at dpdk.org
> Subject: Re: [dpdk-dev] [PATCH v8 10/10] app/testpmd:test VxLAN Tx checksum
> offload
> 
> Hi Jijiang,
> 
> On 11/10/2014 07:03 AM, Liu, Jijiang wrote:
> >> Another thing is surprising me.
> >>
> >> - if PKT_TX_VXLAN_CKSUM is not set (legacy use case), then the
> >>driver use l2_len and l3_len to offload inner IP/UDP/TCP checksums.
> > If the flag is not set, it implies that it is not a VXLAN packet, and TX
> > checksum offload is done as for a regular packet.
> >
> >> - if PKT_TX_VXLAN_CKSUM is set, then the driver has to use
> >>inner_l{23}_len instead of l{23}_len for the same operation.
> > Your understanding is not fully correct.
> > The l{23}_len is still used for TX checksum offload, please refer to
> > i40e_txd_enable_checksum() implementation.
> 
> These fields are part of the public mbuf API. You cannot tell users to refer
> to i40e PMD code to understand how to use them.
> 
> >> Adding PKT_TX_VXLAN_CKSUM changes the semantic of l2_len and l3_len.
> >> To fix this, I suggest to remove the new fields inner_l{23}_len then
> >> add outer_l{23}_len instead. Therefore, the semantic of l2_len and
> >> l3_len would not change, and a driver would always use the same field for a
> >> specific offload.
> > Oh...
> 
> Does it mean you agree?

I don't agree with renaming inner_l{23}_len.
The reason is that the word "inner" indicates that the incoming packet is a
tunneling or encapsulation packet.
If we add outer_l{2,3}_len, it will cause confusion when processing
non-tunneling packets.


> >> For my TSO development, I will follow the current semantic.
> > For TSO, you still can use l{2,3} _len .
> > When I develop tunneling TSO, I will use inner_l3_len/inner_l4_len.
> 
> I've just submitted a first version, please feel free to comment it.
> 
> 
> Regards,
> Olivier


[dpdk-dev] [PATCH v5 2/5] ethdev: add enum type and relevant structures for hash filter control

2014-11-11 Thread Zhang, Helin
Hi Thomas

In order to make things more generic, and remove any mapping to specific NIC
hardware, I plan to change the macros in rte_ethdev.h from

/* Supported RSS offloads */
/* for 1G & 10G */
#define ETH_RSS_IPV4_SHIFT            0
#define ETH_RSS_IPV4_TCP_SHIFT        1
#define ETH_RSS_IPV6_SHIFT            2
#define ETH_RSS_IPV6_EX_SHIFT         3
#define ETH_RSS_IPV6_TCP_SHIFT        4
#define ETH_RSS_IPV6_TCP_EX_SHIFT     5
#define ETH_RSS_IPV4_UDP_SHIFT        6
#define ETH_RSS_IPV6_UDP_SHIFT        7
#define ETH_RSS_IPV6_UDP_EX_SHIFT     8
/* for 40G only */
#define ETH_RSS_NONF_IPV4_UDP_SHIFT   31
#define ETH_RSS_NONF_IPV4_TCP_SHIFT   33
#define ETH_RSS_NONF_IPV4_SCTP_SHIFT  34
#define ETH_RSS_NONF_IPV4_OTHER_SHIFT 35
#define ETH_RSS_FRAG_IPV4_SHIFT       36
#define ETH_RSS_NONF_IPV6_UDP_SHIFT   41
#define ETH_RSS_NONF_IPV6_TCP_SHIFT   43
#define ETH_RSS_NONF_IPV6_SCTP_SHIFT  44
#define ETH_RSS_NONF_IPV6_OTHER_SHIFT 45
#define ETH_RSS_FRAG_IPV6_SHIFT       46
#define ETH_RSS_FCOE_OX_SHIFT         48
#define ETH_RSS_FCOE_RX_SHIFT         49
#define ETH_RSS_FCOE_OTHER_SHIFT      50
#define ETH_RSS_L2_PAYLOAD_SHIFT      63

to

/* Supported RSS offloads */
/* for 1G & 10G */
#define ETH_FLOW_TYPE_IPV4               0
#define ETH_FLOW_TYPE_IPV4_TCP           1
#define ETH_FLOW_TYPE_IPV6               2
#define ETH_FLOW_TYPE_IPV6_EX            3
#define ETH_FLOW_TYPE_IPV6_TCP           4
#define ETH_FLOW_TYPE_IPV6_TCP_EX        5
#define ETH_FLOW_TYPE_IPV4_UDP           6
#define ETH_FLOW_TYPE_IPV6_UDP           7
#define ETH_FLOW_TYPE_IPV6_UDP_EX        8
/* for 40G only */
#define ETH_FLOW_TYPE_NONFRAG_IPV4_UDP   9
#define ETH_FLOW_TYPE_NONFRAG_IPV4_TCP   10
#define ETH_FLOW_TYPE_NONFRAG_IPV4_SCTP  11
#define ETH_FLOW_TYPE_NONFRAG_IPV4_OTHER 12
#define ETH_FLOW_TYPE_FRAG_IPV4          13
#define ETH_FLOW_TYPE_NONFRAG_IPV6_UDP   14
#define ETH_FLOW_TYPE_NONFRAG_IPV6_TCP   15
#define ETH_FLOW_TYPE_NONFRAG_IPV6_SCTP  16
#define ETH_FLOW_TYPE_NONFRAG_IPV6_OTHER 17
#define ETH_FLOW_TYPE_FRAG_IPV6          18
#define ETH_FLOW_TYPE_L2_PAYLOAD         19
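
As a rough sketch of how such dense flow type indexes could still be combined into a bit mask, assuming that is how they would be consumed: the two macro values below are copied from the list above, while `demo_flow_type_mask` and `demo_flow_type_enabled` are hypothetical helpers, not proposed API.

```c
#include <stdint.h>

/* Two of the proposed flow type indexes, values as listed above. */
#define ETH_FLOW_TYPE_IPV4_TCP    1
#define ETH_FLOW_TYPE_L2_PAYLOAD  19

/* Hypothetical helper: turn a flow type index into a one-bit mask. */
static uint64_t demo_flow_type_mask(unsigned flow_type)
{
    return 1ULL << flow_type;
}

/* Hypothetical helper: test whether a mask enables a flow type. */
static int demo_flow_type_enabled(uint64_t mask, unsigned flow_type)
{
    return (mask & demo_flow_type_mask(flow_type)) != 0;
}
```

Because the proposed indexes run from 0 to 19 without gaps, such a mask would fit in 32 bits, whereas the current shift values go up to 63.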

Any comments or better ideas on that? Thanks!

Regards,
Helin
> -Original Message-
> From: Thomas Monjalon [mailto:thomas.monjalon at 6wind.com]
> Sent: Monday, November 3, 2014 3:57 PM
> To: Zhang, Helin
> Cc: dev at dpdk.org
> Subject: Re: [dpdk-dev] [PATCH v5 2/5] ethdev: add enum type and relevant
> structures for hash filter control
> 
> 2014-10-21 11:14, Helin Zhang:
> > +enum rte_eth_hash_filter_info_type {
> > +   RTE_ETH_HASH_FILTER_INFO_TYPE_UNKNOWN = 0,
> > +   RTE_ETH_HASH_FILTER_INFO_TYPE_SYM_HASH_ENA_PER_PCTYPE,
> 
> PCTYPE is an unknown word in the API layer.
> Could you replace it by something more generic?
> 
> > +   RTE_ETH_HASH_FILTER_INFO_TYPE_SYM_HASH_ENA_PER_PORT,
> > +   RTE_ETH_HASH_FILTER_INFO_TYPE_FILTER_SWAP,
> > +   RTE_ETH_HASH_FILTER_INFO_TYPE_HASH_FUNCTION,
> > +   RTE_ETH_HASH_FILTER_INFO_TYPE_MAX,
> > +};
> 
> You should comment each constant.
> 
> > +struct rte_eth_sym_hash_ena_info {
> > +   /**< packet classification type, defined in rte_ethdev.h */
> > +   uint8_t pctype;
> 
> No, PCTYPE is not anymore defined in ethdev.
> 
> > +/**
> > + * A structure used to set or get filter swap information, to support
> > + * 'RTE_ETH_FILTER_HASH', 'RTE_ETH_FILTER_GET/RTE_ETH_FILTER_SET',
> > + * with information type 'RTE_ETH_HASH_FILTER_INFO_TYPE_FILTER_SWAP'.
> > + */
> > +struct rte_eth_filter_swap_info {
> > +   /**< Packet classification type, defined in rte_ethdev.h */
> > +   uint8_t pctype;
> > +   /**< Offset of the 1st field of the 1st couple to be swapped. */
> > +   uint8_t off0_src0;
> > +   /**< Offset of the 2nd field of the 1st couple to be swapped. */
> > +   uint8_t off0_src1;
> > +   /**< Field length of the first couple. */
> > +   uint8_t len0;
> > +   /**< Offset of the 1st field of the 2nd couple to be swapped. */
> > +   uint8_t off1_src0;
> > +   /**< Offset of the 2nd field of the 2nd couple to be swapped. */
> > +   uint8_t off1_src1;
> > +   /**< Field length of the second couple. */
> > +   uint8_t len1;
> > +};
> 
> I guess it would be easier to understand if
> RTE_ETH_HASH_FILTER_INFO_TYPE_FILTER_SWAP was defined previously.
> 
> --
> Thomas


[dpdk-dev] [PATCH v6 0/3] app/test: unit test to measure cycles per packet

2014-11-11 Thread Liang, Cunming
Hi Thomas,

Gentle reminder, in case you have too many mails to process.

-Liang Cunming

> -Original Message-
> From: dev [mailto:dev-bounces at dpdk.org] On Behalf Of Liang, Cunming
> Sent: Wednesday, October 29, 2014 1:06 PM
> To: Thomas Monjalon
> Cc: dev at dpdk.org
> Subject: Re: [dpdk-dev] [PATCH v6 0/3] app/test: unit test to measure cycles per
> packet
> 
> Hi Thomas,
> 
> All the open issues from the former patches are closed.
> Could you please have a look and get it applied ?
> 
> -Liang Cunming
> 
> > -Original Message-
> > From: Liang, Cunming
> > Sent: Monday, October 27, 2014 9:20 AM
> > To: dev at dpdk.org
> > Cc: nhorman at tuxdriver.com; Ananyev, Konstantin; Richardson, Bruce; De Lara
> > Guarch, Pablo; Liang, Cunming
> > Subject: [PATCH v6 0/3] app/test: unit test to measure cycles per packet
> >
> > v6 update:
> > # leave FUNC_PTR_OR_*_RET unmodified
> >
> > v5 update:
> > # fix the confusing of retval in some API of rte_ethdev
> >
> > v4 ignore
> >
> > v3 update:
> > # Codes refine according to the feedback.
> >   1. add ether_format_addr to rte_ether.h
> >   2. fix typo in code comments.
> >   3. %lu to %PRIu64, fixing 32-bit targets compilation err
> > # merge 2 small incremental patches to the first one.
> >   The whole unit test as a single patch in [PATCH v3 2/2]
> > # rebase code to the latest master
> >
> > v2 update:
> > Rebase code to the latest master branch.
> >
> > It provides a unit test to measure cycles/packet in NIC loopback mode.
> > It simply gives the average cycles of IO used per packet without test
> > equipment.
> > When doing the test, make sure the link is UP.
> >
> > Two stream control modes are supported: continuous and burst.
> > The former keeps forwarding the injected packets until a certain
> > number of packets is reached.
> > The latter stops when all the injected packets are received.
> > In burst stream mode, two situations are measured: with and without desc.
> > cache conflict.
> > By default, it runs in continuous stream mode to measure the whole rxtx.
> >
> > Usage Example:
> > 1. Run unit test app in interactive mode
> > app/test -c f -n 4 -- -i
> > 2. Set stream control mode, by default is continuous
> > set_rxtx_sc [continuous|poll_before_xmit|poll_after_xmit]
> > 3. If continuous stream is chosen, two more options can be configured
> > 3.1 choose rx/tx pair, default is vector
> > set_rxtx_mode [vector|scalar|full|hybrid]
> > Note: To get accurate scalar fast, please choose 'vector' or 'hybrid'
> > without INC_VEC=y in config
> > 3.2 choose the area of measurement, default is rxtx
> > set_rxtx_anchor [rxtx|rxonly|txonly]
> > 4. Run and wait for the result
> > pmd_perf_autotest
> >
> > For those who simply want to see how many cycles each packet costs:
> > compile DPDK, run 'app/test', and type 'pmd_perf_autotest', that's it.
> > Nothing else needs to be configured.
> > Use the other options once you understand them and want to measure more.
> >
> >
> > BTW, [1/3] is the same patch as below one.
> > http://dpdk.org/dev/patchwork/patch/817
> >
> > Cunming Liang (3):
> >   app/test: allow to create packets in different sizes
> >   app/test: measure the cost of rx/tx routines by cycle number
> >   ethdev: fix wrong error return refer to API definition
> >
> >  app/test/Makefile   |1 +
> >  app/test/commands.c |  111 +
> >  app/test/packet_burst_generator.c   |   26 +-
> >  app/test/packet_burst_generator.h   |   11 +-
> >  app/test/test.h |6 +
> >  app/test/test_link_bonding.c|   39 +-
> >  app/test/test_pmd_perf.c|  922
> > +++
> >  lib/librte_ether/rte_ethdev.c   |6 +-
> >  lib/librte_ether/rte_ether.h|   25 +
> >  lib/librte_pmd_ixgbe/ixgbe_ethdev.c |6 +
> >  10 files changed, 1117 insertions(+), 36 deletions(-)
> >  create mode 100644 app/test/test_pmd_perf.c
> >
> > --
> > 1.7.4.1



[dpdk-dev] building shared library

2014-11-11 Thread Newman Poborsky
Hi,

sure, here it is:
ldd libdpdk-api.so
linux-vdso.so.1 =>  (0x7fff3fffe000)
libc.so.6 => /lib/x86_64-linux-gnu/libc.so.6 (0x7f583dd99000)
/lib64/ld-linux-x86-64.so.2 (0x7f583e5d4000)
libpthread.so.0 => /lib/x86_64-linux-gnu/libpthread.so.0
(0x7f583db7a000)

This is a library built with Makefile that has the following options:
RTE_BUILD_SHARED_LIB=y
CFLAGS += -fPIC
LDLIBS += -lrte_eal -lrte_mbuf -lrte_cmdline -lrte_timer  -lrte_mempool
-lrte_ring  -lrte_pmd_ring -lethdev -lrte_malloc
include $(RTE_SDK)/mk/rte.extshared.mk

There are no missing libraries.

I also had to add the '-fPIC' flag to the Makefiles of all the rte_* libs above.
Is this the correct way to build a shared lib? Am I missing something?

When I build it as a regular dpdk app (like helloworld example) ldd output
is this:
ldd dpdk-api
linux-vdso.so.1 =>  (0x7fffacbfe000)
libm.so.6 => /lib/x86_64-linux-gnu/libm.so.6 (0x7ffe91b2b000)
librt.so.1 => /lib/x86_64-linux-gnu/librt.so.1 (0x7ffe91922000)
libdl.so.2 => /lib/x86_64-linux-gnu/libdl.so.2 (0x7ffe9171e000)
libc.so.6 => /lib/x86_64-linux-gnu/libc.so.6 (0x7ffe91139000)
/lib64/ld-linux-x86-64.so.2 (0x7ffe92042000)
libpcap.so.1 => /usr/local/lib/libpcap.so.1 (0x7ffe90ef8000)

Thank you for any help!

BR,
Newman P.


On Tue, Nov 11, 2014 at 4:28 AM, Chi, Xiaobo (NSN - CN/Hangzhou) <
xiaobo.chi at nsn.com> wrote:

> Hi,
> I am using a DPDK based shared lib, but never met such problems. Can you
> please share the result of "ldd x.so" and check whether all the
> libraries it depends on are available?
>
> brgs,
> chi xiaobo
>
> -----Original Message-----
> From: dev [mailto:dev-bounces at dpdk.org] On Behalf Of ext Newman Poborsky
> Sent: Monday, November 10, 2014 10:23 PM
> To: dev at dpdk.org
> Subject: [dpdk-dev] building shared library
>
> Hi,
>
> is it possible to build a  dpdk app as a shared library?
>
> I tried to put 'include $(RTE_SDK)/mk/rte.extshared.mk' in my Makefile
> (and
> define SHARED) and it builds .so lib, but all rte_* symbols are undefined.
>
> After that i tried adding:
> LDLIBS += -lrte_eal -lrte_mbuf -lrte_cmdline -lrte_timer  -lrte_mempool -lrte_ring  -lrte_pmd_ring -lethdev -lrte_malloc
>
> And now almost all symbols in .so file are defined (missing only
> rte_hexdump).
>
> I thought this was gonna be it. But after using this library, pci probe-ing
> fails since I don't have any pmd drivers registered, and
> rte_eth_dev_count() returns 0.
>
> But how are drivers supposed to be registered?
>
> When I use gdb with regular dpdk app (not shared library), I can see this:
> #0  0x0046fab0 in rte_eal_driver_register ()
> #1  0x00418fb7 in devinitfn_bond_drv ()
> #2  0x004f15ed in __libc_csu_init ()
> #3  0x76efee55 in __libc_start_main (main=0x41ee65 , argc=1,
> argv=0x7fffe4f8, init=0x4f15a0 <__libc_csu_init>, fini=,
> rtld_fini=, stack_end=0x7fffe4e8) at
> libc-start.c:246
> #4  0x0041953c in _start ()
>
>
> Ok, if I'm not mistaken, it seems driver registration is called before
> main. How is this accomplished? Cause in shared library build, I don't have
> this before main() and after rte_eal_init() (since driver list is empty)
> everything else fails.
>
> Any suggestions please? I'd really appreciate it...
>
> BR,
> Newman P.
>


[dpdk-dev] [PATCH 10/12] testpmd: rework csum forward engine

2014-11-11 Thread Liu, Jijiang
Hi Olivier,

The PKT_TX_VXLAN_CKSUM was not set in the patch, and VXLAN TX checksum offload 
would not work. 

Thanks
Jijiang Liu

> -----Original Message-----
> From: Olivier Matz [mailto:olivier.matz at 6wind.com]
> Sent: Monday, November 10, 2014 11:59 PM
> To: dev at dpdk.org
> Cc: olivier.matz at 6wind.com; Walukiewicz, Miroslaw; Liu, Jijiang; Liu, Yong;
> jigsaw at gmail.com; Richardson, Bruce; Ananyev, Konstantin
> Subject: [PATCH 10/12] testpmd: rework csum forward engine
> 
> The csum forward engine was becoming too complex to be used and extended
> (the next commits want to add the support of TSO):
> 
> - no explanation about what the code does
> - code is not factorized, lots of code duplicated, especially between
>   ipv4/ipv6
> - user command line api: use of bitmasks that need to be calculated by
>   the user
> - the user flags don't have the same semantics:
>   - for legacy IP/UDP/TCP/SCTP, it selects software or hardware checksum
>   - for other (vxlan), it selects between hardware checksum or no
> checksum
> - the code relies too much on flags set by the driver without software
>   alternative (ex: PKT_RX_TUNNEL_IPV4_HDR). It is nice to be able to
>   compare a software implementation with the hardware offload.
> 
> This commit tries to fix these issues, and provide a simple definition of 
> what is
> done by the forward engine:
> 
>  * Receive a burst of packets, and for supported packet types:
>  *  - modify the IPs
>  *  - reprocess the checksum in SW or HW, depending on testpmd command line
>  *    configuration
>  * Then packets are transmitted on the output port.
>  *
>  * Supported packets are:
>  *   Ether / (vlan) / IP|IP6 / UDP|TCP|SCTP .
>  *   Ether / (vlan) / IP|IP6 / UDP / VxLAN / Ether / IP|IP6 / UDP|TCP|SCTP
>  *
>  * The network parser supposes that the packet is contiguous, which may
>  * not be the case in real life.
> 
> Signed-off-by: Olivier Matz 
> ---
>  app/test-pmd/cmdline.c  | 151 ---
>  app/test-pmd/config.c   |  11 -
>  app/test-pmd/csumonly.c | 668 ++-
> -
>  app/test-pmd/testpmd.h  |  17 +-
>  4 files changed, 423 insertions(+), 424 deletions(-)
> 
> diff --git a/app/test-pmd/cmdline.c b/app/test-pmd/cmdline.c
> index 4c3fc76..0361e58 100644
> --- a/app/test-pmd/cmdline.c
> +++ b/app/test-pmd/cmdline.c
> @@ -310,19 +310,14 @@ static void cmd_help_long_parsed(void *parsed_result,
>   "Disable hardware insertion of a VLAN header in"
>   " packets sent on a port.\n\n"
> 
> - "tx_checksum set (mask) (port_id)\n"
> - "Enable hardware insertion of checksum offload with"
> - " the 8-bit mask, 0~0xff, in packets sent on a port.\n"
> - "bit 0 - insert ip   checksum offload if set\n"
> - "bit 1 - insert udp  checksum offload if set\n"
> - "bit 2 - insert tcp  checksum offload if set\n"
> - "bit 3 - insert sctp checksum offload if set\n"
> - "bit 4 - insert inner ip  checksum offload if set\n"
> - "bit 5 - insert inner udp checksum offload if set\n"
> - "bit 6 - insert inner tcp checksum offload if set\n"
> - "bit 7 - insert inner sctp checksum offload if set\n"
> + "tx_cksum set (ip|udp|tcp|sctp|vxlan) (hw|sw) (port_id)\n"
> + "Enable hardware calculation of checksum with when"
> + " transmitting a packet using 'csum' forward engine.\n"
>   "Please check the NIC datasheet for HW limits.\n\n"
> 
> + "tx_checksum show (port_id)\n"
> + "Display tx checksum offload configuration\n\n"
> +
>   "set fwd (%s)\n"
>   "Set packet forwarding mode.\n\n"
> 
> @@ -2738,48 +2733,131 @@ cmdline_parse_inst_t cmd_tx_vlan_reset = {
> 
> 
>  /* *** ENABLE HARDWARE INSERTION OF CHECKSUM IN TX PACKETS *** */
> -struct cmd_tx_cksum_set_result {
> +struct cmd_tx_cksum_result {
>   cmdline_fixed_string_t tx_cksum;
> - cmdline_fixed_string_t set;
> - uint8_t cksum_mask;
> + cmdline_fixed_string_t mode;
> + cmdline_fixed_string_t proto;
> + cmdline_fixed_string_t hwsw;
>   uint8_t port_id;
>  };
> 
>  static void
> -cmd_tx_cksum_set_parsed(void *parsed_result,
> +cmd_tx_cksum_parsed(void *parsed_result,
>  __attribute__((unused)) struct cmdline *cl,
>  __attribute__((unused)) void *data)  {
> - struct cmd_tx_cksum_set_result *res = parsed_result;
> + struct cmd_tx_cksum_result *res = parsed_result;
> + int hw = 0;
> + uint16_t ol_flags, mask = 0;
> + struct rte_eth_dev_info dev_info;
> +
> + if (port_id_is_invalid(res->port_id)) {
> + 
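
For reference, the "reprocess the checksum in SW" step named in the commit
message above comes down, for an IPv4 header, to the standard Internet
checksum (RFC 1071). The sketch below is not the code from the patch (the
diff is cut off above); it is a minimal standalone illustration, and the
name demo_ipv4_hdr_cksum is hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Software Internet checksum (RFC 1071) over an IPv4 header.
 * The checksum field (bytes 10-11) must be zero when computing a new value.
 * Returns the checksum in host byte order; store it big-endian in the header.
 * Hypothetical helper for illustration, not a DPDK or testpmd function.
 */
uint16_t demo_ipv4_hdr_cksum(const uint8_t *hdr, size_t len)
{
	uint32_t sum = 0;
	size_t i;

	/* An IPv4 header is 20 to 60 bytes long, in multiples of 4 bytes. */
	assert(len >= 20 && len <= 60 && (len % 4) == 0);

	/* Sum the header as big-endian 16-bit words. */
	for (i = 0; i < len; i += 2)
		sum += ((uint32_t)hdr[i] << 8) | hdr[i + 1];

	/* Fold the carries back into the low 16 bits. */
	while (sum >> 16)
		sum = (sum & 0xffff) + (sum >> 16);

	/* The checksum is the one's complement of the folded sum. */
	return (uint16_t)~sum;
}
```

Running the same function over a header whose checksum field is already
filled in returns 0 when the checksum is valid, which gives a simple software
reference to compare against what the NIC reports or inserts.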

[dpdk-dev] [PATCH v3] Add in_flight_bitmask so as to use full 32 bits of tag.

2014-11-11 Thread jigsaw
Hi Bruce,

This patch has little, if any, performance impact.
See the perf stat -d for original and patched version
of test_distributor_perf.

Original version
================

perf stat -d ./test_orig  -c -n2

=== Cache line switch test ===
Time for 1048576 iterations = 362349056 ticks
Ticks per iteration = 345

=== Performance test of distributor ===
Time per burst:  7043
Time per packet: 220

Worker 0 handled 3377372 packets
Worker 1 handled 2857280 packets
Worker 2 handled 2120982 packets
Worker 3 handled 2112720 packets
Worker 4 handled 2102014 packets
Worker 5 handled 2101314 packets
Worker 6 handled 2099248 packets
Worker 7 handled 2098560 packets
Worker 8 handled 2098114 packets
Worker 9 handled 2097962 packets
Worker 10 handled 2097892 packets
Worker 11 handled 2097856 packets
Worker 12 handled 2097762 packets
Worker 13 handled 2097726 packets
Worker 14 handled 2097630 packets
Total packets: 33554432 (2000000)
=== Perf test done ===


 Performance counter stats for './test_orig -c -n2 --no-huge':

    45784.935475 task-clock                #   12.847 CPUs utilized
            4822 context-switches          #    0.105 K/sec
               7 CPU-migrations            #    0.000 K/sec
            1017 page-faults               #    0.022 K/sec
    118181375409 cycles                    #    2.581 GHz                     [40.00%]
     88915381373 stalled-cycles-frontend   #   75.24% frontend cycles idle    [39.98%]
     77015854611 stalled-cycles-backend    #   65.17% backend  cycles idle    [39.96%]
     30469536186 instructions              #    0.26  insns per cycle
                                           #    2.92  stalled cycles per insn [49.96%]
      6481829773 branches                  #  141.571 M/sec                   [49.97%]
        45283365 branch-misses             #    0.70% of all branches         [50.00%]
      6537115556 L1-dcache-loads           #  142.779 M/sec                   [50.05%]
       249533128 L1-dcache-load-misses     #    3.82% of all L1-dcache hits   [50.08%]
       144469663 LLC-loads                 #    3.155 M/sec                   [40.05%]
        69897760 LLC-load-misses           #   48.38% of all LL-cache hits    [40.03%]

   3.563886431 seconds time elapsed


Patched version
===============

perf stat -d ./test_patch  -c -n2

=== Cache line switch test ===
Time for 1048576 iterations = 456390606 ticks
Ticks per iteration = 435

=== Performance test of distributor ===
Time per burst:  6566
Time per packet: 205

Worker 0 handled 2528418 packets
Worker 1 handled 2531906 packets
Worker 2 handled 2524133 packets
Worker 3 handled 2507720 packets
Worker 4 handled 2332648 packets
Worker 5 handled 2300046 packets
Worker 6 handled 2213035 packets
Worker 7 handled 2125486 packets
Worker 8 handled 2096726 packets
Worker 9 handled 2089150 packets
Worker 10 handled 2079626 packets
Worker 11 handled 2074512 packets
Worker 12 handled 2073560 packets
Worker 13 handled 2054838 packets
Worker 14 handled 2022628 packets
Total packets: 33554432 (2000000)
=== Perf test done ===


 Performance counter stats for './test_patch -c -n2 --no-huge':

    42784.064267 task-clock                #   12.546 CPUs utilized
            4517 context-switches          #    0.106 K/sec
               5 CPU-migrations            #    0.000 K/sec
            1020 page-faults               #    0.024 K/sec
    110531300647 cycles                    #    2.583 GHz                     [40.06%]
     83135557406 stalled-cycles-frontend   #   75.21% frontend cycles idle    [40.02%]
     71862086289 stalled-cycles-backend    #   65.02% backend  cycles idle    [39.99%]
     28847200612 instructions              #    0.26  insns per cycle
                                           #    2.88  stalled cycles per insn [49.98%]
      6059139104 branches                  #  141.621 M/sec                   [49.94%]
        38455463 branch-misses             #    0.63% of all branches         [49.95%]
      6090169906 L1-dcache-loads           #  142.347 M/sec                   [49.99%]
       235094118 L1-dcache-load-misses     #    3.86% of all L1-dcache hits   [50.03%]
       129974492 LLC-loads                 #    3.038 M/sec                   [40.07%]
        56701883 LLC-load-misses           #   43.63% of all LL-cache hits    [40.08%]

   3.410231367 seconds time elapsed


The patched version even runs a bit faster.

Perf annotation shows that 92% of the time is spent in the while loop
of rte_distributor_get_pkt.
Another 6% is spent calculating the match
inside rte_distributor_process.

We can also see a large number of LLC load misses, which is
perhaps due to cache conflicts
in the while loop of rte_distributor_get_pkt?

I'm not going to draw any conclusion here, but evidently the
in_flight_bitmask is not in the hot path and
is irrelevant to the performance of the distributor.

Thanks a lot for your code review and ack.

thx &
rgds,
-qinglai


On Mon, Nov 10, 2014 at 6:55 PM, Bruce Richardson <
bruce.richardson at i

[dpdk-dev] [PATCH 00/12] add TSO support

2014-11-11 Thread Olivier MATZ
This is the test report for the new TSO feature. Test done on testpmd
on x86_64-native-linuxapp-gcc

platform:

  Tester (linux)   <---->   DUT (DPDK on westmere)
      ixgbe6                    port0 (ixgbe)

Run testpmd on DUT:

  cd dpdk.org/
  make install T=x86_64-native-linuxapp-gcc
  cd x86_64-native-linuxapp-gcc/
  modprobe uio
  insmod kmod/igb_uio.ko
  python ../tools/dpdk_nic_bind.py -b igb_uio :02:00.0
  echo 0 > /proc/sys/kernel/randomize_va_space
  echo 1000 > /sys/devices/system/node/node0/hugepages/hugepages-2048kB/nr_hugepages
  echo 1000 > /sys/devices/system/node/node1/hugepages/hugepages-2048kB/nr_hugepages
  mount -t hugetlbfs none /mnt/huge
  ./app/testpmd -c 0x55 -n 4 -m 800 -- -i --port-topology=chained --enable-rx-cksum

Disable all offload features on Tester, and start capture:

  ethtool -K ixgbe6 rx off tx off tso off gso off gro off lro off
  ip l set ixgbe6 up
  tcpdump -n -e -i ixgbe6 -s 0 -w /tmp/cap

We use the following scapy script for testing (note: vxlan was not
tested because I have no i40e on my platform, but at least the test
scripts are provided if someone wants to check it):

from scapy.all import *  # needed when run as a standalone script

class VXLAN(Packet):
    name = 'VXLAN'
    fields_desc = [
        FlagsField('flags', default=1 << 3, size=8,
            names=['R', 'R', 'R', 'R', 'I', 'R', 'R', 'R']),
        XBitField('reserved1', default=0x00, size=24),
        BitField('vni', None, size=24),
        XBitField('reserved2', default=0x00, size=8),
    ]
    overload_fields = {
        UDP: {'sport': 4789, 'dport': 4789},
    }
    def mysummary(self):
        return self.sprintf("VXLAN (vni=%VXLAN.vni%)")

bind_layers(UDP, VXLAN, dport=4789)
bind_layers(VXLAN, Ether)

def test_v4(iface, macdst):
  macsrc = get_if_hwaddr(iface)
  v4 = Ether(dst=macdst, src=macsrc)/IP(src=RandIP(), dst=RandIP())
  # valid TCP packet
  p=v4/TCP(flags=0x10)/Raw(RandString(50))
  sendp(p, iface=iface, count=5)
  # valid UDP packet
  p=v4/UDP()/Raw(RandString(50))
  sendp(p, iface=iface, count=5)
  # bad IP checksum
  p=v4/TCP(flags=0x10)/Raw(RandString(50))
  p[IP].chksum=0x1234
  sendp(p, iface=iface, count=5)
  # bad TCP checksum
  p=v4/TCP(flags=0x10, chksum=0x1234)/Raw(RandString(50))
  sendp(p, iface=iface, count=5)
  # large packet
  p=v4/TCP(flags=0x10)/Raw(RandString(1400))
  sendp(p, iface=iface, count=5)

def test_v6(iface, macdst):
  macsrc = get_if_hwaddr(iface)
  v6 = Ether(dst=macdst, src=macsrc)/IPv6(src=RandIP6(), dst=RandIP6())
  # checksum TCP
  p=v6/TCP(flags=0x10)/Raw(RandString(50))
  sendp(p, iface=iface, count=5)
  # checksum UDP
  p=v6/UDP()/Raw(RandString(50))
  sendp(p, iface=iface, count=5)
  # bad TCP checksum
  p=v6/TCP(flags=0x10, chksum=0x1234)/Raw(RandString(50))
  sendp(p, iface=iface, count=5)
  # large packet
  p=v6/TCP(flags=0x10)/Raw(RandString(1400))
  sendp(p, iface=iface, count=5)

def test_vxlan(iface, macdst):
  macsrc = get_if_hwaddr(iface)
  vxlan = Ether(dst=macdst, src=macsrc)/IP(src=RandIP(), dst=RandIP())
  vxlan /= UDP()/VXLAN(vni=1234)/Ether(dst=macdst, src=macsrc)
  vxlan /= IP(src=RandIP(), dst=RandIP())
  # valid packet
  p=vxlan/TCP(flags=0x10)/Raw(RandString(50))
  sendp(p, iface=iface, count=5)
  # bad IP checksum
  p=vxlan/TCP(flags=0x10)/Raw(RandString(50))
  p[IP].payload[IP].chksum=0x1234 # inner header
  sendp(p, iface=iface, count=5)
  # bad TCP checksum
  p=vxlan/TCP(flags=0x10, chksum=0x1234)/Raw(RandString(50))
  sendp(p, iface=iface, count=5)
  # large TCP packet, no UDP checksum on outer
  p=vxlan/TCP(flags=0x10)/Raw(RandString(1400))
  p[UDP].chksum = 0
  sendp(p, iface=iface, count=5)

test_v4("ixgbe6", "00:1B:21:8E:B2:30")
test_v6("ixgbe6", "00:1B:21:8E:B2:30")
test_vxlan("ixgbe6", "00:1B:21:8E:B2:30")
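
For readers who want to check the captures against the DPDK side, the VXLAN
class in the script above describes an 8-byte header: 8 flag bits (with the
"I" bit, 1 << 3, set by default), 24 reserved bits, a 24-bit VNI and 8 more
reserved bits. A rough C equivalent is sketched below; the demo_* names are
hypothetical and are not part of DPDK or testpmd.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define DEMO_VXLAN_HDR_LEN 8
#define DEMO_VXLAN_FLAG_I  0x08 /* "VNI present" bit, 1 << 3 as in the script */

/* Fill an 8-byte VXLAN header: flags, 24 reserved bits, 24-bit VNI, 8 reserved bits. */
void demo_vxlan_hdr_build(uint8_t hdr[DEMO_VXLAN_HDR_LEN], uint32_t vni)
{
	assert(vni <= 0xffffff); /* the VNI field is only 24 bits wide */

	memset(hdr, 0, DEMO_VXLAN_HDR_LEN);
	hdr[0] = DEMO_VXLAN_FLAG_I;
	hdr[4] = (uint8_t)(vni >> 16); /* VNI is stored big-endian in bytes 4..6 */
	hdr[5] = (uint8_t)(vni >> 8);
	hdr[6] = (uint8_t)vni;
}

/* Read the 24-bit VNI back out of a VXLAN header. */
uint32_t demo_vxlan_hdr_vni(const uint8_t hdr[DEMO_VXLAN_HDR_LEN])
{
	return ((uint32_t)hdr[4] << 16) | ((uint32_t)hdr[5] << 8) | hdr[6];
}
```

With vni=1234 as in test_vxlan(), the header bytes are 08 00 00 00 00 04 d2 00,
which is what should appear right after the outer UDP header in the capture.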

Test 1: rxonly fwd engine
=========================

Check that the NIC is able to decode the packet header and the bad
checksum values. The test_vxlan does not work on ixgbe as it is not able
to recognize vxlan packets.

testpmd command lines:

  set fwd rxonly
  set verbose 1
  start

Start test_v4() in scapy. Result is:

port 0/queue 0: received 1 packets
  src=90:E2:BA:2B:0F:4C - dst=00:1B:21:8E:B2:30 - type=0x0800 -
length=104 - nb_segs=1 - Receive queue=0x0
  PKT_RX_IPV4_HDR
port 0/queue 0: received 1 packets
  src=90:E2:BA:2B:0F:4C - dst=00:1B:21:8E:B2:30 - type=0x0800 -
length=104 - nb_segs=1 - Receive queue=0x0
  PKT_RX_IPV4_HDR
port 0/queue 0: received 1 packets
  src=90:E2:BA:2B:0F:4C - dst=00:1B:21:8E:B2:30 - type=0x0800 -
length=104 - nb_segs=1 - Receive queue=0x0
  PKT_RX_IPV4_HDR
port 0/queue 0: received 1 packets
  src=90:E2:BA:2B:0F:4C - dst=00:1B:21:8E:B2:30 - type=0x0800 -
length=104 - nb_segs=1 - Receive queue=0x0
  PKT_RX_IPV4_HDR
port 0/queue 0: received 1 packets
  src=90:E2:BA:2B:0F:4C - dst=00:1B:21:8E:B2:30 - type=0x0800 -
length=104 - nb_segs=1 - Receive queue=0x0
  PKT_RX_IPV4_HDR
port 0/queue 0: received 1 packets
  src=90:E2:BA:2B:0F:4C - dst=00:1B:21:8E:B2:30 - type=0x0800 -
length=92 - nb_segs=1 - Receive queue=0x0
  PKT_RX_IPV4_HDR
port 0/

[dpdk-dev] [PATCH 00/12] add TSO support

2014-11-11 Thread Olivier MATZ
On 11/11/2014 10:21 AM, Olivier MATZ wrote:
> Check the capture file (test2-cap-sw-cksum.cap)

> Check the capture file (test3-cap-hw-cksum.cap)

> Check the capture file (test4-cap-tso.cap)

Sorry, the attachments are automatically stripped by the list,
you can find them here:
https://www.droids-corp.org/~zer0/dpdk-tso-cap/

Olivier


[dpdk-dev] [PATCH 10/12] testpmd: rework csum forward engine

2014-11-11 Thread Olivier MATZ
Hi Jijiang,

On 11/11/2014 09:35 AM, Liu, Jijiang wrote:
> The PKT_TX_VXLAN_CKSUM was not set in the patch, and VXLAN TX checksum 
> offload would not work. 

Thank you for reporting this. Indeed, there is an issue. See below.

>> +/* Calculate the checksum of outer header (only vxlan is supported,
>> + * meaning IP + UDP). The caller already checked that it's a vxlan
>> + * packet */
>> +static uint64_t
>> +process_outer_cksums(void *outer_l3_hdr, uint16_t outer_ethertype,
>> +uint16_t outer_l3_len, uint16_t testpmd_ol_flags) {
>> +struct ipv4_hdr *ipv4_hdr = outer_l3_hdr;
>> +struct ipv6_hdr *ipv6_hdr = outer_l3_hdr;
>> +struct udp_hdr *udp_hdr;
>> +uint64_t ol_flags = 0;
>> +
>> +if (testpmd_ol_flags & TESTPMD_TX_OFFLOAD_VXLAN_CKSUM)
>> +ol_flags |= PKT_TX_IP_CKSUM;

Here it should be: ol_flags |= PKT_TX_VXLAN_CKSUM

I'll fix that in the next version.

Regards,
Olivier
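
To make the fix discussed above concrete: the user-level testpmd option for
VXLAN checksum offload has to be translated into the mbuf VXLAN flag, not
the plain IP checksum flag. The sketch below only illustrates that mapping.
All DEMO_* constants use placeholder bit values chosen for this example; the
real values are defined in the testpmd and rte_mbuf headers and are different.

```c
#include <assert.h>
#include <stdint.h>

/* Placeholder bit values for this sketch only (assumptions, not DPDK's). */
#define DEMO_TESTPMD_TX_OFFLOAD_VXLAN_CKSUM 0x0010
#define DEMO_PKT_TX_IP_CKSUM                (1ULL << 0)
#define DEMO_PKT_TX_VXLAN_CKSUM             (1ULL << 1)

/* Translate the user-selected testpmd option into per-packet TX offload flags. */
uint64_t demo_outer_tx_flags(uint16_t testpmd_ol_flags)
{
	uint64_t ol_flags = 0;

	if (testpmd_ol_flags & DEMO_TESTPMD_TX_OFFLOAD_VXLAN_CKSUM)
		ol_flags |= DEMO_PKT_TX_VXLAN_CKSUM; /* was PKT_TX_IP_CKSUM in the reported bug */

	/* Regression guard: this path must never request the plain IP checksum flag. */
	assert((ol_flags & DEMO_PKT_TX_IP_CKSUM) == 0);

	return ol_flags;
}
```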


[dpdk-dev] [PATCH v8] eal: map PCI memory resources after hugepages

2014-11-11 Thread Anatoly Burakov
A multi-process DPDK application must mmap hugepages and PCI resources
into the same virtual address space. By default the virtual addresses
are chosen by the primary process automatically when calling mmap.
But sometimes the chosen virtual addresses aren't usable in the secondary
process - for example, when the secondary process is linked with more
libraries than the primary process, and one of those libraries occupies the
same address space that the primary process has requested for PCI mappings.

This patch makes EAL try and map PCI BARs right after the hugepages
(instead of location chosen by mmap) in virtual memory, so that PCI BARs
have less chance of ending up in random places in virtual memory.

Signed-off-by: Liang Xu 
Signed-off-by: Anatoly Burakov 
---
 lib/librte_eal/linuxapp/eal/eal_pci.c  | 30 --
 lib/librte_eal/linuxapp/eal/eal_pci_uio.c  | 13 --
 lib/librte_eal/linuxapp/eal/eal_pci_vfio.c | 19 +++---
 lib/librte_eal/linuxapp/eal/include/eal_pci_init.h |  6 +
 4 files changed, 55 insertions(+), 13 deletions(-)

diff --git a/lib/librte_eal/linuxapp/eal/eal_pci.c b/lib/librte_eal/linuxapp/eal/eal_pci.c
index 5fe3961..79fbbb8 100644
--- a/lib/librte_eal/linuxapp/eal/eal_pci.c
+++ b/lib/librte_eal/linuxapp/eal/eal_pci.c
@@ -97,6 +97,25 @@ error:
return -1;
 }

+void *
+pci_find_max_end_va(void)
+{
+   const struct rte_memseg *seg = rte_eal_get_physmem_layout();
+   const struct rte_memseg *last = seg;
+   unsigned i = 0;
+
+   for (i = 0; i < RTE_MAX_MEMSEG; i++, seg++) {
+   if (seg->addr == NULL)
+   break;
+
+   if (seg->addr > last->addr)
+   last = seg;
+
+   }
+   return RTE_PTR_ADD(last->addr, last->len);
+}
+
+
 /* map a particular resource from a file */
 void *
 pci_map_resource(void *requested_addr, int fd, off_t offset, size_t size)
@@ -106,21 +125,16 @@ pci_map_resource(void *requested_addr, int fd, off_t offset, size_t size)
/* Map the PCI memory resource of device */
mapaddr = mmap(requested_addr, size, PROT_READ | PROT_WRITE,
MAP_SHARED, fd, offset);
-   if (mapaddr == MAP_FAILED ||
-   (requested_addr != NULL && mapaddr != requested_addr)) {
+   if (mapaddr == MAP_FAILED) {
RTE_LOG(ERR, EAL, "%s(): cannot mmap(%d, %p, 0x%lx, 0x%lx): %s 
(%p)\n",
__func__, fd, requested_addr,
(unsigned long)size, (unsigned long)offset,
strerror(errno), mapaddr);
-   goto fail;
+   } else {
+   RTE_LOG(DEBUG, EAL, "  PCI memory mapped at %p\n", mapaddr);
}

-   RTE_LOG(DEBUG, EAL, "  PCI memory mapped at %p\n", mapaddr);
-
return mapaddr;
-
-fail:
-   return NULL;
 }

 /* parse the "resource" sysfs file */
diff --git a/lib/librte_eal/linuxapp/eal/eal_pci_uio.c b/lib/librte_eal/linuxapp/eal/eal_pci_uio.c
index 7e62266..e53f06b 100644
--- a/lib/librte_eal/linuxapp/eal/eal_pci_uio.c
+++ b/lib/librte_eal/linuxapp/eal/eal_pci_uio.c
@@ -35,6 +35,7 @@
 #include 
 #include 
 #include 
+#include 

 #include 
 #include 
@@ -48,6 +49,8 @@

 static int pci_parse_sysfs_value(const char *filename, uint64_t *val);

+void *pci_map_addr = NULL;
+

 #define OFF_MAX  ((uint64_t)(off_t)-1)
 static int
@@ -371,10 +374,16 @@ pci_uio_map_resource(struct rte_pci_device *dev)
if (maps[j].addr != NULL)
fail = 1;
else {
-   mapaddr = pci_map_resource(NULL, fd, 
(off_t)offset,
+   /* try mapping somewhere close to the end of 
hugepages */
+   if (pci_map_addr == NULL)
+   pci_map_addr = pci_find_max_end_va();
+
+   mapaddr = pci_map_resource(pci_map_addr, fd, 
(off_t)offset,
(size_t)maps[j].size);
-   if (mapaddr == NULL)
+   if (mapaddr == MAP_FAILED)
fail = 1;
+
+   pci_map_addr = RTE_PTR_ADD(mapaddr, (size_t) 
maps[j].size);
}

if (fail) {
diff --git a/lib/librte_eal/linuxapp/eal/eal_pci_vfio.c b/lib/librte_eal/linuxapp/eal/eal_pci_vfio.c
index c776ddc..c1246e8 100644
--- a/lib/librte_eal/linuxapp/eal/eal_pci_vfio.c
+++ b/lib/librte_eal/linuxapp/eal/eal_pci_vfio.c
@@ -37,6 +37,7 @@
 #include 
 #include 
 #include 
+#include 

 #include 
 #include 
@@ -720,10 +721,22 @@ pci_vfio_map_resource(struct rte_pci_device *dev)
if (i == msix_bar)
continue;

-   bar_addr = pci_map_resource(maps[i].addr, vfio_dev_fd, 
reg.offset,
-   reg.size);
+   
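
The idea behind pci_find_max_end_va() in the patch above can be tried outside
of DPDK with a small stand-in. The sketch below assumes a simplified segment
descriptor (struct demo_memseg, hypothetical, holding only an address and a
length) in place of struct rte_memseg, and returns the first address past the
highest mapped segment. Note that passing such an address to mmap() without
MAP_FIXED is only a hint, which is presumably why the patch no longer treats
"mapaddr != requested_addr" as an error.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for struct rte_memseg: only the two fields used here. */
struct demo_memseg {
	void *addr;  /* start virtual address; NULL marks the first unused slot */
	size_t len;  /* length of the segment in bytes */
};

/* Return the address just past the end of the highest-addressed segment. */
void *demo_find_max_end_va(const struct demo_memseg *segs, size_t nb_segs)
{
	const struct demo_memseg *last = segs;
	size_t i;

	assert(segs != NULL && nb_segs > 0);

	for (i = 0; i < nb_segs; i++) {
		if (segs[i].addr == NULL)
			break; /* remaining slots are unused */
		if ((uintptr_t)segs[i].addr > (uintptr_t)last->addr)
			last = &segs[i];
	}

	return (void *)((uintptr_t)last->addr + last->len);
}
```

Each subsequent BAR would then be requested at the previous mapping's address
plus its size, as the uio hunk does with pci_map_addr.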

[dpdk-dev] Community conference call - Tuesday 18th November

2014-11-11 Thread O'driscoll, Tim
We're going to hold our next community conference call a week from today -
Tuesday 18th November, at 4:00pm in Ireland/UK. Here's the time in a variety
of timezones:

Dublin (Ireland)                      Tuesday, November 18, 2014 at  4:00:00 PM  GMT  UTC
San Francisco (U.S.A. - California)   Tuesday, November 18, 2014 at  8:00:00 AM  PST  UTC-8 hours
Phoenix (U.S.A. - Arizona)            Tuesday, November 18, 2014 at  9:00:00 AM  MST  UTC-7 hours
New York (U.S.A. - New York)          Tuesday, November 18, 2014 at 11:00:00 AM  EST  UTC-5 hours
Ottawa (Canada - Ontario)             Tuesday, November 18, 2014 at 11:00:00 AM  EST  UTC-5 hours
Paris (France)                        Tuesday, November 18, 2014 at  5:00:00 PM  CET  UTC+1 hour
Tel Aviv (Israel)                     Tuesday, November 18, 2014 at  6:00:00 PM  IST  UTC+2 hours
Moscow (Russia)                       Tuesday, November 18, 2014 at  7:00:00 PM  MSK  UTC+3 hours
Corresponding UTC (GMT)               Tuesday, November 18, 2014 at 16:00:00

I'll provide conference bridge numbers later, but I wanted to communicate the 
date and time now.


Tim


[dpdk-dev] building shared library

2014-11-11 Thread Gonzalez Monroy, Sergio
Hi  Newman,

> From: dev [mailto:dev-bounces at dpdk.org] On Behalf Of Newman Poborsky
> Sent: Monday, November 10, 2014 2:23 PM
> 
> Hi,
> 
> is it possible to build a  dpdk app as a shared library?
> 
> I tried to put 'include $(RTE_SDK)/mk/rte.extshared.mk' in my Makefile (and
> define SHARED) and it builds .so lib, but all rte_* symbols are undefined.
> 
Can you elaborate a bit on how you are building DPDK and your app?
Is your objective to build a single .so containing your app and all DPDK libs?
Or do you want your app to have a link dependency on DPDK shared libs?



[dpdk-dev] building shared library

2014-11-11 Thread Newman Poborsky
Hi,

I want to build one .so file with my app (it contains API that I want to
call through JNI) and all DPDK libs that I use in my app.

As I've already mentioned, when I build and start my dpdk app as a
standalone application, I can see that before main() is called, there is a
call to 'rte_eal_driver_register()' function for every driver. When I build
.so file, this does not happen and no driver is registered so everything
after rte_eal_init() fails.


BR,
Newman

On Tue, Nov 11, 2014 at 11:37 AM, Gonzalez Monroy, Sergio <
sergio.gonzalez.monroy at intel.com> wrote:

> Hi  Newman,
>
> > From: dev [mailto:dev-bounces at dpdk.org] On Behalf Of Newman Poborsky
> > Sent: Monday, November 10, 2014 2:23 PM
> >
> > Hi,
> >
> > is it possible to build a  dpdk app as a shared library?
> >
> > I tried to put 'include $(RTE_SDK)/mk/rte.extshared.mk' in my Makefile
> (and
> > define SHARED) and it builds .so lib, but all rte_* symbols are
> undefined.
> >
> Can you elaborate a bit on how you are building DPDK and your app?
> Is your objective to build a single .so containing your app and all DPDK
> libs?
> Or do you want your app to have a link dependency on DPDK shared libs?
>
>


[dpdk-dev] Ports not detected by IGB_UIO in DPDK 1.7.1 in QEMU_KVM environment

2014-11-11 Thread Manoj Viswanath
Bruce,

Thanks for the input.
Sure, will figure out the offending file behind this error and update this
thread.

Meanwhile, I wanted to share one more observation regarding this issue:
The "file descriptor error" is NOT SEEN with DPDK 1.7.0 (dpdk-1.7.0.tar.gz)
when explicitly binding NICs to IGB_UIO using the .py script.

The issue is only seen when using DPDK 1.7.1 (dpdk-1.7.1.tar.gz). I hope the
version I am using is the last official tag in the 1.7.x tree.

I see that this problem ("file descriptor error") in DPDK 1.7.1 and its
non-occurrence in DPDK 1.7.0 has already been reported in this group on Sep
11 by zimeiw in a mail with the following subject line: "There
are a lot of error log when run l3fwd of dpdk-1.7.1". I couldn't find any
responses to that thread.

Regards,
Manoj

On Mon, Nov 10, 2014 at 4:28 PM, Bruce Richardson <
bruce.richardson at intel.com> wrote:

> On Fri, Nov 07, 2014 at 11:26:08PM +0530, Manoj Viswanath wrote:
> > Hi Bruce,
> >
> > Please find my comments inline.
> >
> > On Fri, Nov 7, 2014 at 9:00 PM, Bruce Richardson <
> bruce.richardson at intel.com
> > > wrote:
> >
> > > On Fri, Nov 07, 2014 at 08:31:34PM +0530, Manoj Viswanath wrote:
> > > > Hi Bruce,
> > > >
> > > > I was not doing anything specific for binding the NICs to IGB_UIO
> (like
> > > > invoking "dpdk_nic_bind.py" script explicitly) when using my
> application
> > > > with DPDK 1.6.0. The e1000 devices assigned via virt-manager to the
> VM
> > > were
> > > > automatically getting picked up and initialized by IGB_UIO within
> each
> > > VM.
> > > >
> > > > The same is not working with DPDK 1.7.1 now.
> > > >
> > > > I tried exporting the "dpdk_nic_bind.py" script into my VM (running
> DPDK
> > > > 1.7.1) and tried to check the status. The emulated devices were
> shown as
> > > > neither bound to kernel nor to IGB_UIO as evident from below output:-
> > > >
> > > >
> > >
> <--->
> > > > Network devices using DPDK-compatible driver
> > > > 
> > > > 
> > > >
> > > > Network devices using kernel driver
> > > > ===
> > > > :00:03.0 'Virtio network device' if= drv=virtio-pci
> unused=igb_uio
> > > >
> > > > Other network devices
> > > > =
> > > > :00:04.0 '82540EM Gigabit Ethernet Controller' unused=igb_uio
> > > > :00:05.0 '82540EM Gigabit Ethernet Controller' unused=igb_uio
> > > >
> > >
> <--->
> > > >
> > > > When i tried to forcefully bind the NICs using the "--bind=igb_uio"
> > > option
> > >
> > > Was there any output of the dpdk_nic_bind script? What does the output
> of
> > > it with --status show afterwards?
> > >
> > >
> > [MANOJ]
> >
> > Yes. Please refer below output:-
> > 
> > Network devices using DPDK-compatible driver
> > 
> > :00:04.0 '82540EM Gigabit Ethernet Controller' drv=igb_uio unused=
> > :00:05.0 '82540EM Gigabit Ethernet Controller' drv=igb_uio unused=
> >
> > Network devices using kernel driver
> > ===
> > :00:03.0 'Virtio network device' if= drv=virtio-pci unused=igb_uio
> >
> > Other network devices
> > =
> >
> > However, when I start the DPDK application, I am getting the error log as
> > indicated in earlier mail.
> >
> > The difference with DPDK 1.6.0 is that at the same stage IGB_UIO has
> > already bound the assigned devices without having to explicitly run the
> > "dpdk_nic_bind.py". Please find below the application log when run with
> > DPDK 1.6.0:-
> >
> > Network devices using DPDK-compatible driver
> > 
> > :00:04.0 '82540EM Gigabit Ethernet Controller' drv=igb_uio unused=
> > :00:08.0 '82540EM Gigabit Ethernet Controller' drv=igb_uio unused=
> >
> > Network devices using kernel driver
> > ===
> > :00:03.0 'Virtio network device' if= drv=virtio-pci unused=igb_uio
> >
> > Other network devices
> > =
> > 
> >
> > Kindly note that in both cases, logs have been taken after loading IGB_UIO
> > prior to starting DPDK application.
> >
> > [/MANOJ]
> >
> > Regards,
>
> Ok, so it appears that after running dpdk_nic_bind to bind the devices to
> igb_uio
> the differences between 1.6 and 1.7 are resolved for that part. The reason
> why you explicitly need to bind the dev

[dpdk-dev] building shared library

2014-11-11 Thread Sergio Gonzalez Monroy
On Tue, Nov 11, 2014 at 01:10:29PM +0100, Newman Poborsky wrote:
> Hi,
> 
> I want to build one .so file with my app (it contains API that I want to
> call through JNI) and all DPDK libs that I use in my app.
> 
> As I've already mentioned, when I build and start my dpdk app as a
> standalone application, I can see that before main() is called, there is a
> call to 'rte_eal_driver_register()' function for every driver. When I build
> .so file, this does not happen and no driver is registered so everything
> after rte_eal_init() fails.
> 
Hi Newman,

AFAIK the current build system does not support that.

You can build DPDK as shared libs by setting the following config option:
CONFIG_RTE_BUILD_SHARED_LIB=y

Then build your app as an .so that links against DPDK libs, so you have 
explicit dependencies (such dependencies should show with ldd).

Is there any reason why you want everything to be a single .so ?

I don't know much about how Java loads DSOs but I reckon that it must resolve
explicit dependencies such as libc.

Thanks,
Sergio


> 
> BR,
> Newman
> 


[dpdk-dev] building shared library

2014-11-11 Thread Newman Poborsky
Hi Sergio,

no, that sounds good, thank you.  Since I'm not that familiar with DPDK
build system, where should this option be set? In 'lib' folder's Makefile?

Thank you once again!

BR,
Newman

On Tue, Nov 11, 2014 at 3:18 PM, Sergio Gonzalez Monroy <
sergio.gonzalez.monroy at intel.com> wrote:

> On Tue, Nov 11, 2014 at 01:10:29PM +0100, Newman Poborsky wrote:
> > Hi,
> >
> > I want to build one .so file with my app (it contains API that I want to
> > call through JNI) and all DPDK libs that I use in my app.
> >
> > As I've already mentioned, when I build and start my dpdk app as a
> > standalone application, I can see that before main() is called, there is
> a
> > call to 'rte_eal_driver_register()' function for every driver. When I
> build
> > .so file, this does not happen and no driver is registered so everything
> > after rte_eal_init() fails.
> >
> Hi Newman,
>
> AFAIK the current build system does not support that.
>
> You can build DPDK as shared libs by setting the following config option:
> CONFIG_RTE_BUILD_SHARED_LIB=y
>
> Then build your app as an .so that links against DPDK libs, so you have
> explicit dependencies (such dependencies should show with ldd).
>
> Is there any reason why you want everything to be a single .so ?
>
> I don't know much about how Java loads DSOs but I reckon that it must
> resolve
> explicit dependencies such as libc.
>
> Thanks,
> Sergio
>
>
> >
> > BR,
> > Newman
> >
>


[dpdk-dev] building shared library

2014-11-11 Thread Newman Poborsky
Hi,

after building the DPDK libs as shared libraries and linking against them, I'm
back to my first problem: rte_eal_driver_register() never gets called and my app
crashes since there are no drivers registered.  As previously mentioned, in a
regular DPDK user app this function is called for every driver before
main(). How?

BR,
Newman
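
On the "How?" above: the gdb backtrace quoted earlier in this thread
(devinitfn_bond_drv called from __libc_csu_init, before main) is consistent
with the GCC/Clang "constructor" function attribute, which makes the loader
run a function when the object containing it is loaded. The sketch below shows
that general pattern only; it is not DPDK's implementation, and every demo_*
name is hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define DEMO_MAX_DRIVERS 8

struct demo_driver {
	const char *name;
};

/* Registry filled in before main() by the constructors below. */
static struct demo_driver *demo_drivers[DEMO_MAX_DRIVERS];
static size_t demo_nb_drivers;

void demo_driver_register(struct demo_driver *drv)
{
	assert(demo_nb_drivers < DEMO_MAX_DRIVERS);
	demo_drivers[demo_nb_drivers++] = drv;
}

size_t demo_driver_count(void)
{
	return demo_nb_drivers;
}

/* Look a driver up by name; returns NULL if it never registered. */
struct demo_driver *demo_driver_find(const char *name)
{
	size_t i;

	for (i = 0; i < demo_nb_drivers; i++)
		if (strcmp(demo_drivers[i]->name, name) == 0)
			return demo_drivers[i];
	return NULL;
}

static struct demo_driver demo_ring_drv = { "demo_ring" };
static struct demo_driver demo_bond_drv = { "demo_bond" };

/* Each constructor runs when this object is loaded, i.e. before main(). */
__attribute__((constructor))
static void demo_init_ring_drv(void)
{
	demo_driver_register(&demo_ring_drv);
}

__attribute__((constructor))
static void demo_init_bond_drv(void)
{
	demo_driver_register(&demo_bond_drv);
}
```

A constructor only runs if the object that contains it actually ends up in the
process. With static archives the linker can drop object files nothing refers
to, and with shared objects the library has to be linked in or loaded at run
time, so a driver library that is never loaded never registers itself.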

On Tue, Nov 11, 2014 at 3:44 PM, Newman Poborsky 
wrote:

> Hi Sergio,
>
> no, that sounds good, thank you.  Since I'm not that familiar with DPDK
> build system, where should this option be set? In 'lib' folder's Makefile?
>
> Thank you once again!
>
> BR,
> Newman
>
> On Tue, Nov 11, 2014 at 3:18 PM, Sergio Gonzalez Monroy <
> sergio.gonzalez.monroy at intel.com> wrote:
>
>> On Tue, Nov 11, 2014 at 01:10:29PM +0100, Newman Poborsky wrote:
>> > Hi,
>> >
>> > I want to build one .so file with my app (it contains API that I want to
>> > call through JNI) and all DPDK libs that I use in my app.
>> >
>> > As I've already mentioned, when I build and start my dpdk app as a
>> > standalone application, I can see that before main() is called, there
>> is a
>> > call to 'rte_eal_driver_register()' function for every driver. When I
>> build
>> > .so file, this does not happen and no driver is registered so everyting
>> > after rte_eal_init() fails.
>> >
>> Hi Newman,
>>
>> AFAIK the current build system does not support that.
>>
>> You can build DPDK as shared libs by setting the following config option:
>> CONFIG_RTE_BUILD_SHARED_LIB=y
>>
>> Then build your app as an .so that links against DPDK libs, so you have
>> explicit dependencies (such dependencies should show with ldd).
>>
>> Is there any reason why you want everything to be a single .so ?
>>
>> I don't know much about how Java loads DSOs but I reckon that it must
>> resolve
>> explicit dependencies such as libc.
>>
>> Thanks,
>> Sergio
>>
>>
>> >
>> > BR,
>> > Newman
>> >
>>
>
>


[dpdk-dev] building shared library

2014-11-11 Thread De Lara Guarch, Pablo


> -Original Message-
> From: dev [mailto:dev-bounces at dpdk.org] On Behalf Of Newman Poborsky
> Sent: Tuesday, November 11, 2014 3:17 PM
> To: Gonzalez Monroy, Sergio
> Cc: dev at dpdk.org
> Subject: Re: [dpdk-dev] building shared library
> 
> Hi,
> 
> after building DPDK libs as shared libraries and linking it, I'm back to my
> first problem: rte_eal_driver_register() never gest called and my app
> crashes since there are no drivers registered.  As previously mentioned, in
> regular DPDK user app this functions is called for every driver before
> main(). How?

If I am not wrong here, you have to use the -d option to specify the driver you 
want to use.

Btw, the option you were looking for can be found in config/common_linuxapp or 
config/common_bsdapp.

Pablo
> 
> BR,
> Newman
> 
> On Tue, Nov 11, 2014 at 3:44 PM, Newman Poborsky
> 
> wrote:
> 
> > Hi Sergio,
> >
> > no, that sounds good, thank you.  Since I'm not that familiar with DPDK
> > build system, where should this option be set? In 'lib' folder's Makefile?
> >
> > Thank you once again!
> >
> > BR,
> > Newman
> >
> > On Tue, Nov 11, 2014 at 3:18 PM, Sergio Gonzalez Monroy <
> > sergio.gonzalez.monroy at intel.com> wrote:
> >
> >> On Tue, Nov 11, 2014 at 01:10:29PM +0100, Newman Poborsky wrote:
> >> > Hi,
> >> >
> >> > I want to build one .so file with my app (it contains API that I want to
> >> > call through JNI) and all DPDK libs that I use in my app.
> >> >
> >> > As I've already mentioned, when I build and start my dpdk app as a
> >> > standalone application, I can see that before main() is called, there
> >> is a
> >> > call to 'rte_eal_driver_register()' function for every driver. When I
> >> build
> >> > .so file, this does not happen and no driver is registered so everyting
> >> > after rte_eal_init() fails.
> >> >
> >> Hi Newman,
> >>
> >> AFAIK the current build system does not support that.
> >>
> >> You can build DPDK as shared libs by setting the following config option:
> >> CONFIG_RTE_BUILD_SHARED_LIB=y
> >>
> >> Then build your app as an .so that links against DPDK libs, so you have
> >> explicit dependencies (such dependencies should show with ldd).
> >>
> >> Is there any reason why you want everything to be a single .so ?
> >>
> >> I don't know much about how Java loads DSOs but I reckon that it must
> >> resolve
> >> explicit dependencies such as libc.
> >>
> >> Thanks,
> >> Sergio
> >>
> >>
> >> >
> >> > BR,
> >> > Newman
> >> >
> >>
> >
> >


[dpdk-dev] LLC miss in librte_distributor

2014-11-11 Thread jigsaw
Hi Bruce,

I noticed that librte_distributor has quite a severe LLC miss problem when
running on 16 cores.
While on 8 cores, there's no such problem.
The test runs on an Intel(R) Xeon(R) CPU E5-2670, a SandyBridge with 32
cores on 2 sockets.

The test case is the distributor_perf_autotest, i.e.
in app/test/test_distributor_perf.c.
The test result is collected by command:

perf stat -e LLC-load-misses,LLC-loads,LLC-store-misses,LLC-stores ./test -cff -n2 --no-huge

Note that test results show that with or without hugepages, the LLC miss
rate remains the same. So I will just show the --no-huge config.

With 8 cores, the LLC miss rate is OK:

LLC-load-misses  26750
LLC-loads  93979233
LLC-store-misses  432263
LLC-stores  69954746

That is a 0.028% load miss rate and a 0.62% store miss rate.

With 16 cores, the LLC miss rate is very high:

LLC-load-misses  70263520
LLC-loads  143807657
LLC-store-misses  23115990
LLC-stores  63692854

That is a 48.9% load miss rate and a 36.3% store miss rate.
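
The percentages in this message are simply misses divided by accesses for
each perf counter pair. A minimal sketch of that arithmetic follows; the
function names are illustrative and not part of DPDK or perf.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

/* Miss rate in percent: 100 * misses / accesses. */
double
miss_rate_percent(uint64_t misses, uint64_t accesses)
{
	assert(accesses != 0);
	return 100.0 * (double)misses / (double)accesses;
}

/* Print the load and store miss rates for one "perf stat" run. */
void
print_miss_rates(const char *label,
		 uint64_t load_misses, uint64_t loads,
		 uint64_t store_misses, uint64_t stores)
{
	printf("%s: load miss %.3f%%, store miss %.2f%%\n", label,
	       miss_rate_percent(load_misses, loads),
	       miss_rate_percent(store_misses, stores));
}
```

Feeding it the 16-core counters (70263520 of 143807657 loads, 23115990 of
63692854 stores) gives the same rates quoted above, to rounding.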

Most of the load misses happen at the first line of rte_distributor_poll_pkt.
Most of the store misses happen at ... I don't know, because perf record on
LLC-store-misses brings down my machine.

It's not obvious to me how this could happen: 8 cores are fine, but
16 cores are very bad.
My guess is that 16 cores bring in more QPI transactions between sockets?
Or do 16 cores bring a different LLC access pattern?

So I tried to reduce the padding inside union rte_distributor_buffer from 3
cachelines to 1 cacheline.

- char pad[CACHE_LINE_SIZE*3];
+char pad[CACHE_LINE_SIZE];

And it does have an obvious effect:

LLC-load-misses  53159968
LLC-loads  167756282
LLC-store-misses  29012799
LLC-stores  63352541

Now it is a 31.69% load miss rate and a 45.79% store miss rate.

It lowers the load miss rate, but raises the store miss rate.
Both numbers are still very high, sadly.
But the bright side is that it decreases the time per burst and time per
packet.

The original version has:
=== Performance test of distributor ===
Time per burst:  8013
Time per packet: 250

And the patched version has:
=== Performance test of distributor ===
Time per burst:  6834
Time per packet: 213
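
For readers who want to see what the padding change does to the layout, here
is a rough sketch. It is modeled on the union described in this message, but
the type names, the 64-byte cache line, and the footprint helper are
assumptions made for illustration; it is not the librte_distributor source.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define DEMO_CACHE_LINE_SIZE 64	/* assumed; typical for x86 */

/* Layout before the change: each worker slot spans three cache lines. */
union demo_buffer_3lines {
	volatile int64_t bufptr64;
	char pad[DEMO_CACHE_LINE_SIZE * 3];
} __attribute__((aligned(DEMO_CACHE_LINE_SIZE)));

/* Layout after the change: each worker slot spans one cache line. */
union demo_buffer_1line {
	volatile int64_t bufptr64;
	char pad[DEMO_CACHE_LINE_SIZE];
} __attribute__((aligned(DEMO_CACHE_LINE_SIZE)));

static_assert(sizeof(union demo_buffer_3lines) == 3 * DEMO_CACHE_LINE_SIZE,
	      "three-line slot should be 192 bytes");
static_assert(sizeof(union demo_buffer_1line) == DEMO_CACHE_LINE_SIZE,
	      "one-line slot should be 64 bytes");

/* Total bytes covered by the per-worker slots the distributor polls. */
size_t
demo_slots_footprint(size_t nb_workers, size_t slot_size)
{
	return nb_workers * slot_size;
}
```

With 15 workers the slot array shrinks from 2880 to 960 bytes. Either way
each worker still gets its own cache line, so the change alters how the
slots are spread in memory, not whether they are shared.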


I tried a couple of other tricks, such as adding more idle loops in
rte_distributor_get_pkt, and making the rte_distributor_buffer thread_local
to each worker core. But none of these tricks has any noticeable effect.
These failures lead me to believe the high LLC miss rate is related to QPI
or NUMA. But my machine is not able to perf on uncore QPI events, so this
cannot be confirmed.


I cannot draw any conclusion or reveal the root cause after all. But I
suggest a further study on the performance bottleneck so as to find a good
solution.

thx &
rgds,
-qinglai


[dpdk-dev] building shared library

2014-11-11 Thread Neil Horman
On Tue, Nov 11, 2014 at 03:26:04PM +, De Lara Guarch, Pablo wrote:
> 
> 
> > -Original Message-
> > From: dev [mailto:dev-bounces at dpdk.org] On Behalf Of Newman Poborsky
> > Sent: Tuesday, November 11, 2014 3:17 PM
> > To: Gonzalez Monroy, Sergio
> > Cc: dev at dpdk.org
> > Subject: Re: [dpdk-dev] building shared library
> > 
> > Hi,
> > 
> > after building DPDK libs as shared libraries and linking it, I'm back to my
> > first problem: rte_eal_driver_register() never gest called and my app
> > crashes since there are no drivers registered.  As previously mentioned, in
> > regular DPDK user app this functions is called for every driver before
> > main(). How?
> 
> If I am not wrong here, you have to use the -d option to specify the driver 
> you want to use.
> 
> Btw, the option you were looking for can be found in config/common_linuxapp 
> or config/common_bsdapp.
> 

Alternatively, when you link your application you can specify
-llibrte_pmd_ and your application should call all the constructors when
the dynamic loader hits your binary's DT_NEEDED table.  That's how you can avoid
the command line specification.
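
The "called before main()" behaviour asked about in this thread is what
constructor functions provide. The sketch below shows the general mechanism
with the GCC/Clang constructor attribute. All names (demo_driver,
DEMO_REGISTER_DRIVER, and so on) are hypothetical stand-ins; this is not the
DPDK registration code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

#define DEMO_MAX_DRIVERS 8

struct demo_driver {
	const char *name;
};

static struct demo_driver *demo_registered[DEMO_MAX_DRIVERS];
static size_t demo_nb_registered;

/* Stand-in for a registration call such as rte_eal_driver_register(). */
void
demo_driver_register(struct demo_driver *drv)
{
	assert(demo_nb_registered < DEMO_MAX_DRIVERS);
	demo_registered[demo_nb_registered++] = drv;
}

/*
 * Emit a constructor for driver object d. The loader runs it when the
 * object containing it is loaded, i.e. before main() for a linked library.
 */
#define DEMO_REGISTER_DRIVER(d) \
	static void __attribute__((constructor)) demo_init_##d(void) \
	{ \
		demo_driver_register(&d); \
	}

static struct demo_driver demo_pmd_a = { .name = "demo_pmd_a" };
static struct demo_driver demo_pmd_b = { .name = "demo_pmd_b" };
DEMO_REGISTER_DRIVER(demo_pmd_a)
DEMO_REGISTER_DRIVER(demo_pmd_b)

size_t
demo_driver_count(void)
{
	return demo_nb_registered;
}

void
demo_driver_dump(void)
{
	for (size_t i = 0; i < demo_nb_registered; i++)
		printf("registered: %s\n", demo_registered[i]->name);
}
```

If the library holding such constructors never ends up in the DT_NEEDED
list, or is never loaded with dlopen(), the constructors never run and the
registry stays empty, which matches the symptom described above.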

Neil

> Pablo
> > 
> > BR,
> > Newman
> > 
> > On Tue, Nov 11, 2014 at 3:44 PM, Newman Poborsky
> > 
> > wrote:
> > 
> > > Hi Sergio,
> > >
> > > no, that sounds good, thank you.  Since I'm not that familiar with DPDK
> > > build system, where should this option be set? In 'lib' folder's Makefile?
> > >
> > > Thank you once again!
> > >
> > > BR,
> > > Newman
> > >
> > > On Tue, Nov 11, 2014 at 3:18 PM, Sergio Gonzalez Monroy <
> > > sergio.gonzalez.monroy at intel.com> wrote:
> > >
> > >> On Tue, Nov 11, 2014 at 01:10:29PM +0100, Newman Poborsky wrote:
> > >> > Hi,
> > >> >
> > >> > I want to build one .so file with my app (it contains API that I want 
> > >> > to
> > >> > call through JNI) and all DPDK libs that I use in my app.
> > >> >
> > >> > As I've already mentioned, when I build and start my dpdk app as a
> > >> > standalone application, I can see that before main() is called, there
> > >> is a
> > >> > call to 'rte_eal_driver_register()' function for every driver. When I
> > >> build
> > >> > .so file, this does not happen and no driver is registered so everyting
> > >> > after rte_eal_init() fails.
> > >> >
> > >> Hi Newman,
> > >>
> > >> AFAIK the current build system does not support that.
> > >>
> > >> You can build DPDK as shared libs by setting the following config option:
> > >> CONFIG_RTE_BUILD_SHARED_LIB=y
> > >>
> > >> Then build your app as an .so that links against DPDK libs, so you have
> > >> explicit dependencies (such dependencies should show with ldd).
> > >>
> > >> Is there any reason why you want everything to be a single .so ?
> > >>
> > >> I don't know much about how Java loads DSOs but I reckon that it must
> > >> resolve
> > >> explicit dependencies such as libc.
> > >>
> > >> Thanks,
> > >> Sergio
> > >>
> > >>
> > >> >
> > >> > BR,
> > >> > Newman
> > >> >
> > >>
> > >
> > >


[dpdk-dev] building shared library

2014-11-11 Thread Newman Poborsky
It works!!!  Thanks everybody!

I wasn't using '-Wl,--no-as-needed' while compiling, so no PMD driver was
linked and hence no constructor was called. After adding this option, it
finally works.

Again, thank you very much, I could never have figured out all these steps on
my own!

BR,
Newman

On Tue, Nov 11, 2014 at 4:54 PM, Neil Horman  wrote:

> On Tue, Nov 11, 2014 at 03:26:04PM +, De Lara Guarch, Pablo wrote:
> >
> >
> > > -Original Message-
> > > From: dev [mailto:dev-bounces at dpdk.org] On Behalf Of Newman Poborsky
> > > Sent: Tuesday, November 11, 2014 3:17 PM
> > > To: Gonzalez Monroy, Sergio
> > > Cc: dev at dpdk.org
> > > Subject: Re: [dpdk-dev] building shared library
> > >
> > > Hi,
> > >
> > > after building DPDK libs as shared libraries and linking it, I'm back
> to my
> > > first problem: rte_eal_driver_register() never gest called and my app
> > > crashes since there are no drivers registered.  As previously
> mentioned, in
> > > regular DPDK user app this functions is called for every driver before
> > > main(). How?
> >
> > If I am not wrong here, you have to use the -d option to specify the
> driver you want to use.
> >
> > Btw, the option you were looking for can be found in
> config/common_linuxapp or config/common_bsdapp.
> >
>
> Alternatively, when you link your application you can speify
> -llibrte_pmd_ and your applicaion should call all the constructors
> when
> the dynamic loader hits your binaries DT_NEEDED table.  Thats how you can
> avoid
> the command line specification.
>
> Neil
>
> > Pablo
> > >
> > > BR,
> > > Newman
> > >
> > > On Tue, Nov 11, 2014 at 3:44 PM, Newman Poborsky
> > > 
> > > wrote:
> > >
> > > > Hi Sergio,
> > > >
> > > > no, that sounds good, thank you.  Since I'm not that familiar with
> DPDK
> > > > build system, where should this option be set? In 'lib' folder's
> Makefile?
> > > >
> > > > Thank you once again!
> > > >
> > > > BR,
> > > > Newman
> > > >
> > > > On Tue, Nov 11, 2014 at 3:18 PM, Sergio Gonzalez Monroy <
> > > > sergio.gonzalez.monroy at intel.com> wrote:
> > > >
> > > >> On Tue, Nov 11, 2014 at 01:10:29PM +0100, Newman Poborsky wrote:
> > > >> > Hi,
> > > >> >
> > > >> > I want to build one .so file with my app (it contains API that I
> want to
> > > >> > call through JNI) and all DPDK libs that I use in my app.
> > > >> >
> > > >> > As I've already mentioned, when I build and start my dpdk app as a
> > > >> > standalone application, I can see that before main() is called,
> there
> > > >> is a
> > > >> > call to 'rte_eal_driver_register()' function for every driver.
> When I
> > > >> build
> > > >> > .so file, this does not happen and no driver is registered so
> everyting
> > > >> > after rte_eal_init() fails.
> > > >> >
> > > >> Hi Newman,
> > > >>
> > > >> AFAIK the current build system does not support that.
> > > >>
> > > >> You can build DPDK as shared libs by setting the following config
> option:
> > > >> CONFIG_RTE_BUILD_SHARED_LIB=y
> > > >>
> > > >> Then build your app as an .so that links against DPDK libs, so you
> have
> > > >> explicit dependencies (such dependencies should show with ldd).
> > > >>
> > > >> Is there any reason why you want everything to be a single .so ?
> > > >>
> > > >> I don't know much about how Java loads DSOs but I reckon that it
> must
> > > >> resolve
> > > >> explicit dependencies such as libc.
> > > >>
> > > >> Thanks,
> > > >> Sergio
> > > >>
> > > >>
> > > >> >
> > > >> > BR,
> > > >> > Newman
> > > >> >
> > > >>
> > > >
> > > >
>


[dpdk-dev] [PATCH] Bond: set {rx|tx}_offload_capa flags

2014-11-11 Thread Doherty, Declan
> -Original Message-
> From: dev [mailto:dev-bounces at dpdk.org] On Behalf Of Jia Yu
> Sent: Friday, November 7, 2014 5:36 PM
> To: dev at dpdk.org
> Subject: [dpdk-dev] [PATCH] Bond: set {rx|tx}_offload_capa flags
> 
> Before the fix, bond device's offload capabilities are unset. This fix
> takes the minimum common set of slave devices' capabilities as bond
> device's capabilities. For simplicity, we ensure all slave devices
> to have a capability before bond device can claim this capability,
> even if some slave devices are unused (i.e. linked down, standby).
> 
> Signed-off-by: Jia Yu 
> ---
>  lib/librte_pmd_bond/rte_eth_bond_api.c | 16 
> .
> --
> 1.9.1

Acked-by: Declan Doherty 




[dpdk-dev] vmware vmxnet3-usermap AND DPDK VMXNET3 PMD

2014-11-11 Thread Patel, Rashmin N
Please find comments in-lined.

Thanks,
RP

From: Aziz Hajee [mailto:a...@saisei.com]
Sent: Monday, November 10, 2014 6:00 PM
To: Patel, Rashmin N
Cc: dev at dpdk.org
Subject: Re: [dpdk-dev] vmware vmxnet3-usermap AND DPDK VMXNET3 PMD

Rashmin,
Since I do need the jumbo, I use the vmxnet3-plugin you described, i.e.
(1)
sudo insmod ./vmxnet3-usermap.ko enable_shm=2,2 num_rqs=1,1 num_rxds=2048 
num_txds=2048
and (2) when running the application, use in the args list:
"-d", "librte_pmd_vmxnet3.so"
Does the above two piece mean vmxnet3-plugin
[RP] that's correct
I do see my vmxnet3 device from the dump,rte_eal_pci_dump();
but the 'nb_ports' in DPDK never gets incremented rte_eth_dev_count() returns 
zero.
so all the other api fails, if (port_id >= nb_ports) {
PMD_DEBUG_TRACE("Invalid port_id=%d\n", port_id);
return;
}

:03:00.0 - vendor:15ad device:7b0
   d2404000 1000
   d2403000 1000
   d240 2000
    
    
    
   d440 0001
:0b:00.0 - vendor:15ad device:7b0
   d2504000 1000
   d2503000 1000
   d250 2000
    
    
    
   d450 0001
DPDK: No Ethernet ports (rte_eth_dev_count() returns zero)
PMD: rte_eth_dev_info_get: Invalid port_id=0
PMD: rte_eth_dev_configure: Invalid port_id=0
PMD: rte_eth_dev_info_get: Invalid port_id=0
PMD: rte_eth_dev_configure: Invalid port_id=0
So when using not using DPDK PMD for VMXNET3, what am i missing, for the the 
DPDK to know the nb_ports,
How will the rte_eth_dev_start(portid) in DPDK library know  ?

rte_pmd_init_all() will not have the init the Intel DPDK PMD , 
RTE_LIBRTE_VMXNET3_PMD = n in config.
[RP] If you're using the vmxnet3-plugin, you should keep RTE_LIBRTE_VMXNET3_PMD
= n in config, and link the shared library with its headers; it should work. I
tried it once a long time ago.

#ifdef RTE_LIBRTE_VMXNET3_PMD
if ((ret = rte_vmxnet3_pmd_init()) != 0) {
RTE_LOG(ERR, PMD, "Cannot init vmxnet3 PMD\n");
return (ret);
}
If I make RTE_LIBRTE_VMXNET3_PMD = y, then I am using the Intel DPDK PMD and no 
jumbo.
[RP] Yes, I understood that part. We need to support jumbo frames in the in-tree
version of VMXNET3-PMD; we'll merge all soon. We can discuss it in the community
conf. call, so please do attend the next one on Nov 18 and we can raise concerns
there.

Thanks,
aziz

On Fri, Nov 7, 2014 at 8:53 AM, Patel, Rashmin N <rashmin.n.patel at intel.com> wrote:
Hi Aziz,

Yes, you're right DPDK VMXNET3-PMD in /lib/librte_pmd_vmxnet3 does not support 
mbuf chaining today. But it's a standalone bsd driver just like any other pmd 
in that directory, it does not need vmxnet3-usermap.ko module.

Now there is another vmxnet3 solution in a separate branch as a plugin, which 
must have vmxnet3-usermap.ko linux module(1), and a user space interface 
piece(2) to tie it to any DPDK application in the main branch. (1) and (2) 
makes the solution which is known as vmxnet3-plugin. It's been there for a long 
time just like virtio-plugin, I don't know who uses it, but community can 
*reply* here if there is still any need of a separate solution that way.

I'm in favor of consolidating all those versions into one elegant solution by
grabbing the best features from all of them and maintaining one copy. I'm sure
that developers contributing from VMware would also support that idea, because
it makes the code easier to maintain, debug, and fix, and most importantly
avoids such confusion in future.

Thanks,
Rashmin

-Original Message-
From: dev [mailto:dev-bounces at dpdk.org] On Behalf Of Aziz Hajee
Sent: Thursday, November 06, 2014 5:47 PM
To: dev at dpdk.org
Subject: [dpdk-dev] vmware vmxnet3-usermap AND DPDK VMXNET3 PMD

I am using the dpdk1.6.0r1
I could not find a complete clarification, sorry if missed.
VMXNET3 PMD

I have enabled the VMXNET3 PMD  in the dpdk.
 # Compile burst-oriented VMXNET3 PMD driver  #

CONFIG_RTE_LIBRTE_VMXNET3_PMD=y
CONFIG_RTE_LIBRTE_VMXNET3_DEBUG_INIT=y
CONFIG_RTE_LIBRTE_VMXNET3_DEBUG_RX=n
CONFIG_RTE_LIBRTE_VMXNET3_DEBUG_TX=n
CONFIG_RTE_LIBRTE_VMXNET3_DEBUG_TX_FREE=n
The Intel DPDK VMXNET3 PMD driver does not support mbuf chaining, and I have to
set NOMULTSEGS for the vmxnet3 interface init to succeed.
tx_conf.txq_flags =  ETH_TXQ_FLAGS_NOMULTSEGS
Is there a later version of DPDK that supports multiseg for the dpdk
VMXNET3 PMD?

vmware vmxnet3-usermap AND  DPDK VMXNET3 PMD 
=
Is the vmxnet3-usermap.ko module driver also needed ? (appears that I need, 
otherwise the eal initialise fails.
sudo insmod ./vmx

[dpdk-dev] rte_eth_add_mc_addr not in dpdk-1.7.1

2014-11-11 Thread Shyam Sundar Govindaraj
Hi

The below function is available in dpdk-1.6.0r1 but not in dpdk-1.7.1. Please 
let me know if there is any alternative function available for this in 
dpdk-1.7.1?

lib/librte_ether/rte_ethdev.c:858:rte_eth_add_mc_addr(uint8_t port_id, uint8_t 
*mac_addr[], uint32_t mc_addr_cnt)
lib/librte_ether/rte_ethdev.h:1675:extern void rte_eth_add_mc_addr(uint8_t 
port_id, uint8_t *mac_addr[], uint32_t mc_addr_cnt);

Thanks
Shyam


[dpdk-dev] [PATCH v5 2/5] ethdev: add enum type and relevant structures for hash filter control

2014-11-11 Thread Thomas Monjalon
2014-11-11 06:46, Zhang, Helin:
> In order to get things more generic, and remove any mappings on specific NIC 
> hardwares, I planned to change the macros in rte_ethdev.h from
> 
> /* Supported RSS offloads */
> /* for 1G & 10G */
> #define ETH_RSS_IPV4_SHIFT 0
> #define ETH_RSS_IPV4_TCP_SHIFT 1
> #define ETH_RSS_IPV6_SHIFT 2
> #define ETH_RSS_IPV6_EX_SHIFT 3
> #define ETH_RSS_IPV6_TCP_SHIFT 4
> #define ETH_RSS_IPV6_TCP_EX_SHIFT 5
> #define ETH_RSS_IPV4_UDP_SHIFT 6
> #define ETH_RSS_IPV6_UDP_SHIFT 7
> #define ETH_RSS_IPV6_UDP_EX_SHIFT 8
> /* for 40G only */
> #define ETH_RSS_NONF_IPV4_UDP_SHIFT   31
> #define ETH_RSS_NONF_IPV4_TCP_SHIFT   33
> #define ETH_RSS_NONF_IPV4_SCTP_SHIFT  34
> #define ETH_RSS_NONF_IPV4_OTHER_SHIFT 35
> #define ETH_RSS_FRAG_IPV4_SHIFT   36
> #define ETH_RSS_NONF_IPV6_UDP_SHIFT   41
> #define ETH_RSS_NONF_IPV6_TCP_SHIFT   43
> #define ETH_RSS_NONF_IPV6_SCTP_SHIFT  44
> #define ETH_RSS_NONF_IPV6_OTHER_SHIFT 45
> #define ETH_RSS_FRAG_IPV6_SHIFT   46
> #define ETH_RSS_FCOE_OX_SHIFT 48
> #define ETH_RSS_FCOE_RX_SHIFT 49
> #define ETH_RSS_FCOE_OTHER_SHIFT  50
> #define ETH_RSS_L2_PAYLOAD_SHIFT  63
> 
> to
> 
> /* Supported RSS offloads */
> /* for 1G & 10G */
> #define ETH_FLOW_TYPE_IPV4 0
> #define ETH_FLOW_TYPE_IPV4_TCP 1
> #define ETH_FLOW_TYPE_IPV6 2
> #define ETH_FLOW_TYPE_IPV6_EX 3
> #define ETH_FLOW_TYPE_IPV6_TCP 4
> #define ETH_FLOW_TYPE_IPV6_TCP_EX 5
> #define ETH_FLOW_TYPE_IPV4_UDP 6
> #define ETH_FLOW_TYPE_IPV6_UDP 7
> #define ETH_FLOW_TYPE_IPV6_UDP_EX 8
> /* for 40G only */
> #define ETH_FLOW_TYPE_NONFRAG_IPV4_UDP   9
> #define ETH_FLOW_TYPE_NONFRAG_IPV4_TCP   10
> #define ETH_FLOW_TYPE_NONFRAG_IPV4_SCTP  11
> #define ETH_FLOW_TYPE_NONFRAG_IPV4_OTHER 12
> #define ETH_FLOW_TYPE_FRAG_IPV4   13
> #define ETH_FLOW_TYPE_NONFRAG_IPV6_UDP   14
> #define ETH_FLOW_TYPE_NONFRAG_IPV6_TCP   15
> #define ETH_FLOW_TYPE_NONFRAG_IPV6_SCTP  16
> #define ETH_FLOW_TYPE_NONFRAG_IPV6_OTHER 17
> #define ETH_FLOW_TYPE_FRAG_IPV6  18
> #define ETH_FLOW_TYPE_L2_PAYLOAD  19
> 
> Any comments or better ideas on that? Thanks!

About the renaming RSS -> FLOW_TYPE, I have no objection.
It seems a bit better.
Some comments are needed to explain what the values mean.
I think the comments "1G & 10G" or "40G only" are possibly wrong.
Actually you use ETH_FLOW_TYPE_IPV4 for ixgbe and ETH_FLOW_TYPE_FRAG_IPV4
or ETH_FLOW_TYPE_NONFRAG_IPV4_* for i40e. It's not consistent and clearly
shows that you stick to the hardware definitions.

Something really generic could be a set of flags like this:
IPV4
IPV6
NONFRAG
UDP
TCP
SCTP
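
One way to read the flag list above is as independent bits that are ORed
together to describe a flow type. The sketch below is only an illustration
of that idea; the DEMO_FLOW_* names, the bit positions, and the validity
rule (exactly one L3 bit, at most one L4 bit) are assumptions, not an
existing or proposed rte_ethdev API.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical generic flow-type flags, one bit each. */
#define DEMO_FLOW_IPV4    (UINT32_C(1) << 0)
#define DEMO_FLOW_IPV6    (UINT32_C(1) << 1)
#define DEMO_FLOW_NONFRAG (UINT32_C(1) << 2)
#define DEMO_FLOW_UDP     (UINT32_C(1) << 3)
#define DEMO_FLOW_TCP     (UINT32_C(1) << 4)
#define DEMO_FLOW_SCTP    (UINT32_C(1) << 5)

#define DEMO_FLOW_L3_MASK (DEMO_FLOW_IPV4 | DEMO_FLOW_IPV6)
#define DEMO_FLOW_L4_MASK (DEMO_FLOW_UDP | DEMO_FLOW_TCP | DEMO_FLOW_SCTP)

static_assert((DEMO_FLOW_L3_MASK & DEMO_FLOW_L4_MASK) == 0,
	      "L3 and L4 flag bits must not overlap");

/* True if exactly one bit is set in v. */
static bool
demo_has_single_bit(uint32_t v)
{
	return v != 0 && (v & (v - 1)) == 0;
}

/* Accept exactly one L3 protocol and at most one L4 protocol. */
bool
demo_flow_type_is_valid(uint32_t flags)
{
	uint32_t l3 = flags & DEMO_FLOW_L3_MASK;
	uint32_t l4 = flags & DEMO_FLOW_L4_MASK;

	if (!demo_has_single_bit(l3))
		return false;
	return l4 == 0 || demo_has_single_bit(l4);
}
```

Under this scheme a non-fragmented IPv4 TCP flow would be written as
DEMO_FLOW_IPV4 | DEMO_FLOW_NONFRAG | DEMO_FLOW_TCP, and each driver would
map the combinations it supports onto its own hardware definitions.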

-- 
Thomas


[dpdk-dev] rte_eth_add_mc_addr not in dpdk-1.7.1

2014-11-11 Thread Thomas Monjalon
2014-11-11 19:41, Shyam Sundar Govindaraj:
> The below function is available in dpdk-1.6.0r1 but not in dpdk-1.7.1.
> Please let me know if there is any alternative function available for this in 
> dpdk-1.7.1?

No, this function doesn't exist in any DPDK version I know.
It's probably something you've added internally. So I'm afraid it will be
difficult to offer support on this ;)

> lib/librte_ether/rte_ethdev.c:858:rte_eth_add_mc_addr(uint8_t port_id, 
> uint8_t *mac_addr[], uint32_t mc_addr_cnt)
> lib/librte_ether/rte_ethdev.h:1675:extern void rte_eth_add_mc_addr(uint8_t 
> port_id, uint8_t *mac_addr[], uint32_t mc_addr_cnt);

-- 
Thomas


[dpdk-dev] vhost-user technical isssues

2014-11-11 Thread Xie, Huawei
Hi Tetsuya:
There are two major technical issues in my mind for vhost-user implementation.

1) memory region map
Vhost-user passes us the file fd and offset for each memory region.
Unfortunately the mmap offset is "very" wrong. I discovered this issue a long
time ago, and also found that I couldn't mmap the huge page file even with the
correct offset (need to double check).
Just now I found that people reported this issue on Nov 3:
[Qemu-devel] [PULL 27/29] vhost-user: fix mmap offset calculation
Anyway, I turned to the same idea used in our DPDK vhost-cuse: only use the fd
for region(0) to map the whole file.
I think we should use this approach temporarily to support qemu-2.1, as it has
that bug.

2) What message is the indicator for vhost start/release?
Previously, vhost-cuse had the SET_BACKEND message.
What should we do for vhost-user?
SET_VRING_KICK for start?
What about release?
Unlike the kernel virtio, the DPDK virtio in the guest could be restarted.

Thoughts?

-huawei


[dpdk-dev] [PATCH v2 0/2] examples/vmdq: support new VMDQ API

2014-11-11 Thread Thomas Monjalon
> > This patch supports new VMDQ API in vmdq example.
> > 
> > v2 changes:
> > * code rebase
> > * allow app to specify num_pools different with max_nb_pools
> > * fix serious cs issues
> > 
> > Huawei Xie (2):
> >   support new VMDQ API in vmdq example
> >   fix cs issues in vmdq example
> 
> Acked-by : Jing Chen 

Applied

Thanks
-- 
Thomas


[dpdk-dev] [PATCH] Added Spinlock to l3fwd-vf example to prevent race conditioning

2014-11-11 Thread Thomas Monjalon
Hi Daniel,

This old patch is probably good but I'd like you to explain it, please.
Reviewers are also welcome.

Thanks
-- 
Thomas

2014-07-23 10:33, Thomas Monjalon:
> Hi Daniel,
> 
> Some explanations are missing here.
> 
> > Signed-off-by: Daniel Mrzyglod 
> > 
> > --- a/examples/l3fwd-vf/main.c
> > +++ b/examples/l3fwd-vf/main.c
> > @@ -54,6 +54,7 @@
> >  #include 
> >  #include 
> >  #include 
> > +#include 
> >  #include 
> >  #include 
> >  #include 
> > @@ -328,7 +329,7 @@ struct lcore_conf {
> >  } __rte_cache_aligned;
> >  
> >  static struct lcore_conf lcore_conf[RTE_MAX_LCORE];
> > -
> > +static rte_spinlock_t 
> > spinlock_conf[RTE_MAX_ETHPORTS]={RTE_SPINLOCK_INITIALIZER};
> >  /* Send burst of packets on an output interface */
> >  static inline int
> >  send_burst(struct lcore_conf *qconf, uint16_t n, uint8_t port)
> > @@ -340,7 +341,10 @@ send_burst(struct lcore_conf *qconf, uint16_t n, 
> > uint8_t port)
> > queueid = qconf->tx_queue_id;
> > m_table = (struct rte_mbuf **)qconf->tx_mbufs[port].m_table;
> >  
> > +   rte_spinlock_lock(&spinlock_conf[port]) ;
> > ret = rte_eth_tx_burst(port, queueid, m_table, n);
> > +   rte_spinlock_unlock(&spinlock_conf[port]);
> > +   
> > if (unlikely(ret < n)) {
> > do {
> > rte_pktmbuf_free(m_table[ret]);
> > 



[dpdk-dev] [PATCH] kni: optimizing the rte_kni_rx_burst

2014-11-11 Thread Thomas Monjalon
Is there anyone interested in KNI to review this patch please?


2014-07-23 12:15, Hemant Agrawal:
> The current implementation of rte_kni_rx_burst polls the fifo for buffers.
> Irrespective of success or failure, it allocates the mbuf and try to put them 
> into the alloc_q
> if the buffers are not added to alloc_q, it frees them.
> This waste lots of cpu cycles in allocating and freeing the buffers if 
> alloc_q is full.
> 
> The logic has been changed to:
> 1. Initially allocand add buffer(burstsize) to alloc_q
> 2. Add buffers to alloc_q only when you are pulling out the buffers.
> 
> Signed-off-by: Hemant Agrawal 
> ---
>  lib/librte_kni/rte_kni.c |8 ++--
>  1 file changed, 6 insertions(+), 2 deletions(-)
> 
> diff --git a/lib/librte_kni/rte_kni.c b/lib/librte_kni/rte_kni.c
> index 76feef4..01e85f8 100644
> --- a/lib/librte_kni/rte_kni.c
> +++ b/lib/librte_kni/rte_kni.c
> @@ -263,6 +263,9 @@ rte_kni_alloc(struct rte_mempool *pktmbuf_pool,
>  
>   ctx->in_use = 1;
>  
> + /* Allocate mbufs and then put them into alloc_q */
> + kni_allocate_mbufs(ctx);
> +
>   return ctx;
>  
>  fail:
> @@ -369,8 +372,9 @@ rte_kni_rx_burst(struct rte_kni *kni, struct rte_mbuf 
> **mbufs, unsigned num)
>  {
>   unsigned ret = kni_fifo_get(kni->tx_q, (void **)mbufs, num);
>  
> - /* Allocate mbufs and then put them into alloc_q */
> - kni_allocate_mbufs(kni);
> + /* If buffers removed, allocate mbufs and then put them into alloc_q */
> + if(ret)
> + kni_allocate_mbufs(kni);
>  
>   return ret;
>  }



[dpdk-dev] [PATCH] Added Spinlock to l3fwd-vf example to prevent race conditioning

2014-11-11 Thread Xie, Huawei

> -Original Message-
> From: dev [mailto:dev-bounces at dpdk.org] On Behalf Of Thomas Monjalon
> Sent: Tuesday, November 11, 2014 3:57 PM
> To: Mrzyglod, DanielX T
> Cc: dev at dpdk.org
> Subject: Re: [dpdk-dev] [PATCH] Added Spinlock to l3fwd-vf example to prevent
> race conditioning
> 
> Hi Daniel,
> 
> This old patch is probably good but I'd like you explain it please.
> Reviewers are also welcome.
> 
> Thanks
> --
> Thomas
> 
> 2014-07-23 10:33, Thomas Monjalon:
> > Hi Daniel,
> >
> > Some explanations are missing here.
> >
> > > Signed-off-by: Daniel Mrzyglod 
> > >
> > > --- a/examples/l3fwd-vf/main.c
> > > +++ b/examples/l3fwd-vf/main.c
> > > @@ -54,6 +54,7 @@
> > >  #include 
> > >  #include 
> > >  #include 
> > > +#include 
> > >  #include 
> > >  #include 
> > >  #include 
> > > @@ -328,7 +329,7 @@ struct lcore_conf {
> > >  } __rte_cache_aligned;
> > >
> > >  static struct lcore_conf lcore_conf[RTE_MAX_LCORE];
> > > -
> > > +static rte_spinlock_t
> spinlock_conf[RTE_MAX_ETHPORTS]={RTE_SPINLOCK_INITIALIZER};
> > >  /* Send burst of packets on an output interface */
> > >  static inline int
> > >  send_burst(struct lcore_conf *qconf, uint16_t n, uint8_t port)
> > > @@ -340,7 +341,10 @@ send_burst(struct lcore_conf *qconf, uint16_t n,
> uint8_t port)
> > >   queueid = qconf->tx_queue_id;
> > >   m_table = (struct rte_mbuf **)qconf->tx_mbufs[port].m_table;
> > >
> > > + rte_spinlock_lock(&spinlock_conf[port]) ;
> > >   ret = rte_eth_tx_burst(port, queueid, m_table, n);
> > > + rte_spinlock_unlock(&spinlock_conf[port]);

It might not be a good choice here, but how about we also provide
spin_trylock as an alternative API?

> > > +
> > >   if (unlikely(ret < n)) {
> > >   do {
> > >   rte_pktmbuf_free(m_table[ret]);
> > >



[dpdk-dev] [PATCH 0/2] rte_ethdev fix/improvement

2014-11-11 Thread Jia Yu
Thanks, Thomas.

The two patches are minor fixes. No new API is introduced. I will revise
the description.

Thanks,
Jia

On 11/10/14, 1:40 AM, "Thomas Monjalon"  wrote:

>Hi Jia,
>
>2014-11-07 09:31, Jia Yu:
>> This patch series include a fix and an improvement to rte_ethdev lib.
>
>New enhancements won't be integrated in release 1.8.
>But fixes are welcome.
>The problem is that it's not easy to track partially applies patchset.
>So it would be simpler if you send your fix separately.
>
>Thanks
>-- 
>Thomas



[dpdk-dev] vmware vmxnet3-usermap AND DPDK VMXNET3 PMD

2014-11-11 Thread Aziz Hajee
Thanks Rashmin,
I am close, just getting this undefined symbol error when loading the solib
(I modified the code to see the error). Using dpdk-1.6.0r1_ss

EAL: Setting up memory...
EAL: Ask a virtual area of 0x20 bytes
EAL: Virtual area found at 0x7fc0a820 (size = 0x20)
EAL: Ask a virtual area of 0x5c0 bytes
EAL: Virtual area found at 0x7fc07e20 (size = 0x5c0)
EAL: Ask a virtual area of 0x20 bytes
EAL: Virtual area found at 0x7fc0a1c0 (size = 0x20)
EAL: Ask a virtual area of 0x20 bytes
EAL: Virtual area found at 0x7fc0a180 (size = 0x20)
EAL: Ask a virtual area of 0x19c0 bytes
EAL: Virtual area found at 0x7fc06440 (size = 0x19c0)
EAL: Ask a virtual area of 0x20 bytes
EAL: Virtual area found at 0x7fc0a140 (size = 0x20)
EAL: Requesting 256 pages of size 2MB from socket 0
EAL: TSC frequency is ~230 KHz
Cannot load solib librte_pmd_vmxnet3.so DLERROR:./librte_pmd_vmxnet3.so:
undefined symbol: per_lcore__lcore_id
EAL: (null)

coming from eal.c:

/* Launch threads, called at application init(). */
int
rte_eal_init(int argc, char **argv)

	TAILQ_FOREACH(solib, &solib_list, next) {
		solib->lib_handle = dlopen(solib->name, RTLD_NOW);
		if ((solib->lib_handle == NULL) && (solib->name[0] != '/')) {
			/* relative path: try again with "./" prefix */
			char sopath[PATH_MAX];
			snprintf(sopath, sizeof(sopath), "./%s", solib->name);
			solib->lib_handle = dlopen(sopath, RTLD_NOW);
		}
		if (solib->lib_handle == NULL)
			printf("Cannot load solib %s DLERROR:%s\n",
				solib->name, dlerror());
		//	RTE_LOG(WARNING, EAL, "%s\n", dlerror());
	}


On Tue, Nov 11, 2014 at 9:19 AM, Patel, Rashmin N  wrote:

>  Please find comments in-lined.
>
>
>
> Thanks,
>
> RP
>
>
>
> *From:* Aziz Hajee [mailto:aziz at saisei.com]
> *Sent:* Monday, November 10, 2014 6:00 PM
> *To:* Patel, Rashmin N
> *Cc:* dev at dpdk.org
> *Subject:* Re: [dpdk-dev] vmware vmxnet3-usermap AND DPDK VMXNET3 PMD
>
>
>
> Rashmin,
> Since I do need the jumbo, I use the vmxnet3-plugin you described, i.e.
> (1)
> sudo insmod ./vmxnet3-usermap.ko enable_shm=2,2 num_rqs=1,1 num_rxds=2048
> num_txds=2048
>
> and (2) when running the application, use in the args list:
>
> "-d", "librte_pmd_vmxnet3.so"
>
> Does the above two piece mean vmxnet3-plugin
>
> [RP] that's correct
>
> I do see my vmxnet3 device from the dump,rte_eal_pci_dump();
> but the 'nb_ports' in DPDK never gets incremented rte_eth_dev_count()
> returns zero.
> so all the other api fails, if (port_id >= nb_ports) {
> PMD_DEBUG_TRACE("Invalid port_id=%d\n", port_id);
> return;
> }
>
>   :03:00.0 - vendor:15ad device:7b0
>d2404000 1000
>d2403000 1000
>d240 2000
> 
> 
> 
>d440 0001
> :0b:00.0 - vendor:15ad device:7b0
>d2504000 1000
>d2503000 1000
>d250 2000
> 
> 
> 
>d450 0001
> DPDK: No Ethernet ports (rte_eth_dev_count() returns zero)
> PMD: rte_eth_dev_info_get: Invalid port_id=0
> PMD: rte_eth_dev_configure: Invalid port_id=0
> PMD: rte_eth_dev_info_get: Invalid port_id=0
> PMD: rte_eth_dev_configure: Invalid port_id=0
>
> So when using not using DPDK PMD for VMXNET3, what am i missing, for the
> the DPDK to know the nb_ports,
>
> How will the rte_eth_dev_start(portid) in DPDK library know  ?
>
> rte_pmd_init_all() will not have the init the Intel DPDK PMD ,
> RTE_LIBRTE_VMXNET3_PMD = n in config.
>
> [RP] If you're using the vmxnet3-plugin, you should keep
> RTE_LIBRTE_VMXNET3_PMD = n in config, and link the shared library with its
> headers, it should work. I have tried it once a long back.
>
>
> #ifdef RTE_LIBRTE_VMXNET3_PMD
> if ((ret = rte_vmxnet3_pmd_init()) != 0) {
> RTE_LOG(ERR, PMD, "Cannot init vmxnet3 PMD\n");
> return (ret);
> }
>
> If I make RTE_LIBRTE_VMXNET3_PMD = y, then I am using the Intel DPDK PMD
> and no jumbo.
>
> [RP] Yes, I understood that part. We need to support jumbo frames in
> in-tree version of VMXNET3-PMD, we'll merge all soon, we can discuss in the
> community conf. call so please do attend the next one on Nov 18 and we can
> raise concerns there
>
>
>
> Thanks,
>
> aziz
>
>
>
> On Fri, Nov 7, 2014 at 8:53 AM, Patel, Rashmin N <
> rashmin.n.patel at intel.com> wrote:
>
> Hi Aziz,
>
> Yes, you're right DPDK VMXNET3-PMD in /lib/librte_pmd_vmxnet3 does not
> support mbuf chaining today. But it's a standalone bsd drive