[dpdk-dev] [PATCH] i40e: fix the issue of wrongly reporting descriptor done

2015-07-30 Thread Helin Zhang
The header buffer address for header split is currently filled with the
physical DMA address of the packet buffer. That is not needed at all,
as header split is not supported. Hardware requires the driver to set
the least significant bit of the header address to 0, because on write
back that bit is the 'Descriptor Done' (DD) bit.
The issue is that if the user reserves an odd number of bytes between
the mbuf header and the data buffer, the physical address written to
the descriptor is odd. That means the driver itself sets the DD bit to
non-zero, so the descriptor is wrongly reported as done.

Signed-off-by: Helin Zhang 
---
 drivers/net/i40e/i40e_rxtx.c | 8 ++++----
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/drivers/net/i40e/i40e_rxtx.c b/drivers/net/i40e/i40e_rxtx.c
index 891a221..a267b4d 100644
--- a/drivers/net/i40e/i40e_rxtx.c
+++ b/drivers/net/i40e/i40e_rxtx.c
@@ -1367,7 +1367,7 @@ i40e_rx_alloc_bufs(struct i40e_rx_queue *rxq)
mb->port = rxq->port_id;
dma_addr = rte_cpu_to_le_64(\
RTE_MBUF_DATA_DMA_ADDR_DEFAULT(mb));
-   rxdp[i].read.hdr_addr = dma_addr;
+   rxdp[i].read.hdr_addr = 0;
rxdp[i].read.pkt_addr = dma_addr;
}

@@ -1514,7 +1514,7 @@ i40e_recv_pkts(void *rx_queue, struct rte_mbuf **rx_pkts, 
uint16_t nb_pkts)
rxe->mbuf = nmb;
dma_addr =
rte_cpu_to_le_64(RTE_MBUF_DATA_DMA_ADDR_DEFAULT(nmb));
-   rxdp->read.hdr_addr = dma_addr;
+   rxdp->read.hdr_addr = 0;
rxdp->read.pkt_addr = dma_addr;

rx_packet_len = ((qword1 & I40E_RXD_QW1_LENGTH_PBUF_MASK) >>
@@ -1640,7 +1640,7 @@ i40e_recv_scattered_pkts(void *rx_queue,
rte_cpu_to_le_64(RTE_MBUF_DATA_DMA_ADDR_DEFAULT(nmb));

/* Set data buffer address and data length of the mbuf */
-   rxdp->read.hdr_addr = dma_addr;
+   rxdp->read.hdr_addr = 0;
rxdp->read.pkt_addr = dma_addr;
rx_packet_len = (qword1 & I40E_RXD_QW1_LENGTH_PBUF_MASK) >>
I40E_RXD_QW1_LENGTH_PBUF_SHIFT;
@@ -3047,7 +3047,7 @@ i40e_alloc_rx_queue_mbufs(struct i40e_rx_queue *rxq)

rxd = &rxq->rx_ring[i];
rxd->read.pkt_addr = dma_addr;
-   rxd->read.hdr_addr = dma_addr;
+   rxd->read.hdr_addr = 0;
 #ifndef RTE_LIBRTE_I40E_16BYTE_RX_DESC
rxd->read.rsvd1 = 0;
rxd->read.rsvd2 = 0;
-- 
1.9.3
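
For illustration, here is a minimal sketch of why an odd DMA address in hdr_addr looks like a completed descriptor. It is a simplified model, not the driver code: the union layout, the sketch_* names and SKETCH_DD_BIT are hypothetical, and it only assumes (as the commit message says) that bit 0 of the header address shares storage with the DD bit on write back.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical, simplified RX descriptor: the driver fills the "read"
 * view, the NIC later overwrites the same bytes with the "wb" view. */
union sketch_rx_desc {
	struct {
		uint64_t pkt_addr; /* packet buffer DMA address */
		uint64_t hdr_addr; /* header buffer DMA address, bit 0 overlaps DD */
	} read;
	struct {
		uint64_t qword0;
		uint64_t status_error_len; /* bit 0 is DD after write back */
	} wb;
};

#define SKETCH_DD_BIT UINT64_C(1)

/* Returns nonzero if the driver would treat the descriptor as done. */
int sketch_desc_done(const union sketch_rx_desc *d)
{
	return (d->wb.status_error_len & SKETCH_DD_BIT) != 0;
}

/* Behaviour before the patch: hdr_addr mirrors the packet address. */
void sketch_program_old(union sketch_rx_desc *d, uint64_t dma_addr)
{
	d->read.pkt_addr = dma_addr;
	d->read.hdr_addr = dma_addr;
}

/* Behaviour after the patch: hdr_addr is always 0, so DD starts clear. */
void sketch_program_fixed(union sketch_rx_desc *d, uint64_t dma_addr)
{
	d->read.pkt_addr = dma_addr;
	d->read.hdr_addr = 0;
}
```

With an odd address, sketch_program_old() leaves a descriptor that sketch_desc_done() already reports as done before the hardware has touched it, while sketch_program_fixed() does not.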



[dpdk-dev] [PATCH] testpmd: Fix wrong message in testpmd

2015-07-30 Thread Thomas Monjalon
2015-06-24 15:56, Michael Qiu:
> When closing one port twice, testpmd prints a wrong message.
> 
> testpmd> port stop  0
> Stopping ports...
> Checking link statuses...
> Port 0 Link Up - speed 0 Mbps - full-duplex
> Port 1 Link Up - speed 0 Mbps - full-duplex
> Done
> testpmd> port close 0
> Closing ports...
> Done
> testpmd> port close 0
> Closing ports...
> Port 0 is now not stopped
> Done
> testpmd> 
> 
> 
> Signed-off-by: Michael Qiu 

Applied, thanks


[dpdk-dev] [PATCH v3] i40evf: fix crash when setup tx queues on vf port

2015-07-30 Thread Thomas Monjalon
> This patch fixes the issue:
> Testpmd crashed with a segmentation fault when setting up Tx queues on a VF.
> Steps to reproduce:
>   - create one vf device from i40e driver
>   - bind vf device to igb_uio and start testpmd
> 
> With debugging tools, we saw the struct i40e_vf is cleared after
> memcpy(&dev->data->dev_conf, dev_conf, sizeof(dev->data->dev_conf)) in
> rte_eth_dev_configure, which should not happen, and the pointer to
> i40e_vf isn't in the range of i40e_adapter.
> 
> The root cause is the dev_private_size in i40e virtual function driver struct
> rte_i40evf_pmd was set incorrectly.
> 
> Signed-off-by: Jingjing Wu 

Applied, thanks

Does it mean that Tx with i40evf never worked before?


[dpdk-dev] [PATCHv2 0/2] ixgbe: Two fixes for RX scatter functions.

2015-07-30 Thread Thomas Monjalon
> Acked-by: Wenzhuo Lu 

Applied, thanks


[dpdk-dev] [PATCH v1 1/1] ixgbe: Fix oerrors by setting it to 0

2015-07-30 Thread Thomas Monjalon
> > Fix afebc86be1346136125af8026dc215f81c202c50. oerrors was txdgpc -
> > hw_stats->gptc. txdgpc is the number of packets DMA'ed by the host;
> > it was reset on every call to read stats, so it could be < gptc.
> > Because we currently have no way to add txdgpc to struct hw_stats
> > and so maintain a persistent value per port, oerrors is now set
> > to 0. References to txdgpc are removed as we no longer use it. This
> > patch also removes rxnfgpc as it is not used anywhere.
> > 
> > Signed-off-by: Maryam Tahhan 
> Acked-by: Konstantin Ananyev 

Applied, thanks

It's a bit sad.
Is it a consequence of forbidding updates in the base driver?
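
For illustration, a minimal sketch of how the old oerrors computation could go wrong. It is a model under stated assumptions, not the ixgbe code: the sketch_* names are hypothetical, both registers are modelled as clear-on-read, and only gptc is accumulated into a persistent software counter, as described in the commit message.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical clear-on-read hardware counters. */
struct sketch_regs {
	uint32_t txdgpc; /* packets DMA'ed by the host since last read */
	uint32_t gptc;   /* good packets transmitted since last read */
};

/* Hypothetical persistent software statistics. */
struct sketch_hw_stats {
	uint64_t gptc; /* accumulated over all reads */
};

uint32_t sketch_read_and_clear(uint32_t *reg)
{
	uint32_t v = *reg;

	*reg = 0;
	return v;
}

/* Old computation: a per-read value minus an accumulated value. */
uint64_t sketch_oerrors_old(struct sketch_regs *regs,
			    struct sketch_hw_stats *stats)
{
	uint64_t txdgpc = sketch_read_and_clear(&regs->txdgpc);

	stats->gptc += sketch_read_and_clear(&regs->gptc);
	/* Unsigned subtraction wraps once stats->gptc exceeds txdgpc. */
	return txdgpc - stats->gptc;
}
```

In this model the first stats read after 10 error-free packets returns 0, but a second read after another 10 packets subtracts 20 from 10 and wraps to a huge value.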


[dpdk-dev] [PATCH v4] ixgbe: fix data access on big endian cpu.

2015-07-30 Thread Thomas Monjalon
> > 1. When the CPU reads data owned by ixgbe, it must use rte_le_to_cpu_xx(...)
> > 2. When the CPU fills data for ixgbe, it must use rte_cpu_to_le_xx(...)
> > 3. Check the PCI status against a converted constant
> > 
> > Signed-off-by: Xuelin Shi 
> 
> Acked-by: Konstantin Ananyev 

Applied without added blank lines, thanks
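
For illustration, a minimal sketch of the three rules in portable C. The sketch_* helpers are hypothetical stand-ins for rte_cpu_to_le_32() and rte_le_to_cpu_32(), and SKETCH_STAT_DD is an assumed status bit; nothing here is taken from the ixgbe sources.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Detect host byte order at run time. */
int sketch_host_is_big_endian(void)
{
	const uint16_t probe = 1;
	unsigned char first;

	memcpy(&first, &probe, 1);
	return first == 0;
}

uint32_t sketch_bswap32(uint32_t v)
{
	return (v >> 24) | ((v >> 8) & 0x0000ff00u) |
	       ((v << 8) & 0x00ff0000u) | (v << 24);
}

/* Rule 2: convert before storing a value the device will read. */
uint32_t sketch_cpu_to_le_32(uint32_t v)
{
	return sketch_host_is_big_endian() ? sketch_bswap32(v) : v;
}

/* Rule 1: convert after loading a value the device has written. */
uint32_t sketch_le_to_cpu_32(uint32_t v)
{
	return sketch_host_is_big_endian() ? sketch_bswap32(v) : v;
}

#define SKETCH_STAT_DD 0x1u /* hypothetical status bit, CPU byte order */

/* Rule 3: test a little-endian status word against a converted constant,
 * so the conversion is applied to the constant, not the loaded word. */
int sketch_status_done(uint32_t status_le)
{
	return (status_le & sketch_cpu_to_le_32(SKETCH_STAT_DD)) != 0;
}
```

On a little-endian host the conversions are no-ops, which is why such bugs only show up on big-endian CPUs.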


[dpdk-dev] [PATCH v2] Make the thash library arch-independent

2015-07-30 Thread Thomas Monjalon
2015-07-29 09:56, Vladimir Medvedkin:
> v2 changes
> - Fix SSE to SSE3 typo
> - remove unnecessary comments
> - Leave unaligned union rte_thash_tuple if no support for SSE3
> - Makes 32bit compiler happy by adding ULL suffix
> 
> Signed-off-by: Vladimir Medvedkin 

Applied, thanks



[dpdk-dev] [PATCH] eal: fix build

2015-07-30 Thread Thomas Monjalon
2015-07-29 17:08, Thomas Monjalon:
> 2015-07-29 15:00, Zhang, Helin:
> > /home/hzhan75/r22/isg_cid-dpdk_org/lib/librte_eal/common/eal_common_pci.c: 
> > In function 'rte_eal_pci_probe_one_driver':
> > /home/hzhan75/r22/isg_cid-dpdk_org/lib/librte_eal/common/eal_common_pci.c:188:4:
> >  error: implicit declaration of function 'pci_config_space_set' 
> > [-Werror=implicit-function-declaration]
> > pci_config_space_set(dev);
> > ^
> > /home/hzhan75/r22/isg_cid-dpdk_org/lib/librte_eal/common/eal_common_pci.c:188:4:
> >  error: nested extern declaration of 'pci_config_space_set' 
> > [-Werror=nested-externs]
> > cc1: all warnings being treated as errors
> > /home/hzhan75/r22/isg_cid-dpdk_org/lib/librte_eal/linuxapp/eal/eal_pci.c:561:1:
> >  error: 'pci_config_space_set' defined but not used 
> > [-Werror=unused-function]
> >  pci_config_space_set(struct rte_pci_device *dev)
> >  ^
> > cc1: all warnings being treated as errors
> 
> So I will change the title to:
>   eal: fix build with pci config enabled
> 
> and add this into the message:
>   Build log:
>   lib/librte_eal/common/eal_common_pci.c:188:4: error:
>   implicit declaration of function pci_config_space_set
> 
> 
> > > 2015-07-29 06:48, Helin Zhang:
> > > > It fixes the build error of implicit declaration of function.

Applied, thanks


[dpdk-dev] [PATCH v3] i40evf: fix crash when setup tx queues on vf port

2015-07-30 Thread Wu, Jingjing


> -Original Message-
> From: Thomas Monjalon [mailto:thomas.monjalon at 6wind.com]
> Sent: Thursday, July 30, 2015 6:33 AM
> To: Wu, Jingjing
> Cc: dev at dpdk.org
> Subject: Re: [dpdk-dev] [PATCH v3] i40evf: fix crash when setup tx queues
> on vf port
> 
> > This patch fixes the issue:
> > Testpmd crashed with a segmentation fault when setting up Tx queues on a VF.
> > Steps to reproduce:
> >   - create one vf device from i40e driver
> >   - bind vf device to igb_uio and start testpmd
> >
> > With debugging tools, we saw the struct i40e_vf is cleared after
> > memcpy(&dev->data->dev_conf, dev_conf, sizeof(dev->data->dev_conf))
> in
> > rte_eth_dev_configure, which should not happen, and the pointer to
> > i40e_vf isn't in the range of i40e_adapter.
> >
> > The root cause is the dev_private_size in i40e virtual function driver
> > struct rte_i40evf_pmd was set incorrectly.
> >
> > Signed-off-by: Jingjing Wu 
> 
> Applied, thanks
> 
> Does it mean that Tx with i40evf never worked before?

Actually, we did not hit the crash with the previous version; i40evf Tx worked 
before, which surprises me. Maybe we were just lucky. 


[dpdk-dev] [PATCH 2/2] virtio: allow running w/o vlan filtering

2015-07-30 Thread Ouyang, Changchun
I have comments for that.
Pls see below.

> -Original Message-
> From: Thomas Monjalon [mailto:thomas.monjalon at 6wind.com]
> Sent: Wednesday, July 29, 2015 8:57 PM
> To: Ouyang, Changchun
> Cc: dev at dpdk.org; Stephen Hemminger
> Subject: Re: [dpdk-dev] [PATCH 2/2] virtio: allow running w/o vlan filtering
> 
> Back on this old patch, it seems justified but nobody agreed.
> 
> --- a/lib/librte_pmd_virtio/virtio_ethdev.c
> +++ b/lib/librte_pmd_virtio/virtio_ethdev.c
> @@ -1288,7 +1288,6 @@ virtio_dev_configure(struct rte_eth_dev *dev)
> && !vtpci_with_feature(hw, VIRTIO_NET_F_CTRL_VLAN)) {
> PMD_DRV_LOG(NOTICE,
> "vlan filtering not available on this host");
> -   return -ENOTSUP;
> }
> 
> 2015-03-06 08:24, Stephen Hemminger:
> > "Ouyang, Changchun"  wrote:
> > > > From: Stephen Hemminger
> > > > Vlan filtering is an option, and not a requirement.
> > > > If host does not support filtering then it can be done in software.

Yes, VLAN filtering is optional, but the virtio driver currently has no software 
implementation of it.
So I would rather disable hw_vlan_filter in rxmode when the device cannot really 
support it than remove the return there.

> > >
> > > The question is that guest only send command, no real action to do the
> vlan filter.
> > > So if both host and guest have no real action for vlan filter, who will 
> > > do it?
> >
> > The virtio driver has features.
> > Guest can not send commands to host where feature bit not enabled.
> > Application can call filter_set and check if filter worked or not.
> >
> > Our code already had to do MAC and VLAN validation of incoming packets

The rx function does VLAN stripping, but it has no VLAN filter.

> > therefore if hardware can't do vlan match, there is no problem.
> > I would expect other applications would do the same thing.
> >
> > Failing during configuration is bad. DPDK API should never force
> > application to play "guess the working configuration" with the device
> > driver or do string match on "which device is this anyway"



[dpdk-dev] [PATCH v2] doc: announce abi change for interrupt mode

2015-07-30 Thread Cunming Liang
The patch announces the planned ABI changes for interrupt mode on v2.2.

Signed-off-by: Cunming Liang 
---
 v2 change:
   - rebase to recent master

 doc/guides/rel_notes/deprecation.rst | 8 ++++++++
 1 file changed, 8 insertions(+)

diff --git a/doc/guides/rel_notes/deprecation.rst 
b/doc/guides/rel_notes/deprecation.rst
index 5330d3b..645ce32 100644
--- a/doc/guides/rel_notes/deprecation.rst
+++ b/doc/guides/rel_notes/deprecation.rst
@@ -35,3 +35,11 @@ Deprecation Notices
 * The following fields have been deprecated in rte_eth_stats:
   imissed, ibadcrc, ibadlen, imcasts, fdirmatch, fdirmiss,
   tx_pause_xon, rx_pause_xon, tx_pause_xoff, rx_pause_xoff
+
+* The ABI changes are planned for struct rte_intr_handle, struct rte_eth_conf
+  and struct eth_dev_ops in order to support interrupt mode feature.
+  The upcoming release 2.1 will not contain these ABI changes by default.
+  This change will be in release 2.2. There's no backwards compatibility planned
+  due to the additional interrupt mode feature enabling.
+  Binaries using this library built prior to version 2.2 will require updating
+  and recompilation.
-- 
1.8.1.4



[dpdk-dev] Why only rx queue "0" can receive network packet by i40e NIC

2015-07-30 Thread Jeff Venable, Sr.
Hi Helin,

We do not want RSS to include L4 ports in the hash because packet fragments 
would get routed to queue #0 and would be more difficult to work with.  We are 
using the model where multiple CPUs are pulling from the NIC queues 
independently with no shared state, so each 'pipeline' has private fragment 
reassembly state for the sessions it is managing.

Getting RSS Toeplitz hash to work on { source_ip, dest_ip } tuples only using a 
symmetric rss-key is important.  This works properly with all other Intel NICs 
in the DPDK thus far that we have tested until the i40E PMD with the Intel 
X710-DA4.  The Microsoft RSS specification allows for this.
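
For illustration, a minimal software sketch of the property being relied on here: a Toeplitz hash over { source_ip, dest_ip } gives the same value in both directions when the key repeats with a 16-bit period. The sketch_* names are hypothetical, and the 0x6d5a pattern is only an assumed example of such a key. This says nothing about how the i40e hardware has to be configured.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define SKETCH_RSS_KEY_LEN 40

/* Fill the key with a repeating 16-bit pattern (assumed example value). */
void sketch_fill_symmetric_key(uint8_t key[SKETCH_RSS_KEY_LEN])
{
	for (size_t i = 0; i < SKETCH_RSS_KEY_LEN; i += 2) {
		key[i] = 0x6d;
		key[i + 1] = 0x5a;
	}
}

/* Toeplitz hash over len input bytes; key must hold at least len + 4 bytes. */
uint32_t sketch_toeplitz(const uint8_t *key, const uint8_t *data, size_t len)
{
	uint32_t hash = 0;
	/* 32-bit window onto the key, starting at key bit 0. */
	uint32_t window = ((uint32_t)key[0] << 24) | ((uint32_t)key[1] << 16) |
			  ((uint32_t)key[2] << 8) | (uint32_t)key[3];

	for (size_t i = 0; i < len; i++) {
		for (int bit = 7; bit >= 0; bit--) {
			if (data[i] & (1u << bit))
				hash ^= window;
			/* Slide the window left by one key bit. */
			window = (window << 1) |
				 ((uint32_t)(key[i + 4] >> bit) & 1u);
		}
	}
	return hash;
}

/* Hash only the IPv4 address pair, with no L4 ports. */
uint32_t sketch_hash_ipv4_pair(const uint8_t *key, uint32_t src_ip,
			       uint32_t dst_ip)
{
	uint8_t tuple[8];

	for (int k = 0; k < 4; k++) {
		tuple[k] = (uint8_t)(src_ip >> (24 - 8 * k));
		tuple[4 + k] = (uint8_t)(dst_ip >> (24 - 8 * k));
	}
	return sketch_toeplitz(key, tuple, sizeof(tuple));
}
```

Because the key window at bit position p equals the window at p + 32, swapping the two 32-bit addresses XORs the same set of windows, so both directions of a session would land on the same queue.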

With the i40E PMD, we have been unsuccessful at enabling this RSS 
configuration.  From the source code and XL710 controller datasheet, we cannot 
find any reference to the flags for this RSS mode.  Unless we can achieve 
feature parity with the other Intel NICs, we don't want to write special case 
code for this one driver which makes the XL710 controller unusable for us and 
seems contrary to the intent of the DPDK APIs which are abstracting this 
behavior.

Do you have any suggestions?

Thanks kindly,

Jeff

-Original Message-
From: Zhang, Helin [mailto:helin.zh...@intel.com] 
Sent: Wednesday, July 22, 2015 5:56 PM
To: Jeff Venable, Sr. ; lhffjzh ; 'Thomas Monjalon' 
Cc: dev at dpdk.org
Subject: RE: [dpdk-dev] Why only rx queue "0" can receive network packet by 
i40e NIC



> -Original Message-
> From: Jeff Venable, Sr. [mailto:jeff at vectranetworks.com]
> Sent: Wednesday, July 22, 2015 5:47 PM
> To: Zhang, Helin; lhffjzh; 'Thomas Monjalon'
> Cc: dev at dpdk.org
> Subject: RE: [dpdk-dev] Why only rx queue "0" can receive network 
> packet by i40e NIC
> 
> Is the I40E incapable of operating RSS with ETH_RSS_IP (i.e. hashing 
> without L4 ports)?
Why do you think so? Sorry, I am a bit confused.
ETH_RSS_IP is a superset of all the IP-based RSS types. Please see the RSS types 
listed in rte_ethdev.h.
The RSS types supported by each NIC can be queried via the 
'flow_type_rss_offloads' field of 'struct rte_eth_dev_info'.

Regards,
Helin

> 
> Thanks,
> 
> Jeff
> 
> -Original Message-
> From: dev [mailto:dev-bounces at dpdk.org] On Behalf Of Zhang, Helin
> Sent: Saturday, February 28, 2015 6:34 AM
> To: lhffjzh; 'Thomas Monjalon'
> Cc: dev at dpdk.org; maintainers at dpdk.org
> Subject: Re: [dpdk-dev] Why only rx queue "0" can receive network 
> packet by i40e NIC
> 
> Good to know that!
> 
> > -Original Message-
> > From: lhffjzh [mailto:lhffjzh at 126.com]
> > Sent: Saturday, February 28, 2015 12:34 PM
> > To: Zhang, Helin; 'Thomas Monjalon'
> > Cc: dev at dpdk.org; maintainers at dpdk.org
> > Subject: RE: [dpdk-dev] Why only rx queue "0" can receive network 
> > packet by i40e NIC
> >
> > Hi Helin,
> >
> > Thanks a lot for your great help, all of rx queue received network 
> > packet after I update rss_hf from "ETH_RSS_IP" to " ETH_RSS_PROTO_MASK ".
> >
> > static struct rte_eth_conf port_conf = {
> > .rxmode = {
> > .mq_mode= ETH_MQ_RX_RSS,
> > .max_rx_pkt_len = ETHER_MAX_LEN,
> > .split_hdr_size = 0,
> > .header_split   = 0, /**< Header Split disabled */
> > .hw_ip_checksum = 1, /**< IP checksum offload enabled */
> > .hw_vlan_filter = 0, /**< VLAN filtering disabled */
> > .jumbo_frame= 0, /**< Jumbo Frame Support disabled */
> > .hw_strip_crc   = 0, /**< CRC stripped by hardware */
> > },
> > .rx_adv_conf = {
> > .rss_conf = {
> > .rss_key = NULL,
> > .rss_hf = ETH_RSS_PROTO_MASK,
> > },
> > },
> > .txmode = {
> > .mq_mode = ETH_MQ_TX_NONE,
> > },
> > .fdir_conf.mode = RTE_FDIR_MODE_SIGNATURE, };
> >
> >
> > Regards,
> > Haifeng
> >
> > -Original Message-
> > From: Zhang, Helin [mailto:helin.zhang at intel.com]
> > Sent: Saturday, February 28, 2015 11:18 AM
> > To: lhffjzh; 'Thomas Monjalon'
> > Cc: dev at dpdk.org; maintainers at dpdk.org
> > Subject: RE: [dpdk-dev] Why only rx queue "0" can receive network 
> > packet by i40e NIC
> >
> > Hi Haifeng
> >
> > > -Original Message-
> > > From: dev [mailto:dev-bounces at dpdk.org] On Behalf Of lhffjzh
> > > Sent: Saturday, February 28, 2015 9:48 AM
> > > To: 'Thomas Monjalon'
> > > Cc: dev at dpdk.org; maintainers at dpdk.org
> > > Subject: Re: [dpdk-dev] Why only rx queue "0" can receive network 
> > > packet
> > by
> > > i40e NIC
> > >
> > > Hi Thomas,
> > >
> > > Thanks very much for your reminder, you give me many help in this 
> > > mail
> > list.
> > >
> > > The issue with detailed information just as below. but I don't 
> > > know who is
> > the
> > > dpdk i40e maintainers? is maintainers at dpdk.org?
> > >
> > > Hardware list:
> > > 2 i40e 40G NICs
> > > Xeon E5-2670 v2(10 cores)
> > > 32G memory
> > >
> > > I loopback 2 i40e NICs by QSFP cable, one NIC send UDP network 
> > > packet by DPDK

[dpdk-dev] [PATCH] e1000: fix the issue of wrongly reporting descriptor done

2015-07-30 Thread Wenzhuo Lu
The header buffer address for header split is currently filled with the
physical DMA address of the packet buffer. That is not needed at all, as
header split is not supported. Hardware requires the driver to set the least
significant bit of the header address to 0, because on write back that bit
is the 'Descriptor Done' (DD) bit.
The issue is that if the user reserves an odd number of bytes between the
mbuf header and the data buffer, the physical address written to the
descriptor is odd. That means the driver itself sets the DD bit to non-zero,
so the descriptor is wrongly reported as done.

Signed-off-by: Wenzhuo Lu 
---
 drivers/net/e1000/igb_rxtx.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/drivers/net/e1000/igb_rxtx.c b/drivers/net/e1000/igb_rxtx.c
index 3a31b21..b13930e 100644
--- a/drivers/net/e1000/igb_rxtx.c
+++ b/drivers/net/e1000/igb_rxtx.c
@@ -851,7 +851,7 @@ eth_igb_recv_pkts(void *rx_queue, struct rte_mbuf **rx_pkts,
rxe->mbuf = nmb;
dma_addr =
rte_cpu_to_le_64(RTE_MBUF_DATA_DMA_ADDR_DEFAULT(nmb));
-   rxdp->read.hdr_addr = dma_addr;
+   rxdp->read.hdr_addr = 0;
rxdp->read.pkt_addr = dma_addr;

/*
@@ -1040,7 +1040,7 @@ eth_igb_recv_scattered_pkts(void *rx_queue, struct 
rte_mbuf **rx_pkts,
rxe->mbuf = nmb;
dma = rte_cpu_to_le_64(RTE_MBUF_DATA_DMA_ADDR_DEFAULT(nmb));
rxdp->read.pkt_addr = dma;
-   rxdp->read.hdr_addr = dma;
+   rxdp->read.hdr_addr = 0;

/*
 * Set data length & data buffer address of mbuf.
@@ -1990,7 +1990,7 @@ igb_alloc_rx_queue_mbufs(struct igb_rx_queue *rxq)
dma_addr =
rte_cpu_to_le_64(RTE_MBUF_DATA_DMA_ADDR_DEFAULT(mbuf));
rxd = &rxq->rx_ring[i];
-   rxd->read.hdr_addr = dma_addr;
+   rxd->read.hdr_addr = 0;
rxd->read.pkt_addr = dma_addr;
rxe[i].mbuf = mbuf;
}
-- 
1.9.3



[dpdk-dev] [PATCH v2] lpm: fix extended flag check when adding a "depth small" entry

2015-07-30 Thread Zhe Tao
When adding a "depth small" entry, if the extended flag of the existing tbl24
entry is not set and the new depth is smaller than the depth in that tbl24
entry, nothing should be done; otherwise the code operates on the wrong
memory area.

Signed-off-by: Zhe Tao 
---
PATCH v2: Edit to keep line size below 80 characters

PATCH v1: Fix extended flag check when adding a "depth small" entry

 lib/librte_lpm/rte_lpm.c | 54 +++-
 1 file changed, 30 insertions(+), 24 deletions(-)

diff --git a/lib/librte_lpm/rte_lpm.c b/lib/librte_lpm/rte_lpm.c
index de05307..163ba3c 100644
--- a/lib/librte_lpm/rte_lpm.c
+++ b/lib/librte_lpm/rte_lpm.c
@@ -442,35 +442,41 @@ add_depth_small(struct rte_lpm *lpm, uint32_t ip, uint8_t 
depth,
};

/* Setting tbl24 entry in one go to avoid race
-* conditions */
+* conditions
+*/
lpm->tbl24[i] = new_tbl24_entry;

continue;
}

-   /* If tbl24 entry is valid and extended calculate the index
-* into tbl8. */
-   tbl8_index = lpm->tbl24[i].tbl8_gindex *
-   RTE_LPM_TBL8_GROUP_NUM_ENTRIES;
-   tbl8_group_end = tbl8_index + RTE_LPM_TBL8_GROUP_NUM_ENTRIES;
-
-   for (j = tbl8_index; j < tbl8_group_end; j++) {
-   if (!lpm->tbl8[j].valid ||
-   lpm->tbl8[j].depth <= depth) {
-   struct rte_lpm_tbl8_entry new_tbl8_entry = {
-   .valid = VALID,
-   .valid_group = VALID,
-   .depth = depth,
-   .next_hop = next_hop,
-   };
-
-   /*
-* Setting tbl8 entry in one go to avoid race
-* conditions
-*/
-   lpm->tbl8[j] = new_tbl8_entry;
-
-   continue;
+   if (lpm->tbl24[i].ext_entry == 1) {
+   /* If tbl24 entry is valid and extended calculate the
+*  index into tbl8.
+*/
+   tbl8_index = lpm->tbl24[i].tbl8_gindex *
+   RTE_LPM_TBL8_GROUP_NUM_ENTRIES;
+   tbl8_group_end = tbl8_index +
+   RTE_LPM_TBL8_GROUP_NUM_ENTRIES;
+
+   for (j = tbl8_index; j < tbl8_group_end; j++) {
+   if (!lpm->tbl8[j].valid ||
+   lpm->tbl8[j].depth <= depth) {
+   struct rte_lpm_tbl8_entry
+   new_tbl8_entry = {
+   .valid = VALID,
+   .valid_group = VALID,
+   .depth = depth,
+   .next_hop = next_hop,
+   };
+
+   /*
+* Setting tbl8 entry in one go to avoid
+* race conditions
+*/
+   lpm->tbl8[j] = new_tbl8_entry;
+
+   continue;
+   }
}
}
}
-- 
1.9.3
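
For illustration, a minimal sketch of the decision the patch adds, on a much reduced model of the tables. It is not the rte_lpm code: the struct layouts, the sketch_* names and the group size of 4 are hypothetical, and a single 'value' field stands in for the next hop / tbl8 group index that share storage in a tbl24 entry.

```c
#include <assert.h>
#include <stdint.h>

#define SKETCH_TBL8_GROUP_ENTRIES 4 /* kept small for illustration */

struct sketch_tbl24_entry {
	uint8_t valid;
	uint8_t ext_entry; /* 1: 'value' is a tbl8 group index */
	uint8_t depth;
	uint8_t value;     /* next hop, or tbl8 group index if ext_entry */
};

struct sketch_tbl8_entry {
	uint8_t valid;
	uint8_t depth;
	uint8_t next_hop;
};

/* Handle one tbl24 slot; returns the number of table entries written. */
int sketch_add_depth_small_one(struct sketch_tbl24_entry *e24,
			       struct sketch_tbl8_entry *tbl8,
			       uint8_t depth, uint8_t next_hop)
{
	int written = 0;

	if (!e24->valid || (e24->ext_entry == 0 && e24->depth <= depth)) {
		e24->valid = 1;
		e24->ext_entry = 0;
		e24->depth = depth;
		e24->value = next_hop;
		return 1;
	}

	/* Only an extended entry holds a group index; without this check
	 * a next hop would be misread as an index into tbl8. */
	if (e24->ext_entry == 1) {
		unsigned int start = e24->value * SKETCH_TBL8_GROUP_ENTRIES;
		unsigned int end = start + SKETCH_TBL8_GROUP_ENTRIES;

		for (unsigned int j = start; j < end; j++) {
			if (!tbl8[j].valid || tbl8[j].depth <= depth) {
				tbl8[j].valid = 1;
				tbl8[j].depth = depth;
				tbl8[j].next_hop = next_hop;
				written++;
			}
		}
	}
	/* Valid, not extended, deeper than the new route: nothing to do. */
	return written;
}
```

In this model a valid, non-extended /24 slot is left untouched when a /16 route is added, and tbl8 is only walked when ext_entry is set.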



[dpdk-dev] [PATCH] doc: announce ABI change for rte_eth_fdir_filter

2015-07-30 Thread Liang, Cunming


> -Original Message-
> From: dev [mailto:dev-bounces at dpdk.org] On Behalf Of Jingjing Wu
> Sent: Monday, July 20, 2015 3:04 PM
> To: dev at dpdk.org
> Subject: [dpdk-dev] [PATCH] doc: announce ABI change for rte_eth_fdir_filter
> 
> To fix the FVL's flow director issue for SCTP flow, rte_eth_fdir_filter
> needs to be changed to support the SCTP flow key extension. This announces
> the ABI deprecation.
> 
> Signed-off-by: jingjing.wu 
> ---
>  doc/guides/rel_notes/deprecation.rst | 4 
>  1 file changed, 4 insertions(+)
> 
> diff --git a/doc/guides/rel_notes/deprecation.rst
> b/doc/guides/rel_notes/deprecation.rst
> index 5330d3b..63e19c7 100644
> --- a/doc/guides/rel_notes/deprecation.rst
> +++ b/doc/guides/rel_notes/deprecation.rst
> @@ -35,3 +35,7 @@ Deprecation Notices
>  * The following fields have been deprecated in rte_eth_stats:
>imissed, ibadcrc, ibadlen, imcasts, fdirmatch, fdirmiss,
>tx_pause_xon, rx_pause_xon, tx_pause_xoff, rx_pause_xoff
> +
> +* Significant ABI change is planned for struct rte_eth_fdir_filter to extend
> +  the SCTP flow's key input from release 2.1. The change may be enabled in
> +  the upcoming release 2.1 with CONFIG_RTE_NEXT_ABI.
> --
> 2.4.0

Acked-by: Cunming Liang 



[dpdk-dev] [PATCH v2] lpm: fix extended flag check when adding a "depth small" entry

2015-07-30 Thread Liang, Cunming


> -Original Message-
> From: Tao, Zhe
> Sent: Thursday, July 30, 2015 11:19 AM
> To: dev at dpdk.org
> Cc: Tao, Zhe; Liang, Cunming; Richardson, Bruce
> Subject: [dpdk-dev][PATCH v2] lpm: fix extended flag check when adding a
> "depth small" entry
> 
> When adding a "depth small" entry, if the extended flag of the existing tbl24
> entry is not set and the new depth is smaller than the depth in that tbl24
> entry, nothing should be done; otherwise the code operates on the wrong
> memory area.
> 
> Signed-off-by: Zhe Tao 
> ---
> PATCH v2: Edit to keep line size below 80 characters
> 
> PATCH v1: Fix extended flag check when adding a "depth small" entry
> 
>  lib/librte_lpm/rte_lpm.c | 54 +++---
> --
>  1 file changed, 30 insertions(+), 24 deletions(-)
> 
> diff --git a/lib/librte_lpm/rte_lpm.c b/lib/librte_lpm/rte_lpm.c
> index de05307..163ba3c 100644
> --- a/lib/librte_lpm/rte_lpm.c
> +++ b/lib/librte_lpm/rte_lpm.c
> @@ -442,35 +442,41 @@ add_depth_small(struct rte_lpm *lpm, uint32_t ip,
> uint8_t depth,
>   };
> 
>   /* Setting tbl24 entry in one go to avoid race
> -  * conditions */
> +  * conditions
> +  */
>   lpm->tbl24[i] = new_tbl24_entry;
> 
>   continue;
>   }
> 
> - /* If tbl24 entry is valid and extended calculate the index
> -  * into tbl8. */
> - tbl8_index = lpm->tbl24[i].tbl8_gindex *
> - RTE_LPM_TBL8_GROUP_NUM_ENTRIES;
> - tbl8_group_end = tbl8_index +
> RTE_LPM_TBL8_GROUP_NUM_ENTRIES;
> -
> - for (j = tbl8_index; j < tbl8_group_end; j++) {
> - if (!lpm->tbl8[j].valid ||
> - lpm->tbl8[j].depth <= depth) {
> - struct rte_lpm_tbl8_entry new_tbl8_entry = {
> - .valid = VALID,
> - .valid_group = VALID,
> - .depth = depth,
> - .next_hop = next_hop,
> - };
> -
> - /*
> -  * Setting tbl8 entry in one go to avoid race
> -  * conditions
> -  */
> - lpm->tbl8[j] = new_tbl8_entry;
> -
> - continue;
> + if (lpm->tbl24[i].ext_entry == 1) {
> + /* If tbl24 entry is valid and extended calculate the
> +  *  index into tbl8.
> +  */
> + tbl8_index = lpm->tbl24[i].tbl8_gindex *
> + RTE_LPM_TBL8_GROUP_NUM_ENTRIES;
> + tbl8_group_end = tbl8_index +
> + RTE_LPM_TBL8_GROUP_NUM_ENTRIES;
> +
> + for (j = tbl8_index; j < tbl8_group_end; j++) {
> + if (!lpm->tbl8[j].valid ||
> + lpm->tbl8[j].depth <= depth) {
> + struct rte_lpm_tbl8_entry
> + new_tbl8_entry = {
> + .valid = VALID,
> + .valid_group = VALID,
> + .depth = depth,
> + .next_hop = next_hop,
> + };
> +
> + /*
> +  * Setting tbl8 entry in one go to avoid
> +  * race conditions
> +  */
> + lpm->tbl8[j] = new_tbl8_entry;
> +
> + continue;
> + }
>   }
>   }
>   }
> --
> 1.9.3

Acked-by: Cunming Liang 



[dpdk-dev] [PATCH v3] doc: announce abi change for interrupt mode

2015-07-30 Thread Cunming Liang
The patch announces the planned ABI changes for interrupt mode.

Signed-off-by: Cunming Liang 
---
 v3 change:
   - reword for CONFIG_RTE_NEXT_ABI

 v2 change:
   - rebase to recent master

 doc/guides/rel_notes/deprecation.rst | 5 +++++
 1 file changed, 5 insertions(+)

diff --git a/doc/guides/rel_notes/deprecation.rst 
b/doc/guides/rel_notes/deprecation.rst
index 5330d3b..d36d267 100644
--- a/doc/guides/rel_notes/deprecation.rst
+++ b/doc/guides/rel_notes/deprecation.rst
@@ -35,3 +35,8 @@ Deprecation Notices
 * The following fields have been deprecated in rte_eth_stats:
   imissed, ibadcrc, ibadlen, imcasts, fdirmatch, fdirmiss,
   tx_pause_xon, rx_pause_xon, tx_pause_xoff, rx_pause_xoff
+
+* The ABI changes are planned for struct rte_intr_handle, struct rte_eth_conf
+  and struct eth_dev_ops to support interrupt mode feature from release 2.1.
+  Those changes may be enabled in the upcoming release 2.1
+  with CONFIG_RTE_NEXT_ABI.
-- 
1.8.1.4



[dpdk-dev] [PATCH v3] doc: announce abi change for interrupt mode

2015-07-30 Thread Liu, Yong
Acked-by: Marvin Liu 

> -Original Message-
> From: dev [mailto:dev-bounces at dpdk.org] On Behalf Of Cunming Liang
> Sent: Thursday, July 30, 2015 1:05 PM
> To: dev at dpdk.org
> Subject: [dpdk-dev] [PATCH v3] doc: announce abi change for interrupt mode
> 
> The patch announces the planned ABI changes for interrupt mode.
> 
> Signed-off-by: Cunming Liang 
> ---
>  v3 change:
>- reword for CONFIG_RTE_NEXT_ABI
> 
>  v2 change:
>- rebase to recent master
> 
>  doc/guides/rel_notes/deprecation.rst | 5 +
>  1 file changed, 5 insertions(+)
> 
> diff --git a/doc/guides/rel_notes/deprecation.rst
> b/doc/guides/rel_notes/deprecation.rst
> index 5330d3b..d36d267 100644
> --- a/doc/guides/rel_notes/deprecation.rst
> +++ b/doc/guides/rel_notes/deprecation.rst
> @@ -35,3 +35,8 @@ Deprecation Notices
>  * The following fields have been deprecated in rte_eth_stats:
>imissed, ibadcrc, ibadlen, imcasts, fdirmatch, fdirmiss,
>tx_pause_xon, rx_pause_xon, tx_pause_xoff, rx_pause_xoff
> +
> +* The ABI changes are planned for struct rte_intr_handle, struct
> rte_eth_conf
> +  and struct eth_dev_ops to support interrupt mode feature from release
> 2.1.
> +  Those changes may be enabled in the upcoming release 2.1
> +  with CONFIG_RTE_NEXT_ABI.
> --
> 1.8.1.4



[dpdk-dev] [PATCH] app test: fix mempool cache_size not match limited cache_size

2015-07-30 Thread Wu, Jingjing


> -Original Message-
> From: dev [mailto:dev-bounces at dpdk.org] On Behalf Of Yong Liu
> Sent: Wednesday, July 29, 2015 11:22 AM
> To: dev at dpdk.org
> Subject: [dpdk-dev] [PATCH] app test: fix mempool cache_size not match
> limited cache_size
> 
> From: Marvin Liu 
> 
> In the previous setting, mempool size and cache_size were both 32.
> This no longer satisfies the cache_size checking rule.
> Cache size should be less than CONFIG_RTE_MEMPOOL_CACHE_MAX_SIZE and
> mempool size / 1.5.
> 
> Signed-off-by: Marvin Liu 
> 
Acked-by: Jingjing Wu 

> diff --git a/app/test/test_sched.c b/app/test/test_sched.c
> index 1ef6910..7a38db3 100644
> --- a/app/test/test_sched.c
> +++ b/app/test/test_sched.c
> @@ -87,7 +87,7 @@ static struct rte_sched_port_params port_param = {
> 
>  #define NB_MBUF  32
>  #define MBUF_DATA_SZ (2048 + RTE_PKTMBUF_HEADROOM)
> -#define PKT_BURST_SZ 32
> +#define PKT_BURST_SZ 0
>  #define MEMPOOL_CACHE_SZ PKT_BURST_SZ
>  #define SOCKET   0
> 
> --
> 1.9.3
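
For illustration, a minimal sketch of the cache_size rule described in the commit message above. The function name is hypothetical, the limit of 512 is an assumed value for CONFIG_RTE_MEMPOOL_CACHE_MAX_SIZE, and treating both bounds as inclusive is also an assumption.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed value of CONFIG_RTE_MEMPOOL_CACHE_MAX_SIZE. */
#define SKETCH_CACHE_MAX_SIZE 512u

/* Returns nonzero if cache_size is acceptable for a pool of n elements. */
int sketch_cache_size_ok(uint32_t n, uint32_t cache_size)
{
	if (cache_size > SKETCH_CACHE_MAX_SIZE)
		return 0;
	/* cache_size <= n / 1.5, written in integer arithmetic. */
	return (uint64_t)cache_size * 3 <= (uint64_t)n * 2;
}
```

With n = 32 the old cache_size of 32 is rejected by this check, while 0 (cache disabled) is accepted, which matches the change to PKT_BURST_SZ in the patch.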



[dpdk-dev] [PATCH] app test: fix eal --no-huge option should work with -m option

2015-07-30 Thread Wu, Jingjing


> -Original Message-
> From: dev [mailto:dev-bounces at dpdk.org] On Behalf Of Yong Liu
> Sent: Wednesday, July 29, 2015 12:38 PM
> To: dev at dpdk.org
> Subject: [dpdk-dev] [PATCH] app test: fix eal --no-huge option should work
> with -m option
> 
> From: Marvin Liu 
> 
> The '--no-huge' option now works together with the -m option.
> The unit test for EAL flags should change its pass criterion accordingly.
> 
> Signed-off-by: Marvin Liu 
>
Acked-by: Jingjing Wu 

> diff --git a/app/test/test_eal_flags.c b/app/test/test_eal_flags.c
> index 0352f87..e6f7035 100644
> --- a/app/test/test_eal_flags.c
> +++ b/app/test/test_eal_flags.c
> @@ -748,8 +748,8 @@ test_no_hpet_flag(void)  }
> 
>  /*
> - * Test that the app runs with --no-huge and doesn't run when either
> - * -m or --socket-mem are specified with --no-huge.
> + * Test that the app runs with --no-huge and doesn't run when --socket-mem are
> + * specified with --no-huge.
>   */
>  static int
>  test_no_huge_flag(void)
> @@ -778,8 +778,8 @@ test_no_huge_flag(void)
>   printf("Error - process did not run ok with --no-huge flag\n");
>   return -1;
>   }
> - if (launch_proc(argv2) == 0) {
> - printf("Error - process run ok with --no-huge and -m flags\n");
> + if (launch_proc(argv2) != 0) {
> + printf("Error - process did not run ok with --no-huge and -m flags\n");
>   return -1;
>   }
>  #ifdef RTE_EXEC_ENV_BSDAPP
> --
> 1.9.3



[dpdk-dev] Issue with non-scattered rx in ixgbe and i40e when mbuf private area size is odd

2015-07-30 Thread Olivier MATZ
Hi,

On 07/29/2015 10:24 PM, Zhang, Helin wrote:
> Hi Martin
>
> Thank you very much for the good catch!
>
> The situation is similar in i40e, as explained by Konstantin.
> As header split has not been supported by DPDK so far, it would be better to 
> set the header address in the RX descriptor to 0.
> But in the future, when enabling header split, we may need to pay extra 
> attention to that, as at least the X710 datasheet says specifically:
> "The header address should be set by the software to an even number (word 
> aligned address)". We may need to find a way to ensure that during 
> mempool/mbuf allocation.

Indeed it would be good to force the priv_size to be aligned.

The priv_size could be aligned automatically in
rte_pktmbuf_pool_create(). The only possible problem I could see
is that it would break applications that access to the data buffer
by doing (sizeof(mbuf) + sizeof(priv)), which is probably not the
best thing to do (I didn't find any applications like this in dpdk).

For applications that directly use rte_mempool_create() instead of
rte_pktmbuf_pool_create(), we could add a check using an assert in 
rte_pktmbuf_init() and some words in the documentation.

The question is: what should be the proper alignment? I would say
at least 8 bytes, but maybe cache_aligned is an option too.
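
For illustration, a minimal sketch of the two options discussed above: rounding priv_size up automatically, or only checking it. The sketch_* names are hypothetical and the 8-byte alignment is just the lower bound suggested here, not a decided value.

```c
#include <assert.h>
#include <stdint.h>

#define SKETCH_PRIV_ALIGN 8u /* assumed alignment, must be a power of two */

/* Round priv_size up to the next multiple of SKETCH_PRIV_ALIGN. */
uint16_t sketch_align_priv_size(uint16_t priv_size)
{
	/* The rounded value must still fit in 16 bits. */
	assert(priv_size <= UINT16_MAX - (SKETCH_PRIV_ALIGN - 1));
	return (uint16_t)((priv_size + (SKETCH_PRIV_ALIGN - 1)) &
			  ~(SKETCH_PRIV_ALIGN - 1));
}

/* Check only, as an init-time assert could do for pools that are
 * created without the automatic rounding. */
int sketch_priv_size_is_aligned(uint16_t priv_size)
{
	return (priv_size & (SKETCH_PRIV_ALIGN - 1)) == 0;
}
```

With this rounding a priv_size of 1 becomes 8, so the data buffer address would stay even and bit 0 of the descriptor address would stay clear.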

Regards,
Olivier


>
> Regards,
> Helin
>
>> -Original Message-
>> From: Ananyev, Konstantin
>> Sent: Wednesday, July 29, 2015 11:12 AM
>> To: Martin Weiser; Zhang, Helin; olivier.matz at 6wind.com
>> Cc: dev at dpdk.org
>> Subject: RE: [dpdk-dev] Issue with non-scattered rx in ixgbe and i40e when 
>> mbuf
>> private area size is odd
>>
>> Hi Martin,
>>
>>> -Original Message-
>>> From: dev [mailto:dev-bounces at dpdk.org] On Behalf Of Martin Weiser
>>> Sent: Wednesday, July 29, 2015 4:07 PM
>>> To: Zhang, Helin; olivier.matz at 6wind.com
>>> Cc: dev at dpdk.org
>>> Subject: [dpdk-dev] Issue with non-scattered rx in ixgbe and i40e when
>>> mbuf private area size is odd
>>>
>>> Hi Helin, Hi Olivier,
>>>
>>> we are seeing an issue with the ixgbe and i40e drivers which we could
>>> track down to our setting of the private area size of the mbufs.
>>> The issue can be easily reproduced with the l2fwd example application
>>> when a small modification is done: just set the priv_size parameter in
>>> the call to the rte_pktmbuf_pool_create function to an odd number like
>>> 1. In our tests this causes every call to rte_eth_rx_burst to return
>>> 32 (which is the setting of nb_pkts) nonsense mbufs although no
>>> packets are received on the interface and the hardware counters do not
>>> report any received packets.
>>
>>  From Niantic datasheet:
>>
>> "7.1.6.1 Advanced Receive Descriptors - Read Format Table 7-15 lists the
>> advanced receive descriptor programming by the software. The ...
>> Packet Buffer Address (64)
>> This is the physical address of the packet buffer. The lowest bit is A0 (LSB 
>> of the
>> address).
>> Header Buffer Address (64)
>> The physical address of the header buffer with the lowest bit being 
>> Descriptor
>> Done (DD).
>> When a packet spans in multiple descriptors, the header buffer address is 
>> used
>> only on the first descriptor. During the programming phase, software must set
>> the DD bit to zero (see the description of the DD bit in this section). This 
>> means
>> that header buffer addresses are always word aligned."
>>
>> Right now, in ixgbe PMD we always setup  Packet Buffer Address (PBA)and
>> Header Buffer Address (HBA) to the same value:
>> buf_physaddr + RTE_PKTMBUF_HEADROOM
>> So when priv_size==1, DD bit in RXD is always set to one by SW itself, and 
>> then
>> SW considers that HW already done with it.
>> In other words, right now for ixgbe you can't use RX buffer that is not 
>> aligned on
>> word boundary.
>>
>> So the advice would be, right now - don't set priv_size to the odd value.
>> As we don't support split header feature anyway, I think we can fix it just 
>> by
>> always setting HBA in the RXD to zero.
>> Could you try the fix for ixgbe below?
>>
>> Same story with FVL, I believe.
>> Konstantin
>>
>>
>>> Interestingly this does not happen if we force the scattered rx path.
>>>
>>> I assume the drivers have some expectations regarding the alignment of
>>> the buf_addr in the mbuf and setting an odd private are size breaks
>>> this alignment in the rte_pktmbuf_init function. If this is the case
>>> then one possible fix might be to enforce an alignment on the private area 
>>> size.
>>>
>>> Best regards,
>>> Martin
>>
>> diff --git a/drivers/net/ixgbe/ixgbe_rxtx.c b/drivers/net/ixgbe/ixgbe_rxtx.c
>> index a0c8847..94967c5 100644
>> --- a/drivers/net/ixgbe/ixgbe_rxtx.c
>> +++ b/drivers/net/ixgbe/ixgbe_rxtx.c
>> @@ -1183,7 +1183,7 @@ ixgbe_rx_alloc_bufs(struct ixgbe_rx_queue *rxq, bool
>> reset_mbuf)
>>
>>  /* populate the descriptors */
>>  dma_addr =
>> rte_cpu_to_le_64(RTE_MBUF_DATA_DMA_ADDR_DEFAULT(mb));
>> -

[dpdk-dev] [PATCH v10 0/3] deduplicate EAL common functions

2015-07-30 Thread Olivier MATZ
Hi Thomas & Ravi,

On 07/27/2015 02:59 AM, Thomas Monjalon wrote:
> 2015-07-27 02:56, Thomas Monjalon:
>> v9 was a subset of previous deduplications by Ravi Kerur.
>> This v10 address the comments I've done on v9.
>>
>> Ravi Kerur (3):
>>eal: deduplicate lcore initialization
>>eal: deduplicate timer functions
>>eal: deduplicate memory initialization
>
> Applied shortly to integrate this old pending cleanup in RC2.
>

When I try to compile DPDK for x86_x32-native-linuxapp-gcc, I
get the following compilation error:

   CC eal_common_timer.o
In file included from /usr/include/sys/sysctl.h:63:0,
  from 
/home/matz/dpdk-pkg-cron/dpdk.org/lib/librte_eal/common/eal_common_timer.c:39:
/usr/include/bits/sysctl.h:19:3: error: #error "sysctl system call is 
unsupported in x32 kernel"
  # error "sysctl system call is unsupported in x32 kernel"
^

Removing the "#include <sys/sysctl.h>" line fixes the issue without
affecting the compilation. I think this include is not needed and
could be removed.
I can provide a patch if that's OK with you.

Regards,
Olivier



[dpdk-dev] [PATCH v3] doc: announce abi change for interrupt mode

2015-07-30 Thread He, Shaopeng
Hi,

> -Original Message-
> From: dev [mailto:dev-bounces at dpdk.org] On Behalf Of Cunming Liang
> Sent: Thursday, July 30, 2015 1:05 PM
> To: dev at dpdk.org
> Subject: [dpdk-dev] [PATCH v3] doc: announce abi change for interrupt
> mode
> 
> The patch announces the planned ABI changes for interrupt mode.
> 
> Signed-off-by: Cunming Liang 
Acked-by: Shaopeng He 



[dpdk-dev] [PATCH] e1000: fix ieee1588 timestamp issue

2015-07-30 Thread Wenzhuo Lu
Ieee1588 reads system time to set its timestamp. On 1G NICs, for example,
i350, system time is disabled by default. It means the ieee1588 timestamp
will always be 0.
This patch enables system time when ieee1588 is enabled.

Signed-off-by: Wenzhuo Lu 
---
 drivers/net/e1000/igb_ethdev.c | 8 
 1 file changed, 8 insertions(+)

diff --git a/drivers/net/e1000/igb_ethdev.c b/drivers/net/e1000/igb_ethdev.c
index 56734a3..8fb67ac 100644
--- a/drivers/net/e1000/igb_ethdev.c
+++ b/drivers/net/e1000/igb_ethdev.c
@@ -3898,11 +3898,19 @@ eth_igb_set_mc_addr_list(struct rte_eth_dev *dev,
return 0;
 }

+#define E1000_TSAUXC_DISABLE_SYSTIME 0x8000
+
 static int
 igb_timesync_enable(struct rte_eth_dev *dev)
 {
struct e1000_hw *hw = E1000_DEV_PRIVATE_TO_HW(dev->data->dev_private);
uint32_t tsync_ctl;
+   uint32_t tsauxc;
+
+   /* Enable system time for it isn't on by default. */
+   tsauxc = E1000_READ_REG(hw, E1000_TSAUXC);
+   tsauxc &= ~E1000_TSAUXC_DISABLE_SYSTIME;
+   E1000_WRITE_REG(hw, E1000_TSAUXC, tsauxc);

/* Start incrementing the register used to timestamp PTP packets. */
E1000_WRITE_REG(hw, E1000_TIMINCA, E1000_TIMINCA_INIT);
-- 
1.9.3



[dpdk-dev] Issue with non-scattered rx in ixgbe and i40e when mbuf private area size is odd

2015-07-30 Thread Ananyev, Konstantin
Hi Olivier,

> -Original Message-
> From: Olivier MATZ [mailto:olivier.matz at 6wind.com]
> Sent: Thursday, July 30, 2015 9:12 AM
> To: Zhang, Helin; Ananyev, Konstantin; Martin Weiser
> Cc: dev at dpdk.org
> Subject: Re: [dpdk-dev] Issue with non-scattered rx in ixgbe and i40e when 
> mbuf private area size is odd
> 
> Hi,
> 
> On 07/29/2015 10:24 PM, Zhang, Helin wrote:
> > Hi Martin
> >
> > Thank you very much for the good catch!
> >
> > The similar situation in i40e, as explained by Konstantin.
> > As header split hasn't been supported by DPDK till now. It would be better 
> > to put the header address in RX descriptor to 0.
> > But in the future, during header split enabling. We may need to pay extra 
> > attention to that. As at least x710 datasheet said
> specifically as below.
> > "The header address should be set by the software to an even number (word 
> > aligned address)". We may need to find a way to
> ensure that during mempool/mbuf allocation.
> 
> Indeed it would be good to force the priv_size to be aligned.
> 
> The priv_size could be aligned automatically in
> rte_pktmbuf_pool_create(). The only possible problem I could see
> is that it would break applications that access to the data buffer
> by doing (sizeof(mbuf) + sizeof(priv)), which is probably not the
> best thing to do (I didn't find any applications like this in dpdk).


Maybe just make rte_pktmbuf_pool_create() fail if the input priv_size % 
MIN_ALIGN != 0?
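
A minimal sketch of such a rejection check, for illustration only.
MIN_ALIGN_SKETCH and validate_priv_size() are hypothetical names, the 8-byte
value is an assumption taken from this thread, and returning -EINVAL is just
one possible way to report the failure.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

/* Hypothetical stand-in for MIN_ALIGN; 8 bytes is assumed here. */
#define MIN_ALIGN_SKETCH 8u

/*
 * Return 0 when priv_size is acceptable, -EINVAL when it is not a
 * multiple of MIN_ALIGN_SKETCH. A pool creation helper could run this
 * before allocating anything and fail early on a bad value.
 */
static inline int
validate_priv_size(uint16_t priv_size)
{
	if (priv_size % MIN_ALIGN_SKETCH != 0)
		return -EINVAL;
	return 0;
}
```

With this approach the caller sees the error immediately instead of getting
a pool whose private area is larger than requested.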

> 
> For applications that directly use rte_mempool_create() instead of
> rte_pktmbuf_pool_create(), we could add a check using an assert in
> rte_pktmbuf_init() and some words in the documentation.
> 
> The question is: what should be the proper alignment? I would say
> at least 8 bytes, but maybe cache_aligned is an option too.

8 bytes seems enough to me.
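
To make the link between priv_size and the DD bit concrete, here is a small
sketch under stated assumptions: the mbuf header and the headroom are both
taken to be 128 bytes, the mbuf itself starts at an even physical address,
and the data buffer address is computed as base + header + priv_size +
headroom. All names below are hypothetical and not DPDK APIs.

```c
#include <assert.h>
#include <stdint.h>

#define SKETCH_DD_BIT     UINT64_C(1) /* lowest bit of the header address field */
#define SKETCH_MBUF_SIZE  128u        /* assumed size of the mbuf header */
#define SKETCH_HEADROOM   128u        /* assumed headroom before packet data */

/* Address the driver would program, given the mbuf base and priv_size. */
static inline uint64_t
sketch_data_addr(uint64_t mbuf_phys, uint16_t priv_size)
{
	return mbuf_phys + SKETCH_MBUF_SIZE + priv_size + SKETCH_HEADROOM;
}

/* Return 1 if writing hdr_addr would already set the DD bit, 0 otherwise. */
static inline int
sketch_dd_preset(uint64_t hdr_addr)
{
	return (hdr_addr & SKETCH_DD_BIT) != 0;
}
```

Under these assumptions a priv_size of 1 gives an odd address, so the DD bit
is set by software, while any multiple of 8 keeps it clear. Writing 0 into
the header address field avoids the problem regardless of priv_size.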

Konstantin 

> 
> Regards,
> Olivier
> 
> 
> >
> > Regards,
> > Helin
> >
> >> -Original Message-
> >> From: Ananyev, Konstantin
> >> Sent: Wednesday, July 29, 2015 11:12 AM
> >> To: Martin Weiser; Zhang, Helin; olivier.matz at 6wind.com
> >> Cc: dev at dpdk.org
> >> Subject: RE: [dpdk-dev] Issue with non-scattered rx in ixgbe and i40e when 
> >> mbuf
> >> private area size is odd
> >>
> >> Hi Martin,
> >>
> >>> -Original Message-
> >>> From: dev [mailto:dev-bounces at dpdk.org] On Behalf Of Martin Weiser
> >>> Sent: Wednesday, July 29, 2015 4:07 PM
> >>> To: Zhang, Helin; olivier.matz at 6wind.com
> >>> Cc: dev at dpdk.org
> >>> Subject: [dpdk-dev] Issue with non-scattered rx in ixgbe and i40e when
> >>> mbuf private area size is odd
> >>>
> >>> Hi Helin, Hi Olivier,
> >>>
> >>> we are seeing an issue with the ixgbe and i40e drivers which we could
> >>> track down to our setting of the private area size of the mbufs.
> >>> The issue can be easily reproduced with the l2fwd example application
> >>> when a small modification is done: just set the priv_size parameter in
> >>> the call to the rte_pktmbuf_pool_create function to an odd number like
> >>> 1. In our tests this causes every call to rte_eth_rx_burst to return
> >>> 32 (which is the setting of nb_pkts) nonsense mbufs although no
> >>> packets are received on the interface and the hardware counters do not
> >>> report any received packets.
> >>
> >>  From Niantic datasheet:
> >>
> >> "7.1.6.1 Advanced Receive Descriptors - Read Format Table 7-15 lists the
> >> advanced receive descriptor programming by the software. The ...
> >> Packet Buffer Address (64)
> >> This is the physical address of the packet buffer. The lowest bit is A0 
> >> (LSB of the
> >> address).
> >> Header Buffer Address (64)
> >> The physical address of the header buffer with the lowest bit being 
> >> Descriptor
> >> Done (DD).
> >> When a packet spans in multiple descriptors, the header buffer address is 
> >> used
> >> only on the first descriptor. During the programming phase, software must 
> >> set
> >> the DD bit to zero (see the description of the DD bit in this section). 
> >> This means
> >> that header buffer addresses are always word aligned."
> >>
> >> Right now, in ixgbe PMD we always setup  Packet Buffer Address (PBA)and
> >> Header Buffer Address (HBA) to the same value:
> >> buf_physaddr + RTE_PKTMBUF_HEADROOM
> >> So when priv_size==1, DD bit in RXD is always set to one by SW itself, and 
> >> then
> >> SW considers that HW already done with it.
> >> In other words, right now for ixgbe you can't use RX buffer that is not 
> >> aligned on
> >> word boundary.
> >>
> >> So the advice would be, right now - don't set priv_size to the odd value.
> >> As we don't support split header feature anyway, I think we can fix it 
> >> just by
> >> always setting HBA in the RXD to zero.
> >> Could you try the fix for ixgbe below?
> >>
> >> Same story with FVL, I believe.
> >> Konstantin
> >>
> >>
> >>> Interestingly this does not happen if we force the scattered rx path.
> >>>
> >>> I assume the drivers have some expectations regarding the alignment of
> >>> the buf_add

[dpdk-dev] Issue with non-scattered rx in ixgbe and i40e when mbuf private area size is odd

2015-07-30 Thread Olivier MATZ
Hi Konstantin,

On 07/30/2015 11:00 AM, Ananyev, Konstantin wrote:
> Hi Olivier,
>
>> -Original Message-
>> From: Olivier MATZ [mailto:olivier.matz at 6wind.com]
>> Sent: Thursday, July 30, 2015 9:12 AM
>> To: Zhang, Helin; Ananyev, Konstantin; Martin Weiser
>> Cc: dev at dpdk.org
>> Subject: Re: [dpdk-dev] Issue with non-scattered rx in ixgbe and i40e when 
>> mbuf private area size is odd
>>
>> Hi,
>>
>> On 07/29/2015 10:24 PM, Zhang, Helin wrote:
>>> Hi Martin
>>>
>>> Thank you very much for the good catch!
>>>
>>> The similar situation in i40e, as explained by Konstantin.
>>> As header split hasn't been supported by DPDK till now. It would be better 
>>> to put the header address in RX descriptor to 0.
>>> But in the future, during header split enabling. We may need to pay extra 
>>> attention to that. As at least x710 datasheet said
>> specifically as below.
>>> "The header address should be set by the software to an even number (word 
>>> aligned address)". We may need to find a way to
>> ensure that during mempool/mbuf allocation.
>>
>> Indeed it would be good to force the priv_size to be aligned.
>>
>> The priv_size could be aligned automatically in
>> rte_pktmbuf_pool_create(). The only possible problem I could see
>> is that it would break applications that access to the data buffer
>> by doing (sizeof(mbuf) + sizeof(priv)), which is probably not the
>> best thing to do (I didn't find any applications like this in dpdk).
>
>
> Might be just make rte_pktmbuf_pool_create() fail if input priv_size % 
> MIN_ALIGN != 0?

Hmm, that might break more applications: an odd priv_size is
probably rare, but a priv_size that is not aligned to 8 bytes may
be more common.
It may be safer to align the size transparently?
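
For illustration, transparent alignment could be as simple as rounding the
requested size up to the next multiple of the alignment. This is only a
sketch: PRIV_ALIGN_SKETCH and align_priv_size() are hypothetical names, and
the 8-byte value is the one discussed in this thread.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical alignment; must be a power of two for the mask below. */
#define PRIV_ALIGN_SKETCH 8u

/*
 * Round priv_size up to the next multiple of PRIV_ALIGN_SKETCH.
 * Assumes priv_size <= UINT16_MAX - (PRIV_ALIGN_SKETCH - 1), otherwise
 * the result would not fit in 16 bits.
 */
static inline uint16_t
align_priv_size(uint16_t priv_size)
{
	return (uint16_t)((priv_size + PRIV_ALIGN_SKETCH - 1u) &
			  ~(PRIV_ALIGN_SKETCH - 1u));
}
```

The drawback is the one already mentioned: an application that locates the
data buffer with sizeof(mbuf) + sizeof(priv) would be off by the padding.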


Regards,
Olivier



>
>>
>> For applications that directly use rte_mempool_create() instead of
>> rte_pktmbuf_pool_create(), we could add a check using an assert in
>> rte_pktmbuf_init() and some words in the documentation.
>>
>> The question is: what should be the proper alignment? I would say
>> at least 8 bytes, but maybe cache_aligned is an option too.
>
> 8 bytes seems enough to me.
>
> Konstantin
>
>>
>> Regards,
>> Olivier
>>
>>
>>>
>>> Regards,
>>> Helin
>>>
 -Original Message-
 From: Ananyev, Konstantin
 Sent: Wednesday, July 29, 2015 11:12 AM
 To: Martin Weiser; Zhang, Helin; olivier.matz at 6wind.com
 Cc: dev at dpdk.org
 Subject: RE: [dpdk-dev] Issue with non-scattered rx in ixgbe and i40e when 
 mbuf
 private area size is odd

 Hi Martin,

> -Original Message-
> From: dev [mailto:dev-bounces at dpdk.org] On Behalf Of Martin Weiser
> Sent: Wednesday, July 29, 2015 4:07 PM
> To: Zhang, Helin; olivier.matz at 6wind.com
> Cc: dev at dpdk.org
> Subject: [dpdk-dev] Issue with non-scattered rx in ixgbe and i40e when
> mbuf private area size is odd
>
> Hi Helin, Hi Olivier,
>
> we are seeing an issue with the ixgbe and i40e drivers which we could
> track down to our setting of the private area size of the mbufs.
> The issue can be easily reproduced with the l2fwd example application
> when a small modification is done: just set the priv_size parameter in
> the call to the rte_pktmbuf_pool_create function to an odd number like
> 1. In our tests this causes every call to rte_eth_rx_burst to return
> 32 (which is the setting of nb_pkts) nonsense mbufs although no
> packets are received on the interface and the hardware counters do not
> report any received packets.

   From Niantic datasheet:

 "7.1.6.1 Advanced Receive Descriptors - Read Format Table 7-15 lists the
 advanced receive descriptor programming by the software. The ...
 Packet Buffer Address (64)
 This is the physical address of the packet buffer. The lowest bit is A0 
 (LSB of the
 address).
 Header Buffer Address (64)
 The physical address of the header buffer with the lowest bit being 
 Descriptor
 Done (DD).
 When a packet spans in multiple descriptors, the header buffer address is 
 used
 only on the first descriptor. During the programming phase, software must 
 set
 the DD bit to zero (see the description of the DD bit in this section). 
 This means
 that header buffer addresses are always word aligned."

 Right now, in ixgbe PMD we always setup  Packet Buffer Address (PBA)and
 Header Buffer Address (HBA) to the same value:
 buf_physaddr + RTE_PKTMBUF_HEADROOM
 So when priv_size==1, DD bit in RXD is always set to one by SW itself, and 
 then
 SW considers that HW already done with it.
 In other words, right now for ixgbe you can't use RX buffer that is not 
 aligned on
 word boundary.

 So the advice would be, right now - don't set priv_size to the odd value.
 As we don't support split header feature anyway, I think we can fix it 
 just by
>>>

[dpdk-dev] [PATCH v1 1/1] ixgbe: Fix oerrors by setting it to 0

2015-07-30 Thread Tahhan, Maryam
> From: Thomas Monjalon [mailto:thomas.monjalon at 6wind.com]
> Sent: Thursday, July 30, 2015 12:20 AM
> To: Tahhan, Maryam
> Cc: dev at dpdk.org; Ananyev, Konstantin
> Subject: Re: [dpdk-dev] [PATCH v1 1/1] ixgbe: Fix oerrors by setting it to 0
> 
> > > Fix afebc86be1346136125af8026dc215f81c202c50. oerrors was txdgpc -
> > > hw_stats->gptc, txdgpc is the number of packets DMA'ed by the host
> > > and was being reset on every call to read stats so it could be < gptc.
> > > Because we currently have no way to add txdgpc to struct hw_stats so
> > > that we can maintain a persistent value per port oerrors has now
> > > been set to 0. References to txdgpc is now removed as we don't use
> > > it. This patch also removes rxnfgpc as it's not used anywhere.
> > >
> > > Signed-off-by: Maryam Tahhan 
> > Acked-by: Konstantin Ananyev 
> 
> Applied, thanks
> 
> It's a bit sad.
> Is it a consequence of forbidding updates in the base driver?

Yes, that's exactly it.

In the meantime I'm going to look at/investigate another way to allow us to 
maintain additional (anything not in struct hw_stats) per port stats/registers 
in addition to the base driver. 

All the best
Maryam


[dpdk-dev] abi change announce

2015-07-30 Thread Xie, Huawei
Hi Thomas:
I am doing virtio/vhost performance optimization, so there may be
some changes, for example to the virtio or vhost virtqueue data structure.
Do I need to announce the ABI change even if the change hasn't been
determined?

/huawei



[dpdk-dev] Issue with non-scattered rx in ixgbe and i40e when mbuf private area size is odd

2015-07-30 Thread Ananyev, Konstantin


> -Original Message-
> From: Olivier MATZ [mailto:olivier.matz at 6wind.com]
> Sent: Thursday, July 30, 2015 10:10 AM
> To: Ananyev, Konstantin; Zhang, Helin; Martin Weiser
> Cc: dev at dpdk.org
> Subject: Re: [dpdk-dev] Issue with non-scattered rx in ixgbe and i40e when 
> mbuf private area size is odd
> 
> Hi Konstantin,
> 
> On 07/30/2015 11:00 AM, Ananyev, Konstantin wrote:
> > Hi Olivier,
> >
> >> -Original Message-
> >> From: Olivier MATZ [mailto:olivier.matz at 6wind.com]
> >> Sent: Thursday, July 30, 2015 9:12 AM
> >> To: Zhang, Helin; Ananyev, Konstantin; Martin Weiser
> >> Cc: dev at dpdk.org
> >> Subject: Re: [dpdk-dev] Issue with non-scattered rx in ixgbe and i40e when 
> >> mbuf private area size is odd
> >>
> >> Hi,
> >>
> >> On 07/29/2015 10:24 PM, Zhang, Helin wrote:
> >>> Hi Martin
> >>>
> >>> Thank you very much for the good catch!
> >>>
> >>> The similar situation in i40e, as explained by Konstantin.
> >>> As header split hasn't been supported by DPDK till now. It would be 
> >>> better to put the header address in RX descriptor to 0.
> >>> But in the future, during header split enabling. We may need to pay extra 
> >>> attention to that. As at least x710 datasheet said
> >> specifically as below.
> >>> "The header address should be set by the software to an even number (word 
> >>> aligned address)". We may need to find a way to
> >> ensure that during mempool/mbuf allocation.
> >>
> >> Indeed it would be good to force the priv_size to be aligned.
> >>
> >> The priv_size could be aligned automatically in
> >> rte_pktmbuf_pool_create(). The only possible problem I could see
> >> is that it would break applications that access to the data buffer
> >> by doing (sizeof(mbuf) + sizeof(priv)), which is probably not the
> >> best thing to do (I didn't find any applications like this in dpdk).
> >
> >
> > Might be just make rte_pktmbuf_pool_create() fail if input priv_size % 
> > MIN_ALIGN != 0?
> 
> Hmm maybe it would break more applications: an odd priv_size is
> probably rare, but a priv_size that is not aligned to 8 bytes is
> maybe more common.

My thought was that rte_pktmbuf_pool_create() was just introduced in 2.1,
so if we add an extra requirement for the input parameter now,
there would be no ABI breakage, and not many people have started using it yet.
To me that seems a bit easier and more straightforward than silent alignment:
the user would not make wrong assumptions here.
Though if you think that silent alignment would be more convenient
for most users, I won't insist.
Konstantin

> It's maybe safer to align the size transparently?
> 
> 
> Regards,
> Olivier
> 
> 
> 
> >
> >>
> >> For applications that directly use rte_mempool_create() instead of
> >> rte_pktmbuf_pool_create(), we could add a check using an assert in
> >> rte_pktmbuf_init() and some words in the documentation.
> >>
> >> The question is: what should be the proper alignment? I would say
> >> at least 8 bytes, but maybe cache_aligned is an option too.
> >
> > 8 bytes seems enough to me.
> >
> > Konstantin
> >
> >>
> >> Regards,
> >> Olivier
> >>
> >>
> >>>
> >>> Regards,
> >>> Helin
> >>>
>  -Original Message-
>  From: Ananyev, Konstantin
>  Sent: Wednesday, July 29, 2015 11:12 AM
>  To: Martin Weiser; Zhang, Helin; olivier.matz at 6wind.com
>  Cc: dev at dpdk.org
>  Subject: RE: [dpdk-dev] Issue with non-scattered rx in ixgbe and i40e 
>  when mbuf
>  private area size is odd
> 
>  Hi Martin,
> 
> > -Original Message-
> > From: dev [mailto:dev-bounces at dpdk.org] On Behalf Of Martin Weiser
> > Sent: Wednesday, July 29, 2015 4:07 PM
> > To: Zhang, Helin; olivier.matz at 6wind.com
> > Cc: dev at dpdk.org
> > Subject: [dpdk-dev] Issue with non-scattered rx in ixgbe and i40e when
> > mbuf private area size is odd
> >
> > Hi Helin, Hi Olivier,
> >
> > we are seeing an issue with the ixgbe and i40e drivers which we could
> > track down to our setting of the private area size of the mbufs.
> > The issue can be easily reproduced with the l2fwd example application
> > when a small modification is done: just set the priv_size parameter in
> > the call to the rte_pktmbuf_pool_create function to an odd number like
> > 1. In our tests this causes every call to rte_eth_rx_burst to return
> > 32 (which is the setting of nb_pkts) nonsense mbufs although no
> > packets are received on the interface and the hardware counters do not
> > report any received packets.
> 
>    From Niantic datasheet:
> 
>  "7.1.6.1 Advanced Receive Descriptors - Read Format Table 7-15 lists the
>  advanced receive descriptor programming by the software. The ...
>  Packet Buffer Address (64)
>  This is the physical address of the packet buffer. The lowest bit is A0 
>  (LSB of the
>  address).
>  Header Buffer Address (64)
>  The physical address of the header bu

[dpdk-dev] [PATCH] e1000: fix ieee1588 timestamp issue

2015-07-30 Thread Mcnamara, John
> -Original Message-
> From: dev [mailto:dev-bounces at dpdk.org] On Behalf Of Wenzhuo Lu
> Sent: Thursday, July 30, 2015 9:34 AM
> To: dev at dpdk.org
> Subject: [dpdk-dev] [PATCH] e1000: fix ieee1588 timestamp issue
> 
> Ieee1588 reads system time to set its timestamp. On 1G NICs, for example,
> i350, system time is disabled by default. It means the ieee1588 timestamp
> will always be 0.
> This patch enables system time when ieee1588 is enabled.

Looks good.


> +#define E1000_TSAUXC_DISABLE_SYSTIME 0x8000

Probably best to move this to the top of the file with the other timesync 
defines.


I wonder if this would also fix the following known issue with i210 timesyncing 
from the release notes:

http://dpdk.org/doc/guides/rel_notes/known_issues.html#ieee1588-support-possibly-not-working-with-an-intel-ethernet-controller-i210-nic

I don't have an i210 NIC to test but perhaps someone could verify it.

John



[dpdk-dev] abi change announce

2015-07-30 Thread Thomas Monjalon
2015-07-30 09:25, Xie, Huawei:
> Hi Thomas:
> I am doing virtio/vhost performance optimization, so there is possibly
> some change, for example to virtio or vhost virtqueue data structure.
> Do i need to announce the ABI change even if the change hasn't been
> determined?

I have no strong opinion.
It seems strange to announce something which is not known.
You may be able to introduce your change without previous notice by using
NEXT_ABI if not too invasive.

Neil, an opinion?


[dpdk-dev] abi change announce

2015-07-30 Thread Neil Horman
On Thu, Jul 30, 2015 at 12:18:41PM +0200, Thomas Monjalon wrote:
> 2015-07-30 09:25, Xie, Huawei:
> > Hi Thomas:
> > I am doing virtio/vhost performance optimization, so there is possibly
> > some change, for example to virtio or vhost virtqueue data structure.
> > Do i need to announce the ABI change even if the change hasn't been
> > determined?
> 
> I have no strong opinion.
> It seems strange to announce something which is not known.
> You may be able to introduce your change without previous notice by using
> NEXT_ABI if not too invasive.
> 
> Neil, an opinion?
> 

Given the process, you can't announce the change until you know what it is,
since you need to detail in the announcement what the change is going to be.

We have no method to reserve an 'ABI break to be determined later', nor should
we.  Write the code, then we figure out if ABI needs to change and there is a
need to announce.

Neil



[dpdk-dev] Issue with non-scattered rx in ixgbe and i40e when mbuf private area size is odd

2015-07-30 Thread Olivier MATZ
Hi,

On 07/30/2015 11:43 AM, Ananyev, Konstantin wrote:
>
>
>> -Original Message-
>> From: Olivier MATZ [mailto:olivier.matz at 6wind.com]
>> Sent: Thursday, July 30, 2015 10:10 AM
>> To: Ananyev, Konstantin; Zhang, Helin; Martin Weiser
>> Cc: dev at dpdk.org
>> Subject: Re: [dpdk-dev] Issue with non-scattered rx in ixgbe and i40e when 
>> mbuf private area size is odd
>>
>> Hi Konstantin,
>>
>> On 07/30/2015 11:00 AM, Ananyev, Konstantin wrote:
>>> Hi Olivier,
>>>
 -Original Message-
 From: Olivier MATZ [mailto:olivier.matz at 6wind.com]
 Sent: Thursday, July 30, 2015 9:12 AM
 To: Zhang, Helin; Ananyev, Konstantin; Martin Weiser
 Cc: dev at dpdk.org
 Subject: Re: [dpdk-dev] Issue with non-scattered rx in ixgbe and i40e when 
 mbuf private area size is odd

 Hi,

 On 07/29/2015 10:24 PM, Zhang, Helin wrote:
> Hi Martin
>
> Thank you very much for the good catch!
>
> The similar situation in i40e, as explained by Konstantin.
> As header split hasn't been supported by DPDK till now. It would be 
> better to put the header address in RX descriptor to 0.
> But in the future, during header split enabling. We may need to pay extra 
> attention to that. As at least x710 datasheet said
 specifically as below.
> "The header address should be set by the software to an even number (word 
> aligned address)". We may need to find a way to
 ensure that during mempool/mbuf allocation.

 Indeed it would be good to force the priv_size to be aligned.

 The priv_size could be aligned automatically in
 rte_pktmbuf_pool_create(). The only possible problem I could see
 is that it would break applications that access to the data buffer
 by doing (sizeof(mbuf) + sizeof(priv)), which is probably not the
 best thing to do (I didn't find any applications like this in dpdk).
>>>
>>>
>>> Might be just make rte_pktmbuf_pool_create() fail if input priv_size % 
>>> MIN_ALIGN != 0?
>>
>> Hmm maybe it would break more applications: an odd priv_size is
>> probably rare, but a priv_size that is not aligned to 8 bytes is
>> maybe more common.
>
> My thought was that rte_mempool_create() was just introduced in 2.1,
> so if we add extra requirement for the input parameter now -
> there would be no ABI breakage, and not many people started to use it already.
> For me just seems a bit easier and more straightforward then silent alignment 
> -
> user would not have wrong assumptions here.
> Though if you think that a silent alignment would be more convenient
> for most users - I wouldn't insist.


Yes, I agree on the principle, but it depends whether this fix
is integrated for 2.1 or not.
I think it may already be a bit late for that, especially as it
is not a very critical bug.

Thomas, what do you think?


Olivier




> Konstantin
>
>> It's maybe safer to align the size transparently?
>>
>>
>> Regards,
>> Olivier
>>
>>
>>
>>>

 For applications that directly use rte_mempool_create() instead of
 rte_pktmbuf_pool_create(), we could add a check using an assert in
 rte_pktmbuf_init() and some words in the documentation.

 The question is: what should be the proper alignment? I would say
 at least 8 bytes, but maybe cache_aligned is an option too.
>>>
>>> 8 bytes seems enough to me.
>>>
>>> Konstantin
>>>

 Regards,
 Olivier


>
> Regards,
> Helin
>
>> -Original Message-
>> From: Ananyev, Konstantin
>> Sent: Wednesday, July 29, 2015 11:12 AM
>> To: Martin Weiser; Zhang, Helin; olivier.matz at 6wind.com
>> Cc: dev at dpdk.org
>> Subject: RE: [dpdk-dev] Issue with non-scattered rx in ixgbe and i40e 
>> when mbuf
>> private area size is odd
>>
>> Hi Martin,
>>
>>> -Original Message-
>>> From: dev [mailto:dev-bounces at dpdk.org] On Behalf Of Martin Weiser
>>> Sent: Wednesday, July 29, 2015 4:07 PM
>>> To: Zhang, Helin; olivier.matz at 6wind.com
>>> Cc: dev at dpdk.org
>>> Subject: [dpdk-dev] Issue with non-scattered rx in ixgbe and i40e when
>>> mbuf private area size is odd
>>>
>>> Hi Helin, Hi Olivier,
>>>
>>> we are seeing an issue with the ixgbe and i40e drivers which we could
>>> track down to our setting of the private area size of the mbufs.
>>> The issue can be easily reproduced with the l2fwd example application
>>> when a small modification is done: just set the priv_size parameter in
>>> the call to the rte_pktmbuf_pool_create function to an odd number like
>>> 1. In our tests this causes every call to rte_eth_rx_burst to return
>>> 32 (which is the setting of nb_pkts) nonsense mbufs although no
>>> packets are received on the interface and the hardware counters do not
>>> report any received packets.
>>
>>From Niantic datasheet:
>>
>> "7.1.6.1 Advanced Receive Descriptors ? Read Forma

[dpdk-dev] lost when learning how to test dpdk

2015-07-30 Thread Jan Viktorin
Hi,

thanks for the reply. I did look at those docs, but they do not help me much.
I still do not understand the principle of the tool very well. How does it
choose the NICs to use? Previously I confused -b in dpdk_nic_bind with -b in
testpmd; they have roughly opposite meanings. I can start testpmd now;
however, it does not probe any NIC. I've tried -w to whitelist certain
NICs, but with no success.

$ dpdk_nic_bind --status

Network devices using DPDK-compatible driver

:03:00.0 '82545GM Gigabit Ethernet Controller' drv=uio_pci_generic 
unused=e1000
:03:02.0 '82545GM Gigabit Ethernet Controller' drv=uio_pci_generic 
unused=e1000

Network devices using kernel driver
===
:00:19.0 'Ethernet Connection I217-V' if=eno1 drv=e1000e 
unused=uio_pci_generic *Active*

Other network devices
=


$ sudo testpmd -c 0x3 -n 2 -- -i --total-num-mbufs=2048
EAL: Detected lcore 0 as core 0 on socket 0
EAL: Detected lcore 1 as core 1 on socket 0
EAL: Detected lcore 2 as core 0 on socket 0
EAL: Detected lcore 3 as core 1 on socket 0
EAL: Support maximum 128 logical core(s) by configuration.
EAL: Detected 4 lcore(s)
EAL: VFIO modules not all loaded, skip VFIO support...
EAL: Setting up physically contiguous memory...
EAL: Ask a virtual area of 0x3c0 bytes
EAL: Virtual area found at 0x7fe973e0 (size = 0x3c0)
EAL: Ask a virtual area of 0x20 bytes
EAL: Virtual area found at 0x7fe973a0 (size = 0x20)
EAL: Ask a virtual area of 0x20 bytes
EAL: Virtual area found at 0x7fe97360 (size = 0x20)
EAL: Ask a virtual area of 0x3c0 bytes
EAL: Virtual area found at 0x7fe96f80 (size = 0x3c0)
EAL: Ask a virtual area of 0x20 bytes
EAL: Virtual area found at 0x7fe96f40 (size = 0x20)
EAL: Ask a virtual area of 0x20 bytes
EAL: Virtual area found at 0x7fe96f00 (size = 0x20)
EAL: Requesting 64 pages of size 2MB from socket 0
EAL: TSC frequency is ~368 KHz
EAL: Master lcore 0 is ready (tid=7989d8c0;cpuset=[0])
EAL: lcore 1 is ready (tid=6efff700;cpuset=[1])
EAL: No probed ethernet devices
Interactive-mode selected
Done
testpmd>

Thanks
Jan Viktorin

On Wed, 29 Jul 2015 12:09:06 +0300
ciprian.barbu  wrote:

> 
> 
> On 28.07.2015 21:13, Jan Viktorin wrote:
> > Hello all,
> >
> > I am learning how to measure throughput with dpdk. I have 4 cores
> > Intel(R) Core(TM) i3-4360 CPU @ 3.70GHz and two 82545GM NICs connected
> > together. I do not understand very well, how to setup testpmd.
> 
> http://dpdk.org/doc
> http://dpdk.org/doc/guides/testpmd_app_ug/run_app.html
> http://dpdk.org/doc/quick-start
> 
> >
> > I've successfully bound the NICs to dpdk:
> >
> > $ dpdk_nic_bind --status
> >
> > Network devices using DPDK-compatible driver
> > 
> > :03:00.0 '82545GM Gigabit Ethernet Controller' drv=uio_pci_generic 
> > unused=e1000
> > :03:02.0 '82545GM Gigabit Ethernet Controller' drv=uio_pci_generic 
> > unused=e1000
> >
> > Network devices using kernel driver
> > ===
> > :00:19.0 'Ethernet Connection I217-V' if=eno1 drv=e1000e 
> > unused=uio_pci_generic *Active*
> >
> > Other network devices
> > =
> > 
> >
> > and then I tried to run testpmd:
> >
> > sudo ./testpmd -b :03:00.0 -b :03:02.0 -c 0xf -n2 -- --nb-cores=1 
> > --nb-ports=0 --rxd=2048 --txd=2048 --mbcache=512 --burst=512
> 
> http://dpdk.org/doc/guides/testpmd_app_ug/run_app.html#testpmd-command-line-options
> 
> The -b option black lists your PCI devices, you don't need those. The 
> --nb-ports is of course the number of ports, it cannot be 0.
> 
> > ...
> > EAL: Ask a virtual area of 0x40 bytes
> > EAL: Virtual area found at 0x7f154980 (size = 0x40)
> > EAL: Requesting 1024 pages of size 2MB from socket 0
> > EAL: TSC frequency is ~369 KHz
> > EAL: Master lcore 0 is ready (tid=de94a8c0;cpuset=[0])
> > EAL: lcore 2 is ready (tid=487fd700;cpuset=[2])
> > EAL: lcore 3 is ready (tid=47ffc700;cpuset=[3])
> > EAL: lcore 1 is ready (tid=48ffe700;cpuset=[1])
> > EAL: No probed ethernet devices
> > EAL: Error - exiting with code: 1
> >Cause: Invalid port 0
> >
> > I tried --nb-ports={0,1,2} but neither of them works. BTW, what does this 
> > option it mean? :)
> > I could not find any description in the docs nor in the help (maybe I've 
> > omitted something).
> >
> >
> > Well, if I manage the testpmd to work I need a packet generator, right? 
> > I've downloaded
> > the dpdk-pktgen. And I am lost again. How can I start it?
> >
> > After several attempts (mostly trying to use the pktgen-master/slave.sh, 
> > what is
> > their purpose?), the most "successful" output was:
> >
> > ...
> > EAL: Detected lcore 0 as core 0 on socket 03 handles port 1 rx & core 4 
> > handles port 0-7 tx
> > EAL: Detected lcore 1 as core 1 on socket 0 as it does not matter to the 
> > syntax.
> > EAL: Detected lcore 2 as core 0 on soc
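
For readers following the testpmd commands in this thread: the EAL -c option takes a hexadecimal core mask, where bit N selects lcore N. That matches the logs quoted above, where -c 0xf is followed by lcores 0 to 3 reporting ready. The helper below is only an illustrative sketch; coremask_to_lcores() is a hypothetical name and not a DPDK function.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical helper (not part of DPDK): list the lcore ids selected by a
 * hexadecimal core mask such as the value passed to the EAL "-c" option.
 * Returns the number of ids written to lcores[] (at most max).
 */
static size_t coremask_to_lcores(uint64_t mask, unsigned lcores[], size_t max)
{
	size_t n = 0;
	unsigned bit;

	for (bit = 0; bit < 64 && n < max; bit++) {
		/* Bit N set in the mask means lcore N is selected. */
		if (mask & (UINT64_C(1) << bit))
			lcores[n++] = bit;
	}
	return n;
}
```

With a mask of 0xf the helper yields lcores 0, 1, 2 and 3; with 0x3 it yields lcores 0 and 1.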

[dpdk-dev] [PATCH v10 0/3] deduplicate EAL common functions

2015-07-30 Thread Ravi Kerur
Hi Olivier,


On Thu, Jul 30, 2015 at 1:12 AM, Olivier MATZ 
wrote:

> Hi Thomas & Ravi,
>
>
> On 07/27/2015 02:59 AM, Thomas Monjalon wrote:
>
>> 2015-07-27 02:56, Thomas Monjalon:
>>
>>> v9 was a subset of previous deduplications by Ravi Kerur.
>>> This v10 address the comments I've done on v9.
>>>
>>> Ravi Kerur (3):
>>>eal: deduplicate lcore initialization
>>>eal: deduplicate timer functions
>>>eal: deduplicate memory initialization
>>>
>>
>> Applied shortly to integrate this old pending cleanup in RC2.
>>
>>
> When I try to compile the dpdk for x86_x32-native-linuxapp-gcc , I
> get the following compilation error:
>
>   CC eal_common_timer.o
> In file included from /usr/include/sys/sysctl.h:63:0,
>  from /home/matz/dpdk-pkg-cron/
> dpdk.org/lib/librte_eal/common/eal_common_timer.c:39:
> /usr/include/bits/sysctl.h:19:3: error: #error "sysctl system call is
> unsupported in x32 kernel"
>  # error "sysctl system call is unsupported in x32 kernel"
>^
>
> Removing the "#include <sys/sysctl.h>" line fixes the issue without
> impacting the compilation. I think this include is not needed and
> could be removed.
> I can provide a patch if it's ok for you.
>
>
If it compiles fine on FreeBSD then it should be fine. It was primarily needed
for eal_timer.c in the FreeBSD environment; during the code movement it slipped
my mind. Sorry for the inconvenience.

Thanks,
Ravi


> Regards,
> Olivier
>
>


[dpdk-dev] how to compile kernel drivers only

2015-07-30 Thread Montorsi, Francesco
Hi all,

I'm trying to compile only the DPDK kernel drivers (i.e., igb_uio.ko and kni.ko, if I got 
it right) on a certain machine.
On that machine, I'm not interested in anything else. How can I tweak the .config 
file to achieve this?

I have tried to set all options to =n, except for:

CONFIG_RTE_LIBRTE_EAL=y
CONFIG_RTE_LIBRTE_EAL_LINUXAPP=y
CONFIG_RTE_EAL_IGB_UIO=y
CONFIG_RTE_EAL_VFIO=y
CONFIG_RTE_LIBRTE_KNI=y

But then I get a compile error about app/dump_cfg:

== Build app/dump_cfg
  CC main.o
  LD dump_cfg
dpdk/dpdk-2.0.0/x86_64-native-linuxapp-gcc/lib/librte_eal.a(eal_common_log.o): 
In function `rte_eal_common_log_init':
eal_common_log.c:(.text+0x1b0): undefined reference to `rte_mempool_create'
eal_common_log.c:(.text+0x1fe): undefined reference to `rte_mempool_lookup'
dpdk/dpdk-2.0.0/x86_64-native-linuxapp-gcc/lib/librte_eal.a(eal_pci_uio.o): In 
function `pci_uio_map_resource':
eal_pci_uio.c:(.text+0x4dd): undefined reference to `rte_zmalloc'
eal_pci_uio.c:(.text+0x873): undefined reference to `rte_malloc'
eal_pci_uio.c:(.text+0x9bf): undefined reference to `rte_malloc'
eal_pci_uio.c:(.text+0xb0a): undefined reference to `rte_malloc'
eal_pci_uio.c:(.text+0xc55): undefined reference to `rte_malloc'
eal_pci_uio.c:(.text+0xda6): undefined reference to `rte_malloc'
dpdk/dpdk-2.0.0/x86_64-native-linuxapp-gcc/lib/librte_eal.a(eal_pci_uio.o):eal_pci_uio.c:(.text+0xf00):
 more undefined references to `rte_malloc' follow
dpdk/dpdk-2.0.0/x86_64-native-linuxapp-gcc/lib/librte_eal.a(eal_pci_uio.o): In 
function `pci_uio_map_resource':
eal_pci_uio.c:(.text+0x10e4): undefined reference to `rte_free'
dpdk/dpdk-2.0.0/x86_64-native-linuxapp-gcc/lib/librte_eal.a(eal_interrupts.o): 
In function `rte_intr_callback_unregister':
eal_interrupts.c:(.text+0x7b6): undefined reference to `rte_free'
eal_interrupts.c:(.text+0x7f2): undefined reference to `rte_free'
eal_interrupts.c:(.text+0x84c): undefined reference to `rte_free'
dpdk/dpdk-2.0.0/x86_64-native-linuxapp-gcc/lib/librte_eal.a(eal_interrupts.o): 
In function `rte_intr_callback_register':
eal_interrupts.c:(.text+0x8fb): undefined reference to `rte_zmalloc'
eal_interrupts.c:(.text+0x96f): undefined reference to `rte_zmalloc'
eal_interrupts.c:(.text+0xac4): undefined reference to `rte_free'
dpdk/dpdk-2.0.0/x86_64-native-linuxapp-gcc/lib/librte_eal.a(eal_alarm.o): In 
function `rte_eal_alarm_cancel':
eal_alarm.c:(.text+0xa4): undefined reference to `rte_free'
eal_alarm.c:(.text+0x128): undefined reference to `rte_free'
eal_alarm.c:(.text+0x156): undefined reference to `rte_free'
eal_alarm.c:(.text+0x1d3): undefined reference to `rte_free'
dpdk/dpdk-2.0.0/x86_64-native-linuxapp-gcc/lib/librte_eal.a(eal_alarm.o): In 
function `rte_eal_alarm_set':
eal_alarm.c:(.text+0x31e): undefined reference to `rte_zmalloc'
dpdk/dpdk-2.0.0/x86_64-native-linuxapp-gcc/lib/librte_eal.a(eal_alarm.o): In 
function `eal_alarm_callback':
eal_alarm.c:(.text+0x59e): undefined reference to `rte_free'


How can I avoid building any app like dump_cfg?

Thanks!
Francesco



[dpdk-dev] how to compile kernel drivers only

2015-07-30 Thread Thomas Monjalon
2015-07-30 12:17, Montorsi, Francesco:
> How can I avoid building any app like dump_cfg?

In app/Makefile, you'll find the options to disable:
DIRS-$(CONFIG_RTE_APP_TEST) += test
DIRS-$(CONFIG_RTE_LIBRTE_ACL) += test-acl
DIRS-$(CONFIG_RTE_LIBRTE_PIPELINE) += test-pipeline
DIRS-$(CONFIG_RTE_TEST_PMD) += test-pmd
DIRS-$(CONFIG_RTE_LIBRTE_CMDLINE) += cmdline_test
DIRS-$(CONFIG_RTE_LIBRTE_EAL_LINUXAPP) += proc_info



[dpdk-dev] how to compile kernel drivers only

2015-07-30 Thread Montorsi, Francesco
Hi Thomas,
Thanks for your reply.

My problem is that I have in app/Makefile:

DIRS-$(CONFIG_RTE_LIBRTE_EAL_LINUXAPP) += dump_cfg

So I would have to set

CONFIG_RTE_LIBRTE_EAL_LINUXAPP=n

to disable the dump_cfg application build. However, if I do so, the kernel drivers 
are not built at all and make just says:

make T=x86_64-native-linuxapp-gcc O=x86_64-native-linuxapp-gcc EXTRA_LDFLAGS="" 
--directory=dpdk-2.0.0 all
make[1]: Entering directory 
`/home/hammer/share/CSA-Hamachi-Sprint/HW-Accel/drivers/dpdk/dpdk-2.0.0'
== Build lib
== Build lib/librte_compat
  SYMLINK-FILE include/rte_compat.h
== Build lib/librte_eal
== Build app
Build complete

So
CONFIG_RTE_LIBRTE_EAL_LINUXAPP=y
seems to be a prerequisite for the kernel drivers... or am I missing something?

Thanks,
Francesco

-Original Message-
From: Thomas Monjalon [mailto:thomas.monja...@6wind.com] 
Sent: Thursday, 30 July 2015 14:23
To: Montorsi, Francesco
Cc: dev at dpdk.org
Subject: Re: [dpdk-dev] how to compile kernel drivers only

2015-07-30 12:17, Montorsi, Francesco:
> How can I avoid building any app like dump_cfg?

In app/Makefile, you'll find the options to disable:
DIRS-$(CONFIG_RTE_APP_TEST) += test
DIRS-$(CONFIG_RTE_LIBRTE_ACL) += test-acl
DIRS-$(CONFIG_RTE_LIBRTE_PIPELINE) += test-pipeline
DIRS-$(CONFIG_RTE_TEST_PMD) += test-pmd
DIRS-$(CONFIG_RTE_LIBRTE_CMDLINE) += cmdline_test
DIRS-$(CONFIG_RTE_LIBRTE_EAL_LINUXAPP) += proc_info



[dpdk-dev] how to compile kernel drivers only

2015-07-30 Thread Thomas Monjalon
Francesco, please reply below (easier to follow the thread).

2015-07-30 12:48, Montorsi, Francesco:
> From: Thomas Monjalon [mailto:thomas.monjalon at 6wind.com] 
> > 2015-07-30 12:17, Montorsi, Francesco:
> > > How can I avoid building any app like dump_cfg?
> > 
> > In app/Makefile, you'll find the options to disable:
> > DIRS-$(CONFIG_RTE_APP_TEST) += test
> > DIRS-$(CONFIG_RTE_LIBRTE_ACL) += test-acl
> > DIRS-$(CONFIG_RTE_LIBRTE_PIPELINE) += test-pipeline
> > DIRS-$(CONFIG_RTE_TEST_PMD) += test-pmd
> > DIRS-$(CONFIG_RTE_LIBRTE_CMDLINE) += cmdline_test
> > DIRS-$(CONFIG_RTE_LIBRTE_EAL_LINUXAPP) += proc_info
> 
> My problem is that I have in app/Makefile:
> 
> DIRS-$(CONFIG_RTE_LIBRTE_EAL_LINUXAPP) += dump_cfg
> 
> So that I should put 
> 
> CONFIG_RTE_LIBRTE_EAL_LINUXAPP=n
> 
> To disable dump_cfg application build. However, If I do so, the kernel 
> drivers are not built at all and make just says:
> 
> make T=x86_64-native-linuxapp-gcc O=x86_64-native-linuxapp-gcc 
> EXTRA_LDFLAGS="" --directory=dpdk-2.0.0 all
> make[1]: Entering directory 
> `/home/hammer/share/CSA-Hamachi-Sprint/HW-Accel/drivers/dpdk/dpdk-2.0.0'
> == Build lib
> == Build lib/librte_compat
>   SYMLINK-FILE include/rte_compat.h
> == Build lib/librte_eal
> == Build app
> Build complete
> 
> So that 
> CONFIG_RTE_LIBRTE_EAL_LINUXAPP=y
> Seems to be a pre-requisite of kernel drivers... or am I missing something?

You're right. You cannot build only kernel drivers.
You are welcome to add a new config option to enable/disable apps.


[dpdk-dev] [PATCH] pci: fix build on FreeBSD

2015-07-30 Thread Thomas Monjalon
Build log:
lib/librte_eal/bsdapp/eal/eal_pci.c:462:9: error:
incompatible integer to pointer conversion passing 'u_int32_t'
(aka 'unsigned int') to parameter of type 'void *'

It is fixed by passing the pointer of pi.pi_data to memcpy.

By the way, it seems strange that pi_data is initialized twice:
.pi_data = *(u_int32_t *)buf
memcpy(&pi.pi_data, buf, len);

Signed-off-by: Thomas Monjalon 
---
 lib/librte_eal/bsdapp/eal/eal_pci.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/lib/librte_eal/bsdapp/eal/eal_pci.c 
b/lib/librte_eal/bsdapp/eal/eal_pci.c
index ff56cd3..6fa0d08 100644
--- a/lib/librte_eal/bsdapp/eal/eal_pci.c
+++ b/lib/librte_eal/bsdapp/eal/eal_pci.c
@@ -459,7 +459,7 @@ int rte_eal_pci_write_config(const struct rte_pci_device 
*dev,
goto error;
}

-   memcpy(pi.pi_data, buf, len);
+   memcpy(&pi.pi_data, buf, len);

fd = open("/dev/pci", O_RDONLY);
if (fd < 0) {
-- 
2.4.2
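
The build error fixed above comes from memcpy() taking a destination pointer, while pi.pi_data is an integer field, so its address must be passed. The sketch below shows the corrected pattern in isolation. struct pci_io_sketch and fill_pi_data() are hypothetical stand-ins, not the FreeBSD struct pci_io or any DPDK function, and the length clamp is an extra precaution that is not part of the patch.

```c
#include <stdint.h>
#include <string.h>

/* Stand-in for the FreeBSD struct pci_io; only the field relevant here. */
struct pci_io_sketch {
	uint32_t pi_data;
};

/*
 * Copy up to sizeof(pi_data) bytes from buf into the integer field.
 * memcpy() needs the address of the field (&pi.pi_data); passing the
 * field's value is what triggered the integer-to-pointer conversion error.
 */
static uint32_t fill_pi_data(const void *buf, size_t len)
{
	struct pci_io_sketch pi = { .pi_data = 0 };

	/* Assumption of this sketch: never copy more than the field can hold. */
	if (len > sizeof(pi.pi_data))
		len = sizeof(pi.pi_data);
	memcpy(&pi.pi_data, buf, len);
	return pi.pi_data;
}
```

Because the copy fully overwrites the field when len equals its size, a designated initializer that sets the same field beforehand becomes redundant, which is the double initialization noted in the commit message.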



[dpdk-dev] [PATCH v10 0/3] deduplicate EAL common functions

2015-07-30 Thread Thomas Monjalon
2015-07-30 10:12, Olivier MATZ:
> Hi Thomas & Ravi,
> 
> On 07/27/2015 02:59 AM, Thomas Monjalon wrote:
> > 2015-07-27 02:56, Thomas Monjalon:
> >> v9 was a subset of previous deduplications by Ravi Kerur.
> >> This v10 address the comments I've done on v9.
> >>
> >> Ravi Kerur (3):
> >>eal: deduplicate lcore initialization
> >>eal: deduplicate timer functions
> >>eal: deduplicate memory initialization
> >
> > Applied shortly to integrate this old pending cleanup in RC2.
> >
> 
> When I try to compile the dpdk for x86_x32-native-linuxapp-gcc , I
> get the following compilation error:
> 
>CC eal_common_timer.o
> In file included from /usr/include/sys/sysctl.h:63:0,
>   from 
> /home/matz/dpdk-pkg-cron/dpdk.org/lib/librte_eal/common/eal_common_timer.c:39:
> /usr/include/bits/sysctl.h:19:3: error: #error "sysctl system call is 
> unsupported in x32 kernel"
>   # error "sysctl system call is unsupported in x32 kernel"
> ^
> 
> Removing the "#include <sys/sysctl.h>" line fixes the issue without
> impacting the compilation. I think this include is not needed and
> could be removed.
> I can provide a patch if it's ok for you.

After fixing another build issue on FreeBSD (patch sent), it builds well
without sys/sysctl.h.
So it seems to be a useless inclusion.


[dpdk-dev] Issue with non-scattered rx in ixgbe and i40e when mbuf private area size is odd

2015-07-30 Thread Thomas Monjalon
2015-07-30 13:22, Olivier MATZ:
> On 07/30/2015 11:43 AM, Ananyev, Konstantin wrote:
> > From: Olivier MATZ [mailto:olivier.matz at 6wind.com]
> >> On 07/30/2015 11:00 AM, Ananyev, Konstantin wrote:
> >>> From: Olivier MATZ [mailto:olivier.matz at 6wind.com]
>  On 07/29/2015 10:24 PM, Zhang, Helin wrote:
> > The similar situation in i40e, as explained by Konstantin.
> > As header split hasn't been supported by DPDK till now. It would be 
> > better to put the header address in RX descriptor to 0.
> > But in the future, during header split enabling. We may need to pay 
> > extra attention to that. As at least x710 datasheet said
>  specifically as below.
> > "The header address should be set by the software to an even number 
> > (word aligned address)". We may need to find a way to
>  ensure that during mempool/mbuf allocation.
> 
>  Indeed it would be good to force the priv_size to be aligned.
> 
>  The priv_size could be aligned automatically in
>  rte_pktmbuf_pool_create(). The only possible problem I could see
>  is that it would break applications that access to the data buffer
>  by doing (sizeof(mbuf) + sizeof(priv)), which is probably not the
>  best thing to do (I didn't find any applications like this in dpdk).
> >>>
> >>>
> >>> Might be just make rte_pktmbuf_pool_create() fail if input priv_size % 
> >>> MIN_ALIGN != 0?
> >>
> >> Hmm maybe it would break more applications: an odd priv_size is
> >> probably rare, but a priv_size that is not aligned to 8 bytes is
> >> maybe more common.
> >
> > My thought was that rte_mempool_create() was just introduced in 2.1,
> > so if we add extra requirement for the input parameter now -
> > there would be no ABI breakage, and not many people started to use it 
> > already.
> > For me just seems a bit easier and more straightforward then silent 
> > alignment -
> > user would not have wrong assumptions here.
> > Though if you think that a silent alignment would be more convenient
> > for most users - I wouldn't insist.
> 
> 
> Yes, I agree on the principle, but it depends whether this fix
> is integrated for 2.1 or not.
> I think it may already be a bit late for that, especially as it
> is not a very critical bug.
> 
> Thomas, what do you think?

It is a fix.
Adding a doc comment, an assert and an alignment constraint or a new automatic
alignment in the not yet released function shouldn't hurt.
A patch would be welcome for 2.1. Thanks
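
To make the failure mode discussed in this thread concrete: the data buffer starts after the mbuf structure plus the private area, so an odd private-area size gives an odd buffer address. If that address is also written into the descriptor's header address field, its lowest bit (the bit the i40e fix earlier in this digest describes as 'Descriptor Done' on write-back) is already set. The following is a minimal sketch under those assumptions. All names are hypothetical, and the structure size used in the example is arbitrary rather than the real sizeof(struct rte_mbuf).

```c
#include <stdbool.h>
#include <stdint.h>

/* Bit 0 of the header address word, treated as "descriptor done". */
#define SKETCH_DD_BIT UINT64_C(1)

/* Address of the data buffer when a private area follows the mbuf struct. */
static uint64_t sketch_buf_addr(uint64_t mbuf_addr, uint64_t mbuf_struct_size,
				uint64_t priv_size)
{
	return mbuf_addr + mbuf_struct_size + priv_size;
}

/* What a driver polling the descriptor would conclude from this word. */
static bool sketch_looks_done(uint64_t hdr_addr)
{
	return (hdr_addr & SKETCH_DD_BIT) != 0;
}
```

For an mbuf at 0x1000 with a 128-byte structure, a 7-byte private area gives an odd address that already looks "done", whereas an 8-byte private area, or a header address of 0, does not.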



[dpdk-dev] [PATCH] mbuf: enforce alignment of mbuf private area

2015-07-30 Thread Olivier Matz
It looks better to have a data buffer address that is aligned to
8 bytes. This is the case when there is no mbuf private area, but
if there is one, the alignment depends on the size of this area
that is located between the mbuf structure and the data buffer.

Indeed, some drivers expect the buffer address to be aligned
to an even address, and moreover an unaligned buffer may impact
the performance when accessing network headers.

Add a check in rte_pktmbuf_pool_create() to verify the alignment
constraint before creating the mempool. For applications that use
the alternative way (direct call to rte_mempool_create), also
add an assertion in rte_pktmbuf_init().

By the way, also add the MBUF log type.

Signed-off-by: Olivier Matz 
---
 lib/librte_eal/common/include/rte_log.h | 1 +
 lib/librte_mbuf/rte_mbuf.c  | 8 +++-
 lib/librte_mbuf/rte_mbuf.h  | 7 +--
 3 files changed, 13 insertions(+), 3 deletions(-)

diff --git a/lib/librte_eal/common/include/rte_log.h 
b/lib/librte_eal/common/include/rte_log.h
index 24a55cc..ede0dca 100644
--- a/lib/librte_eal/common/include/rte_log.h
+++ b/lib/librte_eal/common/include/rte_log.h
@@ -77,6 +77,7 @@ extern struct rte_logs rte_logs;
 #define RTE_LOGTYPE_PORT    0x00002000 /**< Log related to port. */
 #define RTE_LOGTYPE_TABLE   0x00004000 /**< Log related to table. */
 #define RTE_LOGTYPE_PIPELINE 0x00008000 /**< Log related to pipeline. */
+#define RTE_LOGTYPE_MBUF    0x00010000 /**< Log related to mbuf. */

 /* these log types can be used in an application */
 #define RTE_LOGTYPE_USER1   0x01000000 /**< User-defined log type 1. */
diff --git a/lib/librte_mbuf/rte_mbuf.c b/lib/librte_mbuf/rte_mbuf.c
index 4320dd4..a1ddbb3 100644
--- a/lib/librte_mbuf/rte_mbuf.c
+++ b/lib/librte_mbuf/rte_mbuf.c
@@ -125,6 +125,7 @@ rte_pktmbuf_init(struct rte_mempool *mp,
mbuf_size = sizeof(struct rte_mbuf) + priv_size;
buf_len = rte_pktmbuf_data_room_size(mp);

+   RTE_MBUF_ASSERT((priv_size & (RTE_MBUF_PRIV_ALIGN - 1)) == 0);
RTE_MBUF_ASSERT(mp->elt_size >= mbuf_size);
RTE_MBUF_ASSERT(buf_len <= UINT16_MAX);

@@ -154,7 +155,12 @@ rte_pktmbuf_pool_create(const char *name, unsigned n,
struct rte_pktmbuf_pool_private mbp_priv;
unsigned elt_size;

-
+   if ((priv_size & (RTE_MBUF_PRIV_ALIGN - 1)) != 0) {
+   RTE_LOG(ERR, MBUF, "mbuf priv_size=%u is not aligned\n",
+   priv_size);
+   rte_errno = EINVAL;
+   return NULL;
+   }
elt_size = sizeof(struct rte_mbuf) + (unsigned)priv_size +
(unsigned)data_room_size;
mbp_priv.mbuf_data_room_size = data_room_size;
diff --git a/lib/librte_mbuf/rte_mbuf.h b/lib/librte_mbuf/rte_mbuf.h
index 010b32d..c3b8c98 100644
--- a/lib/librte_mbuf/rte_mbuf.h
+++ b/lib/librte_mbuf/rte_mbuf.h
@@ -698,6 +698,9 @@ extern "C" {
  RTE_PTYPE_INNER_L4_MASK))
 #endif /* RTE_NEXT_ABI */

+/** Alignment constraint of mbuf private area. */
+#define RTE_MBUF_PRIV_ALIGN 8
+
 /**
  * Get the name of a RX offload flag
  *
@@ -1238,7 +1241,7 @@ void rte_pktmbuf_pool_init(struct rte_mempool *mp, void 
*opaque_arg);
  *   details.
  * @param priv_size
  *   Size of application private are between the rte_mbuf structure
- *   and the data buffer.
+ *   and the data buffer. This value must be aligned to RTE_MBUF_PRIV_ALIGN.
  * @param data_room_size
  *   Size of data buffer in each mbuf, including RTE_PKTMBUF_HEADROOM.
  * @param socket_id
@@ -1250,7 +1253,7 @@ void rte_pktmbuf_pool_init(struct rte_mempool *mp, void 
*opaque_arg);
  *   with rte_errno set appropriately. Possible rte_errno values include:
  *- E_RTE_NO_CONFIG - function could not get pointer to rte_config 
structure
  *- E_RTE_SECONDARY - function was called from a secondary process instance
- *- EINVAL - cache size provided is too large
+ *- EINVAL - cache size provided is too large, or priv_size is not aligned.
  *- ENOSPC - the maximum number of memzones has already been allocated
  *- EEXIST - a memzone with the same name already exists
  *- ENOMEM - no appropriate memory area found in which to create memzone
-- 
2.1.4
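
The check added by this patch relies on the usual power-of-two mask test: a size is a multiple of 8 exactly when its three lowest bits are zero. The sketch below isolates that test and, for comparison, the "silent alignment" alternative raised earlier in the thread, which the patch does not implement. SKETCH_PRIV_ALIGN mirrors RTE_MBUF_PRIV_ALIGN; the function names are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* Mirrors RTE_MBUF_PRIV_ALIGN from the patch; must be a power of two. */
#define SKETCH_PRIV_ALIGN 8u

/* Same test as the patch: reject sizes that are not a multiple of 8. */
static bool sketch_priv_size_ok(uint16_t priv_size)
{
	return (priv_size & (SKETCH_PRIV_ALIGN - 1)) == 0;
}

/*
 * Alternative discussed in the thread, not what the patch does: round the
 * size up to the next multiple of the alignment. The caller must keep
 * priv_size <= UINT16_MAX - (SKETCH_PRIV_ALIGN - 1) to avoid wrap-around.
 */
static uint16_t sketch_priv_size_align(uint16_t priv_size)
{
	return (uint16_t)((priv_size + SKETCH_PRIV_ALIGN - 1) &
			  ~(SKETCH_PRIV_ALIGN - 1));
}
```

Rejecting a misaligned size, as the patch does, keeps the caller's assumptions about the buffer offset explicit. Rounding up would silently change where the data buffer starts.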



[dpdk-dev] [PATCH] mbuf: enforce alignment of mbuf private area

2015-07-30 Thread Ananyev, Konstantin

Hi Olivier,

It fails to compile for me:

/local/kananye1/dpdk.org-mbprv1/lib/librte_mbuf/rte_mbuf.c: In function 
'rte_pktmbuf_pool_create':
/local/kananye1/dpdk.org-mbprv1/lib/librte_mbuf/rte_mbuf.c:161:3: error: 
'rte_errno' undeclared (first use in this function)
   rte_errno = EINVAL;
   ^
/local/kananye1/dpdk.org-mbprv1/lib/librte_mbuf/rte_mbuf.c:161:3: note: each 
undeclared identifier is reported only once for each function it appears in

I had to add:

diff --git a/lib/librte_mbuf/rte_mbuf.c b/lib/librte_mbuf/rte_mbuf.c
index a1ddbb3..04344c0 100644
--- a/lib/librte_mbuf/rte_mbuf.c
+++ b/lib/librte_mbuf/rte_mbuf.c
@@ -58,6 +58,7 @@
 #include 
 #include 
 #include 
+#include <rte_errno.h>

 /*
  * ctrlmbuf constructor, given as a callback function to

Apart from that - looks good to me.
Konstantin

> -Original Message-
> From: Olivier Matz [mailto:olivier.matz at 6wind.com]
> Sent: Thursday, July 30, 2015 2:56 PM
> To: dev at dpdk.org
> Cc: Ananyev, Konstantin; olivier.matz at 6wind.com; Zhang, Helin; 
> martin.weiser at allegro-packets.com; thomas.monjalon at 6wind.com
> Subject: [PATCH] mbuf: enforce alignment of mbuf private area
> 
> It looks better to have a data buffer address that is aligned to
> 8 bytes. This is the case when there is no mbuf private area, but
> if there is one, the alignment depends on the size of this area
> that is located between the mbuf structure and the data buffer.
> 
> Indeed, some drivers expects to have the buffer address aligned
> to an even address, and moreover an unaligned buffer may impact
> the performance when accessing to network headers.
> 
> Add a check in rte_pktmbuf_pool_create() to verify the alignment
> constraint before creating the mempool. For applications that use
> the alternative way (direct call to rte_mempool_create), also
> add an assertion in rte_pktmbuf_init().
> 
> By the way, also add the MBUF log type.
> 
> Signed-off-by: Olivier Matz 
> ---
>  lib/librte_eal/common/include/rte_log.h | 1 +
>  lib/librte_mbuf/rte_mbuf.c  | 8 +++-
>  lib/librte_mbuf/rte_mbuf.h  | 7 +--
>  3 files changed, 13 insertions(+), 3 deletions(-)
> 
> diff --git a/lib/librte_eal/common/include/rte_log.h 
> b/lib/librte_eal/common/include/rte_log.h
> index 24a55cc..ede0dca 100644
> --- a/lib/librte_eal/common/include/rte_log.h
> +++ b/lib/librte_eal/common/include/rte_log.h
> @@ -77,6 +77,7 @@ extern struct rte_logs rte_logs;
>  #define RTE_LOGTYPE_PORT0x2000 /**< Log related to port. */
>  #define RTE_LOGTYPE_TABLE   0x4000 /**< Log related to table. */
>  #define RTE_LOGTYPE_PIPELINE 0x8000 /**< Log related to pipeline. */
> +#define RTE_LOGTYPE_MBUF0x0001 /**< Log related to mbuf. */
> 
>  /* these log types can be used in an application */
>  #define RTE_LOGTYPE_USER1   0x0100 /**< User-defined log type 1. */
> diff --git a/lib/librte_mbuf/rte_mbuf.c b/lib/librte_mbuf/rte_mbuf.c
> index 4320dd4..a1ddbb3 100644
> --- a/lib/librte_mbuf/rte_mbuf.c
> +++ b/lib/librte_mbuf/rte_mbuf.c
> @@ -125,6 +125,7 @@ rte_pktmbuf_init(struct rte_mempool *mp,
>   mbuf_size = sizeof(struct rte_mbuf) + priv_size;
>   buf_len = rte_pktmbuf_data_room_size(mp);
> 
> + RTE_MBUF_ASSERT((priv_size & (RTE_MBUF_PRIV_ALIGN - 1)) == 0);
>   RTE_MBUF_ASSERT(mp->elt_size >= mbuf_size);
>   RTE_MBUF_ASSERT(buf_len <= UINT16_MAX);
> 
> @@ -154,7 +155,12 @@ rte_pktmbuf_pool_create(const char *name, unsigned n,
>   struct rte_pktmbuf_pool_private mbp_priv;
>   unsigned elt_size;
> 
> -
> + if ((priv_size & (RTE_MBUF_PRIV_ALIGN - 1)) != 0) {
> + RTE_LOG(ERR, MBUF, "mbuf priv_size=%u is not aligned\n",
> + priv_size);
> + rte_errno = EINVAL;
> + return NULL;
> + }
>   elt_size = sizeof(struct rte_mbuf) + (unsigned)priv_size +
>   (unsigned)data_room_size;
>   mbp_priv.mbuf_data_room_size = data_room_size;
> diff --git a/lib/librte_mbuf/rte_mbuf.h b/lib/librte_mbuf/rte_mbuf.h
> index 010b32d..c3b8c98 100644
> --- a/lib/librte_mbuf/rte_mbuf.h
> +++ b/lib/librte_mbuf/rte_mbuf.h
> @@ -698,6 +698,9 @@ extern "C" {
>   RTE_PTYPE_INNER_L4_MASK))
>  #endif /* RTE_NEXT_ABI */
> 
> +/** Alignment constraint of mbuf private area. */
> +#define RTE_MBUF_PRIV_ALIGN 8
> +
>  /**
>   * Get the name of a RX offload flag
>   *
> @@ -1238,7 +1241,7 @@ void rte_pktmbuf_pool_init(struct rte_mempool *mp, void 
> *opaque_arg);
>   *   details.
>   * @param priv_size
>   *   Size of application private are between the rte_mbuf structure
> - *   and the data buffer.
> + *   and the data buffer. This value must be aligned to RTE_MBUF_PRIV_ALIGN.
>   * @param data_room_size
>   *   Size of data buffer in each mbuf, including RTE_PKTMBUF_HEADROOM.
>   * @param socket_id
> @@ -1250,7 +1253,7 @@ void rte_pktmbuf_pool_init(struct rte_mempool *mp, void 
> *opaque_arg);
>   *   with rte_errno set appropriately. Possib

[dpdk-dev] [PATCH v3] doc: announce abi change for interrupt mode

2015-07-30 Thread O'Driscoll, Tim
Hi Neil,

There have been a few deprecation notices like this one submitted. Since you 
drove the ABI policy, it would be good to get confirmation from you that these 
are compliant with the policy and that you don't see any issues. Ideally, it 
would be great if you can review and ack them. If you don't have the time, even 
just a general indication that you don't see any problems would be useful.


Thanks,
Tim

> -Original Message-
> From: dev [mailto:dev-bounces at dpdk.org] On Behalf Of Liu, Yong
> Sent: Thursday, July 30, 2015 6:15 AM
> To: Liang, Cunming; dev at dpdk.org
> Subject: Re: [dpdk-dev] [PATCH v3] doc: announce abi change for
> interrupt mode
> 
> Acked-by: Marvin Liu 
> 
> > -Original Message-
> > From: dev [mailto:dev-bounces at dpdk.org] On Behalf Of Cunming Liang
> > Sent: Thursday, July 30, 2015 1:05 PM
> > To: dev at dpdk.org
> > Subject: [dpdk-dev] [PATCH v3] doc: announce abi change for interrupt
> mode
> >
> > The patch announces the planned ABI changes for interrupt mode.
> >
> > Signed-off-by: Cunming Liang 
> > ---
> >  v3 change:
> >- reword for CONFIG_RTE_NEXT_ABI
> >
> >  v2 change:
> >- rebase to recent master
> >
> >  doc/guides/rel_notes/deprecation.rst | 5 +
> >  1 file changed, 5 insertions(+)
> >
> > diff --git a/doc/guides/rel_notes/deprecation.rst
> > b/doc/guides/rel_notes/deprecation.rst
> > index 5330d3b..d36d267 100644
> > --- a/doc/guides/rel_notes/deprecation.rst
> > +++ b/doc/guides/rel_notes/deprecation.rst
> > @@ -35,3 +35,8 @@ Deprecation Notices
> >  * The following fields have been deprecated in rte_eth_stats:
> >imissed, ibadcrc, ibadlen, imcasts, fdirmatch, fdirmiss,
> >tx_pause_xon, rx_pause_xon, tx_pause_xoff, rx_pause_xoff
> > +
> > +* The ABI changes are planned for struct rte_intr_handle, struct
> > rte_eth_conf
> > +  and struct eth_dev_ops to support interrupt mode feature from
> release
> > 2.1.
> > +  Those changes may be enabled in the upcoming release 2.1
> > +  with CONFIG_RTE_NEXT_ABI.
> > --
> > 1.8.1.4



[dpdk-dev] lost when learning how to test dpdk

2015-07-30 Thread Ravi Kerur
On Thu, Jul 30, 2015 at 5:03 AM, Jan Viktorin 
wrote:

> Hi,
>
> thanks for reply. I could see those docs but it does not help me a lot.
> I still do not understand very well the principle of the tool. How it
> chooses the NICs to use? Previously I confused -b in dpdk_nic_bind and
> testpmd. They have somehow opposite meaning. I can start testpmd now,
> however, it does not probe any NIC. I've tried -w to whitelist certain
> NICs but with no success.
>
> $ dpdk_nic_bind --status
>
> Network devices using DPDK-compatible driver
> 
> :03:00.0 '82545GM Gigabit Ethernet Controller' drv=uio_pci_generic
> unused=e1000
> :03:02.0 '82545GM Gigabit Ethernet Controller' drv=uio_pci_generic
> unused=e1000
>

The NICs may not be supported by the PMD drivers yet. Run "lspci -nn" and check
the device ID. Adding support in the PMD should not be a problem, but I am not
sure it will be supported, since the controller is listed as End of Life on Intel's website:

http://ark.intel.com/products/4964/Intel-82545GM-Gigabit-Ethernet-Controller


> Network devices using kernel driver
> ===
> :00:19.0 'Ethernet Connection I217-V' if=eno1 drv=e1000e
> unused=uio_pci_generic *Active*
>

DPDK doesn't bind an active NIC. Support for the I217-V in the PMD is
currently being tested.


>
> Other network devices
> =
> 
>
> $ sudo testpmd -c 0x3 -n 2 -- -i --total-num-mbufs=2048
> EAL: Detected lcore 0 as core 0 on socket 0
> EAL: Detected lcore 1 as core 1 on socket 0
> EAL: Detected lcore 2 as core 0 on socket 0
> EAL: Detected lcore 3 as core 1 on socket 0
> EAL: Support maximum 128 logical core(s) by configuration.
> EAL: Detected 4 lcore(s)
> EAL: VFIO modules not all loaded, skip VFIO support...
> EAL: Setting up physically contiguous memory...
> EAL: Ask a virtual area of 0x3c0 bytes
> EAL: Virtual area found at 0x7fe973e0 (size = 0x3c0)
> EAL: Ask a virtual area of 0x20 bytes
> EAL: Virtual area found at 0x7fe973a0 (size = 0x20)
> EAL: Ask a virtual area of 0x20 bytes
> EAL: Virtual area found at 0x7fe97360 (size = 0x20)
> EAL: Ask a virtual area of 0x3c0 bytes
> EAL: Virtual area found at 0x7fe96f80 (size = 0x3c0)
> EAL: Ask a virtual area of 0x20 bytes
> EAL: Virtual area found at 0x7fe96f40 (size = 0x20)
> EAL: Ask a virtual area of 0x20 bytes
> EAL: Virtual area found at 0x7fe96f00 (size = 0x20)
> EAL: Requesting 64 pages of size 2MB from socket 0
> EAL: TSC frequency is ~368 KHz
> EAL: Master lcore 0 is ready (tid=7989d8c0;cpuset=[0])
> EAL: lcore 1 is ready (tid=6efff700;cpuset=[1])
> EAL: No probed ethernet devices
> Interactive-mode selected
> Done
> testpmd>
>
> Thanks
> Jan Viktorin
>
> On Wed, 29 Jul 2015 12:09:06 +0300
> ciprian.barbu  wrote:
>
> >
> >
> > On 28.07.2015 21:13, Jan Viktorin wrote:
> > > Hello all,
> > >
> > > I am learning how to measure throughput with dpdk. I have 4 cores
> > > Intel(R) Core(TM) i3-4360 CPU @ 3.70GHz and two 82545GM NICs connected
> > > together. I do not understand very well, how to setup testpmd.
> >
> > http://dpdk.org/doc
> > http://dpdk.org/doc/guides/testpmd_app_ug/run_app.html
> > http://dpdk.org/doc/quick-start
> >
> > >
> > > I've successfully bound the NICs to dpdk:
> > >
> > > $ dpdk_nic_bind --status
> > >
> > > Network devices using DPDK-compatible driver
> > > 
> > > :03:00.0 '82545GM Gigabit Ethernet Controller' drv=uio_pci_generic
> unused=e1000
> > > :03:02.0 '82545GM Gigabit Ethernet Controller' drv=uio_pci_generic
> unused=e1000
> > >
> > > Network devices using kernel driver
> > > ===
> > > :00:19.0 'Ethernet Connection I217-V' if=eno1 drv=e1000e
> unused=uio_pci_generic *Active*
> > >
> > > Other network devices
> > > =
> > > 
> > >
> > > and then I tried to run testpmd:
> > >
> > > sudo ./testpmd -b :03:00.0 -b :03:02.0 -c 0xf -n2 --
> --nb-cores=1 --nb-ports=0 --rxd=2048 --txd=2048 --mbcache=512 --burst=512
> >
> >
> http://dpdk.org/doc/guides/testpmd_app_ug/run_app.html#testpmd-command-line-options
> >
> > The -b option black lists your PCI devices, you don't need those. The
> > --nb-ports is of course the number of ports, it cannot be 0.
> >
> > > ...
> > > EAL: Ask a virtual area of 0x40 bytes
> > > EAL: Virtual area found at 0x7f154980 (size = 0x40)
> > > EAL: Requesting 1024 pages of size 2MB from socket 0
> > > EAL: TSC frequency is ~369 KHz
> > > EAL: Master lcore 0 is ready (tid=de94a8c0;cpuset=[0])
> > > EAL: lcore 2 is ready (tid=487fd700;cpuset=[2])
> > > EAL: lcore 3 is ready (tid=47ffc700;cpuset=[3])
> > > EAL: lcore 1 is ready (tid=48ffe700;cpuset=[1])
> > > EAL: No probed ethernet devices
> > > EAL: Error - exiting with code: 1
> > >Cause: Invalid port 0
> > >
> > > I tried --nb-ports={0,1,2} but neither of them works. BTW, what does
> this option it mean? :)
> > > I could not find a

[dpdk-dev] [PATCH v2 1/2] mk: use LDLIBS variable when building the shared object file

2015-07-30 Thread Nelio Laranjeiro
Some .so libraries need to be linked with external libraries.  For that, the
LDLIBS variable should be present on the link line when those .so files are
created.  The PMD Makefile is responsible for filling the LDLIBS variable with
the link to the external library it needs.

Signed-off-by: Nelio Laranjeiro 
Acked-by: Olivier Matz 
---
Changelog: add missing EXTRA_LDFLAGS variable necessary to link with an
external library when it is not installed on the system or located somewhere
else.

 mk/rte.lib.mk | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/mk/rte.lib.mk b/mk/rte.lib.mk
index 9ff5cce..fcc8e20 100644
--- a/mk/rte.lib.mk
+++ b/mk/rte.lib.mk
@@ -81,7 +81,8 @@ O_TO_A_DO = @set -e; \
$(O_TO_A) && \
echo $(O_TO_A_CMD) > $(call exe2cmd,$(@))

-O_TO_S = $(LD) $(_CPU_LDFLAGS) -shared $(OBJS-y) -Wl,-soname,$(LIB) -o $(LIB)
+O_TO_S = $(LD) $(_CPU_LDFLAGS) $(EXTRA_LDFLAGS) $(LDLIBS) -shared $(OBJS-y) \
+-Wl,-soname,$(LIB) -o $(LIB)
 O_TO_S_STR = $(subst ','\'',$(O_TO_S)) #'# fix syntax highlight
 O_TO_S_DISP = $(if $(V),"$(O_TO_S_STR)","  LD $(@)")
 O_TO_S_DO = @set -e; \
-- 
1.9.1



[dpdk-dev] [PATCH v2 2/2] mlx4: fix shared library dependency

2015-07-30 Thread Nelio Laranjeiro
librte_pmd_mlx4.so needs to be linked with libibverbs; otherwise, the PMD is not
able to open Mellanox devices and the following message is printed by testpmd
at startup: "librte_pmd_mlx4: cannot access device, is mlx4_ib loaded?".

The applications' dependency on libibverbs is now only valid in static mode.
In shared mode, applications do not depend on it anymore;
librte_pmd_mlx4.so keeps this dependency and is thus linked with libibverbs.

Signed-off-by: Nelio Laranjeiro 
Acked-by: Olivier Matz 
---
Changelog: don't compile the MLX4 PMD when DPDK is built as a combined shared
library.


 doc/guides/nics/mlx4.rst  | 5 +
 drivers/net/Makefile  | 6 +-
 drivers/net/mlx4/Makefile | 1 +
 mk/rte.app.mk | 2 +-
 4 files changed, 12 insertions(+), 2 deletions(-)

diff --git a/doc/guides/nics/mlx4.rst b/doc/guides/nics/mlx4.rst
index c33aa38..840cb65 100644
--- a/doc/guides/nics/mlx4.rst
+++ b/doc/guides/nics/mlx4.rst
@@ -47,6 +47,11 @@ There is also a `section dedicated to this poll mode driver
be enabled manually by setting ``CONFIG_RTE_LIBRTE_MLX4_PMD=y`` and
recompiling DPDK.

+.. warning::
+
+   ``CONFIG_RTE_BUILD_COMBINE_LIBS`` is not supported (if set, it will not
+   compile this PMD even if ``CONFIG_RTE_LIBRTE_MLX4_PMD`` is set).
+
 Implementation details
 --

diff --git a/drivers/net/Makefile b/drivers/net/Makefile
index 5ebf963..1725c94 100644
--- a/drivers/net/Makefile
+++ b/drivers/net/Makefile
@@ -40,7 +40,6 @@ DIRS-$(CONFIG_RTE_LIBRTE_ENIC_PMD) += enic
 DIRS-$(CONFIG_RTE_LIBRTE_FM10K_PMD) += fm10k
 DIRS-$(CONFIG_RTE_LIBRTE_I40E_PMD) += i40e
 DIRS-$(CONFIG_RTE_LIBRTE_IXGBE_PMD) += ixgbe
-DIRS-$(CONFIG_RTE_LIBRTE_MLX4_PMD) += mlx4
 DIRS-$(CONFIG_RTE_LIBRTE_MPIPE_PMD) += mpipe
 DIRS-$(CONFIG_RTE_LIBRTE_PMD_NULL) += null
 DIRS-$(CONFIG_RTE_LIBRTE_PMD_PCAP) += pcap
@@ -49,5 +48,10 @@ DIRS-$(CONFIG_RTE_LIBRTE_VIRTIO_PMD) += virtio
 DIRS-$(CONFIG_RTE_LIBRTE_VMXNET3_PMD) += vmxnet3
 DIRS-$(CONFIG_RTE_LIBRTE_PMD_XENVIRT) += xenvirt

+# Drivers not support in combined mode
+ifeq ($(CONFIG_RTE_BUILD_COMBINE_LIBS),n)
+DIRS-$(CONFIG_RTE_LIBRTE_MLX4_PMD) += mlx4
+endif
+
 include $(RTE_SDK)/mk/rte.sharelib.mk
 include $(RTE_SDK)/mk/rte.subdir.mk
diff --git a/drivers/net/mlx4/Makefile b/drivers/net/mlx4/Makefile
index 14cb53f..d2f5692 100644
--- a/drivers/net/mlx4/Makefile
+++ b/drivers/net/mlx4/Makefile
@@ -50,6 +50,7 @@ CFLAGS += -g
 CFLAGS += -I.
 CFLAGS += -D_XOPEN_SOURCE=600
 CFLAGS += $(WERROR_FLAGS)
+LDLIBS += -libverbs

 # A few warnings cannot be avoided in external headers.
 CFLAGS += -Wno-error=cast-qual
diff --git a/mk/rte.app.mk b/mk/rte.app.mk
index 97719cb..04af756 100644
--- a/mk/rte.app.mk
+++ b/mk/rte.app.mk
@@ -100,7 +100,6 @@ ifeq ($(CONFIG_RTE_LIBRTE_VHOST_USER),n)
 _LDLIBS-$(CONFIG_RTE_LIBRTE_VHOST)  += -lfuse
 endif

-_LDLIBS-$(CONFIG_RTE_LIBRTE_MLX4_PMD)   += -libverbs
 _LDLIBS-$(CONFIG_RTE_LIBRTE_BNX2X_PMD)  += -lz

 _LDLIBS-y += --start-group
@@ -140,6 +139,7 @@ _LDLIBS-$(CONFIG_RTE_LIBRTE_PMD_RING)   += -lrte_pmd_ring
 _LDLIBS-$(CONFIG_RTE_LIBRTE_PMD_PCAP)   += -lrte_pmd_pcap
 _LDLIBS-$(CONFIG_RTE_LIBRTE_PMD_AF_PACKET)  += -lrte_pmd_af_packet
 _LDLIBS-$(CONFIG_RTE_LIBRTE_PMD_NULL)   += -lrte_pmd_null
+_LDLIBS-$(CONFIG_RTE_LIBRTE_MLX4_PMD)   += -libverbs

 endif # ! $(CONFIG_RTE_BUILD_SHARED_LIB)

-- 
1.9.1



[dpdk-dev] RFC: i40e xmit path HW limitation

2015-07-30 Thread Vlad Zolotarov
Hi, Konstantin, Helin,
there is a documented limitation of xl710 controllers (i40e driver)
which is not handled in any way by the DPDK driver.
From the datasheet, chapter 8.4.1:

"- A single transmit packet may span up to 8 buffers (up to 8 data descriptors
per packet including both the header and payload buffers).
- The total number of data descriptors for the whole TSO (explained later on in
this chapter) is unlimited as long as each segment within the TSO obeys the
previous rule (up to 8 data descriptors per segment for both the TSO header
and the segment payload buffers)."

This means that, for instance, a long cluster with small fragments has to
be linearized before it may be placed on the HW ring.
In more standard environments like Linux or FreeBSD drivers the solution
is straightforward: call skb_linearize() or m_collapse() respectively.
In a non-conformist environment like DPDK life is not that easy:
there is no easy way to collapse the cluster into a linear buffer from
inside the device driver, since the device driver doesn't allocate memory
in the fast path and uses the user-allocated pools only.

Here are two proposals for a solution:

 1. We may provide a callback that returns TRUE to the user if a given
cluster has to be linearized; it should always be called before
rte_eth_tx_burst(). Alternatively, it may be called from inside
rte_eth_tx_burst(), with rte_eth_tx_burst() changed to return some
error code when one of the clusters it is given has to be
linearized.
 2. Another option is to allocate a mempool in the driver with the
elements consuming a single page each (standard 2KB buffers would
do). The number of elements in the pool should be the Tx ring length
multiplied by "64KB/(linear data length of the buffer in the pool
above)". Here I use 64KB as the maximum packet length and do not take
into account esoteric things like the "Giant" TSO mentioned in the
spec above. Then we may actually go and linearize the cluster if
needed on top of the buffers from the pool above, post the buffer
from the mempool above on the HW ring, link the original cluster to
that new cluster (using the private data) and release it when the
send is done.
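
The pool sizing rule in the second proposal can be written down as a small calculation. This is only a sketch under stated assumptions: linearize_pool_size() and MAX_PKT_LEN are hypothetical names that do not exist in DPDK, and the division is rounded up so that a partially filled last buffer is still counted.

```c
#include <assert.h>
#include <stddef.h>

/* Assumption taken from the proposal: 64KB is treated as the largest packet. */
#define MAX_PKT_LEN (64u * 1024u)

/*
 * Hypothetical helper: number of pool elements for the second proposal,
 * i.e. Tx ring length * (64KB / linear data length of one pool buffer).
 * The division rounds up so a partial last buffer is still counted.
 */
static size_t
linearize_pool_size(size_t tx_ring_len, size_t buf_data_len)
{
	assert(buf_data_len > 0);
	size_t bufs_per_pkt = (MAX_PKT_LEN + buf_data_len - 1) / buf_data_len;

	return tx_ring_len * bufs_per_pkt;
}
```

As an illustration only, a 512-entry Tx ring (an assumed value) with 2KB buffers gives 512 * 32 = 16384 elements, which is about 32MB of buffer memory per queue.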


The first is an API change and would require some additional handling
(linearization) from the application. The second would require some
additional memory but would keep all the dirty details inside the driver
and leave the rest of the code intact.
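
The check behind the first proposal could look roughly like the sketch below. It is not DPDK code: struct seg is a hypothetical stand-in for a segment of an mbuf chain, needs_linearize() is an invented name, and only the non-TSO rule (at most 8 data descriptors per packet) is modelled.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Limit quoted from the datasheet: up to 8 data descriptors per packet. */
#define MAX_TX_DATA_DESC 8

/* Hypothetical stand-in for one segment of a cluster (not struct rte_mbuf). */
struct seg {
	struct seg *next;   /* next segment in the cluster, or NULL */
	unsigned data_len;  /* bytes of data held by this segment */
};

/*
 * Return true if a non-TSO cluster has more segments than the hardware
 * can describe for one packet, so it would have to be linearized.
 */
static bool
needs_linearize(const struct seg *head)
{
	size_t nb_segs = 0;

	assert(head != NULL);
	for (const struct seg *s = head; s != NULL; s = s->next) {
		if (++nb_segs > MAX_TX_DATA_DESC)
			return true;
	}
	return false;
}
```

Whether such a check would be exposed to the application or run inside rte_eth_tx_burst() is exactly the API question raised above; the sketch takes no position on it.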

Pls., comment.

thanks,
vlad




[dpdk-dev] lost when learning how to test dpdk

2015-07-30 Thread Jan Viktorin
The 82545 is listed at http://dpdk.org/doc/nics and I can see it in
rte_pci_dev_ids.h/e1000_hw.h:

196 #define E1000_DEV_ID_82545GM_COPPER   0x1026

$ lspci -nn
...
03:00.0 Ethernet controller [0200]: Intel Corporation 82545GM Gigabit Ethernet 
Controller [8086:1026] (rev 04)
03:02.0 Ethernet controller [0200]: Intel Corporation 82545GM Gigabit Ethernet 
Controller [8086:1026] (rev 04)

However, it is rev 04 and in e1000_hw.h there is just e1000_82545_rev_3.
But this should not prevent the match (?). Is it possible to increase the
verbosity level of the device matching process in DPDK?

I do not expect any support, I just wanted to use it for sending traffic
at 1 Gbps because there are two such cards mostly unused in my computer.
I did not plan to use I217-V (in fact, I did not expect much from this
integrated NIC and I did not even notice it is an Intel one...).

Regards
Jan Viktorin

On Thu, 30 Jul 2015 07:44:14 -0700
Ravi Kerur  wrote:

> On Thu, Jul 30, 2015 at 5:03 AM, Jan Viktorin 
> wrote:
> 
> > Hi,
> >
> > thanks for reply. I could see those docs but it does not help me a lot.
> > I still do not understand very well the principle of the tool. How it
> > chooses the NICs to use? Previously I confused -b in dpdk_nic_bind and
> > testpmd. They have somehow opposite meaning. I can start testpmd now,
> > however, it does not probe any NIC. I've tried -w to whitelist certain
> > NICs but with no success.
> >
> > $ dpdk_nic_bind --status
> >
> > Network devices using DPDK-compatible driver
> > 
> > :03:00.0 '82545GM Gigabit Ethernet Controller' drv=uio_pci_generic
> > unused=e1000
> > :03:02.0 '82545GM Gigabit Ethernet Controller' drv=uio_pci_generic
> > unused=e1000
> >
> 
> NICs may not be supported by PMD drivers yet. Do "lspci -nn" and check the
> device-id.  Adding support in PMD should not be a problem, but I am not
> sure on support since there is End of Life listed on Intel Website
> 
> http://ark.intel.com/products/4964/Intel-82545GM-Gigabit-Ethernet-Controller
> 
> 
> > Network devices using kernel driver
> > ===
> > :00:19.0 'Ethernet Connection I217-V' if=eno1 drv=e1000e
> > unused=uio_pci_generic *Active*
> >
> 
> DPDK doesn't bind Active NIC, support for I217-V in PMD is being tested
> currently.
> 
> 
> >
> > Other network devices
> > =
> > 
> >
> > $ sudo testpmd -c 0x3 -n 2 -- -i --total-num-mbufs=2048
> > EAL: Detected lcore 0 as core 0 on socket 0
> > EAL: Detected lcore 1 as core 1 on socket 0
> > EAL: Detected lcore 2 as core 0 on socket 0
> > EAL: Detected lcore 3 as core 1 on socket 0
> > EAL: Support maximum 128 logical core(s) by configuration.
> > EAL: Detected 4 lcore(s)
> > EAL: VFIO modules not all loaded, skip VFIO support...
> > EAL: Setting up physically contiguous memory...
> > EAL: Ask a virtual area of 0x3c0 bytes
> > EAL: Virtual area found at 0x7fe973e0 (size = 0x3c0)
> > EAL: Ask a virtual area of 0x20 bytes
> > EAL: Virtual area found at 0x7fe973a0 (size = 0x20)
> > EAL: Ask a virtual area of 0x20 bytes
> > EAL: Virtual area found at 0x7fe97360 (size = 0x20)
> > EAL: Ask a virtual area of 0x3c0 bytes
> > EAL: Virtual area found at 0x7fe96f80 (size = 0x3c0)
> > EAL: Ask a virtual area of 0x20 bytes
> > EAL: Virtual area found at 0x7fe96f40 (size = 0x20)
> > EAL: Ask a virtual area of 0x20 bytes
> > EAL: Virtual area found at 0x7fe96f00 (size = 0x20)
> > EAL: Requesting 64 pages of size 2MB from socket 0
> > EAL: TSC frequency is ~368 KHz
> > EAL: Master lcore 0 is ready (tid=7989d8c0;cpuset=[0])
> > EAL: lcore 1 is ready (tid=6efff700;cpuset=[1])
> > EAL: No probed ethernet devices
> > Interactive-mode selected
> > Done
> > testpmd>
> >
> > Thanks
> > Jan Viktorin
> >
> > On Wed, 29 Jul 2015 12:09:06 +0300
> > ciprian.barbu  wrote:
> >
> > >
> > >
> > > On 28.07.2015 21:13, Jan Viktorin wrote:
> > > > Hello all,
> > > >
> > > > I am learning how to measure throughput with dpdk. I have 4 cores
> > > > Intel(R) Core(TM) i3-4360 CPU @ 3.70GHz and two 82545GM NICs connected
> > > > together. I do not understand very well, how to setup testpmd.
> > >
> > > http://dpdk.org/doc
> > > http://dpdk.org/doc/guides/testpmd_app_ug/run_app.html
> > > http://dpdk.org/doc/quick-start
> > >
> > > >
> > > > I've successfully bound the NICs to dpdk:
> > > >
> > > > $ dpdk_nic_bind --status
> > > >
> > > > Network devices using DPDK-compatible driver
> > > > 
> > > > :03:00.0 '82545GM Gigabit Ethernet Controller' drv=uio_pci_generic
> > unused=e1000
> > > > :03:02.0 '82545GM Gigabit Ethernet Controller' drv=uio_pci_generic
> > unused=e1000
> > > >
> > > > Network devices using kernel driver
> > > > ===
> > > > :00:19.0 'Ethernet Connection I217-V' if=eno1 drv=e1000e
> > unused=uio_pci_generic *Active*
> > > >
> > > > O

[dpdk-dev] [PATCH] mbuf: enforce alignment of mbuf private area

2015-07-30 Thread Zhang, Helin


> -Original Message-
> From: Olivier Matz [mailto:olivier.matz at 6wind.com]
> Sent: Thursday, July 30, 2015 6:56 AM
> To: dev at dpdk.org
> Cc: Ananyev, Konstantin; olivier.matz at 6wind.com; Zhang, Helin;
> martin.weiser at allegro-packets.com; thomas.monjalon at 6wind.com
> Subject: [PATCH] mbuf: enforce alignment of mbuf private area
> 
> It looks better to have a data buffer address that is aligned to
> 8 bytes. This is the case when there is no mbuf private area, but if there is 
> one,
> the alignment depends on the size of this area that is located between the 
> mbuf
> structure and the data buffer.
> 
> Indeed, some drivers expects to have the buffer address aligned to an even
> address, and moreover an unaligned buffer may impact the performance when
> accessing to network headers.
> 
> Add a check in rte_pktmbuf_pool_create() to verify the alignment constraint
> before creating the mempool. For applications that use the alternative way
> (direct call to rte_mempool_create), also add an assertion in 
> rte_pktmbuf_init().
> 
> By the way, also add the MBUF log type.
> 
> Signed-off-by: Olivier Matz 
> ---
>  lib/librte_eal/common/include/rte_log.h | 1 +
>  lib/librte_mbuf/rte_mbuf.c  | 8 +++-
>  lib/librte_mbuf/rte_mbuf.h  | 7 +--
>  3 files changed, 13 insertions(+), 3 deletions(-)
> 
> diff --git a/lib/librte_eal/common/include/rte_log.h
> b/lib/librte_eal/common/include/rte_log.h
> index 24a55cc..ede0dca 100644
> --- a/lib/librte_eal/common/include/rte_log.h
> +++ b/lib/librte_eal/common/include/rte_log.h
> @@ -77,6 +77,7 @@ extern struct rte_logs rte_logs;
>  #define RTE_LOGTYPE_PORT0x2000 /**< Log related to port. */
>  #define RTE_LOGTYPE_TABLE   0x4000 /**< Log related to table. */
>  #define RTE_LOGTYPE_PIPELINE 0x8000 /**< Log related to pipeline. */
> +#define RTE_LOGTYPE_MBUF0x0001 /**< Log related to mbuf. */
> 
>  /* these log types can be used in an application */
>  #define RTE_LOGTYPE_USER1   0x0100 /**< User-defined log type 1. */
> diff --git a/lib/librte_mbuf/rte_mbuf.c b/lib/librte_mbuf/rte_mbuf.c
> index 4320dd4..a1ddbb3 100644
> --- a/lib/librte_mbuf/rte_mbuf.c
> +++ b/lib/librte_mbuf/rte_mbuf.c
> @@ -125,6 +125,7 @@ rte_pktmbuf_init(struct rte_mempool *mp,
>   mbuf_size = sizeof(struct rte_mbuf) + priv_size;
>   buf_len = rte_pktmbuf_data_room_size(mp);
> 
> + RTE_MBUF_ASSERT((priv_size & (RTE_MBUF_PRIV_ALIGN - 1)) == 0);
Using RTE_ALIGN() could be more readable?

>   RTE_MBUF_ASSERT(mp->elt_size >= mbuf_size);
>   RTE_MBUF_ASSERT(buf_len <= UINT16_MAX);
> 
> @@ -154,7 +155,12 @@ rte_pktmbuf_pool_create(const char *name, unsigned
> n,
>   struct rte_pktmbuf_pool_private mbp_priv;
>   unsigned elt_size;
> 
> -
> + if ((priv_size & (RTE_MBUF_PRIV_ALIGN - 1)) != 0) {
Using RTE_ALIGN() could be more readable?

> + RTE_LOG(ERR, MBUF, "mbuf priv_size=%u is not aligned\n",
> + priv_size);
> + rte_errno = EINVAL;
> + return NULL;
> + }
>   elt_size = sizeof(struct rte_mbuf) + (unsigned)priv_size +
>   (unsigned)data_room_size;
>   mbp_priv.mbuf_data_room_size = data_room_size;
> diff --git a/lib/librte_mbuf/rte_mbuf.h b/lib/librte_mbuf/rte_mbuf.h
> index 010b32d..c3b8c98 100644
> --- a/lib/librte_mbuf/rte_mbuf.h
> +++ b/lib/librte_mbuf/rte_mbuf.h
> @@ -698,6 +698,9 @@ extern "C" {
> 
> RTE_PTYPE_INNER_L4_MASK))
>  #endif /* RTE_NEXT_ABI */
> 
> +/** Alignment constraint of mbuf private area. */
> +#define RTE_MBUF_PRIV_ALIGN 8
> +
>  /**
>   * Get the name of a RX offload flag
>   *
> @@ -1238,7 +1241,7 @@ void rte_pktmbuf_pool_init(struct rte_mempool *mp,
> void *opaque_arg);
>   *   details.
>   * @param priv_size
>   *   Size of application private are between the rte_mbuf structure
> - *   and the data buffer.
> + *   and the data buffer. This value must be aligned to
> RTE_MBUF_PRIV_ALIGN.
>   * @param data_room_size
>   *   Size of data buffer in each mbuf, including RTE_PKTMBUF_HEADROOM.
>   * @param socket_id
> @@ -1250,7 +1253,7 @@ void rte_pktmbuf_pool_init(struct rte_mempool *mp,
> void *opaque_arg);
>   *   with rte_errno set appropriately. Possible rte_errno values include:
>   *- E_RTE_NO_CONFIG - function could not get pointer to rte_config
> structure
>   *- E_RTE_SECONDARY - function was called from a secondary process
> instance
> - *- EINVAL - cache size provided is too large
> + *- EINVAL - cache size provided is too large, or priv_size is not 
> aligned.
>   *- ENOSPC - the maximum number of memzones has already been
> allocated
>   *- EEXIST - a memzone with the same name already exists
>   *- ENOMEM - no appropriate memory area found in which to create
> memzone
> --
> 2.1.4



[dpdk-dev] lost when learning how to test dpdk

2015-07-30 Thread Ravi Kerur
On Thu, Jul 30, 2015 at 8:22 AM, Jan Viktorin 
wrote:

> The 82545 is listed at http://dpdk.org/doc/nics and I can see it in
> rte_pci_dev_ids.h/e1000_hw.h:
>
> 196 #define E1000_DEV_ID_82545GM_COPPER   0x1026
>
> $ lspci -nn
> ...
> 03:00.0 Ethernet controller [0200]: Intel Corporation 82545GM Gigabit
> Ethernet Controller [8086:1026] (rev 04)
> 03:02.0 Ethernet controller [0200]: Intel Corporation 82545GM Gigabit
> Ethernet Controller [8086:1026] (rev 04)
>
> However, it is rev 04 and in e1000_hw.h there is just e1000_82545_rev_3.
> But this should not avoid the match (?). Is it possible to grow the
> verbosity
> level of the device matching process in DPDK?
>

Check lib/librte_eal/common/include/rte_pci_dev_ids.h; you need to add the
device ID via RTE_PCI_DEV_ID_DECL_EM.
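
To illustrate why a missing table entry leads to "No probed ethernet devices", here is a minimal sketch of ID-table matching. It is an assumption-laden model, not the EAL implementation: struct pci_id and pci_id_match() are hypothetical, and the sketch compares only vendor and device IDs, on the assumption that the revision (rev 04 here) takes no part in the match.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical ID entry; the real tables come from rte_pci_dev_ids.h. */
struct pci_id {
	uint16_t vendor_id;
	uint16_t device_id;
};

/* Illustrative table holding only the [8086:1026] ID from the lspci output. */
static const struct pci_id supported_ids[] = {
	{ 0x8086, 0x1026 },
};

/* Return true if vendor:device is listed in the table of n entries. */
static bool
pci_id_match(const struct pci_id *table, size_t n,
	     uint16_t vendor_id, uint16_t device_id)
{
	assert(table != NULL);
	for (size_t i = 0; i < n; i++) {
		if (table[i].vendor_id == vendor_id &&
		    table[i].device_id == device_id)
			return true;
	}
	return false;
}
```

With the entry present, pci_id_match(supported_ids, 1, 0x8086, 0x1026) returns true; with an empty table the device is simply skipped, which is consistent with the testpmd output quoted below.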

>
> I do not expect any support, I just wanted to use it for sending traffic
> at 1 Gbps because there are two such cards mostly unused in my computer.
> I did not plan to use I217-V (in fact, I did not expect much from this
> integrated NIC and I did not even notice it is an Intel one...).
>
> Regards
> Jan Viktorin
>
> On Thu, 30 Jul 2015 07:44:14 -0700
> Ravi Kerur  wrote:
>
> > On Thu, Jul 30, 2015 at 5:03 AM, Jan Viktorin 
> > wrote:
> >
> > > Hi,
> > >
> > > thanks for reply. I could see those docs but it does not help me a lot.
> > > I still do not understand very well the principle of the tool. How it
> > > chooses the NICs to use? Previously I confused -b in dpdk_nic_bind and
> > > testpmd. They have somehow opposite meaning. I can start testpmd now,
> > > > however, it does not probe any NIC. I've tried -w to whitelist certain
> > > NICs but with no success.
> > >
> > > $ dpdk_nic_bind --status
> > >
> > > Network devices using DPDK-compatible driver
> > > 
> > > :03:00.0 '82545GM Gigabit Ethernet Controller' drv=uio_pci_generic
> > > unused=e1000
> > > :03:02.0 '82545GM Gigabit Ethernet Controller' drv=uio_pci_generic
> > > unused=e1000
> > >
> >
> > NICs may not be supported by PMD drivers yet. Do "lspci -nn" and check
> the
> > device-id.  Adding support in PMD should not be a problem, but I am not
> > sure on support since there is End of Life listed on Intel Website
> >
> >
> http://ark.intel.com/products/4964/Intel-82545GM-Gigabit-Ethernet-Controller
> >
> >
> > > Network devices using kernel driver
> > > ===
> > > :00:19.0 'Ethernet Connection I217-V' if=eno1 drv=e1000e
> > > unused=uio_pci_generic *Active*
> > >
> >
> > DPDK doesn't bind Active NIC, support for I217-V in PMD is being tested
> > currently.
> >
> >
> > >
> > > Other network devices
> > > =
> > > 
> > >
> > > $ sudo testpmd -c 0x3 -n 2 -- -i --total-num-mbufs=2048
> > > EAL: Detected lcore 0 as core 0 on socket 0
> > > EAL: Detected lcore 1 as core 1 on socket 0
> > > EAL: Detected lcore 2 as core 0 on socket 0
> > > EAL: Detected lcore 3 as core 1 on socket 0
> > > EAL: Support maximum 128 logical core(s) by configuration.
> > > EAL: Detected 4 lcore(s)
> > > EAL: VFIO modules not all loaded, skip VFIO support...
> > > EAL: Setting up physically contiguous memory...
> > > EAL: Ask a virtual area of 0x3c0 bytes
> > > EAL: Virtual area found at 0x7fe973e0 (size = 0x3c0)
> > > EAL: Ask a virtual area of 0x20 bytes
> > > EAL: Virtual area found at 0x7fe973a0 (size = 0x20)
> > > EAL: Ask a virtual area of 0x20 bytes
> > > EAL: Virtual area found at 0x7fe97360 (size = 0x20)
> > > EAL: Ask a virtual area of 0x3c0 bytes
> > > EAL: Virtual area found at 0x7fe96f80 (size = 0x3c0)
> > > EAL: Ask a virtual area of 0x20 bytes
> > > EAL: Virtual area found at 0x7fe96f40 (size = 0x20)
> > > EAL: Ask a virtual area of 0x20 bytes
> > > EAL: Virtual area found at 0x7fe96f00 (size = 0x20)
> > > EAL: Requesting 64 pages of size 2MB from socket 0
> > > EAL: TSC frequency is ~368 KHz
> > > EAL: Master lcore 0 is ready (tid=7989d8c0;cpuset=[0])
> > > EAL: lcore 1 is ready (tid=6efff700;cpuset=[1])
> > > EAL: No probed ethernet devices
> > > Interactive-mode selected
> > > Done
> > > testpmd>
> > >
> > > Thanks
> > > Jan Viktorin
> > >
> > > On Wed, 29 Jul 2015 12:09:06 +0300
> > > ciprian.barbu  wrote:
> > >
> > > >
> > > >
> > > > On 28.07.2015 21:13, Jan Viktorin wrote:
> > > > > Hello all,
> > > > >
> > > > > I am learning how to measure throughput with dpdk. I have 4 cores
> > > > > Intel(R) Core(TM) i3-4360 CPU @ 3.70GHz and two 82545GM NICs
> connected
> > > > > together. I do not understand very well, how to setup testpmd.
> > > >
> > > > http://dpdk.org/doc
> > > > http://dpdk.org/doc/guides/testpmd_app_ug/run_app.html
> > > > http://dpdk.org/doc/quick-start
> > > >
> > > > >
> > > > > I've successfully bound the NICs to dpdk:
> > > > >
> > > > > $ dpdk_nic_bind --status
> > > > >
> > > > > Network devices using DPDK-compatible driver
> > > > > ==

[dpdk-dev] [PATCH] mbuf: enforce alignment of mbuf private area

2015-07-30 Thread Olivier MATZ
On 07/30/2015 04:13 PM, Ananyev, Konstantin wrote:
>
> Hi Olivier,
>
> It fails to compile for me:
>
> /local/kananye1/dpdk.org-mbprv1/lib/librte_mbuf/rte_mbuf.c: In function 'rte_pktmbuf_pool_create':
> /local/kananye1/dpdk.org-mbprv1/lib/librte_mbuf/rte_mbuf.c:161:3: error: 'rte_errno' undeclared (first use in this function)
> rte_errno = EINVAL;
> ^
> /local/kananye1/dpdk.org-mbprv1/lib/librte_mbuf/rte_mbuf.c:161:3: note: each 
> undeclared identifier is reported only once for each function it appears in
>
> I had to add:

Sorry, I had the same error but forgot to squash the fix... :/

I'm sending a v2.


>
> diff --git a/lib/librte_mbuf/rte_mbuf.c b/lib/librte_mbuf/rte_mbuf.c
> index a1ddbb3..04344c0 100644
> --- a/lib/librte_mbuf/rte_mbuf.c
> +++ b/lib/librte_mbuf/rte_mbuf.c
> @@ -58,6 +58,7 @@
>   #include 
>   #include 
>   #include 
> +#include 
>
>   /*
>* ctrlmbuf constructor, given as a callback function to
>
> Apart from that - looks good to me.
> Konstantin
>
>> -Original Message-
>> From: Olivier Matz [mailto:olivier.matz at 6wind.com]
>> Sent: Thursday, July 30, 2015 2:56 PM
>> To: dev at dpdk.org
>> Cc: Ananyev, Konstantin; olivier.matz at 6wind.com; Zhang, Helin; 
>> martin.weiser at allegro-packets.com; thomas.monjalon at 6wind.com
>> Subject: [PATCH] mbuf: enforce alignment of mbuf private area
>>
>> It looks better to have a data buffer address that is aligned to
>> 8 bytes. This is the case when there is no mbuf private area, but
>> if there is one, the alignment depends on the size of this area
>> that is located between the mbuf structure and the data buffer.
>>
>> Indeed, some drivers expects to have the buffer address aligned
>> to an even address, and moreover an unaligned buffer may impact
>> the performance when accessing to network headers.
>>
>> Add a check in rte_pktmbuf_pool_create() to verify the alignment
>> constraint before creating the mempool. For applications that use
>> the alternative way (direct call to rte_mempool_create), also
>> add an assertion in rte_pktmbuf_init().
>>
>> By the way, also add the MBUF log type.
>>
>> Signed-off-by: Olivier Matz 
>> ---
>>   lib/librte_eal/common/include/rte_log.h | 1 +
>>   lib/librte_mbuf/rte_mbuf.c  | 8 +++-
>>   lib/librte_mbuf/rte_mbuf.h  | 7 +--
>>   3 files changed, 13 insertions(+), 3 deletions(-)
>>
>> diff --git a/lib/librte_eal/common/include/rte_log.h 
>> b/lib/librte_eal/common/include/rte_log.h
>> index 24a55cc..ede0dca 100644
>> --- a/lib/librte_eal/common/include/rte_log.h
>> +++ b/lib/librte_eal/common/include/rte_log.h
>> @@ -77,6 +77,7 @@ extern struct rte_logs rte_logs;
>>   #define RTE_LOGTYPE_PORT0x2000 /**< Log related to port. */
>>   #define RTE_LOGTYPE_TABLE   0x4000 /**< Log related to table. */
>>   #define RTE_LOGTYPE_PIPELINE 0x8000 /**< Log related to pipeline. */
>> +#define RTE_LOGTYPE_MBUF0x0001 /**< Log related to mbuf. */
>>
>>   /* these log types can be used in an application */
>>   #define RTE_LOGTYPE_USER1   0x0100 /**< User-defined log type 1. */
>> diff --git a/lib/librte_mbuf/rte_mbuf.c b/lib/librte_mbuf/rte_mbuf.c
>> index 4320dd4..a1ddbb3 100644
>> --- a/lib/librte_mbuf/rte_mbuf.c
>> +++ b/lib/librte_mbuf/rte_mbuf.c
>> @@ -125,6 +125,7 @@ rte_pktmbuf_init(struct rte_mempool *mp,
>>  mbuf_size = sizeof(struct rte_mbuf) + priv_size;
>>  buf_len = rte_pktmbuf_data_room_size(mp);
>>
>> +RTE_MBUF_ASSERT((priv_size & (RTE_MBUF_PRIV_ALIGN - 1)) == 0);
>>  RTE_MBUF_ASSERT(mp->elt_size >= mbuf_size);
>>  RTE_MBUF_ASSERT(buf_len <= UINT16_MAX);
>>
>> @@ -154,7 +155,12 @@ rte_pktmbuf_pool_create(const char *name, unsigned n,
>>  struct rte_pktmbuf_pool_private mbp_priv;
>>  unsigned elt_size;
>>
>> -
>> +if ((priv_size & (RTE_MBUF_PRIV_ALIGN - 1)) != 0) {
>> +RTE_LOG(ERR, MBUF, "mbuf priv_size=%u is not aligned\n",
>> +priv_size);
>> +rte_errno = EINVAL;
>> +return NULL;
>> +}
>>  elt_size = sizeof(struct rte_mbuf) + (unsigned)priv_size +
>>  (unsigned)data_room_size;
>>  mbp_priv.mbuf_data_room_size = data_room_size;
>> diff --git a/lib/librte_mbuf/rte_mbuf.h b/lib/librte_mbuf/rte_mbuf.h
>> index 010b32d..c3b8c98 100644
>> --- a/lib/librte_mbuf/rte_mbuf.h
>> +++ b/lib/librte_mbuf/rte_mbuf.h
>> @@ -698,6 +698,9 @@ extern "C" {
>>RTE_PTYPE_INNER_L4_MASK))
>>   #endif /* RTE_NEXT_ABI */
>>
>> +/** Alignment constraint of mbuf private area. */
>> +#define RTE_MBUF_PRIV_ALIGN 8
>> +
>>   /**
>>* Get the name of a RX offload flag
>>*
>> @@ -1238,7 +1241,7 @@ void rte_pktmbuf_pool_init(struct rte_mempool *mp, 
>> void *opaque_arg);
>>*   details.
>>* @param priv_size
>>*   Size of application private are between the rte_mbuf structure
>> - *   and the data buffer.
>> + *   and the data buffer. This value must be aligned to RTE_MBUF_PR

[dpdk-dev] [PATCH] mbuf: enforce alignment of mbuf private area

2015-07-30 Thread Olivier MATZ
On 07/30/2015 05:33 PM, Zhang, Helin wrote:
>
>
>> -Original Message-
>> From: Olivier Matz [mailto:olivier.matz at 6wind.com]
>> Sent: Thursday, July 30, 2015 6:56 AM
>> To: dev at dpdk.org
>> Cc: Ananyev, Konstantin; olivier.matz at 6wind.com; Zhang, Helin;
>> martin.weiser at allegro-packets.com; thomas.monjalon at 6wind.com
>> Subject: [PATCH] mbuf: enforce alignment of mbuf private area
>>
>> It looks better to have a data buffer address that is aligned to
>> 8 bytes. This is the case when there is no mbuf private area, but if there 
>> is one,
>> the alignment depends on the size of this area that is located between the 
>> mbuf
>> structure and the data buffer.
>>
>> Indeed, some drivers expects to have the buffer address aligned to an even
>> address, and moreover an unaligned buffer may impact the performance when
>> accessing to network headers.
>>
>> Add a check in rte_pktmbuf_pool_create() to verify the alignment constraint
>> before creating the mempool. For applications that use the alternative way
>> (direct call to rte_mempool_create), also add an assertion in 
>> rte_pktmbuf_init().
>>
>> By the way, also add the MBUF log type.
>>
>> Signed-off-by: Olivier Matz 
>> ---
>>   lib/librte_eal/common/include/rte_log.h | 1 +
>>   lib/librte_mbuf/rte_mbuf.c  | 8 +++-
>>   lib/librte_mbuf/rte_mbuf.h  | 7 +--
>>   3 files changed, 13 insertions(+), 3 deletions(-)
>>
>> diff --git a/lib/librte_eal/common/include/rte_log.h
>> b/lib/librte_eal/common/include/rte_log.h
>> index 24a55cc..ede0dca 100644
>> --- a/lib/librte_eal/common/include/rte_log.h
>> +++ b/lib/librte_eal/common/include/rte_log.h
>> @@ -77,6 +77,7 @@ extern struct rte_logs rte_logs;
>>   #define RTE_LOGTYPE_PORT0x2000 /**< Log related to port. */
>>   #define RTE_LOGTYPE_TABLE   0x4000 /**< Log related to table. */
>>   #define RTE_LOGTYPE_PIPELINE 0x8000 /**< Log related to pipeline. */
>> +#define RTE_LOGTYPE_MBUF0x0001 /**< Log related to mbuf. */
>>
>>   /* these log types can be used in an application */
>>   #define RTE_LOGTYPE_USER1   0x0100 /**< User-defined log type 1. */
>> diff --git a/lib/librte_mbuf/rte_mbuf.c b/lib/librte_mbuf/rte_mbuf.c
>> index 4320dd4..a1ddbb3 100644
>> --- a/lib/librte_mbuf/rte_mbuf.c
>> +++ b/lib/librte_mbuf/rte_mbuf.c
>> @@ -125,6 +125,7 @@ rte_pktmbuf_init(struct rte_mempool *mp,
>>  mbuf_size = sizeof(struct rte_mbuf) + priv_size;
>>  buf_len = rte_pktmbuf_data_room_size(mp);
>>
>> +RTE_MBUF_ASSERT((priv_size & (RTE_MBUF_PRIV_ALIGN - 1)) == 0);
> Using RTE_ALIGN() could be more readable?
>
>>  RTE_MBUF_ASSERT(mp->elt_size >= mbuf_size);
>>  RTE_MBUF_ASSERT(buf_len <= UINT16_MAX);
>>
>> @@ -154,7 +155,12 @@ rte_pktmbuf_pool_create(const char *name, unsigned
>> n,
>>  struct rte_pktmbuf_pool_private mbp_priv;
>>  unsigned elt_size;
>>
>> -
>> +if ((priv_size & (RTE_MBUF_PRIV_ALIGN - 1)) != 0) {
> Using RTE_ALIGN() could be more readable?

Will do, thanks for commenting.
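
For reference, the two ways of writing the check can be compared side by side. This is a sketch, not the v2 patch: align_ceil() is a local, hypothetical helper standing in for a round-up macro such as RTE_ALIGN() (assuming it rounds its argument up to the given power-of-two alignment), and the value 8 is taken from the patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Value taken from the patch under review. */
#define MBUF_PRIV_ALIGN 8u

/* Hypothetical local helper: round val up to a multiple of a power-of-two align. */
static inline uint32_t
align_ceil(uint32_t val, uint32_t align)
{
	assert((align & (align - 1)) == 0); /* align must be a power of two */
	return (val + align - 1) & ~(align - 1);
}

/* Mask form, as written in the patch. */
static inline bool
priv_size_ok_mask(uint16_t priv_size)
{
	return (priv_size & (MBUF_PRIV_ALIGN - 1)) == 0;
}

/* Round-and-compare form, in the spirit of the RTE_ALIGN() suggestion. */
static inline bool
priv_size_ok_align(uint16_t priv_size)
{
	return align_ceil(priv_size, MBUF_PRIV_ALIGN) == priv_size;
}
```

Both forms accept exactly the same priv_size values as long as the alignment is a power of two; the second states the intent ("already aligned") more directly.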



>
>> +RTE_LOG(ERR, MBUF, "mbuf priv_size=%u is not aligned\n",
>> +priv_size);
>> +rte_errno = EINVAL;
>> +return NULL;
>> +}
>>  elt_size = sizeof(struct rte_mbuf) + (unsigned)priv_size +
>>  (unsigned)data_room_size;
>>  mbp_priv.mbuf_data_room_size = data_room_size;
>> diff --git a/lib/librte_mbuf/rte_mbuf.h b/lib/librte_mbuf/rte_mbuf.h
>> index 010b32d..c3b8c98 100644
>> --- a/lib/librte_mbuf/rte_mbuf.h
>> +++ b/lib/librte_mbuf/rte_mbuf.h
>> @@ -698,6 +698,9 @@ extern "C" {
>>
>> RTE_PTYPE_INNER_L4_MASK))
>>  #endif /* RTE_NEXT_ABI */
>>
>> +/** Alignment constraint of mbuf private area. */
>> +#define RTE_MBUF_PRIV_ALIGN 8
>> +
>>   /**
>>* Get the name of a RX offload flag
>>*
>> @@ -1238,7 +1241,7 @@ void rte_pktmbuf_pool_init(struct rte_mempool *mp,
>> void *opaque_arg);
>>*   details.
>>* @param priv_size
>>*   Size of application private are between the rte_mbuf structure
>> - *   and the data buffer.
>> + *   and the data buffer. This value must be aligned to
>> RTE_MBUF_PRIV_ALIGN.
>>* @param data_room_size
>>*   Size of data buffer in each mbuf, including RTE_PKTMBUF_HEADROOM.
>>* @param socket_id
>> @@ -1250,7 +1253,7 @@ void rte_pktmbuf_pool_init(struct rte_mempool *mp,
>> void *opaque_arg);
>>*   with rte_errno set appropriately. Possible rte_errno values include:
>>*- E_RTE_NO_CONFIG - function could not get pointer to rte_config
>> structure
>>*- E_RTE_SECONDARY - function was called from a secondary process
>> instance
>> - *- EINVAL - cache size provided is too large
>> + *- EINVAL - cache size provided is too large, or priv_size is not 
>> aligned.
>>*- ENOSPC - the maximum number of memzones has already been
>> allocated
>>*- EEXIST - a memzone with the same name already exists
>>*- ENOMEM - no appropriate memory area found in which to create
>> memzon

[dpdk-dev] i40e xmit path HW limitation

2015-07-30 Thread Zhang, Helin


> -Original Message-
> From: Vlad Zolotarov [mailto:vladz at cloudius-systems.com]
> Sent: Thursday, July 30, 2015 7:58 AM
> To: dev at dpdk.org; Ananyev, Konstantin; Zhang, Helin
> Subject: RFC: i40e xmit path HW limitation
> 
> Hi, Konstantin, Helin,
> there is a documented limitation of xl710 controllers (i40e driver) which is
> not handled in any way by a DPDK driver.
>  From the datasheet chapter 8.4.1:
> 
> "- A single transmit packet may span up to 8 buffers (up to 8 data
> descriptors per packet including both the header and payload buffers).
> - The total number of data descriptors for the whole TSO (explained later on
> in this chapter) is unlimited as long as each segment within the TSO obeys the
> previous rule (up to 8 data descriptors per segment for both the TSO header
> and the segment payload buffers)."
Yes, I remember the RX side supports only 5 segments per received packet.
But what possible issue do you have in mind?

> 
> This means that, for instance, long cluster with small fragments has to be
> linearized before it may be placed on the HW ring.
What size are the small fragments? 2KB is the default mbuf size in most
example applications, and 2KB x 8 is bigger than 1.5KB, so it is enough for the
maximum packet size we support.
If 1KB mbufs are used, don't expect to transmit packets larger than 8KB.
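
The arithmetic above can be spelled out as follows. This is an illustrative sketch only; max_pkt_len() and descs_needed() are hypothetical helpers, and the second one assumes every segment carries the same number of bytes.

```c
#include <assert.h>
#include <stddef.h>

/* Limit quoted from the datasheet: up to 8 data descriptors per packet. */
#define MAX_TX_DATA_DESC 8u

/* Largest packet that fits when every segment is a full buffer of buf_len bytes. */
static size_t
max_pkt_len(size_t buf_len)
{
	return MAX_TX_DATA_DESC * buf_len;
}

/* Descriptors needed for pkt_len bytes if every segment holds seg_len bytes. */
static size_t
descs_needed(size_t pkt_len, size_t seg_len)
{
	assert(seg_len > 0);
	return (pkt_len + seg_len - 1) / seg_len;
}
```

With full 2KB buffers the limit is 16KB and with 1KB buffers it is 8KB, matching the figures above. The case raised in the RFC is different: a 1500-byte packet built from 128-byte fragments (an assumed size) would need 12 descriptors, which exceeds the limit of 8.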

> In more standard environments like Linux or FreeBSD drivers the solution is
> straight forward - call skb_linearize()/m_collapse() corresponding.
> In the non-conformist environment like DPDK life is not that easy - there is 
> no
> easy way to collapse the cluster into a linear buffer from inside the device 
> driver
> since device driver doesn't allocate memory in a fast path and utilizes the 
> user
> allocated pools only.

> 
> Here are two proposals for a solution:
> 
>  1. We may provide a callback that would return a user TRUE if a give
> cluster has to be linearized and it should always be called before
> rte_eth_tx_burst(). Alternatively it may be called from inside the
> rte_eth_tx_burst() and rte_eth_tx_burst() is changed to return some
> error code for a case when one of the clusters it's given has to be
> linearized.
>  2. Another option is to allocate a mempool in the driver with the
> elements consuming a single page each (standard 2KB buffers would
> do). Number of elements in the pool should be as Tx ring length
> multiplied by "64KB/(linear data length of the buffer in the pool
> above)". Here I use 64KB as a maximum packet length and not taking
> into an account esoteric things like "Giant" TSO mentioned in the
> spec above. Then we may actually go and linearize the cluster if
> needed on top of the buffers from the pool above, post the buffer
> from the mempool above on the HW ring, link the original cluster to
> that new cluster (using the private data) and release it when the
> send is done.
> 
> 
> The first is a change in the API and would require from the application some
> additional handling (linearization). The second would require some additional
> memory but would keep all dirty details inside the driver and would leave the
> rest of the code intact.
> 
> Pls., comment.
> 
> thanks,
> vlad
> 



[dpdk-dev] RFC: i40e xmit path HW limitation

2015-07-30 Thread Stephen Hemminger
On Thu, 30 Jul 2015 17:57:33 +0300
Vlad Zolotarov  wrote:

> Hi, Konstantin, Helin,
> there is a documented limitation of xl710 controllers (i40e driver) 
> which is not handled in any way by a DPDK driver.
>  From the datasheet chapter 8.4.1:
> 
> "- A single transmit packet may span up to 8 buffers (up to 8 data
> descriptors per packet including both the header and payload buffers).
> - The total number of data descriptors for the whole TSO (explained later on
> in this chapter) is unlimited as long as each segment within the TSO obeys
> the previous rule (up to 8 data descriptors per segment for both the TSO
> header and the segment payload buffers)."
> 
> This means that, for instance, long cluster with small fragments has to 
> be linearized before it may be placed on the HW ring.
> In more standard environments like Linux or FreeBSD drivers the solution 
> is straight forward - call skb_linearize()/m_collapse() corresponding.
> In the non-conformist environment like DPDK life is not that easy - 
> there is no easy way to collapse the cluster into a linear buffer from 
> inside the device driver
> since device driver doesn't allocate memory in a fast path and utilizes 
> the user allocated pools only.
> 
> Here are two proposals for a solution:
> 
>  1. We may provide a callback that would return a user TRUE if a give
> cluster has to be linearized and it should always be called before
> rte_eth_tx_burst(). Alternatively it may be called from inside the
> rte_eth_tx_burst() and rte_eth_tx_burst() is changed to return some
> error code for a case when one of the clusters it's given has to be
> linearized.
>  2. Another option is to allocate a mempool in the driver with the
> elements consuming a single page each (standard 2KB buffers would
> do). Number of elements in the pool should be as Tx ring length
> multiplied by "64KB/(linear data length of the buffer in the pool
> above)". Here I use 64KB as a maximum packet length and not taking
> into an account esoteric things like "Giant" TSO mentioned in the
> spec above. Then we may actually go and linearize the cluster if
> needed on top of the buffers from the pool above, post the buffer
> from the mempool above on the HW ring, link the original cluster to
> that new cluster (using the private data) and release it when the
> send is done.

Or just silently drop heavily scattered packets (and increment oerrors)
with a PMD_TX_LOG debug message.

I think a DPDK driver doesn't have to accept all possible mbufs and do
extra work. It seems reasonable to expect caller to be well behaved
in this restricted ecosystem.
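
A minimal sketch of the drop-and-count idea suggested above, assuming a simplified stand-in for the mbuf chain rather than the real struct rte_mbuf. The names tx_seg, txq_stats, TX_MAX_SEGS and tx_check_or_drop are hypothetical and are not taken from the i40e PMD; the limit of 8 is the per-packet figure quoted from the datasheet, and the per-segment TSO rule is not modelled:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for a chained packet buffer; not struct rte_mbuf. */
struct tx_seg {
    struct tx_seg *next;  /* next segment in the chain, or NULL */
    uint16_t data_len;    /* bytes of payload held by this segment */
};

/* Per-packet data descriptor limit quoted from the datasheet. */
#define TX_MAX_SEGS 8

/* Hypothetical per-queue counter, mirroring the "oerrors" idea. */
struct txq_stats {
    uint64_t oerrors;
};

/* Count the segments in a chain. */
static unsigned int seg_count(const struct tx_seg *head)
{
    unsigned int n = 0;

    for (; head != NULL; head = head->next)
        n++;
    return n;
}

/*
 * Return 1 if the chain fits within the descriptor limit and may be
 * posted as-is. Return 0 and bump oerrors if it has to be dropped.
 */
static int tx_check_or_drop(const struct tx_seg *head, struct txq_stats *st)
{
    assert(st != NULL);

    if (seg_count(head) > TX_MAX_SEGS) {
        st->oerrors++;
        return 0;
    }
    return 1;
}
```

For a chain of 15 fragments of 100 bytes each, tx_check_or_drop() returns 0 and oerrors goes up by one, even though the packet is only 1500 bytes long; an 8-fragment chain passes.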



[dpdk-dev] RFC: i40e xmit path HW limitation

2015-07-30 Thread Avi Kivity


On 07/30/2015 07:17 PM, Stephen Hemminger wrote:
> On Thu, 30 Jul 2015 17:57:33 +0300
> Vlad Zolotarov  wrote:
>
>> Hi, Konstantin, Helin,
>> there is a documented limitation of xl710 controllers (i40e driver)
>> which is not handled in any way by a DPDK driver.
>>   From the datasheet chapter 8.4.1:
>>
>> "- A single transmit packet may span up to 8 buffers (up to 8 data
>> descriptors per packet including both the header and payload buffers).
>> - The total number of data descriptors for the whole TSO (explained later on
>> in this chapter) is unlimited as long as each segment within the TSO obeys
>> the previous rule (up to 8 data descriptors per segment for both the TSO
>> header and the segment payload buffers)."
>>
>> This means that, for instance, long cluster with small fragments has to
>> be linearized before it may be placed on the HW ring.
>> In more standard environments like Linux or FreeBSD drivers the solution
>> is straight forward - call skb_linearize()/m_collapse() corresponding.
>> In the non-conformist environment like DPDK life is not that easy -
>> there is no easy way to collapse the cluster into a linear buffer from
>> inside the device driver
>> since device driver doesn't allocate memory in a fast path and utilizes
>> the user allocated pools only.
>>
>> Here are two proposals for a solution:
>>
>>   1. We may provide a callback that would return a user TRUE if a give
>>  cluster has to be linearized and it should always be called before
>>  rte_eth_tx_burst(). Alternatively it may be called from inside the
>>  rte_eth_tx_burst() and rte_eth_tx_burst() is changed to return some
>>  error code for a case when one of the clusters it's given has to be
>>  linearized.
>>   2. Another option is to allocate a mempool in the driver with the
>>  elements consuming a single page each (standard 2KB buffers would
>>  do). Number of elements in the pool should be as Tx ring length
>>  multiplied by "64KB/(linear data length of the buffer in the pool
>>  above)". Here I use 64KB as a maximum packet length and not taking
>>  into an account esoteric things like "Giant" TSO mentioned in the
>>  spec above. Then we may actually go and linearize the cluster if
>>  needed on top of the buffers from the pool above, post the buffer
>>  from the mempool above on the HW ring, link the original cluster to
>>  that new cluster (using the private data) and release it when the
>>  send is done.
> Or just silently drop heavily scattered packets (and increment oerrors)
> with a PMD_TX_LOG debug message.
>
> I think a DPDK driver doesn't have to accept all possible mbufs and do
> extra work. It seems reasonable to expect caller to be well behaved
> in this restricted ecosystem.
>

How can the caller know what's well behaved?  It's device dependent.




[dpdk-dev] lost when learning how to test dpdk

2015-07-30 Thread Jan Viktorin
OK, I've added the card into RTE_PCI_DEVEM_ID_DECL_EM list.
Much better now:

EAL: Requesting 64 pages of size 2MB from socket 0
EAL: TSC frequency is ~365 KHz
EAL: Master lcore 0 is ready (tid=467d78c0;cpuset=[0])
EAL: lcore 1 is ready (tid=3a5ff700;cpuset=[1])
EAL: PCI device :03:00.0 on NUMA socket -1
EAL:   probe driver: 8086:1026 rte_em_pmd
EAL:   PCI memory mapped at 0x7fde44a0
EAL:   PCI memory mapped at 0x7fde44a2
PMD: eth_em_dev_init(): port_id 0 vendorID=0x8086 deviceID=0x1026
EAL: PCI device :03:02.0 on NUMA socket -1
EAL:   probe driver: 8086:1026 rte_em_pmd
EAL:   Not managed by a supported kernel driver, skipped
Interactive-mode selected
EAL: Error - exiting with code: 1
  Cause: Creation of mbuf pool for socket 0 failed

I've tried both uio_pci_generic and igb_uio. Is there anything else
I can do about it?

Jan V.

On Thu, 30 Jul 2015 08:41:47 -0700
Ravi Kerur  wrote:

> On Thu, Jul 30, 2015 at 8:22 AM, Jan Viktorin 
> wrote:
> 
> > The 82545 is listed at http://dpdk.org/doc/nics and I can see it in
> > rte_pci_dev_ids.h/e1000_hw.h:
> >
> > 196 #define E1000_DEV_ID_82545GM_COPPER   0x1026
> >
> > $ lspci -nn
> > ...
> > 03:00.0 Ethernet controller [0200]: Intel Corporation 82545GM Gigabit
> > Ethernet Controller [8086:1026] (rev 04)
> > 03:02.0 Ethernet controller [0200]: Intel Corporation 82545GM Gigabit
> > Ethernet Controller [8086:1026] (rev 04)
> >
> > However, it is rev 04 and in e1000_hw.h there is just e1000_82545_rev_3.
> > But this should not avoid the match (?). Is it possible to grow the
> > verbosity
> > level of the device matching process in DPDK?
> >
> 
> Check lib/librte_eal/common/include/rte_pci_dev_ids.h, you need to add
> device-id via RTE_PCI_DEVEM_ID_DECL_EM.
> 
> >
> > I do not expect any support, I just wanted to use it for sending traffic
> > at 1 Gbps because there are two such cards mostly unused in my computer.
> > I did not plan to use I217-V (in fact, I did not expect much from this
> > integrated NIC and I did not even notice it is an Intel one...).
> >
> > Regards
> > Jan Viktorin
> >
> > On Thu, 30 Jul 2015 07:44:14 -0700
> > Ravi Kerur  wrote:
> >
> > > On Thu, Jul 30, 2015 at 5:03 AM, Jan Viktorin 
> > > wrote:
> > >
> > > > Hi,
> > > >
> > > > thanks for reply. I could see those docs but it does not help me a lot.
> > > > I still do not understand very well the principle of the tool. How it
> > > > chooses the NICs to use? Previously I confused -b in dpdk_nic_bind and
> > > > testpmd. They have somehow opposite meaning. I can start testpmd now,
> > > > however, it does not probe any NIC. I've tried -w to whitelist certain
> > > > NICs but with no success.
> > > >
> > > > $ dpdk_nic_bind --status
> > > >
> > > > Network devices using DPDK-compatible driver
> > > > 
> > > > :03:00.0 '82545GM Gigabit Ethernet Controller' drv=uio_pci_generic
> > > > unused=e1000
> > > > :03:02.0 '82545GM Gigabit Ethernet Controller' drv=uio_pci_generic
> > > > unused=e1000
> > > >
> > >
> > > NICs may not be supported by PMD drivers yet. Do "lspci -nn" and check the
> > > device-id.  Adding support in PMD should not be a problem, but I am not
> > > sure on support since there is End of Life listed on Intel Website
> > >
> > >
> > http://ark.intel.com/products/4964/Intel-82545GM-Gigabit-Ethernet-Controller
> > >
> > >
> > > > Network devices using kernel driver
> > > > ===
> > > > :00:19.0 'Ethernet Connection I217-V' if=eno1 drv=e1000e
> > > > unused=uio_pci_generic *Active*
> > > >
> > >
> > > DPDK doesn't bind Active NIC, support for I217-V in PMD is being tested
> > > currently.
> > >
> > >
> > > >
> > > > Other network devices
> > > > =
> > > > 
> > > >
> > > > $ sudo testpmd -c 0x3 -n 2 -- -i --total-num-mbufs=2048
> > > > EAL: Detected lcore 0 as core 0 on socket 0
> > > > EAL: Detected lcore 1 as core 1 on socket 0
> > > > EAL: Detected lcore 2 as core 0 on socket 0
> > > > EAL: Detected lcore 3 as core 1 on socket 0
> > > > EAL: Support maximum 128 logical core(s) by configuration.
> > > > EAL: Detected 4 lcore(s)
> > > > EAL: VFIO modules not all loaded, skip VFIO support...
> > > > EAL: Setting up physically contiguous memory...
> > > > EAL: Ask a virtual area of 0x3c0 bytes
> > > > EAL: Virtual area found at 0x7fe973e0 (size = 0x3c0)
> > > > EAL: Ask a virtual area of 0x20 bytes
> > > > EAL: Virtual area found at 0x7fe973a0 (size = 0x20)
> > > > EAL: Ask a virtual area of 0x20 bytes
> > > > EAL: Virtual area found at 0x7fe97360 (size = 0x20)
> > > > EAL: Ask a virtual area of 0x3c0 bytes
> > > > EAL: Virtual area found at 0x7fe96f80 (size = 0x3c0)
> > > > EAL: Ask a virtual area of 0x20 bytes
> > > > EAL: Virtual area found at 0x7fe96f40 (size = 0x20)
> > > > EAL: Ask a virtual area of 0x20 bytes
> > > > EAL: Virtual area found at 0x7fe96f00 (size = 0x20)
> 

[dpdk-dev] [PATCH v2] mbuf: enforce alignment of mbuf private area

2015-07-30 Thread Olivier Matz
It looks better to have a data buffer address that is aligned to
8 bytes. This is the case when there is no mbuf private area, but
if there is one, the alignment depends on the size of this area
that is located between the mbuf structure and the data buffer.

Indeed, some drivers expect the buffer address to be aligned
to an even address, and moreover an unaligned buffer may impact
performance when accessing network headers.

Add a check in rte_pktmbuf_pool_create() to verify the alignment
constraint before creating the mempool. For applications that use
the alternative way (direct call to rte_mempool_create), also
add an assertion in rte_pktmbuf_init().

By the way, also add the MBUF log type.

Signed-off-by: Olivier Matz 
---
 lib/librte_eal/common/include/rte_log.h | 1 +
 lib/librte_mbuf/rte_mbuf.c  | 9 -
 lib/librte_mbuf/rte_mbuf.h  | 7 +--
 3 files changed, 14 insertions(+), 3 deletions(-)

diff --git a/lib/librte_eal/common/include/rte_log.h 
b/lib/librte_eal/common/include/rte_log.h
index 24a55cc..ede0dca 100644
--- a/lib/librte_eal/common/include/rte_log.h
+++ b/lib/librte_eal/common/include/rte_log.h
@@ -77,6 +77,7 @@ extern struct rte_logs rte_logs;
 #define RTE_LOGTYPE_PORT0x2000 /**< Log related to port. */
 #define RTE_LOGTYPE_TABLE   0x4000 /**< Log related to table. */
 #define RTE_LOGTYPE_PIPELINE 0x8000 /**< Log related to pipeline. */
+#define RTE_LOGTYPE_MBUF0x0001 /**< Log related to mbuf. */

 /* these log types can be used in an application */
 #define RTE_LOGTYPE_USER1   0x0100 /**< User-defined log type 1. */
diff --git a/lib/librte_mbuf/rte_mbuf.c b/lib/librte_mbuf/rte_mbuf.c
index 4320dd4..e416312 100644
--- a/lib/librte_mbuf/rte_mbuf.c
+++ b/lib/librte_mbuf/rte_mbuf.c
@@ -58,6 +58,7 @@
 #include 
 #include 
 #include 
+#include 

 /*
  * ctrlmbuf constructor, given as a callback function to
@@ -125,6 +126,7 @@ rte_pktmbuf_init(struct rte_mempool *mp,
mbuf_size = sizeof(struct rte_mbuf) + priv_size;
buf_len = rte_pktmbuf_data_room_size(mp);

+   RTE_MBUF_ASSERT(RTE_ALIGN(priv_size, RTE_MBUF_PRIV_ALIGN) == priv_size);
RTE_MBUF_ASSERT(mp->elt_size >= mbuf_size);
RTE_MBUF_ASSERT(buf_len <= UINT16_MAX);

@@ -154,7 +156,12 @@ rte_pktmbuf_pool_create(const char *name, unsigned n,
struct rte_pktmbuf_pool_private mbp_priv;
unsigned elt_size;

-
+   if (RTE_ALIGN(priv_size, RTE_MBUF_PRIV_ALIGN) != priv_size) {
+   RTE_LOG(ERR, MBUF, "mbuf priv_size=%u is not aligned\n",
+   priv_size);
+   rte_errno = EINVAL;
+   return NULL;
+   }
elt_size = sizeof(struct rte_mbuf) + (unsigned)priv_size +
(unsigned)data_room_size;
mbp_priv.mbuf_data_room_size = data_room_size;
diff --git a/lib/librte_mbuf/rte_mbuf.h b/lib/librte_mbuf/rte_mbuf.h
index 010b32d..c3b8c98 100644
--- a/lib/librte_mbuf/rte_mbuf.h
+++ b/lib/librte_mbuf/rte_mbuf.h
@@ -698,6 +698,9 @@ extern "C" {
  RTE_PTYPE_INNER_L4_MASK))
 #endif /* RTE_NEXT_ABI */

+/** Alignment constraint of mbuf private area. */
+#define RTE_MBUF_PRIV_ALIGN 8
+
 /**
  * Get the name of a RX offload flag
  *
@@ -1238,7 +1241,7 @@ void rte_pktmbuf_pool_init(struct rte_mempool *mp, void 
*opaque_arg);
  *   details.
  * @param priv_size
  *   Size of application private are between the rte_mbuf structure
- *   and the data buffer.
+ *   and the data buffer. This value must be aligned to RTE_MBUF_PRIV_ALIGN.
  * @param data_room_size
  *   Size of data buffer in each mbuf, including RTE_PKTMBUF_HEADROOM.
  * @param socket_id
@@ -1250,7 +1253,7 @@ void rte_pktmbuf_pool_init(struct rte_mempool *mp, void 
*opaque_arg);
  *   with rte_errno set appropriately. Possible rte_errno values include:
  *- E_RTE_NO_CONFIG - function could not get pointer to rte_config 
structure
  *- E_RTE_SECONDARY - function was called from a secondary process instance
- *- EINVAL - cache size provided is too large
+ *- EINVAL - cache size provided is too large, or priv_size is not aligned.
  *- ENOSPC - the maximum number of memzones has already been allocated
  *- EEXIST - a memzone with the same name already exists
  *- ENOMEM - no appropriate memory area found in which to create memzone
-- 
2.1.4
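
The check added by the patch can be illustrated outside DPDK with a small sketch. It assumes that RTE_ALIGN rounds its argument up to the next multiple of a power-of-two alignment; align_up, priv_size_is_aligned and PRIV_ALIGN below are hypothetical stand-ins, not DPDK symbols:

```c
#include <assert.h>
#include <stdint.h>

/* Same value as RTE_MBUF_PRIV_ALIGN in the patch. */
#define PRIV_ALIGN 8u

/* Round v up to the next multiple of align (align must be a power of two). */
static uint32_t align_up(uint32_t v, uint32_t align)
{
    assert(align != 0 && (align & (align - 1)) == 0);
    return (v + align - 1) & ~(align - 1);
}

/* Return 1 when priv_size is already a multiple of PRIV_ALIGN, else 0. */
static int priv_size_is_aligned(uint16_t priv_size)
{
    return align_up(priv_size, PRIV_ALIGN) == priv_size;
}
```

A private area of 0, 8 or 64 bytes passes this check, while 7 bytes does not: rounding 7 up gives 8, which differs from the requested size, so pool creation would be refused with EINVAL.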



[dpdk-dev] [PATCH v2 2/2] mlx4: fix shared library dependency

2015-07-30 Thread Thomas Monjalon
2015-07-30 16:48, Nelio Laranjeiro:
> librte_pmd_mlx4.so needs to be linked with libibverbs; otherwise the PMD is
> not able to open Mellanox devices and the following message is printed by
> testpmd at startup: "librte_pmd_mlx4: cannot access device, is mlx4_ib loaded?".
> 
> The application dependency on libibverbs is now only valid in static mode;
> in shared mode, applications do not depend on it anymore, while
> librte_pmd_mlx4.so keeps this dependency and thus is linked with libibverbs.
> 
> Signed-off-by: Nelio Laranjeiro 
> Acked-by: Olivier Matz 
> ---
> Changelog: don't compile the MLX4 PMD when DPDK is built as a combined shared
> library.

MLX4 cannot be supported in combined shared library because there is no clean
way of adding -libverbs to the combined library.
(This comment should be in the commit message)

> --- a/drivers/net/Makefile
> +++ b/drivers/net/Makefile
> @@ -40,7 +40,6 @@ DIRS-$(CONFIG_RTE_LIBRTE_ENIC_PMD) += enic
>  DIRS-$(CONFIG_RTE_LIBRTE_FM10K_PMD) += fm10k
>  DIRS-$(CONFIG_RTE_LIBRTE_I40E_PMD) += i40e
>  DIRS-$(CONFIG_RTE_LIBRTE_IXGBE_PMD) += ixgbe
> -DIRS-$(CONFIG_RTE_LIBRTE_MLX4_PMD) += mlx4
>  DIRS-$(CONFIG_RTE_LIBRTE_MPIPE_PMD) += mpipe
>  DIRS-$(CONFIG_RTE_LIBRTE_PMD_NULL) += null
>  DIRS-$(CONFIG_RTE_LIBRTE_PMD_PCAP) += pcap
> @@ -49,5 +48,10 @@ DIRS-$(CONFIG_RTE_LIBRTE_VIRTIO_PMD) += virtio
>  DIRS-$(CONFIG_RTE_LIBRTE_VMXNET3_PMD) += vmxnet3
>  DIRS-$(CONFIG_RTE_LIBRTE_PMD_XENVIRT) += xenvirt
>  
> +# Drivers not support in combined mode

This comment is useless.

> +ifeq ($(CONFIG_RTE_BUILD_COMBINE_LIBS),n)

It can be enabled if building a static combined library.

> +DIRS-$(CONFIG_RTE_LIBRTE_MLX4_PMD) += mlx4

There is no good reason to move this line.



[dpdk-dev] [PATCH v2] mbuf: enforce alignment of mbuf private area

2015-07-30 Thread Zhang, Helin


> -Original Message-
> From: Olivier Matz [mailto:olivier.matz at 6wind.com]
> Sent: Thursday, July 30, 2015 9:22 AM
> To: dev at dpdk.org
> Cc: Ananyev, Konstantin; olivier.matz at 6wind.com; Zhang, Helin;
> martin.weiser at allegro-packets.com; thomas.monjalon at 6wind.com
> Subject: [PATCH v2] mbuf: enforce alignment of mbuf private area
> 
> It looks better to have a data buffer address that is aligned to
> 8 bytes. This is the case when there is no mbuf private area, but if there is
> one, the alignment depends on the size of this area that is located between
> the mbuf structure and the data buffer.
> 
> Indeed, some drivers expect the buffer address to be aligned to an even
> address, and moreover an unaligned buffer may impact performance when
> accessing network headers.
> 
> Add a check in rte_pktmbuf_pool_create() to verify the alignment constraint
> before creating the mempool. For applications that use the alternative way
> (direct call to rte_mempool_create), also add an assertion in 
> rte_pktmbuf_init().
> 
> By the way, also add the MBUF log type.
> 
> Signed-off-by: Olivier Matz 
Acked-by: Helin Zhang 


[dpdk-dev] i40e xmit path HW limitation

2015-07-30 Thread Vlad Zolotarov


On 07/30/15 19:10, Zhang, Helin wrote:
>
>> -Original Message-
>> From: Vlad Zolotarov [mailto:vladz at cloudius-systems.com]
>> Sent: Thursday, July 30, 2015 7:58 AM
>> To: dev at dpdk.org; Ananyev, Konstantin; Zhang, Helin
>> Subject: RFC: i40e xmit path HW limitation
>>
>> Hi, Konstantin, Helin,
>> there is a documented limitation of xl710 controllers (i40e driver) which is
>> not handled in any way by a DPDK driver.
>>   From the datasheet chapter 8.4.1:
>>
>> "- A single transmit packet may span up to 8 buffers (up to 8 data
>> descriptors per packet including both the header and payload buffers).
>> - The total number of data descriptors for the whole TSO (explained later on
>> in this chapter) is unlimited as long as each segment within the TSO obeys
>> the previous rule (up to 8 data descriptors per segment for both the TSO
>> header and the segment payload buffers)."
> Yes, I remember that the RX side supports only 5 segments per received packet.
> But what is the possible issue you have in mind?
Note that it's the Tx side we are talking about.

See commit 30520831f058cd9d75c0f6b360bc5c5ae49b5f27 in the Linux net-next repo.
If such a cluster arrives and you post it on the HW ring, the HW will shut
this ring down permanently. The application will see that its ring is
stuck.

>
>> This means that, for instance, long cluster with small fragments has to be
>> linearized before it may be placed on the HW ring.
> What size are the small fragments? 2KB is the default mbuf size in most
> example applications, and 2KB x 8 is bigger than 1.5KB, so it is enough for
> the maximum packet size we support.
> If 1KB mbufs are used, don't expect to transmit packets larger than 8KB.

I kinda lost you here. Again, we are talking about the Tx side here, and
buffers are not necessarily completely filled. For example, there may be a
cluster with 15 fragments of 100 bytes each.

>
>> In more standard environments like Linux or FreeBSD drivers the solution is
>> straight forward - call skb_linearize()/m_collapse() corresponding.
>> In the non-conformist environment like DPDK life is not that easy - there is
>> no easy way to collapse the cluster into a linear buffer from inside the
>> device driver since device driver doesn't allocate memory in a fast path and
>> utilizes the user allocated pools only.
>> Here are two proposals for a solution:
>>
>>   1. We may provide a callback that would return a user TRUE if a give
>>  cluster has to be linearized and it should always be called before
>>  rte_eth_tx_burst(). Alternatively it may be called from inside the
>>  rte_eth_tx_burst() and rte_eth_tx_burst() is changed to return some
>>  error code for a case when one of the clusters it's given has to be
>>  linearized.
>>   2. Another option is to allocate a mempool in the driver with the
>>  elements consuming a single page each (standard 2KB buffers would
>>  do). Number of elements in the pool should be as Tx ring length
>>  multiplied by "64KB/(linear data length of the buffer in the pool
>>  above)". Here I use 64KB as a maximum packet length and not taking
>>  into an account esoteric things like "Giant" TSO mentioned in the
>>  spec above. Then we may actually go and linearize the cluster if
>>  needed on top of the buffers from the pool above, post the buffer
>>  from the mempool above on the HW ring, link the original cluster to
>>  that new cluster (using the private data) and release it when the
>>  send is done.
>>
>>
>> The first is a change in the API and would require from the application some
>> additional handling (linearization). The second would require some additional
>> memory but would keep all dirty details inside the driver and would leave the
>> rest of the code intact.
>>
>> Pls., comment.
>>
>> thanks,
>> vlad
>>
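
The pool sizing from proposal 2 can be worked through with a short sketch. It only evaluates the formula quoted above (Tx ring length multiplied by 64KB divided by the buffer length); linear_pool_elems and MAX_PKT_LEN are hypothetical names, and the ring length used in the note below is an assumed example value:

```c
#include <assert.h>
#include <stdint.h>

/* Maximum packet length assumed in the proposal (64KB, no "Giant" TSO). */
#define MAX_PKT_LEN (64u * 1024u)

/*
 * Number of pool elements per the proposal's formula:
 * tx_ring_len * (64KB / buf_len), rounding the per-packet count up.
 */
static uint32_t linear_pool_elems(uint32_t tx_ring_len, uint32_t buf_len)
{
    assert(buf_len > 0);

    uint32_t bufs_per_pkt = (MAX_PKT_LEN + buf_len - 1) / buf_len;

    return tx_ring_len * bufs_per_pkt;
}
```

Assuming a 512-entry Tx ring and 2KB buffers, this gives 512 x 32 = 16384 elements, which is 32MB of data room per Tx queue. That is the "additional memory" cost of the second proposal.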



[dpdk-dev] RFC: i40e xmit path HW limitation

2015-07-30 Thread Vlad Zolotarov


On 07/30/15 19:20, Avi Kivity wrote:
>
>
> On 07/30/2015 07:17 PM, Stephen Hemminger wrote:
>> On Thu, 30 Jul 2015 17:57:33 +0300
>> Vlad Zolotarov  wrote:
>>
>>> Hi, Konstantin, Helin,
>>> there is a documented limitation of xl710 controllers (i40e driver)
>>> which is not handled in any way by a DPDK driver.
>>>   From the datasheet chapter 8.4.1:
>>>
>>> "- A single transmit packet may span up to 8 buffers (up to 8 data
>>> descriptors per packet including both the header and payload buffers).
>>> - The total number of data descriptors for the whole TSO (explained
>>> later on in this chapter) is unlimited as long as each segment within
>>> the TSO obeys the previous rule (up to 8 data descriptors per segment
>>> for both the TSO header and the segment payload buffers)."
>>>
>>> This means that, for instance, long cluster with small fragments has to
>>> be linearized before it may be placed on the HW ring.
>>> In more standard environments like Linux or FreeBSD drivers the solution
>>> is straight forward - call skb_linearize()/m_collapse() corresponding.
>>> In the non-conformist environment like DPDK life is not that easy -
>>> there is no easy way to collapse the cluster into a linear buffer from
>>> inside the device driver
>>> since device driver doesn't allocate memory in a fast path and utilizes
>>> the user allocated pools only.
>>>
>>> Here are two proposals for a solution:
>>>
>>>   1. We may provide a callback that would return a user TRUE if a give
>>>  cluster has to be linearized and it should always be called before
>>>  rte_eth_tx_burst(). Alternatively it may be called from inside the
>>>  rte_eth_tx_burst() and rte_eth_tx_burst() is changed to return some
>>>  error code for a case when one of the clusters it's given has to be
>>>  linearized.
>>>   2. Another option is to allocate a mempool in the driver with the
>>>  elements consuming a single page each (standard 2KB buffers would
>>>  do). Number of elements in the pool should be as Tx ring length
>>>  multiplied by "64KB/(linear data length of the buffer in the pool
>>>  above)". Here I use 64KB as a maximum packet length and not taking
>>>  into an account esoteric things like "Giant" TSO mentioned in the
>>>  spec above. Then we may actually go and linearize the cluster if
>>>  needed on top of the buffers from the pool above, post the buffer
>>>  from the mempool above on the HW ring, link the original cluster to
>>>  that new cluster (using the private data) and release it when the
>>>  send is done.
>> Or just silently drop heavily scattered packets (and increment oerrors)
>> with a PMD_TX_LOG debug message.
>>
>> I think a DPDK driver doesn't have to accept all possible mbufs and do
>> extra work. It seems reasonable to expect caller to be well behaved
>> in this restricted ecosystem.
>>
>
> How can the caller know what's well behaved?  It's device dependent.

+1

Stephen, how do you imagine this well-behaved application? With a
switch-case on the underlying device type, "well-behaving" accordingly?
Not to mention that to "well-behave" the application writer has to read
the HW specs and understand them, which would limit the number of DPDK
developers to a very small group of people... ;) Not to mention that
such a switch-case would be a super ugly thing to find in an
application, and would raise a big question about the justification
for DPDK's existence as an SDK providing a device driver
interface. ;)

>
>



[dpdk-dev] RFC: i40e xmit path HW limitation

2015-07-30 Thread Stephen Hemminger
On Thu, 30 Jul 2015 19:50:27 +0300
Vlad Zolotarov  wrote:

> 
> 
> On 07/30/15 19:20, Avi Kivity wrote:
> >
> >
> > On 07/30/2015 07:17 PM, Stephen Hemminger wrote:
> >> On Thu, 30 Jul 2015 17:57:33 +0300
> >> Vlad Zolotarov  wrote:
> >>
> >>> Hi, Konstantin, Helin,
> >>> there is a documented limitation of xl710 controllers (i40e driver)
> >>> which is not handled in any way by a DPDK driver.
> >>>   From the datasheet chapter 8.4.1:
> >>>
> >>> "- A single transmit packet may span up to 8 buffers (up to 8 data
> >>> descriptors per packet including both the header and payload buffers).
> >>> - The total number of data descriptors for the whole TSO (explained
> >>> later on in this chapter) is unlimited as long as each segment within
> >>> the TSO obeys the previous rule (up to 8 data descriptors per segment
> >>> for both the TSO header and the segment payload buffers)."
> >>>
> >>> This means that, for instance, long cluster with small fragments has to
> >>> be linearized before it may be placed on the HW ring.
> >>> In more standard environments like Linux or FreeBSD drivers the solution
> >>> is straight forward - call skb_linearize()/m_collapse() corresponding.
> >>> In the non-conformist environment like DPDK life is not that easy -
> >>> there is no easy way to collapse the cluster into a linear buffer from
> >>> inside the device driver
> >>> since device driver doesn't allocate memory in a fast path and utilizes
> >>> the user allocated pools only.
> >>>
> >>> Here are two proposals for a solution:
> >>>
> >>>   1. We may provide a callback that would return a user TRUE if a give
> >>>  cluster has to be linearized and it should always be called before
> >>>  rte_eth_tx_burst(). Alternatively it may be called from inside the
> >>>  rte_eth_tx_burst() and rte_eth_tx_burst() is changed to return some
> >>>  error code for a case when one of the clusters it's given has to be
> >>>  linearized.
> >>>   2. Another option is to allocate a mempool in the driver with the
> >>>  elements consuming a single page each (standard 2KB buffers would
> >>>  do). Number of elements in the pool should be as Tx ring length
> >>>  multiplied by "64KB/(linear data length of the buffer in the pool
> >>>  above)". Here I use 64KB as a maximum packet length and not taking
> >>>  into an account esoteric things like "Giant" TSO mentioned in the
> >>>  spec above. Then we may actually go and linearize the cluster if
> >>>  needed on top of the buffers from the pool above, post the buffer
> >>>  from the mempool above on the HW ring, link the original cluster to
> >>>  that new cluster (using the private data) and release it when the
> >>>  send is done.
> >> Or just silently drop heavily scattered packets (and increment oerrors)
> >> with a PMD_TX_LOG debug message.
> >>
> >> I think a DPDK driver doesn't have to accept all possible mbufs and do
> >> extra work. It seems reasonable to expect caller to be well behaved
> >> in this restricted ecosystem.
> >>
> >
> > How can the caller know what's well behaved?  It's device dependent.
> 
> +1
> 
> Stephen, how do you imagine this well-behaved application? With a
> switch-case on the underlying device type, "well-behaving" accordingly?
> Not to mention that to "well-behave" the application writer has to read
> the HW specs and understand them, which would limit the number of DPDK
> developers to a very small group of people... ;) Not to mention that
> such a switch-case would be a super ugly thing to find in an
> application, and would raise a big question about the justification
> for DPDK's existence as an SDK providing a device driver
> interface. ;)

Either have a global RTE_MAX_MBUF_SEGMENTS or an mbuf_linearize
function? The driver can already stash the mbuf pool used for Rx
and reuse it for the transient Tx buffers.
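
A rough sketch of what such a linearize helper might do, again on a simplified stand-in for the mbuf chain. No mbuf_linearize function is assumed to exist in DPDK here; lin_seg and seg_linearize are hypothetical names, and the destination is a plain caller-supplied buffer rather than an mbuf taken from a pool:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for one segment of a chained packet. */
struct lin_seg {
    struct lin_seg *next;  /* next segment in the chain, or NULL */
    const uint8_t *data;   /* start of this segment's payload */
    uint16_t data_len;     /* payload bytes in this segment */
};

/*
 * Copy every segment of the chain, in order, into one contiguous buffer.
 * Returns the number of bytes copied, or -1 if dst is too small.
 */
static long seg_linearize(const struct lin_seg *head, uint8_t *dst,
                          size_t dst_len)
{
    size_t off = 0;

    assert(dst != NULL);

    for (; head != NULL; head = head->next) {
        if (head->data_len == 0)
            continue;                   /* nothing to copy from empty segments */
        if (head->data_len > dst_len - off)
            return -1;                  /* destination buffer is too small */
        memcpy(dst + off, head->data, head->data_len);
        off += head->data_len;
    }
    return (long)off;
}
```

A caller would linearize only chains that exceed the segment limit, then transmit the contiguous copy and free the original chain once the send completes.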



[dpdk-dev] [PATCH v2] enic: silence log message unless debug enabled

2015-07-30 Thread Stephen Hemminger
This blocks the annoying ENIC driver initialization message unless
debug is enabled. Drivers should speak only when spoken to and not
be chatty.

Signed-off-by: Stephen Hemminger 
---
 drivers/net/enic/enic_compat.h | 4 
 1 file changed, 4 insertions(+)

diff --git a/drivers/net/enic/enic_compat.h b/drivers/net/enic/enic_compat.h
index f3598ed..94656c8 100644
--- a/drivers/net/enic/enic_compat.h
+++ b/drivers/net/enic/enic_compat.h
@@ -82,7 +82,11 @@
 #define dev_err(x, args...) dev_printk(ERR, args)
 #define dev_info(x, args...) dev_printk(INFO,  args)
 #define dev_warning(x, args...) dev_printk(WARNING, args)
+#ifdef RTE_LIBRTE_ENIC_DEBUG
 #define dev_debug(x, args...) dev_printk(DEBUG, args)
+#else
+#define dev_debug(x, args...) do { } while(0)
+#endif

 #define __le16 u16
 #define __le32 u32
-- 
2.1.4



[dpdk-dev] lost when learning how to test dpdk

2015-07-30 Thread Ravi Kerur
On Thu, Jul 30, 2015 at 9:19 AM, Jan Viktorin 
wrote:

> OK, I've added the card into RTE_PCI_DEVEM_ID_DECL_EM list.
> Much better now:
>
> EAL: Requesting 64 pages of size 2MB from socket 0
> EAL: TSC frequency is ~365 KHz
> EAL: Master lcore 0 is ready (tid=467d78c0;cpuset=[0])
> EAL: lcore 1 is ready (tid=3a5ff700;cpuset=[1])
> EAL: PCI device :03:00.0 on NUMA socket -1
> EAL:   probe driver: 8086:1026 rte_em_pmd
> EAL:   PCI memory mapped at 0x7fde44a0
> EAL:   PCI memory mapped at 0x7fde44a2
> PMD: eth_em_dev_init(): port_id 0 vendorID=0x8086 deviceID=0x1026
> EAL: PCI device :03:02.0 on NUMA socket -1
> EAL:   probe driver: 8086:1026 rte_em_pmd
> EAL:   Not managed by a supported kernel driver, skipped
> Interactive-mode selected
> EAL: Error - exiting with code: 1
>   Cause: Creation of mbuf pool for socket 0 failed
>
> I've tried both uio_pci_generic and igb_uio. Is there anything else
> I can do about it?
>

I am attaching a patch from M Jay/Cuming (Intel engineers); it's not yet
integrated into DPDK mainline. You need it to fix the mbuf error above.

>
> Jan V.
>
> On Thu, 30 Jul 2015 08:41:47 -0700
> Ravi Kerur  wrote:
>
> > On Thu, Jul 30, 2015 at 8:22 AM, Jan Viktorin 
> > wrote:
> >
> > > The 82545 is listed at http://dpdk.org/doc/nics and I can see it in
> > > rte_pci_dev_ids.h/e1000_hw.h:
> > >
> > > 196 #define E1000_DEV_ID_82545GM_COPPER   0x1026
> > >
> > > $ lspci -nn
> > > ...
> > > 03:00.0 Ethernet controller [0200]: Intel Corporation 82545GM Gigabit
> > > Ethernet Controller [8086:1026] (rev 04)
> > > 03:02.0 Ethernet controller [0200]: Intel Corporation 82545GM Gigabit
> > > Ethernet Controller [8086:1026] (rev 04)
> > >
> > > However, it is rev 04 and in e1000_hw.h there is just
> e1000_82545_rev_3.
> > > But this should not avoid the match (?). Is it possible to grow the
> > > verbosity
> > > level of the device matching process in DPDK?
> > >
> >
> > Check lib/librte_eal/common/include/rte_pci_dev_ids.h, you need to add
> > device-id via RTE_PCI_DEVEM_ID_DECL_EM.
> >
> > >
> > > I do not expect any support, I just wanted to use it for sending
> traffic
> > > at 1 Gbps because there are two such cards mostly unused in my
> computer.
> > > I did not plan to use I217-V (in fact, I did not expect much from this
> > > integrated NIC and I did not even notice it is an Intel one...).
> > >
> > > Regards
> > > Jan Viktorin
> > >
> > > On Thu, 30 Jul 2015 07:44:14 -0700
> > > Ravi Kerur  wrote:
> > >
> > > > On Thu, Jul 30, 2015 at 5:03 AM, Jan Viktorin <
> viktorin at rehivetech.com>
> > > > wrote:
> > > >
> > > > > Hi,
> > > > >
> > > > > thanks for reply. I could see those docs but it does not help me a
> lot.
> > > > > I still do not understand very well the principle of the tool. How
> it
> > > > > chooses the NICs to use? Previously I confused -b in dpdk_nic_bind
> and
> > > > > testpmd. They have somehow opposite meaning. I can start testpmd
> now,
> > > > > however, it does ot probe any NIC. I've tried -w to whitelist
> certain
> > > > > NICs but with no success.
> > > > >
> > > > > $ dpdk_nic_bind --status
> > > > >
> > > > > Network devices using DPDK-compatible driver
> > > > > 
> > > > > :03:00.0 '82545GM Gigabit Ethernet Controller'
> drv=uio_pci_generic
> > > > > unused=e1000
> > > > > :03:02.0 '82545GM Gigabit Ethernet Controller'
> drv=uio_pci_generic
> > > > > unused=e1000
> > > > >
> > > >
> > > > NICs may not be supported by PMD drivers yet. Do "lspci -nn" and
> check
> > > the
> > > > device-id.  Adding support in PMD should not be a problem, but I am
> not
> > > > sure on support since there is End of Life listed on Intel Website
> > > >
> > > >
> > >
> http://ark.intel.com/products/4964/Intel-82545GM-Gigabit-Ethernet-Controller
> > > >
> > > >
> > > > > Network devices using kernel driver
> > > > > ===
> > > > > :00:19.0 'Ethernet Connection I217-V' if=eno1 drv=e1000e
> > > > > unused=uio_pci_generic *Active*
> > > > >
> > > >
> > > > DPDK doesn't bind Active NIC, support for I217-V in PMD is being
> tested
> > > > currently.
> > > >
> > > >
> > > > >
> > > > > Other network devices
> > > > > =
> > > > > 
> > > > >
> > > > > $ sudo testpmd -c 0x3 -n 2 -- -i --total-num-mbufs=2048
> > > > > EAL: Detected lcore 0 as core 0 on socket 0
> > > > > EAL: Detected lcore 1 as core 1 on socket 0
> > > > > EAL: Detected lcore 2 as core 0 on socket 0
> > > > > EAL: Detected lcore 3 as core 1 on socket 0
> > > > > EAL: Support maximum 128 logical core(s) by configuration.
> > > > > EAL: Detected 4 lcore(s)
> > > > > EAL: VFIO modules not all loaded, skip VFIO support...
> > > > > EAL: Setting up physically contiguous memory...
> > > > > EAL: Ask a virtual area of 0x3c0 bytes
> > > > > EAL: Virtual area found at 0x7fe973e0 (size = 0x3c0)
> > > > > EAL: Ask a virtual area of 0x20 bytes
> > > > > EAL: Virtual area found at 0x7fe973a0 (

[dpdk-dev] lost when learning how to test dpdk

2015-07-30 Thread Ravi Kerur
On Thu, Jul 30, 2015 at 10:06 AM, Ravi Kerur  wrote:

>
>
> On Thu, Jul 30, 2015 at 9:19 AM, Jan Viktorin 
> wrote:
>
>> OK, I've added the card into RTE_PCI_DEVEM_ID_DECL_EM list.
>> Much better now:
>>
>> EAL: Requesting 64 pages of size 2MB from socket 0
>> EAL: TSC frequency is ~365 KHz
>> EAL: Master lcore 0 is ready (tid=467d78c0;cpuset=[0])
>> EAL: lcore 1 is ready (tid=3a5ff700;cpuset=[1])
>> EAL: PCI device :03:00.0 on NUMA socket -1
>> EAL:   probe driver: 8086:1026 rte_em_pmd
>> EAL:   PCI memory mapped at 0x7fde44a0
>> EAL:   PCI memory mapped at 0x7fde44a2
>> PMD: eth_em_dev_init(): port_id 0 vendorID=0x8086 deviceID=0x1026
>> EAL: PCI device :03:02.0 on NUMA socket -1
>> EAL:   probe driver: 8086:1026 rte_em_pmd
>> EAL:   Not managed by a supported kernel driver, skipped
>> Interactive-mode selected
>> EAL: Error - exiting with code: 1
>>   Cause: Creation of mbuf pool for socket 0 failed
>>
>> I've tried both uio_pci_generic and igb_uio. Is there anything else
>> I can do about it?
>>
>
> I am attaching patch from M Jay/Cuming (Intel engineers), it's not yet
> integrated into DPDK mainline. You need it to fix above mbuf error.
>

In addition, allocating too few hugepages can cause issues, especially
with only 64. I usually allocate more than 1024 hugepages.
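
One way to check this before launching testpmd is to read the kernel's hugepage counters from `/proc/meminfo` (the `HugePages_Total` and `HugePages_Free` fields). The sketch below only parses text in that format; the function name `meminfo_field` is made up here and is not a DPDK API, and reading the file into a buffer is left out.

```c
#include <stdio.h>
#include <string.h>

/*
 * Look up an integer field such as "HugePages_Free" in text formatted
 * like /proc/meminfo ("Name:   value"). Returns -1 if the field is
 * missing or has no numeric value. Hypothetical helper.
 */
long meminfo_field(const char *text, const char *name)
{
	size_t nlen = strlen(name);
	const char *line = text;

	while (line != NULL && *line != '\0') {
		/* Match "name:" only at the start of a line. */
		if (strncmp(line, name, nlen) == 0 && line[nlen] == ':') {
			long value;

			if (sscanf(line + nlen + 1, "%ld", &value) == 1)
				return value;
			return -1;
		}
		/* Advance to the character after the next newline. */
		line = strchr(line, '\n');
		if (line != NULL)
			line++;
	}
	return -1;
}
```

An application could compare the `HugePages_Free` value against the memory its mbuf pools need and fail early with a clearer message than the generic pool creation error.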

>
>> Jan V.
>>
>> On Thu, 30 Jul 2015 08:41:47 -0700
>> Ravi Kerur  wrote:
>>
>> > On Thu, Jul 30, 2015 at 8:22 AM, Jan Viktorin 
>> > wrote:
>> >
>> > > The 82545 is listed at http://dpdk.org/doc/nics and I can see it in
>> > > rte_pci_dev_ids.h/e1000_hw.h:
>> > >
>> > > 196 #define E1000_DEV_ID_82545GM_COPPER   0x1026
>> > >
>> > > $ lspci -nn
>> > > ...
>> > > 03:00.0 Ethernet controller [0200]: Intel Corporation 82545GM Gigabit
>> > > Ethernet Controller [8086:1026] (rev 04)
>> > > 03:02.0 Ethernet controller [0200]: Intel Corporation 82545GM Gigabit
>> > > Ethernet Controller [8086:1026] (rev 04)
>> > >
>> > > However, it is rev 04 and in e1000_hw.h there is just
>> e1000_82545_rev_3.
>> > > But this should not avoid the match (?). Is it possible to grow the
>> > > verbosity
>> > > level of the device matching process in DPDK?
>> > >
>> >
>> > Check lib/librte_eal/common/include/rte_pci_dev_ids.h, you need to add
>> > device-id via RTE_PCI_DEVEM_ID_DECL_EM.
>> >
>> > >
>> > > I do not expect any support, I just wanted to use it for sending
>> traffic
>> > > at 1 Gbps because there are two such cards mostly unused in my
>> computer.
>> > > I did not plan to use I217-V (in fact, I did not expect much from this
>> > > integrated NIC and I did not even notice it is an Intel one...).
>> > >
>> > > Regards
>> > > Jan Viktorin
>> > >
>> > > On Thu, 30 Jul 2015 07:44:14 -0700
>> > > Ravi Kerur  wrote:
>> > >
>> > > > On Thu, Jul 30, 2015 at 5:03 AM, Jan Viktorin <
>> viktorin at rehivetech.com>
>> > > > wrote:
>> > > >
>> > > > > Hi,
>> > > > >
>> > > > > thanks for reply. I could see those docs but it does not help me
>> a lot.
>> > > > > I still do not understand very well the principle of the tool.
>> How it
>> > > > > chooses the NICs to use? Previously I confused -b in
>> dpdk_nic_bind and
>> > > > > testpmd. They have somehow opposite meaning. I can start testpmd
>> now,
>> > > > > however, it does ot probe any NIC. I've tried -w to whitelist
>> certain
>> > > > > NICs but with no success.
>> > > > >
>> > > > > $ dpdk_nic_bind --status
>> > > > >
>> > > > > Network devices using DPDK-compatible driver
>> > > > > 
>> > > > > :03:00.0 '82545GM Gigabit Ethernet Controller'
>> drv=uio_pci_generic
>> > > > > unused=e1000
>> > > > > :03:02.0 '82545GM Gigabit Ethernet Controller'
>> drv=uio_pci_generic
>> > > > > unused=e1000
>> > > > >
>> > > >
>> > > > NICs may not be supported by PMD drivers yet. Do "lspci -nn" and
>> check
>> > > the
>> > > > device-id.  Adding support in PMD should not be a problem, but I am
>> not
>> > > > sure on support since there is End of Life listed on Intel Website
>> > > >
>> > > >
>> > >
>> http://ark.intel.com/products/4964/Intel-82545GM-Gigabit-Ethernet-Controller
>> > > >
>> > > >
>> > > > > Network devices using kernel driver
>> > > > > ===
>> > > > > :00:19.0 'Ethernet Connection I217-V' if=eno1 drv=e1000e
>> > > > > unused=uio_pci_generic *Active*
>> > > > >
>> > > >
>> > > > DPDK doesn't bind Active NIC, support for I217-V in PMD is being
>> tested
>> > > > currently.
>> > > >
>> > > >
>> > > > >
>> > > > > Other network devices
>> > > > > =
>> > > > > 
>> > > > >
>> > > > > $ sudo testpmd -c 0x3 -n 2 -- -i --total-num-mbufs=2048
>> > > > > EAL: Detected lcore 0 as core 0 on socket 0
>> > > > > EAL: Detected lcore 1 as core 1 on socket 0
>> > > > > EAL: Detected lcore 2 as core 0 on socket 0
>> > > > > EAL: Detected lcore 3 as core 1 on socket 0
>> > > > > EAL: Support maximum 128 logical core(s) by configuration.
>> > > > > EAL: Detected 4 lcore(s)
>> > > > > EAL: VFIO modules n

[dpdk-dev] RFC: i40e xmit path HW limitation

2015-07-30 Thread Vlad Zolotarov


On 07/30/15 20:01, Stephen Hemminger wrote:
> On Thu, 30 Jul 2015 19:50:27 +0300
> Vlad Zolotarov  wrote:
>
>>
>> On 07/30/15 19:20, Avi Kivity wrote:
>>>
>>> On 07/30/2015 07:17 PM, Stephen Hemminger wrote:
>>>> On Thu, 30 Jul 2015 17:57:33 +0300
>>>> Vlad Zolotarov  wrote:
>>>>
>>>>> Hi, Konstantin, Helin,
>>>>> there is a documented limitation of xl710 controllers (i40e driver)
>>>>> which is not handled in any way by a DPDK driver.
>>>>> From the datasheet chapter 8.4.1:
>>>>>
>>>>> "- A single transmit packet may span up to 8 buffers (up to 8 data
>>>>> descriptors per packet including both the header and payload buffers).
>>>>> - The total number of data descriptors for the whole TSO (explained
>>>>> later on in this chapter) is unlimited as long as each segment within
>>>>> the TSO obeys the previous rule (up to 8 data descriptors per segment
>>>>> for both the TSO header and the segment payload buffers)."
>>>>>
>>>>> This means that, for instance, long cluster with small fragments has to
>>>>> be linearized before it may be placed on the HW ring.
>>>>> In more standard environments like Linux or FreeBSD drivers the solution
>>>>> is straight forward - call skb_linearize()/m_collapse() corresponding.
>>>>> In the non-conformist environment like DPDK life is not that easy -
>>>>> there is no easy way to collapse the cluster into a linear buffer from
>>>>> inside the device driver
>>>>> since device driver doesn't allocate memory in a fast path and utilizes
>>>>> the user allocated pools only.
>>>>>
>>>>> Here are two proposals for a solution:
>>>>>
>>>>>    1. We may provide a callback that would return a user TRUE if a give
>>>>>       cluster has to be linearized and it should always be called before
>>>>>       rte_eth_tx_burst(). Alternatively it may be called from inside the
>>>>>       rte_eth_tx_burst() and rte_eth_tx_burst() is changed to return some
>>>>>       error code for a case when one of the clusters it's given has to be
>>>>>       linearized.
>>>>>    2. Another option is to allocate a mempool in the driver with the
>>>>>       elements consuming a single page each (standard 2KB buffers would
>>>>>       do). Number of elements in the pool should be as Tx ring length
>>>>>       multiplied by "64KB/(linear data length of the buffer in the pool
>>>>>       above)". Here I use 64KB as a maximum packet length and not taking
>>>>>       into an account esoteric things like "Giant" TSO mentioned in the
>>>>>       spec above. Then we may actually go and linearize the cluster if
>>>>>       needed on top of the buffers from the pool above, post the buffer
>>>>>       from the mempool above on the HW ring, link the original cluster to
>>>>>       that new cluster (using the private data) and release it when the
>>>>>       send is done.
>>>> Or just silently drop heavily scattered packets (and increment oerrors)
>>>> with a PMD_TX_LOG debug message.
>>>>
>>>> I think a DPDK driver doesn't have to accept all possible mbufs and do
>>>> extra work. It seems reasonable to expect caller to be well behaved
>>>> in this restricted ecosystem.
>>>>
>>> How can the caller know what's well behaved?  It's device dependent.
>> +1
>>
>> Stephen, how do you imagine this well-behaved application? Having switch
>> case by an underlying device type and then "well-behaving" correspondingly?
>> Not to mention that to "well-behave" the application writer has to read
>> HW specs and understand them, which would limit the amount of DPDK
>> developers to a very small amount of people... ;) Not to mention that
>> the mentioned above switch-case would be a super ugly thing to be found
>> in an application that would raise a big question about the
>> justification of a DPDK existence as as SDK providing device drivers
>> interface. ;)
> Either have a RTE_MAX_MBUF_SEGMENTS

And what would it be in our case? 8? This would limit the maximum TSO
packet to 16KB for 2KB buffers.

> that is global or
> a mbuf_linearize function?  Driver already can stash the
> mbuf pool used for Rx and reuse it for the transient Tx buffers.
First of all, who can guarantee that that pool would meet our needs,
namely that it has large enough buffers?
Secondly, using the user's Rx mempool for that would be really not nice
(read: dirty) towards the user, who may have allocated a specific
number of buffers in it according to calculations that didn't
include usage from the Tx flow.

And lastly and most importantly, this would require using atomic
operations when accessing the Rx mempool, which would both require a
specific mempool initialization and significantly hurt
performance.


>



[dpdk-dev] RFC: i40e xmit path HW limitation

2015-07-30 Thread Avi Kivity
On 07/30/2015 08:01 PM, Stephen Hemminger wrote:
> On Thu, 30 Jul 2015 19:50:27 +0300
> Vlad Zolotarov  wrote:
>
>>
>> On 07/30/15 19:20, Avi Kivity wrote:
>>>
>>> On 07/30/2015 07:17 PM, Stephen Hemminger wrote:
>>>> On Thu, 30 Jul 2015 17:57:33 +0300
>>>> Vlad Zolotarov  wrote:
>>>>
>>>>> Hi, Konstantin, Helin,
>>>>> there is a documented limitation of xl710 controllers (i40e driver)
>>>>> which is not handled in any way by a DPDK driver.
>>>>> From the datasheet chapter 8.4.1:
>>>>>
>>>>> "- A single transmit packet may span up to 8 buffers (up to 8 data
>>>>> descriptors per packet including both the header and payload buffers).
>>>>> - The total number of data descriptors for the whole TSO (explained
>>>>> later on in this chapter) is unlimited as long as each segment within
>>>>> the TSO obeys the previous rule (up to 8 data descriptors per segment
>>>>> for both the TSO header and the segment payload buffers)."
>>>>>
>>>>> This means that, for instance, long cluster with small fragments has to
>>>>> be linearized before it may be placed on the HW ring.
>>>>> In more standard environments like Linux or FreeBSD drivers the solution
>>>>> is straight forward - call skb_linearize()/m_collapse() corresponding.
>>>>> In the non-conformist environment like DPDK life is not that easy -
>>>>> there is no easy way to collapse the cluster into a linear buffer from
>>>>> inside the device driver
>>>>> since device driver doesn't allocate memory in a fast path and utilizes
>>>>> the user allocated pools only.
>>>>>
>>>>> Here are two proposals for a solution:
>>>>>
>>>>>    1. We may provide a callback that would return a user TRUE if a give
>>>>>       cluster has to be linearized and it should always be called before
>>>>>       rte_eth_tx_burst(). Alternatively it may be called from inside the
>>>>>       rte_eth_tx_burst() and rte_eth_tx_burst() is changed to return some
>>>>>       error code for a case when one of the clusters it's given has to be
>>>>>       linearized.
>>>>>    2. Another option is to allocate a mempool in the driver with the
>>>>>       elements consuming a single page each (standard 2KB buffers would
>>>>>       do). Number of elements in the pool should be as Tx ring length
>>>>>       multiplied by "64KB/(linear data length of the buffer in the pool
>>>>>       above)". Here I use 64KB as a maximum packet length and not taking
>>>>>       into an account esoteric things like "Giant" TSO mentioned in the
>>>>>       spec above. Then we may actually go and linearize the cluster if
>>>>>       needed on top of the buffers from the pool above, post the buffer
>>>>>       from the mempool above on the HW ring, link the original cluster to
>>>>>       that new cluster (using the private data) and release it when the
>>>>>       send is done.
>>>> Or just silently drop heavily scattered packets (and increment oerrors)
>>>> with a PMD_TX_LOG debug message.
>>>>
>>>> I think a DPDK driver doesn't have to accept all possible mbufs and do
>>>> extra work. It seems reasonable to expect caller to be well behaved
>>>> in this restricted ecosystem.
>>>>
>>> How can the caller know what's well behaved?  It's device dependent.
>> +1
>>
>> Stephen, how do you imagine this well-behaved application? Having switch
>> case by an underlying device type and then "well-behaving" correspondingly?
>> Not to mention that to "well-behave" the application writer has to read
>> HW specs and understand them, which would limit the amount of DPDK
>> developers to a very small amount of people... ;) Not to mention that
>> the mentioned above switch-case would be a super ugly thing to be found
>> in an application that would raise a big question about the
>> justification of a DPDK existence as as SDK providing device drivers
>> interface. ;)
> Either have a RTE_MAX_MBUF_SEGMENTS that is global or
> a mbuf_linearize function?  Driver already can stash the
> mbuf pool used for Rx and reuse it for the transient Tx buffers.
>

The pass/fail criteria are much more complicated than that.  You might
have a packet with 340 fragments transmitted successfully (64k/1500*8),
or a packet with 9 fragments fail.

What's wrong with exposing the pass/fail criteria as a driver-supplied 
function?  If the application is sure that its mbufs pass, it can choose 
not to call it.  A less constrained application will call it, and 
linearize the packet itself if it fails the test.
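
The shape of that proposal could be sketched as below. This is only an illustration under stated assumptions: `struct pkt_seg`, `tx_check_fn`, `max8_tx_check` and `tx_prepare` are hypothetical names, not DPDK APIs, and the example check implements only the flat 8-buffer limit for a non-TSO packet, not the full criteria discussed in this thread.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Simplified stand-in for a chained packet buffer (not struct rte_mbuf). */
struct pkt_seg {
	struct pkt_seg *next;
	uint16_t data_len;
};

/* Hypothetical per-device hook: true if the HW can take the chain as-is. */
typedef bool (*tx_check_fn)(const struct pkt_seg *pkt);

/* Example check for a flat limit of 8 buffers per non-TSO packet. */
bool max8_tx_check(const struct pkt_seg *pkt)
{
	unsigned int nsegs = 0;

	for (; pkt != NULL; pkt = pkt->next)
		if (++nsegs > 8)
			return false;
	return true;
}

/* What the application should do with a packet before transmitting it. */
enum tx_action { TX_SEND_AS_IS, TX_LINEARIZE_FIRST };

/*
 * Application-side decision. Passing check == NULL models an application
 * that is sure its packets always pass and skips the call.
 */
enum tx_action tx_prepare(tx_check_fn check, const struct pkt_seg *pkt)
{
	if (check == NULL || check(pkt))
		return TX_SEND_AS_IS;
	return TX_LINEARIZE_FIRST;
}
```

The point of the function pointer is that the device-specific rule stays in the driver, while the cost of calling it, and of linearizing on failure, is paid only by applications that cannot rule out badly fragmented packets.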



[dpdk-dev] lost when learning how to test dpdk

2015-07-30 Thread Jan Viktorin
Thank you. I think the patch did not help. I applied it and the error was
still there. After setting 1024 hugepages, it started working.

Jan V.

On Thu, 30 Jul 2015 10:06:09 -0700
Ravi Kerur  wrote:

> On Thu, Jul 30, 2015 at 9:19 AM, Jan Viktorin 
> wrote:
> 
> > OK, I've added the card into RTE_PCI_DEVEM_ID_DECL_EM list.
> > Much better now:
> >
> > EAL: Requesting 64 pages of size 2MB from socket 0
> > EAL: TSC frequency is ~365 KHz
> > EAL: Master lcore 0 is ready (tid=467d78c0;cpuset=[0])
> > EAL: lcore 1 is ready (tid=3a5ff700;cpuset=[1])
> > EAL: PCI device :03:00.0 on NUMA socket -1
> > EAL:   probe driver: 8086:1026 rte_em_pmd
> > EAL:   PCI memory mapped at 0x7fde44a0
> > EAL:   PCI memory mapped at 0x7fde44a2
> > PMD: eth_em_dev_init(): port_id 0 vendorID=0x8086 deviceID=0x1026
> > EAL: PCI device :03:02.0 on NUMA socket -1
> > EAL:   probe driver: 8086:1026 rte_em_pmd
> > EAL:   Not managed by a supported kernel driver, skipped
> > Interactive-mode selected
> > EAL: Error - exiting with code: 1
> >   Cause: Creation of mbuf pool for socket 0 failed
> >
> > I've tried both uio_pci_generic and igb_uio. Is there anything else
> > I can do about it?
> >
> 
> I am attaching patch from M Jay/Cuming (Intel engineers), it's not yet
> integrated into DPDK mainline. You need it to fix above mbuf error.
> 
> >
> > Jan V.
> >
> > On Thu, 30 Jul 2015 08:41:47 -0700
> > Ravi Kerur  wrote:
> >
> > > On Thu, Jul 30, 2015 at 8:22 AM, Jan Viktorin 
> > > wrote:
> > >
> > > > The 82545 is listed at http://dpdk.org/doc/nics and I can see it in
> > > > rte_pci_dev_ids.h/e1000_hw.h:
> > > >
> > > > 196 #define E1000_DEV_ID_82545GM_COPPER   0x1026
> > > >
> > > > $ lspci -nn
> > > > ...
> > > > 03:00.0 Ethernet controller [0200]: Intel Corporation 82545GM Gigabit
> > > > Ethernet Controller [8086:1026] (rev 04)
> > > > 03:02.0 Ethernet controller [0200]: Intel Corporation 82545GM Gigabit
> > > > Ethernet Controller [8086:1026] (rev 04)
> > > >
> > > > However, it is rev 04 and in e1000_hw.h there is just
> > e1000_82545_rev_3.
> > > > But this should not avoid the match (?). Is it possible to grow the
> > > > verbosity
> > > > level of the device matching process in DPDK?
> > > >
> > >
> > > Check lib/librte_eal/common/include/rte_pci_dev_ids.h, you need to add
> > > device-id via RTE_PCI_DEVEM_ID_DECL_EM.
> > >
> > > >
> > > > I do not expect any support, I just wanted to use it for sending
> > traffic
> > > > at 1 Gbps because there are two such cards mostly unused in my
> > computer.
> > > > I did not plan to use I217-V (in fact, I did not expect much from this
> > > > integrated NIC and I did not even notice it is an Intel one...).
> > > >
> > > > Regards
> > > > Jan Viktorin
> > > >
> > > > On Thu, 30 Jul 2015 07:44:14 -0700
> > > > Ravi Kerur  wrote:
> > > >
> > > > > On Thu, Jul 30, 2015 at 5:03 AM, Jan Viktorin <
> > viktorin at rehivetech.com>
> > > > > wrote:
> > > > >
> > > > > > Hi,
> > > > > >
> > > > > > thanks for reply. I could see those docs but it does not help me a
> > lot.
> > > > > > I still do not understand very well the principle of the tool. How
> > it
> > > > > > chooses the NICs to use? Previously I confused -b in dpdk_nic_bind
> > and
> > > > > > testpmd. They have somehow opposite meaning. I can start testpmd
> > now,
> > > > > > however, it does ot probe any NIC. I've tried -w to whitelist
> > certain
> > > > > > NICs but with no success.
> > > > > >
> > > > > > $ dpdk_nic_bind --status
> > > > > >
> > > > > > Network devices using DPDK-compatible driver
> > > > > > 
> > > > > > :03:00.0 '82545GM Gigabit Ethernet Controller'
> > drv=uio_pci_generic
> > > > > > unused=e1000
> > > > > > :03:02.0 '82545GM Gigabit Ethernet Controller'
> > drv=uio_pci_generic
> > > > > > unused=e1000
> > > > > >
> > > > >
> > > > > NICs may not be supported by PMD drivers yet. Do "lspci -nn" and
> > check
> > > > the
> > > > > device-id.  Adding support in PMD should not be a problem, but I am
> > not
> > > > > sure on support since there is End of Life listed on Intel Website
> > > > >
> > > > >
> > > >
> > http://ark.intel.com/products/4964/Intel-82545GM-Gigabit-Ethernet-Controller
> > > > >
> > > > >
> > > > > > Network devices using kernel driver
> > > > > > ===
> > > > > > :00:19.0 'Ethernet Connection I217-V' if=eno1 drv=e1000e
> > > > > > unused=uio_pci_generic *Active*
> > > > > >
> > > > >
> > > > > DPDK doesn't bind Active NIC, support for I217-V in PMD is being
> > tested
> > > > > currently.
> > > > >
> > > > >
> > > > > >
> > > > > > Other network devices
> > > > > > =
> > > > > > 
> > > > > >
> > > > > > $ sudo testpmd -c 0x3 -n 2 -- -i --total-num-mbufs=2048
> > > > > > EAL: Detected lcore 0 as core 0 on socket 0
> > > > > > EAL: Detected lcore 1 as core 1 on socket 0
> > > > > > EAL: Detected lcore 2 as core 0 on socket 0
> > > > > > EAL: Detected lcore 3 as core

[dpdk-dev] i40e xmit path HW limitation

2015-07-30 Thread Zhang, Helin


> -Original Message-
> From: Vlad Zolotarov [mailto:vladz at cloudius-systems.com]
> Sent: Thursday, July 30, 2015 9:44 AM
> To: Zhang, Helin; Ananyev, Konstantin
> Cc: dev at dpdk.org
> Subject: Re: i40e xmit path HW limitation
> 
> 
> 
> On 07/30/15 19:10, Zhang, Helin wrote:
> >
> >> -Original Message-
> >> From: Vlad Zolotarov [mailto:vladz at cloudius-systems.com]
> >> Sent: Thursday, July 30, 2015 7:58 AM
> >> To: dev at dpdk.org; Ananyev, Konstantin; Zhang, Helin
> >> Subject: RFC: i40e xmit path HW limitation
> >>
> >> Hi, Konstantin, Helin,
> >> there is a documented limitation of xl710 controllers (i40e driver)
> >> which is not handled in any way by a DPDK driver.
> >>   From the datasheet chapter 8.4.1:
> >>
> >> "- A single transmit packet may span up to 8 buffers (up to 8 data
> >> descriptors per packet including both the header and payload buffers).
> >> - The total number of data descriptors for the whole TSO (explained
> >> later on in this chapter) is unlimited as long as each segment within
> >> the TSO obeys the previous rule (up to 8 data descriptors per segment
> >> for both the TSO header and the segment payload buffers)."
> > Yes, I remember the RX side just supports 5 segments per packet receiving.
> > But what's the possible issue you thought about?
> Note that it's a Tx size we are talking about.
> 
> See 30520831f058cd9d75c0f6b360bc5c5ae49b5f27 commit in linux net-next repo.
> If such a cluster arrives and you post it on the HW ring - HW will shut this 
> HW ring
> down permanently. The application will see that it's ring is stuck.
That issue was because of using more than 8 descriptors for a packet for TSO.

> 
> >
> >> This means that, for instance, long cluster with small fragments has to be
> >> linearized before it may be placed on the HW ring.
> > What type of size of the small fragments? Basically 2KB is the default size 
> > of
> mbuf of most
> > example applications. 2KB x 8 is bigger than 1.5KB. So it is enough for the
> maximum
> > packet size we supported.
> > If 1KB mbuf is used, don't expect it can transmit more than 8KB size of 
> > packet.
> 
> I kinda lost u here. Again, we talk about the Tx side here and buffers
> are not obligatory completely filled. Namely there may be a cluster with
> 15 fragments 100 bytes each.
The root cause is using more than 8 descriptors for a packet. The Linux driver
can help reduce the number of descriptors used by merging small payloads
together, right?
This is not about TSO; it is just about packet transmission. 2 options in my mind:
1. The user should ensure that no more than 8 descriptors are used per packet
for transmitting.
2. The DPDK driver should try to merge small packets together in such a case,
like the Linux kernel driver.
I prefer option 1: users should ensure that in the application or upper-layer
software, and keep the PMD driver as simple as possible.

But I do think the maximum number of RX/TX descriptors should be queryable
somewhere.
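
Whichever layer ends up doing it, the merging step itself amounts to copying a chain of small buffers into one larger buffer. A minimal sketch, assuming a simplified `struct frag` in place of a real mbuf and a caller-provided destination (both hypothetical, not DPDK types):

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Simplified stand-in for one buffer of a chained packet. */
struct frag {
	const uint8_t *data;
	uint16_t len;
	struct frag *next;
};

/*
 * Copy every fragment of the chain into dst, back to back.
 * Returns the number of bytes written. Returns 0 if the chain is
 * empty or dst is too small (dst contents are then unspecified).
 */
size_t frag_coalesce(const struct frag *head, uint8_t *dst, size_t dst_cap)
{
	size_t off = 0;

	for (const struct frag *f = head; f != NULL; f = f->next) {
		/* off never exceeds dst_cap, so the subtraction is safe. */
		if (f->len > dst_cap - off)
			return 0;
		memcpy(dst + off, f->data, f->len);
		off += f->len;
	}
	return off;
}
```

For the case raised in this thread, 15 fragments of 100 bytes each coalesce into 1500 bytes, which fits in a single 2KB buffer and therefore needs one descriptor instead of 15. The open question in the discussion is where the destination buffer comes from, not the copy itself.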

Regards,
Helin
> 
> >
> >> In more standard environments like Linux or FreeBSD drivers the solution is
> >> straight forward - call skb_linearize()/m_collapse() corresponding.
> >> In the non-conformist environment like DPDK life is not that easy - there 
> >> is no
> >> easy way to collapse the cluster into a linear buffer from inside the 
> >> device
> driver
> >> since device driver doesn't allocate memory in a fast path and utilizes 
> >> the user
> >> allocated pools only.
> >> Here are two proposals for a solution:
> >>
> >>   1. We may provide a callback that would return a user TRUE if a give
> >>  cluster has to be linearized and it should always be called before
> >>  rte_eth_tx_burst(). Alternatively it may be called from inside the
> >>  rte_eth_tx_burst() and rte_eth_tx_burst() is changed to return some
> >>  error code for a case when one of the clusters it's given has to be
> >>  linearized.
> >>   2. Another option is to allocate a mempool in the driver with the
> >>  elements consuming a single page each (standard 2KB buffers would
> >>  do). Number of elements in the pool should be as Tx ring length
> >>  multiplied by "64KB/(linear data length of the buffer in the pool
> >>  above)". Here I use 64KB as a maximum packet length and not taking
> >>  into an account esoteric things like "Giant" TSO mentioned in the
> >>  spec above. Then we may actually go and linearize the cluster if
> >>  needed on top of the buffers from the pool above, post the buffer
> >>  from the mempool above on the HW ring, link the original cluster to
> >>  that new cluster (using the private data) and release it when the
> >>  send is done.
> >>
> >>
> >> The first is a change in the API and would require from the application 
> >> some
> >> additional handling (linearization). The second would require some 
> >> additional
> >> memory but would keep all dirty details inside the driver and would le

[dpdk-dev] i40e xmit path HW limitation

2015-07-30 Thread Vlad Zolotarov


On 07/30/15 20:33, Zhang, Helin wrote:
>
>> -Original Message-
>> From: Vlad Zolotarov [mailto:vladz at cloudius-systems.com]
>> Sent: Thursday, July 30, 2015 9:44 AM
>> To: Zhang, Helin; Ananyev, Konstantin
>> Cc: dev at dpdk.org
>> Subject: Re: i40e xmit path HW limitation
>>
>>
>>
>> On 07/30/15 19:10, Zhang, Helin wrote:
>>>> -Original Message-
>>>> From: Vlad Zolotarov [mailto:vladz at cloudius-systems.com]
>>>> Sent: Thursday, July 30, 2015 7:58 AM
>>>> To: dev at dpdk.org; Ananyev, Konstantin; Zhang, Helin
>>>> Subject: RFC: i40e xmit path HW limitation
>>>>
>>>> Hi, Konstantin, Helin,
>>>> there is a documented limitation of xl710 controllers (i40e driver)
>>>> which is not handled in any way by a DPDK driver.
>>>>   From the datasheet chapter 8.4.1:
>>>>
>>>> "- A single transmit packet may span up to 8 buffers (up to 8 data
>>>> descriptors per packet including both the header and payload buffers).
>>>> - The total number of data descriptors for the whole TSO (explained
>>>> later on in this chapter) is unlimited as long as each segment within
>>>> the TSO obeys the previous rule (up to 8 data descriptors per segment
>>>> for both the TSO header and the segment payload buffers)."
>>> Yes, I remember the RX side just supports 5 segments per packet receiving.
>>> But what's the possible issue you thought about?
>> Note that it's a Tx size we are talking about.
>>
>> See 30520831f058cd9d75c0f6b360bc5c5ae49b5f27 commit in linux net-next repo.
>> If such a cluster arrives and you post it on the HW ring - HW will shut this 
>> HW ring
>> down permanently. The application will see that it's ring is stuck.
> That issue was because of using more than 8 descriptors for a packet for TSO.

There is no problem transmitting a TSO packet with more than 8
fragments.
In contrast, one can't transmit a non-TSO packet with more than 8
fragments.
One also can't transmit a TSO packet that would contain more than 8
fragments in a single TSO segment, including the TSO headers.

Pls., read the HW spec as I quoted above for more details.
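
The difference between the two cases can be made concrete with a small check. The sketch below is one conservative reading of the quoted rule, not the check used by any existing driver: it assumes the header occupies the first `hdr_len` bytes of the chain, that each TSO segment carries up to `mss` payload bytes, and that a buffer holding both header and payload bytes is counted once for each. The name `tso_fits_hw` and the constant are hypothetical.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_BUFS_PER_SEGMENT 8 /* limit quoted from the datasheet above */

/* Count non-empty buffers overlapping the packet byte range [start, end). */
static unsigned int bufs_in_range(const uint32_t *buf_len, size_t nbufs,
				  uint64_t start, uint64_t end)
{
	uint64_t off = 0;
	unsigned int count = 0;

	for (size_t i = 0; i < nbufs; i++) {
		uint64_t buf_end = off + buf_len[i];

		if (buf_len[i] != 0 && off < end && buf_end > start)
			count++;
		off = buf_end;
	}
	return count;
}

/*
 * True if every TSO segment (header buffers plus the payload buffers
 * that segment touches) stays within MAX_BUFS_PER_SEGMENT.
 */
bool tso_fits_hw(const uint32_t *buf_len, size_t nbufs,
		 uint32_t hdr_len, uint32_t mss)
{
	uint64_t total = 0;
	unsigned int hdr_bufs;

	for (size_t i = 0; i < nbufs; i++)
		total += buf_len[i];
	if (mss == 0 || total < hdr_len)
		return false;

	hdr_bufs = bufs_in_range(buf_len, nbufs, 0, hdr_len);
	if (hdr_bufs > MAX_BUFS_PER_SEGMENT)
		return false;

	/* Walk the payload one MSS-sized segment at a time. */
	for (uint64_t start = hdr_len; start < total; start += mss) {
		uint64_t end = (start + mss < total) ? start + mss : total;

		if (hdr_bufs + bufs_in_range(buf_len, nbufs, start, end) >
		    MAX_BUFS_PER_SEGMENT)
			return false;
	}
	return true;
}
```

Under these assumptions, a 54-byte header buffer followed by nineteen 1448-byte buffers passes with an MSS of 1448 even though the chain has 20 buffers, because each segment touches one header buffer and one payload buffer. A 54-byte header buffer followed by fifteen 100-byte buffers fails with the same MSS, because the first segment alone spans 16 buffers.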

>
>>>> This means that, for instance, long cluster with small fragments has to be
>>>> linearized before it may be placed on the HW ring.
>>> What type of size of the small fragments? Basically 2KB is the default size 
>>> of
>> mbuf of most
>>> example applications. 2KB x 8 is bigger than 1.5KB. So it is enough for the
>> maximum
>>> packet size we supported.
>>> If 1KB mbuf is used, don't expect it can transmit more than 8KB size of 
>>> packet.
>> I kinda lost u here. Again, we talk about the Tx side here and buffers
>> are not obligatory completely filled. Namely there may be a cluster with
>> 15 fragments 100 bytes each.
> The root cause is using more than 8 descriptors for a packet.

That would be if u would like to SUPER simplify the HW limitation above. 
In that case u would significantly limit the different packets that may 
be sent without the linearization.

> Linux driver can help
> on reducing number of descriptors to be used by merging small size of payload
> together, right?
> It is not for TSO, it is just for packet transmitting. 2 options in my mind:
> 1. Use should ensure it will not use more than 8 descriptors per packet for 
> transmitting.

This requirement is too restricting. Pls., see above.

> 2. DPDK driver should try to merge small packet together for such case, like 
> Linux kernel driver.
> I prefer to use option 1, users should ensure that in the application or up 
> layer software,
> and keep the PMD driver as simple as possible.

The above statement is super confusing: on the one hand u suggest the 
DPDK driver to merge the small packet (fragments?) together (how?) and 
then u immediately propose the user application to do that. Could u, 
pls., clarify what exactly u suggest here?
If that's to leave it to the application - note that it would demand 
patching all existing DPDK applications that send TCP packets.

>
> But I have a thought that the maximum number of RX/TX descriptor should be 
> able to be
> queried somewhere.

There is no such thing as maximum number of Tx fragments in a TSO case. 
It's only limited by the Tx ring size.

>
> Regards,
> Helin
 In more standard environments like Linux or FreeBSD drivers the solution is
 straight forward - call skb_linearize()/m_collapse() corresponding.
 In the non-conformist environment like DPDK life is not that easy - there 
 is no
 easy way to collapse the cluster into a linear buffer from inside the 
 device
>> driver
 since device driver doesn't allocate memory in a fast path and utilizes 
 the user
 allocated pools only.
 Here are two proposals for a solution:

1. We may provide a callback that would return a user TRUE if a given
   cluster has to be linearized and it should always be called before
   rte_eth_tx_burst(). Alternatively it may be called from inside the
   rte_eth_tx_burst() and rte_e

[dpdk-dev] [PATCH v3 0/6] log de-spamming

2015-07-30 Thread Thomas Monjalon
2015-07-09 16:01, Stephen Hemminger:
> From: Stephen Hemminger 
> 
> These patches were sent earlier, updated to current tree.
> 
> They make Intel drivers not spam the log with information
> messages that cause questions in production.
> 
> Unfortunately, developers seem to get attached to log messages
> which are not appropriate in a production product
> 
> Stephen Hemminger (6):
>   ixgbe: convert debug messages to DEBUG level
>   ixgbe: raise priority of significant log events
>   ixgbe: allow pruning log during build
>   e1000: allow pruning log during build
>   e1000: change log level of debug messages
>   e1000: raise log level of significant events

Applied, thanks


[dpdk-dev] [PATCH v3 3/6] ixgbe: allow pruning log during build

2015-07-30 Thread Thomas Monjalon
2015-07-09 16:01, Stephen Hemminger:
> From: Stephen Hemminger 
> 
> The ixgbe driver was not following DPDK convention and
> was leaving logging always in even if LOG_LEVEL was configured
> to disable debug logs.
> 
> Signed-off-by: Stephen Hemminger 

This series is fixing e1000 and ixgbe.
There is the same issue with i40e, fm10k and bnx2x.
I will fix them in the same way.

For consistency, examples/l3fwd-power and eal_common_tailqs.c should
use RTE_LOG instead of rte_log.


[dpdk-dev] i40e xmit path HW limitation

2015-07-30 Thread Zhang, Helin


> -----Original Message-----
> From: Vlad Zolotarov [mailto:vladz at cloudius-systems.com]
> Sent: Thursday, July 30, 2015 10:56 AM
> To: Zhang, Helin; Ananyev, Konstantin
> Cc: dev at dpdk.org
> Subject: Re: i40e xmit path HW limitation
> 
> 
> 
> On 07/30/15 20:33, Zhang, Helin wrote:
> >
> >> -----Original Message-----
> >> From: Vlad Zolotarov [mailto:vladz at cloudius-systems.com]
> >> Sent: Thursday, July 30, 2015 9:44 AM
> >> To: Zhang, Helin; Ananyev, Konstantin
> >> Cc: dev at dpdk.org
> >> Subject: Re: i40e xmit path HW limitation
> >>
> >>
> >>
> >> On 07/30/15 19:10, Zhang, Helin wrote:
>  -----Original Message-----
>  From: Vlad Zolotarov [mailto:vladz at cloudius-systems.com]
>  Sent: Thursday, July 30, 2015 7:58 AM
>  To: dev at dpdk.org; Ananyev, Konstantin; Zhang, Helin
>  Subject: RFC: i40e xmit path HW limitation
> 
>  Hi, Konstantin, Helin,
>  there is a documented limitation of xl710 controllers (i40e driver)
>  which is not handled in any way by a DPDK driver.
> From the datasheet chapter 8.4.1:
> 
>  "- A single transmit packet may span up to 8 buffers (up to 8 data
>  descriptors per packet including both the header and payload buffers).
>  - The total number of data descriptors for the whole TSO (explained
>  later on in this chapter) is unlimited as long as each segment
>  within the TSO obeys the previous rule (up to 8 data descriptors
>  per segment for both the TSO header and the segment payload buffers)."
> >>> Yes, I remember the RX side just supports 5 segments per packet receiving.
> >>> But what's the possible issue you thought about?
> >> Note that it's the Tx side we are talking about.
> >>
> >> See 30520831f058cd9d75c0f6b360bc5c5ae49b5f27 commit in linux net-next
> repo.
> >> If such a cluster arrives and you post it on the HW ring - HW will
> >> shut this HW ring down permanently. The application will see that its
> >> ring is stuck.
> > That issue was because of using more than 8 descriptors for a packet for 
> > TSO.
> 
> There is no problem in transmitting the TSO packet with more than 8 fragments.
> On the opposite - one can't transmit a non-TSO packet with more than 8
> fragments.
> One also can't transmit the TSO packet that would contain more than 8 
> fragments
> in a single TSO segment including the TSO headers.
> 
> Pls., read the HW spec as I quoted above for more details.
I meant a packet as transmitted by the hardware, not the TSO packet in memory.
It could be a single segment of a TSO packet in memory.
The linearize check in the kernel driver is not for TSO only; it applies to
both the TSO and non-TSO cases.
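
For illustration only, the non-TSO part of that check can be sketched as
below. This is not the kernel or the i40e PMD code. `struct seg` is a
hypothetical stand-in for a chained buffer, and the limit of 8 is taken from
the datasheet text quoted in this thread. The per-segment TSO rule is not
covered.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for a chained packet buffer (not the real rte_mbuf). */
struct seg {
	struct seg *next;   /* next fragment in the cluster, NULL at the end */
	uint16_t data_len;  /* bytes of payload held by this fragment */
};

/* Limit quoted from the datasheet: up to 8 data descriptors per packet. */
#define TX_MAX_DATA_DESC 8

/* Count the fragments in a cluster; each one needs its own data descriptor. */
static unsigned int seg_count(const struct seg *s)
{
	unsigned int n = 0;

	for (; s != NULL; s = s->next)
		n++;
	return n;
}

/* Non-TSO case only: true if the cluster must be linearized before Tx. */
static bool needs_linearize(const struct seg *head)
{
	return seg_count(head) > TX_MAX_DATA_DESC;
}
```

With the example from earlier in the thread, a cluster of 15 fragments of 100
bytes each would be flagged, even though the packet is only 1500 bytes long.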

> 
> >
>  This means that, for instance, long cluster with small fragments
>  has to be linearized before it may be placed on the HW ring.
> >>> What type of size of the small fragments? Basically 2KB is the
> >>> default size of
> >> mbuf of most
> >>> example applications. 2KB x 8 is bigger than 1.5KB. So it is enough
> >>> for the
> >> maximum
> >>> packet size we supported.
> >>> If 1KB mbuf is used, don't expect it can transmit more than 8KB size of
> packet.
> >> I kinda lost u here. Again, we talk about the Tx side here and
> >> buffers are not obligatory completely filled. Namely there may be a
> >> cluster with
> >> 15 fragments 100 bytes each.
> > The root cause is using more than 8 descriptors for a packet.
> 
> That would be if u would like to SUPER simplify the HW limitation above.
> In that case u would significantly limit the different packets that may be 
> sent
> without the linearization.
> 
> > Linux driver can help
> > on reducing number of descriptors to be used by merging small size of
> > payload together, right?
> > It is not for TSO, it is just for packet transmitting. 2 options in my mind:
> > 1. User should ensure it will not use more than 8 descriptors per packet for
> > transmitting.
> 
> This requirement is too restricting. Pls., see above.
> 
> > 2. DPDK driver should try to merge small packet together for such case, like
> Linux kernel driver.
> > I prefer to use option 1, users should ensure that in the application
> > or up layer software, and keep the PMD driver as simple as possible.
> 
> The above statement is super confusing: on the one hand u suggest the DPDK
> driver to merge the small packet (fragments?) together (how?) and then u
> immediately propose the user application to do that. Could u, pls., clarify 
> what
> exactly u suggest here?
> If that's to leave it to the application - note that it would demand patching 
> all
> existing DPDK applications that send TCP packets.
Those are two obvious options. One is to do that in the PMD, the other one is
to do that in the upper layer. I did not mean it needs to do both!


> 
> >
> > But I have a thought that the maximum number of RX/TX descriptor
> > should be able to be queried somewhere.
> 
> There is no such thing as maximum number of Tx fragments in a TSO case.
> It's only limited by the Tx ring s

[dpdk-dev] [PATCH v2] enic: silence log message unless debug enabled

2015-07-30 Thread Thomas Monjalon
2015-07-30 10:03, Stephen Hemminger:
> --- a/drivers/net/enic/enic_compat.h
> +++ b/drivers/net/enic/enic_compat.h
> @@ -82,7 +82,11 @@
>  #define dev_err(x, args...) dev_printk(ERR, args)
>  #define dev_info(x, args...) dev_printk(INFO,  args)
>  #define dev_warning(x, args...) dev_printk(WARNING, args)
> +#ifdef RTE_LIBRTE_ENIC_DEBUG
>  #define dev_debug(x, args...) dev_printk(DEBUG, args)
> +#else
> +#define dev_debug(x, args...) do { } while(0)
> +#endif

I don't understand why it is needed:
dev_debug won't print anything if the log level is higher than DEBUG.
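
To make the two behaviours concrete, here is a minimal sketch. Everything in
it is hypothetical: the level values, `cur_level`, `log_msg()`, `dbg()` and
`SKETCH_DEBUG` are illustration names, not the enic or EAL definitions. The
runtime filter already drops a debug message when the threshold is less
verbose than DEBUG; the compile-time variant additionally removes the call
and the evaluation of its arguments.

```c
#include <assert.h>
#include <stdarg.h>
#include <stdio.h>

/* Hypothetical levels: a higher number means a more verbose message. */
enum { LVL_ERR = 4, LVL_INFO = 7, LVL_DEBUG = 8 };

/* Hypothetical runtime threshold; messages above it are dropped. */
static int cur_level = LVL_INFO;

/* Runtime filter: returns the number of characters printed, 0 if dropped. */
static int log_msg(int level, const char *fmt, ...)
{
	va_list ap;
	int n;

	if (level > cur_level)
		return 0;
	va_start(ap, fmt);
	n = vfprintf(stderr, fmt, ap);
	va_end(ap);
	return n;
}

/* Compile-time pruning: without SKETCH_DEBUG the call disappears entirely. */
#ifdef SKETCH_DEBUG
#define dbg(...) log_msg(LVL_DEBUG, __VA_ARGS__)
#else
#define dbg(...) do { } while (0)
#endif
```

In both cases nothing is printed at the default threshold; the difference is
only whether the function call remains in the binary.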



[dpdk-dev] i40e xmit path HW limitation

2015-07-30 Thread Vladislav Zolotarov
On Jul 30, 2015 22:00, "Zhang, Helin"  wrote:
>
>
>
> > -----Original Message-----
> > From: Vlad Zolotarov [mailto:vladz at cloudius-systems.com]
> > Sent: Thursday, July 30, 2015 10:56 AM
> > To: Zhang, Helin; Ananyev, Konstantin
> > Cc: dev at dpdk.org
> > Subject: Re: i40e xmit path HW limitation
> >
> >
> >
> > On 07/30/15 20:33, Zhang, Helin wrote:
> > >
> > >> -----Original Message-----
> > >> From: Vlad Zolotarov [mailto:vladz at cloudius-systems.com]
> > >> Sent: Thursday, July 30, 2015 9:44 AM
> > >> To: Zhang, Helin; Ananyev, Konstantin
> > >> Cc: dev at dpdk.org
> > >> Subject: Re: i40e xmit path HW limitation
> > >>
> > >>
> > >>
> > >> On 07/30/15 19:10, Zhang, Helin wrote:
> >  -----Original Message-----
> >  From: Vlad Zolotarov [mailto:vladz at cloudius-systems.com]
> >  Sent: Thursday, July 30, 2015 7:58 AM
> >  To: dev at dpdk.org; Ananyev, Konstantin; Zhang, Helin
> >  Subject: RFC: i40e xmit path HW limitation
> > 
> >  Hi, Konstantin, Helin,
> >  there is a documented limitation of xl710 controllers (i40e driver)
> >  which is not handled in any way by a DPDK driver.
> > From the datasheet chapter 8.4.1:
> > 
> >  "- A single transmit packet may span up to 8 buffers (up to 8 data
> >  descriptors per packet including both the header and payload buffers).
> >  - The total number of data descriptors for the whole TSO (explained
> >  later on in this chapter) is unlimited as long as each segment
> >  within the TSO obeys the previous rule (up to 8 data descriptors
> >  per segment for both the TSO header and the segment payload buffers)."
> > >>> Yes, I remember the RX side just supports 5 segments per packet
receiving.
> > >>> But what's the possible issue you thought about?
> > >> Note that it's the Tx side we are talking about.
> > >>
> > >> See 30520831f058cd9d75c0f6b360bc5c5ae49b5f27 commit in linux net-next
> > repo.
> > >> If such a cluster arrives and you post it on the HW ring - HW will
> > >> shut this HW ring down permanently. The application will see that
it's ring is
> > stuck.
> > > That issue was because of using more than 8 descriptors for a packet
for TSO.
> >
> > There is no problem in transmitting the TSO packet with more than 8
fragments.
> > On the opposite - one can't transmit a non-TSO packet with more than 8
> > fragments.
> > One also can't transmit the TSO packet that would contain more than 8
fragments
> > in a single TSO segment including the TSO headers.
> >
> > Pls., read the HW spec as I quoted above for more details.
> I meant a packet to be transmitted by the hardware, but not the TSO
packet in memory.
> It could be a segment in TSO packet in memory.
> The linearize check in kernel driver is not for TSO only, it is for both
TSO and
> NON-TSO cases.

That's what i was trying to tell u. Great we are on the same page at
last... ?

>
> >
> > >
> >  This means that, for instance, long cluster with small fragments
> >  has to be linearized before it may be placed on the HW ring.
> > >>> What type of size of the small fragments? Basically 2KB is the
> > >>> default size of
> > >> mbuf of most
> > >>> example applications. 2KB x 8 is bigger than 1.5KB. So it is enough
> > >>> for the
> > >> maximum
> > >>> packet size we supported.
> > >>> If 1KB mbuf is used, don't expect it can transmit more than 8KB
size of
> > packet.
> > >> I kinda lost u here. Again, we talk about the Tx side here and
> > >> buffers are not obligatory completely filled. Namely there may be a
> > >> cluster with
> > >> 15 fragments 100 bytes each.
> > > The root cause is using more than 8 descriptors for a packet.
> >
> > That would be if u would like to SUPER simplify the HW limitation above.
> > In that case u would significantly limit the different packets that may
be sent
> > without the linearization.
> >
> > > Linux driver can help
> > > on reducing number of descriptors to be used by merging small size of
> > > payload together, right?
> > > It is not for TSO, it is just for packet transmitting. 2 options in
my mind:
> > > 1. User should ensure it will not use more than 8 descriptors per
> > > packet for transmitting.
> >
> > This requirement is too restricting. Pls., see above.
> >
> > > 2. DPDK driver should try to merge small packet together for such
case, like
> > Linux kernel driver.
> > > I prefer to use option 1, users should ensure that in the application
> > > or up layer software, and keep the PMD driver as simple as possible.
> >
> > The above statement is super confusing: on the one hand u suggest the
DPDK
> > driver to merge the small packet (fragments?) together (how?) and then u
> > immediately propose the user application to do that. Could u, pls.,
clarify what
> > exactly u suggest here?
> > If that's to leave it to the application - note that it would demand
patching all
> > existing DPDK applications that send TCP packets.
> Those are two of obvious options. One is to do that in PMD, the othe

[dpdk-dev] [PATCH] eal: fix compilation for x86_x32-native-linuxapp-gcc

2015-07-30 Thread Olivier Matz
Compiling for dpdk x86_x32 gives the following error:

  CC eal_common_timer.o
In file included from /usr/include/sys/sysctl.h:63:0,
 from dpdk.org/lib/librte_eal/common/eal_common_timer.c:39:
/usr/include/bits/sysctl.h:19:3: error: #error "sysctl system call is 
unsupported in x32 kernel"
 # error "sysctl system call is unsupported in x32 kernel"
   ^
dpdk.org/mk/internal/rte.compile-pre.mk:126: recipe for target 
'eal_common_timer.o' failed
make[6]: *** [eal_common_timer.o] Error 1

Including sysctl.h was added by mistake when merging bsd and linux EAL
timer code. It can be safely removed in this file, fixing the
compilation.

Fixes: 040cf8a411 ("eal: deduplicate timer functions")
Signed-off-by: Olivier Matz 
---
 lib/librte_eal/common/eal_common_timer.c | 1 -
 1 file changed, 1 deletion(-)

diff --git a/lib/librte_eal/common/eal_common_timer.c b/lib/librte_eal/common/eal_common_timer.c
index 255f995..72371b8 100644
--- a/lib/librte_eal/common/eal_common_timer.c
+++ b/lib/librte_eal/common/eal_common_timer.c
@@ -36,7 +36,6 @@
 #include 
 #include 
 #include 
-#include 
 #include 

 #include 
-- 
2.1.4



[dpdk-dev] [PATCH v2] mbuf: enforce alignment of mbuf private area

2015-07-30 Thread Ananyev, Konstantin


> -----Original Message-----
> From: Olivier Matz [mailto:olivier.matz at 6wind.com]
> Sent: Thursday, July 30, 2015 5:22 PM
> To: dev at dpdk.org
> Cc: Ananyev, Konstantin; olivier.matz at 6wind.com; Zhang, Helin; 
> martin.weiser at allegro-packets.com; thomas.monjalon at 6wind.com
> Subject: [PATCH v2] mbuf: enforce alignment of mbuf private area
> 
> It looks better to have a data buffer address that is aligned to
> 8 bytes. This is the case when there is no mbuf private area, but
> if there is one, the alignment depends on the size of this area
> that is located between the mbuf structure and the data buffer.
> 
> Indeed, some drivers expect the buffer address to be aligned
> to an even address, and moreover an unaligned buffer may impact
> the performance when accessing network headers.
> 
> Add a check in rte_pktmbuf_pool_create() to verify the alignment
> constraint before creating the mempool. For applications that use
> the alternative way (direct call to rte_mempool_create), also
> add an assertion in rte_pktmbuf_init().
> 
> By the way, also add the MBUF log type.
> 
> Signed-off-by: Olivier Matz 
> ---

Acked-by: Konstantin Ananyev
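
As a rough sketch of the kind of check the commit message describes (this is
not the code from the patch): `PRIV_ALIGN`, `check_priv_size()` and
`data_offset()` are hypothetical names, and the value 8 is taken from the
"aligned to 8 bytes" wording above. The first helper models the check done
before creating the pool, the second the assertion for callers that bypass it.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed alignment, from the "aligned to 8 bytes" wording of the patch. */
#define PRIV_ALIGN 8u

/* Returns 0 if priv_size keeps the data buffer aligned, -1 otherwise. */
static int check_priv_size(uint16_t priv_size)
{
	return (priv_size % PRIV_ALIGN == 0) ? 0 : -1;
}

/*
 * Offset of the data buffer from the start of the object: the mbuf
 * structure first, then the private area. The assertion models the
 * fallback check for pools created without the helper above.
 */
static uint32_t data_offset(uint32_t mbuf_hdr_size, uint16_t priv_size)
{
	assert(check_priv_size(priv_size) == 0);
	return mbuf_hdr_size + priv_size;
}
```

A private area of 3 bytes would be rejected here, which is exactly the odd
offset that leads to an odd DMA address in the i40e descriptor done report.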