[dpdk-dev] [PATCH] i40e: fix the issue of wrongly reporting descriptor done
The header buffer address for header split is filled with the physical address for DMA, which is not needed at all, as header split is not supported. Hardware requires that the least significant bit of the header address, which is the 'Descriptor Done' (DD) bit on write-back, be set to 0 by the driver. The issue is that if the user reserves an odd number of bytes between the mbuf header and the data buffer, the physical address filled into the descriptor is odd. That means the DD bit is set to non-zero by the driver, which results in descriptor done being reported wrongly.

Signed-off-by: Helin Zhang
---
 drivers/net/i40e/i40e_rxtx.c | 8 ++++----
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/drivers/net/i40e/i40e_rxtx.c b/drivers/net/i40e/i40e_rxtx.c
index 891a221..a267b4d 100644
--- a/drivers/net/i40e/i40e_rxtx.c
+++ b/drivers/net/i40e/i40e_rxtx.c
@@ -1367,7 +1367,7 @@ i40e_rx_alloc_bufs(struct i40e_rx_queue *rxq)
 		mb->port = rxq->port_id;
 		dma_addr = rte_cpu_to_le_64(\
 			RTE_MBUF_DATA_DMA_ADDR_DEFAULT(mb));
-		rxdp[i].read.hdr_addr = dma_addr;
+		rxdp[i].read.hdr_addr = 0;
 		rxdp[i].read.pkt_addr = dma_addr;
 	}

@@ -1514,7 +1514,7 @@ i40e_recv_pkts(void *rx_queue, struct rte_mbuf **rx_pkts, uint16_t nb_pkts)
 		rxe->mbuf = nmb;
 		dma_addr =
 			rte_cpu_to_le_64(RTE_MBUF_DATA_DMA_ADDR_DEFAULT(nmb));
-		rxdp->read.hdr_addr = dma_addr;
+		rxdp->read.hdr_addr = 0;
 		rxdp->read.pkt_addr = dma_addr;

 		rx_packet_len = ((qword1 & I40E_RXD_QW1_LENGTH_PBUF_MASK) >>
@@ -1640,7 +1640,7 @@ i40e_recv_scattered_pkts(void *rx_queue,
 			rte_cpu_to_le_64(RTE_MBUF_DATA_DMA_ADDR_DEFAULT(nmb));

 		/* Set data buffer address and data length of the mbuf */
-		rxdp->read.hdr_addr = dma_addr;
+		rxdp->read.hdr_addr = 0;
 		rxdp->read.pkt_addr = dma_addr;
 		rx_packet_len = (qword1 & I40E_RXD_QW1_LENGTH_PBUF_MASK) >>
 					I40E_RXD_QW1_LENGTH_PBUF_SHIFT;
@@ -3047,7 +3047,7 @@ i40e_alloc_rx_queue_mbufs(struct i40e_rx_queue *rxq)

 		rxd = &rxq->rx_ring[i];
 		rxd->read.pkt_addr = dma_addr;
-		rxd->read.hdr_addr = dma_addr;
+		rxd->read.hdr_addr = 0;
 #ifndef RTE_LIBRTE_I40E_16BYTE_RX_DESC
 		rxd->read.rsvd1 = 0;
 		rxd->read.rsvd2 = 0;
--
1.9.3
[dpdk-dev] [PATCH] testpmd: Fix wrong message in testpmd
2015-06-24 15:56, Michael Qiu:
> When closing one port twice, testpmd gives out a wrong message.
>
> testpmd> port stop 0
> Stopping ports...
> Checking link statuses...
> Port 0 Link Up - speed 0 Mbps - full-duplex
> Port 1 Link Up - speed 0 Mbps - full-duplex
> Done
> testpmd> port close 0
> Closing ports...
> Done
> testpmd> port close 0
> Closing ports...
> Port 0 is now not stopped
> Done
> testpmd>
>
> Signed-off-by: Michael Qiu

Applied, thanks
[dpdk-dev] [PATCH v3] i40evf: fix crash when setup tx queues on vf port
> This patch fixes the issue:
> Testpmd crashed with Segmentation fault when setup tx queues on vf
> Steps for reproduce:
> - create one vf device from i40e driver
> - bind vf device to igb_uio and start testpmd
>
> With debugging tools, we saw the struct i40e_vf is cleared after
> memcpy(&dev->data->dev_conf, dev_conf, sizeof(dev->data->dev_conf)) in
> rte_eth_dev_configure, which should not happen, and the pointer to
> i40e_vf isn't in the range of i40e_adapter.
>
> The root cause is the dev_private_size in i40e virtual function driver struct
> rte_i40evf_pmd was set incorrectly.
>
> Signed-off-by: Jingjing Wu

Applied, thanks

Does it mean that Tx with i40evf never worked before?
[dpdk-dev] [PATCHv2 0/2] ixgbe: Two fixes for RX scatter functions.
> Acked-by: Wenzhuo Lu

Applied, thanks
[dpdk-dev] [PATCH v1 1/1] ixgbe: Fix oerrors by setting it to 0
> > Fix afebc86be1346136125af8026dc215f81c202c50. oerrors was txdgpc -
> > hw_stats->gptc, txdgpc is the number of packets DMA'ed by the host
> > and was being reset on every call to read stats so it could be < gptc.
> > Because we currently have no way to add txdgpc to struct hw_stats so
> > that we can maintain a persistent value per port oerrors has now been
> > set to 0. References to txdgpc is now removed as we don't use it. This
> > patch also removes rxnfgpc as it's not used anywhere.
> >
> > Signed-off-by: Maryam Tahhan
>
> Acked-by: Konstantin Ananyev

Applied, thanks

It's a bit sad.
Is it a consequence of forbidding updates in the base driver?
[dpdk-dev] [PATCH v4] ixgbe: fix data access on big endian cpu.
> > 1. cpu use data owned by ixgbe must use rte_le_to_cpu_xx(...)
> > 2. cpu fill data to ixgbe must use rte_cpu_to_le_xx(...)
> > 3. checking pci status with converted constant
> >
> > Signed-off-by: Xuelin Shi
>
> Acked-by: Konstantin Ananyev

Applied without added blank lines, thanks
[dpdk-dev] [PATCH v2] Make the thash library arch-independent
2015-07-29 09:56, Vladimir Medvedkin:
> v2 changes
> - Fix SSE to SSE3 typo
> - remove unnecessary comments
> - Leave unaligned union rte_thash_tuple if no support for SSE3
> - Makes 32bit compiler happy by adding ULL suffix
>
> Signed-off-by: Vladimir Medvedkin

Applied, thanks
[dpdk-dev] [PATCH] eal: fix build
2015-07-29 17:08, Thomas Monjalon:
> 2015-07-29 15:00, Zhang, Helin:
> > /home/hzhan75/r22/isg_cid-dpdk_org/lib/librte_eal/common/eal_common_pci.c:
> > In function 'rte_eal_pci_probe_one_driver':
> > /home/hzhan75/r22/isg_cid-dpdk_org/lib/librte_eal/common/eal_common_pci.c:188:4:
> > error: implicit declaration of function 'pci_config_space_set'
> > [-Werror=implicit-function-declaration]
> >     pci_config_space_set(dev);
> >     ^
> > /home/hzhan75/r22/isg_cid-dpdk_org/lib/librte_eal/common/eal_common_pci.c:188:4:
> > error: nested extern declaration of 'pci_config_space_set'
> > [-Werror=nested-externs]
> > cc1: all warnings being treated as errors
> > /home/hzhan75/r22/isg_cid-dpdk_org/lib/librte_eal/linuxapp/eal/eal_pci.c:561:1:
> > error: 'pci_config_space_set' defined but not used
> > [-Werror=unused-function]
> >  pci_config_space_set(struct rte_pci_device *dev)
> >  ^
> > cc1: all warnings being treated as errors
>
> So I will change the title to:
> 	eal: fix build with pci config enabled
>
> and add this into the message:
> 	Build log:
> 	lib/librte_eal/common/eal_common_pci.c:188:4: error:
> 	implicit declaration of function pci_config_space_set
>
> > > > 2015-07-29 06:48, Helin Zhang:
> > > > It fixes the build error of implicit declaration of function.

Applied, thanks
[dpdk-dev] [PATCH v3] i40evf: fix crash when setup tx queues on vf port
> -----Original Message-----
> From: Thomas Monjalon [mailto:thomas.monjalon at 6wind.com]
> Sent: Thursday, July 30, 2015 6:33 AM
> To: Wu, Jingjing
> Cc: dev at dpdk.org
> Subject: Re: [dpdk-dev] [PATCH v3] i40evf: fix crash when setup tx queues
> on vf port
>
> > This patch fixes the issue:
> > Testpmd crashed with Segmentation fault when setup tx queues on vf
> > Steps for reproduce:
> > - create one vf device from i40e driver
> > - bind vf device to igb_uio and start testpmd
> >
> > With debugging tools, we saw the struct i40e_vf is cleared after
> > memcpy(&dev->data->dev_conf, dev_conf, sizeof(dev->data->dev_conf)) in
> > rte_eth_dev_configure, which should not happen, and the pointer to
> > i40e_vf isn't in the range of i40e_adapter.
> >
> > The root cause is the dev_private_size in i40e virtual function driver
> > struct rte_i40evf_pmd was set incorrectly.
> >
> > Signed-off-by: Jingjing Wu
>
> Applied, thanks
>
> Does it mean that Tx with i40evf never worked before?

Actually, we didn't see a crash with the previous version; i40evf Tx worked before, which surprises me. Maybe we were just lucky.
[dpdk-dev] [PATCH 2/2] virtio: allow running w/o vlan filtering
I have comments for that. Pls see below.

> -----Original Message-----
> From: Thomas Monjalon [mailto:thomas.monjalon at 6wind.com]
> Sent: Wednesday, July 29, 2015 8:57 PM
> To: Ouyang, Changchun
> Cc: dev at dpdk.org; Stephen Hemminger
> Subject: Re: [dpdk-dev] [PATCH 2/2] virtio: allow running w/o vlan filtering
>
> Back on this old patch, it seems justified but nobody agreed.
>
> --- a/lib/librte_pmd_virtio/virtio_ethdev.c
> +++ b/lib/librte_pmd_virtio/virtio_ethdev.c
> @@ -1288,7 +1288,6 @@ virtio_dev_configure(struct rte_eth_dev *dev)
>  	    && !vtpci_with_feature(hw, VIRTIO_NET_F_CTRL_VLAN)) {
>  		PMD_DRV_LOG(NOTICE,
>  			    "vlan filtering not available on this host");
> -		return -ENOTSUP;
>  	}
>
> 2015-03-06 08:24, Stephen Hemminger:
> > "Ouyang, Changchun" wrote:
> > > > From: Stephen Hemminger
> > > > Vlan filtering is an option, and not a requirement.
> > > > If host does not support filtering then it can be done in software.

Yes, vlan filter is an option, but currently the virtio driver has no software solution for vlan filter. So I would like to disable hw_vlan_filter in rxmode if the dev can't really support it, rather than removing the return there.

> > > The question is that guest only send command, no real action to do the
> > > vlan filter.
> > > So if both host and guest have no real action for vlan filter, who will
> > > do it?
> >
> > The virtio driver has features.
> > Guest can not send commands to host where feature bit not enabled.
> > Application can call filter_set and check if filter worked or not.
> >
> > Our code already had to do MAC and VLAN validation of incoming packets

There is vlan strip, but no vlan filter, in the rx function.

> > therefore if hardware can't do vlan match, there is no problem.
> > I would expect other applications would do the same thing.
> >
> > Failing during configuration is bad. DPDK API should never force
> > application to play "guess the working configuration" with the device
> > driver or do string match on "which device is this anyway"
[dpdk-dev] [PATCH v2] doc: announce abi change for interrupt mode
The patch announces the planned ABI changes for interrupt mode on v2.2.

Signed-off-by: Cunming Liang
---
v2 change:
 - rebase to recent master

 doc/guides/rel_notes/deprecation.rst | 8 ++++++++
 1 file changed, 8 insertions(+)

diff --git a/doc/guides/rel_notes/deprecation.rst b/doc/guides/rel_notes/deprecation.rst
index 5330d3b..645ce32 100644
--- a/doc/guides/rel_notes/deprecation.rst
+++ b/doc/guides/rel_notes/deprecation.rst
@@ -35,3 +35,11 @@ Deprecation Notices
 * The following fields have been deprecated in rte_eth_stats:
   imissed, ibadcrc, ibadlen, imcasts, fdirmatch, fdirmiss,
   tx_pause_xon, rx_pause_xon, tx_pause_xoff, rx_pause_xoff
+
+* The ABI changes are planned for struct rte_intr_handle, struct rte_eth_conf
+  and struct eth_dev_ops in order to support interrupt mode feature.
+  The upcoming release 2.1 will not contain these ABI changes by default.
+  This change will be in release 2.2. There's no backwards compatibility planed
+  due to the additional interrupt mode feature enabling.
+  Binaries using this library build prior to version 2.2 will require updating
+  and recompilation.
--
1.8.1.4
[dpdk-dev] Why only rx queue "0" can receive network packet by i40e NIC
Hi Helin,

We do not want RSS to include L4 ports in the hash because packet fragments would get routed to queue #0 and would be more difficult to work with. We are using the model where multiple CPUs are pulling from the NIC queues independently with no shared state, so each 'pipeline' has private fragment reassembly state for the sessions it is managing.

Getting RSS Toeplitz hash to work on { source_ip, dest_ip } tuples only using a symmetric rss-key is important. This works properly with all other Intel NICs in the DPDK thus far that we have tested until the i40E PMD with the Intel X710-DA4. The Microsoft RSS specification allows for this.

With the i40E PMD, we have been unsuccessful at enabling this RSS configuration. From the source code and XL710 controller datasheet, we cannot find any reference to the flags for this RSS mode. Unless we can achieve feature parity with the other Intel NICs, we don't want to write special case code for this one driver which makes the XL710 controller unusable for us and seems contrary to the intent of the DPDK APIs which are abstracting this behavior.

Do you have any suggestions?

Thanks kindly,
Jeff

-----Original Message-----
From: Zhang, Helin [mailto:helin.zh...@intel.com]
Sent: Wednesday, July 22, 2015 5:56 PM
To: Jeff Venable, Sr. ; lhffjzh ; 'Thomas Monjalon'
Cc: dev at dpdk.org
Subject: RE: [dpdk-dev] Why only rx queue "0" can receive network packet by i40e NIC

> -----Original Message-----
> From: Jeff Venable, Sr. [mailto:jeff at vectranetworks.com]
> Sent: Wednesday, July 22, 2015 5:47 PM
> To: Zhang, Helin; lhffjzh; 'Thomas Monjalon'
> Cc: dev at dpdk.org
> Subject: RE: [dpdk-dev] Why only rx queue "0" can receive network
> packet by i40e NIC
>
> Is the I40E incapable of operating RSS with ETH_RSS_IP (i.e. hashing
> without L4 ports)?

Why do you think like this? Sorry, I am a bit confused.
ETH_RSS_IP is a super set of all IP based rss types. Please see the rss types listed in rte_ethdev.h.
The supports rss types of each NIC can be queried via 'struct rte_eth_dev_info' of field 'flow_type_rss_offloads'.

Regards,
Helin

>
> Thanks,
>
> Jeff
>
> -----Original Message-----
> From: dev [mailto:dev-bounces at dpdk.org] On Behalf Of Zhang, Helin
> Sent: Saturday, February 28, 2015 6:34 AM
> To: lhffjzh; 'Thomas Monjalon'
> Cc: dev at dpdk.org; maintainers at dpdk.org
> Subject: Re: [dpdk-dev] Why only rx queue "0" can receive network
> packet by i40e NIC
>
> Good to know that!
>
> > -----Original Message-----
> > From: lhffjzh [mailto:lhffjzh at 126.com]
> > Sent: Saturday, February 28, 2015 12:34 PM
> > To: Zhang, Helin; 'Thomas Monjalon'
> > Cc: dev at dpdk.org; maintainers at dpdk.org
> > Subject: RE: [dpdk-dev] Why only rx queue "0" can receive network
> > packet by i40e NIC
> >
> > Hi Helin,
> >
> > Thanks a lot for your great help, all of rx queue received network
> > packet after I update rss_hf from "ETH_RSS_IP" to " ETH_RSS_PROTO_MASK ".
> >
> > static struct rte_eth_conf port_conf = {
> >         .rxmode = {
> >                 .mq_mode = ETH_MQ_RX_RSS,
> >                 .max_rx_pkt_len = ETHER_MAX_LEN,
> >                 .split_hdr_size = 0,
> >                 .header_split   = 0, /**< Header Split disabled */
> >                 .hw_ip_checksum = 1, /**< IP checksum offload enabled */
> >                 .hw_vlan_filter = 0, /**< VLAN filtering disabled */
> >                 .jumbo_frame    = 0, /**< Jumbo Frame Support disabled */
> >                 .hw_strip_crc   = 0, /**< CRC stripped by hardware */
> >         },
> >         .rx_adv_conf = {
> >                 .rss_conf = {
> >                         .rss_key = NULL,
> >                         .rss_hf = ETH_RSS_PROTO_MASK,
> >                 },
> >         },
> >         .txmode = {
> >                 .mq_mode = ETH_MQ_TX_NONE,
> >         },
> >         .fdir_conf.mode = RTE_FDIR_MODE_SIGNATURE,
> > };
> >
> >
> > Regards,
> > Haifeng
> >
> > -----Original Message-----
> > From: Zhang, Helin [mailto:helin.zhang at intel.com]
> > Sent: Saturday, February 28, 2015 11:18 AM
> > To: lhffjzh; 'Thomas Monjalon'
> > Cc: dev at dpdk.org; maintainers at dpdk.org
> > Subject: RE: [dpdk-dev] Why only rx queue "0" can receive network
> > packet by i40e NIC
> >
> > Hi Haifeng
> >
> > > -----Original Message-----
> > > From: dev [mailto:dev-bounces at dpdk.org] On Behalf Of lhffjzh
> > > Sent: Saturday, February 28, 2015 9:48 AM
> > > To: 'Thomas Monjalon'
> > > Cc: dev at dpdk.org; maintainers at dpdk.org
> > > Subject: Re: [dpdk-dev] Why only rx queue "0" can receive network
> > > packet by i40e NIC
> > >
> > > Hi Thomas,
> > >
> > > Thanks very much for your reminder, you give me many help in this
> > > mail list.
> > >
> > > The issue with detailed information just as below. but I don't
> > > know who is the dpdk i40e maintainers? is maintainers at dpdk.org?
> > >
> > > Hardware list:
> > > 2 i40e 40G NICs
> > > Xeon E5-2670 v2(10 cores)
> > > 32G memory
> > >
> > > I loopback 2 i40e NICs by QSFP cable, one NIC send UDP network
> > > packet by DPDK
[dpdk-dev] [PATCH] e1000: fix the issue of wrongly reporting descriptor done
The header buffer address for header split is filled with the physical address for DMA, which is not needed at all, as header split is not supported. Hardware requires that the least significant bit of the header address, which is the 'Descriptor Done' (DD) bit on write-back, be set to 0 by the driver. The issue is that if the user reserves an odd number of bytes between the mbuf header and the data buffer, the physical address filled into the descriptor is odd. That means the DD bit is set to non-zero by the driver, which results in descriptor done being reported wrongly.

Signed-off-by: Wenzhuo Lu
---
 drivers/net/e1000/igb_rxtx.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/drivers/net/e1000/igb_rxtx.c b/drivers/net/e1000/igb_rxtx.c
index 3a31b21..b13930e 100644
--- a/drivers/net/e1000/igb_rxtx.c
+++ b/drivers/net/e1000/igb_rxtx.c
@@ -851,7 +851,7 @@ eth_igb_recv_pkts(void *rx_queue, struct rte_mbuf **rx_pkts,
 		rxe->mbuf = nmb;
 		dma_addr =
 			rte_cpu_to_le_64(RTE_MBUF_DATA_DMA_ADDR_DEFAULT(nmb));
-		rxdp->read.hdr_addr = dma_addr;
+		rxdp->read.hdr_addr = 0;
 		rxdp->read.pkt_addr = dma_addr;

 		/*
@@ -1040,7 +1040,7 @@ eth_igb_recv_scattered_pkts(void *rx_queue, struct rte_mbuf **rx_pkts,
 		rxe->mbuf = nmb;
 		dma = rte_cpu_to_le_64(RTE_MBUF_DATA_DMA_ADDR_DEFAULT(nmb));
 		rxdp->read.pkt_addr = dma;
-		rxdp->read.hdr_addr = dma;
+		rxdp->read.hdr_addr = 0;

 		/*
 		 * Set data length & data buffer address of mbuf.
@@ -1990,7 +1990,7 @@ igb_alloc_rx_queue_mbufs(struct igb_rx_queue *rxq)
 		dma_addr =
 			rte_cpu_to_le_64(RTE_MBUF_DATA_DMA_ADDR_DEFAULT(mbuf));
 		rxd = &rxq->rx_ring[i];
-		rxd->read.hdr_addr = dma_addr;
+		rxd->read.hdr_addr = 0;
 		rxd->read.pkt_addr = dma_addr;
 		rxe[i].mbuf = mbuf;
 	}
--
1.9.3
[dpdk-dev] [PATCH v2] lpm: fix extended flag check when adding a "depth small" entry
When adding a "depth small" entry, if the extended flag of the tbl24 entry is not set and the new depth is smaller than the one in the tbl24, nothing should be done; otherwise the code would operate on the wrong memory area.

Signed-off-by: Zhe Tao
---
PATCH v2: Edit to keep line size below 80 characters

PATCH v1: Fix extended flag check when adding a "depth small" entry

 lib/librte_lpm/rte_lpm.c | 54 ++++++++++++++++++++++++++++++------------------
 1 file changed, 30 insertions(+), 24 deletions(-)

diff --git a/lib/librte_lpm/rte_lpm.c b/lib/librte_lpm/rte_lpm.c
index de05307..163ba3c 100644
--- a/lib/librte_lpm/rte_lpm.c
+++ b/lib/librte_lpm/rte_lpm.c
@@ -442,35 +442,41 @@ add_depth_small(struct rte_lpm *lpm, uint32_t ip, uint8_t depth,
 			};

 			/* Setting tbl24 entry in one go to avoid race
-			 * conditions */
+			 * conditions
+			 */
 			lpm->tbl24[i] = new_tbl24_entry;

 			continue;
 		}

-		/* If tbl24 entry is valid and extended calculate the index
-		 * into tbl8. */
-		tbl8_index = lpm->tbl24[i].tbl8_gindex *
-				RTE_LPM_TBL8_GROUP_NUM_ENTRIES;
-		tbl8_group_end = tbl8_index + RTE_LPM_TBL8_GROUP_NUM_ENTRIES;
-
-		for (j = tbl8_index; j < tbl8_group_end; j++) {
-			if (!lpm->tbl8[j].valid ||
-					lpm->tbl8[j].depth <= depth) {
-				struct rte_lpm_tbl8_entry new_tbl8_entry = {
-					.valid = VALID,
-					.valid_group = VALID,
-					.depth = depth,
-					.next_hop = next_hop,
-				};
-
-				/*
-				 * Setting tbl8 entry in one go to avoid race
-				 * conditions
-				 */
-				lpm->tbl8[j] = new_tbl8_entry;
-
-				continue;
+		if (lpm->tbl24[i].ext_entry == 1) {
+			/* If tbl24 entry is valid and extended calculate the
+			 * index into tbl8.
+			 */
+			tbl8_index = lpm->tbl24[i].tbl8_gindex *
+					RTE_LPM_TBL8_GROUP_NUM_ENTRIES;
+			tbl8_group_end = tbl8_index +
+					RTE_LPM_TBL8_GROUP_NUM_ENTRIES;
+
+			for (j = tbl8_index; j < tbl8_group_end; j++) {
+				if (!lpm->tbl8[j].valid ||
+						lpm->tbl8[j].depth <= depth) {
+					struct rte_lpm_tbl8_entry
+						new_tbl8_entry = {
+						.valid = VALID,
+						.valid_group = VALID,
+						.depth = depth,
+						.next_hop = next_hop,
+					};
+
+					/*
+					 * Setting tbl8 entry in one go to avoid
+					 * race conditions
+					 */
+					lpm->tbl8[j] = new_tbl8_entry;
+
+					continue;
+				}
 			}
 		}
 	}
--
1.9.3
[dpdk-dev] [PATCH] doc: announce ABI change for rte_eth_fdir_filter
> -----Original Message-----
> From: dev [mailto:dev-bounces at dpdk.org] On Behalf Of Jingjing Wu
> Sent: Monday, July 20, 2015 3:04 PM
> To: dev at dpdk.org
> Subject: [dpdk-dev] [PATCH] doc: announce ABI change for rte_eth_fdir_filter
>
> To fix the FVL's flow director issue for SCTP flow, rte_eth_fdir_filter
> need to be change to support SCTP flow keys extension. Here announce
> the ABI deprecation.
>
> Signed-off-by: jingjing.wu
> ---
>  doc/guides/rel_notes/deprecation.rst | 4 ++++
>  1 file changed, 4 insertions(+)
>
> diff --git a/doc/guides/rel_notes/deprecation.rst
> b/doc/guides/rel_notes/deprecation.rst
> index 5330d3b..63e19c7 100644
> --- a/doc/guides/rel_notes/deprecation.rst
> +++ b/doc/guides/rel_notes/deprecation.rst
> @@ -35,3 +35,7 @@ Deprecation Notices
>  * The following fields have been deprecated in rte_eth_stats:
>    imissed, ibadcrc, ibadlen, imcasts, fdirmatch, fdirmiss,
>    tx_pause_xon, rx_pause_xon, tx_pause_xoff, rx_pause_xoff
> +
> +* Significant ABI change is planned for struct rte_eth_fdir_filter to extend
> +  the SCTP flow's key input from release 2.1. The change may be enabled in
> +  the upcoming release 2.1 with CONFIG_RTE_NEXT_ABI.
> --
> 2.4.0

Acked-by: Cunming Liang
[dpdk-dev] [PATCH v2] lpm: fix extended flag check when adding a "depth small" entry
> -----Original Message-----
> From: Tao, Zhe
> Sent: Thursday, July 30, 2015 11:19 AM
> To: dev at dpdk.org
> Cc: Tao, Zhe; Liang, Cunming; Richardson, Bruce
> Subject: [dpdk-dev][PATCH v2] lpm: fix extended flag check when adding a
> "depth small" entry
>
> When adding a "depth small" entry, if its extended flag is not set and
> its depth is smaller than the one in the tbl24, nothing should be done
> otherwise will operate on the wrong memory area.
>
> Signed-off-by: Zhe Tao
> ---
> PATCH v2: Edit to keep line size below 80 characters
>
> PATCH v1: Fix extended flag check when adding a "depth small" entry
>
>  lib/librte_lpm/rte_lpm.c | 54 ++++++++++++++++++++++++++++++------------------
>  1 file changed, 30 insertions(+), 24 deletions(-)
>
> diff --git a/lib/librte_lpm/rte_lpm.c b/lib/librte_lpm/rte_lpm.c
> index de05307..163ba3c 100644
> --- a/lib/librte_lpm/rte_lpm.c
> +++ b/lib/librte_lpm/rte_lpm.c
> @@ -442,35 +442,41 @@ add_depth_small(struct rte_lpm *lpm, uint32_t ip,
> uint8_t depth,
>  			};
>
>  			/* Setting tbl24 entry in one go to avoid race
> -			 * conditions */
> +			 * conditions
> +			 */
>  			lpm->tbl24[i] = new_tbl24_entry;
>
>  			continue;
>  		}
>
> -		/* If tbl24 entry is valid and extended calculate the index
> -		 * into tbl8. */
> -		tbl8_index = lpm->tbl24[i].tbl8_gindex *
> -				RTE_LPM_TBL8_GROUP_NUM_ENTRIES;
> -		tbl8_group_end = tbl8_index +
> RTE_LPM_TBL8_GROUP_NUM_ENTRIES;
> -
> -		for (j = tbl8_index; j < tbl8_group_end; j++) {
> -			if (!lpm->tbl8[j].valid ||
> -					lpm->tbl8[j].depth <= depth) {
> -				struct rte_lpm_tbl8_entry new_tbl8_entry = {
> -					.valid = VALID,
> -					.valid_group = VALID,
> -					.depth = depth,
> -					.next_hop = next_hop,
> -				};
> -
> -				/*
> -				 * Setting tbl8 entry in one go to avoid race
> -				 * conditions
> -				 */
> -				lpm->tbl8[j] = new_tbl8_entry;
> -
> -				continue;
> +		if (lpm->tbl24[i].ext_entry == 1) {
> +			/* If tbl24 entry is valid and extended calculate the
> +			 * index into tbl8.
> +			 */
> +			tbl8_index = lpm->tbl24[i].tbl8_gindex *
> +					RTE_LPM_TBL8_GROUP_NUM_ENTRIES;
> +			tbl8_group_end = tbl8_index +
> +					RTE_LPM_TBL8_GROUP_NUM_ENTRIES;
> +
> +			for (j = tbl8_index; j < tbl8_group_end; j++) {
> +				if (!lpm->tbl8[j].valid ||
> +						lpm->tbl8[j].depth <= depth) {
> +					struct rte_lpm_tbl8_entry
> +						new_tbl8_entry = {
> +						.valid = VALID,
> +						.valid_group = VALID,
> +						.depth = depth,
> +						.next_hop = next_hop,
> +					};
> +
> +					/*
> +					 * Setting tbl8 entry in one go to avoid
> +					 * race conditions
> +					 */
> +					lpm->tbl8[j] = new_tbl8_entry;
> +
> +					continue;
> +				}
>  			}
>  		}
>  	}
> --
> 1.9.3

Acked-by: Cunming Liang
[dpdk-dev] [PATCH v3] doc: announce abi change for interrupt mode
The patch announces the planned ABI changes for interrupt mode.

Signed-off-by: Cunming Liang
---
v3 change:
 - reword for CONFIG_RTE_NEXT_ABI

v2 change:
 - rebase to recent master

 doc/guides/rel_notes/deprecation.rst | 5 +++++
 1 file changed, 5 insertions(+)

diff --git a/doc/guides/rel_notes/deprecation.rst b/doc/guides/rel_notes/deprecation.rst
index 5330d3b..d36d267 100644
--- a/doc/guides/rel_notes/deprecation.rst
+++ b/doc/guides/rel_notes/deprecation.rst
@@ -35,3 +35,8 @@ Deprecation Notices
 * The following fields have been deprecated in rte_eth_stats:
   imissed, ibadcrc, ibadlen, imcasts, fdirmatch, fdirmiss,
   tx_pause_xon, rx_pause_xon, tx_pause_xoff, rx_pause_xoff
+
+* The ABI changes are planned for struct rte_intr_handle, struct rte_eth_conf
+  and struct eth_dev_ops to support interrupt mode feature from release 2.1.
+  Those changes may be enabled in the upcoming release 2.1
+  with CONFIG_RTE_NEXT_ABI.
--
1.8.1.4
[dpdk-dev] [PATCH v3] doc: announce abi change for interrupt mode
Acked-by: Marvin Liu

> -----Original Message-----
> From: dev [mailto:dev-bounces at dpdk.org] On Behalf Of Cunming Liang
> Sent: Thursday, July 30, 2015 1:05 PM
> To: dev at dpdk.org
> Subject: [dpdk-dev] [PATCH v3] doc: announce abi change for interrupt mode
>
> The patch announces the planned ABI changes for interrupt mode.
>
> Signed-off-by: Cunming Liang
> ---
> v3 change:
>  - reword for CONFIG_RTE_NEXT_ABI
>
> v2 change:
>  - rebase to recent master
>
>  doc/guides/rel_notes/deprecation.rst | 5 +++++
>  1 file changed, 5 insertions(+)
>
> diff --git a/doc/guides/rel_notes/deprecation.rst
> b/doc/guides/rel_notes/deprecation.rst
> index 5330d3b..d36d267 100644
> --- a/doc/guides/rel_notes/deprecation.rst
> +++ b/doc/guides/rel_notes/deprecation.rst
> @@ -35,3 +35,8 @@ Deprecation Notices
>  * The following fields have been deprecated in rte_eth_stats:
>    imissed, ibadcrc, ibadlen, imcasts, fdirmatch, fdirmiss,
>    tx_pause_xon, rx_pause_xon, tx_pause_xoff, rx_pause_xoff
> +
> +* The ABI changes are planned for struct rte_intr_handle, struct
> rte_eth_conf
> +  and struct eth_dev_ops to support interrupt mode feature from release
> 2.1.
> +  Those changes may be enabled in the upcoming release 2.1
> +  with CONFIG_RTE_NEXT_ABI.
> --
> 1.8.1.4
[dpdk-dev] [PATCH] app test: fix mempool cache_size not match limited cache_size
> -----Original Message-----
> From: dev [mailto:dev-bounces at dpdk.org] On Behalf Of Yong Liu
> Sent: Wednesday, July 29, 2015 11:22 AM
> To: dev at dpdk.org
> Subject: [dpdk-dev] [PATCH] app test: fix mempool cache_size not match
> limited cache_size
>
> From: Marvin Liu
>
> In previous setting, mempool size and cache_size are both 32.
> This is not satisfied with cache_size checking rule by now.
> Cache size should less than CONFIG_RTE_MEMPOOL_CACHE_MAX_SIZE and
> mempool size / 1.5.
>
> Signed-off-by: Marvin Liu
>

Acked-by: Jingjing Wu

> diff --git a/app/test/test_sched.c b/app/test/test_sched.c index
> 1ef6910..7a38db3 100644
> --- a/app/test/test_sched.c
> +++ b/app/test/test_sched.c
> @@ -87,7 +87,7 @@ static struct rte_sched_port_params port_param = {
>
>  #define NB_MBUF          32
>  #define MBUF_DATA_SZ     (2048 + RTE_PKTMBUF_HEADROOM)
> -#define PKT_BURST_SZ     32
> +#define PKT_BURST_SZ     0
>  #define MEMPOOL_CACHE_SZ PKT_BURST_SZ
>  #define SOCKET           0
>
> --
> 1.9.3
[dpdk-dev] [PATCH] app test: fix eal --no-huge option should work with -m option
> -----Original Message-----
> From: dev [mailto:dev-bounces at dpdk.org] On Behalf Of Yong Liu
> Sent: Wednesday, July 29, 2015 12:38 PM
> To: dev at dpdk.org
> Subject: [dpdk-dev] [PATCH] app test: fix eal --no-huge option should work
> with -m option
>
> From: Marvin Liu
>
> '--no-huge' option now can workable with -m option.
> Unit test for eal flag should change pass criterion.
>
> Signed-off-by: Marvin Liu
>

Acked-by: Jingjing Wu

> diff --git a/app/test/test_eal_flags.c b/app/test/test_eal_flags.c index
> 0352f87..e6f7035 100644
> --- a/app/test/test_eal_flags.c
> +++ b/app/test/test_eal_flags.c
> @@ -748,8 +748,8 @@ test_no_hpet_flag(void)  }
>
>  /*
> - * Test that the app runs with --no-huge and doesn't run when either
> - * -m or --socket-mem are specified with --no-huge.
> + * Test that the app runs with --no-huge and doesn't run when
> + --socket-mem are
> + * specified with --no-huge.
>   */
>  static int
>  test_no_huge_flag(void)
> @@ -778,8 +778,8 @@ test_no_huge_flag(void)
>  		printf("Error - process did not run ok with --no-huge flag\n");
>  		return -1;
>  	}
> -	if (launch_proc(argv2) == 0) {
> -		printf("Error - process run ok with --no-huge and -m flags\n");
> +	if (launch_proc(argv2) != 0) {
> +		printf("Error - process did not run ok with --no-huge and -m
> +flags\n");
>  		return -1;
>  	}
>  #ifdef RTE_EXEC_ENV_BSDAPP
> --
> 1.9.3
[dpdk-dev] Issue with non-scattered rx in ixgbe and i40e when mbuf private area size is odd
Hi,

On 07/29/2015 10:24 PM, Zhang, Helin wrote:
> Hi Martin
>
> Thank you very much for the good catch!
>
> The similar situation in i40e, as explained by Konstantin.
> As header split hasn't been supported by DPDK till now. It would be better to
> put the header address in RX descriptor to 0.
> But in the future, during header split enabling. We may need to pay extra
> attention to that. As at least x710 datasheet said specifically as below.
> "The header address should be set by the software to an even number (word
> aligned address)". We may need to find a way to ensure that during
> mempool/mbuf allocation.

Indeed it would be good to force the priv_size to be aligned.

The priv_size could be aligned automatically in rte_pktmbuf_pool_create(). The only possible problem I can see is that it would break applications that access the data buffer by doing (sizeof(mbuf) + sizeof(priv)), which is probably not the best thing to do (I didn't find any applications like this in dpdk).

For applications that directly use rte_mempool_create() instead of rte_pktmbuf_pool_create(), we could add a check using an assert in rte_pktmbuf_init() and some words in the documentation.

The question is: what should be the proper alignment? I would say at least 8 bytes, but maybe cache_aligned is an option too.

Regards,
Olivier

>
> Regards,
> Helin
>
>> -----Original Message-----
>> From: Ananyev, Konstantin
>> Sent: Wednesday, July 29, 2015 11:12 AM
>> To: Martin Weiser; Zhang, Helin; olivier.matz at 6wind.com
>> Cc: dev at dpdk.org
>> Subject: RE: [dpdk-dev] Issue with non-scattered rx in ixgbe and i40e when
>> mbuf private area size is odd
>>
>> Hi Martin,
>>
>>> -----Original Message-----
>>> From: dev [mailto:dev-bounces at dpdk.org] On Behalf Of Martin Weiser
>>> Sent: Wednesday, July 29, 2015 4:07 PM
>>> To: Zhang, Helin; olivier.matz at 6wind.com
>>> Cc: dev at dpdk.org
>>> Subject: [dpdk-dev] Issue with non-scattered rx in ixgbe and i40e when
>>> mbuf private area size is odd
>>>
>>> Hi Helin, Hi Olivier,
>>>
>>> we are seeing an issue with the ixgbe and i40e drivers which we could
>>> track down to our setting of the private area size of the mbufs.
>>> The issue can be easily reproduced with the l2fwd example application
>>> when a small modification is done: just set the priv_size parameter in
>>> the call to the rte_pktmbuf_pool_create function to an odd number like
>>> 1. In our tests this causes every call to rte_eth_rx_burst to return
>>> 32 (which is the setting of nb_pkts) nonsense mbufs although no
>>> packets are received on the interface and the hardware counters do not
>>> report any received packets.
>>
>> From Niantic datasheet:
>>
>> "7.1.6.1 Advanced Receive Descriptors - Read Format
>> Table 7-15 lists the advanced receive descriptor programming by the
>> software. The ...
>> Packet Buffer Address (64)
>> This is the physical address of the packet buffer. The lowest bit is A0
>> (LSB of the address).
>> Header Buffer Address (64)
>> The physical address of the header buffer with the lowest bit being
>> Descriptor Done (DD).
>> When a packet spans in multiple descriptors, the header buffer address is
>> used only on the first descriptor. During the programming phase, software
>> must set the DD bit to zero (see the description of the DD bit in this
>> section). This means that header buffer addresses are always word aligned."
>>
>> Right now, in ixgbe PMD we always setup Packet Buffer Address (PBA) and
>> Header Buffer Address (HBA) to the same value:
>> buf_physaddr + RTE_PKTMBUF_HEADROOM
>> So when priv_size==1, DD bit in RXD is always set to one by SW itself, and
>> then SW considers that HW already done with it.
>> In other words, right now for ixgbe you can't use RX buffer that is not
>> aligned on word boundary.
>>
>> So the advice would be, right now - don't set priv_size to the odd value.
>> As we don't support split header feature anyway, I think we can fix it just
>> by always setting HBA in the RXD to zero.
>> Could you try the fix for ixgbe below?
>>
>> Same story with FVL, I believe.
>> Konstantin
>>
>>
>>> Interestingly this does not happen if we force the scattered rx path.
>>>
>>> I assume the drivers have some expectations regarding the alignment of
>>> the buf_addr in the mbuf and setting an odd private area size breaks
>>> this alignment in the rte_pktmbuf_init function. If this is the case
>>> then one possible fix might be to enforce an alignment on the private area
>>> size.
>>>
>>> Best regards,
>>> Martin
>>
>> diff --git a/drivers/net/ixgbe/ixgbe_rxtx.c b/drivers/net/ixgbe/ixgbe_rxtx.c
>> index a0c8847..94967c5 100644
>> --- a/drivers/net/ixgbe/ixgbe_rxtx.c
>> +++ b/drivers/net/ixgbe/ixgbe_rxtx.c
>> @@ -1183,7 +1183,7 @@ ixgbe_rx_alloc_bufs(struct ixgbe_rx_queue *rxq, bool
>> reset_mbuf)
>>
>>  	/* populate the descriptors */
>>  	dma_addr =
>> rte_cpu_to_le_64(RTE_MBUF_DATA_DMA_ADDR_DEFAULT(mb));
>> -
[dpdk-dev] [PATCH v10 0/3] deduplicate EAL common functions
Hi Thomas & Ravi,

On 07/27/2015 02:59 AM, Thomas Monjalon wrote:
> 2015-07-27 02:56, Thomas Monjalon:
>> v9 was a subset of previous deduplications by Ravi Kerur.
>> This v10 address the comments I've done on v9.
>>
>> Ravi Kerur (3):
>>    eal: deduplicate lcore initialization
>>    eal: deduplicate timer functions
>>    eal: deduplicate memory initialization
>
> Applied shortly to integrate this old pending cleanup in RC2.
>

When I try to compile DPDK for x86_x32-native-linuxapp-gcc, I
get the following compilation error:

  CC eal_common_timer.o
In file included from /usr/include/sys/sysctl.h:63:0,
                 from /home/matz/dpdk-pkg-cron/dpdk.org/lib/librte_eal/common/eal_common_timer.c:39:
/usr/include/bits/sysctl.h:19:3: error: #error "sysctl system call is unsupported in x32 kernel"
 # error "sysctl system call is unsupported in x32 kernel"
   ^

Removing the "#include <sys/sysctl.h>" line fixes the issue without
impacting the compilation. I think this include is not needed and
could be removed.
I can provide a patch if it's ok for you.

Regards,
Olivier
[dpdk-dev] [PATCH v3] doc: announce abi change for interrupt mode
Hi,

> -----Original Message-----
> From: dev [mailto:dev-bounces at dpdk.org] On Behalf Of Cunming Liang
> Sent: Thursday, July 30, 2015 1:05 PM
> To: dev at dpdk.org
> Subject: [dpdk-dev] [PATCH v3] doc: announce abi change for interrupt
> mode
>
> The patch announces the planned ABI changes for interrupt mode.
>
> Signed-off-by: Cunming Liang

Acked-by: Shaopeng He
[dpdk-dev] [PATCH] e1000: fix ieee1588 timestamp issue
Ieee1588 reads system time to set its timestamp. On 1G NICs, for example,
i350, system time is disabled by default. It means the ieee1588 timestamp
will always be 0.
This patch enables system time when ieee1588 is enabled.

Signed-off-by: Wenzhuo Lu
---
 drivers/net/e1000/igb_ethdev.c | 8 ++++++++
 1 file changed, 8 insertions(+)

diff --git a/drivers/net/e1000/igb_ethdev.c b/drivers/net/e1000/igb_ethdev.c
index 56734a3..8fb67ac 100644
--- a/drivers/net/e1000/igb_ethdev.c
+++ b/drivers/net/e1000/igb_ethdev.c
@@ -3898,11 +3898,19 @@ eth_igb_set_mc_addr_list(struct rte_eth_dev *dev,
 	return 0;
 }
 
+#define E1000_TSAUXC_DISABLE_SYSTIME 0x8000
+
 static int
 igb_timesync_enable(struct rte_eth_dev *dev)
 {
 	struct e1000_hw *hw = E1000_DEV_PRIVATE_TO_HW(dev->data->dev_private);
 	uint32_t tsync_ctl;
+	uint32_t tsauxc;
+
+	/* Enable system time for it isn't on by default. */
+	tsauxc = E1000_READ_REG(hw, E1000_TSAUXC);
+	tsauxc &= ~E1000_TSAUXC_DISABLE_SYSTIME;
+	E1000_WRITE_REG(hw, E1000_TSAUXC, tsauxc);
 
 	/* Start incrementing the register used to timestamp PTP packets. */
 	E1000_WRITE_REG(hw, E1000_TIMINCA, E1000_TIMINCA_INIT);
-- 
1.9.3
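Note: the patch above is a read-modify-write that clears a single "disable" bit and leaves the rest of the register alone. A minimal sketch of that pattern follows. It runs against an in-memory variable rather than a NIC; the names (fake_tsauxc, reg_read, reg_write, enable_systime) and the mask value are assumptions made for illustration and are not taken from the e1000 base driver or a datasheet.

```c
#include <assert.h>
#include <stdint.h>

/* Placeholder bit position, assumed for this sketch only. */
#define DISABLE_SYSTIME_MASK (UINT32_C(1) << 31)

/* Hypothetical in-memory stand-in for a device register. */
static uint32_t fake_tsauxc;

static uint32_t
reg_read(void)
{
	return fake_tsauxc;
}

static void
reg_write(uint32_t value)
{
	fake_tsauxc = value;
}

/* Clear only the disable bit; every other bit keeps its current value. */
static void
enable_systime(void)
{
	uint32_t v = reg_read();

	v &= ~DISABLE_SYSTIME_MASK;
	reg_write(v);
}
```

Calling enable_systime() twice gives the same result as calling it once, so it is harmless to run it every time timesync is enabled.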
[dpdk-dev] Issue with non-scattered rx in ixgbe and i40e when mbuf private area size is odd
Hi Olivier,

> -----Original Message-----
> From: Olivier MATZ [mailto:olivier.matz at 6wind.com]
> Sent: Thursday, July 30, 2015 9:12 AM
> To: Zhang, Helin; Ananyev, Konstantin; Martin Weiser
> Cc: dev at dpdk.org
> Subject: Re: [dpdk-dev] Issue with non-scattered rx in ixgbe and i40e when
> mbuf private area size is odd
>
> Hi,
>
> On 07/29/2015 10:24 PM, Zhang, Helin wrote:
> > Hi Martin
> >
> > Thank you very much for the good catch!
> >
> > The similar situation in i40e, as explained by Konstantin.
> > As header split hasn't been supported by DPDK till now. It would be better
> > to put the header address in RX descriptor to 0.
> > But in the future, during header split enabling. We may need to pay extra
> > attention to that. As at least x710 datasheet said specifically as below.
> > "The header address should be set by the software to an even number (word
> > aligned address)". We may need to find a way to ensure that during
> > mempool/mbuf allocation.
>
> Indeed it would be good to force the priv_size to be aligned.
>
> The priv_size could be aligned automatically in
> rte_pktmbuf_pool_create(). The only possible problem I could see
> is that it would break applications that access to the data buffer
> by doing (sizeof(mbuf) + sizeof(priv)), which is probably not the
> best thing to do (I didn't find any applications like this in dpdk).

Maybe just make rte_pktmbuf_pool_create() fail if the input
priv_size % MIN_ALIGN != 0?

>
> For applications that directly use rte_mempool_create() instead of
> rte_pktmbuf_pool_create(), we could add a check using an assert in
> rte_pktmbuf_init() and some words in the documentation.
>
> The question is: what should be the proper alignment? I would say
> at least 8 bytes, but maybe cache_aligned is an option too.

8 bytes seems enough to me.
Konstantin > > Regards, > Olivier > > > > > > Regards, > > Helin > > > >> -Original Message- > >> From: Ananyev, Konstantin > >> Sent: Wednesday, July 29, 2015 11:12 AM > >> To: Martin Weiser; Zhang, Helin; olivier.matz at 6wind.com > >> Cc: dev at dpdk.org > >> Subject: RE: [dpdk-dev] Issue with non-scattered rx in ixgbe and i40e when > >> mbuf > >> private area size is odd > >> > >> Hi Martin, > >> > >>> -Original Message- > >>> From: dev [mailto:dev-bounces at dpdk.org] On Behalf Of Martin Weiser > >>> Sent: Wednesday, July 29, 2015 4:07 PM > >>> To: Zhang, Helin; olivier.matz at 6wind.com > >>> Cc: dev at dpdk.org > >>> Subject: [dpdk-dev] Issue with non-scattered rx in ixgbe and i40e when > >>> mbuf private area size is odd > >>> > >>> Hi Helin, Hi Olivier, > >>> > >>> we are seeing an issue with the ixgbe and i40e drivers which we could > >>> track down to our setting of the private area size of the mbufs. > >>> The issue can be easily reproduced with the l2fwd example application > >>> when a small modification is done: just set the priv_size parameter in > >>> the call to the rte_pktmbuf_pool_create function to an odd number like > >>> 1. In our tests this causes every call to rte_eth_rx_burst to return > >>> 32 (which is the setting of nb_pkts) nonsense mbufs although no > >>> packets are received on the interface and the hardware counters do not > >>> report any received packets. > >> > >> From Niantic datasheet: > >> > >> "7.1.6.1 Advanced Receive Descriptors ? Read Format Table 7-15 lists the > >> advanced receive descriptor programming by the software. The ... > >> Packet Buffer Address (64) > >> This is the physical address of the packet buffer. The lowest bit is A0 > >> (LSB of the > >> address). > >> Header Buffer Address (64) > >> The physical address of the header buffer with the lowest bit being > >> Descriptor > >> Done (DD). 
> >> When a packet spans in multiple descriptors, the header buffer address is > >> used > >> only on the first descriptor. During the programming phase, software must > >> set > >> the DD bit to zero (see the description of the DD bit in this section). > >> This means > >> that header buffer addresses are always word aligned." > >> > >> Right now, in ixgbe PMD we always setup Packet Buffer Address (PBA)and > >> Header Buffer Address (HBA) to the same value: > >> buf_physaddr + RTE_PKTMBUF_HEADROOM > >> So when pirv_size==1, DD bit in RXD is always set to one by SW itself, and > >> then > >> SW considers that HW already done with it. > >> In other words, right now for ixgbe you can't use RX buffer that is not > >> aligned on > >> word boundary. > >> > >> So the advice would be, right now - don't set priv_size to the odd value. > >> As we don't support split header feature anyway, I think we can fix it > >> just by > >> always setting HBA in the RXD to zero. > >> Could you try the fix for ixgbe below? > >> > >> Same story with FVL, I believe. > >> Konstantin > >> > >> > >>> Interestingly this does not happen if we force the scattered rx path. > >>> > >>> I assume the drivers have some expectations regarding the alignment of > >>> the buf_add
[dpdk-dev] Issue with non-scattered rx in ixgbe and i40e when mbuf private area size is odd
Hi Konstantin,

On 07/30/2015 11:00 AM, Ananyev, Konstantin wrote:
> Hi Olivier,
>
>> -----Original Message-----
>> From: Olivier MATZ [mailto:olivier.matz at 6wind.com]
>> Sent: Thursday, July 30, 2015 9:12 AM
>> To: Zhang, Helin; Ananyev, Konstantin; Martin Weiser
>> Cc: dev at dpdk.org
>> Subject: Re: [dpdk-dev] Issue with non-scattered rx in ixgbe and i40e when
>> mbuf private area size is odd
>>
>> Hi,
>>
>> On 07/29/2015 10:24 PM, Zhang, Helin wrote:
>>> Hi Martin
>>>
>>> Thank you very much for the good catch!
>>>
>>> The similar situation in i40e, as explained by Konstantin.
>>> As header split hasn't been supported by DPDK till now. It would be better
>>> to put the header address in RX descriptor to 0.
>>> But in the future, during header split enabling. We may need to pay extra
>>> attention to that. As at least x710 datasheet said specifically as below.
>>> "The header address should be set by the software to an even number (word
>>> aligned address)". We may need to find a way to ensure that during
>>> mempool/mbuf allocation.
>>
>> Indeed it would be good to force the priv_size to be aligned.
>>
>> The priv_size could be aligned automatically in
>> rte_pktmbuf_pool_create(). The only possible problem I could see
>> is that it would break applications that access to the data buffer
>> by doing (sizeof(mbuf) + sizeof(priv)), which is probably not the
>> best thing to do (I didn't find any applications like this in dpdk).
>
>
> Might be just make rte_pktmbuf_pool_create() fail if input priv_size %
> MIN_ALIGN != 0?

Hmm maybe it would break more applications: an odd priv_size is
probably rare, but a priv_size that is not aligned to 8 bytes is
maybe more common.
It's maybe safer to align the size transparently?

Regards,
Olivier

>
>>
>> For applications that directly use rte_mempool_create() instead of
>> rte_pktmbuf_pool_create(), we could add a check using an assert in
>> rte_pktmbuf_init() and some words in the documentation.
>> >> The question is: what should be the proper alignment? I would say >> at least 8 bytes, but maybe cache_aligned is an option too. > > 8 bytes seems enough to me. > > Konstantin > >> >> Regards, >> Olivier >> >> >>> >>> Regards, >>> Helin >>> -Original Message- From: Ananyev, Konstantin Sent: Wednesday, July 29, 2015 11:12 AM To: Martin Weiser; Zhang, Helin; olivier.matz at 6wind.com Cc: dev at dpdk.org Subject: RE: [dpdk-dev] Issue with non-scattered rx in ixgbe and i40e when mbuf private area size is odd Hi Martin, > -Original Message- > From: dev [mailto:dev-bounces at dpdk.org] On Behalf Of Martin Weiser > Sent: Wednesday, July 29, 2015 4:07 PM > To: Zhang, Helin; olivier.matz at 6wind.com > Cc: dev at dpdk.org > Subject: [dpdk-dev] Issue with non-scattered rx in ixgbe and i40e when > mbuf private area size is odd > > Hi Helin, Hi Olivier, > > we are seeing an issue with the ixgbe and i40e drivers which we could > track down to our setting of the private area size of the mbufs. > The issue can be easily reproduced with the l2fwd example application > when a small modification is done: just set the priv_size parameter in > the call to the rte_pktmbuf_pool_create function to an odd number like > 1. In our tests this causes every call to rte_eth_rx_burst to return > 32 (which is the setting of nb_pkts) nonsense mbufs although no > packets are received on the interface and the hardware counters do not > report any received packets. From Niantic datasheet: "7.1.6.1 Advanced Receive Descriptors ? Read Format Table 7-15 lists the advanced receive descriptor programming by the software. The ... Packet Buffer Address (64) This is the physical address of the packet buffer. The lowest bit is A0 (LSB of the address). Header Buffer Address (64) The physical address of the header buffer with the lowest bit being Descriptor Done (DD). When a packet spans in multiple descriptors, the header buffer address is used only on the first descriptor. 
During the programming phase, software must set the DD bit to zero (see the description of the DD bit in this section). This means that header buffer addresses are always word aligned." Right now, in ixgbe PMD we always setup Packet Buffer Address (PBA)and Header Buffer Address (HBA) to the same value: buf_physaddr + RTE_PKTMBUF_HEADROOM So when pirv_size==1, DD bit in RXD is always set to one by SW itself, and then SW considers that HW already done with it. In other words, right now for ixgbe you can't use RX buffer that is not aligned on word boundary. So the advice would be, right now - don't set priv_size to the odd value. As we don't support split header feature anyway, I think we can fix it just by >>>
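Note: the two policies debated in this thread (reject an unaligned priv_size, or round it up silently) can be sketched as follows. This is an illustration only, not the code that went into DPDK: MIN_ALIGN is the name used in the thread, its value of 8 follows the "8 bytes seems enough" remark, and the helper names check_priv_size and align_priv_size as well as the 16-bit size type are assumptions.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

/* Assumed minimum alignment of the mbuf private area, in bytes. */
#define MIN_ALIGN 8u

/* Policy 1: refuse an unaligned size. Returns 0 if acceptable, -EINVAL otherwise. */
static int
check_priv_size(uint16_t priv_size)
{
	return (priv_size % MIN_ALIGN) == 0 ? 0 : -EINVAL;
}

/*
 * Policy 2: round the size up to the next multiple of MIN_ALIGN.
 * The result is 32 bits wide so that rounding up 65535 does not wrap.
 */
static uint32_t
align_priv_size(uint16_t priv_size)
{
	/* The mask trick below only works for power-of-two alignments. */
	assert((MIN_ALIGN & (MIN_ALIGN - 1u)) == 0);
	return ((uint32_t)priv_size + (MIN_ALIGN - 1u)) & ~(uint32_t)(MIN_ALIGN - 1u);
}
```

The trade-off is the one raised above: the check makes the requirement visible to the caller but rejects sizes such as 12 that happen to work today, while the silent round-up keeps such callers running at the cost of a data buffer offset that no longer equals sizeof(mbuf) + priv_size.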
[dpdk-dev] [PATCH v1 1/1] ixgbe: Fix oerrors by setting it to 0
> From: Thomas Monjalon [mailto:thomas.monjalon at 6wind.com]
> Sent: Thursday, July 30, 2015 12:20 AM
> To: Tahhan, Maryam
> Cc: dev at dpdk.org; Ananyev, Konstantin
> Subject: Re: [dpdk-dev] [PATCH v1 1/1] ixgbe: Fix oerrors by setting it to 0
>
> > > Fix afebc86be1346136125af8026dc215f81c202c50. oerrors was txdgpc -
> > > hw_stats->gptc, txdgpc is the number of packets DMA'ed by the host
> > > and was being reset on every call to read stats so it could be < gptc.
> > > Because we currently have no way to add txdgpc to struct hw_stats so
> > > that we can maintain a persistent value per port oerrors has now
> > > been set to 0. References to txdgpc is now removed as we don't use
> > > it. This patch also removes rxnfgpc as it's not used anywhere.
> > >
> > > Signed-off-by: Maryam Tahhan
> > Acked-by: Konstantin Ananyev
>
> Applied, thanks
>
> It's a bit sad.
> Is it a consequence of forbidding updates in the base driver?

Yes, that's exactly it. In the meantime I'm going to look at/investigate
another way to allow us to maintain additional (anything not in struct
hw_stats) per port stats/registers in addition to the base driver.

All the best
Maryam
[dpdk-dev] abi change announce
Hi Thomas:

I am doing virtio/vhost performance optimization, so there may be some
changes, for example to the virtio or vhost virtqueue data structures.
Do I need to announce the ABI change even if the change hasn't been
determined yet?

/huawei
[dpdk-dev] Issue with non-scattered rx in ixgbe and i40e when mbuf private area size is odd
> -----Original Message-----
> From: Olivier MATZ [mailto:olivier.matz at 6wind.com]
> Sent: Thursday, July 30, 2015 10:10 AM
> To: Ananyev, Konstantin; Zhang, Helin; Martin Weiser
> Cc: dev at dpdk.org
> Subject: Re: [dpdk-dev] Issue with non-scattered rx in ixgbe and i40e when
> mbuf private area size is odd
>
> Hi Konstantin,
>
> On 07/30/2015 11:00 AM, Ananyev, Konstantin wrote:
> > Hi Olivier,
> >
> >> -----Original Message-----
> >> From: Olivier MATZ [mailto:olivier.matz at 6wind.com]
> >> Sent: Thursday, July 30, 2015 9:12 AM
> >> To: Zhang, Helin; Ananyev, Konstantin; Martin Weiser
> >> Cc: dev at dpdk.org
> >> Subject: Re: [dpdk-dev] Issue with non-scattered rx in ixgbe and i40e when
> >> mbuf private area size is odd
> >>
> >> Hi,
> >>
> >> On 07/29/2015 10:24 PM, Zhang, Helin wrote:
> >>> Hi Martin
> >>>
> >>> Thank you very much for the good catch!
> >>>
> >>> The similar situation in i40e, as explained by Konstantin.
> >>> As header split hasn't been supported by DPDK till now. It would be
> >>> better to put the header address in RX descriptor to 0.
> >>> But in the future, during header split enabling. We may need to pay extra
> >>> attention to that. As at least x710 datasheet said specifically as below.
> >>> "The header address should be set by the software to an even number (word
> >>> aligned address)". We may need to find a way to ensure that during
> >>> mempool/mbuf allocation.
> >>
> >> Indeed it would be good to force the priv_size to be aligned.
> >>
> >> The priv_size could be aligned automatically in
> >> rte_pktmbuf_pool_create(). The only possible problem I could see
> >> is that it would break applications that access to the data buffer
> >> by doing (sizeof(mbuf) + sizeof(priv)), which is probably not the
> >> best thing to do (I didn't find any applications like this in dpdk).
> >
> >
> > Might be just make rte_pktmbuf_pool_create() fail if input priv_size %
> > MIN_ALIGN != 0?
>
> Hmm maybe it would break more applications: an odd priv_size is
> probably rare, but a priv_size that is not aligned to 8 bytes is
> maybe more common.

My thought was that rte_mempool_create() was just introduced in 2.1,
so if we add an extra requirement for the input parameter now,
there would be no ABI breakage, and not many people have started to use it yet.
To me it just seems a bit easier and more straightforward than silent
alignment: the user would not have wrong assumptions here.
Though if you think that a silent alignment would be more convenient
for most users, I wouldn't insist.
Konstantin

> It's maybe safer to align the size transparently?
>
>
> Regards,
> Olivier
>
>
>
> >
> >>
> >> For applications that directly use rte_mempool_create() instead of
> >> rte_pktmbuf_pool_create(), we could add a check using an assert in
> >> rte_pktmbuf_init() and some words in the documentation.
> >>
> >> The question is: what should be the proper alignment? I would say
> >> at least 8 bytes, but maybe cache_aligned is an option too.
> >
> > 8 bytes seems enough to me.
> >
> > Konstantin
> >
> >>
> >> Regards,
> >> Olivier
> >>
> >>
> >>>
> >>> Regards,
> >>> Helin
> >>>
> -----Original Message-----
> From: Ananyev, Konstantin
> Sent: Wednesday, July 29, 2015 11:12 AM
> To: Martin Weiser; Zhang, Helin; olivier.matz at 6wind.com
> Cc: dev at dpdk.org
> Subject: RE: [dpdk-dev] Issue with non-scattered rx in ixgbe and i40e
> when mbuf
> private area size is odd
>
> Hi Martin,
>
> > -----Original Message-----
> > From: dev [mailto:dev-bounces at dpdk.org] On Behalf Of Martin Weiser
> > Sent: Wednesday, July 29, 2015 4:07 PM
> > To: Zhang, Helin; olivier.matz at 6wind.com
> > Cc: dev at dpdk.org
> > Subject: [dpdk-dev] Issue with non-scattered rx in ixgbe and i40e when
> > mbuf private area size is odd
> >
> > Hi Helin, Hi Olivier,
> >
> > we are seeing an issue with the ixgbe and i40e drivers which we could
> > track down to our setting of the private area size of the mbufs.
> > The issue can be easily reproduced with the l2fwd example application > > when a small modification is done: just set the priv_size parameter in > > the call to the rte_pktmbuf_pool_create function to an odd number like > > 1. In our tests this causes every call to rte_eth_rx_burst to return > > 32 (which is the setting of nb_pkts) nonsense mbufs although no > > packets are received on the interface and the hardware counters do not > > report any received packets. > > From Niantic datasheet: > > "7.1.6.1 Advanced Receive Descriptors ? Read Format Table 7-15 lists the > advanced receive descriptor programming by the software. The ... > Packet Buffer Address (64) > This is the physical address of the packet buffer. The lowest bit is A0 > (LSB of the > address). > Header Buffer Address (64) > The physical address of the header bu
[dpdk-dev] [PATCH] e1000: fix ieee1588 timestamp issue
> -----Original Message-----
> From: dev [mailto:dev-bounces at dpdk.org] On Behalf Of Wenzhuo Lu
> Sent: Thursday, July 30, 2015 9:34 AM
> To: dev at dpdk.org
> Subject: [dpdk-dev] [PATCH] e1000: fix ieee1588 timestamp issue
>
> Ieee1588 reads system time to set its timestamp. On 1G NICs, for example,
> i350, system time is disabled by default. It means the ieee1588 timestamp
> will always be 0.
> This patch enables system time when ieee1588 is enabled.

Looks good.

> +#define E1000_TSAUXC_DISABLE_SYSTIME 0x8000

Probably best to move this to the top of the file with the other timesync
defines.

I wonder if this would also fix the following known issue with i210
timesyncing from the release notes:

http://dpdk.org/doc/guides/rel_notes/known_issues.html#ieee1588-support-possibly-not-working-with-an-intel-ethernet-controller-i210-nic

I don't have an i210 NIC to test but perhaps someone could verify it.

John
[dpdk-dev] abi change announce
2015-07-30 09:25, Xie, Huawei:
> Hi Thomas:
> I am doing virtio/vhost performance optimization, so there is possibly
> some change, for example to virtio or vhost virtqueue data structure.
> Do i need to announce the ABI change even if the change hasn't been
> determined?

I have no strong opinion.
It seems strange to announce something which is not known.
You may be able to introduce your change without previous notice by using
NEXT_ABI if not too invasive.

Neil, an opinion?
[dpdk-dev] abi change announce
On Thu, Jul 30, 2015 at 12:18:41PM +0200, Thomas Monjalon wrote:
> 2015-07-30 09:25, Xie, Huawei:
> > Hi Thomas:
> > I am doing virtio/vhost performance optimization, so there is possibly
> > some change, for example to virtio or vhost virtqueue data structure.
> > Do i need to announce the ABI change even if the change hasn't been
> > determined?
>
> I have no strong opinion.
> It seems strange to announce something which is not known.
> You may be able to introduce your change without previous notice by using
> NEXT_ABI if not too invasive.
>
> Neil, an opinion?
>

Given the process, you can't announce the change until you know what it is,
since you need to detail in the announcement what the change is going to be.
We have no method to reserve an 'ABI break to be determined later', nor should
we. Write the code, then we figure out if ABI needs to change and there is a
need to announce.

Neil
[dpdk-dev] Issue with non-scattered rx in ixgbe and i40e when mbuf private area size is odd
Hi, On 07/30/2015 11:43 AM, Ananyev, Konstantin wrote: > > >> -Original Message- >> From: Olivier MATZ [mailto:olivier.matz at 6wind.com] >> Sent: Thursday, July 30, 2015 10:10 AM >> To: Ananyev, Konstantin; Zhang, Helin; Martin Weiser >> Cc: dev at dpdk.org >> Subject: Re: [dpdk-dev] Issue with non-scattered rx in ixgbe and i40e when >> mbuf private area size is odd >> >> Hi Konstantin, >> >> On 07/30/2015 11:00 AM, Ananyev, Konstantin wrote: >>> Hi Olivier, >>> -Original Message- From: Olivier MATZ [mailto:olivier.matz at 6wind.com] Sent: Thursday, July 30, 2015 9:12 AM To: Zhang, Helin; Ananyev, Konstantin; Martin Weiser Cc: dev at dpdk.org Subject: Re: [dpdk-dev] Issue with non-scattered rx in ixgbe and i40e when mbuf private area size is odd Hi, On 07/29/2015 10:24 PM, Zhang, Helin wrote: > Hi Martin > > Thank you very much for the good catch! > > The similar situation in i40e, as explained by Konstantin. > As header split hasn't been supported by DPDK till now. It would be > better to put the header address in RX descriptor to 0. > But in the future, during header split enabling. We may need to pay extra > attention to that. As at least x710 datasheet said specifically as below. > "The header address should be set by the software to an even number (word > aligned address)". We may need to find a way to ensure that during mempool/mbuf allocation. Indeed it would be good to force the priv_size to be aligned. The priv_size could be aligned automatically in rte_pktmbuf_pool_create(). The only possible problem I could see is that it would break applications that access to the data buffer by doing (sizeof(mbuf) + sizeof(priv)), which is probably not the best thing to do (I didn't find any applications like this in dpdk). >>> >>> >>> Might be just make rte_pktmbuf_pool_create() fail if input priv_size % >>> MIN_ALIGN != 0? 
>> >> Hmm maybe it would break more applications: an odd priv_size is >> probably rare, but a priv_size that is not aligned to 8 bytes is >> maybe more common. > > My thought was that rte_mempool_create() was just introduced in 2.1, > so if we add extra requirement for the input parameter now - > there would be no ABI breakage, and not many people started to use it already. > For me just seems a bit easier and more straightforward then silent alignment > - > user would not have wrong assumptions here. > Though if you think that a silent alignment would be more convenient > for most users - I wouldn't insist. Yes, I agree on the principle, but it depends whether this fix is integrated for 2.1 or not. I think it may already be a bit late for that, especially as it is not a very critical bug. Thomas, what do you think? Olivier > Konstantin > >> It's maybe safer to align the size transparently? >> >> >> Regards, >> Olivier >> >> >> >>> For applications that directly use rte_mempool_create() instead of rte_pktmbuf_pool_create(), we could add a check using an assert in rte_pktmbuf_init() and some words in the documentation. The question is: what should be the proper alignment? I would say at least 8 bytes, but maybe cache_aligned is an option too. >>> >>> 8 bytes seems enough to me. 
>>> >>> Konstantin >>> Regards, Olivier > > Regards, > Helin > >> -Original Message- >> From: Ananyev, Konstantin >> Sent: Wednesday, July 29, 2015 11:12 AM >> To: Martin Weiser; Zhang, Helin; olivier.matz at 6wind.com >> Cc: dev at dpdk.org >> Subject: RE: [dpdk-dev] Issue with non-scattered rx in ixgbe and i40e >> when mbuf >> private area size is odd >> >> Hi Martin, >> >>> -Original Message- >>> From: dev [mailto:dev-bounces at dpdk.org] On Behalf Of Martin Weiser >>> Sent: Wednesday, July 29, 2015 4:07 PM >>> To: Zhang, Helin; olivier.matz at 6wind.com >>> Cc: dev at dpdk.org >>> Subject: [dpdk-dev] Issue with non-scattered rx in ixgbe and i40e when >>> mbuf private area size is odd >>> >>> Hi Helin, Hi Olivier, >>> >>> we are seeing an issue with the ixgbe and i40e drivers which we could >>> track down to our setting of the private area size of the mbufs. >>> The issue can be easily reproduced with the l2fwd example application >>> when a small modification is done: just set the priv_size parameter in >>> the call to the rte_pktmbuf_pool_create function to an odd number like >>> 1. In our tests this causes every call to rte_eth_rx_burst to return >>> 32 (which is the setting of nb_pkts) nonsense mbufs although no >>> packets are received on the interface and the hardware counters do not >>> report any received packets. >> >>From Niantic datasheet: >> >> "7.1.6.1 Advanced Receive Descriptors ? Read Forma
[dpdk-dev] lost when learning how to test dpdk
Hi,

thanks for the reply. I could see those docs but they do not help me a lot.
I still do not understand the principle of the tool very well. How does it
choose the NICs to use?

Previously I confused -b in dpdk_nic_bind and testpmd. They have somewhat
opposite meanings. I can start testpmd now; however, it does not probe any
NIC. I've tried -w to whitelist certain NICs but with no success.

$ dpdk_nic_bind --status

Network devices using DPDK-compatible driver
:03:00.0 '82545GM Gigabit Ethernet Controller' drv=uio_pci_generic unused=e1000
:03:02.0 '82545GM Gigabit Ethernet Controller' drv=uio_pci_generic unused=e1000

Network devices using kernel driver
===
:00:19.0 'Ethernet Connection I217-V' if=eno1 drv=e1000e unused=uio_pci_generic *Active*

Other network devices
=

$ sudo testpmd -c 0x3 -n 2 -- -i --total-num-mbufs=2048
EAL: Detected lcore 0 as core 0 on socket 0
EAL: Detected lcore 1 as core 1 on socket 0
EAL: Detected lcore 2 as core 0 on socket 0
EAL: Detected lcore 3 as core 1 on socket 0
EAL: Support maximum 128 logical core(s) by configuration.
EAL: Detected 4 lcore(s)
EAL: VFIO modules not all loaded, skip VFIO support...
EAL: Setting up physically contiguous memory...
EAL: Ask a virtual area of 0x3c0 bytes
EAL: Virtual area found at 0x7fe973e0 (size = 0x3c0)
EAL: Ask a virtual area of 0x20 bytes
EAL: Virtual area found at 0x7fe973a0 (size = 0x20)
EAL: Ask a virtual area of 0x20 bytes
EAL: Virtual area found at 0x7fe97360 (size = 0x20)
EAL: Ask a virtual area of 0x3c0 bytes
EAL: Virtual area found at 0x7fe96f80 (size = 0x3c0)
EAL: Ask a virtual area of 0x20 bytes
EAL: Virtual area found at 0x7fe96f40 (size = 0x20)
EAL: Ask a virtual area of 0x20 bytes
EAL: Virtual area found at 0x7fe96f00 (size = 0x20)
EAL: Requesting 64 pages of size 2MB from socket 0
EAL: TSC frequency is ~368 KHz
EAL: Master lcore 0 is ready (tid=7989d8c0;cpuset=[0])
EAL: lcore 1 is ready (tid=6efff700;cpuset=[1])
EAL: No probed ethernet devices
Interactive-mode selected
Done
testpmd>

Thanks
Jan Viktorin

On Wed, 29 Jul 2015 12:09:06 +0300 ciprian.barbu wrote:
>
>
> On 28.07.2015 21:13, Jan Viktorin wrote:
> > Hello all,
> >
> > I am learning how to measure throughput with dpdk. I have 4 cores
> > Intel(R) Core(TM) i3-4360 CPU @ 3.70GHz and two 82545GM NICs connected
> > together. I do not understand very well, how to setup testpmd.
> > http://dpdk.org/doc > http://dpdk.org/doc/guides/testpmd_app_ug/run_app.html > http://dpdk.org/doc/quick-start > > > > > I've successfully bound the NICs to dpdk: > > > > $ dpdk_nic_bind --status > > > > Network devices using DPDK-compatible driver > > > > :03:00.0 '82545GM Gigabit Ethernet Controller' drv=uio_pci_generic > > unused=e1000 > > :03:02.0 '82545GM Gigabit Ethernet Controller' drv=uio_pci_generic > > unused=e1000 > > > > Network devices using kernel driver > > === > > :00:19.0 'Ethernet Connection I217-V' if=eno1 drv=e1000e > > unused=uio_pci_generic *Active* > > > > Other network devices > > = > > > > > > and then I tried to run testpmd: > > > > sudo ./testpmd -b :03:00.0 -b :03:02.0 -c 0xf -n2 -- --nb-cores=1 > > --nb-ports=0 --rxd=2048 --txd=2048 --mbcache=512 --burst=512 > > http://dpdk.org/doc/guides/testpmd_app_ug/run_app.html#testpmd-command-line-options > > The -b option black lists your PCI devices, you don't need those. The > --nb-ports is of course the number of ports, it cannot be 0. > > > ... > > EAL: Ask a virtual area of 0x40 bytes > > EAL: Virtual area found at 0x7f154980 (size = 0x40) > > EAL: Requesting 1024 pages of size 2MB from socket 0 > > EAL: TSC frequency is ~369 KHz > > EAL: Master lcore 0 is ready (tid=de94a8c0;cpuset=[0]) > > EAL: lcore 2 is ready (tid=487fd700;cpuset=[2]) > > EAL: lcore 3 is ready (tid=47ffc700;cpuset=[3]) > > EAL: lcore 1 is ready (tid=48ffe700;cpuset=[1]) > > EAL: No probed ethernet devices > > EAL: Error - exiting with code: 1 > >Cause: Invalid port 0 > > > > I tried --nb-ports={0,1,2} but neither of them works. BTW, what does this > > option it mean? :) > > I could not find any description in the docs nor in the help (maybe I've > > omitted something). > > > > > > Well, if I manage the testpmd to work I need a packet generator, right? > > I've downloaded > > the dpdk-pktgen. And I am lost again. How can I start it? 
> > > > After several attempts (mostly trying to use the pktgen-master/slave.sh, > > what is > > their purpose?), the most "successful" output was: > > > > ... > > EAL: Detected lcore 0 as core 0 on socket 03 handles port 1 rx & core 4 > > handles port 0-7 tx > > EAL: Detected lcore 1 as core 1 on socket 0 as it does not matter to the > > syntax. > > EAL: Detected lcore 2 as core 0 on soc
[dpdk-dev] [PATCH v10 0/3] deduplicate EAL common functions
Hi Olivier,

On Thu, Jul 30, 2015 at 1:12 AM, Olivier MATZ wrote:
> Hi Thomas & Ravi,
>
>
> On 07/27/2015 02:59 AM, Thomas Monjalon wrote:
>
>> 2015-07-27 02:56, Thomas Monjalon:
>>
>>> v9 was a subset of previous deduplications by Ravi Kerur.
>>> This v10 address the comments I've done on v9.
>>>
>>> Ravi Kerur (3):
>>>    eal: deduplicate lcore initialization
>>>    eal: deduplicate timer functions
>>>    eal: deduplicate memory initialization
>>>
>>
>> Applied shortly to integrate this old pending cleanup in RC2.
>>
>>
> When I try to compile the dpdk for x86_x32-native-linuxapp-gcc , I
> get the following compilation error:
>
>   CC eal_common_timer.o
> In file included from /usr/include/sys/sysctl.h:63:0,
>                  from /home/matz/dpdk-pkg-cron/
> dpdk.org/lib/librte_eal/common/eal_common_timer.c:39:
> /usr/include/bits/sysctl.h:19:3: error: #error "sysctl system call is
> unsupported in x32 kernel"
>  # error "sysctl system call is unsupported in x32 kernel"
>    ^
>
> Removing the "#include <sys/sysctl.h>" line fixes the issue without
> impacting the compilation. I think this include is not needed and
> could be removed.
> I can provide a patch if it's ok for you.
>

If it compiles fine on FreeBSD then it should be fine. It is primarily needed
for eal_timer.c in the FreeBSD environment; during the code movement it
slipped my mind. Sorry for the inconvenience.

Thanks,
Ravi

> Regards,
> Olivier
>
[dpdk-dev] how to compile kernel drivers only
Hi all, I'm trying to compile DPDK kernel drivers (i.e., igb_uio.ko and kni.ko if I got it right) only on a certain machine. On that machine, I'm not interested in anything else. how can I tweak .config file to achieve it? I have tried to set all options to =n, except for: CONFIG_RTE_LIBRTE_EAL=y CONFIG_RTE_LIBRTE_EAL_LINUXAPP=y CONFIG_RTE_EAL_IGB_UIO=y CONFIG_RTE_EAL_VFIO=y CONFIG_RTE_LIBRTE_KNI=y But then I get a compile error about app/dump_cfg: == Build app/dump_cfg ? CC main.o ? LD dump_cfg dpdk/dpdk-2.0.0/x86_64-native-linuxapp-gcc/lib/librte_eal.a(eal_common_log.o): In function `rte_eal_common_log_init': eal_common_log.c:(.text+0x1b0): undefined reference to `rte_mempool_create' eal_common_log.c:(.text+0x1fe): undefined reference to `rte_mempool_lookup' dpdk/dpdk-2.0.0/x86_64-native-linuxapp-gcc/lib/librte_eal.a(eal_pci_uio.o): In function `pci_uio_map_resource': eal_pci_uio.c:(.text+0x4dd): undefined reference to `rte_zmalloc' eal_pci_uio.c:(.text+0x873): undefined reference to `rte_malloc' eal_pci_uio.c:(.text+0x9bf): undefined reference to `rte_malloc' eal_pci_uio.c:(.text+0xb0a): undefined reference to `rte_malloc' eal_pci_uio.c:(.text+0xc55): undefined reference to `rte_malloc' eal_pci_uio.c:(.text+0xda6): undefined reference to `rte_malloc' dpdk/dpdk-2.0.0/x86_64-native-linuxapp-gcc/lib/librte_eal.a(eal_pci_uio.o):eal_pci_uio.c:(.text+0xf00): more undefined references to `rte_malloc' follow dpdk/dpdk-2.0.0/x86_64-native-linuxapp-gcc/lib/librte_eal.a(eal_pci_uio.o): In function `pci_uio_map_resource': eal_pci_uio.c:(.text+0x10e4): undefined reference to `rte_free' dpdk/dpdk-2.0.0/x86_64-native-linuxapp-gcc/lib/librte_eal.a(eal_interrupts.o): In function `rte_intr_callback_unregister': eal_interrupts.c:(.text+0x7b6): undefined reference to `rte_free' eal_interrupts.c:(.text+0x7f2): undefined reference to `rte_free' eal_interrupts.c:(.text+0x84c): undefined reference to `rte_free' 
dpdk/dpdk-2.0.0/x86_64-native-linuxapp-gcc/lib/librte_eal.a(eal_interrupts.o): In function `rte_intr_callback_register': eal_interrupts.c:(.text+0x8fb): undefined reference to `rte_zmalloc' eal_interrupts.c:(.text+0x96f): undefined reference to `rte_zmalloc' eal_interrupts.c:(.text+0xac4): undefined reference to `rte_free' dpdk/dpdk-2.0.0/x86_64-native-linuxapp-gcc/lib/librte_eal.a(eal_alarm.o): In function `rte_eal_alarm_cancel': eal_alarm.c:(.text+0xa4): undefined reference to `rte_free' eal_alarm.c:(.text+0x128): undefined reference to `rte_free' eal_alarm.c:(.text+0x156): undefined reference to `rte_free' eal_alarm.c:(.text+0x1d3): undefined reference to `rte_free' dpdk/dpdk-2.0.0/x86_64-native-linuxapp-gcc/lib/librte_eal.a(eal_alarm.o): In function `rte_eal_alarm_set': eal_alarm.c:(.text+0x31e): undefined reference to `rte_zmalloc' dpdk/dpdk-2.0.0/x86_64-native-linuxapp-gcc/lib/librte_eal.a(eal_alarm.o): In function `eal_alarm_callback': eal_alarm.c:(.text+0x59e): undefined reference to `rte_free' How can I avoid building any app like dump_cfg? Thanks! Francesco
[dpdk-dev] how to compile kernel drivers only
2015-07-30 12:17, Montorsi, Francesco: > How can I avoid building any app like dump_cfg? In app/Makefile, you'll find the options to disable: DIRS-$(CONFIG_RTE_APP_TEST) += test DIRS-$(CONFIG_RTE_LIBRTE_ACL) += test-acl DIRS-$(CONFIG_RTE_LIBRTE_PIPELINE) += test-pipeline DIRS-$(CONFIG_RTE_TEST_PMD) += test-pmd DIRS-$(CONFIG_RTE_LIBRTE_CMDLINE) += cmdline_test DIRS-$(CONFIG_RTE_LIBRTE_EAL_LINUXAPP) += proc_info
[dpdk-dev] how to compile kernel drivers only
Hi Thomas, Thanks for your reply. My problem is that I have in app/Makefile: DIRS-$(CONFIG_RTE_LIBRTE_EAL_LINUXAPP) += dump_cfg So that I should put CONFIG_RTE_LIBRTE_EAL_LINUXAPP=n To disable dump_cfg application build. However, if I do so, the kernel drivers are not built at all and make just says: make T=x86_64-native-linuxapp-gcc O=x86_64-native-linuxapp-gcc EXTRA_LDFLAGS="" --directory=dpdk-2.0.0 all make[1]: Entering directory `/home/hammer/share/CSA-Hamachi-Sprint/HW-Accel/drivers/dpdk/dpdk-2.0.0' == Build lib == Build lib/librte_compat SYMLINK-FILE include/rte_compat.h == Build lib/librte_eal == Build app Build complete So that CONFIG_RTE_LIBRTE_EAL_LINUXAPP=y Seems to be a pre-requisite of kernel drivers... or am I missing something? Thanks, Francesco -Original Message- From: Thomas Monjalon [mailto:thomas.monja...@6wind.com] Sent: Thursday, July 30, 2015 14:23 To: Montorsi, Francesco Cc: dev at dpdk.org Subject: Re: [dpdk-dev] how to compile kernel drivers only 2015-07-30 12:17, Montorsi, Francesco: > How can I avoid building any app like dump_cfg? In app/Makefile, you'll find the options to disable: DIRS-$(CONFIG_RTE_APP_TEST) += test DIRS-$(CONFIG_RTE_LIBRTE_ACL) += test-acl DIRS-$(CONFIG_RTE_LIBRTE_PIPELINE) += test-pipeline DIRS-$(CONFIG_RTE_TEST_PMD) += test-pmd DIRS-$(CONFIG_RTE_LIBRTE_CMDLINE) += cmdline_test DIRS-$(CONFIG_RTE_LIBRTE_EAL_LINUXAPP) += proc_info
[dpdk-dev] how to compile kernel drivers only
Francesco, please reply below (easier to follow the thread). 2015-07-30 12:48, Montorsi, Francesco: > From: Thomas Monjalon [mailto:thomas.monjalon at 6wind.com] > > 2015-07-30 12:17, Montorsi, Francesco: > > > How can I avoid building any app like dump_cfg? > > > > In app/Makefile, you'll find the options to disable: > > DIRS-$(CONFIG_RTE_APP_TEST) += test > > DIRS-$(CONFIG_RTE_LIBRTE_ACL) += test-acl > > DIRS-$(CONFIG_RTE_LIBRTE_PIPELINE) += test-pipeline > > DIRS-$(CONFIG_RTE_TEST_PMD) += test-pmd > > DIRS-$(CONFIG_RTE_LIBRTE_CMDLINE) += cmdline_test > > DIRS-$(CONFIG_RTE_LIBRTE_EAL_LINUXAPP) += proc_info > > My problem is that I have in app/Makefile: > > DIRS-$(CONFIG_RTE_LIBRTE_EAL_LINUXAPP) += dump_cfg > > So that I should put > > CONFIG_RTE_LIBRTE_EAL_LINUXAPP=n > > To disable dump_cfg application build. However, If I do so, the kernel > drivers are not built at all and make just says: > > make T=x86_64-native-linuxapp-gcc O=x86_64-native-linuxapp-gcc > EXTRA_LDFLAGS="" --directory=dpdk-2.0.0 all > make[1]: Entering directory > `/home/hammer/share/CSA-Hamachi-Sprint/HW-Accel/drivers/dpdk/dpdk-2.0.0' > == Build lib > == Build lib/librte_compat > SYMLINK-FILE include/rte_compat.h > == Build lib/librte_eal > == Build app > Build complete > > So that > CONFIG_RTE_LIBRTE_EAL_LINUXAPP=y > Seems to be a pre-requisite of kernel drivers... or am I missing something? You're right. You cannot build only kernel drivers. You are welcome to add a new config option to enable/disable apps.
[dpdk-dev] [PATCH] pci: fix build on FreeBSD
Build log: lib/librte_eal/bsdapp/eal/eal_pci.c:462:9: error: incompatible integer to pointer conversion passing 'u_int32_t' (aka 'unsigned int') to parameter of type 'void *' It is fixed by passing the pointer of pi.pi_data to memcpy. By the way, it seems strange that pi_data is initialized twice: .pi_data = *(u_int32_t *)buf memcpy(&pi.pi_data, buf, len); Signed-off-by: Thomas Monjalon --- lib/librte_eal/bsdapp/eal/eal_pci.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/lib/librte_eal/bsdapp/eal/eal_pci.c b/lib/librte_eal/bsdapp/eal/eal_pci.c index ff56cd3..6fa0d08 100644 --- a/lib/librte_eal/bsdapp/eal/eal_pci.c +++ b/lib/librte_eal/bsdapp/eal/eal_pci.c @@ -459,7 +459,7 @@ int rte_eal_pci_write_config(const struct rte_pci_device *dev, goto error; } - memcpy(pi.pi_data, buf, len); + memcpy(&pi.pi_data, buf, len); fd = open("/dev/pci", O_RDONLY); if (fd < 0) { -- 2.4.2
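The one-character fix in the patch above can be illustrated outside of DPDK. What follows is only a minimal sketch, not the EAL code: struct pci_io_sketch and fill_pi_data() are hypothetical names that model just the pi_data field discussed in the patch. It shows why memcpy() must be given the address of the scalar field, and it bounds the copy length so the field cannot be overrun.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for the structure used in the patch; only the
 * pi_data field is modeled here. */
struct pci_io_sketch {
	uint32_t pi_data; /* value to be written to PCI config space */
};

/* Copy len bytes from buf into the scalar pi_data field.
 * Returns 0 on success, -1 if len does not fit into the field. */
static int
fill_pi_data(struct pci_io_sketch *pi, const void *buf, size_t len)
{
	if (len > sizeof(pi->pi_data))
		return -1;
	pi->pi_data = 0;
	/* memcpy() takes a destination pointer: pass the address of the
	 * field, not its (integer) value. */
	memcpy(&pi->pi_data, buf, len);
	return 0;
}
```

Passing pi->pi_data instead of &pi->pi_data is what triggers the "incompatible integer to pointer conversion" error quoted in the build log.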
[dpdk-dev] [PATCH v10 0/3] deduplicate EAL common functions
2015-07-30 10:12, Olivier MATZ: > Hi Thomas & Ravi, > > On 07/27/2015 02:59 AM, Thomas Monjalon wrote: > > 2015-07-27 02:56, Thomas Monjalon: > >> v9 was a subset of previous deduplications by Ravi Kerur. > >> This v10 address the comments I've done on v9. > >> > >> Ravi Kerur (3): > >>eal: deduplicate lcore initialization > >>eal: deduplicate timer functions > >>eal: deduplicate memory initialization > > > > Applied shortly to integrate this old pending cleanup in RC2. > > > > When I try to compile the dpdk for x86_x32-native-linuxapp-gcc , I > get the following compilation error: > >CC eal_common_timer.o > In file included from /usr/include/sys/sysctl.h:63:0, > from > /home/matz/dpdk-pkg-cron/dpdk.org/lib/librte_eal/common/eal_common_timer.c:39: > /usr/include/bits/sysctl.h:19:3: error: #error "sysctl system call is > unsupported in x32 kernel" > # error "sysctl system call is unsupported in x32 kernel" > ^ > > Removing the "#include <sys/sysctl.h>" line fixes the issue without > impacting the compilation. I think this include is not needed and > could be removed. > I can provide a patch if it's ok for you. After fixing another build issue on FreeBSD (patch sent), it builds well without sys/sysctl.h. So it seems to be a useless inclusion.
[dpdk-dev] Issue with non-scattered rx in ixgbe and i40e when mbuf private area size is odd
2015-07-30 13:22, Olivier MATZ: > On 07/30/2015 11:43 AM, Ananyev, Konstantin wrote: > > From: Olivier MATZ [mailto:olivier.matz at 6wind.com] > >> On 07/30/2015 11:00 AM, Ananyev, Konstantin wrote: > >>> From: Olivier MATZ [mailto:olivier.matz at 6wind.com] > On 07/29/2015 10:24 PM, Zhang, Helin wrote: > > The similar situation in i40e, as explained by Konstantin. > > As header split hasn't been supported by DPDK till now. It would be > > better to put the header address in RX descriptor to 0. > > But in the future, during header split enabling. We may need to pay > > extra attention to that. As at least x710 datasheet said > specifically as below. > > "The header address should be set by the software to an even number > > (word aligned address)". We may need to find a way to > ensure that during mempool/mbuf allocation. > > Indeed it would be good to force the priv_size to be aligned. > > The priv_size could be aligned automatically in > rte_pktmbuf_pool_create(). The only possible problem I could see > is that it would break applications that access to the data buffer > by doing (sizeof(mbuf) + sizeof(priv)), which is probably not the > best thing to do (I didn't find any applications like this in dpdk). > >>> > >>> > >>> Might be just make rte_pktmbuf_pool_create() fail if input priv_size % > >>> MIN_ALIGN != 0? > >> > >> Hmm maybe it would break more applications: an odd priv_size is > >> probably rare, but a priv_size that is not aligned to 8 bytes is > >> maybe more common. > > > > My thought was that rte_mempool_create() was just introduced in 2.1, > > so if we add extra requirement for the input parameter now - > > there would be no ABI breakage, and not many people started to use it > > already. > > For me just seems a bit easier and more straightforward then silent > > alignment - > > user would not have wrong assumptions here. > > Though if you think that a silent alignment would be more convenient > > for most users - I wouldn't insist. 
> > > Yes, I agree on the principle, but it depends whether this fix > is integrated for 2.1 or not. > I think it may already be a bit late for that, especially as it > is not a very critical bug. > > Thomas, what do you think? It is a fix. Adding a doc comment, an assert and an alignment constraint or a new automatic alignment in the not yet released function shouldn't hurt. A patch would be welcome for 2.1. Thanks
[dpdk-dev] [PATCH] mbuf: enforce alignment of mbuf private area
It is better to have a data buffer address that is aligned to 8 bytes. This is already the case when there is no mbuf private area, but if there is one, the alignment depends on the size of this area, which is located between the mbuf structure and the data buffer. Indeed, some drivers expect the buffer address to be even, and moreover an unaligned buffer may hurt performance when accessing network headers. Add a check in rte_pktmbuf_pool_create() to verify the alignment constraint before creating the mempool. For applications that use the alternative way (direct call to rte_mempool_create), also add an assertion in rte_pktmbuf_init(). By the way, also add the MBUF log type. Signed-off-by: Olivier Matz --- lib/librte_eal/common/include/rte_log.h | 1 + lib/librte_mbuf/rte_mbuf.c | 8 +++- lib/librte_mbuf/rte_mbuf.h | 7 +-- 3 files changed, 13 insertions(+), 3 deletions(-) diff --git a/lib/librte_eal/common/include/rte_log.h b/lib/librte_eal/common/include/rte_log.h index 24a55cc..ede0dca 100644 --- a/lib/librte_eal/common/include/rte_log.h +++ b/lib/librte_eal/common/include/rte_log.h @@ -77,6 +77,7 @@ extern struct rte_logs rte_logs; #define RTE_LOGTYPE_PORT0x2000 /**< Log related to port. */ #define RTE_LOGTYPE_TABLE 0x4000 /**< Log related to table. */ #define RTE_LOGTYPE_PIPELINE 0x8000 /**< Log related to pipeline. */ +#define RTE_LOGTYPE_MBUF0x0001 /**< Log related to mbuf. */ /* these log types can be used in an application */ #define RTE_LOGTYPE_USER1 0x0100 /**< User-defined log type 1. 
*/ diff --git a/lib/librte_mbuf/rte_mbuf.c b/lib/librte_mbuf/rte_mbuf.c index 4320dd4..a1ddbb3 100644 --- a/lib/librte_mbuf/rte_mbuf.c +++ b/lib/librte_mbuf/rte_mbuf.c @@ -125,6 +125,7 @@ rte_pktmbuf_init(struct rte_mempool *mp, mbuf_size = sizeof(struct rte_mbuf) + priv_size; buf_len = rte_pktmbuf_data_room_size(mp); + RTE_MBUF_ASSERT((priv_size & (RTE_MBUF_PRIV_ALIGN - 1)) == 0); RTE_MBUF_ASSERT(mp->elt_size >= mbuf_size); RTE_MBUF_ASSERT(buf_len <= UINT16_MAX); @@ -154,7 +155,12 @@ rte_pktmbuf_pool_create(const char *name, unsigned n, struct rte_pktmbuf_pool_private mbp_priv; unsigned elt_size; - + if ((priv_size & (RTE_MBUF_PRIV_ALIGN - 1)) != 0) { + RTE_LOG(ERR, MBUF, "mbuf priv_size=%u is not aligned\n", + priv_size); + rte_errno = EINVAL; + return NULL; + } elt_size = sizeof(struct rte_mbuf) + (unsigned)priv_size + (unsigned)data_room_size; mbp_priv.mbuf_data_room_size = data_room_size; diff --git a/lib/librte_mbuf/rte_mbuf.h b/lib/librte_mbuf/rte_mbuf.h index 010b32d..c3b8c98 100644 --- a/lib/librte_mbuf/rte_mbuf.h +++ b/lib/librte_mbuf/rte_mbuf.h @@ -698,6 +698,9 @@ extern "C" { RTE_PTYPE_INNER_L4_MASK)) #endif /* RTE_NEXT_ABI */ +/** Alignment constraint of mbuf private area. */ +#define RTE_MBUF_PRIV_ALIGN 8 + /** * Get the name of a RX offload flag * @@ -1238,7 +1241,7 @@ void rte_pktmbuf_pool_init(struct rte_mempool *mp, void *opaque_arg); * details. * @param priv_size * Size of application private are between the rte_mbuf structure - * and the data buffer. + * and the data buffer. This value must be aligned to RTE_MBUF_PRIV_ALIGN. * @param data_room_size * Size of data buffer in each mbuf, including RTE_PKTMBUF_HEADROOM. * @param socket_id @@ -1250,7 +1253,7 @@ void rte_pktmbuf_pool_init(struct rte_mempool *mp, void *opaque_arg); * with rte_errno set appropriately. 
Possible rte_errno values include: *- E_RTE_NO_CONFIG - function could not get pointer to rte_config structure *- E_RTE_SECONDARY - function was called from a secondary process instance - *- EINVAL - cache size provided is too large + *- EINVAL - cache size provided is too large, or priv_size is not aligned. *- ENOSPC - the maximum number of memzones has already been allocated *- EEXIST - a memzone with the same name already exists *- ENOMEM - no appropriate memory area found in which to create memzone -- 2.1.4
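For readers who want to see the alignment test from the patch in isolation, here is a small standalone sketch. It is not the librte_mbuf code: PRIV_ALIGN mirrors the value of RTE_MBUF_PRIV_ALIGN from the patch, while priv_size_is_aligned() and priv_size_align_up() are hypothetical helper names. The second helper shows how an application could round its private size up before creating the pool; the patch itself only rejects unaligned sizes.

```c
#include <stdbool.h>
#include <stdint.h>

/* Same value as RTE_MBUF_PRIV_ALIGN in the patch; must be a power of two
 * for the mask arithmetic below to be valid. */
#define PRIV_ALIGN 8u

/* Same test as in the patch: the low bits of priv_size must all be zero. */
static bool
priv_size_is_aligned(uint16_t priv_size)
{
	return (priv_size & (PRIV_ALIGN - 1)) == 0;
}

/* Hypothetical application-side helper: round priv_size up to the next
 * multiple of PRIV_ALIGN. The result is widened to 32 bits so that values
 * close to UINT16_MAX do not wrap around. */
static uint32_t
priv_size_align_up(uint16_t priv_size)
{
	return ((uint32_t)priv_size + PRIV_ALIGN - 1) &
		~(uint32_t)(PRIV_ALIGN - 1);
}
```

With an 8-byte constraint, a private area of 13 bytes would be rejected by the check and would become 16 bytes after rounding up, which also keeps the data buffer address even as the i40e and ixgbe discussion requires.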
[dpdk-dev] [PATCH] mbuf: enforce alignment of mbuf private area
Hi Olivier, It fails to compile for me: /local/kananye1/dpdk.org-mbprv1/lib/librte_mbuf/rte_mbuf.c: In function 'rte_pktmbuf_pool_create': /local/kananye1/dpdk.org-mbprv1/lib/librte_mbuf/rte_mbuf.c:161:3: error: 'rte_errno' undeclared (first use in this function) rte_errno = EINVAL; ^ /local/kananye1/dpdk.org-mbprv1/lib/librte_mbuf/rte_mbuf.c:161:3: note: each undeclared identifier is reported only once for each function it appears in I had to add: diff --git a/lib/librte_mbuf/rte_mbuf.c b/lib/librte_mbuf/rte_mbuf.c index a1ddbb3..04344c0 100644 --- a/lib/librte_mbuf/rte_mbuf.c +++ b/lib/librte_mbuf/rte_mbuf.c @@ -58,6 +58,7 @@ #include #include #include +#include /* * ctrlmbuf constructor, given as a callback function to Apart from that - looks good to me. Konstantin > -Original Message- > From: Olivier Matz [mailto:olivier.matz at 6wind.com] > Sent: Thursday, July 30, 2015 2:56 PM > To: dev at dpdk.org > Cc: Ananyev, Konstantin; olivier.matz at 6wind.com; Zhang, Helin; > martin.weiser at allegro-packets.com; thomas.monjalon at 6wind.com > Subject: [PATCH] mbuf: enforce alignment of mbuf private area > > It looks better to have a data buffer address that is aligned to > 8 bytes. This is the case when there is no mbuf private area, but > if there is one, the alignment depends on the size of this area > that is located between the mbuf structure and the data buffer. > > Indeed, some drivers expects to have the buffer address aligned > to an even address, and moreover an unaligned buffer may impact > the performance when accessing to network headers. > > Add a check in rte_pktmbuf_pool_create() to verify the alignment > constraint before creating the mempool. For applications that use > the alternative way (direct call to rte_mempool_create), also > add an assertion in rte_pktmbuf_init(). > > By the way, also add the MBUF log type. 
> > Signed-off-by: Olivier Matz > --- > lib/librte_eal/common/include/rte_log.h | 1 + > lib/librte_mbuf/rte_mbuf.c | 8 +++- > lib/librte_mbuf/rte_mbuf.h | 7 +-- > 3 files changed, 13 insertions(+), 3 deletions(-) > > diff --git a/lib/librte_eal/common/include/rte_log.h > b/lib/librte_eal/common/include/rte_log.h > index 24a55cc..ede0dca 100644 > --- a/lib/librte_eal/common/include/rte_log.h > +++ b/lib/librte_eal/common/include/rte_log.h > @@ -77,6 +77,7 @@ extern struct rte_logs rte_logs; > #define RTE_LOGTYPE_PORT0x2000 /**< Log related to port. */ > #define RTE_LOGTYPE_TABLE 0x4000 /**< Log related to table. */ > #define RTE_LOGTYPE_PIPELINE 0x8000 /**< Log related to pipeline. */ > +#define RTE_LOGTYPE_MBUF0x0001 /**< Log related to mbuf. */ > > /* these log types can be used in an application */ > #define RTE_LOGTYPE_USER1 0x0100 /**< User-defined log type 1. */ > diff --git a/lib/librte_mbuf/rte_mbuf.c b/lib/librte_mbuf/rte_mbuf.c > index 4320dd4..a1ddbb3 100644 > --- a/lib/librte_mbuf/rte_mbuf.c > +++ b/lib/librte_mbuf/rte_mbuf.c > @@ -125,6 +125,7 @@ rte_pktmbuf_init(struct rte_mempool *mp, > mbuf_size = sizeof(struct rte_mbuf) + priv_size; > buf_len = rte_pktmbuf_data_room_size(mp); > > + RTE_MBUF_ASSERT((priv_size & (RTE_MBUF_PRIV_ALIGN - 1)) == 0); > RTE_MBUF_ASSERT(mp->elt_size >= mbuf_size); > RTE_MBUF_ASSERT(buf_len <= UINT16_MAX); > > @@ -154,7 +155,12 @@ rte_pktmbuf_pool_create(const char *name, unsigned n, > struct rte_pktmbuf_pool_private mbp_priv; > unsigned elt_size; > > - > + if ((priv_size & (RTE_MBUF_PRIV_ALIGN - 1)) != 0) { > + RTE_LOG(ERR, MBUF, "mbuf priv_size=%u is not aligned\n", > + priv_size); > + rte_errno = EINVAL; > + return NULL; > + } > elt_size = sizeof(struct rte_mbuf) + (unsigned)priv_size + > (unsigned)data_room_size; > mbp_priv.mbuf_data_room_size = data_room_size; > diff --git a/lib/librte_mbuf/rte_mbuf.h b/lib/librte_mbuf/rte_mbuf.h > index 010b32d..c3b8c98 100644 > --- a/lib/librte_mbuf/rte_mbuf.h > +++ 
b/lib/librte_mbuf/rte_mbuf.h > @@ -698,6 +698,9 @@ extern "C" { > RTE_PTYPE_INNER_L4_MASK)) > #endif /* RTE_NEXT_ABI */ > > +/** Alignment constraint of mbuf private area. */ > +#define RTE_MBUF_PRIV_ALIGN 8 > + > /** > * Get the name of a RX offload flag > * > @@ -1238,7 +1241,7 @@ void rte_pktmbuf_pool_init(struct rte_mempool *mp, void > *opaque_arg); > * details. > * @param priv_size > * Size of application private are between the rte_mbuf structure > - * and the data buffer. > + * and the data buffer. This value must be aligned to RTE_MBUF_PRIV_ALIGN. > * @param data_room_size > * Size of data buffer in each mbuf, including RTE_PKTMBUF_HEADROOM. > * @param socket_id > @@ -1250,7 +1253,7 @@ void rte_pktmbuf_pool_init(struct rte_mempool *mp, void > *opaque_arg); > * with rte_errno set appropriately. Possib
[dpdk-dev] [PATCH v3] doc: announce abi change for interrupt mode
Hi Neil, There have been a few deprecation notices like this one submitted. Since you drove the ABI policy, it would be good to get confirmation from you that these are compliant with the policy and that you don't see any issues. Ideally, it would be great if you can review and ack them. If you don't have the time, even just a general indication that you don't see any problems would be useful. Thanks, Tim > -Original Message- > From: dev [mailto:dev-bounces at dpdk.org] On Behalf Of Liu, Yong > Sent: Thursday, July 30, 2015 6:15 AM > To: Liang, Cunming; dev at dpdk.org > Subject: Re: [dpdk-dev] [PATCH v3] doc: announce abi change for > interrupt mode > > Acked-by: Marvin Liu > > > -Original Message- > > From: dev [mailto:dev-bounces at dpdk.org] On Behalf Of Cunming Liang > > Sent: Thursday, July 30, 2015 1:05 PM > > To: dev at dpdk.org > > Subject: [dpdk-dev] [PATCH v3] doc: announce abi change for interrupt > mode > > > > The patch announces the planned ABI changes for interrupt mode. > > > > Signed-off-by: Cunming Liang > > --- > > v3 change: > >- reword for CONFIG_RTE_NEXT_ABI > > > > v2 change: > >- rebase to recent master > > > > doc/guides/rel_notes/deprecation.rst | 5 + > > 1 file changed, 5 insertions(+) > > > > diff --git a/doc/guides/rel_notes/deprecation.rst > > b/doc/guides/rel_notes/deprecation.rst > > index 5330d3b..d36d267 100644 > > --- a/doc/guides/rel_notes/deprecation.rst > > +++ b/doc/guides/rel_notes/deprecation.rst > > @@ -35,3 +35,8 @@ Deprecation Notices > > * The following fields have been deprecated in rte_eth_stats: > >imissed, ibadcrc, ibadlen, imcasts, fdirmatch, fdirmiss, > >tx_pause_xon, rx_pause_xon, tx_pause_xoff, rx_pause_xoff > > + > > +* The ABI changes are planned for struct rte_intr_handle, struct > > rte_eth_conf > > + and struct eth_dev_ops to support interrupt mode feature from > release > > 2.1. > > + Those changes may be enabled in the upcoming release 2.1 > > + with CONFIG_RTE_NEXT_ABI. > > -- > > 1.8.1.4
[dpdk-dev] lost when learning how to test dpdk
On Thu, Jul 30, 2015 at 5:03 AM, Jan Viktorin wrote: > Hi, > > thanks for reply. I could see those docs but it does not help me a lot. > I still do not understand very well the principle of the tool. How it > chooses the NICs to use? Previously I confused -b in dpdk_nic_bind and > testpmd. They have somehow opposite meaning. I can start testpmd now, > however, it does ot probe any NIC. I've tried -w to whitelist certain > NICs but with no success. > > $ dpdk_nic_bind --status > > Network devices using DPDK-compatible driver > > :03:00.0 '82545GM Gigabit Ethernet Controller' drv=uio_pci_generic > unused=e1000 > :03:02.0 '82545GM Gigabit Ethernet Controller' drv=uio_pci_generic > unused=e1000 > NICs may not be supported by PMD drivers yet. Do "lspci -nn" and check the device-id. Adding support in PMD should not be a problem, but I am not sure on support since there is End of Life listed on Intel Website http://ark.intel.com/products/4964/Intel-82545GM-Gigabit-Ethernet-Controller > Network devices using kernel driver > === > :00:19.0 'Ethernet Connection I217-V' if=eno1 drv=e1000e > unused=uio_pci_generic *Active* > DPDK doesn't bind Active NIC, support for I217-V in PMD is being tested currently. > > Other network devices > = > > > $ sudo testpmd -c 0x3 -n 2 -- -i --total-num-mbufs=2048 > EAL: Detected lcore 0 as core 0 on socket 0 > EAL: Detected lcore 1 as core 1 on socket 0 > EAL: Detected lcore 2 as core 0 on socket 0 > EAL: Detected lcore 3 as core 1 on socket 0 > EAL: Support maximum 128 logical core(s) by configuration. > EAL: Detected 4 lcore(s) > EAL: VFIO modules not all loaded, skip VFIO support... > EAL: Setting up physically contiguous memory... 
> EAL: Ask a virtual area of 0x3c0 bytes > EAL: Virtual area found at 0x7fe973e0 (size = 0x3c0) > EAL: Ask a virtual area of 0x20 bytes > EAL: Virtual area found at 0x7fe973a0 (size = 0x20) > EAL: Ask a virtual area of 0x20 bytes > EAL: Virtual area found at 0x7fe97360 (size = 0x20) > EAL: Ask a virtual area of 0x3c0 bytes > EAL: Virtual area found at 0x7fe96f80 (size = 0x3c0) > EAL: Ask a virtual area of 0x20 bytes > EAL: Virtual area found at 0x7fe96f40 (size = 0x20) > EAL: Ask a virtual area of 0x20 bytes > EAL: Virtual area found at 0x7fe96f00 (size = 0x20) > EAL: Requesting 64 pages of size 2MB from socket 0 > EAL: TSC frequency is ~368 KHz > EAL: Master lcore 0 is ready (tid=7989d8c0;cpuset=[0]) > EAL: lcore 1 is ready (tid=6efff700;cpuset=[1]) > EAL: No probed ethernet devices > Interactive-mode selected > Done > testpmd> > > Thanks > Jan Viktorin > > On Wed, 29 Jul 2015 12:09:06 +0300 > ciprian.barbu wrote: > > > > > > > On 28.07.2015 21:13, Jan Viktorin wrote: > > > Hello all, > > > > > > I am learning how to measure throughput with dpdk. I have 4 cores > > > Intel(R) Core(TM) i3-4360 CPU @ 3.70GHz and two 82545GM NICs connected > > > together. I do not understand very well, how to setup testpmd. 
> > > > http://dpdk.org/doc > > http://dpdk.org/doc/guides/testpmd_app_ug/run_app.html > > http://dpdk.org/doc/quick-start > > > > > > > > I've successfully bound the NICs to dpdk: > > > > > > $ dpdk_nic_bind --status > > > > > > Network devices using DPDK-compatible driver > > > > > > :03:00.0 '82545GM Gigabit Ethernet Controller' drv=uio_pci_generic > unused=e1000 > > > :03:02.0 '82545GM Gigabit Ethernet Controller' drv=uio_pci_generic > unused=e1000 > > > > > > Network devices using kernel driver > > > === > > > :00:19.0 'Ethernet Connection I217-V' if=eno1 drv=e1000e > unused=uio_pci_generic *Active* > > > > > > Other network devices > > > = > > > > > > > > > and then I tried to run testpmd: > > > > > > sudo ./testpmd -b :03:00.0 -b :03:02.0 -c 0xf -n2 -- > --nb-cores=1 --nb-ports=0 --rxd=2048 --txd=2048 --mbcache=512 --burst=512 > > > > > http://dpdk.org/doc/guides/testpmd_app_ug/run_app.html#testpmd-command-line-options > > > > The -b option black lists your PCI devices, you don't need those. The > > --nb-ports is of course the number of ports, it cannot be 0. > > > > > ... > > > EAL: Ask a virtual area of 0x40 bytes > > > EAL: Virtual area found at 0x7f154980 (size = 0x40) > > > EAL: Requesting 1024 pages of size 2MB from socket 0 > > > EAL: TSC frequency is ~369 KHz > > > EAL: Master lcore 0 is ready (tid=de94a8c0;cpuset=[0]) > > > EAL: lcore 2 is ready (tid=487fd700;cpuset=[2]) > > > EAL: lcore 3 is ready (tid=47ffc700;cpuset=[3]) > > > EAL: lcore 1 is ready (tid=48ffe700;cpuset=[1]) > > > EAL: No probed ethernet devices > > > EAL: Error - exiting with code: 1 > > >Cause: Invalid port 0 > > > > > > I tried --nb-ports={0,1,2} but neither of them works. BTW, what does > this option it mean? :) > > > I could not find a
[dpdk-dev] [PATCH v2 1/2] mk: use LDLIBS variable when building the shared object file
Some .so libraries need to be linked with external libraries. For that, the LDLIBS variable should be present on the link line when those .so files are created. The PMD Makefile is responsible for filling the LDLIBS variable with the link to the external library it needs. Signed-off-by: Nelio Laranjeiro Acked-by: Olivier Matz --- Changelog: add missing EXTRA_LDFLAGS variable necessary to link with an external library when it is not installed on the system or located somewhere else. mk/rte.lib.mk | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/mk/rte.lib.mk b/mk/rte.lib.mk index 9ff5cce..fcc8e20 100644 --- a/mk/rte.lib.mk +++ b/mk/rte.lib.mk @@ -81,7 +81,8 @@ O_TO_A_DO = @set -e; \ $(O_TO_A) && \ echo $(O_TO_A_CMD) > $(call exe2cmd,$(@)) -O_TO_S = $(LD) $(_CPU_LDFLAGS) -shared $(OBJS-y) -Wl,-soname,$(LIB) -o $(LIB) +O_TO_S = $(LD) $(_CPU_LDFLAGS) $(EXTRA_LDFLAGS) $(LDLIBS) -shared $(OBJS-y) \ +-Wl,-soname,$(LIB) -o $(LIB) O_TO_S_STR = $(subst ','\'',$(O_TO_S)) #'# fix syntax highlight O_TO_S_DISP = $(if $(V),"$(O_TO_S_STR)"," LD $(@)") O_TO_S_DO = @set -e; \ -- 1.9.1
[dpdk-dev] [PATCH v2 2/2] mlx4: fix shared library dependency
librte_pmd_mlx4.so needs to be linked with libibverbs; otherwise, the PMD is not able to open Mellanox devices and the following message is printed by testpmd at startup: "librte_pmd_mlx4: cannot access device, is mlx4_ib loaded?". The application dependency on libibverbs is now only valid in static mode. In shared mode, applications do not depend on it anymore; librte_pmd_mlx4.so keeps this dependency and thus is linked with libibverbs. Signed-off-by: Nelio Laranjeiro Acked-by: Olivier Matz --- Changelog: don't compile the MLX4 PMD when DPDK is built as a combined shared library. doc/guides/nics/mlx4.rst | 5 + drivers/net/Makefile | 6 +- drivers/net/mlx4/Makefile | 1 + mk/rte.app.mk | 2 +- 4 files changed, 12 insertions(+), 2 deletions(-) diff --git a/doc/guides/nics/mlx4.rst b/doc/guides/nics/mlx4.rst index c33aa38..840cb65 100644 --- a/doc/guides/nics/mlx4.rst +++ b/doc/guides/nics/mlx4.rst @@ -47,6 +47,11 @@ There is also a `section dedicated to this poll mode driver be enabled manually by setting ``CONFIG_RTE_LIBRTE_MLX4_PMD=y`` and recompiling DPDK. +.. warning:: + + ``CONFIG_RTE_BUILD_COMBINE_LIBS`` is not supported (if set, it will not + compile this PMD even if ``CONFIG_RTE_LIBRTE_MLX4_PMD`` is set). 
+ Implementation details -- diff --git a/drivers/net/Makefile b/drivers/net/Makefile index 5ebf963..1725c94 100644 --- a/drivers/net/Makefile +++ b/drivers/net/Makefile @@ -40,7 +40,6 @@ DIRS-$(CONFIG_RTE_LIBRTE_ENIC_PMD) += enic DIRS-$(CONFIG_RTE_LIBRTE_FM10K_PMD) += fm10k DIRS-$(CONFIG_RTE_LIBRTE_I40E_PMD) += i40e DIRS-$(CONFIG_RTE_LIBRTE_IXGBE_PMD) += ixgbe -DIRS-$(CONFIG_RTE_LIBRTE_MLX4_PMD) += mlx4 DIRS-$(CONFIG_RTE_LIBRTE_MPIPE_PMD) += mpipe DIRS-$(CONFIG_RTE_LIBRTE_PMD_NULL) += null DIRS-$(CONFIG_RTE_LIBRTE_PMD_PCAP) += pcap @@ -49,5 +48,10 @@ DIRS-$(CONFIG_RTE_LIBRTE_VIRTIO_PMD) += virtio DIRS-$(CONFIG_RTE_LIBRTE_VMXNET3_PMD) += vmxnet3 DIRS-$(CONFIG_RTE_LIBRTE_PMD_XENVIRT) += xenvirt +# Drivers not support in combined mode +ifeq ($(CONFIG_RTE_BUILD_COMBINE_LIBS),n) +DIRS-$(CONFIG_RTE_LIBRTE_MLX4_PMD) += mlx4 +endif + include $(RTE_SDK)/mk/rte.sharelib.mk include $(RTE_SDK)/mk/rte.subdir.mk diff --git a/drivers/net/mlx4/Makefile b/drivers/net/mlx4/Makefile index 14cb53f..d2f5692 100644 --- a/drivers/net/mlx4/Makefile +++ b/drivers/net/mlx4/Makefile @@ -50,6 +50,7 @@ CFLAGS += -g CFLAGS += -I. CFLAGS += -D_XOPEN_SOURCE=600 CFLAGS += $(WERROR_FLAGS) +LDLIBS += -libverbs # A few warnings cannot be avoided in external headers. CFLAGS += -Wno-error=cast-qual diff --git a/mk/rte.app.mk b/mk/rte.app.mk index 97719cb..04af756 100644 --- a/mk/rte.app.mk +++ b/mk/rte.app.mk @@ -100,7 +100,6 @@ ifeq ($(CONFIG_RTE_LIBRTE_VHOST_USER),n) _LDLIBS-$(CONFIG_RTE_LIBRTE_VHOST) += -lfuse endif -_LDLIBS-$(CONFIG_RTE_LIBRTE_MLX4_PMD) += -libverbs _LDLIBS-$(CONFIG_RTE_LIBRTE_BNX2X_PMD) += -lz _LDLIBS-y += --start-group @@ -140,6 +139,7 @@ _LDLIBS-$(CONFIG_RTE_LIBRTE_PMD_RING) += -lrte_pmd_ring _LDLIBS-$(CONFIG_RTE_LIBRTE_PMD_PCAP) += -lrte_pmd_pcap _LDLIBS-$(CONFIG_RTE_LIBRTE_PMD_AF_PACKET) += -lrte_pmd_af_packet _LDLIBS-$(CONFIG_RTE_LIBRTE_PMD_NULL) += -lrte_pmd_null +_LDLIBS-$(CONFIG_RTE_LIBRTE_MLX4_PMD) += -libverbs endif # ! $(CONFIG_RTE_BUILD_SHARED_LIB) -- 1.9.1
[dpdk-dev] RFC: i40e xmit path HW limitation
Hi, Konstantin, Helin,

there is a documented limitation of xl710 controllers (i40e driver) which is not handled in any way by the DPDK driver. From the datasheet, chapter 8.4.1:

"- A single transmit packet may span up to 8 buffers (up to 8 data descriptors per packet including both the header and payload buffers).
- The total number of data descriptors for the whole TSO (explained later on in this chapter) is unlimited as long as each segment within the TSO obeys the previous rule (up to 8 data descriptors per segment for both the TSO header and the segment payload buffers)."

This means that, for instance, a long cluster with small fragments has to be linearized before it may be placed on the HW ring. In more standard environments like the Linux or FreeBSD drivers the solution is straightforward: call skb_linearize()/m_collapse() respectively. In a non-conformist environment like DPDK life is not that easy: there is no easy way to collapse the cluster into a linear buffer from inside the device driver, since the device driver doesn't allocate memory in the fast path and uses the user-allocated pools only.

Here are two proposals for a solution:

1. We may provide a callback that would return TRUE to the user if a given cluster has to be linearized, and it should always be called before rte_eth_tx_burst(). Alternatively, it may be called from inside rte_eth_tx_burst(), with rte_eth_tx_burst() changed to return an error code when one of the clusters it is given has to be linearized.
2. Another option is to allocate a mempool in the driver with elements consuming a single page each (standard 2KB buffers would do). The number of elements in the pool should be the Tx ring length multiplied by "64KB/(linear data length of the buffer in the pool above)". Here I use 64KB as the maximum packet length and do not take into account esoteric things like the "Giant" TSO mentioned in the spec above. Then we may actually go and linearize the cluster if needed on top of the buffers from the pool above, post the buffer from the mempool above on the HW ring, link the original cluster to that new cluster (using the private data) and release it when the send is done.

The first is a change in the API and would require some additional handling (linearization) from the application. The second would require some additional memory but would keep all the dirty details inside the driver and would leave the rest of the code intact.

Please comment.

thanks,
vlad
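
The check behind the first proposal could look roughly like the sketch below. It is only an illustration under stated assumptions: `struct seg`, `seg_count()`, `needs_linearize()` and `MAX_TX_SEGS` are hypothetical names, the chain is a simplified stand-in for a real mbuf cluster, and the per-segment TSO rule from the datasheet is not modelled.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a chained packet buffer (not struct rte_mbuf). */
struct seg {
    struct seg *next;   /* next buffer in the cluster, or NULL at the end */
    unsigned data_len;  /* bytes of packet data held in this buffer */
};

/* Limit quoted from the datasheet: 8 data descriptors per non-TSO packet. */
#define MAX_TX_SEGS 8

/* Count the buffers in the chain starting at head. */
static unsigned seg_count(const struct seg *head)
{
    unsigned n = 0;

    for (; head != NULL; head = head->next)
        n++;
    return n;
}

/* Return true if the cluster spans more buffers than the HW accepts. */
static bool needs_linearize(const struct seg *head)
{
    return seg_count(head) > MAX_TX_SEGS;
}
```

An application would call such a check on each cluster before handing it to the Tx burst function and linearize (or drop) the ones for which it returns true.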
[dpdk-dev] lost when learning how to test dpdk
The 82545 is listed at http://dpdk.org/doc/nics and I can see it in rte_pci_dev_ids.h/e1000_hw.h: 196 #define E1000_DEV_ID_82545GM_COPPER 0x1026 $ lspci -nn ... 03:00.0 Ethernet controller [0200]: Intel Corporation 82545GM Gigabit Ethernet Controller [8086:1026] (rev 04) 03:02.0 Ethernet controller [0200]: Intel Corporation 82545GM Gigabit Ethernet Controller [8086:1026] (rev 04) However, it is rev 04 and in e1000_hw.h there is just e1000_82545_rev_3. But this should not avoid the match (?). Is it possible to grow the verbosity level of the device matching process in DPDK? I do not expect any support, I just wanted to use it for sending traffic at 1 Gbps because there are two such cards mostly unused in my computer. I did not plan to use I217-V (in fact, I did not expect much from this integrated NIC and I did not even notice it is an Intel one...). Regards Jan Viktorin On Thu, 30 Jul 2015 07:44:14 -0700 Ravi Kerur wrote: > On Thu, Jul 30, 2015 at 5:03 AM, Jan Viktorin > wrote: > > > Hi, > > > > thanks for reply. I could see those docs but it does not help me a lot. > > I still do not understand very well the principle of the tool. How it > > chooses the NICs to use? Previously I confused -b in dpdk_nic_bind and > > testpmd. They have somehow opposite meaning. I can start testpmd now, > > however, it does ot probe any NIC. I've tried -w to whitelist certain > > NICs but with no success. > > > > $ dpdk_nic_bind --status > > > > Network devices using DPDK-compatible driver > > > > :03:00.0 '82545GM Gigabit Ethernet Controller' drv=uio_pci_generic > > unused=e1000 > > :03:02.0 '82545GM Gigabit Ethernet Controller' drv=uio_pci_generic > > unused=e1000 > > > > NICs may not be supported by PMD drivers yet. Do "lspci -nn" and check the > device-id. 
Adding support in PMD should not be a problem, but I am not > sure on support since there is End of Life listed on Intel Website > > http://ark.intel.com/products/4964/Intel-82545GM-Gigabit-Ethernet-Controller > > > > Network devices using kernel driver > > === > > :00:19.0 'Ethernet Connection I217-V' if=eno1 drv=e1000e > > unused=uio_pci_generic *Active* > > > > DPDK doesn't bind Active NIC, support for I217-V in PMD is being tested > currently. > > > > > > Other network devices > > = > > > > > > $ sudo testpmd -c 0x3 -n 2 -- -i --total-num-mbufs=2048 > > EAL: Detected lcore 0 as core 0 on socket 0 > > EAL: Detected lcore 1 as core 1 on socket 0 > > EAL: Detected lcore 2 as core 0 on socket 0 > > EAL: Detected lcore 3 as core 1 on socket 0 > > EAL: Support maximum 128 logical core(s) by configuration. > > EAL: Detected 4 lcore(s) > > EAL: VFIO modules not all loaded, skip VFIO support... > > EAL: Setting up physically contiguous memory... > > EAL: Ask a virtual area of 0x3c0 bytes > > EAL: Virtual area found at 0x7fe973e0 (size = 0x3c0) > > EAL: Ask a virtual area of 0x20 bytes > > EAL: Virtual area found at 0x7fe973a0 (size = 0x20) > > EAL: Ask a virtual area of 0x20 bytes > > EAL: Virtual area found at 0x7fe97360 (size = 0x20) > > EAL: Ask a virtual area of 0x3c0 bytes > > EAL: Virtual area found at 0x7fe96f80 (size = 0x3c0) > > EAL: Ask a virtual area of 0x20 bytes > > EAL: Virtual area found at 0x7fe96f40 (size = 0x20) > > EAL: Ask a virtual area of 0x20 bytes > > EAL: Virtual area found at 0x7fe96f00 (size = 0x20) > > EAL: Requesting 64 pages of size 2MB from socket 0 > > EAL: TSC frequency is ~368 KHz > > EAL: Master lcore 0 is ready (tid=7989d8c0;cpuset=[0]) > > EAL: lcore 1 is ready (tid=6efff700;cpuset=[1]) > > EAL: No probed ethernet devices > > Interactive-mode selected > > Done > > testpmd> > > > > Thanks > > Jan Viktorin > > > > On Wed, 29 Jul 2015 12:09:06 +0300 > > ciprian.barbu wrote: > > > > > > > > > > > On 28.07.2015 21:13, Jan Viktorin wrote: 
> > > > Hello all, > > > > > > > > I am learning how to measure throughput with dpdk. I have 4 cores > > > > Intel(R) Core(TM) i3-4360 CPU @ 3.70GHz and two 82545GM NICs connected > > > > together. I do not understand very well, how to setup testpmd. > > > > > > http://dpdk.org/doc > > > http://dpdk.org/doc/guides/testpmd_app_ug/run_app.html > > > http://dpdk.org/doc/quick-start > > > > > > > > > > > I've successfully bound the NICs to dpdk: > > > > > > > > $ dpdk_nic_bind --status > > > > > > > > Network devices using DPDK-compatible driver > > > > > > > > :03:00.0 '82545GM Gigabit Ethernet Controller' drv=uio_pci_generic > > unused=e1000 > > > > :03:02.0 '82545GM Gigabit Ethernet Controller' drv=uio_pci_generic > > unused=e1000 > > > > > > > > Network devices using kernel driver > > > > === > > > > :00:19.0 'Ethernet Connection I217-V' if=eno1 drv=e1000e > > unused=uio_pci_generic *Active* > > > > > > > > O
[dpdk-dev] [PATCH] mbuf: enforce alignment of mbuf private area
> -Original Message- > From: Olivier Matz [mailto:olivier.matz at 6wind.com] > Sent: Thursday, July 30, 2015 6:56 AM > To: dev at dpdk.org > Cc: Ananyev, Konstantin; olivier.matz at 6wind.com; Zhang, Helin; > martin.weiser at allegro-packets.com; thomas.monjalon at 6wind.com > Subject: [PATCH] mbuf: enforce alignment of mbuf private area > > It looks better to have a data buffer address that is aligned to > 8 bytes. This is the case when there is no mbuf private area, but if there is > one, > the alignment depends on the size of this area that is located between the > mbuf > structure and the data buffer. > > Indeed, some drivers expects to have the buffer address aligned to an even > address, and moreover an unaligned buffer may impact the performance when > accessing to network headers. > > Add a check in rte_pktmbuf_pool_create() to verify the alignment constraint > before creating the mempool. For applications that use the alternative way > (direct call to rte_mempool_create), also add an assertion in > rte_pktmbuf_init(). > > By the way, also add the MBUF log type. > > Signed-off-by: Olivier Matz > --- > lib/librte_eal/common/include/rte_log.h | 1 + > lib/librte_mbuf/rte_mbuf.c | 8 +++- > lib/librte_mbuf/rte_mbuf.h | 7 +-- > 3 files changed, 13 insertions(+), 3 deletions(-) > > diff --git a/lib/librte_eal/common/include/rte_log.h > b/lib/librte_eal/common/include/rte_log.h > index 24a55cc..ede0dca 100644 > --- a/lib/librte_eal/common/include/rte_log.h > +++ b/lib/librte_eal/common/include/rte_log.h > @@ -77,6 +77,7 @@ extern struct rte_logs rte_logs; > #define RTE_LOGTYPE_PORT0x2000 /**< Log related to port. */ > #define RTE_LOGTYPE_TABLE 0x4000 /**< Log related to table. */ > #define RTE_LOGTYPE_PIPELINE 0x8000 /**< Log related to pipeline. */ > +#define RTE_LOGTYPE_MBUF0x0001 /**< Log related to mbuf. */ > > /* these log types can be used in an application */ > #define RTE_LOGTYPE_USER1 0x0100 /**< User-defined log type 1. 
*/ > diff --git a/lib/librte_mbuf/rte_mbuf.c b/lib/librte_mbuf/rte_mbuf.c index > 4320dd4..a1ddbb3 100644 > --- a/lib/librte_mbuf/rte_mbuf.c > +++ b/lib/librte_mbuf/rte_mbuf.c > @@ -125,6 +125,7 @@ rte_pktmbuf_init(struct rte_mempool *mp, > mbuf_size = sizeof(struct rte_mbuf) + priv_size; > buf_len = rte_pktmbuf_data_room_size(mp); > > + RTE_MBUF_ASSERT((priv_size & (RTE_MBUF_PRIV_ALIGN - 1)) == 0); Using RTE_ALIGN() could be more readable? > RTE_MBUF_ASSERT(mp->elt_size >= mbuf_size); > RTE_MBUF_ASSERT(buf_len <= UINT16_MAX); > > @@ -154,7 +155,12 @@ rte_pktmbuf_pool_create(const char *name, unsigned > n, > struct rte_pktmbuf_pool_private mbp_priv; > unsigned elt_size; > > - > + if ((priv_size & (RTE_MBUF_PRIV_ALIGN - 1)) != 0) { Using RTE_ALIGN() could be more readable? > + RTE_LOG(ERR, MBUF, "mbuf priv_size=%u is not aligned\n", > + priv_size); > + rte_errno = EINVAL; > + return NULL; > + } > elt_size = sizeof(struct rte_mbuf) + (unsigned)priv_size + > (unsigned)data_room_size; > mbp_priv.mbuf_data_room_size = data_room_size; diff --git > a/lib/librte_mbuf/rte_mbuf.h b/lib/librte_mbuf/rte_mbuf.h index > 010b32d..c3b8c98 100644 > --- a/lib/librte_mbuf/rte_mbuf.h > +++ b/lib/librte_mbuf/rte_mbuf.h > @@ -698,6 +698,9 @@ extern "C" { > > RTE_PTYPE_INNER_L4_MASK)) #endif /* RTE_NEXT_ABI */ > > +/** Alignment constraint of mbuf private area. */ #define > +RTE_MBUF_PRIV_ALIGN 8 > + > /** > * Get the name of a RX offload flag > * > @@ -1238,7 +1241,7 @@ void rte_pktmbuf_pool_init(struct rte_mempool *mp, > void *opaque_arg); > * details. > * @param priv_size > * Size of application private are between the rte_mbuf structure > - * and the data buffer. > + * and the data buffer. This value must be aligned to > RTE_MBUF_PRIV_ALIGN. > * @param data_room_size > * Size of data buffer in each mbuf, including RTE_PKTMBUF_HEADROOM. 
> * @param socket_id > @@ -1250,7 +1253,7 @@ void rte_pktmbuf_pool_init(struct rte_mempool *mp, > void *opaque_arg); > * with rte_errno set appropriately. Possible rte_errno values include: > *- E_RTE_NO_CONFIG - function could not get pointer to rte_config > structure > *- E_RTE_SECONDARY - function was called from a secondary process > instance > - *- EINVAL - cache size provided is too large > + *- EINVAL - cache size provided is too large, or priv_size is not > aligned. > *- ENOSPC - the maximum number of memzones has already been > allocated > *- EEXIST - a memzone with the same name already exists > *- ENOMEM - no appropriate memory area found in which to create > memzone > -- > 2.1.4
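
The two ways of writing the alignment test discussed above (the bit mask used in the patch and a round-up comparison along the lines suggested in the review) give the same answer for power-of-two alignments. A small sketch, assuming local stand-in macros: `PRIV_ALIGN`, `ALIGN_CEIL()` and both helper functions are hypothetical names defined here for illustration, not DPDK's own definitions.

```c
#include <stdbool.h>

#define PRIV_ALIGN 8u  /* stand-in for RTE_MBUF_PRIV_ALIGN from the patch */

/* Round val up to the next multiple of align (align must be a power of two). */
#define ALIGN_CEIL(val, align) (((val) + (align) - 1u) & ~((align) - 1u))

/* Mask form, as written in the patch: low bits must all be zero. */
static bool priv_size_ok_mask(unsigned priv_size)
{
    return (priv_size & (PRIV_ALIGN - 1u)) == 0;
}

/* Round-up form: a value is aligned iff rounding it up leaves it unchanged. */
static bool priv_size_ok_ceil(unsigned priv_size)
{
    return ALIGN_CEIL(priv_size, PRIV_ALIGN) == priv_size;
}
```

Which form reads better is a matter of taste; the round-up form names the intent, while the mask form avoids relying on a helper macro.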
[dpdk-dev] lost when learning how to test dpdk
On Thu, Jul 30, 2015 at 8:22 AM, Jan Viktorin wrote: > The 82545 is listed at http://dpdk.org/doc/nics and I can see it in > rte_pci_dev_ids.h/e1000_hw.h: > > 196 #define E1000_DEV_ID_82545GM_COPPER 0x1026 > > $ lspci -nn > ... > 03:00.0 Ethernet controller [0200]: Intel Corporation 82545GM Gigabit > Ethernet Controller [8086:1026] (rev 04) > 03:02.0 Ethernet controller [0200]: Intel Corporation 82545GM Gigabit > Ethernet Controller [8086:1026] (rev 04) > > However, it is rev 04 and in e1000_hw.h there is just e1000_82545_rev_3. > But this should not avoid the match (?). Is it possible to grow the > verbosity > level of the device matching process in DPDK? > Check lib/librte_eal/common/include/rte_pci_dev_ids.h, you need to add device-id via RTE_PCI_DEV_ID_DECL_EM. > > I do not expect any support, I just wanted to use it for sending traffic > at 1 Gbps because there are two such cards mostly unused in my computer. > I did not plan to use I217-V (in fact, I did not expect much from this > integrated NIC and I did not even notice it is an Intel one...). > > Regards > Jan Viktorin > > On Thu, 30 Jul 2015 07:44:14 -0700 > Ravi Kerur wrote: > > > On Thu, Jul 30, 2015 at 5:03 AM, Jan Viktorin > > wrote: > > > > > Hi, > > > > > > thanks for reply. I could see those docs but it does not help me a lot. > > > I still do not understand very well the principle of the tool. How it > > > chooses the NICs to use? Previously I confused -b in dpdk_nic_bind and > > > testpmd. They have somehow opposite meaning. I can start testpmd now, > > > however, it does ot probe any NIC. I've tried -w to whitelist certain > > > NICs but with no success. > > > > > > $ dpdk_nic_bind --status > > > > > > Network devices using DPDK-compatible driver > > > > > > :03:00.0 '82545GM Gigabit Ethernet Controller' drv=uio_pci_generic > > > unused=e1000 > > > :03:02.0 '82545GM Gigabit Ethernet Controller' drv=uio_pci_generic > > > unused=e1000 > > > > > > > NICs may not be supported by PMD drivers yet. 
Do "lspci -nn" and check > the > > device-id. Adding support in PMD should not be a problem, but I am not > > sure on support since there is End of Life listed on Intel Website > > > > > http://ark.intel.com/products/4964/Intel-82545GM-Gigabit-Ethernet-Controller > > > > > > > Network devices using kernel driver > > > === > > > :00:19.0 'Ethernet Connection I217-V' if=eno1 drv=e1000e > > > unused=uio_pci_generic *Active* > > > > > > > DPDK doesn't bind Active NIC, support for I217-V in PMD is being tested > > currently. > > > > > > > > > > Other network devices > > > = > > > > > > > > > $ sudo testpmd -c 0x3 -n 2 -- -i --total-num-mbufs=2048 > > > EAL: Detected lcore 0 as core 0 on socket 0 > > > EAL: Detected lcore 1 as core 1 on socket 0 > > > EAL: Detected lcore 2 as core 0 on socket 0 > > > EAL: Detected lcore 3 as core 1 on socket 0 > > > EAL: Support maximum 128 logical core(s) by configuration. > > > EAL: Detected 4 lcore(s) > > > EAL: VFIO modules not all loaded, skip VFIO support... > > > EAL: Setting up physically contiguous memory... 
> > > EAL: Ask a virtual area of 0x3c0 bytes > > > EAL: Virtual area found at 0x7fe973e0 (size = 0x3c0) > > > EAL: Ask a virtual area of 0x20 bytes > > > EAL: Virtual area found at 0x7fe973a0 (size = 0x20) > > > EAL: Ask a virtual area of 0x20 bytes > > > EAL: Virtual area found at 0x7fe97360 (size = 0x20) > > > EAL: Ask a virtual area of 0x3c0 bytes > > > EAL: Virtual area found at 0x7fe96f80 (size = 0x3c0) > > > EAL: Ask a virtual area of 0x20 bytes > > > EAL: Virtual area found at 0x7fe96f40 (size = 0x20) > > > EAL: Ask a virtual area of 0x20 bytes > > > EAL: Virtual area found at 0x7fe96f00 (size = 0x20) > > > EAL: Requesting 64 pages of size 2MB from socket 0 > > > EAL: TSC frequency is ~368 KHz > > > EAL: Master lcore 0 is ready (tid=7989d8c0;cpuset=[0]) > > > EAL: lcore 1 is ready (tid=6efff700;cpuset=[1]) > > > EAL: No probed ethernet devices > > > Interactive-mode selected > > > Done > > > testpmd> > > > > > > Thanks > > > Jan Viktorin > > > > > > On Wed, 29 Jul 2015 12:09:06 +0300 > > > ciprian.barbu wrote: > > > > > > > > > > > > > > > On 28.07.2015 21:13, Jan Viktorin wrote: > > > > > Hello all, > > > > > > > > > > I am learning how to measure throughput with dpdk. I have 4 cores > > > > > Intel(R) Core(TM) i3-4360 CPU @ 3.70GHz and two 82545GM NICs > connected > > > > > together. I do not understand very well, how to setup testpmd. > > > > > > > > http://dpdk.org/doc > > > > http://dpdk.org/doc/guides/testpmd_app_ug/run_app.html > > > > http://dpdk.org/doc/quick-start > > > > > > > > > > > > > > I've successfully bound the NICs to dpdk: > > > > > > > > > > $ dpdk_nic_bind --status > > > > > > > > > > Network devices using DPDK-compatible driver > > > > > ==
[dpdk-dev] [PATCH] mbuf: enforce alignment of mbuf private area
On 07/30/2015 04:13 PM, Ananyev, Konstantin wrote: > > Hi Olivier, > > If fails to compile for me: > > /local/kananye1/dpdk.org-mbprv1/lib/librte_mbuf/rte_mbuf.c: In function > ?rte_pktmbuf_pool_create?: > /local/kananye1/dpdk.org-mbprv1/lib/librte_mbuf/rte_mbuf.c:161:3: error: > ?rte_errno? undeclared (first use in this function) > rte_errno = EINVAL; > ^ > /local/kananye1/dpdk.org-mbprv1/lib/librte_mbuf/rte_mbuf.c:161:3: note: each > undeclared identifier is reported only once for each function it appears in > > I had to add: Sorry I had the same error but I forgot to squash the fix... :/ I'm sending a v2 > > diff --git a/lib/librte_mbuf/rte_mbuf.c b/lib/librte_mbuf/rte_mbuf.c > index a1ddbb3..04344c0 100644 > --- a/lib/librte_mbuf/rte_mbuf.c > +++ b/lib/librte_mbuf/rte_mbuf.c > @@ -58,6 +58,7 @@ > #include > #include > #include > +#include > > /* >* ctrlmbuf constructor, given as a callback function to > > Apart from that - looks good to me. > Konstantin > >> -Original Message- >> From: Olivier Matz [mailto:olivier.matz at 6wind.com] >> Sent: Thursday, July 30, 2015 2:56 PM >> To: dev at dpdk.org >> Cc: Ananyev, Konstantin; olivier.matz at 6wind.com; Zhang, Helin; >> martin.weiser at allegro-packets.com; thomas.monjalon at 6wind.com >> Subject: [PATCH] mbuf: enforce alignment of mbuf private area >> >> It looks better to have a data buffer address that is aligned to >> 8 bytes. This is the case when there is no mbuf private area, but >> if there is one, the alignment depends on the size of this area >> that is located between the mbuf structure and the data buffer. >> >> Indeed, some drivers expects to have the buffer address aligned >> to an even address, and moreover an unaligned buffer may impact >> the performance when accessing to network headers. >> >> Add a check in rte_pktmbuf_pool_create() to verify the alignment >> constraint before creating the mempool. 
For applications that use >> the alternative way (direct call to rte_mempool_create), also >> add an assertion in rte_pktmbuf_init(). >> >> By the way, also add the MBUF log type. >> >> Signed-off-by: Olivier Matz >> --- >> lib/librte_eal/common/include/rte_log.h | 1 + >> lib/librte_mbuf/rte_mbuf.c | 8 +++- >> lib/librte_mbuf/rte_mbuf.h | 7 +-- >> 3 files changed, 13 insertions(+), 3 deletions(-) >> >> diff --git a/lib/librte_eal/common/include/rte_log.h >> b/lib/librte_eal/common/include/rte_log.h >> index 24a55cc..ede0dca 100644 >> --- a/lib/librte_eal/common/include/rte_log.h >> +++ b/lib/librte_eal/common/include/rte_log.h >> @@ -77,6 +77,7 @@ extern struct rte_logs rte_logs; >> #define RTE_LOGTYPE_PORT0x2000 /**< Log related to port. */ >> #define RTE_LOGTYPE_TABLE 0x4000 /**< Log related to table. */ >> #define RTE_LOGTYPE_PIPELINE 0x8000 /**< Log related to pipeline. */ >> +#define RTE_LOGTYPE_MBUF0x0001 /**< Log related to mbuf. */ >> >> /* these log types can be used in an application */ >> #define RTE_LOGTYPE_USER1 0x0100 /**< User-defined log type 1. 
*/ >> diff --git a/lib/librte_mbuf/rte_mbuf.c b/lib/librte_mbuf/rte_mbuf.c >> index 4320dd4..a1ddbb3 100644 >> --- a/lib/librte_mbuf/rte_mbuf.c >> +++ b/lib/librte_mbuf/rte_mbuf.c >> @@ -125,6 +125,7 @@ rte_pktmbuf_init(struct rte_mempool *mp, >> mbuf_size = sizeof(struct rte_mbuf) + priv_size; >> buf_len = rte_pktmbuf_data_room_size(mp); >> >> +RTE_MBUF_ASSERT((priv_size & (RTE_MBUF_PRIV_ALIGN - 1)) == 0); >> RTE_MBUF_ASSERT(mp->elt_size >= mbuf_size); >> RTE_MBUF_ASSERT(buf_len <= UINT16_MAX); >> >> @@ -154,7 +155,12 @@ rte_pktmbuf_pool_create(const char *name, unsigned n, >> struct rte_pktmbuf_pool_private mbp_priv; >> unsigned elt_size; >> >> - >> +if ((priv_size & (RTE_MBUF_PRIV_ALIGN - 1)) != 0) { >> +RTE_LOG(ERR, MBUF, "mbuf priv_size=%u is not aligned\n", >> +priv_size); >> +rte_errno = EINVAL; >> +return NULL; >> +} >> elt_size = sizeof(struct rte_mbuf) + (unsigned)priv_size + >> (unsigned)data_room_size; >> mbp_priv.mbuf_data_room_size = data_room_size; >> diff --git a/lib/librte_mbuf/rte_mbuf.h b/lib/librte_mbuf/rte_mbuf.h >> index 010b32d..c3b8c98 100644 >> --- a/lib/librte_mbuf/rte_mbuf.h >> +++ b/lib/librte_mbuf/rte_mbuf.h >> @@ -698,6 +698,9 @@ extern "C" { >>RTE_PTYPE_INNER_L4_MASK)) >> #endif /* RTE_NEXT_ABI */ >> >> +/** Alignment constraint of mbuf private area. */ >> +#define RTE_MBUF_PRIV_ALIGN 8 >> + >> /** >>* Get the name of a RX offload flag >>* >> @@ -1238,7 +1241,7 @@ void rte_pktmbuf_pool_init(struct rte_mempool *mp, >> void *opaque_arg); >>* details. >>* @param priv_size >>* Size of application private are between the rte_mbuf structure >> - * and the data buffer. >> + * and the data buffer. This value must be aligned to RTE_MBUF_PR
[dpdk-dev] [PATCH] mbuf: enforce alignment of mbuf private area
On 07/30/2015 05:33 PM, Zhang, Helin wrote: > > >> -Original Message- >> From: Olivier Matz [mailto:olivier.matz at 6wind.com] >> Sent: Thursday, July 30, 2015 6:56 AM >> To: dev at dpdk.org >> Cc: Ananyev, Konstantin; olivier.matz at 6wind.com; Zhang, Helin; >> martin.weiser at allegro-packets.com; thomas.monjalon at 6wind.com >> Subject: [PATCH] mbuf: enforce alignment of mbuf private area >> >> It looks better to have a data buffer address that is aligned to >> 8 bytes. This is the case when there is no mbuf private area, but if there >> is one, >> the alignment depends on the size of this area that is located between the >> mbuf >> structure and the data buffer. >> >> Indeed, some drivers expects to have the buffer address aligned to an even >> address, and moreover an unaligned buffer may impact the performance when >> accessing to network headers. >> >> Add a check in rte_pktmbuf_pool_create() to verify the alignment constraint >> before creating the mempool. For applications that use the alternative way >> (direct call to rte_mempool_create), also add an assertion in >> rte_pktmbuf_init(). >> >> By the way, also add the MBUF log type. >> >> Signed-off-by: Olivier Matz >> --- >> lib/librte_eal/common/include/rte_log.h | 1 + >> lib/librte_mbuf/rte_mbuf.c | 8 +++- >> lib/librte_mbuf/rte_mbuf.h | 7 +-- >> 3 files changed, 13 insertions(+), 3 deletions(-) >> >> diff --git a/lib/librte_eal/common/include/rte_log.h >> b/lib/librte_eal/common/include/rte_log.h >> index 24a55cc..ede0dca 100644 >> --- a/lib/librte_eal/common/include/rte_log.h >> +++ b/lib/librte_eal/common/include/rte_log.h >> @@ -77,6 +77,7 @@ extern struct rte_logs rte_logs; >> #define RTE_LOGTYPE_PORT0x2000 /**< Log related to port. */ >> #define RTE_LOGTYPE_TABLE 0x4000 /**< Log related to table. */ >> #define RTE_LOGTYPE_PIPELINE 0x8000 /**< Log related to pipeline. */ >> +#define RTE_LOGTYPE_MBUF0x0001 /**< Log related to mbuf. 
*/ >> >> /* these log types can be used in an application */ >> #define RTE_LOGTYPE_USER1 0x0100 /**< User-defined log type 1. */ >> diff --git a/lib/librte_mbuf/rte_mbuf.c b/lib/librte_mbuf/rte_mbuf.c index >> 4320dd4..a1ddbb3 100644 >> --- a/lib/librte_mbuf/rte_mbuf.c >> +++ b/lib/librte_mbuf/rte_mbuf.c >> @@ -125,6 +125,7 @@ rte_pktmbuf_init(struct rte_mempool *mp, >> mbuf_size = sizeof(struct rte_mbuf) + priv_size; >> buf_len = rte_pktmbuf_data_room_size(mp); >> >> +RTE_MBUF_ASSERT((priv_size & (RTE_MBUF_PRIV_ALIGN - 1)) == 0); > Using RTE_ALIGN() could be more readable? > >> RTE_MBUF_ASSERT(mp->elt_size >= mbuf_size); >> RTE_MBUF_ASSERT(buf_len <= UINT16_MAX); >> >> @@ -154,7 +155,12 @@ rte_pktmbuf_pool_create(const char *name, unsigned >> n, >> struct rte_pktmbuf_pool_private mbp_priv; >> unsigned elt_size; >> >> - >> +if ((priv_size & (RTE_MBUF_PRIV_ALIGN - 1)) != 0) { > Using RTE_ALIGN() could be more readable? Will do, thanks for commenting. > >> +RTE_LOG(ERR, MBUF, "mbuf priv_size=%u is not aligned\n", >> +priv_size); >> +rte_errno = EINVAL; >> +return NULL; >> +} >> elt_size = sizeof(struct rte_mbuf) + (unsigned)priv_size + >> (unsigned)data_room_size; >> mbp_priv.mbuf_data_room_size = data_room_size; diff --git >> a/lib/librte_mbuf/rte_mbuf.h b/lib/librte_mbuf/rte_mbuf.h index >> 010b32d..c3b8c98 100644 >> --- a/lib/librte_mbuf/rte_mbuf.h >> +++ b/lib/librte_mbuf/rte_mbuf.h >> @@ -698,6 +698,9 @@ extern "C" { >> >> RTE_PTYPE_INNER_L4_MASK)) #endif /* RTE_NEXT_ABI */ >> >> +/** Alignment constraint of mbuf private area. */ #define >> +RTE_MBUF_PRIV_ALIGN 8 >> + >> /** >>* Get the name of a RX offload flag >>* >> @@ -1238,7 +1241,7 @@ void rte_pktmbuf_pool_init(struct rte_mempool *mp, >> void *opaque_arg); >>* details. >>* @param priv_size >>* Size of application private are between the rte_mbuf structure >> - * and the data buffer. >> + * and the data buffer. This value must be aligned to >> RTE_MBUF_PRIV_ALIGN. 
>>* @param data_room_size >>* Size of data buffer in each mbuf, including RTE_PKTMBUF_HEADROOM. >>* @param socket_id >> @@ -1250,7 +1253,7 @@ void rte_pktmbuf_pool_init(struct rte_mempool *mp, >> void *opaque_arg); >>* with rte_errno set appropriately. Possible rte_errno values include: >>*- E_RTE_NO_CONFIG - function could not get pointer to rte_config >> structure >>*- E_RTE_SECONDARY - function was called from a secondary process >> instance >> - *- EINVAL - cache size provided is too large >> + *- EINVAL - cache size provided is too large, or priv_size is not >> aligned. >>*- ENOSPC - the maximum number of memzones has already been >> allocated >>*- EEXIST - a memzone with the same name already exists >>*- ENOMEM - no appropriate memory area found in which to create >> memzon
[dpdk-dev] i40e xmit path HW limitation
> -----Original Message-----
> From: Vlad Zolotarov [mailto:vladz at cloudius-systems.com]
> Sent: Thursday, July 30, 2015 7:58 AM
> To: dev at dpdk.org; Ananyev, Konstantin; Zhang, Helin
> Subject: RFC: i40e xmit path HW limitation
>
> Hi, Konstantin, Helin,
> there is a documented limitation of xl710 controllers (i40e driver) which is
> not handled in any way by a DPDK driver.
> From the datasheet chapter 8.4.1:
>
> "- A single transmit packet may span up to 8 buffers (up to 8 data
> descriptors per packet including both the header and payload buffers).
> - The total number of data descriptors for the whole TSO (explained later on
> in this chapter) is unlimited as long as each segment within the TSO obeys
> the previous rule (up to 8 data descriptors per segment for both the TSO
> header and the segment payload buffers)."

Yes, I remember that the RX side supports only 5 segments per received packet. But what is the possible issue you have in mind?

> This means that, for instance, long cluster with small fragments has to be
> linearized before it may be placed on the HW ring.

What size are the small fragments? Basically 2KB is the default mbuf size in most example applications. 2KB x 8 is bigger than 1.5KB, so it is enough for the maximum packet size we support. If a 1KB mbuf is used, don't expect to be able to transmit a packet larger than 8KB.

> In more standard environments like Linux or FreeBSD drivers the solution is
> straight forward - call skb_linearize()/m_collapse() corresponding.
> In the non-conformist environment like DPDK life is not that easy - there is
> no easy way to collapse the cluster into a linear buffer from inside the
> device driver since device driver doesn't allocate memory in a fast path and
> utilizes the user allocated pools only.
>
> Here are two proposals for a solution:
>
> 1. We may provide a callback that would return a user TRUE if a give
>    cluster has to be linearized and it should always be called before
>    rte_eth_tx_burst(). Alternatively it may be called from inside the
>    rte_eth_tx_burst() and rte_eth_tx_burst() is changed to return some
>    error code for a case when one of the clusters it's given has to be
>    linearized.
> 2. Another option is to allocate a mempool in the driver with the
>    elements consuming a single page each (standard 2KB buffers would
>    do). Number of elements in the pool should be as Tx ring length
>    multiplied by "64KB/(linear data length of the buffer in the pool
>    above)". Here I use 64KB as a maximum packet length and not taking
>    into an account esoteric things like "Giant" TSO mentioned in the
>    spec above. Then we may actually go and linearize the cluster if
>    needed on top of the buffers from the pool above, post the buffer
>    from the mempool above on the HW ring, link the original cluster to
>    that new cluster (using the private data) and release it when the
>    send is done.
>
> The first is a change in the API and would require from the application some
> additional handling (linearization). The second would require some additional
> memory but would keep all dirty details inside the driver and would leave the
> rest of the code intact.
>
> Pls., comment.
>
> thanks,
> vlad
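
The arithmetic in this exchange can be written out as a short sketch: the largest packet that fits in 8 buffers of a given size, and the pool size implied by the second proposal. The function names and the ring length of 512 used in the usage note are assumptions made for illustration; rounding the per-packet buffer count up is also an assumption, since the RFC only gives the ratio.

```c
/* Largest non-TSO packet that fits in max_segs buffers of buf_len bytes. */
static unsigned long max_pkt_bytes(unsigned long buf_len, unsigned max_segs)
{
    return buf_len * max_segs;
}

/*
 * Pool elements for the second proposal:
 * ring_len * ceil(max_pkt / buf_len), i.e. enough buffers to linearize
 * one maximum-size packet for every slot of the Tx ring.
 */
static unsigned long bounce_pool_elems(unsigned long ring_len,
                                       unsigned long max_pkt,
                                       unsigned long buf_len)
{
    return ring_len * ((max_pkt + buf_len - 1) / buf_len);
}
```

With 2KB buffers, max_pkt_bytes(2048, 8) is 16384 bytes and with 1KB buffers it is 8192 bytes, matching the figures in the reply. For a hypothetical 512-entry ring, bounce_pool_elems(512, 65536, 2048) comes to 16384 buffers, or 32MB of 2KB elements.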
[dpdk-dev] RFC: i40e xmit path HW limitation
On Thu, 30 Jul 2015 17:57:33 +0300 Vlad Zolotarov wrote:

> Hi, Konstantin, Helin,
> there is a documented limitation of xl710 controllers (i40e driver)
> which is not handled in any way by a DPDK driver.
> From the datasheet chapter 8.4.1:
>
> "- A single transmit packet may span up to 8 buffers (up to 8 data
> descriptors per packet including both the header and payload buffers).
> - The total number of data descriptors for the whole TSO (explained later on
> in this chapter) is unlimited as long as each segment within the TSO obeys
> the previous rule (up to 8 data descriptors per segment for both the TSO
> header and the segment payload buffers)."
>
> This means that, for instance, long cluster with small fragments has to
> be linearized before it may be placed on the HW ring.
> In more standard environments like Linux or FreeBSD drivers the solution
> is straight forward - call skb_linearize()/m_collapse() corresponding.
> In the non-conformist environment like DPDK life is not that easy -
> there is no easy way to collapse the cluster into a linear buffer from
> inside the device driver since device driver doesn't allocate memory in
> a fast path and utilizes the user allocated pools only.
>
> Here are two proposals for a solution:
>
> 1. We may provide a callback that would return a user TRUE if a give
>    cluster has to be linearized and it should always be called before
>    rte_eth_tx_burst(). Alternatively it may be called from inside the
>    rte_eth_tx_burst() and rte_eth_tx_burst() is changed to return some
>    error code for a case when one of the clusters it's given has to be
>    linearized.
> 2. Another option is to allocate a mempool in the driver with the
>    elements consuming a single page each (standard 2KB buffers would
>    do). Number of elements in the pool should be as Tx ring length
>    multiplied by "64KB/(linear data length of the buffer in the pool
>    above)". Here I use 64KB as a maximum packet length and not taking
>    into an account esoteric things like "Giant" TSO mentioned in the
>    spec above. Then we may actually go and linearize the cluster if
>    needed on top of the buffers from the pool above, post the buffer
>    from the mempool above on the HW ring, link the original cluster to
>    that new cluster (using the private data) and release it when the
>    send is done.

Or just silently drop heavily scattered packets (and increment oerrors) with a PMD_TX_LOG debug message.

I think a DPDK driver doesn't have to accept all possible mbufs and do extra work. It seems reasonable to expect the caller to be well behaved in this restricted ecosystem.
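
The drop-and-count alternative suggested above might be sketched as follows. This is not the i40e driver's code: `struct pkt_seg`, `struct tx_counters`, `tx_filter()` and `HW_MAX_SEGS` are hypothetical names, and a real driver would also free the dropped buffers and emit the debug log message, which this sketch leaves out.

```c
#include <stddef.h>

/* Hypothetical stand-in for one buffer of a chained packet. */
struct pkt_seg {
    struct pkt_seg *next;  /* next buffer in the packet, or NULL */
};

/* Hypothetical per-queue counters; only the error count is modelled. */
struct tx_counters {
    unsigned long oerrors;
};

#define HW_MAX_SEGS 8  /* descriptor limit per packet quoted in the thread */

/* Count the buffers that make up one packet. */
static unsigned pkt_nb_segs(const struct pkt_seg *s)
{
    unsigned n = 0;

    for (; s != NULL; s = s->next)
        n++;
    return n;
}

/*
 * Compact pkts[] in place so that only sendable packets remain at the
 * front; every packet over the limit is skipped and counted in oerrors.
 * Returns the number of packets kept.
 */
static unsigned tx_filter(struct pkt_seg **pkts, unsigned n,
                          struct tx_counters *st)
{
    unsigned kept = 0;

    for (unsigned i = 0; i < n; i++) {
        if (pkt_nb_segs(pkts[i]) > HW_MAX_SEGS) {
            st->oerrors++;  /* dropped: too scattered for the HW */
            continue;
        }
        pkts[kept++] = pkts[i];
    }
    return kept;
}
```

The trade-off is that the caller only learns about the drop through the error counter, which is the point raised in the follow-up reply.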
[dpdk-dev] RFC: i40e xmit path HW limitation
On 07/30/2015 07:17 PM, Stephen Hemminger wrote:
> On Thu, 30 Jul 2015 17:57:33 +0300
> Vlad Zolotarov wrote:
>
>> Hi, Konstantin, Helin,
>> there is a documented limitation of xl710 controllers (i40e driver)
>> which is not handled in any way by a DPDK driver.
>> From the datasheet chapter 8.4.1:
>>
>> "- A single transmit packet may span up to 8 buffers (up to 8 data
>> descriptors per packet including both the header and payload buffers).
>> - The total number of data descriptors for the whole TSO (explained later on
>> in this chapter) is unlimited as long as each segment within the TSO obeys
>> the previous rule (up to 8 data descriptors per segment for both the TSO
>> header and the segment payload buffers)."
>>
>> This means that, for instance, long cluster with small fragments has to
>> be linearized before it may be placed on the HW ring.
>> In more standard environments like Linux or FreeBSD drivers the solution
>> is straight forward - call skb_linearize()/m_collapse() corresponding.
>> In the non-conformist environment like DPDK life is not that easy -
>> there is no easy way to collapse the cluster into a linear buffer from
>> inside the device driver since device driver doesn't allocate memory in
>> a fast path and utilizes the user allocated pools only.
>>
>> Here are two proposals for a solution:
>>
>> 1. We may provide a callback that would return a user TRUE if a give
>>    cluster has to be linearized and it should always be called before
>>    rte_eth_tx_burst(). Alternatively it may be called from inside the
>>    rte_eth_tx_burst() and rte_eth_tx_burst() is changed to return some
>>    error code for a case when one of the clusters it's given has to be
>>    linearized.
>> 2. Another option is to allocate a mempool in the driver with the
>>    elements consuming a single page each (standard 2KB buffers would
>>    do). Number of elements in the pool should be as Tx ring length
>>    multiplied by "64KB/(linear data length of the buffer in the pool
>>    above)". Here I use 64KB as a maximum packet length and not taking
>>    into an account esoteric things like "Giant" TSO mentioned in the
>>    spec above. Then we may actually go and linearize the cluster if
>>    needed on top of the buffers from the pool above, post the buffer
>>    from the mempool above on the HW ring, link the original cluster to
>>    that new cluster (using the private data) and release it when the
>>    send is done.
>
> Or just silently drop heavily scattered packets (and increment oerrors)
> with a PMD_TX_LOG debug message.
>
> I think a DPDK driver doesn't have to accept all possible mbufs and do
> extra work. It seems reasonable to expect caller to be well behaved
> in this restricted ecosystem.

How can the caller know what's well behaved? It's device dependent.
[dpdk-dev] lost when learning how to test dpdk
OK, I've added the card into RTE_PCI_DEVEM_ID_DECL_EM list. Much better now: EAL: Requesting 64 pages of size 2MB from socket 0 EAL: TSC frequency is ~365 KHz EAL: Master lcore 0 is ready (tid=467d78c0;cpuset=[0]) EAL: lcore 1 is ready (tid=3a5ff700;cpuset=[1]) EAL: PCI device :03:00.0 on NUMA socket -1 EAL: probe driver: 8086:1026 rte_em_pmd EAL: PCI memory mapped at 0x7fde44a0 EAL: PCI memory mapped at 0x7fde44a2 PMD: eth_em_dev_init(): port_id 0 vendorID=0x8086 deviceID=0x1026 EAL: PCI device :03:02.0 on NUMA socket -1 EAL: probe driver: 8086:1026 rte_em_pmd EAL: Not managed by a supported kernel driver, skipped Interactive-mode selected EAL: Error - exiting with code: 1 Cause: Creation of mbuf pool for socket 0 failed I've tried both uio_pci_generic and igb_uio. Is there anything else I can do about it? Jan V. On Thu, 30 Jul 2015 08:41:47 -0700 Ravi Kerur wrote: > On Thu, Jul 30, 2015 at 8:22 AM, Jan Viktorin > wrote: > > > The 82545 is listed at http://dpdk.org/doc/nics and I can see it in > > rte_pci_dev_ids.h/e1000_hw.h: > > > > 196 #define E1000_DEV_ID_82545GM_COPPER 0x1026 > > > > $ lspci -nn > > ... > > 03:00.0 Ethernet controller [0200]: Intel Corporation 82545GM Gigabit > > Ethernet Controller [8086:1026] (rev 04) > > 03:02.0 Ethernet controller [0200]: Intel Corporation 82545GM Gigabit > > Ethernet Controller [8086:1026] (rev 04) > > > > However, it is rev 04 and in e1000_hw.h there is just e1000_82545_rev_3. > > But this should not avoid the match (?). Is it possible to grow the > > verbosity > > level of the device matching process in DPDK? > > > > Check lib/librte_eal/common/include/rte_pci_dev_ids.h, you need to add > device-id via RTE_PCI_DEVEM_ID_DECL_EM. > > > > > I do not expect any support, I just wanted to use it for sending traffic > > at 1 Gbps because there are two such cards mostly unused in my computer. 
> > I did not plan to use I217-V (in fact, I did not expect much from this > > integrated NIC and I did not even notice it is an Intel one...). > > > > Regards > > Jan Viktorin > > > > On Thu, 30 Jul 2015 07:44:14 -0700 > > Ravi Kerur wrote: > > > > > On Thu, Jul 30, 2015 at 5:03 AM, Jan Viktorin > > > wrote: > > > > > > > Hi, > > > > > > > > thanks for reply. I could see those docs but it does not help me a lot. > > > > I still do not understand very well the principle of the tool. How it > > > > chooses the NICs to use? Previously I confused -b in dpdk_nic_bind and > > > > testpmd. They have somehow opposite meaning. I can start testpmd now, > > > > however, it does ot probe any NIC. I've tried -w to whitelist certain > > > > NICs but with no success. > > > > > > > > $ dpdk_nic_bind --status > > > > > > > > Network devices using DPDK-compatible driver > > > > > > > > :03:00.0 '82545GM Gigabit Ethernet Controller' drv=uio_pci_generic > > > > unused=e1000 > > > > :03:02.0 '82545GM Gigabit Ethernet Controller' drv=uio_pci_generic > > > > unused=e1000 > > > > > > > > > > NICs may not be supported by PMD drivers yet. Do "lspci -nn" and check > > the > > > device-id. Adding support in PMD should not be a problem, but I am not > > > sure on support since there is End of Life listed on Intel Website > > > > > > > > http://ark.intel.com/products/4964/Intel-82545GM-Gigabit-Ethernet-Controller > > > > > > > > > > Network devices using kernel driver > > > > === > > > > :00:19.0 'Ethernet Connection I217-V' if=eno1 drv=e1000e > > > > unused=uio_pci_generic *Active* > > > > > > > > > > DPDK doesn't bind Active NIC, support for I217-V in PMD is being tested > > > currently. 
> > > > > > > > > > > > > > Other network devices > > > > = > > > > > > > > > > > > $ sudo testpmd -c 0x3 -n 2 -- -i --total-num-mbufs=2048 > > > > EAL: Detected lcore 0 as core 0 on socket 0 > > > > EAL: Detected lcore 1 as core 1 on socket 0 > > > > EAL: Detected lcore 2 as core 0 on socket 0 > > > > EAL: Detected lcore 3 as core 1 on socket 0 > > > > EAL: Support maximum 128 logical core(s) by configuration. > > > > EAL: Detected 4 lcore(s) > > > > EAL: VFIO modules not all loaded, skip VFIO support... > > > > EAL: Setting up physically contiguous memory... > > > > EAL: Ask a virtual area of 0x3c0 bytes > > > > EAL: Virtual area found at 0x7fe973e0 (size = 0x3c0) > > > > EAL: Ask a virtual area of 0x20 bytes > > > > EAL: Virtual area found at 0x7fe973a0 (size = 0x20) > > > > EAL: Ask a virtual area of 0x20 bytes > > > > EAL: Virtual area found at 0x7fe97360 (size = 0x20) > > > > EAL: Ask a virtual area of 0x3c0 bytes > > > > EAL: Virtual area found at 0x7fe96f80 (size = 0x3c0) > > > > EAL: Ask a virtual area of 0x20 bytes > > > > EAL: Virtual area found at 0x7fe96f40 (size = 0x20) > > > > EAL: Ask a virtual area of 0x20 bytes > > > > EAL: Virtual area found at 0x7fe96f00 (size = 0x20) >
[dpdk-dev] [PATCH v2] mbuf: enforce alignment of mbuf private area
It looks better to have a data buffer address that is aligned to 8 bytes. This is the case when there is no mbuf private area, but if there is one, the alignment depends on the size of this area that is located between the mbuf structure and the data buffer. Indeed, some drivers expects to have the buffer address aligned to an even address, and moreover an unaligned buffer may impact the performance when accessing to network headers. Add a check in rte_pktmbuf_pool_create() to verify the alignment constraint before creating the mempool. For applications that use the alternative way (direct call to rte_mempool_create), also add an assertion in rte_pktmbuf_init(). By the way, also add the MBUF log type. Signed-off-by: Olivier Matz --- lib/librte_eal/common/include/rte_log.h | 1 + lib/librte_mbuf/rte_mbuf.c | 9 - lib/librte_mbuf/rte_mbuf.h | 7 +-- 3 files changed, 14 insertions(+), 3 deletions(-) diff --git a/lib/librte_eal/common/include/rte_log.h b/lib/librte_eal/common/include/rte_log.h index 24a55cc..ede0dca 100644 --- a/lib/librte_eal/common/include/rte_log.h +++ b/lib/librte_eal/common/include/rte_log.h @@ -77,6 +77,7 @@ extern struct rte_logs rte_logs; #define RTE_LOGTYPE_PORT0x2000 /**< Log related to port. */ #define RTE_LOGTYPE_TABLE 0x4000 /**< Log related to table. */ #define RTE_LOGTYPE_PIPELINE 0x8000 /**< Log related to pipeline. */ +#define RTE_LOGTYPE_MBUF0x0001 /**< Log related to mbuf. */ /* these log types can be used in an application */ #define RTE_LOGTYPE_USER1 0x0100 /**< User-defined log type 1. 
*/ diff --git a/lib/librte_mbuf/rte_mbuf.c b/lib/librte_mbuf/rte_mbuf.c index 4320dd4..e416312 100644 --- a/lib/librte_mbuf/rte_mbuf.c +++ b/lib/librte_mbuf/rte_mbuf.c @@ -58,6 +58,7 @@ #include #include #include +#include /* * ctrlmbuf constructor, given as a callback function to @@ -125,6 +126,7 @@ rte_pktmbuf_init(struct rte_mempool *mp, mbuf_size = sizeof(struct rte_mbuf) + priv_size; buf_len = rte_pktmbuf_data_room_size(mp); + RTE_MBUF_ASSERT(RTE_ALIGN(priv_size, RTE_MBUF_PRIV_ALIGN) == priv_size); RTE_MBUF_ASSERT(mp->elt_size >= mbuf_size); RTE_MBUF_ASSERT(buf_len <= UINT16_MAX); @@ -154,7 +156,12 @@ rte_pktmbuf_pool_create(const char *name, unsigned n, struct rte_pktmbuf_pool_private mbp_priv; unsigned elt_size; - + if (RTE_ALIGN(priv_size, RTE_MBUF_PRIV_ALIGN) != priv_size) { + RTE_LOG(ERR, MBUF, "mbuf priv_size=%u is not aligned\n", + priv_size); + rte_errno = EINVAL; + return NULL; + } elt_size = sizeof(struct rte_mbuf) + (unsigned)priv_size + (unsigned)data_room_size; mbp_priv.mbuf_data_room_size = data_room_size; diff --git a/lib/librte_mbuf/rte_mbuf.h b/lib/librte_mbuf/rte_mbuf.h index 010b32d..c3b8c98 100644 --- a/lib/librte_mbuf/rte_mbuf.h +++ b/lib/librte_mbuf/rte_mbuf.h @@ -698,6 +698,9 @@ extern "C" { RTE_PTYPE_INNER_L4_MASK)) #endif /* RTE_NEXT_ABI */ +/** Alignment constraint of mbuf private area. */ +#define RTE_MBUF_PRIV_ALIGN 8 + /** * Get the name of a RX offload flag * @@ -1238,7 +1241,7 @@ void rte_pktmbuf_pool_init(struct rte_mempool *mp, void *opaque_arg); * details. * @param priv_size * Size of application private are between the rte_mbuf structure - * and the data buffer. + * and the data buffer. This value must be aligned to RTE_MBUF_PRIV_ALIGN. * @param data_room_size * Size of data buffer in each mbuf, including RTE_PKTMBUF_HEADROOM. * @param socket_id @@ -1250,7 +1253,7 @@ void rte_pktmbuf_pool_init(struct rte_mempool *mp, void *opaque_arg); * with rte_errno set appropriately. 
Possible rte_errno values include: *- E_RTE_NO_CONFIG - function could not get pointer to rte_config structure *- E_RTE_SECONDARY - function was called from a secondary process instance - *- EINVAL - cache size provided is too large + *- EINVAL - cache size provided is too large, or priv_size is not aligned. *- ENOSPC - the maximum number of memzones has already been allocated *- EEXIST - a memzone with the same name already exists *- ENOMEM - no appropriate memory area found in which to create memzone -- 2.1.4
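The check added to rte_pktmbuf_pool_create() rounds priv_size up to a multiple of RTE_MBUF_PRIV_ALIGN and rejects the value if rounding changed it. A standalone sketch of that test follows, assuming round-up semantics for RTE_ALIGN. The names PRIV_ALIGN, align_ceil and priv_size_is_aligned are illustrative only and are not part of DPDK.

```c
#include <assert.h>
#include <stdint.h>

/* Same value as RTE_MBUF_PRIV_ALIGN in the patch. */
#define PRIV_ALIGN 8u

/* Round v up to the next multiple of align; align must be a power of two. */
static inline uint32_t
align_ceil(uint32_t v, uint32_t align)
{
	assert(align != 0 && (align & (align - 1)) == 0);
	return (v + align - 1) & ~(align - 1);
}

/* Returns 1 if priv_size would pass the check sketched here, 0 otherwise. */
static inline int
priv_size_is_aligned(uint16_t priv_size)
{
	return align_ceil(priv_size, PRIV_ALIGN) == priv_size;
}
```

With this test a priv_size of 0, 8 or 16 is accepted, while an odd size such as 13 is rejected. That is the case the commit message is concerned with, since the data buffer follows the private area directly.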
[dpdk-dev] [PATCH v2 2/2] mlx4: fix shared library dependency
2015-07-30 16:48, Nelio Laranjeiro:
> librte_pmd_mlx4.so needs to be linked with libiverbs otherwise, the PMD is not
> able to open Mellanox devices and the following message is printed by testpmd
> at startup "librte_pmd_mlx4: cannot access device, is mlx4_ib loaded?".
>
> Applications dependency on libverbs are moved to be only valid in static mode,
> in shared mode, applications do not depend on it anymore,
> librte_pmd_mlx4.so keeps this dependency and thus is linked with libverbs.
>
> Signed-off-by: Nelio Laranjeiro
> Acked-by: Olivier Matz
> ---
> Changelog: don't compiled MLX4 PMD when the DPDK is build in combined shared
> library.

MLX4 cannot be supported in combined shared library because there is no
clean way of adding -libverbs to the combined library.
(This comment should be in the commit message)

> --- a/drivers/net/Makefile
> +++ b/drivers/net/Makefile
> @@ -40,7 +40,6 @@ DIRS-$(CONFIG_RTE_LIBRTE_ENIC_PMD) += enic
>  DIRS-$(CONFIG_RTE_LIBRTE_FM10K_PMD) += fm10k
>  DIRS-$(CONFIG_RTE_LIBRTE_I40E_PMD) += i40e
>  DIRS-$(CONFIG_RTE_LIBRTE_IXGBE_PMD) += ixgbe
> -DIRS-$(CONFIG_RTE_LIBRTE_MLX4_PMD) += mlx4
>  DIRS-$(CONFIG_RTE_LIBRTE_MPIPE_PMD) += mpipe
>  DIRS-$(CONFIG_RTE_LIBRTE_PMD_NULL) += null
>  DIRS-$(CONFIG_RTE_LIBRTE_PMD_PCAP) += pcap
> @@ -49,5 +48,10 @@ DIRS-$(CONFIG_RTE_LIBRTE_VIRTIO_PMD) += virtio
>  DIRS-$(CONFIG_RTE_LIBRTE_VMXNET3_PMD) += vmxnet3
>  DIRS-$(CONFIG_RTE_LIBRTE_PMD_XENVIRT) += xenvirt
>
> +# Drivers not support in combined mode

This comment is useless.

> +ifeq ($(CONFIG_RTE_BUILD_COMBINE_LIBS),n)

It can be enabled if building a static combined library.

> +DIRS-$(CONFIG_RTE_LIBRTE_MLX4_PMD) += mlx4

There is no good reason to move this line.
[dpdk-dev] [PATCH v2] mbuf: enforce alignment of mbuf private area
> -----Original Message-----
> From: Olivier Matz [mailto:olivier.matz at 6wind.com]
> Sent: Thursday, July 30, 2015 9:22 AM
> To: dev at dpdk.org
> Cc: Ananyev, Konstantin; olivier.matz at 6wind.com; Zhang, Helin;
> martin.weiser at allegro-packets.com; thomas.monjalon at 6wind.com
> Subject: [PATCH v2] mbuf: enforce alignment of mbuf private area
>
> It looks better to have a data buffer address that is aligned to
> 8 bytes. This is the case when there is no mbuf private area, but if
> there is one, the alignment depends on the size of this area that is
> located between the mbuf structure and the data buffer.
>
> Indeed, some drivers expects to have the buffer address aligned to an even
> address, and moreover an unaligned buffer may impact the performance when
> accessing to network headers.
>
> Add a check in rte_pktmbuf_pool_create() to verify the alignment constraint
> before creating the mempool. For applications that use the alternative way
> (direct call to rte_mempool_create), also add an assertion in
> rte_pktmbuf_init().
>
> By the way, also add the MBUF log type.
>
> Signed-off-by: Olivier Matz

Acked-by: Helin Zhang
[dpdk-dev] i40e xmit path HW limitation
On 07/30/15 19:10, Zhang, Helin wrote:
>
>> -----Original Message-----
>> From: Vlad Zolotarov [mailto:vladz at cloudius-systems.com]
>> Sent: Thursday, July 30, 2015 7:58 AM
>> To: dev at dpdk.org; Ananyev, Konstantin; Zhang, Helin
>> Subject: RFC: i40e xmit path HW limitation
>>
>> Hi, Konstantin, Helin,
>> there is a documented limitation of xl710 controllers (i40e driver)
>> which is not handled in any way by a DPDK driver.
>> From the datasheet chapter 8.4.1:
>>
>> "* A single transmit packet may span up to 8 buffers (up to 8 data
>>    descriptors per packet including both the header and payload buffers).
>>  * The total number of data descriptors for the whole TSO (explained
>>    later on in this chapter) is unlimited as long as each segment within
>>    the TSO obeys the previous rule (up to 8 data descriptors per segment
>>    for both the TSO header and the segment payload buffers)."
> Yes, I remember the RX side just supports 5 segments per packet receiving.
> But what's the possible issue you thought about?

Note that it's the Tx side we are talking about. See commit
30520831f058cd9d75c0f6b360bc5c5ae49b5f27 in the linux net-next repo.
If such a cluster arrives and you post it on the HW ring, HW will shut
this HW ring down permanently. The application will see that its ring
is stuck.

>
>> This means that, for instance, long cluster with small fragments has to be
>> linearized before it may be placed on the HW ring.
> What type of size of the small fragments? Basically 2KB is the default size
> of mbuf of most example applications. 2KB x 8 is bigger than 1.5KB. So it is
> enough for the maximum packet size we supported.
> If 1KB mbuf is used, don't expect it can transmit more than 8KB size of
> packet.

I kinda lost you here. Again, we are talking about the Tx side, and
buffers are not necessarily completely filled. Namely, there may be a
cluster with 15 fragments of 100 bytes each.

>
>> In more standard environments like Linux or FreeBSD drivers the solution is
>> straight forward - call skb_linearize()/m_collapse() corresponding.
>> In the non-conformist environment like DPDK life is not that easy - there
>> is no easy way to collapse the cluster into a linear buffer from inside the
>> device driver since device driver doesn't allocate memory in a fast path
>> and utilizes the user allocated pools only.
>> Here are two proposals for a solution:
>>
>> 1. We may provide a callback that would return a user TRUE if a give
>>    cluster has to be linearized and it should always be called before
>>    rte_eth_tx_burst(). Alternatively it may be called from inside the
>>    rte_eth_tx_burst() and rte_eth_tx_burst() is changed to return some
>>    error code for a case when one of the clusters it's given has to be
>>    linearized.
>> 2. Another option is to allocate a mempool in the driver with the
>>    elements consuming a single page each (standard 2KB buffers would
>>    do). Number of elements in the pool should be as Tx ring length
>>    multiplied by "64KB/(linear data length of the buffer in the pool
>>    above)". Here I use 64KB as a maximum packet length and not taking
>>    into an account esoteric things like "Giant" TSO mentioned in the
>>    spec above. Then we may actually go and linearize the cluster if
>>    needed on top of the buffers from the pool above, post the buffer
>>    from the mempool above on the HW ring, link the original cluster to
>>    that new cluster (using the private data) and release it when the
>>    send is done.
>>
>>
>> The first is a change in the API and would require from the application some
>> additional handling (linearization). The second would require some additional
>> memory but would keep all dirty details inside the driver and would leave the
>> rest of the code intact.
>>
>> Pls., comment.
>>
>> thanks,
>> vlad
>>
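The case described in this message (a cluster of 15 fragments of 100 bytes each) can be detected by counting the segments in the chain before it is posted. A minimal sketch follows, using a simplified stand-in for struct rte_mbuf. The names tx_seg, chain_nb_segs, needs_linearize and TX_MAX_SEGS are hypothetical and not taken from the i40e driver, and only the non-TSO rule (8 data descriptors per packet) is covered.

```c
#include <stddef.h>
#include <stdint.h>

/* Per-packet data descriptor limit quoted from the datasheet. */
#define TX_MAX_SEGS 8

/* Simplified stand-in for struct rte_mbuf: chain link and data length only. */
struct tx_seg {
	struct tx_seg *next;
	uint16_t data_len;
};

/* Count the segments in a chain. */
static inline unsigned
chain_nb_segs(const struct tx_seg *s)
{
	unsigned n = 0;

	for (; s != NULL; s = s->next)
		n++;
	return n;
}

/* Returns 1 if a non-TSO chain exceeds the limit and needs linearizing. */
static inline int
needs_linearize(const struct tx_seg *s)
{
	return chain_nb_segs(s) > TX_MAX_SEGS;
}
```

The total payload in that example is only 1500 bytes, so a check based on packet length alone would not catch it; the segment count is what matters.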
[dpdk-dev] RFC: i40e xmit path HW limitation
On 07/30/15 19:20, Avi Kivity wrote: > > > On 07/30/2015 07:17 PM, Stephen Hemminger wrote: >> On Thu, 30 Jul 2015 17:57:33 +0300 >> Vlad Zolotarov wrote: >> >>> Hi, Konstantin, Helin, >>> there is a documented limitation of xl710 controllers (i40e driver) >>> which is not handled in any way by a DPDK driver. >>> From the datasheet chapter 8.4.1: >>> >>> "? A single transmit packet may span up to 8 buffers (up to 8 data >>> descriptors per packet including >>> both the header and payload buffers). >>> ? The total number of data descriptors for the whole TSO (explained >>> later on in this chapter) is >>> unlimited as long as each segment within the TSO obeys the previous >>> rule (up to 8 data descriptors >>> per segment for both the TSO header and the segment payload buffers)." >>> >>> This means that, for instance, long cluster with small fragments has to >>> be linearized before it may be placed on the HW ring. >>> In more standard environments like Linux or FreeBSD drivers the >>> solution >>> is straight forward - call skb_linearize()/m_collapse() corresponding. >>> In the non-conformist environment like DPDK life is not that easy - >>> there is no easy way to collapse the cluster into a linear buffer from >>> inside the device driver >>> since device driver doesn't allocate memory in a fast path and utilizes >>> the user allocated pools only. >>> >>> Here are two proposals for a solution: >>> >>> 1. We may provide a callback that would return a user TRUE if a give >>> cluster has to be linearized and it should always be called before >>> rte_eth_tx_burst(). Alternatively it may be called from inside the >>> rte_eth_tx_burst() and rte_eth_tx_burst() is changed to return >>> some >>> error code for a case when one of the clusters it's given has >>> to be >>> linearized. >>> 2. Another option is to allocate a mempool in the driver with the >>> elements consuming a single page each (standard 2KB buffers would >>> do). 
>>>    Number of elements in the pool should be as Tx ring length
>>>    multiplied by "64KB/(linear data length of the buffer in the pool
>>>    above)". Here I use 64KB as a maximum packet length and not taking
>>>    into an account esoteric things like "Giant" TSO mentioned in the
>>>    spec above. Then we may actually go and linearize the cluster if
>>>    needed on top of the buffers from the pool above, post the buffer
>>>    from the mempool above on the HW ring, link the original cluster to
>>>    that new cluster (using the private data) and release it when the
>>>    send is done.
>> Or just silently drop heavily scattered packets (and increment oerrors)
>> with a PMD_TX_LOG debug message.
>>
>> I think a DPDK driver doesn't have to accept all possible mbufs and do
>> extra work. It seems reasonable to expect caller to be well behaved
>> in this restricted ecosystem.
>>
>
> How can the caller know what's well behaved? It's device dependent.

+1

Stephen, how do you imagine this well-behaved application? Having a
switch-case on the underlying device type and then "well-behaving"
correspondingly?
Not to mention that to "well-behave" the application writer has to read
HW specs and understand them, which would limit the number of DPDK
developers to a very small group of people... ;) Not to mention that
the above-mentioned switch-case would be a super ugly thing to find in
an application, and would raise a big question about the justification
for DPDK's existence as an SDK providing a device driver interface. ;)

>
>
[dpdk-dev] RFC: i40e xmit path HW limitation
On Thu, 30 Jul 2015 19:50:27 +0300 Vlad Zolotarov wrote: > > > On 07/30/15 19:20, Avi Kivity wrote: > > > > > > On 07/30/2015 07:17 PM, Stephen Hemminger wrote: > >> On Thu, 30 Jul 2015 17:57:33 +0300 > >> Vlad Zolotarov wrote: > >> > >>> Hi, Konstantin, Helin, > >>> there is a documented limitation of xl710 controllers (i40e driver) > >>> which is not handled in any way by a DPDK driver. > >>> From the datasheet chapter 8.4.1: > >>> > >>> "? A single transmit packet may span up to 8 buffers (up to 8 data > >>> descriptors per packet including > >>> both the header and payload buffers). > >>> ? The total number of data descriptors for the whole TSO (explained > >>> later on in this chapter) is > >>> unlimited as long as each segment within the TSO obeys the previous > >>> rule (up to 8 data descriptors > >>> per segment for both the TSO header and the segment payload buffers)." > >>> > >>> This means that, for instance, long cluster with small fragments has to > >>> be linearized before it may be placed on the HW ring. > >>> In more standard environments like Linux or FreeBSD drivers the > >>> solution > >>> is straight forward - call skb_linearize()/m_collapse() corresponding. > >>> In the non-conformist environment like DPDK life is not that easy - > >>> there is no easy way to collapse the cluster into a linear buffer from > >>> inside the device driver > >>> since device driver doesn't allocate memory in a fast path and utilizes > >>> the user allocated pools only. > >>> > >>> Here are two proposals for a solution: > >>> > >>> 1. We may provide a callback that would return a user TRUE if a give > >>> cluster has to be linearized and it should always be called before > >>> rte_eth_tx_burst(). Alternatively it may be called from inside the > >>> rte_eth_tx_burst() and rte_eth_tx_burst() is changed to return > >>> some > >>> error code for a case when one of the clusters it's given has > >>> to be > >>> linearized. > >>> 2. 
> >>>    Another option is to allocate a mempool in the driver with the
> >>>    elements consuming a single page each (standard 2KB buffers would
> >>>    do). Number of elements in the pool should be as Tx ring length
> >>>    multiplied by "64KB/(linear data length of the buffer in the pool
> >>>    above)". Here I use 64KB as a maximum packet length and not taking
> >>>    into an account esoteric things like "Giant" TSO mentioned in the
> >>>    spec above. Then we may actually go and linearize the cluster if
> >>>    needed on top of the buffers from the pool above, post the buffer
> >>>    from the mempool above on the HW ring, link the original cluster to
> >>>    that new cluster (using the private data) and release it when the
> >>>    send is done.
> >> Or just silently drop heavily scattered packets (and increment oerrors)
> >> with a PMD_TX_LOG debug message.
> >>
> >> I think a DPDK driver doesn't have to accept all possible mbufs and do
> >> extra work. It seems reasonable to expect caller to be well behaved
> >> in this restricted ecosystem.
> >>
> >
> > How can the caller know what's well behaved? It's device dependent.
>
> +1
>
> Stephen, how do you imagine this well-behaved application? Having switch
> case by an underlying device type and then "well-behaving" correspondingly?
> Not to mention that to "well-behave" the application writer has to read
> HW specs and understand them, which would limit the amount of DPDK
> developers to a very small amount of people... ;) Not to mention that
> the mentioned above switch-case would be a super ugly thing to be found
> in an application that would raise a big question about the
> justification of a DPDK existence as as SDK providing device drivers
> interface. ;)

Either have a global RTE_MAX_MBUF_SEGMENTS or an mbuf_linearize function?
The driver can already stash the mbuf pool used for Rx and reuse it for
the transient Tx buffers.
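The mbuf_linearize idea mentioned in this reply amounts to copying every segment of a chain into one contiguous buffer. A minimal sketch of that copy follows, using a simplified stand-in for struct rte_mbuf. The names lin_seg and chain_linearize are hypothetical, no such function is claimed to exist in DPDK at the time of this thread, and where the destination buffer comes from (for example the stashed Rx pool) is left out.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Simplified stand-in for struct rte_mbuf: link, data pointer, data length. */
struct lin_seg {
	struct lin_seg *next;
	const uint8_t *data;
	uint16_t data_len;
};

/*
 * Copy every segment of the chain into dst (capacity dst_len bytes).
 * Returns the number of bytes written, or -1 if the chain does not fit.
 * On failure dst may already hold a partial copy.
 */
static inline long
chain_linearize(const struct lin_seg *s, uint8_t *dst, size_t dst_len)
{
	size_t off = 0;

	for (; s != NULL; s = s->next) {
		if (s->data_len == 0)
			continue;
		/* off never exceeds dst_len, so the subtraction cannot wrap. */
		if (s->data_len > dst_len - off)
			return -1;
		memcpy(dst + off, s->data, s->data_len);
		off += s->data_len;
	}
	return (long)off;
}
```

A driver using this approach would post the single linear buffer in place of the original chain and free the original once the send completes, as outlined in proposal 2 of the RFC.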
[dpdk-dev] [PATCH v2] enic: silence log message unless debug enabled
This blocks the annoying ENIC driver initialization message unless
debug is enabled. Drivers should speak only when spoken to and not be
chatty.

Signed-off-by: Stephen Hemminger
---
 drivers/net/enic/enic_compat.h | 4 ++++
 1 file changed, 4 insertions(+)

diff --git a/drivers/net/enic/enic_compat.h b/drivers/net/enic/enic_compat.h
index f3598ed..94656c8 100644
--- a/drivers/net/enic/enic_compat.h
+++ b/drivers/net/enic/enic_compat.h
@@ -82,7 +82,11 @@
 #define dev_err(x, args...) dev_printk(ERR, args)
 #define dev_info(x, args...) dev_printk(INFO, args)
 #define dev_warning(x, args...) dev_printk(WARNING, args)
+#ifdef RTE_LIBRTE_ENIC_DEBUG
 #define dev_debug(x, args...) dev_printk(DEBUG, args)
+#else
+#define dev_debug(x, args...) do { } while(0)
+#endif
 
 #define __le16 u16
 #define __le32 u32
-- 
2.1.4
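The pattern used in this patch, compiling a debug macro down to an empty statement unless a build flag is set, can be tried in isolation. A minimal sketch follows. The names MY_DRIVER_DEBUG and drv_debug are hypothetical stand-ins for RTE_LIBRTE_ENIC_DEBUG and dev_debug, and fprintf stands in for dev_printk.

```c
#include <stdio.h>

/* Build with -DMY_DRIVER_DEBUG to enable the messages. */
#ifdef MY_DRIVER_DEBUG
#define drv_debug(...) fprintf(stderr, __VA_ARGS__)
#else
/* Expands to an empty statement; the arguments are never evaluated. */
#define drv_debug(...) do { } while (0)
#endif
```

One consequence of this design is that side effects in the arguments disappear in non-debug builds, so a call such as drv_debug("%d\n", ++counter) leaves counter unchanged unless debugging is enabled.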
[dpdk-dev] lost when learning how to test dpdk
On Thu, Jul 30, 2015 at 9:19 AM, Jan Viktorin wrote:

> OK, I've added the card into RTE_PCI_DEVEM_ID_DECL_EM list.
> Much better now:
>
> EAL: Requesting 64 pages of size 2MB from socket 0
> EAL: TSC frequency is ~365 KHz
> EAL: Master lcore 0 is ready (tid=467d78c0;cpuset=[0])
> EAL: lcore 1 is ready (tid=3a5ff700;cpuset=[1])
> EAL: PCI device :03:00.0 on NUMA socket -1
> EAL: probe driver: 8086:1026 rte_em_pmd
> EAL: PCI memory mapped at 0x7fde44a0
> EAL: PCI memory mapped at 0x7fde44a2
> PMD: eth_em_dev_init(): port_id 0 vendorID=0x8086 deviceID=0x1026
> EAL: PCI device :03:02.0 on NUMA socket -1
> EAL: probe driver: 8086:1026 rte_em_pmd
> EAL: Not managed by a supported kernel driver, skipped
> Interactive-mode selected
> EAL: Error - exiting with code: 1
> Cause: Creation of mbuf pool for socket 0 failed
>
> I've tried both uio_pci_generic and igb_uio. Is there anything else
> I can do about it?
>

I am attaching a patch from M Jay/Cuming (Intel engineers); it's not yet
integrated into DPDK mainline. You need it to fix the mbuf error above.

>
> Jan V.
>
> On Thu, 30 Jul 2015 08:41:47 -0700
> Ravi Kerur wrote:
>
> > On Thu, Jul 30, 2015 at 8:22 AM, Jan Viktorin
> > wrote:
> >
> > > The 82545 is listed at http://dpdk.org/doc/nics and I can see it in
> > > rte_pci_dev_ids.h/e1000_hw.h:
> > >
> > > 196 #define E1000_DEV_ID_82545GM_COPPER 0x1026
> > >
> > > $ lspci -nn
> > > ...
> > > 03:00.0 Ethernet controller [0200]: Intel Corporation 82545GM Gigabit
> > > Ethernet Controller [8086:1026] (rev 04)
> > > 03:02.0 Ethernet controller [0200]: Intel Corporation 82545GM Gigabit
> > > Ethernet Controller [8086:1026] (rev 04)
> > >
> > > However, it is rev 04 and in e1000_hw.h there is just e1000_82545_rev_3.
> > > But this should not avoid the match (?). Is it possible to grow the
> > > verbosity
> > > level of the device matching process in DPDK?
> > >
> >
> > Check lib/librte_eal/common/include/rte_pci_dev_ids.h, you need to add
> > device-id via RTE_PCI_DEVEM_ID_DECL_EM.
> > > > > > > > I do not expect any support, I just wanted to use it for sending > traffic > > > at 1 Gbps because there are two such cards mostly unused in my > computer. > > > I did not plan to use I217-V (in fact, I did not expect much from this > > > integrated NIC and I did not even notice it is an Intel one...). > > > > > > Regards > > > Jan Viktorin > > > > > > On Thu, 30 Jul 2015 07:44:14 -0700 > > > Ravi Kerur wrote: > > > > > > > On Thu, Jul 30, 2015 at 5:03 AM, Jan Viktorin < > viktorin at rehivetech.com> > > > > wrote: > > > > > > > > > Hi, > > > > > > > > > > thanks for reply. I could see those docs but it does not help me a > lot. > > > > > I still do not understand very well the principle of the tool. How > it > > > > > chooses the NICs to use? Previously I confused -b in dpdk_nic_bind > and > > > > > testpmd. They have somehow opposite meaning. I can start testpmd > now, > > > > > however, it does ot probe any NIC. I've tried -w to whitelist > certain > > > > > NICs but with no success. > > > > > > > > > > $ dpdk_nic_bind --status > > > > > > > > > > Network devices using DPDK-compatible driver > > > > > > > > > > :03:00.0 '82545GM Gigabit Ethernet Controller' > drv=uio_pci_generic > > > > > unused=e1000 > > > > > :03:02.0 '82545GM Gigabit Ethernet Controller' > drv=uio_pci_generic > > > > > unused=e1000 > > > > > > > > > > > > > NICs may not be supported by PMD drivers yet. Do "lspci -nn" and > check > > > the > > > > device-id. 
Adding support in PMD should not be a problem, but I am > not > > > > sure on support since there is End of Life listed on Intel Website > > > > > > > > > > > > http://ark.intel.com/products/4964/Intel-82545GM-Gigabit-Ethernet-Controller > > > > > > > > > > > > > Network devices using kernel driver > > > > > === > > > > > :00:19.0 'Ethernet Connection I217-V' if=eno1 drv=e1000e > > > > > unused=uio_pci_generic *Active* > > > > > > > > > > > > > DPDK doesn't bind Active NIC, support for I217-V in PMD is being > tested > > > > currently. > > > > > > > > > > > > > > > > > > Other network devices > > > > > = > > > > > > > > > > > > > > > $ sudo testpmd -c 0x3 -n 2 -- -i --total-num-mbufs=2048 > > > > > EAL: Detected lcore 0 as core 0 on socket 0 > > > > > EAL: Detected lcore 1 as core 1 on socket 0 > > > > > EAL: Detected lcore 2 as core 0 on socket 0 > > > > > EAL: Detected lcore 3 as core 1 on socket 0 > > > > > EAL: Support maximum 128 logical core(s) by configuration. > > > > > EAL: Detected 4 lcore(s) > > > > > EAL: VFIO modules not all loaded, skip VFIO support... > > > > > EAL: Setting up physically contiguous memory... > > > > > EAL: Ask a virtual area of 0x3c0 bytes > > > > > EAL: Virtual area found at 0x7fe973e0 (size = 0x3c0) > > > > > EAL: Ask a virtual area of 0x20 bytes > > > > > EAL: Virtual area found at 0x7fe973a0 (
[dpdk-dev] lost when learning how to test dpdk
On Thu, Jul 30, 2015 at 10:06 AM, Ravi Kerur wrote:

>
>
> On Thu, Jul 30, 2015 at 9:19 AM, Jan Viktorin
> wrote:
>
>> OK, I've added the card into RTE_PCI_DEVEM_ID_DECL_EM list.
>> Much better now:
>>
>> EAL: Requesting 64 pages of size 2MB from socket 0
>> EAL: TSC frequency is ~365 KHz
>> EAL: Master lcore 0 is ready (tid=467d78c0;cpuset=[0])
>> EAL: lcore 1 is ready (tid=3a5ff700;cpuset=[1])
>> EAL: PCI device :03:00.0 on NUMA socket -1
>> EAL: probe driver: 8086:1026 rte_em_pmd
>> EAL: PCI memory mapped at 0x7fde44a0
>> EAL: PCI memory mapped at 0x7fde44a2
>> PMD: eth_em_dev_init(): port_id 0 vendorID=0x8086 deviceID=0x1026
>> EAL: PCI device :03:02.0 on NUMA socket -1
>> EAL: probe driver: 8086:1026 rte_em_pmd
>> EAL: Not managed by a supported kernel driver, skipped
>> Interactive-mode selected
>> EAL: Error - exiting with code: 1
>> Cause: Creation of mbuf pool for socket 0 failed
>>
>> I've tried both uio_pci_generic and igb_uio. Is there anything else
>> I can do about it?
>>
>
> I am attaching patch from M Jay/Cuming (Intel engineers), it's not yet
> integrated into DPDK mainline. You need it to fix above mbuf error.
>

In addition, allocating too few hugepages can cause issues, especially
with only 64. I usually allocate > 1024 hugepages.

>
>> Jan V.
>>
>> On Thu, 30 Jul 2015 08:41:47 -0700
>> Ravi Kerur wrote:
>>
>> > On Thu, Jul 30, 2015 at 8:22 AM, Jan Viktorin
>> > wrote:
>> >
>> > > The 82545 is listed at http://dpdk.org/doc/nics and I can see it in
>> > > rte_pci_dev_ids.h/e1000_hw.h:
>> > >
>> > > 196 #define E1000_DEV_ID_82545GM_COPPER 0x1026
>> > >
>> > > $ lspci -nn
>> > > ...
>> > > 03:00.0 Ethernet controller [0200]: Intel Corporation 82545GM Gigabit
>> > > Ethernet Controller [8086:1026] (rev 04)
>> > > 03:02.0 Ethernet controller [0200]: Intel Corporation 82545GM Gigabit
>> > > Ethernet Controller [8086:1026] (rev 04)
>> > >
>> > > However, it is rev 04 and in e1000_hw.h there is just e1000_82545_rev_3.
>> > > But this should not avoid the match (?). Is it possible to grow the >> > > verbosity >> > > level of the device matching process in DPDK? >> > > >> > >> > Check lib/librte_eal/common/include/rte_pci_dev_ids.h, you need to add >> > device-id via RTE_PCI_DEVEM_ID_DECL_EM. >> > >> > > >> > > I do not expect any support, I just wanted to use it for sending >> traffic >> > > at 1 Gbps because there are two such cards mostly unused in my >> computer. >> > > I did not plan to use I217-V (in fact, I did not expect much from this >> > > integrated NIC and I did not even notice it is an Intel one...). >> > > >> > > Regards >> > > Jan Viktorin >> > > >> > > On Thu, 30 Jul 2015 07:44:14 -0700 >> > > Ravi Kerur wrote: >> > > >> > > > On Thu, Jul 30, 2015 at 5:03 AM, Jan Viktorin < >> viktorin at rehivetech.com> >> > > > wrote: >> > > > >> > > > > Hi, >> > > > > >> > > > > thanks for reply. I could see those docs but it does not help me >> a lot. >> > > > > I still do not understand very well the principle of the tool. >> How it >> > > > > chooses the NICs to use? Previously I confused -b in >> dpdk_nic_bind and >> > > > > testpmd. They have somehow opposite meaning. I can start testpmd >> now, >> > > > > however, it does ot probe any NIC. I've tried -w to whitelist >> certain >> > > > > NICs but with no success. >> > > > > >> > > > > $ dpdk_nic_bind --status >> > > > > >> > > > > Network devices using DPDK-compatible driver >> > > > > >> > > > > :03:00.0 '82545GM Gigabit Ethernet Controller' >> drv=uio_pci_generic >> > > > > unused=e1000 >> > > > > :03:02.0 '82545GM Gigabit Ethernet Controller' >> drv=uio_pci_generic >> > > > > unused=e1000 >> > > > > >> > > > >> > > > NICs may not be supported by PMD drivers yet. Do "lspci -nn" and >> check >> > > the >> > > > device-id. 
Adding support in PMD should not be a problem, but I am >> not >> > > > sure on support since there is End of Life listed on Intel Website >> > > > >> > > > >> > > >> http://ark.intel.com/products/4964/Intel-82545GM-Gigabit-Ethernet-Controller >> > > > >> > > > >> > > > > Network devices using kernel driver >> > > > > === >> > > > > :00:19.0 'Ethernet Connection I217-V' if=eno1 drv=e1000e >> > > > > unused=uio_pci_generic *Active* >> > > > > >> > > > >> > > > DPDK doesn't bind Active NIC, support for I217-V in PMD is being >> tested >> > > > currently. >> > > > >> > > > >> > > > > >> > > > > Other network devices >> > > > > = >> > > > > >> > > > > >> > > > > $ sudo testpmd -c 0x3 -n 2 -- -i --total-num-mbufs=2048 >> > > > > EAL: Detected lcore 0 as core 0 on socket 0 >> > > > > EAL: Detected lcore 1 as core 1 on socket 0 >> > > > > EAL: Detected lcore 2 as core 0 on socket 0 >> > > > > EAL: Detected lcore 3 as core 1 on socket 0 >> > > > > EAL: Support maximum 128 logical core(s) by configuration. >> > > > > EAL: Detected 4 lcore(s) >> > > > > EAL: VFIO modules n
[dpdk-dev] RFC: i40e xmit path HW limitation
On 07/30/15 20:01, Stephen Hemminger wrote: > On Thu, 30 Jul 2015 19:50:27 +0300 > Vlad Zolotarov wrote: > >> >> On 07/30/15 19:20, Avi Kivity wrote: >>> >>> On 07/30/2015 07:17 PM, Stephen Hemminger wrote: On Thu, 30 Jul 2015 17:57:33 +0300 Vlad Zolotarov wrote: > Hi, Konstantin, Helin, > there is a documented limitation of xl710 controllers (i40e driver) > which is not handled in any way by a DPDK driver. >From the datasheet chapter 8.4.1: > > "? A single transmit packet may span up to 8 buffers (up to 8 data > descriptors per packet including > both the header and payload buffers). > ? The total number of data descriptors for the whole TSO (explained > later on in this chapter) is > unlimited as long as each segment within the TSO obeys the previous > rule (up to 8 data descriptors > per segment for both the TSO header and the segment payload buffers)." > > This means that, for instance, long cluster with small fragments has to > be linearized before it may be placed on the HW ring. > In more standard environments like Linux or FreeBSD drivers the > solution > is straight forward - call skb_linearize()/m_collapse() corresponding. > In the non-conformist environment like DPDK life is not that easy - > there is no easy way to collapse the cluster into a linear buffer from > inside the device driver > since device driver doesn't allocate memory in a fast path and utilizes > the user allocated pools only. > > Here are two proposals for a solution: > >1. We may provide a callback that would return a user TRUE if a give > cluster has to be linearized and it should always be called before > rte_eth_tx_burst(). Alternatively it may be called from inside the > rte_eth_tx_burst() and rte_eth_tx_burst() is changed to return > some > error code for a case when one of the clusters it's given has > to be > linearized. >2. Another option is to allocate a mempool in the driver with the > elements consuming a single page each (standard 2KB buffers would > do). 
Number of elements in the pool should be as Tx ring length > multiplied by "64KB/(linear data length of the buffer in the pool > above)". Here I use 64KB as a maximum packet length and not taking > into account esoteric things like "Giant" TSO mentioned in the > spec above. Then we may actually go and linearize the cluster if > needed on top of the buffers from the pool above, post the buffer > from the mempool above on the HW ring, link the original > cluster to > that new cluster (using the private data) and release it when the > send is done. Or just silently drop heavily scattered packets (and increment oerrors) with a PMD_TX_LOG debug message. I think a DPDK driver doesn't have to accept all possible mbufs and do extra work. It seems reasonable to expect caller to be well behaved in this restricted ecosystem. >>> How can the caller know what's well behaved? It's device dependent. >> +1 >> >> Stephen, how do you imagine this well-behaved application? Having switch >> case by an underlying device type and then "well-behaving" correspondingly? >> Not to mention that to "well-behave" the application writer has to read >> HW specs and understand them, which would limit the amount of DPDK >> developers to a very small amount of people... ;) Not to mention that >> the mentioned above switch-case would be a super ugly thing to be found >> in an application that would raise a big question about the >> justification of a DPDK existence as an SDK providing device drivers >> interface. ;) > Either have a RTE_MAX_MBUF_SEGMENTS And what would it be in our case? 8? This would limit the maximum TSO packet to 16KB for 2KB buffers. > that is global or > a mbuf_linearize function? Driver already can stash the > mbuf pool used for Rx and reuse it for the transient Tx buffers. First of all who can guarantee that that pool would meet our needs - namely have large enough buffers? 
Secondly, using the user's Rx mempool for that would be really not nice (read: dirty) towards the user, who may have allocated a specific number of buffers in it according to calculations that didn't include any usage from the Tx flow. And lastly, and most importantly, this would require atomic operations on every access to the Rx mempool, which would both require a specific mempool initialization and significantly hurt performance.
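
The two sizing arguments made above (an 8-segment cap limiting a packet to 16KB with 2KB buffers, and the pool size from proposal 2) can be checked with a little arithmetic. The following is only a sketch: the function names are hypothetical, and it assumes every buffer contributes its full length, which ignores any headroom reserved in a real mbuf.

```c
#include <assert.h>
#include <stdint.h>

/* Largest packet that fits in a chain capped at max_segs buffers,
 * assuming each buffer carries buf_len bytes of data. */
static uint64_t max_pkt_bytes(uint32_t max_segs, uint32_t buf_len)
{
    return (uint64_t)max_segs * buf_len;
}

/* Elements needed in the driver-private pool of proposal 2:
 * Tx ring length multiplied by the number of buffers needed to hold
 * one maximum-size packet (rounded up). */
static uint64_t linearize_pool_elems(uint32_t tx_ring_len,
                                     uint32_t max_pkt_len,
                                     uint32_t buf_len)
{
    assert(buf_len > 0); /* precondition: avoid division by zero */
    uint32_t bufs_per_pkt = (max_pkt_len + buf_len - 1) / buf_len;
    return (uint64_t)tx_ring_len * bufs_per_pkt;
}
```

For example, with an assumed 512-entry Tx ring, a 64KB maximum packet and 2KB buffers, this gives 512 * 32 = 16384 elements, i.e. 32MB of buffer space for that one ring, which is the memory cost that proposal 2 trades for keeping the details inside the driver.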
[dpdk-dev] RFC: i40e xmit path HW limitation
On 07/30/2015 08:01 PM, Stephen Hemminger wrote: > On Thu, 30 Jul 2015 19:50:27 +0300 > Vlad Zolotarov wrote: > >> >> On 07/30/15 19:20, Avi Kivity wrote: >>> >>> On 07/30/2015 07:17 PM, Stephen Hemminger wrote: On Thu, 30 Jul 2015 17:57:33 +0300 Vlad Zolotarov wrote: > Hi, Konstantin, Helin, > there is a documented limitation of xl710 controllers (i40e driver) > which is not handled in any way by a DPDK driver. >From the datasheet chapter 8.4.1: > > "? A single transmit packet may span up to 8 buffers (up to 8 data > descriptors per packet including > both the header and payload buffers). > ? The total number of data descriptors for the whole TSO (explained > later on in this chapter) is > unlimited as long as each segment within the TSO obeys the previous > rule (up to 8 data descriptors > per segment for both the TSO header and the segment payload buffers)." > > This means that, for instance, long cluster with small fragments has to > be linearized before it may be placed on the HW ring. > In more standard environments like Linux or FreeBSD drivers the > solution > is straight forward - call skb_linearize()/m_collapse() corresponding. > In the non-conformist environment like DPDK life is not that easy - > there is no easy way to collapse the cluster into a linear buffer from > inside the device driver > since device driver doesn't allocate memory in a fast path and utilizes > the user allocated pools only. > > Here are two proposals for a solution: > >1. We may provide a callback that would return a user TRUE if a give > cluster has to be linearized and it should always be called before > rte_eth_tx_burst(). Alternatively it may be called from inside the > rte_eth_tx_burst() and rte_eth_tx_burst() is changed to return > some > error code for a case when one of the clusters it's given has > to be > linearized. >2. Another option is to allocate a mempool in the driver with the > elements consuming a single page each (standard 2KB buffers would > do). 
Number of elements in the pool should be as Tx ring length > multiplied by "64KB/(linear data length of the buffer in the pool > above)". Here I use 64KB as a maximum packet length and not taking > into an account esoteric things like "Giant" TSO mentioned in the > spec above. Then we may actually go and linearize the cluster if > needed on top of the buffers from the pool above, post the buffer > from the mempool above on the HW ring, link the original > cluster to > that new cluster (using the private data) and release it when the > send is done. Or just silently drop heavily scattered packets (and increment oerrors) with a PMD_TX_LOG debug message. I think a DPDK driver doesn't have to accept all possible mbufs and do extra work. It seems reasonable to expect caller to be well behaved in this restricted ecosystem. >>> How can the caller know what's well behaved? It's device dependent. >> +1 >> >> Stephen, how do you imagine this well-behaved application? Having switch >> case by an underlying device type and then "well-behaving" correspondingly? >> Not to mention that to "well-behave" the application writer has to read >> HW specs and understand them, which would limit the amount of DPDK >> developers to a very small amount of people... ;) Not to mention that >> the mentioned above switch-case would be a super ugly thing to be found >> in an application that would raise a big question about the >> justification of a DPDK existence as as SDK providing device drivers >> interface. ;) > Either have a RTE_MAX_MBUF_SEGMENTS that is global or > a mbuf_linearize function? Driver already can stash the > mbuf pool used for Rx and reuse it for the transient Tx buffers. > The pass/fail criteria is much more complicated than that. You might have a packet with 340 fragments successfully transmitted (64k/1500*8) or a packet with 9 fragments fail. What's wrong with exposing the pass/fail criteria as a driver-supplied function? 
If the application is sure that its mbufs pass, it can choose not to call it. A less constrained application will call it, and linearize the packet itself if it fails the test.
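
A driver-supplied pass/fail check of the kind suggested here might look roughly like the sketch below. Everything in it is an assumption made for illustration: the names `tx_chain_fits` and `walk` are hypothetical, the mbuf chain is modelled as a plain array of fragment lengths rather than `struct rte_mbuf`, and the TSO branch is one conservative reading of the datasheet text quoted in this thread (header buffers plus the payload buffers touched by each MSS-sized segment must not exceed 8). It is not the algorithm used by the Linux i40e driver.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define XL710_MAX_DESC_PER_PKT 8 /* limit quoted from datasheet 8.4.1 */

/* Position inside the fragment chain: byte `off` of fragment `idx`. */
struct cursor {
    size_t idx;
    uint32_t off;
};

/* Advance the cursor by up to nbytes and return how many non-empty
 * fragments were touched on the way. */
static size_t walk(const uint16_t *frag_len, size_t nb_frags,
                   struct cursor *c, uint32_t nbytes)
{
    size_t touched = 0;

    while (nbytes > 0 && c->idx < nb_frags) {
        uint32_t avail = frag_len[c->idx] - c->off;

        if (avail > 0) {
            uint32_t take = avail < nbytes ? avail : nbytes;

            touched++;
            nbytes -= take;
            c->off += take;
        }
        if (c->off == frag_len[c->idx]) { /* fragment exhausted */
            c->idx++;
            c->off = 0;
        }
    }
    return touched;
}

/* Return 1 if the chain can be posted as is, 0 if it would have to be
 * linearized first. mss == 0 means a non-TSO packet; hdr_len is the
 * total header length and is only used for TSO. */
static int tx_chain_fits(const uint16_t *frag_len, size_t nb_frags,
                         uint32_t hdr_len, uint32_t mss)
{
    struct cursor c = { 0, 0 };
    size_t hdr_bufs;

    assert(frag_len != NULL || nb_frags == 0);

    if (mss == 0) /* non-TSO: the whole packet is a single segment */
        return nb_frags <= XL710_MAX_DESC_PER_PKT;

    hdr_bufs = walk(frag_len, nb_frags, &c, hdr_len);
    if (hdr_bufs > XL710_MAX_DESC_PER_PKT)
        return 0;

    /* Each wire segment needs the header buffers plus whatever
     * payload buffers its MSS-sized window touches. */
    while (c.idx < nb_frags) {
        if (hdr_bufs + walk(frag_len, nb_frags, &c, mss) >
            XL710_MAX_DESC_PER_PKT)
            return 0;
    }
    return 1;
}
```

Under these assumptions, a non-TSO chain of 15 fragments of 100 bytes each fails the check, while a TSO chain with one header fragment followed by 40 MSS-sized payload fragments passes, even though it has far more than 8 fragments in total. An application that already knows its mbufs pass can skip the call entirely.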
[dpdk-dev] lost when learning how to test dpdk
Thank you. I think the patch did not help. I applied and the error was still there. After setting 1024 hugepages, it starts working. Jan V. On Thu, 30 Jul 2015 10:06:09 -0700 Ravi Kerur wrote: > On Thu, Jul 30, 2015 at 9:19 AM, Jan Viktorin > wrote: > > > OK, I've added the card into RTE_PCI_DEVEM_ID_DECL_EM list. > > Much better now: > > > > EAL: Requesting 64 pages of size 2MB from socket 0 > > EAL: TSC frequency is ~365 KHz > > EAL: Master lcore 0 is ready (tid=467d78c0;cpuset=[0]) > > EAL: lcore 1 is ready (tid=3a5ff700;cpuset=[1]) > > EAL: PCI device :03:00.0 on NUMA socket -1 > > EAL: probe driver: 8086:1026 rte_em_pmd > > EAL: PCI memory mapped at 0x7fde44a0 > > EAL: PCI memory mapped at 0x7fde44a2 > > PMD: eth_em_dev_init(): port_id 0 vendorID=0x8086 deviceID=0x1026 > > EAL: PCI device :03:02.0 on NUMA socket -1 > > EAL: probe driver: 8086:1026 rte_em_pmd > > EAL: Not managed by a supported kernel driver, skipped > > Interactive-mode selected > > EAL: Error - exiting with code: 1 > > Cause: Creation of mbuf pool for socket 0 failed > > > > I've tried both uio_pci_generic and igb_uio. Is there anything else > > I can do about it? > > > > I am attaching patch from M Jay/Cuming (Intel engineers), it's not yet > integrated into DPDK mainline. You need it to fix above mbuf error. > > > > > Jan V. > > > > On Thu, 30 Jul 2015 08:41:47 -0700 > > Ravi Kerur wrote: > > > > > On Thu, Jul 30, 2015 at 8:22 AM, Jan Viktorin > > > wrote: > > > > > > > The 82545 is listed at http://dpdk.org/doc/nics and I can see it in > > > > rte_pci_dev_ids.h/e1000_hw.h: > > > > > > > > 196 #define E1000_DEV_ID_82545GM_COPPER 0x1026 > > > > > > > > $ lspci -nn > > > > ... 
> > > > 03:00.0 Ethernet controller [0200]: Intel Corporation 82545GM Gigabit > > > > Ethernet Controller [8086:1026] (rev 04) > > > > 03:02.0 Ethernet controller [0200]: Intel Corporation 82545GM Gigabit > > > > Ethernet Controller [8086:1026] (rev 04) > > > > > > > > However, it is rev 04 and in e1000_hw.h there is just > > e1000_82545_rev_3. > > > > But this should not avoid the match (?). Is it possible to grow the > > > > verbosity > > > > level of the device matching process in DPDK? > > > > > > > > > > Check lib/librte_eal/common/include/rte_pci_dev_ids.h, you need to add > > > device-id via RTE_PCI_DEVEM_ID_DECL_EM. > > > > > > > > > > > I do not expect any support, I just wanted to use it for sending > > traffic > > > > at 1 Gbps because there are two such cards mostly unused in my > > computer. > > > > I did not plan to use I217-V (in fact, I did not expect much from this > > > > integrated NIC and I did not even notice it is an Intel one...). > > > > > > > > Regards > > > > Jan Viktorin > > > > > > > > On Thu, 30 Jul 2015 07:44:14 -0700 > > > > Ravi Kerur wrote: > > > > > > > > > On Thu, Jul 30, 2015 at 5:03 AM, Jan Viktorin < > > viktorin at rehivetech.com> > > > > > wrote: > > > > > > > > > > > Hi, > > > > > > > > > > > > thanks for reply. I could see those docs but it does not help me a > > lot. > > > > > > I still do not understand very well the principle of the tool. How > > it > > > > > > chooses the NICs to use? Previously I confused -b in dpdk_nic_bind > > and > > > > > > testpmd. They have somehow opposite meaning. I can start testpmd > > now, > > > > > > however, it does ot probe any NIC. I've tried -w to whitelist > > certain > > > > > > NICs but with no success. 
> > > > > > > > > > > > $ dpdk_nic_bind --status > > > > > > > > > > > > Network devices using DPDK-compatible driver > > > > > > > > > > > > :03:00.0 '82545GM Gigabit Ethernet Controller' > > drv=uio_pci_generic > > > > > > unused=e1000 > > > > > > :03:02.0 '82545GM Gigabit Ethernet Controller' > > drv=uio_pci_generic > > > > > > unused=e1000 > > > > > > > > > > > > > > > > NICs may not be supported by PMD drivers yet. Do "lspci -nn" and > > check > > > > the > > > > > device-id. Adding support in PMD should not be a problem, but I am > > not > > > > > sure on support since there is End of Life listed on Intel Website > > > > > > > > > > > > > > > > http://ark.intel.com/products/4964/Intel-82545GM-Gigabit-Ethernet-Controller > > > > > > > > > > > > > > > > Network devices using kernel driver > > > > > > === > > > > > > :00:19.0 'Ethernet Connection I217-V' if=eno1 drv=e1000e > > > > > > unused=uio_pci_generic *Active* > > > > > > > > > > > > > > > > DPDK doesn't bind Active NIC, support for I217-V in PMD is being > > tested > > > > > currently. > > > > > > > > > > > > > > > > > > > > > > Other network devices > > > > > > = > > > > > > > > > > > > > > > > > > $ sudo testpmd -c 0x3 -n 2 -- -i --total-num-mbufs=2048 > > > > > > EAL: Detected lcore 0 as core 0 on socket 0 > > > > > > EAL: Detected lcore 1 as core 1 on socket 0 > > > > > > EAL: Detected lcore 2 as core 0 on socket 0 > > > > > > EAL: Detected lcore 3 as core
[dpdk-dev] i40e xmit path HW limitation
> -Original Message- > From: Vlad Zolotarov [mailto:vladz at cloudius-systems.com] > Sent: Thursday, July 30, 2015 9:44 AM > To: Zhang, Helin; Ananyev, Konstantin > Cc: dev at dpdk.org > Subject: Re: i40e xmit path HW limitation > > > > On 07/30/15 19:10, Zhang, Helin wrote: > > > >> -Original Message- > >> From: Vlad Zolotarov [mailto:vladz at cloudius-systems.com] > >> Sent: Thursday, July 30, 2015 7:58 AM > >> To: dev at dpdk.org; Ananyev, Konstantin; Zhang, Helin > >> Subject: RFC: i40e xmit path HW limitation > >> > >> Hi, Konstantin, Helin, > >> there is a documented limitation of xl710 controllers (i40e driver) > >> which is not handled in any way by a DPDK driver. > >> From the datasheet chapter 8.4.1: > >> > >> "? A single transmit packet may span up to 8 buffers (up to 8 data > >> descriptors per packet including both the header and payload buffers). > >> ? The total number of data descriptors for the whole TSO (explained > >> later on in this chapter) is unlimited as long as each segment within > >> the TSO obeys the previous rule (up to 8 data descriptors per segment > >> for both the TSO header and the segment payload buffers)." > > Yes, I remember the RX side just supports 5 segments per packet receiving. > > But what's the possible issue you thought about? > Note that it's a Tx size we are talking about. > > See 30520831f058cd9d75c0f6b360bc5c5ae49b5f27 commit in linux net-next repo. > If such a cluster arrives and you post it on the HW ring - HW will shut this > HW ring > down permanently. The application will see that it's ring is stuck. That issue was because of using more than 8 descriptors for a packet for TSO. > > > > >> This means that, for instance, long cluster with small fragments has to be > >> linearized before it may be placed on the HW ring. > > What type of size of the small fragments? Basically 2KB is the default size > > of > mbuf of most > > example applications. 2KB x 8 is bigger than 1.5KB. 
So it is enough for the > maximum > > packet size we supported. > > If 1KB mbuf is used, don't expect it can transmit more than 8KB size of > > packet. > > I kinda lost u here. Again, we talk about the Tx side here and buffers > are not obligatory completely filled. Namely there may be a cluster with > 15 fragments 100 bytes each. The root cause is using more than 8 descriptors for a packet. Linux driver can help on reducing number of descriptors to be used by merging small size of payload together, right? It is not for TSO, it is just for packet transmitting. 2 options in my mind: 1. Use should ensure it will not use more than 8 descriptors per packet for transmitting. 2. DPDK driver should try to merge small packet together for such case, like Linux kernel driver. I prefer to use option 1, users should ensure that in the application or up layer software, and keep the PMD driver as simple as possible. But I have a thought that the maximum number of RX/TX descriptor should be able to be queried somewhere. Regards, Helin > > > > >> In more standard environments like Linux or FreeBSD drivers the solution is > >> straight forward - call skb_linearize()/m_collapse() corresponding. > >> In the non-conformist environment like DPDK life is not that easy - there > >> is no > >> easy way to collapse the cluster into a linear buffer from inside the > >> device > driver > >> since device driver doesn't allocate memory in a fast path and utilizes > >> the user > >> allocated pools only. > >> Here are two proposals for a solution: > >> > >> 1. We may provide a callback that would return a user TRUE if a give > >> cluster has to be linearized and it should always be called before > >> rte_eth_tx_burst(). Alternatively it may be called from inside the > >> rte_eth_tx_burst() and rte_eth_tx_burst() is changed to return some > >> error code for a case when one of the clusters it's given has to be > >> linearized. > >> 2. 
Another option is to allocate a mempool in the driver with the > >> elements consuming a single page each (standard 2KB buffers would > >> do). Number of elements in the pool should be as Tx ring length > >> multiplied by "64KB/(linear data length of the buffer in the pool > >> above)". Here I use 64KB as a maximum packet length and not taking > >> into an account esoteric things like "Giant" TSO mentioned in the > >> spec above. Then we may actually go and linearize the cluster if > >> needed on top of the buffers from the pool above, post the buffer > >> from the mempool above on the HW ring, link the original cluster to > >> that new cluster (using the private data) and release it when the > >> send is done. > >> > >> > >> The first is a change in the API and would require from the application > >> some > >> additional handling (linearization). The second would require some > >> additional > >> memory but would keep all dirty details inside the driver and would le
[dpdk-dev] i40e xmit path HW limitation
On 07/30/15 20:33, Zhang, Helin wrote: > >> -Original Message- >> From: Vlad Zolotarov [mailto:vladz at cloudius-systems.com] >> Sent: Thursday, July 30, 2015 9:44 AM >> To: Zhang, Helin; Ananyev, Konstantin >> Cc: dev at dpdk.org >> Subject: Re: i40e xmit path HW limitation >> >> >> >> On 07/30/15 19:10, Zhang, Helin wrote: -Original Message- From: Vlad Zolotarov [mailto:vladz at cloudius-systems.com] Sent: Thursday, July 30, 2015 7:58 AM To: dev at dpdk.org; Ananyev, Konstantin; Zhang, Helin Subject: RFC: i40e xmit path HW limitation Hi, Konstantin, Helin, there is a documented limitation of xl710 controllers (i40e driver) which is not handled in any way by a DPDK driver. From the datasheet chapter 8.4.1: "? A single transmit packet may span up to 8 buffers (up to 8 data descriptors per packet including both the header and payload buffers). ? The total number of data descriptors for the whole TSO (explained later on in this chapter) is unlimited as long as each segment within the TSO obeys the previous rule (up to 8 data descriptors per segment for both the TSO header and the segment payload buffers)." >>> Yes, I remember the RX side just supports 5 segments per packet receiving. >>> But what's the possible issue you thought about? >> Note that it's a Tx size we are talking about. >> >> See 30520831f058cd9d75c0f6b360bc5c5ae49b5f27 commit in linux net-next repo. >> If such a cluster arrives and you post it on the HW ring - HW will shut this >> HW ring >> down permanently. The application will see that it's ring is stuck. > That issue was because of using more than 8 descriptors for a packet for TSO. There is no problem in transmitting the TSO packet with more than 8 fragments. On the opposite - one can't transmit a non-TSO packet with more than 8 fragments. One also can't transmit the TSO packet that would contain more than 8 fragments in a single TSO segment including the TSO headers. Pls., read the HW spec as I quoted above for more details. 
> This means that, for instance, long cluster with small fragments has to be linearized before it may be placed on the HW ring. >>> What type of size of the small fragments? Basically 2KB is the default size >>> of >> mbuf of most >>> example applications. 2KB x 8 is bigger than 1.5KB. So it is enough for the >> maximum >>> packet size we supported. >>> If 1KB mbuf is used, don't expect it can transmit more than 8KB size of >>> packet. >> I kinda lost u here. Again, we talk about the Tx side here and buffers >> are not obligatory completely filled. Namely there may be a cluster with >> 15 fragments 100 bytes each. > The root cause is using more than 8 descriptors for a packet. That would be if u would like to SUPER simplify the HW limitation above. In that case u would significantly limit the different packets that may be sent without the linearization. > Linux driver can help > on reducing number of descriptors to be used by merging small size of payload > together, right? > It is not for TSO, it is just for packet transmitting. 2 options in my mind: > 1. Use should ensure it will not use more than 8 descriptors per packet for > transmitting. This requirement is too restricting. Pls., see above. > 2. DPDK driver should try to merge small packet together for such case, like > Linux kernel driver. > I prefer to use option 1, users should ensure that in the application or up > layer software, > and keep the PMD driver as simple as possible. The above statement is super confusing: on the one hand u suggest the DPDK driver to merge the small packet (fragments?) together (how?) and then u immediately propose the user application to do that. Could u, pls., clarify what exactly u suggest here? If that's to leave it to the application - note that it would demand patching all existing DPDK applications that send TCP packets. > > But I have a thought that the maximum number of RX/TX descriptor should be > able to be > queried somewhere. 
There is no such thing as maximum number of Tx fragments in a TSO case. It's only limited by the Tx ring size. > > Regards, > Helin In more standard environments like Linux or FreeBSD drivers the solution is straight forward - call skb_linearize()/m_collapse() corresponding. In the non-conformist environment like DPDK life is not that easy - there is no easy way to collapse the cluster into a linear buffer from inside the device >> driver since device driver doesn't allocate memory in a fast path and utilizes the user allocated pools only. Here are two proposals for a solution: 1. We may provide a callback that would return a user TRUE if a give cluster has to be linearized and it should always be called before rte_eth_tx_burst(). Alternatively it may be called from inside the rte_eth_tx_burst() and rte_e
[dpdk-dev] [PATCH v3 0/6] log de-spamming
2015-07-09 16:01, Stephen Hemminger:
> From: Stephen Hemminger
>
> These patches were sent earlier, updated to current tree.
>
> They make Intel drivers not spam the log with information
> messages that cause questions in production.
>
> Unfortunately, developers seem to get attached to log messages
> which are not appropriate in a production product
>
> Stephen Hemminger (6):
>   ixgbe: convert debug messages to DEBUG level
>   ixgbe: raise priority of significant log events
>   ixgbe: allow pruning log during build
>   e1000: allow pruning log during build
>   e1000: change log level of debug messages
>   e1000: raise log level of signifcant events

Applied, thanks
[dpdk-dev] [PATCH v3 3/6] ixgbe: allow pruning log during build
2015-07-09 16:01, Stephen Hemminger:
> From: Stephen Hemminger
>
> The ixgbe driver was not following DPDK convention and
> was leaving logging always in even if LOG_LEVEL was configured
> to disable debug logs.
>
> Signed-off-by: Stephen Hemminger

This series is fixing e1000 and ixgbe.
There is the same issue with i40e, fm10k and bnx2x.
I will fix them in the same way.

For consistency, examples/l3fwd-power and eal_common_tailqs.c should use
RTE_LOG instead of rte_log.
[dpdk-dev] i40e xmit path HW limitation
> -Original Message- > From: Vlad Zolotarov [mailto:vladz at cloudius-systems.com] > Sent: Thursday, July 30, 2015 10:56 AM > To: Zhang, Helin; Ananyev, Konstantin > Cc: dev at dpdk.org > Subject: Re: i40e xmit path HW limitation > > > > On 07/30/15 20:33, Zhang, Helin wrote: > > > >> -Original Message- > >> From: Vlad Zolotarov [mailto:vladz at cloudius-systems.com] > >> Sent: Thursday, July 30, 2015 9:44 AM > >> To: Zhang, Helin; Ananyev, Konstantin > >> Cc: dev at dpdk.org > >> Subject: Re: i40e xmit path HW limitation > >> > >> > >> > >> On 07/30/15 19:10, Zhang, Helin wrote: > -Original Message- > From: Vlad Zolotarov [mailto:vladz at cloudius-systems.com] > Sent: Thursday, July 30, 2015 7:58 AM > To: dev at dpdk.org; Ananyev, Konstantin; Zhang, Helin > Subject: RFC: i40e xmit path HW limitation > > Hi, Konstantin, Helin, > there is a documented limitation of xl710 controllers (i40e driver) > which is not handled in any way by a DPDK driver. > From the datasheet chapter 8.4.1: > > "? A single transmit packet may span up to 8 buffers (up to 8 data > descriptors per packet including both the header and payload buffers). > ? The total number of data descriptors for the whole TSO (explained > later on in this chapter) is unlimited as long as each segment > within the TSO obeys the previous rule (up to 8 data descriptors > per segment for both the TSO header and the segment payload buffers)." > >>> Yes, I remember the RX side just supports 5 segments per packet receiving. > >>> But what's the possible issue you thought about? > >> Note that it's a Tx size we are talking about. > >> > >> See 30520831f058cd9d75c0f6b360bc5c5ae49b5f27 commit in linux net-next > repo. > >> If such a cluster arrives and you post it on the HW ring - HW will > >> shut this HW ring down permanently. The application will see that it's > >> ring is > stuck. > > That issue was because of using more than 8 descriptors for a packet for > > TSO. 
> > There is no problem in transmitting the TSO packet with more than 8 fragments. > On the opposite - one can't transmit a non-TSO packet with more than 8 > fragments. > One also can't transmit the TSO packet that would contain more than 8 > fragments > in a single TSO segment including the TSO headers. > > Pls., read the HW spec as I quoted above for more details. I meant a packet to be transmitted by the hardware, but not the TSO packet in memory. It could be a segment in TSO packet in memory. The linearize check in kernel driver is not for TSO only, it is for both TSO and NON-TSO cases. > > > > This means that, for instance, long cluster with small fragments > has to be linearized before it may be placed on the HW ring. > >>> What type of size of the small fragments? Basically 2KB is the > >>> default size of > >> mbuf of most > >>> example applications. 2KB x 8 is bigger than 1.5KB. So it is enough > >>> for the > >> maximum > >>> packet size we supported. > >>> If 1KB mbuf is used, don't expect it can transmit more than 8KB size of > packet. > >> I kinda lost u here. Again, we talk about the Tx side here and > >> buffers are not obligatory completely filled. Namely there may be a > >> cluster with > >> 15 fragments 100 bytes each. > > The root cause is using more than 8 descriptors for a packet. > > That would be if u would like to SUPER simplify the HW limitation above. > In that case u would significantly limit the different packets that may be > sent > without the linearization. > > > Linux driver can help > > on reducing number of descriptors to be used by merging small size of > > payload together, right? > > It is not for TSO, it is just for packet transmitting. 2 options in my mind: > > 1. Use should ensure it will not use more than 8 descriptors per packet for > transmitting. > > This requirement is too restricting. Pls., see above. > > > 2. DPDK driver should try to merge small packet together for such case, like > Linux kernel driver. 
> > I prefer to use option 1, users should ensure that in the application > > or up layer software, and keep the PMD driver as simple as possible. > > The above statement is super confusing: on the one hand u suggest the DPDK > driver to merge the small packet (fragments?) together (how?) and then u > immediately propose the user application to do that. Could u, pls., clarify > what > exactly u suggest here? > If that's to leave it to the application - note that it would demand patching > all > existing DPDK applications that send TCP packets. Those are two of obvious options. One is to do that in PMD, the other one is to do that in up layer. I did not mean it needs to do both! > > > > > But I have a thought that the maximum number of RX/TX descriptor > > should be able to be queried somewhere. > > There is no such thing as maximum number of Tx fragments in a TSO case. > It's only limited by the Tx ring s
[dpdk-dev] [PATCH v2] enic: silence log message unless debug enabled
2015-07-30 10:03, Stephen Hemminger:
> --- a/drivers/net/enic/enic_compat.h
> +++ b/drivers/net/enic/enic_compat.h
> @@ -82,7 +82,11 @@
>  #define dev_err(x, args...) dev_printk(ERR, args)
>  #define dev_info(x, args...) dev_printk(INFO, args)
>  #define dev_warning(x, args...) dev_printk(WARNING, args)
> +#ifdef RTE_LIBRTE_ENIC_DEBUG
>  #define dev_debug(x, args...) dev_printk(DEBUG, args)
> +#else
> +#define dev_debug(x, args...) do { } while(0)
> +#endif

I don't understand why it is needed:
dev_debug won't print anything if the log level is higher than DEBUG.
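
The difference between the two mechanisms discussed here (a run-time level check inside the logger versus compiling the call away) can be illustrated with a small stand-alone sketch. None of the names below are DPDK or enic APIs: `demo_log`, `demo_debug`, `DEMO_DEBUG` and the level values are hypothetical stand-ins for `dev_printk`, `dev_debug` and `RTE_LIBRTE_ENIC_DEBUG`.

```c
#include <stdarg.h>
#include <stdio.h>

/* Hypothetical syslog-like levels: a higher number is more verbose. */
enum { DEMO_LOG_ERR = 3, DEMO_LOG_INFO = 6, DEMO_LOG_DEBUG = 7 };

static int log_threshold = DEMO_LOG_INFO; /* run-time setting */
static int lines_emitted;                 /* counts messages actually printed */

/* Run-time filtering: the call is always made, but messages more
 * verbose than the current threshold are dropped. */
static void demo_log(int level, const char *fmt, ...)
{
    va_list ap;

    if (level > log_threshold)
        return;
    va_start(ap, fmt);
    vfprintf(stderr, fmt, ap);
    va_end(ap);
    lines_emitted++;
}

/* Compile-time pruning, in the style of the patch above: unless the
 * debug option is defined, the call disappears from the binary. */
#ifdef DEMO_DEBUG
#define demo_debug(...) demo_log(DEMO_LOG_DEBUG, __VA_ARGS__)
#else
#define demo_debug(...) do { } while (0)
#endif
```

With the threshold left at INFO, `demo_debug(...)` prints nothing in either build, which matches the objection raised in the reply: the run-time check alone already silences the message. What the `#ifdef` adds in this sketch is only that the call and the evaluation of its arguments are removed from non-debug builds.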
[dpdk-dev] i40e xmit path HW limitation
On Jul 30, 2015 22:00, "Zhang, Helin" wrote:
>
> > -Original Message-
> > From: Vlad Zolotarov [mailto:vladz at cloudius-systems.com]
> > Sent: Thursday, July 30, 2015 10:56 AM
> > To: Zhang, Helin; Ananyev, Konstantin
> > Cc: dev at dpdk.org
> > Subject: Re: i40e xmit path HW limitation
> >
> > On 07/30/15 20:33, Zhang, Helin wrote:
> > >
> > >> -Original Message-
> > >> From: Vlad Zolotarov [mailto:vladz at cloudius-systems.com]
> > >> Sent: Thursday, July 30, 2015 9:44 AM
> > >> To: Zhang, Helin; Ananyev, Konstantin
> > >> Cc: dev at dpdk.org
> > >> Subject: Re: i40e xmit path HW limitation
> > >>
> > >> On 07/30/15 19:10, Zhang, Helin wrote:
> > -Original Message-
> > From: Vlad Zolotarov [mailto:vladz at cloudius-systems.com]
> > Sent: Thursday, July 30, 2015 7:58 AM
> > To: dev at dpdk.org; Ananyev, Konstantin; Zhang, Helin
> > Subject: RFC: i40e xmit path HW limitation
> >
> > Hi, Konstantin, Helin,
> > there is a documented limitation of xl710 controllers (i40e driver)
> > which is not handled in any way by a DPDK driver.
> > From the datasheet chapter 8.4.1:
> >
> > "- A single transmit packet may span up to 8 buffers (up to 8 data
> > descriptors per packet including both the header and payload buffers).
> > - The total number of data descriptors for the whole TSO (explained
> > later on in this chapter) is unlimited as long as each segment
> > within the TSO obeys the previous rule (up to 8 data descriptors
> > per segment for both the TSO header and the segment payload buffers)."
> > >>> Yes, I remember the RX side just supports 5 segments per packet receiving.
> > >>> But what's the possible issue you thought about?
> > >> Note that it's a Tx side we are talking about.
> > >>
> > >> See 30520831f058cd9d75c0f6b360bc5c5ae49b5f27 commit in linux net-next
> > >> repo.
> > >> If such a cluster arrives and you post it on the HW ring - HW will
> > >> shut this HW ring down permanently. The application will see that
> > >> its ring is stuck.
> > > That issue was because of using more than 8 descriptors for a packet for TSO.
> >
> > There is no problem in transmitting the TSO packet with more than 8 fragments.
> > On the opposite - one can't transmit a non-TSO packet with more than 8
> > fragments.
> > One also can't transmit the TSO packet that would contain more than 8 fragments
> > in a single TSO segment including the TSO headers.
> >
> > Pls., read the HW spec as I quoted above for more details.
> I meant a packet to be transmitted by the hardware, but not the TSO packet in memory.
> It could be a segment in TSO packet in memory.
> The linearize check in kernel driver is not for TSO only, it is for both TSO and
> NON-TSO cases.

That's what i was trying to tell u. Great we are on the same page at last... ?

> >
> > This means that, for instance, long cluster with small fragments
> > has to be linearized before it may be placed on the HW ring.
> > >>> What type of size of the small fragments? Basically 2KB is the
> > >>> default size of mbuf of most example applications. 2KB x 8 is
> > >>> bigger than 1.5KB. So it is enough for the maximum packet size
> > >>> we supported.
> > >>> If 1KB mbuf is used, don't expect it can transmit more than 8KB
> > >>> size of packet.
> > >> I kinda lost u here. Again, we talk about the Tx side here and
> > >> buffers are not obligatory completely filled. Namely there may be a
> > >> cluster with
> > >> 15 fragments 100 bytes each.
> > > The root cause is using more than 8 descriptors for a packet.
> >
> > That would be if u would like to SUPER simplify the HW limitation above.
> > In that case u would significantly limit the different packets that may be sent
> > without the linearization.
> >
> > > Linux driver can help
> > > on reducing number of descriptors to be used by merging small size of
> > > payload together, right?
> > > It is not for TSO, it is just for packet transmitting. 2 options in my mind:
> > > 1. User should ensure it will not use more than 8 descriptors per packet for
> > > transmitting.
> >
> > This requirement is too restricting. Pls., see above.
> >
> > > 2. DPDK driver should try to merge small packet together for such case, like
> > > Linux kernel driver.
> > > I prefer to use option 1, users should ensure that in the application
> > > or up layer software, and keep the PMD driver as simple as possible.
> >
> > The above statement is super confusing: on the one hand u suggest the DPDK
> > driver to merge the small packet (fragments?) together (how?) and then u
> > immediately propose the user application to do that. Could u, pls., clarify what
> > exactly u suggest here?
> > If that's to leave it to the application - note that it would demand patching all
> > existing DPDK applications that send TCP packets.
> Those are two of obvious options. One is to do that in PMD, the othe
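
For the plain non-TSO case discussed in this thread, the datasheet rule (at most 8 data descriptors per packet) could be checked before a packet is posted on the HW ring. What follows is only a minimal sketch under stated assumptions: `struct seg`, `MAX_TX_DATA_DESC` and `exceeds_desc_limit()` are hypothetical names, not DPDK or i40e driver API, and it assumes every fragment in the chain consumes exactly one data descriptor. The per-segment TSO rule is not covered.

```c
#include <stddef.h>
#include <stdint.h>

/* Limit quoted from datasheet chapter 8.4.1 earlier in this thread. */
#define MAX_TX_DATA_DESC 8

/* Hypothetical stand-in for a chained packet buffer (not struct rte_mbuf). */
struct seg {
	uint16_t data_len;  /* bytes of payload held by this fragment */
	struct seg *next;   /* next fragment, or NULL for the last one */
};

/*
 * Return 1 if a non-TSO packet would need more than MAX_TX_DATA_DESC
 * data descriptors, 0 otherwise.
 * Assumption: one data descriptor per fragment in the chain.
 */
static int
exceeds_desc_limit(const struct seg *head)
{
	unsigned int nb_desc = 0;

	for (; head != NULL; head = head->next) {
		if (++nb_desc > MAX_TX_DATA_DESC)
			return 1;
	}
	return 0;
}
```

With the example given in the thread, a cluster of 15 fragments of 100 bytes each makes this return 1 even though the packet is only 1500 bytes long, so such a packet would have to be linearized (or rejected) by whichever layer ends up owning that decision.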
[dpdk-dev] [PATCH] eal: fix compilation for x86_x32-native-linuxapp-gcc
Compiling for dpdk x86_x32 gives the following error:

  CC eal_common_timer.o
  In file included from /usr/include/sys/sysctl.h:63:0,
                   from dpdk.org/lib/librte_eal/common/eal_common_timer.c:39:
  /usr/include/bits/sysctl.h:19:3: error: #error "sysctl system call is unsupported in x32 kernel"
   # error "sysctl system call is unsupported in x32 kernel"
     ^
  dpdk.org/mk/internal/rte.compile-pre.mk:126: recipe for target 'eal_common_timer.o' failed
  make[6]: *** [eal_common_timer.o] Error 1

Including sysctl.h was added by mistake when merging bsd and linux
EAL timer code. It can be safely removed in this file, fixing the
compilation.

Fixes: 040cf8a411 ("eal: deduplicate timer functions")

Signed-off-by: Olivier Matz
---
 lib/librte_eal/common/eal_common_timer.c | 1 -
 1 file changed, 1 deletion(-)

diff --git a/lib/librte_eal/common/eal_common_timer.c b/lib/librte_eal/common/eal_common_timer.c
index 255f995..72371b8 100644
--- a/lib/librte_eal/common/eal_common_timer.c
+++ b/lib/librte_eal/common/eal_common_timer.c
@@ -36,7 +36,6 @@
 #include
 #include
 #include
-#include
 #include
 #include
--
2.1.4
[dpdk-dev] [PATCH v2] mbuf: enforce alignment of mbuf private area
> -Original Message-
> From: Olivier Matz [mailto:olivier.matz at 6wind.com]
> Sent: Thursday, July 30, 2015 5:22 PM
> To: dev at dpdk.org
> Cc: Ananyev, Konstantin; olivier.matz at 6wind.com; Zhang, Helin;
> martin.weiser at allegro-packets.com; thomas.monjalon at 6wind.com
> Subject: [PATCH v2] mbuf: enforce alignment of mbuf private area
>
> It looks better to have a data buffer address that is aligned to
> 8 bytes. This is the case when there is no mbuf private area, but
> if there is one, the alignment depends on the size of this area
> that is located between the mbuf structure and the data buffer.
>
> Indeed, some drivers expect to have the buffer address aligned
> to an even address, and moreover an unaligned buffer may impact
> the performance when accessing network headers.
>
> Add a check in rte_pktmbuf_pool_create() to verify the alignment
> constraint before creating the mempool. For applications that use
> the alternative way (direct call to rte_mempool_create), also
> add an assertion in rte_pktmbuf_init().
>
> By the way, also add the MBUF log type.
>
> Signed-off-by: Olivier Matz
> ---

Acked-by: Konstantin Ananyev
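
The alignment constraint described in the quoted patch can be pictured with a small stand-alone sketch. This is not the code from the patch: `PRIV_ALIGN`, `priv_size_is_aligned()` and `priv_size_align_up()` are hypothetical names, and the value 8 is taken from the commit message above. The idea is that the data buffer follows the mbuf structure and the private area, so the private area size has to be a multiple of the alignment for the buffer address to stay aligned.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed alignment, taken from the "aligned to 8 bytes" wording above. */
#define PRIV_ALIGN 8u

/* Return 1 if the private area size keeps the data buffer aligned. */
static int
priv_size_is_aligned(uint16_t priv_size)
{
	/* Valid because PRIV_ALIGN is a power of two. */
	return (priv_size & (PRIV_ALIGN - 1u)) == 0;
}

/* Round a private area size up to the next multiple of PRIV_ALIGN. */
static uint16_t
priv_size_align_up(uint16_t priv_size)
{
	/* Guard against wrapping past the 16-bit range. */
	assert(priv_size <= UINT16_MAX - (PRIV_ALIGN - 1u));
	return (uint16_t)((priv_size + (PRIV_ALIGN - 1u)) & ~(PRIV_ALIGN - 1u));
}
```

An application that wants, say, a 13-byte private area could round it up with `priv_size_align_up(13)` before creating the pool, instead of having pool creation refuse the odd size.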