[dpdk-dev] How to know corresponding device from port number
Hi,

I have a question about how to find the device corresponding to a port number. For example, if I have 4 Ethernet devices and 2 ring PMDs, I will get 6 ports during initialization. In that case, how can I know which port corresponds to the last ring PMD?

Regards,
Tetsuya Mukawa
[dpdk-dev] How to know corresponding device from port number
> Hi,
>
> I have a question about how to know corresponding device from port number.
> For example, if I have 4 Ethernet devices and 2 Ring PMDs, I will get 6 ports
> during initialization.
> In the case, how can I know which port corresponds last Ring PMD?

[BR] Firstly, to tell the ring PMDs apart from the Ethernet device PMDs, you can use the information in the rte_eth_dev structure. For each device x (0 <= x <= 5), if you check rte_eth_devices[x], the ring PMDs will have a NULL driver pointer, and the PCI address given in the pci_dev structure will be all zeros. As for distinguishing two different ring ethdevs from each other, I'm not aware of any way to do this; they will just have different eth_dev indexes.
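A minimal sketch of that check, assuming the DPDK 1.x layout described above (global rte_eth_devices[] array, driver pointer, and a dummy pci_dev on virtual devices); the helper name is made up for illustration:

#include <rte_ethdev.h>
#include <rte_pci.h>

/* Sketch only: per the description above, a ring PMD port has no ethdev
 * driver attached and its dummy pci_dev carries an all-zeros PCI address. */
static int
port_is_ring_pmd(uint8_t port_id)
{
	const struct rte_eth_dev *dev = &rte_eth_devices[port_id];
	const struct rte_pci_addr *a;

	if (dev->driver != NULL)
		return 0;	/* bound to a real (PCI) ethdev driver */

	a = &dev->pci_dev->addr;	/* ring PMD fills in a dummy, all-zeros address */
	return a->domain == 0 && a->bus == 0 &&
		a->devid == 0 && a->function == 0;
}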
[dpdk-dev] [PATCH] compilation fixes for ICC
Compilation fixes for ICC

ICC requires an initializer be given for the static variables, so adding one in cases where one wasn't previously given.
---
 lib/librte_pmd_e1000/igb_rxtx.c   | 6 ++++--
 lib/librte_pmd_ixgbe/ixgbe_rxtx.c | 6 ++++--
 2 files changed, 8 insertions(+), 4 deletions(-)

diff --git a/lib/librte_pmd_e1000/igb_rxtx.c b/lib/librte_pmd_e1000/igb_rxtx.c
index 90c3227..68716b0 100644
--- a/lib/librte_pmd_e1000/igb_rxtx.c
+++ b/lib/librte_pmd_e1000/igb_rxtx.c
@@ -1134,7 +1134,8 @@ igb_reset_tx_queue_stat(struct igb_tx_queue *txq)
 static void
 igb_reset_tx_queue(struct igb_tx_queue *txq, struct rte_eth_dev *dev)
 {
-	static const union e1000_adv_tx_desc zeroed_desc;
+	static const union e1000_adv_tx_desc zeroed_desc = { .read = {
+			.buffer_addr = 0}};
 	struct igb_tx_entry *txe = txq->sw_ring;
 	uint16_t i, prev;
 	struct e1000_hw *hw;
@@ -1296,7 +1297,8 @@ eth_igb_rx_queue_release(void *rxq)
 static void
 igb_reset_rx_queue(struct igb_rx_queue *rxq)
 {
-	static const union e1000_adv_rx_desc zeroed_desc;
+	static const union e1000_adv_rx_desc zeroed_desc = { .read = {
+			.pkt_addr = 0}};
 	unsigned i;
 
 	/* Zero out HW ring memory */
diff --git a/lib/librte_pmd_ixgbe/ixgbe_rxtx.c b/lib/librte_pmd_ixgbe/ixgbe_rxtx.c
index ae9eda8..6eda8bc 100644
--- a/lib/librte_pmd_ixgbe/ixgbe_rxtx.c
+++ b/lib/librte_pmd_ixgbe/ixgbe_rxtx.c
@@ -1799,7 +1799,8 @@ ixgbe_dev_tx_queue_release(void *txq)
 static void
 ixgbe_reset_tx_queue(struct igb_tx_queue *txq)
 {
-	static const union ixgbe_adv_tx_desc zeroed_desc;
+	static const union ixgbe_adv_tx_desc zeroed_desc = { .read = {
+			.buffer_addr = 0}};
 	struct igb_tx_entry *txe = txq->sw_ring;
 	uint16_t prev, i;
 
@@ -2094,7 +2095,8 @@ check_rx_burst_bulk_alloc_preconditions(__rte_unused struct igb_rx_queue *rxq)
 static void
 ixgbe_reset_rx_queue(struct igb_rx_queue *rxq)
 {
-	static const union ixgbe_adv_rx_desc zeroed_desc;
+	static const union ixgbe_adv_rx_desc zeroed_desc = { .read = {
+			.pkt_addr = 0}};
 	unsigned i;
 	uint16_t len;
 
-- 
1.7.7.6
[dpdk-dev] Using dpdk in KVM guest with sr-iov pass-thru
Hello,

I'm having trouble running the l2fwd example in a KVM guest with an SR-IOV pass-through device. Here is the detailed description:

Symptoms:
If I run the l2fwd DPDK app, it does not receive any packets. Even worse, the passed-through device in the KVM guest does not receive any interrupts, and the PF on the host side stops receiving packets as well. If I destroy the KVM guest, the PF starts receiving packets again (which is very weird, right?).

Env:
- SR-IOV card: Intel Corporation I350 Gigabit Network Connection (rev 01)
- DPDK version: dpdk-1.5.1r1
- KVM installed with the packages in Ubuntu Server 64-bit
- Kernel version: 3.11.0-12-generic
- KVM guest: Ubuntu Server 64-bit (same as the host)
- CPU: i7-3770
- IOMMU enabled

Have you seen any similar issues, or do you have any comments or pointers?

Thanks,
Jaeyong
[dpdk-dev] Question: Can't make pcap and refcnt to match
I have had stability problems when using pcap in my little application. My application is a simple benchmark that tries to see how much data I can send and receive.

It has one lcore per NIC, where each lcore handles transmit and receive. On the hardware, I make a loopback between two NICs, so the NICs are in practice paired. I currently use 4 NICs and therefore 4 lcores. Port 0 sends to port 1 and vice versa; port 2 sends to port 3 and vice versa. One pair uses the DPDK hardware driver against a dual i350 NIC. The other pair uses pcap against two of the four on-board NICs.

When enabling everything with "DEBUG" in its name in the .config file, I get the following error:

PMD: rte_eth_dev_config_restore: port 1: MAC address array not supported
PMD: rte_eth_promiscuous_disable: Function not supported
PMD: rte_eth_allmulticast_disable: Function not supported
Speed: 1 Mbps, full duplex
Port 1 up and running.
PMD: e1000_put_hw_semaphore_generic(): e1000_put_hw_semaphore_generic
PANIC in rte_mbuf_sanity_check(): bad ref cnt
PANIC in rte_mbuf_sanity_check(): bad ref cnt
PMD: e1000_release_phy_82575(): e1000_release_phy_82575
PMD: e1000_release_swfw_sync_82575(): e1000_release_swfw_sync_82575
PMD: e1000_get_hw_semaphore_generic(): e1000_get_hw_semaphore_generic
PMD: eth_igb_rx_queue_setup(): sw_ring=0x7fff776eefc0 hw_ring=0x7fff76830480 dma_addr=0x464630480
PMD: e1000_put_hw_semaphore_generic(): e1000_put_hw_semaphore_generic
PMD: To improve 1G driver performance, consider setting the TX WTHRESH value to 4, 8, or 16.
PMD: eth_igb_tx_queue_setup(): sw_ring=0x7fff776ece40 hw_ring=0x7fff76840500 dma_addr=0x464640500
PMD: eth_igb_start(): >>
PMD: e1000_read_phy_reg_82580(): e1000_read_phy_reg_82580
PMD: e1000_acquire_phy_82575(): e1000_acquire_phy_82575
PMD: e1000_acquire_swfw_sync_82575(): e1000_acquire_swfw_sync_82575
PMD: e1000_get_hw_semaphore_generic(): e1000_get_hw_semaphore_generic
PMD: e1000_get_cfg_done_82575(): e1000_get_cfg_done_82575
PMD: e1000_put_hw_semaphore_generic(): e1000_put_hw_semaphore_generic
PMD: e1000_read_phy_reg_mdic(): e1000_read_phy_reg_mdic
9: [/lib/x86_64-linux-gnu/libc.so.6(clone+0x6d) [0x772a89cd]]
8: [/lib/x86_64-linux-gnu/libpthread.so.0(+0x7f6e) [0x7757df6e]]
7: [/home/mlil/dpdk-demo/build/enea-demo(eal_thread_loop+0x1b9) [0x492669]]
6: [/home/mlil/dpdk-demo/build/enea-demo() [0x4150bc]]
5: [/home/mlil/dpdk-demo/build/enea-demo() [0x414d0b]]
4: [/home/mlil/dpdk-demo/build/enea-demo() [0x4116ef]]
3: [/home/mlil/dpdk-demo/build/enea-demo(rte_mbuf_sanity_check+0xa7) [0x484707]]
2: [/home/mlil/dpdk-demo/build/enea-demo(__rte_panic+0xc1) [0x40f788]]
1: [/home/mlil/dpdk-demo/build/enea-demo(rte_dump_stack+0x18) [0x493f68]]
PMD: e1000_release_phy_82575(): e1000_release_phy_82575
PMD: e1000_release_swfw_sync_82575(): e1000_release_swfw_sync_82575
PMD: e1000_get_hw_semaphore_generic(): e1000_get_hw_semaphore_generic

I checked the source code for pcap, and in the file rte_eth_pcap.c, function eth_pcap_rx(), I make the following observation:

It pre-allocates a number of mbufs (64 to be exact), then fills these mbufs with data and returns them. The pre-allocation seems to be done only once, and the mbufs are then re-used.

This confuses me. How does this work when more than 64 packets are requested? I see no safety checks for this.

Aren't applications supposed to call rte_pktmbuf_free() on the returned mbufs? If so, the pre-allocated mbufs will have been freed as far as I can see, and can therefore not be re-used.

What am I missing here?

Regards
Mats
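For reference, the usual application-side contract looks roughly like the sketch below (illustrative only, not taken from the application above): rte_eth_rx_burst() hands ownership of each returned mbuf to the caller, which frees it when done. A PMD that keeps handing back the same 64 pre-allocated mbufs therefore trips the reference-count sanity check.

#include <rte_ethdev.h>
#include <rte_mbuf.h>

#define BURST_SIZE 32

/* Minimal sketch of the usual RX contract: the PMD transfers ownership of
 * each returned mbuf to the application, which frees it (or transmits it)
 * when done. If the PMD silently re-uses the same mbufs, this free breaks
 * their reference counts. */
static void
rx_and_drop(uint8_t port_id, uint16_t queue_id)
{
	struct rte_mbuf *bufs[BURST_SIZE];
	uint16_t nb_rx, i;

	nb_rx = rte_eth_rx_burst(port_id, queue_id, bufs, BURST_SIZE);
	for (i = 0; i < nb_rx; i++)
		rte_pktmbuf_free(bufs[i]);
}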
[dpdk-dev] Question: Can't make pcap and refcnt to match
Hi Mats,

yes, you are right, there is an issue in the pcap driver that it is not allocating mbufs correctly. We are working on a fix.

Regards,
/Bruce

> -----Original Message-----
> From: dev [mailto:dev-bounces at dpdk.org] On Behalf Of Mats Liljegren
> Sent: Tuesday, November 26, 2013 1:07 PM
> To: dev at dpdk.org
> Subject: [dpdk-dev] Question: Can't make pcap and refcnt to match
>
> I have had stability problems when using pcap in my little application. My
> application is a simple benchmark applications that is trying to see how
> much data I can send and receive.
>
> It has one lcore per NIC, where each lcore handles transmit and receive. On
> the hardware, I make a loopback between two NICs, so the NICs are in
> practice paired. I currently use 4 NICs and therefore 4 lcores. Port 0 sends to
> port 1 and vice versa. Port 2 send to port 3 and vice versa. One pair is using
> DPDK hardware driver against a dual
> i350 NIC. The other pair is using pcap against two of the four on-board NICs.
>
> When enabling everything saying "DEBUG" in its name in the .config file, I
> get the following error:
>
> PMD: rte_eth_dev_config_restore: port 1: MAC address array not supported
> PMD: rte_eth_promiscuous_disable: Function not supported
> PMD: rte_eth_allmulticast_disable: Function not supported
> Speed: 1 Mbps, full duplex
> Port 1 up and running.
> PMD: e1000_put_hw_semaphore_generic(): e1000_put_hw_semaphore_generic
> PANIC in rte_mbuf_sanity_check(): bad ref cnt
> PANIC in rte_mbuf_sanity_check(): bad ref cnt
> PMD: e1000_release_phy_82575(): e1000_release_phy_82575
> PMD: e1000_release_swfw_sync_82575(): e1000_release_swfw_sync_82575
> PMD: e1000_get_hw_semaphore_generic(): e1000_get_hw_semaphore_generic
> PMD: eth_igb_rx_queue_setup(): sw_ring=0x7fff776eefc0 hw_ring=0x7fff76830480 dma_addr=0x464630480
> PMD: e1000_put_hw_semaphore_generic(): e1000_put_hw_semaphore_generic
> PMD: To improve 1G driver performance, consider setting the TX WTHRESH value to 4, 8, or 16.
> PMD: eth_igb_tx_queue_setup(): sw_ring=0x7fff776ece40 hw_ring=0x7fff76840500 dma_addr=0x464640500
> PMD: eth_igb_start(): >>
> PMD: e1000_read_phy_reg_82580(): e1000_read_phy_reg_82580
> PMD: e1000_acquire_phy_82575(): e1000_acquire_phy_82575
> PMD: e1000_acquire_swfw_sync_82575(): e1000_acquire_swfw_sync_82575
> PMD: e1000_get_hw_semaphore_generic(): e1000_get_hw_semaphore_generic
> PMD: e1000_get_cfg_done_82575(): e1000_get_cfg_done_82575
> PMD: e1000_put_hw_semaphore_generic(): e1000_put_hw_semaphore_generic
> PMD: e1000_read_phy_reg_mdic(): e1000_read_phy_reg_mdic
> 9: [/lib/x86_64-linux-gnu/libc.so.6(clone+0x6d) [0x772a89cd]]
> 8: [/lib/x86_64-linux-gnu/libpthread.so.0(+0x7f6e) [0x7757df6e]]
> 7: [/home/mlil/dpdk-demo/build/enea-demo(eal_thread_loop+0x1b9) [0x492669]]
> 6: [/home/mlil/dpdk-demo/build/enea-demo() [0x4150bc]]
> 5: [/home/mlil/dpdk-demo/build/enea-demo() [0x414d0b]]
> 4: [/home/mlil/dpdk-demo/build/enea-demo() [0x4116ef]]
> 3: [/home/mlil/dpdk-demo/build/enea-demo(rte_mbuf_sanity_check+0xa7) [0x484707]]
> 2: [/home/mlil/dpdk-demo/build/enea-demo(__rte_panic+0xc1) [0x40f788]]
> 1: [/home/mlil/dpdk-demo/build/enea-demo(rte_dump_stack+0x18) [0x493f68]]
> PMD: e1000_release_phy_82575(): e1000_release_phy_82575
> PMD: e1000_release_swfw_sync_82575(): e1000_release_swfw_sync_82575
> PMD: e1000_get_hw_semaphore_generic(): e1000_get_hw_semaphore_generic
>
> I checked the source code for pcap, and in the file rte_eth_pcap.c, function
> eth_pcap_rx(), I make the following observation:
>
> It pre-allocates a number of mbufs (64 to be exact). It then fills these mbufs
> with data and returns them. The pre-allocation seems to only be done once,
> and then they are re-used.
>
> This confuses me. How does this work when more than 64 packets are
> requested? I see no safety checks for this.
>
> Aren't application supposed to call rte_pktmbuf_free() on the returned
> mbufs? If so, the pre-allocated mbufs will have been free'd as far as I can
> see and can therefore not be re-used.
>
> What am I missing here?
>
> Regards
> Mats
[dpdk-dev] Question: Can't make pcap and refcnt to match
Hi Bruce,

We also found buffer overflow problems with the pcap driver:
1) Frame may be longer than mbuf.
2) Caplen may be less than original packet.

I've been meaning to submit a change, but I'm not familiar with the process. Here is a diff of the relevant code in rte_eth_pcap.c:

 		if (unlikely(mbuf == NULL))
 			break;
-		rte_memcpy(mbuf->pkt.data, packet, header.len);
-		mbuf->pkt.data_len = (uint16_t)header.len;
-		mbuf->pkt.pkt_len = mbuf->pkt.data_len;
+
+		/*
+		 * Fix buffer overflow problems.
+		 * 1. Frame may be longer than mbuf.
+		 * 2. Capture length (caplen) may be less than original packet length.
+		 */
+		uint16_t len = (uint16_t)header.caplen;
+		uint16_t tailroom = rte_pktmbuf_tailroom(mbuf);
+		if (len > tailroom)
+			len = tailroom;
+
+		/*
+		RTE_LOG(INFO, PMD, "eth_pcap_rx: i=%u caplen=%u framelen=%u tail=%u len=%u\n",
+			i, header.caplen, header.len, tailroom, len);
+		*/
+
+		rte_memcpy(mbuf->pkt.data, packet, len);
+		mbuf->pkt.data_len = len;
+		mbuf->pkt.pkt_len = len;
+
 		bufs[i] = mbuf;
 		num_rx++;
 	}

Regards,
Robert

On Tue, Nov 26, 2013 at 8:46 AM, Richardson, Bruce <bruce.richardson at intel.com> wrote:

> Hi Mats,
>
> yes, you are right, there is an issue in the pcap driver that it is not
> allocating mbufs correctly. We are working on a fix.
>
> Regards,
> /Bruce
>
> > -----Original Message-----
> > From: dev [mailto:dev-bounces at dpdk.org] On Behalf Of Mats Liljegren
> > Sent: Tuesday, November 26, 2013 1:07 PM
> > To: dev at dpdk.org
> > Subject: [dpdk-dev] Question: Can't make pcap and refcnt to match
> >
> > I have had stability problems when using pcap in my little application. My
> > application is a simple benchmark applications that is trying to see how
> > much data I can send and receive.
> >
> > It has one lcore per NIC, where each lcore handles transmit and receive. On
> > the hardware, I make a loopback between two NICs, so the NICs are in
> > practice paired. I currently use 4 NICs and therefore 4 lcores. Port 0 sends to
> > port 1 and vice versa. Port 2 send to port 3 and vice versa. One pair is using
> > DPDK hardware driver against a dual
> > i350 NIC. The other pair is using pcap against two of the four on-board NICs.
> >
> > When enabling everything saying "DEBUG" in its name in the .config file, I
> > get the following error:
> >
> > PMD: rte_eth_dev_config_restore: port 1: MAC address array not supported
> > PMD: rte_eth_promiscuous_disable: Function not supported
> > PMD: rte_eth_allmulticast_disable: Function not supported
> > Speed: 1 Mbps, full duplex
> > Port 1 up and running.
> > PMD: e1000_put_hw_semaphore_generic(): e1000_put_hw_semaphore_generic
> > PANIC in rte_mbuf_sanity_check(): bad ref cnt
> > PANIC in rte_mbuf_sanity_check(): bad ref cnt
> > PMD: e1000_release_phy_82575(): e1000_release_phy_82575
> > PMD: e1000_release_swfw_sync_82575(): e1000_release_swfw_sync_82575
> > PMD: e1000_get_hw_semaphore_generic(): e1000_get_hw_semaphore_generic
> > PMD: eth_igb_rx_queue_setup(): sw_ring=0x7fff776eefc0 hw_ring=0x7fff76830480 dma_addr=0x464630480
> > PMD: e1000_put_hw_semaphore_generic(): e1000_put_hw_semaphore_generic
> > PMD: To improve 1G driver performance, consider setting the TX WTHRESH value to 4, 8, or 16.
> > PMD: eth_igb_tx_queue_setup(): sw_ring=0x7fff776ece40 hw_ring=0x7fff76840500 dma_addr=0x464640500
> > PMD: eth_igb_start(): >>
> > PMD: e1000_read_phy_reg_82580(): e1000_read_phy_reg_82580
> > PMD: e1000_acquire_phy_82575(): e1000_acquire_phy_82575
> > PMD: e1000_acquire_swfw_sync_82575(): e1000_acquire_swfw_sync_82575
> > PMD: e1000_get_hw_semaphore_generic(): e1000_get_hw_semaphore_generic
> > PMD: e1000_get_cfg_done_82575(): e1000_get_cfg_done_82575
> > PMD: e1000_put_hw_semaphore_generic(): e1000_put_hw_semaphore_generic
> > PMD: e1000_read_phy_reg_mdic(): e1000_read_phy_reg_mdic
> > 9: [/lib/x86_64-linux-gnu/libc.so.6(clone+0x6d) [0x772a89cd]]
> > 8: [/lib/x86_64-linux-gnu/libpthread.so.0(+0x7f6e) [0x7757df6e]]
> > 7: [/home/mlil/dpdk-demo/build/enea-demo(eal_thread_loop+0x1b9) [0x492669]]
> > 6: [/home/mlil/dpdk-demo/build/enea-demo() [0x4150bc]]
> > 5: [/home/mlil/dpdk-demo/build/enea-demo() [0x414d0b]]
> > 4: [/home/mlil/dpdk-demo/build/enea-demo() [0x4116ef]]
> > 3: [/home/mlil/dpdk-demo/build/enea-demo(rte_mbuf_sanity_check+0xa7) [0x484707]]
> > 2: [/home/mlil/dpdk-demo/build/enea-demo(__rte_panic+0xc1) [0x40f788]]
> > 1: [/home/mlil/dpdk-demo/build/enea-demo(rte_dump_stack+0x18) [0x493f68]]
> > PMD: e1000_release_phy_82575(): e1000_release_phy_82575
> > PMD: e1000_release_swfw_sync_82575(): e1000_release_swfw_sync_82575
> > PMD: e1000_get_hw_semaphore_generic(): e1000_get_hw_semaphore_generic
> >
> > I checked the source code for pcap, and in the file rte_eth_pcap.c, function
> > eth_pcap_rx(), I make the following observation:
> >
> > It pre-allocates a number of mbufs (64 to be exact). It then fills these m
[dpdk-dev] Question: Can't make pcap and refcnt to match
Hello,

26/11/2013 16:42, Robert Sanford:
> I've been meaning to submit a change, but I'm not familiar with the
> process.

The process is to send your patch with git (format-patch + send-email). You have to set a short title and a longer commit log explaining what the problem was and how you fix it. The commit log must have a Signed-off-by line (see "Developer's Certificate of Origin" in https://www.kernel.org/doc/Documentation/SubmittingPatches).

> +		/*
> +		 * Fix buffer overflow problems.
> +		 * 1. Frame may be longer than mbuf.
> +		 * 2. Capture length (caplen) may be less than original packet length.
> +		 */

This should be in the commit log. Keep only comments needed to understand the code.

> +		/*
> +		RTE_LOG(INFO, PMD, "eth_pcap_rx: i=%u caplen=%u framelen=%u tail=%u len=%u\n",
> +			i, header.caplen, header.len, tailroom, len);
> +		*/

Why is it commented out? If it's important, it is an INFO log. If it's useful when debugging, set it to DEBUG. If it's a temporary debug, remove it.

By the way, thank you for your patch.
-- 
Thomas
[dpdk-dev] [PATCH] pmd_pcap: fixed incorrect mbuf allocation
The mbufs returned by the pcap pmd RX function were constantly reused, instead of being allocated on demand. This has been fixed.

Signed-off-by: Bruce Richardson
---
 lib/librte_pmd_pcap/rte_eth_pcap.c | 37 ++++++++++++++++++++++++-----------
 1 files changed, 26 insertions(+), 11 deletions(-)

diff --git a/lib/librte_pmd_pcap/rte_eth_pcap.c b/lib/librte_pmd_pcap/rte_eth_pcap.c
index 19d19b3..8a98471 100644
--- a/lib/librte_pmd_pcap/rte_eth_pcap.c
+++ b/lib/librte_pmd_pcap/rte_eth_pcap.c
@@ -118,32 +118,47 @@ eth_pcap_rx(void *queue,
 	struct pcap_pkthdr header;
 	const u_char *packet;
 	struct rte_mbuf *mbuf;
-	static struct rte_mbuf *mbufs[RTE_ETH_PCAP_MBUFS] = { 0 };
 	struct pcap_rx_queue *pcap_q = queue;
+	struct rte_pktmbuf_pool_private *mbp_priv;
 	uint16_t num_rx = 0;
+	uint16_t buf_size;
 
 	if (unlikely(pcap_q->pcap == NULL || nb_pkts == 0))
 		return 0;
 
-	if(unlikely(!mbufs[0]))
-		for (i = 0; i < RTE_ETH_PCAP_MBUFS; i++)
-			mbufs[i] = rte_pktmbuf_alloc(pcap_q->mb_pool);
-
 	/* Reads the given number of packets from the pcap file one by one
 	 * and copies the packet data into a newly allocated mbuf to return.
 	 */
 	for (i = 0; i < nb_pkts; i++) {
-		mbuf = mbufs[i % RTE_ETH_PCAP_MBUFS];
+		/* Get the next PCAP packet */
 		packet = pcap_next(pcap_q->pcap, &header);
 		if (unlikely(packet == NULL))
 			break;
+		else
+			mbuf = rte_pktmbuf_alloc(pcap_q->mb_pool);
 		if (unlikely(mbuf == NULL))
 			break;
-		rte_memcpy(mbuf->pkt.data, packet, header.len);
-		mbuf->pkt.data_len = (uint16_t)header.len;
-		mbuf->pkt.pkt_len = mbuf->pkt.data_len;
-		bufs[i] = mbuf;
-		num_rx++;
+
+		/* Now get the space available for data in the mbuf */
+		mbp_priv = (struct rte_pktmbuf_pool_private *)
+				((char *)pcap_q->mb_pool + sizeof(struct rte_mempool));
+		buf_size = (uint16_t) (mbp_priv->mbuf_data_room_size -
+				RTE_PKTMBUF_HEADROOM);
+
+		if (header.len <= buf_size) {
+			/* pcap packet will fit in the mbuf, go ahead and copy */
+			rte_memcpy(mbuf->pkt.data, packet, header.len);
+			mbuf->pkt.data_len = (uint16_t)header.len;
+			mbuf->pkt.pkt_len = mbuf->pkt.data_len;
+			bufs[i] = mbuf;
+			num_rx++;
+		} else {
+			/* pcap packet will not fit in the mbuf, so drop packet */
+			RTE_LOG(ERR, PMD,
+				"PCAP packet %d bytes will not fit in mbuf (%d bytes)\n",
+				header.len, buf_size);
+			rte_pktmbuf_free(mbuf);
+		}
 	}
 	pcap_q->rx_pkts += num_rx;
 	return num_rx;
-- 
1.7.7.6
[dpdk-dev] l2fwd program reported 100Mbps on a 10Gbps physical port using virtio or e1000 port in CentOS guest OS using DPDK 1.3.1r2
I have a Ubuntu 12.04.3 LTS (Linux 3.2.0-53-generic) KVM host. The guest OS is a 32-bit CentOS (CentOS 6.2, Linux 2.6.32-220.el6.i686). There are two 10G ports on the KVM host, with the following KVM versions:

root at openstack1:~# kvm --version
QEMU emulator version 1.2.0 (qemu-kvm-1.2.0), Copyright (c) 2003-2008 Fabrice Bellard
root at openstack1:~# libvirtd --version
libvirtd (libvirt) 0.9.8

The DPDK l2fwd runs inside a CentOS 6.2 OS (2.6.32-220.el6.i686, 32-bit Linux) on a RHEL 6.1 KVM host (2.6.32-131.0.15.el6.x86_64).

THE PROBLEM:
Using DPDK 1.3.1r2, I got the following message indicating the port speed of the virtio port that l2fwd listened to for received packets. It is reported as 100 Mbps only, although the physical port is a 10G port (Intel x520, PN 49Y7960, http://www.redbooks.ibm.com/abstracts/tips0893.html?Open). Is this expected on the DPDK side?

Checking link status done
Port 0 Link Up - speed 100 Mbps - full-duplex

Port statistics
Statistics for port 0
Packets sent: 0
Packets received: 0
Packets dropped: 0
Aggregate statistics
Total packets sent: 0
Total packets received: 0
Total packets dropped: 0

The KVM information:
[root at rh188 ~]# libvirtd --version
libvirtd (libvirt) 0.10.2
[root at rh188 ~]# /usr/libexec/qemu-kvm --version
QEMU PC emulator version 0.12.1 (qemu-kvm-0.12.1.2), Copyright (c) 2003-2008 Fabrice Bellard

On the KVM host, physical port eth6 is 10Gbps:
[root at rh188 ~]# ethtool eth6
Settings for eth6:
	Supported ports: [ FIBRE ]
	Supported link modes: 10000baseT/Full
	Supports auto-negotiation: No
	Advertised link modes: 10000baseT/Full
	Advertised pause frame use: No
	Advertised auto-negotiation: No
	Speed: 10000Mb/s
	Duplex: Full
	Port: FIBRE
	PHYAD: 0
	Transceiver: external
	Auto-negotiation: off
	Supports Wake-on: d
	Wake-on: d
	Current message level: 0x0007 (7)
	Link detected: yes

On the KVM host, the virtual interface used by the KVM guest to receive packets reports 10Mbps:
[root at rh188 ~]# ethtool eth6-client6
Settings for eth6-client6:
	Supported ports: [ ]
	Supported link modes:
	Supports auto-negotiation: No
	Advertised link modes: Not reported
	Advertised pause frame use: No
	Advertised auto-negotiation: No
	Speed: 10Mb/s
	Duplex: Full
	Port: Twisted Pair
	PHYAD: 0
	Transceiver: internal
	Auto-negotiation: off
	MDI-X: Unknown
	Current message level: 0xffa1 (-95)
	Link detected: yes

eth6 and eth6-client6 are virtually connected to the br6 bridge:
[root at rh188 ~]# brctl show
bridge name	bridge id		STP enabled	interfaces
br6		8000.90e2ba341e54	no		eth6
							eth6-client6

The l2fwd command running on the guest VM; 100 Mbps is reported by l2fwd for a one-port setup (receiving only):
/root/dpdk/dpdk-1.3.1r2/examples/l2fwd/build/l2fwd -c 3 -n 1 -b 0000:00:03.0 -b 0000:00:07.0 -b 0000:00:0a.0 -b 0000:00:09.0 -- -q 1 -p 1

The same (100 Mbps) is reported by l2fwd for a two-port setup (looping back received traffic to the other port) too:
/root/dpdk/dpdk-1.3.1r2/examples/l2fwd/build/l2fwd -c 3 -n 1 -b 0000:00:03.0 -b 0000:00:07.0 -b 0000:00:0a.0 -- -q 2 -p 3

Thanks
James
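For what it's worth, the speed l2fwd prints is simply whatever link information the PMD reports, not a measurement. A rough sketch of that query using the generic ethdev API (illustrative only; an emulated or para-virtual port reports the nominal speed its device model exposes, not the speed of the underlying physical 10G NIC):

#include <stdio.h>
#include <rte_ethdev.h>

/* Print the link speed the PMD reports for a port; this is the number
 * behind l2fwd's "Link Up - speed ... Mbps" line. */
static void
print_link_speed(uint8_t port_id)
{
	struct rte_eth_link link;

	rte_eth_link_get(port_id, &link);
	if (link.link_status)
		printf("Port %u Link Up - speed %u Mbps - %s\n",
			port_id, (unsigned)link.link_speed,
			link.link_duplex == ETH_LINK_FULL_DUPLEX ?
				"full-duplex" : "half-duplex");
	else
		printf("Port %u Link Down\n", port_id);
}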
[dpdk-dev] Regarding VM live migration with SRIOV
On Wed, 27 Nov 2013 10:09:09 +0530
Prashant Upadhyaya wrote:

> Hi,
>
> Let me be more specific.
> Does DPDK support hot plugin/plugout of PCI devices?
> What typically needs to be done if this is to be achieved inside an
> application?
>
> Typically, the NIC PF or VF appears to the DPDK application as a PCI device
> which is probed at startup.
> Now what happens if I insert a new VF dynamically and want to use it inside
> the DPDK application (while it is already running), how should this typically
> be done? [hotplugin]
> And what happens if the DPDK application is in control of a PCI device and
> that PCI device is suddenly removed? How can the application detect this and
> stop doing data transfer on it and sort of unload it? [hotplugout]
>
> If the above can be coded inside the DPDK app, then we can think of live VM
> migration with SRIOV -- just hotplugin and plugout the VFs.
>
> Regards
> -Prashant

The current implementation does not look like it supports hotplug. All devices are discovered during rte_eal_pci_probe.
[dpdk-dev] Increasing number of txd and rxd from 256 to 1024 for virtio-net-pmd-1.1
Running one-directional traffic from a Spirent traffic generator to l2fwd running inside a guest OS on a RHEL 6.2 KVM host, I encountered a performance issue and need to increase the number of rxd and txd from 256 to 1024. There were not enough free slots for packets to be transmitted in this routine:

virtio_send_packet() {
	if (tq->freeslots < nseg + 1) {
		return -1;
	}
}

How do I solve the performance issue by one of the following?

1. Increase the number of rxd and txd from 256 to 1024.
   This should prevent packets from failing to be stored into the ring due to lack of free slots. But l2fwd fails to run and indicates the number must be equal to 256.
2. Increase MAX_PKT_BURST.
   But this is not ideal since it will increase the delay while improving the throughput.
3. Another mechanism that you know can improve it?
   Is there any other approach to have enough free slots to store the packets before passing them down to PCI?

Thanks

James

These are the performance numbers I measured from the l2fwd printout for the receiving part. I added code inside l2fwd to do the tx part.

vhost-net is enabled on the KVM host, # of cache buffers 4096, Ubuntu 12.04.3 LTS (3.2.0-53-generic); kvm 1.2.0, libvirtd 0.9.8.
64 bytes/pkt from Spirent @ 223k pps, running each test for 10 seconds.

DPDK 1.3 + virtio + 256 txd/rxd + nice -19 priority (l2fwd, guest kvm process)
bash command: nice -n -19 /root/dpdk/dpdk-1.3.1r2/examples/l2fwd/build/l2fwd -c 3 -n 1 -b 0000:00:03.0 -b 0000:00:07.0 -b 0000:00:0a.0 -b 0000:00:09.0 -d /root/dpdk/virtio-net-pmd-1.1/librte_pmd_virtio.so -- -q 1 -p 1

Spirent -> l2fwd (receiving 10G) (RX on KVM guest)
MAX_PKT_BURST	10 seconds (<1% loss), packets per second
32		74k pps
64		80k pps
128		126k pps
256		133k pps

l2fwd -> Spirent (10G port) (transmitting) (using a one-directional, one-port (port 0) setup)
MAX_PKT_BURST	< 1% packet loss
32		88k pps

The same test run on e1000 ports:

DPDK 1.3 + e1000 + 1024 txd/rxd + nice -19 priority (l2fwd, guest kvm process)
bash command: nice -n -19 /root/dpdk/dpdk-1.3.1r2/examples/l2fwd/build/l2fwd -c 3 -n 1 -b 0000:00:03.0 -b 0000:00:07.0 -b 0000:00:0a.0 -b 0000:00:09.0 -- -q 1 -p 1

Spirent -> l2fwd (RECEIVING 10G)
MAX_PKT_BURST	<= 1% packet loss
32		110k pps

l2fwd -> Spirent (10G port) (TRANSMITTING) (using a one-directional, one-port (port 0) setup)
MAX_PKT_BURST	pkts transmitted on l2fwd
32		171k pps (0% dropped)
240		203k pps (6% dropped, 130k pps received on eth6 (assumed on Spirent)) **

**: not enough free slots in tx ring
==> this indicates the effect of the small txd/rxd count (256): when more traffic is generated, packets cannot be sent due to lack of free slots in the tx ring. I guess this is the symptom that occurs in virtio_net.
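One application-side workaround, independent of the ring size the host negotiates, is to retry the unsent tail of a burst when the TX ring is full instead of dropping it immediately. A rough sketch using the generic ethdev API (not taken from the modified l2fwd above; the retry bound is an arbitrary illustrative value):

#include <rte_ethdev.h>
#include <rte_mbuf.h>

/* Sketch: rte_eth_tx_burst() may send fewer packets than requested when the
 * TX ring is full (e.g. only 256 descriptors on virtio). Retry the remainder
 * a few times before dropping, instead of dropping right away. */
static uint16_t
tx_burst_retry(uint8_t port_id, uint16_t queue_id,
	       struct rte_mbuf **pkts, uint16_t nb_pkts)
{
	uint16_t sent = 0, i;
	int retries = 3;	/* arbitrary bound, for illustration only */

	while (sent < nb_pkts) {
		uint16_t n = rte_eth_tx_burst(port_id, queue_id,
					      pkts + sent, nb_pkts - sent);
		sent += n;
		if (n == 0 && retries-- == 0)
			break;
	}
	/* Drop whatever still did not fit */
	for (i = sent; i < nb_pkts; i++)
		rte_pktmbuf_free(pkts[i]);

	return sent;
}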
[dpdk-dev] Regarding VM live migration with SRIOV
On Wed, 27 Nov 2013 11:39:28 +0530
Prashant Upadhyaya wrote:

> Hi Stephen,
>
> The rte_eal_pci_probe is typically called at startup.
>
> Now let's say a DPDK application is running with a PCI device (doing tx and
> rx) and I remove that PCI device underneath (hot plugout).
> So how does the application now know that the device is gone?
>
> Is it that rte_eal_pci_probe should be called periodically from, let's say,
> the slow control path of the DPDK application?
>
> Regards
> -Prashant

Like I said, the current code doesn't do hotplug. If you wanted to add it, you would have to refactor the PCI management layer.
[dpdk-dev] Increasing number of txd and rxd from 256 to 1024 for virtio-net-pmd-1.1
On Tue, 26 Nov 2013 21:15:02 -0800 James Yu wrote: > Running one directional traffic from Spirent traffic generator to l2fwd > running inside a guest OS on a RHEL 6.2 KVM host, I encountered performance > issue and need to increase the number of rxd and txd from 256 to 1024. > There was not enough freeslots for packets to be transmitted in this routine > virtio_send_packet(){ > > if (tq->freeslots < nseg + 1) { > return -1; > } > > } > > How do I solve the performance issue by one of the following > 1. increase the number of rxd and txd from 256 to 1024 > This should prevent packets could not be stored into the ring due > to lack of freeslots. But l2fwd fails to run and indicate the number must > be equal to 256. > 2. increase the MAX_PKT_BURST > But this is not ideal since it will increase the delay while > improving the throughput > 3. other mechanism that you know can improve it ? > Is there any other approach to have enough freeslots to store the > packets before passing down to PCI ? > > > Thanks > > James > > > This is the performance numbers I measured on the l2fwd printout for the > receiving part. I added codes inside l2fwd to do tx part. > > vhost-net is enabled on KVM host, # of cache buffer 4096, Ubuntu 12.04.3 > LTS (3.2.0-53-generic); kvm 1.2.0, libvirtd: 0.9.8 > 64 Bytes/pkt from Spirent @ 223k pps, running test for 10 seconds. > > DPDK 1.3 + virtio + 256 txd/rxd + nice -19 priority (l2fwd, guest kvm > process) > bash command: nice -n -19 > /root/dpdk/dpdk-1.3.1r2/examples/l2fwd/build/l2fwd -c 3 -n 1 -b 000:00:03.0 > -b 000:00:07.0 -b 000:00:0a.0 -b 000:00:09.0 -d > /root/dpdk/virtio-net-pmd-1.1/librte_pmd_virtio.so -- -q 1 -p 1 > > Spirent -> l2fwd (receiving 10G) (RX on KVM guest) > MAX_PKT_BURST 10seconds (<1% loss) Packets Per Second > --- > 32 74k pps > 64 80k pps > 128 126kpps > 256 133kpps > > l2fw -> Spirent (10G port) (transmitting) (using one-directional one port > (port 0) setup) > MAX_PKT_BURST < 1% packet loss > 32 88kpp > > > ** > The same test run on e1000 ports > > > DPDK 1.3 + e1000 + 1024 txd/rxd + nice -19 priority (l2fwd, guest kvm > process) > bash command: nice -n -19 > /root/dpdk/dpdk-1.3.1r2/examples/l2fwd/build/l2fwd -c 3 -n 1 -b 000:00:03.0 > -b 000:00:07.0 -b 000:00:0a.0 -b 000:00:09.0 -- -q 1 -p 1 > > Spirent -> l2fwd (RECEIVING 10G) > MAX_PKT_BURST <= 1% packet loss > 32 110k pps > > l2fw -> Spirent (10G port) (TRANSMITTING) (using one-directional one port > (port 0) setup) > MAX_PKT_BURST pkts transmitted on l2fwd > 32171k pps (0% dropped) > 240 203k pps (6% dropped, 130k pps received on > eth6 (assumed on Spirent)) ** > **: not enough freeslots in tx ring > ==> this indicate the effects of small txd/rxd (256) when more traffic is > generated, the packets can not > be sent due to lack of freeslots in tx ring. I guess this is the > symptom occurs in the virtio_net The number of slots with virtio is a parameter negotiated with the host. So unless the host (KVM) gives the device more slots, then it won't work. I have a better virtio driver and one of the features being added is multiqueue and merged TX buffer support which would give a bigger queue.