[dpdk-dev] [PATCH] vhost: remove lockless enqueue to the virtio ring
Hi Huawei,

On 1/4/2016 10:46 PM, Huawei Xie wrote:
> This patch removes the internal lockless enqueue implementation.
> DPDK doesn't support receiving/transmitting packets from/to the same
> queue. Vhost PMD wraps vhost device as normal DPDK port. DPDK
> applications normally have their own lock implementation when enqueue
> packets to the same queue of a port.
>
> The atomic cmpset is a costly operation. This patch should help
> performance a bit.
>
> Signed-off-by: Huawei Xie
> ---
>  lib/librte_vhost/vhost_rxtx.c | 86 +--
>  1 file changed, 25 insertions(+), 61 deletions(-)
>
> diff --git a/lib/librte_vhost/vhost_rxtx.c b/lib/librte_vhost/vhost_rxtx.c
> index bbf3fac..26a1b9c 100644
> --- a/lib/librte_vhost/vhost_rxtx.c
> +++ b/lib/librte_vhost/vhost_rxtx.c

I think the vhost example will not work well with this patch when
vm2vm=software.

Test case:
Two virtio ports handled by two pmd threads. Thread 0 polls pkts from the
physical NIC and sends them to virtio0, while thread 1 receives pkts from
virtio1 and routes them to virtio0.

> -
> *(volatile uint16_t *)&vq->used->idx += entry_success;

Another unrelated question: have we ever tried to move this assignment out
of the loop to save cost, since it is a point of data contention?

Thanks,
Jianfeng
[dpdk-dev] [PATCH v5 08/11] eal: pci: introduce RTE_KDRV_VFIO_NOIOMMU driver mode
On Tue, Jan 19, 2016 at 7:48 PM, Burakov, Anatoly wrote:
> Hi Santosh,
>
>> +int
>> +pci_vfio_is_noiommu(struct rte_pci_device *pci_dev) {
>> +	FILE *fp;
>> +	struct rte_pci_addr *loc;
>> +	const char *path = "/sys/module/vfio/parameters/enable_unsafe_noiommu_mode";
>> +	char filename[PATH_MAX] = {0};
>> +	char buf[PATH_MAX] = {0};
>> +
>> +	/*
>> +	 * 1. chk vfio-noiommu mode set in kernel driver
>> +	 * 2. verify pci device attached to vfio-noiommu driver
>> +	 * example:
>> +	 * cd /sys/bus/pci/drivers/vfio-pci//iommu_group
>> +	 * > cat name
>> +	 * > vfio-noiommu --> means virtio_dev attached to vfio-noiommu driver
>> +	 */
>> +
>> +	fp = fopen(path, "r");
>> +	if (fp == NULL) {
>> +		RTE_LOG(ERR, EAL, "can't open %s\n", path);
>> +		return -1;
>> +	}
>> +
>> +	if (fread(buf, sizeof(char), 1, fp) != 1) {
>> +		RTE_LOG(ERR, EAL, "can't read from file %s\n", path);
>> +		fclose(fp);
>> +		return -1;
>> +	}
>> +
>> +	if (strncmp(buf, "Y", 1) != 0) {
>> +		RTE_LOG(ERR, EAL, "[%s]: vfio: noiommu mode not set\n", path);
>> +		fclose(fp);
>> +		return -1;
>> +	}
>> +
>> +	fclose(fp);
>> +
>> +	/* 2. chk whether attached driver is vfio-noiommu or not */
>> +	loc = &pci_dev->addr;
>> +	snprintf(filename, sizeof(filename),
>> +		SYSFS_PCI_DEVICES "/" PCI_PRI_FMT "/iommu_group/name",
>> +		loc->domain, loc->bus, loc->devid, loc->function);
>> +
>> +	/* check for vfio-noiommu */
>> +	fp = fopen(filename, "r");
>> +	if (fp == NULL) {
>> +		RTE_LOG(ERR, EAL, "can't open %s\n", filename);
>> +		return -1;
>> +	}
>> +
>> +	if (fread(buf, sizeof(char), sizeof("vfio-noiommu"), fp) !=
>> +			sizeof("vfio-noiommu")) {
>> +		RTE_LOG(ERR, EAL, "can't read from file %s\n", filename);
>> +		fclose(fp);
>> +		return -1;
>> +	}
>> +
>> +	if (strncmp(buf, "vfio-noiommu", strlen("vfio-noiommu")) != 0) {
>> +		RTE_LOG(ERR, EAL, "not a vfio-noiommu driver\n");
>> +		fclose(fp);
>> +		return -1;
>> +	}
>> +
>> +	fclose(fp);
>> +
>> +	return 0;
>> +}
>
> Since this is a public non-performance critical API, shouldn't we check if
> pci_dev is NULL? Otherwise the patch-set seems fine to me as far as VFIO
> parts are concerned.
>

pci_scan_one() is the only caller of this API for now, and it populates
pci_dev before pci_vfio_is_noiommu() uses it, so I didn't think to add a
check. But you are right, in case any other module wants to use this API.
Sending a patch now. Thanks.

> Thanks,
> Anatoly
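The NULL check Anatoly asks for, together with the comparison against the iommu group name, could look roughly like the sketch below. It is a minimal illustration only: `struct pci_device`, `struct pci_addr` and `check_noiommu()` are hypothetical stand-ins, not the DPDK types or the function from the patch, and the group name is passed in as a string instead of being read from sysfs.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-ins for the DPDK structures; only what the sketch needs. */
struct pci_addr { unsigned domain, bus, devid, function; };
struct pci_device { struct pci_addr addr; };

/*
 * Return 0 when group_name identifies the vfio-noiommu group, -1 otherwise.
 * Both pointers are validated first, as suggested in the review.
 */
static int check_noiommu(const struct pci_device *dev, const char *group_name)
{
	size_t n = strlen("vfio-noiommu");

	if (dev == NULL || group_name == NULL)
		return -1;

	if (strncmp(group_name, "vfio-noiommu", n) != 0)
		return -1;

	/* sysfs attributes normally end with a newline; accept either form. */
	return (group_name[n] == '\0' || group_name[n] == '\n') ? 0 : -1;
}
```

Rejecting NULL up front keeps the helper safe for callers other than pci_scan_one(), which is the case raised in the review.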
[dpdk-dev] [PATCH v6 08/11] eal: pci: introduce RTE_KDRV_VFIO_NOIOMMU driver mode
Adding RTE_KDRV_VFIO_NOIOMMU mode in kernel driver. Also including rte_vfio_is_noiommu() helper function. This function will parse /sys/bus/pci/device// and make sure that - vfio noiommu mode set in kernel driver - pci device attached to vfio-noiommu driver only If both condition satisfies then set drv->kdrv = RTE_KDRV_VFIO_NOIOMMU Also did similar changes in virtio_rd/wr, Changes applicable for virtio spec 0.95 only. Signed-off-by: Santosh Shukla --- v5--> v6: - Include pci_dev == NULL check in pci_vfio_is_noiommu(), suggested by Anatoly. v4--> v5: - Removed virtio_xx_init_by_vfio and added new driver mode. - Now no need to parse vfio interface in virtio. As pci_eal module will take of vfio-noiommu driver parsing for virtio or any other future device willing to use vfio-noiommu driver. drivers/net/virtio/virtio_pci.c| 12 ++--- lib/librte_eal/common/include/rte_pci.h|1 + lib/librte_eal/linuxapp/eal/eal_pci.c | 13 +++-- lib/librte_eal/linuxapp/eal/eal_pci_init.h |1 + lib/librte_eal/linuxapp/eal/eal_pci_vfio.c | 72 5 files changed, 90 insertions(+), 9 deletions(-) diff --git a/drivers/net/virtio/virtio_pci.c b/drivers/net/virtio/virtio_pci.c index 0c29f1d..537c552 100644 --- a/drivers/net/virtio/virtio_pci.c +++ b/drivers/net/virtio/virtio_pci.c @@ -60,7 +60,7 @@ virtio_read_reg_1(struct virtio_hw *hw, uint64_t reg_offset) struct rte_pci_device *dev; dev = hw->dev; - if (dev->kdrv == RTE_KDRV_VFIO) + if (dev->kdrv == RTE_KDRV_VFIO_NOIOMMU) ioport_inb(dev, reg_offset, &ret); else ret = inb(VIRTIO_PCI_REG_ADDR(hw, reg_offset)); @@ -75,7 +75,7 @@ virtio_read_reg_2(struct virtio_hw *hw, uint64_t reg_offset) struct rte_pci_device *dev; dev = hw->dev; - if (dev->kdrv == RTE_KDRV_VFIO) + if (dev->kdrv == RTE_KDRV_VFIO_NOIOMMU) ioport_inw(dev, reg_offset, &ret); else ret = inw(VIRTIO_PCI_REG_ADDR(hw, reg_offset)); @@ -90,7 +90,7 @@ virtio_read_reg_4(struct virtio_hw *hw, uint64_t reg_offset) struct rte_pci_device *dev; dev = hw->dev; - if (dev->kdrv == RTE_KDRV_VFIO) + if 
(dev->kdrv == RTE_KDRV_VFIO_NOIOMMU) ioport_inl(dev, reg_offset, &ret); else ret = inl(VIRTIO_PCI_REG_ADDR(hw, reg_offset)); @@ -104,7 +104,7 @@ virtio_write_reg_1(struct virtio_hw *hw, uint64_t reg_offset, uint8_t value) struct rte_pci_device *dev; dev = hw->dev; - if (dev->kdrv == RTE_KDRV_VFIO) + if (dev->kdrv == RTE_KDRV_VFIO_NOIOMMU) ioport_outb_p(dev, reg_offset, value); else outb_p((unsigned char)value, @@ -117,7 +117,7 @@ virtio_write_reg_2(struct virtio_hw *hw, uint64_t reg_offset, uint16_t value) struct rte_pci_device *dev; dev = hw->dev; - if (dev->kdrv == RTE_KDRV_VFIO) + if (dev->kdrv == RTE_KDRV_VFIO_NOIOMMU) ioport_outw_p(dev, reg_offset, value); else outw_p((unsigned short)value, @@ -130,7 +130,7 @@ virtio_write_reg_4(struct virtio_hw *hw, uint64_t reg_offset, uint32_t value) struct rte_pci_device *dev; dev = hw->dev; - if (dev->kdrv == RTE_KDRV_VFIO) + if (dev->kdrv == RTE_KDRV_VFIO_NOIOMMU) ioport_outl_p(dev, reg_offset, value); else outl_p((unsigned int)value, diff --git a/lib/librte_eal/common/include/rte_pci.h b/lib/librte_eal/common/include/rte_pci.h index 0c667ff..2dbc658 100644 --- a/lib/librte_eal/common/include/rte_pci.h +++ b/lib/librte_eal/common/include/rte_pci.h @@ -149,6 +149,7 @@ enum rte_kernel_driver { RTE_KDRV_VFIO, RTE_KDRV_UIO_GENERIC, RTE_KDRV_NIC_UIO, + RTE_KDRV_VFIO_NOIOMMU, RTE_KDRV_NONE, }; diff --git a/lib/librte_eal/linuxapp/eal/eal_pci.c b/lib/librte_eal/linuxapp/eal/eal_pci.c index eb503f0..2936497 100644 --- a/lib/librte_eal/linuxapp/eal/eal_pci.c +++ b/lib/librte_eal/linuxapp/eal/eal_pci.c @@ -131,6 +131,7 @@ rte_eal_pci_map_device(struct rte_pci_device *dev) /* try mapping the NIC resources using VFIO if it exists */ switch (dev->kdrv) { case RTE_KDRV_VFIO: + case RTE_KDRV_VFIO_NOIOMMU: #ifdef VFIO_PRESENT if (pci_vfio_is_enabled()) ret = pci_vfio_map_resource(dev); @@ -158,6 +159,7 @@ rte_eal_pci_unmap_device(struct rte_pci_device *dev) /* try unmapping the NIC resources using VFIO if it exists */ switch (dev->kdrv) 
{ case RTE_KDRV_VFIO: + case RTE_KDRV_VFIO_NOIOMMU: RTE_LOG(ERR, EAL, "Hotplug doesn't support vfio yet\n"); break; case RTE_KDRV_IGB_UIO: @@ -353,9 +355,12 @@ pci_scan_one(const char *dirname, uint16_t domain, uint8_t bus, } if (!ret) { - if (!strcmp(driver, "vfio-pci")) -
[dpdk-dev] [PATCH] examples/vhost: fix out of sequence packets
Issue description: when packets go through the vhost example to a virtio
device and come back to another virtio device or physical NIC, the
sequence of packets will be changed.

Reported-by: Thomas Long
Signed-off-by: Jianfeng Tan
---
 examples/vhost/main.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/examples/vhost/main.c b/examples/vhost/main.c
index 2dcdacb..aa9aa5a 100644
--- a/examples/vhost/main.c
+++ b/examples/vhost/main.c
@@ -1336,8 +1336,8 @@ switch_worker(__attribute__((unused)) void *arg)
 					rte_pktmbuf_free(pkts_burst[--tx_count]);
 				}
 			}
-			while (tx_count)
-				virtio_tx_route(vdev, pkts_burst[--tx_count], (uint16_t)dev->device_fh);
+			for (i = 0; i < tx_count; ++i)
+				virtio_tx_route(vdev, pkts_burst[i], (uint16_t)dev->device_fh);
 		}

 		/*move to the next device in the list*/
--
2.1.4
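The reordering fixed here can be reproduced without DPDK. The sketch below is an illustration under simplifying assumptions: packets are plain integers, and `route_reversed()` / `route_in_order()` are hypothetical helpers that mimic the old `while (tx_count)` loop and the new `for` loop respectively.

```c
#include <stddef.h>

/* Old behaviour: drain the burst from the tail, which reverses the order. */
static void route_reversed(const int *burst, size_t count, int *out)
{
	size_t k = 0;

	while (count)
		out[k++] = burst[--count];
}

/* New behaviour: walk the burst from index 0, preserving arrival order. */
static void route_in_order(const int *burst, size_t count, int *out)
{
	size_t i;

	for (i = 0; i < count; ++i)
		out[i] = burst[i];
}
```

With a burst of {1, 2, 3, 4}, the first helper emits 4, 3, 2, 1 and the second emits 1, 2, 3, 4, which is the behaviour change the patch makes.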
[dpdk-dev] [PATCH] examples/vhost: fix out of sequence packets
On Wed, Jan 20, 2016 at 03:18:11AM +0800, Jianfeng Tan wrote:
> Issue description: when packets go through vhost example to virtio
> device and come back to another virtio device or physical NIC, the
> sequence of packets will be changed.
>
> Reported-by: Thomas Long
> Signed-off-by: Jianfeng Tan

Acked-by: Yuanhan Liu

	--yliu
[dpdk-dev] [PATCH] vhost: remove lockless enqueue to the virtio ring
On 1/20/2016 2:33 AM, Polehn, Mike A wrote:
> SMP operations can be very expensive, sometimes impacting operations by 100s
> to 1000s of clock cycles depending on the circumstances of the
> synchronization. It is how you arrange the SMP operations within the tasks at
> hand across the SMP cores that gives methods for top performance. Using
> traditional general purpose SMP methods will result in traditional general
> purpose performance. Migrating to general libraries (understood by most
> general purpose programmers) from expert abilities (understood by a much
> smaller group of expert programmers focused on performance) will greatly
> reduce the value of DPDK, since the end result will be lower performance
> and/or less predictable operation where rate performance,
> predictability, and low latency are the primary goals.
>
> The best method to date for having multiple outputs to a single port is to
> use a DPDK queue with multiple producers and a single consumer, doing one SMP
> operation for multiple sources to feed a single non-SMP task that outputs to
> the port (that is why the ports are not SMP protected). Also, when considerable
> contention from multiple sources occurs often (data feeding at the same time),
> having a DPDK queue with input and output variables in separate cache lines
> can give a notable throughput improvement.
>
> Mike

Mike:
Thanks for the detailed explanation. Do you have any comments on this patch?

>
> -----Original Message-----
> From: dev [mailto:dev-bounces at dpdk.org] On Behalf Of Xie, Huawei
> Sent: Tuesday, January 19, 2016 8:44 AM
> To: Tan, Jianfeng; dev at dpdk.org
> Cc: ann.zhuangyanying at huawei.com
> Subject: Re: [dpdk-dev] [PATCH] vhost: remove lockless enqueue to the virtio
> ring
>
> On 1/20/2016 12:25 AM, Tan, Jianfeng wrote:
>> Hi Huawei,
>>
>> On 1/4/2016 10:46 PM, Huawei Xie wrote:
>>> This patch removes the internal lockless enqueue implementation.
>>> DPDK doesn't support receiving/transmitting packets from/to the same
>>> queue. Vhost PMD wraps vhost device as normal DPDK port. DPDK
>>> applications normally have their own lock implementation when enqueue
>>> packets to the same queue of a port.
>>>
>>> The atomic cmpset is a costly operation. This patch should help
>>> performance a bit.
>>>
>>> Signed-off-by: Huawei Xie
>>> ---
>>>  lib/librte_vhost/vhost_rxtx.c | 86 +--
>>>  1 file changed, 25 insertions(+), 61 deletions(-)
>>>
>>> diff --git a/lib/librte_vhost/vhost_rxtx.c
>>> b/lib/librte_vhost/vhost_rxtx.c index bbf3fac..26a1b9c 100644
>>> --- a/lib/librte_vhost/vhost_rxtx.c
>>> +++ b/lib/librte_vhost/vhost_rxtx.c
>> I think vhost example will not work well with this patch when
>> vm2vm=software.
>>
>> Test case:
>> Two virtio ports handled by two pmd threads. Thread 0 polls pkts from
>> physical NIC and sends to virtio0, while thread 1 receives pkts from
>> virtio1 and routes it to virtio0.
> The vhost port will be wrapped as a port by the vhost PMD. A DPDK APP treats
> all physical and virtual ports as ports equally. When two DPDK threads try
> to enqueue to the same port, the APP needs to consider the contention.
> None of the physical PMDs support concurrent enqueuing/dequeuing.
> The vhost PMD should expose the same behavior unless it is absolutely
> necessary to expose a difference between PMDs.
>
>>> -
>>> *(volatile uint16_t *)&vq->used->idx += entry_success;
>> Another unrelated question: We ever try to move this assignment out of
>> loop to save cost as it's a data contention?
> This operation itself is not that costly, but it has a side effect on the
> cache transfer.
> It is outside of the loop for the non-mergeable case. For the mergeable case,
> it is inside the loop.
> Actually there are pros and cons to doing this in a burst or in smaller
> steps. I prefer to move it outside of the loop. Let us address this later.
>
>> Thanks,
>> Jianfeng
>>
>>
>
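For readers unfamiliar with the cost being discussed, the difference between a compare-and-set reservation and a plain single-producer update can be sketched with C11 atomics. This is an illustration only, not the vhost code: `struct ring`, `reserve_mp()` and `reserve_sp()` are hypothetical names, and the real enqueue path does considerably more work.

```c
#include <stdatomic.h>
#include <stdint.h>

/* Hypothetical ring state: one index shared by producers, one private. */
struct ring {
	_Atomic uint16_t res_idx;   /* reserved index, multi-producer path */
	uint16_t plain_idx;         /* reserved index, single-producer path */
};

/*
 * Multi-producer reservation: retry the compare-and-swap until no other
 * producer has moved res_idx in between. Returns the first reserved slot.
 */
static uint16_t reserve_mp(struct ring *r, uint16_t n)
{
	uint16_t old = atomic_load(&r->res_idx);

	/* On failure the CAS reloads 'old' with the current value. */
	while (!atomic_compare_exchange_weak(&r->res_idx, &old,
					     (uint16_t)(old + n)))
		;
	return old;
}

/*
 * Single-producer reservation: safe only when the application guarantees
 * that one thread at a time enqueues to this queue.
 */
static uint16_t reserve_sp(struct ring *r, uint16_t n)
{
	uint16_t old = r->plain_idx;

	r->plain_idx = (uint16_t)(old + n);
	return old;
}
```

The single-producer variant avoids the atomic read-modify-write entirely, which is the saving the patch is after; the price is that serialising concurrent enqueues becomes the application's job, as Huawei argues above.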
[dpdk-dev] [PATCH 0/4] virtio support for container
On 1/12/2016 1:37 PM, Tetsuya Mukawa wrote:
> Hi Jianfeng and Xie,
>
> I guess my implementation and yours have a lot of common code, so I will
> try to rebase my patch on yours.
>
> BTW, one thing I need to change in your memory allocation is that the
> mmaped address should be under 44 bits (32 + PAGE_SHIFT) to work with my
> patch. This is because the VIRTIO_PCI_QUEUE_PFN register only accepts such
> an address.
> (I may need to add one more EAL parameter like "--mmap-under ")

I believe it is OK to mmap under 44 bits, but it would be better to check
the user-space address space layout first.

>
> Thanks,
> Tetsuya
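The 44-bit limit follows from a 32-bit page frame number combined with the page shift. A minimal sketch of the check, assuming 4 KiB pages (a shift of 12) and using the hypothetical names `QUEUE_PFN_SHIFT` and `addr_fits_queue_pfn()`:

```c
#include <stdint.h>

/* Assumption: 4 KiB pages, so the register holds (address >> 12). */
#define QUEUE_PFN_SHIFT 12

/* Return 1 if the address can be expressed as a 32-bit PFN, else 0. */
static int addr_fits_queue_pfn(uint64_t addr)
{
	return (addr >> QUEUE_PFN_SHIFT) <= UINT32_MAX;
}
```

Any address below 1 << 44 passes, since 32 + 12 = 44; an address at or above that would be truncated when written to a 32-bit register.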
[dpdk-dev] [PATCH] i40e: fix vlan filtering
It works as expected, thanks. Tested-by Yulong.pei at intel.com Best Regards Yulong Pei -Original Message- From: dev [mailto:dev-boun...@dpdk.org] On Behalf Of Julien Meunier Sent: Tuesday, January 19, 2016 1:19 AM To: Zhang, Helin Cc: dev at dpdk.org Subject: [dpdk-dev] [PATCH] i40e: fix vlan filtering VLAN filtering was always performed, even if hw_vlan_filter was disabled. During device initialization, default filter RTE_MACVLAN_PERFECT_MATCH was applied. In this situation, all incoming VLAN frames were dropped by the card (increase of the register RUPP - Rx Unsupported Protocol). In order to restore default behavior, if HW VLAN filtering is activated, set a filter to match MAC and VLAN. If not, set a filter to only match MAC. Signed-off-by: Julien Meunier Signed-off-by: David Marchand --- drivers/net/i40e/i40e_ethdev.c | 39 ++- drivers/net/i40e/i40e_ethdev.h | 1 + 2 files changed, 39 insertions(+), 1 deletion(-) diff --git a/drivers/net/i40e/i40e_ethdev.c b/drivers/net/i40e/i40e_ethdev.c index bf6220d..ef9d578 100644 --- a/drivers/net/i40e/i40e_ethdev.c +++ b/drivers/net/i40e/i40e_ethdev.c @@ -2332,6 +2332,13 @@ i40e_vlan_offload_set(struct rte_eth_dev *dev, int mask) struct i40e_pf *pf = I40E_DEV_PRIVATE_TO_PF(dev->data->dev_private); struct i40e_vsi *vsi = pf->main_vsi; + if (mask & ETH_VLAN_FILTER_MASK) { + if (dev->data->dev_conf.rxmode.hw_vlan_filter) + i40e_vsi_config_vlan_filter(vsi, TRUE); + else + i40e_vsi_config_vlan_filter(vsi, FALSE); + } + if (mask & ETH_VLAN_STRIP_MASK) { /* Enable or disable VLAN stripping */ if (dev->data->dev_conf.rxmode.hw_vlan_strip) @@ -4156,6 +4163,34 @@ fail_mem: return NULL; } +/* Configure vlan filter on or off */ +int +i40e_vsi_config_vlan_filter(struct i40e_vsi *vsi, bool on) { + struct i40e_hw *hw = I40E_VSI_TO_HW(vsi); + struct i40e_mac_filter_info filter; + int ret; + + rte_memcpy(&filter.mac_addr, + (struct ether_addr *)(hw->mac.perm_addr), ETH_ADDR_LEN); + ret = i40e_vsi_delete_mac(vsi, &filter.mac_addr); + + if 
(on) { + /* Filter to match MAC and VLAN */ + filter.filter_type = RTE_MACVLAN_PERFECT_MATCH; + } else { + /* Filter to match only MAC */ + filter.filter_type = RTE_MAC_PERFECT_MATCH; + } + + ret |= i40e_vsi_add_mac(vsi, &filter); + + if (ret) + PMD_DRV_LOG(INFO, "Update VSI failed to %s vlan filter", + on ? "enable" : "disable"); + return ret; +} + /* Configure vlan stripping on or off */ int i40e_vsi_config_vlan_stripping(struct i40e_vsi *vsi, bool on) @@ -4203,9 +4238,11 @@ i40e_dev_init_vlan(struct rte_eth_dev *dev) { struct rte_eth_dev_data *data = dev->data; int ret; + int mask = 0; /* Apply vlan offload setting */ - i40e_vlan_offload_set(dev, ETH_VLAN_STRIP_MASK); + mask = ETH_VLAN_STRIP_MASK | ETH_VLAN_FILTER_MASK; + i40e_vlan_offload_set(dev, mask); /* Apply double-vlan setting, not implemented yet */ diff --git a/drivers/net/i40e/i40e_ethdev.h b/drivers/net/i40e/i40e_ethdev.h index 1f9792b..5505d72 100644 --- a/drivers/net/i40e/i40e_ethdev.h +++ b/drivers/net/i40e/i40e_ethdev.h @@ -551,6 +551,7 @@ void i40e_vsi_queues_unbind_intr(struct i40e_vsi *vsi); int i40e_vsi_vlan_pvid_set(struct i40e_vsi *vsi, struct i40e_vsi_vlan_pvid_info *info); int i40e_vsi_config_vlan_stripping(struct i40e_vsi *vsi, bool on); +int i40e_vsi_config_vlan_filter(struct i40e_vsi *vsi, bool on); uint64_t i40e_config_hena(uint64_t flags); uint64_t i40e_parse_hena(uint64_t flags); enum i40e_status_code i40e_fdir_setup_tx_resources(struct i40e_pf *pf); -- 2.1.4
[dpdk-dev] [PATCH] i40e: fix vlan filtering
> -Original Message- > From: Julien Meunier [mailto:julien.meunier at 6wind.com] > Sent: Tuesday, January 19, 2016 1:19 AM > To: Zhang, Helin > Cc: dev at dpdk.org > Subject: [PATCH] i40e: fix vlan filtering > > VLAN filtering was always performed, even if hw_vlan_filter was disabled. > During device initialization, default filter RTE_MACVLAN_PERFECT_MATCH > was applied. In this situation, all incoming VLAN frames were dropped by the > card (increase of the register RUPP - Rx Unsupported Protocol). > > In order to restore default behavior, if HW VLAN filtering is activated, set a > filter to match MAC and VLAN. If not, set a filter to only match MAC. > > Signed-off-by: Julien Meunier > Signed-off-by: David Marchand > --- > drivers/net/i40e/i40e_ethdev.c | 39 > ++- > drivers/net/i40e/i40e_ethdev.h | 1 + > 2 files changed, 39 insertions(+), 1 deletion(-) > > diff --git a/drivers/net/i40e/i40e_ethdev.c b/drivers/net/i40e/i40e_ethdev.c > index bf6220d..ef9d578 100644 > --- a/drivers/net/i40e/i40e_ethdev.c > +++ b/drivers/net/i40e/i40e_ethdev.c > @@ -2332,6 +2332,13 @@ i40e_vlan_offload_set(struct rte_eth_dev *dev, > int mask) > struct i40e_pf *pf = I40E_DEV_PRIVATE_TO_PF(dev->data- > >dev_private); > struct i40e_vsi *vsi = pf->main_vsi; > > + if (mask & ETH_VLAN_FILTER_MASK) { > + if (dev->data->dev_conf.rxmode.hw_vlan_filter) > + i40e_vsi_config_vlan_filter(vsi, TRUE); > + else > + i40e_vsi_config_vlan_filter(vsi, FALSE); > + } > + > if (mask & ETH_VLAN_STRIP_MASK) { > /* Enable or disable VLAN stripping */ > if (dev->data->dev_conf.rxmode.hw_vlan_strip) > @@ -4156,6 +4163,34 @@ fail_mem: > return NULL; > } > > +/* Configure vlan filter on or off */ > +int > +i40e_vsi_config_vlan_filter(struct i40e_vsi *vsi, bool on) { > + struct i40e_hw *hw = I40E_VSI_TO_HW(vsi); > + struct i40e_mac_filter_info filter; > + int ret; > + > + rte_memcpy(&filter.mac_addr, > +(struct ether_addr *)(hw->mac.perm_addr), > ETH_ADDR_LEN); > + ret = i40e_vsi_delete_mac(vsi, &filter.mac_addr); > 
+ > + if (on) { > + /* Filter to match MAC and VLAN */ > + filter.filter_type = RTE_MACVLAN_PERFECT_MATCH; > + } else { > + /* Filter to match only MAC */ > + filter.filter_type = RTE_MAC_PERFECT_MATCH; > + } > + > + ret |= i40e_vsi_add_mac(vsi, &filter); How would it be if multiple mac addresses has been configured? I think this might be ignored in the code changes, right? Regards, Helin > + > + if (ret) > + PMD_DRV_LOG(INFO, "Update VSI failed to %s vlan filter", > + on ? "enable" : "disable"); > + return ret; > +} > + > /* Configure vlan stripping on or off */ int > i40e_vsi_config_vlan_stripping(struct i40e_vsi *vsi, bool on) @@ -4203,9 > +4238,11 @@ i40e_dev_init_vlan(struct rte_eth_dev *dev) { > struct rte_eth_dev_data *data = dev->data; > int ret; > + int mask = 0; > > /* Apply vlan offload setting */ > - i40e_vlan_offload_set(dev, ETH_VLAN_STRIP_MASK); > + mask = ETH_VLAN_STRIP_MASK | ETH_VLAN_FILTER_MASK; > + i40e_vlan_offload_set(dev, mask); > > /* Apply double-vlan setting, not implemented yet */ > > diff --git a/drivers/net/i40e/i40e_ethdev.h > b/drivers/net/i40e/i40e_ethdev.h index 1f9792b..5505d72 100644 > --- a/drivers/net/i40e/i40e_ethdev.h > +++ b/drivers/net/i40e/i40e_ethdev.h > @@ -551,6 +551,7 @@ void i40e_vsi_queues_unbind_intr(struct i40e_vsi > *vsi); int i40e_vsi_vlan_pvid_set(struct i40e_vsi *vsi, > struct i40e_vsi_vlan_pvid_info *info); int > i40e_vsi_config_vlan_stripping(struct i40e_vsi *vsi, bool on); > +int i40e_vsi_config_vlan_filter(struct i40e_vsi *vsi, bool on); > uint64_t i40e_config_hena(uint64_t flags); uint64_t > i40e_parse_hena(uint64_t flags); enum i40e_status_code > i40e_fdir_setup_tx_resources(struct i40e_pf *pf); > -- > 2.1.4
[dpdk-dev] Missing Outstanding Patches (By Me) In Patchwork
Hi Matthew,
I hope that is what you are looking for:

http://dpdk.org/dev/patchwork/project/dpdk/list/?submitter=37&state=*&archive=both

You just click on Filters and there are a few options...

Andriy

On Wed, Jan 20, 2016 at 6:20 AM, Matthew Hall wrote:
> I have some outstanding minor patches which do not appear in Patchwork
> anywhere I can see but the interface is also pretty confusing.
>
> Is there a way to find all patches by a person throughout time so I can see
> what happened to them and check why they are not listed and also not merged
> (that I am aware of anyway)?
>
> Sincerely,
> Matthew.

--
Andriy Berestovskyy
[dpdk-dev] [PKTGEN] fixing weird termio issues that complicate debugging
On 01/20/2016 08:32 AM, Matthew Hall wrote:
> Hello,
>
> Since the pktgen code is reindented I am finding time to read through it
> and experiment and see if I can get it working.
>
> I have issues with the init process of pktgen. It is difficult to debug
> it because the init code does a lot of very scary stuff to the terminal
> control / TTY device at inconvenient times in an inconvenient order, and
> in the process damages the debug output and damages the screen of your
> GDB unless you do weird things to run GDB on a different TTY.
>
> Of course I am willing to contribute patches and not just complain, but
> first I need some help to follow what is going on.
>
> Here is the problematic call-flow with some explanation of what went wrong
> trying it on some community machines outside of its original environment:
>
> 1) it calls printf("\n%s %s\n", wr_copyright_msg(), wr_powered_by());
> which dumps tons of weird boilerplate of licenses, copyrights, code
> creator, etc.
>
> It is open source and everybody that matters already knows who coded it,
> so is this stuff really that important? This gets in the way when you
> are trying to work on it and I just have to comment it out.
>
> 2) it calls wr_scrn_setw and tinkers with the window size very early in
> the init which can make your terminal weird
>
> 3) it calls rte_eal_init which produces a lot of nice debug output,
> which is fine
>
> 4) it calls pktgen_init_screen, which calls wr_scrn_init, which calls
> wr_scrn_erase which destroys the valuable debug output just created in
> (3) which is a bad thing
>
> 5) it calls wr_print_copyright and dumps more boilerplate I am not sure
> is needed
>
> 6) it logs some helpful messages about the port / descriptor settings
> which is fine
>
> 7) it calls the pktgen_config_ports function which can crash in ways you
> need the destroyed debug output to fix.
>
> For example in my case that function crashes here:
>
> if (pktgen.nb_ports == 0)
>     pktgen_log_panic("*** Did not find any ports to use ***");
>
> 8) Later it makes a logo and a splash screen (wr_log, wr_splash_screen).
> Is this stuff really needed? This is a ton of output for just starting
> up some test program.
>
> To fix this debug problem I propose some changes which I am happy to
> help develop:
>
> 1) decide what of this output we really need here and greatly simplify
> how much gets printed out
>
> 2) move wr_scrn_setw right before pktgen_init_screen and after
> rte_eal_init to prevent damaging that output
>
> 3) consider how wr_scrn_init is called in pktgen_init_screen, because it
> calls wr_scrn_erase which damages output
>
> 4) I think that pktgen_config_ports should be called before all this
> weird screen init stuff, so that if it fails you can actually see what
> happened there.
>
> One other random topic... on the long lines of code it looks like there
> are some gigantic tab-indents pushing things off to the right still. One
> example, maybe there are others or another setting which is needed to
> fix all of these:
>
> info->seq_pkt = (pkt_seq_t *)rte_zmalloc_socket(buff,
> (sizeof(pkt_seq_t) * NUM_TOTAL_PKTS),
>
> RTE_CACHE_LINE_SIZE,
> rte_socket_id());
>
> Thoughts?

Just that I'm in violent agreement about the splash screens and all.

Unfortunately the license explicitly forbids removal of the copyright
messages (http://dpdk.org/browse/apps/pktgen-dpdk/tree/LICENSE#n18):

--
# 4) The screens displayed by the application must contain the copyright notice as defined
# above and can not be removed without specific prior written permission.
--

Keith, any chance you could work out the details with Wind River to get
the ridiculous startup messages straightened out? I don't think anybody
would mind a line or two of "copyright by..." kind of printf() in there if
that's what it takes, but the current screen after screen after screen of
copyrights and advertisements is obnoxious to the point of driving
potential users away.

	- Panu -

> Matthew Hall
[dpdk-dev] [PATCH 1/3] i40e: enable DCB in VMDQ vsis
Previously, DCB is only enabled on PF, queue mapping and BW configuration is only done on PF vsi. This patch enabled DCB for VMDQ vsis by following steps: 1. Take BW and ETS configuration on VEB. 2. Take BW and ETS configuration on VMDQ vsis. 3. Update TC and queues mapping on VMDQ vsis. To enable DCB on VMDQ, the number of TCs should not be lager than the number of queues in VMDQ pools, and the number of queues per VMDQ pool is specified by CONFIG_RTE_LIBRTE_I40E_QUEUE_NUM_PER_VM in config/common_* file. Signed-off-by: Jingjing Wu --- doc/guides/rel_notes/release_2_3.rst | 2 + drivers/net/i40e/i40e_ethdev.c | 153 ++- drivers/net/i40e/i40e_ethdev.h | 28 --- 3 files changed, 151 insertions(+), 32 deletions(-) diff --git a/doc/guides/rel_notes/release_2_3.rst b/doc/guides/rel_notes/release_2_3.rst index 99de186..cd3d391 100644 --- a/doc/guides/rel_notes/release_2_3.rst +++ b/doc/guides/rel_notes/release_2_3.rst @@ -4,6 +4,8 @@ DPDK Release 2.3 New Features +* **Added i40e DCB support in VMDQ mode.** + Resolved Issues --- diff --git a/drivers/net/i40e/i40e_ethdev.c b/drivers/net/i40e/i40e_ethdev.c index bf6220d..fbafcc6 100644 --- a/drivers/net/i40e/i40e_ethdev.c +++ b/drivers/net/i40e/i40e_ethdev.c @@ -8087,6 +8087,8 @@ i40e_vsi_update_queue_mapping(struct i40e_vsi *vsi, int i, total_tc = 0; uint16_t qpnum_per_tc, bsf, qp_idx; struct rte_eth_dev_data *dev_data = I40E_VSI_TO_DEV_DATA(vsi); + struct i40e_pf *pf = I40E_VSI_TO_PF(vsi); + uint16_t used_queues; ret = validate_tcmap_parameter(vsi, enabled_tcmap); if (ret != I40E_SUCCESS) @@ -8100,7 +8102,18 @@ i40e_vsi_update_queue_mapping(struct i40e_vsi *vsi, total_tc = 1; vsi->enabled_tc = enabled_tcmap; - qpnum_per_tc = dev_data->nb_rx_queues / total_tc; + /* different VSI has different queues assigned */ + if (vsi->type == I40E_VSI_MAIN) + used_queues = dev_data->nb_rx_queues - + pf->nb_cfg_vmdq_vsi * RTE_LIBRTE_I40E_QUEUE_NUM_PER_VM; + else if (vsi->type == I40E_VSI_VMDQ2) + used_queues = 
RTE_LIBRTE_I40E_QUEUE_NUM_PER_VM; + else { + PMD_INIT_LOG(ERR, "unsupported VSI type."); + return I40E_ERR_NO_AVAILABLE_VSI; + } + + qpnum_per_tc = used_queues / total_tc; /* Number of queues per enabled TC */ if (qpnum_per_tc == 0) { PMD_INIT_LOG(ERR, " number of queues is less that tcs."); @@ -8145,6 +8158,93 @@ i40e_vsi_update_queue_mapping(struct i40e_vsi *vsi, } /* + * i40e_config_switch_comp_tc - Configure VEB tc setting for given TC map + * @veb: VEB to be configured + * @tc_map: enabled TC bitmap + * + * Returns 0 on success, negative value on failure + */ +static enum i40e_status_code +i40e_config_switch_comp_tc(struct i40e_veb *veb, uint8_t tc_map) +{ + struct i40e_aqc_configure_switching_comp_bw_config_data veb_bw; + struct i40e_aqc_query_switching_comp_bw_config_resp bw_query; + struct i40e_aqc_query_switching_comp_ets_config_resp ets_query; + struct i40e_hw *hw = I40E_VSI_TO_HW(veb->associate_vsi); + enum i40e_status_code ret = I40E_SUCCESS; + int i; + uint32_t bw_max; + + /* Check if enabled_tc is same as existing or new TCs */ + if (veb->enabled_tc == tc_map) + return ret; + + /* configure tc bandwidth */ + memset(&veb_bw, 0, sizeof(veb_bw)); + veb_bw.tc_valid_bits = tc_map; + /* Enable ETS TCs with equal BW Share for now across all VSIs */ + for (i = 0; i < I40E_MAX_TRAFFIC_CLASS; i++) { + if (tc_map & BIT_ULL(i)) + veb_bw.tc_bw_share_credits[i] = 1; + } + ret = i40e_aq_config_switch_comp_bw_config(hw, veb->seid, + &veb_bw, NULL); + if (ret) { + PMD_INIT_LOG(ERR, "AQ command Config switch_comp BW allocation" + " per TC failed = %d", + hw->aq.asq_last_status); + return ret; + } + + memset(&ets_query, 0, sizeof(ets_query)); + ret = i40e_aq_query_switch_comp_ets_config(hw, veb->seid, + &ets_query, NULL); + if (ret != I40E_SUCCESS) { + PMD_DRV_LOG(ERR, "Failed to get switch_comp ETS" +" configuration %u", hw->aq.asq_last_status); + return ret; + } + memset(&bw_query, 0, sizeof(bw_query)); + ret = i40e_aq_query_switch_comp_bw_config(hw, veb->seid, + 
&bw_query, NULL); + if (ret != I40E_SUCCESS) { + PMD_DRV_LOG(ERR, "Failed to get switch_comp bandwidth" +" configuration %u", hw->aq.asq_last_status); +
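The queue accounting introduced in i40e_vsi_update_queue_mapping() above can be summarised in a small standalone sketch. It is an illustration under assumptions, not the driver code: `enum vsi_type` and `queues_per_tc()` are hypothetical, the per-pool queue count is passed as a parameter instead of coming from the build configuration, and the underflow guard on the main VSI is an addition of this sketch.

```c
/* Hypothetical VSI kinds, mirroring the two cases handled by the patch. */
enum vsi_type { VSI_MAIN, VSI_VMDQ2, VSI_OTHER };

/*
 * Return the number of queue pairs per enabled TC, or -1 when the VSI type
 * is unsupported or there are fewer queues than TCs.
 */
static int queues_per_tc(enum vsi_type type, unsigned nb_rx_queues,
			 unsigned nb_vmdq_vsi, unsigned queues_per_vm,
			 unsigned total_tc)
{
	unsigned used_queues;

	if (total_tc == 0)
		return -1;

	if (type == VSI_MAIN) {
		/* The main VSI keeps whatever the VMDQ pools do not take. */
		if (nb_vmdq_vsi * queues_per_vm > nb_rx_queues)
			return -1; /* guard added in this sketch only */
		used_queues = nb_rx_queues - nb_vmdq_vsi * queues_per_vm;
	} else if (type == VSI_VMDQ2) {
		/* Each VMDQ pool owns a fixed number of queues. */
		used_queues = queues_per_vm;
	} else {
		return -1;
	}

	if (used_queues / total_tc == 0)
		return -1; /* fewer queues than traffic classes */

	return (int)(used_queues / total_tc);
}
```

For example, with 64 Rx queues, 8 VMDQ pools of 4 queues and 4 TCs, the main VSI gets (64 - 32) / 4 = 8 queues per TC and each VMDQ VSI gets 4 / 4 = 1; asking for 8 TCs on a 4-queue pool is rejected, which matches the constraint stated in the commit message.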
[dpdk-dev] [PATCH 2/3] ixgbe: add more multi queue mode checking
The multi queue mode ETH_MQ_RX_VMDQ_DCB_RSS is not supported in the ixgbe
driver. This patch adds a check for it.

Signed-off-by: Jingjing Wu
---
 drivers/net/ixgbe/ixgbe_ethdev.c | 5 +
 1 file changed, 5 insertions(+)

diff --git a/drivers/net/ixgbe/ixgbe_ethdev.c b/drivers/net/ixgbe/ixgbe_ethdev.c
index 4c4c6df..24cd30b 100644
--- a/drivers/net/ixgbe/ixgbe_ethdev.c
+++ b/drivers/net/ixgbe/ixgbe_ethdev.c
@@ -1853,6 +1853,11 @@ ixgbe_check_mq_mode(struct rte_eth_dev *dev)
 			return -EINVAL;
 		}
 	} else {
+		if (dev_conf->rxmode.mq_mode == ETH_MQ_RX_VMDQ_DCB_RSS) {
+			PMD_INIT_LOG(ERR, "VMDQ+DCB+RSS mq_mode is"
+					  " not supported.");
+			return -EINVAL;
+		}
 		/* check configuration for vmdb+dcb mode */
 		if (dev_conf->rxmode.mq_mode == ETH_MQ_RX_VMDQ_DCB) {
 			const struct rte_eth_vmdq_dcb_conf *conf;
--
2.4.0
[dpdk-dev] [PATCH 3/3] examples/vmdq_dcb: extend sample for X710 supporting
Currently, the example vmdq_dcb only works on Intel? 82599 NICs. This patch extended this sample to make it work both on Intel? 82599 and X710/XL710 NICs by following changes: 1. add VMDQ base queue checking to avoid forwarding on PF queues. 2. assign each VMDQ pools with MAC address. 3. add more arguments (nb-tcs, enable-rss) to change the default setting 4. extend the max number of queues from 128 to 1024. This patch also reworked the user guide for the vmdq_dcb sample. Signed-off-by: Jingjing Wu --- doc/guides/rel_notes/release_2_3.rst | 2 + doc/guides/sample_app_ug/vmdq_dcb_forwarding.rst | 169 ++ examples/vmdq_dcb/main.c | 388 ++- 3 files changed, 430 insertions(+), 129 deletions(-) diff --git a/doc/guides/rel_notes/release_2_3.rst b/doc/guides/rel_notes/release_2_3.rst index cd3d391..9637bf1 100644 --- a/doc/guides/rel_notes/release_2_3.rst +++ b/doc/guides/rel_notes/release_2_3.rst @@ -25,6 +25,8 @@ Libraries Examples +* **vmdq_dcb: extended to support Intel XL710 series NICs.** + Other ~ diff --git a/doc/guides/sample_app_ug/vmdq_dcb_forwarding.rst b/doc/guides/sample_app_ug/vmdq_dcb_forwarding.rst index 9140a22..fe717fa 100644 --- a/doc/guides/sample_app_ug/vmdq_dcb_forwarding.rst +++ b/doc/guides/sample_app_ug/vmdq_dcb_forwarding.rst @@ -32,8 +32,8 @@ VMDQ and DCB Forwarding Sample Application == The VMDQ and DCB Forwarding sample application is a simple example of packet processing using the DPDK. -The application performs L2 forwarding using VMDQ and DCB to divide the incoming traffic into 128 queues. -The traffic splitting is performed in hardware by the VMDQ and DCB features of the Intel? 82599 10 Gigabit Ethernet Controller. +The application performs L2 forwarding using VMDQ and DCB to divide the incoming traffic into queues. +The traffic splitting is performed in hardware by the VMDQ and DCB features of the Intel? 82599 and X710/XL710 Ethernet Controller. 
Overview @@ -41,28 +41,27 @@ Overview This sample application can be used as a starting point for developing a new application that is based on the DPDK and uses VMDQ and DCB for traffic partitioning. -The VMDQ and DCB filters work on VLAN traffic to divide the traffic into 128 input queues on the basis of the VLAN ID field and -VLAN user priority field. -VMDQ filters split the traffic into 16 or 32 groups based on the VLAN ID. -Then, DCB places each packet into one of either 4 or 8 queues within that group, based upon the VLAN user priority field. - -In either case, 16 groups of 8 queues, or 32 groups of 4 queues, the traffic can be split into 128 hardware queues on the NIC, -each of which can be polled individually by a DPDK application. +The VMDQ and DCB filters work on MAC and VLAN traffic to divide the traffic into input queues on the basis of the Destination MAC +address, VLAN ID and VLAN user priority fields. +VMDQ filters split the traffic into 16 or 32 groups based on the Destination MAC and VLAN ID. +Then, DCB places each packet into one of queues within that group, based upon the VLAN user priority field. All traffic is read from a single incoming port (port 0) and output on port 1, without any processing being performed. -The traffic is split into 128 queues on input, where each thread of the application reads from multiple queues. -For example, when run with 8 threads, that is, with the -c FF option, each thread receives and forwards packets from 16 queues. +Take Intel? 82599 NIC for example, the traffic is split into 128 queues on input, where each thread of the application reads from +multiple queues. When run with 8 threads, that is, with the -c FF option, each thread receives and forwards packets from 16 queues. -As supplied, the sample application configures the VMDQ feature to have 16 pools with 8 queues each as indicated in :numref:`figure_vmdq_dcb_example`. -The Intel? 
82599 10 Gigabit Ethernet Controller NIC also supports the splitting of traffic into 32 pools of 4 queues each and -this can be used by changing the NUM_POOLS parameter in the supplied code. -The NUM_POOLS parameter can be passed on the command line, after the EAL parameters: +As supplied, the sample application configures the VMDQ feature to have 32 pools with 4 queues each as indicated in :numref:`figure_vmdq_dcb_example`. +The Intel? 82599 10 Gigabit Ethernet Controller NIC also supports the splitting of traffic into 16 pools of 8 queues. While the +Intel? X710 or XL710 Ethernet Controller NICs support any specified VMDQ pools of 4 or 8 queues each. For simplicity, only 16 +or 32 pools is supported in this sample. And queues numbers for each VMDQ pool can be changed by setting CONFIG_RTE_LIBRTE_I40E_QUEUE_NUM_PER_VM +in config/common_* file. +The nb-pools, nb-tcs and enable-rss parameters can be passed on the command line, after the EAL parameters: .. code-bl
[dpdk-dev] [PATCH 0/3] extend vmdq_dcb sample for X710 supporting
Currently, the vmdq_dcb example only works on Intel® 82599 NICs. This patch set extends the sample so that it works on both Intel® 82599 and X710/XL710 NICs. It also enables DCB VMDQ mode in the i40e driver and adds a check for the unsupported mode in the ixgbe driver.

Jingjing Wu (3):
  i40e: enable DCB in VMDQ vsis
  ixgbe: add more multi queue mode checking
  examples/vmdq_dcb: extend sample for X710 supporting

 doc/guides/rel_notes/release_2_3.rst | 4 +
 doc/guides/sample_app_ug/vmdq_dcb_forwarding.rst | 169 ++
 drivers/net/i40e/i40e_ethdev.c | 153 +++--
 drivers/net/i40e/i40e_ethdev.h | 28 +-
 drivers/net/ixgbe/ixgbe_ethdev.c | 5 +
 examples/vmdq_dcb/main.c | 388 ++-
 6 files changed, 586 insertions(+), 161 deletions(-)

-- 2.4.0
[dpdk-dev] [PATCH v6 1/2] tools: Add support for handling built-in kernel modules
From: Kamil Rytarowski Currently dpdk_nic_bind.py detects Linux kernel modules via reading /proc/modules. Built-in ones aren't listed there and therefore they are not being found by the script. Add support for checking built-in modules with parsing the sysfs files. This commit obsoletes the /proc/modules parsing approach. Signed-off-by: Kamil Rytarowski Acked-by: David Marchand Acked-by: Yuanhan Liu --- tools/dpdk_nic_bind.py | 30 -- 1 file changed, 20 insertions(+), 10 deletions(-) diff --git a/tools/dpdk_nic_bind.py b/tools/dpdk_nic_bind.py index f02454e..1d16d9f 100755 --- a/tools/dpdk_nic_bind.py +++ b/tools/dpdk_nic_bind.py @@ -156,22 +156,32 @@ def check_modules(): '''Checks that igb_uio is loaded''' global dpdk_drivers -fd = file("/proc/modules") -loaded_mods = fd.readlines() -fd.close() - # list of supported modules mods = [{"Name" : driver, "Found" : False} for driver in dpdk_drivers] # first check if module is loaded -for line in loaded_mods: +try: +# Get list of syfs modules, some of them might be builtin and merge with mods +sysfs_path = '/sys/module/' + +# Get the list of directories in sysfs_path +sysfs_mods = [os.path.join(sysfs_path, o) for o + in os.listdir(sysfs_path) + if os.path.isdir(os.path.join(sysfs_path, o))] + +# Extract the last element of '/sys/module/abc' in the array +sysfs_mods = [a.split('/')[-1] for a in sysfs_mods] + +# special case for vfio_pci (module is named vfio-pci, +# but its .ko is named vfio_pci) +sysfs_mods = map(lambda a: + a if a != 'vfio_pci' else 'vfio-pci', sysfs_mods) + for mod in mods: -if line.startswith(mod["Name"]): -mod["Found"] = True -# special case for vfio_pci (module is named vfio-pci, -# but its .ko is named vfio_pci) -elif line.replace("_", "-").startswith(mod["Name"]): +if mod["Found"] == False and (mod["Name"] in sysfs_mods): mod["Found"] = True +except: +pass # check if we have at least one loaded module if True not in [mod["Found"] for mod in mods] and b_flag is not None: -- 1.9.1
[dpdk-dev] [PATCH v6 2/2] eal/linux: Add support for handling built-in kernel modules
From: Kamil Rytarowski Currently rte_eal_check_module() detects Linux kernel modules via reading /proc/modules. Built-in ones aren't listed there and therefore they are not being found by the script. Add support for checking built-in modules with parsing the sysfs files This commit obsoletes the /proc/modules parsing approach. Signed-off-by: Kamil Rytarowski Acked-by: David Marchand Acked-by: Yuanhan Liu --- lib/librte_eal/linuxapp/eal/eal.c | 34 -- 1 file changed, 20 insertions(+), 14 deletions(-) diff --git a/lib/librte_eal/linuxapp/eal/eal.c b/lib/librte_eal/linuxapp/eal/eal.c index 635ec36..21a4a32 100644 --- a/lib/librte_eal/linuxapp/eal/eal.c +++ b/lib/librte_eal/linuxapp/eal/eal.c @@ -901,27 +901,33 @@ int rte_eal_has_hugepages(void) int rte_eal_check_module(const char *module_name) { - char mod_name[30]; /* Any module names can be longer than 30 bytes? */ - int ret = 0; + char sysfs_mod_name[PATH_MAX]; + struct stat st; int n; if (NULL == module_name) return -1; - FILE *fd = fopen("/proc/modules", "r"); - if (NULL == fd) { - RTE_LOG(ERR, EAL, "Open /proc/modules failed!" - " error %i (%s)\n", errno, strerror(errno)); + /* Check if there is sysfs mounted */ + if (stat("/sys/module", &st) != 0) { + RTE_LOG(DEBUG, EAL, "sysfs is not mounted! error %i (%s)\n", + errno, strerror(errno)); return -1; } - while (!feof(fd)) { - n = fscanf(fd, "%29s %*[^\n]", mod_name); - if ((n == 1) && !strcmp(mod_name, module_name)) { - ret = 1; - break; - } + + /* A module might be built-in, therefore try sysfs */ + n = snprintf(sysfs_mod_name, PATH_MAX, "/sys/module/%s", module_name); + if (n < 0 || n > PATH_MAX) { + RTE_LOG(DEBUG, EAL, "Could not format module path\n"); + return -1; } - fclose(fd); - return ret; + if (stat(sysfs_mod_name, &st) != 0) { + RTE_LOG(DEBUG, EAL, "Module %s not found! error %i (%s)\n", + sysfs_mod_name, errno, strerror(errno)); + return 0; + } + + /* Module has been found */ + return 1; } -- 1.9.1
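The sysfs lookup described in this patch can be sketched as a standalone function. This is only an illustration under stated assumptions: the name module_present() and the sysfs_root parameter are hypothetical (the patch hard-codes /sys/module), and the truncation check here uses >= because snprintf() returns the length it wanted to write, so a return value equal to the buffer size already means the path was cut short.

```c
#define _XOPEN_SOURCE 700	/* make PATH_MAX and stat() visible under strict -std modes */
#include <limits.h>
#include <stdio.h>
#include <sys/stat.h>

/*
 * Hypothetical helper, not the DPDK function.
 * Returns 1 if <sysfs_root>/<module_name> exists, 0 if it does not,
 * and -1 on bad input, a missing root, or a path that does not fit.
 */
static int
module_present(const char *sysfs_root, const char *module_name)
{
	char path[PATH_MAX];
	struct stat st;
	int n;

	if (sysfs_root == NULL || module_name == NULL)
		return -1;

	/* The root itself must exist (sysfs mounted). */
	if (stat(sysfs_root, &st) != 0)
		return -1;

	n = snprintf(path, sizeof(path), "%s/%s", sysfs_root, module_name);
	/* n == sizeof(path) would already be a truncated path. */
	if (n < 0 || (size_t)n >= sizeof(path))
		return -1;

	/* Both loadable and built-in modules get a directory here. */
	return stat(path, &st) == 0 ? 1 : 0;
}
```

A caller would typically use module_present("/sys/module", "igb_uio") and treat -1 as "cannot tell" rather than "not loaded".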
[dpdk-dev] [PATCH] ip_pipeline: fix cpu socket-id error
This patch fixes the socket-id error in ip_pipeline sample application running over uni-processor systems. Signed-off-by: Jasvinder Singh Acked-by: Cristian Dumitrescu --- examples/ip_pipeline/init.c | 17 ++--- 1 file changed, 14 insertions(+), 3 deletions(-) diff --git a/examples/ip_pipeline/init.c b/examples/ip_pipeline/init.c index 186ca03..86aa378 100644 --- a/examples/ip_pipeline/init.c +++ b/examples/ip_pipeline/init.c @@ -835,6 +835,17 @@ app_init_link_frag_ras(struct app_params *app) } } +static inline int +app_get_cpu_socket_id(uint32_t pmd_id) +{ + int status = rte_eth_dev_socket_id(pmd_id); + + if (status == -1) + return 0; + + return status; +} + static void app_init_link(struct app_params *app) { @@ -890,7 +901,7 @@ app_init_link(struct app_params *app) p_link->pmd_id, rxq_queue_id, p_rxq->size, - rte_eth_dev_socket_id(p_link->pmd_id), + app_get_cpu_socket_id(p_link->pmd_id), &p_rxq->conf, app->mempool[p_rxq->mempool_id]); if (status < 0) @@ -917,7 +928,7 @@ app_init_link(struct app_params *app) p_link->pmd_id, txq_queue_id, p_txq->size, - rte_eth_dev_socket_id(p_link->pmd_id), + app_get_cpu_socket_id(p_link->pmd_id), &p_txq->conf); if (status < 0) rte_panic("%s (%" PRIu32 "): " @@ -989,7 +1000,7 @@ app_init_tm(struct app_params *app) /* TM */ p_tm->sched_port_params.name = p_tm->name; p_tm->sched_port_params.socket = - rte_eth_dev_socket_id(p_link->pmd_id); + app_get_cpu_socket_id(p_link->pmd_id); p_tm->sched_port_params.rate = (uint64_t) link_eth_params.link_speed * 1000 * 1000 / 8; -- 2.5.0
[dpdk-dev] [PATCH v2] Patch introducing API to read/write Intel Architecture Model Specific Registers (MSR)...
Patch rework based on feedback, only x86 specific functions left under lib/librte_eal/common/include/arch/x86/. Signed-off-by: Wojciech Andralojc --- lib/librte_eal/common/include/arch/x86/rte_msr.h | 158 +++ 1 file changed, 158 insertions(+) create mode 100644 lib/librte_eal/common/include/arch/x86/rte_msr.h diff --git a/lib/librte_eal/common/include/arch/x86/rte_msr.h b/lib/librte_eal/common/include/arch/x86/rte_msr.h new file mode 100644 index 000..9d16633 --- /dev/null +++ b/lib/librte_eal/common/include/arch/x86/rte_msr.h @@ -0,0 +1,158 @@ +/*- + * BSD LICENSE + * + * Copyright(c) 2016 Intel Corporation. All rights reserved. + * All rights reserved. + * + * Redistribution and use in source and binary forms, with or without + * modification, are permitted provided that the following conditions + * are met: + * + * * Redistributions of source code must retain the above copyright + * notice, this list of conditions and the following disclaimer. + * * Redistributions in binary form must reproduce the above copyright + * notice, this list of conditions and the following disclaimer in + * the documentation and/or other materials provided with the + * distribution. + * * Neither the name of Intel Corporation nor the names of its + * contributors may be used to endorse or promote products derived + * from this software without specific prior written permission. + * + * THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS + * "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT + * LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR + * A PARTICULAR PURPOSE ARE DISCLAIMED. 
IN NO EVENT SHALL THE COPYRIGHT + * OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, + * SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT + * LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, + * DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY + * THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT + * (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE + * OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. + */ + +#ifndef _RTE_MSR_X86_64_H_ +#define _RTE_MSR_X86_64_H_ + +#ifdef __cplusplus +extern "C" { +#endif + +#include //O_RDONLY +#include //pread + +#include +#include + +#define CPU_MSR_PATH "/dev/cpu/%u/msr" +#define CPU_MSR_PATH_MAX_LEN 32 + +/** + * This function should not be called directly. + * Function to open CPU's MSR file + */ +static int +__msr_open_file(const unsigned lcore, int flags) +{ + char fname[CPU_MSR_PATH_MAX_LEN] = {0}; + int fd = -1; + + snprintf(fname, sizeof(fname) - 1, CPU_MSR_PATH, lcore); + + fd = open(fname, flags); + + if (fd < 0) + RTE_LOG(ERR, PQOS, "Error opening file '%s'!\n", fname); + + return fd; +} + +/** + * Function to read CPU's MSR + * + * @param [in] lcore + * CPU logical core id + * + * @param [in] reg + * MSR reg to read + * + * @param [out] value + * Read value of MSR reg + * + * @return + * Operations status +*/ + +static inline int +rte_msr_read(const unsigned lcore, const uint32_t reg, uint64_t *value) +{ + int fd = -1; + int ret = -1; + + RTE_VERIFY(value != NULL); + if (value == NULL) + return -1; + + fd = __msr_open_file(lcore, O_RDONLY); + + if (fd >= 0) { + ssize_t read_ret = 0; + + read_ret = pread(fd, value, sizeof(value[0]), (off_t)reg); + + if (read_ret != sizeof(value[0])) { + RTE_LOG(ERR, PQOS, "RDMSR failed for reg[0x%x] on lcore %u\n", + (unsigned)reg, lcore); + } else + ret = 0; + + close(fd); + } + + return ret; +} + +/** + * Function to write CPU's MSR + * + * @param 
[in] lcore + * CPU logical core id + * + * @param [in] reg + * MSR reg to write + * + * @param [in] value + * Value to be written to MSR reg + * + * @return + * Operations status +*/ +static inline int +rte_msr_write(const unsigned lcore, const uint32_t reg, const uint64_t value) +{ + int fd = -1; + int ret = -1; + + fd = __msr_open_file(lcore, O_WRONLY); + + if (fd >= 0) { + ssize_t write_ret = 0; + + write_ret = pwrite(fd, &value, sizeof(value), (off_t)reg); + if (write_ret != sizeof(value)) { + RTE_LOG(ERR, PQOS, "WRMSR failed for reg[0x%x] <- value[0x%llx] on " + "lcore %u\n", (unsigned)reg, (unsigned long long)value, lcore); + } else + ret = 0; + + close(fd); + } + + return ret; +} + +#ifdef __cplusplus +} +#endif + +#endif /* _RTE_MSR_X86_64_H_ */ -- 1.9.3
[dpdk-dev] [PATCH v2] Patch introducing API to read/write Intel Architecture Model Specific Registers (MSR)...
Hi Wojciech, Couple of nits, see below. Konstantin > -Original Message- > From: dev [mailto:dev-bounces at dpdk.org] On Behalf Of Wojciech Andralojc > Sent: Wednesday, January 20, 2016 10:57 AM > To: dev at dpdk.org > Subject: [dpdk-dev] [PATCH v2] Patch introducing API to read/write Intel > Architecture Model Specific Registers (MSR)... > > Patch rework based on feedback, only x86 specific functions left under > lib/librte_eal/common/include/arch/x86/. > > Signed-off-by: Wojciech Andralojc > --- > lib/librte_eal/common/include/arch/x86/rte_msr.h | 158 > +++ > 1 file changed, 158 insertions(+) > create mode 100644 lib/librte_eal/common/include/arch/x86/rte_msr.h > > diff --git a/lib/librte_eal/common/include/arch/x86/rte_msr.h > b/lib/librte_eal/common/include/arch/x86/rte_msr.h > new file mode 100644 > index 000..9d16633 > --- /dev/null > +++ b/lib/librte_eal/common/include/arch/x86/rte_msr.h > + > +#ifndef _RTE_MSR_X86_64_H_ > +#define _RTE_MSR_X86_64_H_ > + > +#ifdef __cplusplus > +extern "C" { > +#endif > + > +#include //O_RDONLY > +#include //pread Pls remove '//' comments here. > + > +#include > +#include > + > +#define CPU_MSR_PATH "/dev/cpu/%u/msr" > +#define CPU_MSR_PATH_MAX_LEN 32 > + > +/** > + * This function should not be called directly. > + * Function to open CPU's MSR file > + */ > +static int > +__msr_open_file(const unsigned lcore, int flags) > +{ > + char fname[CPU_MSR_PATH_MAX_LEN] = {0}; Why not just use PATH_MAX here? > + int fd = -1; > + > + snprintf(fname, sizeof(fname) - 1, CPU_MSR_PATH, lcore); > + > + fd = open(fname, flags); > + > + if (fd < 0) > + RTE_LOG(ERR, PQOS, "Error opening file '%s'!\n", fname); > + > + return fd; > +} > + > +/** > + * Function to read CPU's MSR > + * > + * @param [in] lcore > + * CPU logical core id Hmm, are you aware that DPDK lcore id != CPU lcore id? Might be better to use 'cpuid' name here? Just to avoid confusion. 
> + *
> + * @param [in] reg
> + *	MSR reg to read
> + *
> + * @param [out] value
> + *	Read value of MSR reg
> + *
> + * @return
> + *	Operations status
> +*/
> +
> +static inline int
> +rte_msr_read(const unsigned lcore, const uint32_t reg, uint64_t *value)

I don't think there is a need to put the rte_msr_read()/rte_msr_write() definitions into a header file and make them static inline. A normal external function definition seems sufficient here.

> +{
> +	int fd = -1;
> +	int ret = -1;
> +
> +	RTE_VERIFY(value != NULL);

That's a public API. No need to coredump if one of the input parameters is invalid.

> +	if (value == NULL)
> +		return -1;

Might be better to return -EINVAL.

> +
> +	fd = __msr_open_file(lcore, O_RDONLY);
> +
> +	if (fd >= 0) {
> +		ssize_t read_ret = 0;
> +
> +		read_ret = pread(fd, value, sizeof(value[0]), (off_t)reg);
> +
> +		if (read_ret != sizeof(value[0])) {
> +			RTE_LOG(ERR, PQOS, "RDMSR failed for reg[0x%x] on lcore %u\n",
> +				(unsigned)reg, lcore);
> +		} else
> +			ret = 0;
> +
> +		close(fd);
> +	}
> +
> +	return ret;
> +}
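To make the review comments above concrete, here is a minimal sketch of what the read path could look like with them applied: a plain (non-inline) function, a PATH_MAX buffer, a cpu_id parameter name, and -EINVAL instead of an abort on a NULL pointer. The name msr_read() and the "return -errno on failure" convention are assumptions made for illustration; they are not taken from any later version of the patch.

```c
#define _XOPEN_SOURCE 700	/* for pread() under strict -std modes */
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <limits.h>
#include <stdint.h>
#include <stdio.h>
#include <unistd.h>

#define CPU_MSR_PATH "/dev/cpu/%u/msr"

/*
 * Hypothetical variant of the reviewed function.
 * Returns 0 on success or a negative errno value on failure.
 */
int
msr_read(unsigned int cpu_id, uint32_t reg, uint64_t *value)
{
	char fname[PATH_MAX];
	ssize_t nread;
	int fd, n, err;

	/* Invalid input from the caller is reported, not fatal. */
	if (value == NULL)
		return -EINVAL;

	n = snprintf(fname, sizeof(fname), CPU_MSR_PATH, cpu_id);
	/* Internal invariant: the formatted path is far below PATH_MAX. */
	assert(n > 0 && (size_t)n < sizeof(fname));

	fd = open(fname, O_RDONLY);
	if (fd < 0)
		return -errno;

	/* The MSR number is used as the file offset. */
	nread = pread(fd, value, sizeof(*value), (off_t)reg);
	err = (nread < 0) ? errno : EIO;	/* a short read is reported as EIO */
	close(fd);

	if (nread != (ssize_t)sizeof(*value))
		return -err;

	return 0;
}
```

Reading through /dev/cpu/N/msr normally needs the msr kernel module and sufficient privileges, so on an unprivileged system the function is expected to return a negative value.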
[dpdk-dev] How classification happens in scheduling ?
Hi all,

Could someone explain how this code snippet determines subport, pipe, traffic_class, queue and color?

uint16_t *pdata = rte_pktmbuf_mtod(m, uint16_t *); // points to the start of the data in the mbuf

*subport = (rte_be_to_cpu_16(pdata[SUBPORT_OFFSET]) & 0x0FFF) & (port_params.n_subports_per_port - 1); /* Outer VLAN ID */
*pipe = (rte_be_to_cpu_16(pdata[PIPE_OFFSET]) & 0x0FFF) & (port_params.n_pipes_per_subport - 1); /* Inner VLAN ID */
*traffic_class = (pdata[QUEUE_OFFSET] & 0x0F) & (RTE_SCHED_TRAFFIC_CLASSES_PER_PIPE - 1); /* Destination IP */
*queue = ((pdata[QUEUE_OFFSET] >> 8) & 0x0F) & (RTE_SCHED_QUEUES_PER_TRAFFIC_CLASS - 1); /* Destination IP */
*color = pdata[COLOR_OFFSET] & 0x03; /* Destination IP */

Thanks & Regards,
Uday
[dpdk-dev] [PATCH] eal: add function to check if primary proc alive
This patch adds a new function to the EAL API: int rte_eal_primary_proc_alive(const char *path);

The function indicates whether a primary process is alive right now. It is implemented by testing for a write lock on the config file.

The use case for this functionality is that a secondary process can wait until a primary process starts by polling the function. Once the primary is running, the secondary can keep polling to detect whether the primary process has quit unexpectedly.

The RTE_MAGIC number is written to the shared config by the primary process; this is the signal to the secondary process that the EAL is set up and ready to be used. The function rte_eal_mcfg_complete() writes RTE_MAGIC. This has been delayed in the EAL init procedure, as the PCI probing in the primary process can interfere with the secondary running.

Signed-off-by: Harry van Haaren --- doc/guides/rel_notes/release_2_3.rst| 7 +++ lib/librte_eal/bsdapp/eal/rte_eal_version.map | 8 lib/librte_eal/common/include/rte_eal.h | 19 +++ lib/librte_eal/linuxapp/eal/eal.c | 18 -- lib/librte_eal/linuxapp/eal/rte_eal_version.map | 7 +++ 5 files changed, 57 insertions(+), 2 deletions(-) diff --git a/doc/guides/rel_notes/release_2_3.rst b/doc/guides/rel_notes/release_2_3.rst index 99de186..14b5b06 100644 --- a/doc/guides/rel_notes/release_2_3.rst +++ b/doc/guides/rel_notes/release_2_3.rst @@ -11,6 +11,13 @@ Resolved Issues EAL ~~~ +* **Added rte_eal_primary_proc_alive() function** + + A new function ``rte_eal_primary_proc_alive()`` has been added + to allow the user to detect if a primary process is running. + Use cases for this feature include fault detection, and monitoring + using secondary processes. 
+ Drivers ~~~ diff --git a/lib/librte_eal/bsdapp/eal/rte_eal_version.map b/lib/librte_eal/bsdapp/eal/rte_eal_version.map index 9d7adf1..0e28017 100644 --- a/lib/librte_eal/bsdapp/eal/rte_eal_version.map +++ b/lib/librte_eal/bsdapp/eal/rte_eal_version.map @@ -135,3 +135,11 @@ DPDK_2.2 { rte_xen_dom0_supported; } DPDK_2.1; + + +DPDK_2.3 { + global: + + rte_eal_primary_proc_alive; + +} DPDK_2.2; diff --git a/lib/librte_eal/common/include/rte_eal.h b/lib/librte_eal/common/include/rte_eal.h index d2816a8..6eb65f9 100644 --- a/lib/librte_eal/common/include/rte_eal.h +++ b/lib/librte_eal/common/include/rte_eal.h @@ -156,6 +156,25 @@ int rte_eal_iopl_init(void); * - On failure, a negative error value. */ int rte_eal_init(int argc, char **argv); + +/** + * Check if a primary process is currently alive + * + * This function returns true when a primary process is currently + * active. + * + * @param config_file_path + * The config_file_path argument provided should point at the location + * that the primary process will create its config file. By default, + * /var/run/.rte_config is used. This path can be customized when starting + * a primary process using --file-prefix=custom_path + * + * @return + * - If alive, returns one. + * - If dead, returns zero. + */ +int rte_eal_primary_proc_alive(const char *config_file_path); + /** * Usage function typedef used by the application usage function. 
* diff --git a/lib/librte_eal/linuxapp/eal/eal.c b/lib/librte_eal/linuxapp/eal/eal.c index 635ec36..b419066 100644 --- a/lib/librte_eal/linuxapp/eal/eal.c +++ b/lib/librte_eal/linuxapp/eal/eal.c @@ -818,8 +818,6 @@ rte_eal_init(int argc, char **argv) eal_check_mem_on_local_socket(); - rte_eal_mcfg_complete(); - if (eal_plugins_init() < 0) rte_panic("Cannot init plugins\n"); @@ -877,9 +875,25 @@ rte_eal_init(int argc, char **argv) if (rte_eal_pci_probe()) rte_panic("Cannot probe PCI\n"); + rte_eal_mcfg_complete(); + return fctret; } +int +rte_eal_primary_proc_alive(const char *config_file_path) +{ + int config_fd; + config_fd = open(config_file_path, O_RDONLY); + if (config_fd < 0) + return 0; + + int ret = lockf(config_fd, F_TEST, 0); + close(config_fd); + + return !!ret; +} + /* get core role */ enum rte_lcore_role_t rte_eal_lcore_role(unsigned lcore_id) diff --git a/lib/librte_eal/linuxapp/eal/rte_eal_version.map b/lib/librte_eal/linuxapp/eal/rte_eal_version.map index cbe175f..7a8c530 100644 --- a/lib/librte_eal/linuxapp/eal/rte_eal_version.map +++ b/lib/librte_eal/linuxapp/eal/rte_eal_version.map @@ -138,3 +138,10 @@ DPDK_2.2 { rte_xen_dom0_supported; } DPDK_2.1; + +DPDK_2.3 { + global: + + rte_eal_primary_proc_alive; + +} DPDK_2.2; -- 2.5.0
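The lock test used by rte_eal_primary_proc_alive() in this patch can be tried outside DPDK. The sketch below is a standalone illustration of the same lockf(F_TEST) technique; the function name locked_by_other_process() is hypothetical. One caveat worth keeping in mind: POSIX record locks are dropped when a process closes any descriptor for the file, so a probe like this should only be called from a process that does not itself own the lock.

```c
#define _XOPEN_SOURCE 700	/* for lockf() under strict -std modes */
#include <fcntl.h>
#include <stddef.h>
#include <unistd.h>

/*
 * Hypothetical helper, not the DPDK function.
 * Returns 1 if another process holds a lock on the file,
 * 0 otherwise (including when the file cannot be opened).
 */
static int
locked_by_other_process(const char *path)
{
	int fd, ret;

	if (path == NULL)
		return 0;

	fd = open(path, O_RDONLY);
	if (fd < 0)
		return 0;

	/*
	 * F_TEST returns 0 when the region is unlocked or locked by the
	 * caller, and -1 when another process holds the lock.
	 */
	ret = lockf(fd, F_TEST, 0);
	close(fd);

	return ret != 0;
}
```

A secondary process could poll such a probe with the primary's config file path, first waiting for it to return 1 and later treating a change back to 0 as a sign that the primary has exited.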
[dpdk-dev] [PATCH] ethdev: expose link status and speed using xstats
This patch exposes link duplex, speed, and status via the existing xstats API. Signed-off-by: Harry van Haaren --- doc/guides/rel_notes/release_2_3.rst | 1 + lib/librte_ether/rte_ethdev.c| 29 ++--- 2 files changed, 27 insertions(+), 3 deletions(-) diff --git a/doc/guides/rel_notes/release_2_3.rst b/doc/guides/rel_notes/release_2_3.rst index 99de186..c3449dc 100644 --- a/doc/guides/rel_notes/release_2_3.rst +++ b/doc/guides/rel_notes/release_2_3.rst @@ -19,6 +19,7 @@ Drivers Libraries ~ +* **Link Status added to extended statistics in ethdev** Examples diff --git a/lib/librte_ether/rte_ethdev.c b/lib/librte_ether/rte_ethdev.c index ed971b4..3c35e1b 100644 --- a/lib/librte_ether/rte_ethdev.c +++ b/lib/librte_ether/rte_ethdev.c @@ -1,7 +1,7 @@ /*- * BSD LICENSE * - * Copyright(c) 2010-2015 Intel Corporation. All rights reserved. + * Copyright(c) 2010-2016 Intel Corporation. All rights reserved. * All rights reserved. * * Redistribution and use in source and binary forms, with or without @@ -83,6 +83,15 @@ struct rte_eth_xstats_name_off { unsigned offset; }; +/* Link Status display in xstats */ +static const char * const rte_eth_duplex_strings[] = { + "link_duplex_autonegotiate", + "link_duplex_half", + "link_duplex_full" +}; + +#define RTE_NB_LINK_STATUS_STATS 3 + static const struct rte_eth_xstats_name_off rte_stats_strings[] = { {"rx_good_packets", offsetof(struct rte_eth_stats, ipackets)}, {"tx_good_packets", offsetof(struct rte_eth_stats, opackets)}, @@ -94,7 +103,10 @@ static const struct rte_eth_xstats_name_off rte_stats_strings[] = { rx_nombuf)}, }; -#define RTE_NB_STATS (sizeof(rte_stats_strings) / sizeof(rte_stats_strings[0])) +#define RTE_GENERIC_STATS (sizeof(rte_stats_strings) / \ + sizeof(rte_stats_strings[0])) + +#define RTE_NB_STATS (RTE_NB_LINK_STATUS_STATS + RTE_GENERIC_STATS) static const struct rte_eth_xstats_name_off rte_rxq_stats_strings[] = { {"packets", offsetof(struct rte_eth_stats, q_ipackets)}, @@ -1466,6 +1478,7 @@ rte_eth_xstats_get(uint8_t 
port_id, struct rte_eth_xstats *xstats, { struct rte_eth_stats eth_stats; struct rte_eth_dev *dev; + struct rte_eth_link link; unsigned count = 0, i, q; signed xcount = 0; uint64_t val, *stats_ptr; @@ -1497,8 +1510,18 @@ rte_eth_xstats_get(uint8_t port_id, struct rte_eth_xstats *xstats, count = 0; rte_eth_stats_get(port_id, &eth_stats); + /* link status */ + rte_eth_link_get_nowait(port_id, &link); + snprintf(xstats[count].name, sizeof(xstats[count].name), "link_status"); + xstats[count++].value = link.link_status; + snprintf(xstats[count].name, sizeof(xstats[count].name), "link_speed"); + xstats[count++].value = link.link_speed; + snprintf(xstats[count].name, sizeof(xstats[count].name), +"%s", rte_eth_duplex_strings[link.link_duplex]); + xstats[count++].value = 1; + /* global stats */ - for (i = 0; i < RTE_NB_STATS; i++) { + for (i = 0; i < RTE_GENERIC_STATS; i++) { stats_ptr = RTE_PTR_ADD(&eth_stats, rte_stats_strings[i].offset); val = *stats_ptr; -- 2.5.0
[dpdk-dev] How classification happens in scheduling ?
Hi Uday,

> -Original Message-
> From: dev [mailto:dev-bounces at dpdk.org] On Behalf Of
> ravulakollu.kumar at wipro.com
> Sent: Wednesday, January 20, 2016 12:06 PM
> To: dev at dpdk.org
> Subject: [dpdk-dev] How classification happens in scheduling ?
>
> Hi all,
>
> Could someone explain me how this code snippet determining
> subport,pipe,traffic_class,queue,color.
>
> uint16_t *pdata = rte_pktmbuf_mtod(m, uint16_t *); //points to the
> start of the data in the mbuf
>
> *subport = (rte_be_to_cpu_16(pdata[SUBPORT_OFFSET]) & 0x0FFF)
> &(port_params.n_subports_per_port - 1); /* Outer VLAN ID*/
> *pipe = (rte_be_to_cpu_16(pdata[PIPE_OFFSET]) & 0x0FFF) &
> (port_params.n_pipes_per_subport - 1); /* Inner VLAN ID */
> *traffic_class = (pdata[QUEUE_OFFSET] & 0x0F) &
> (RTE_SCHED_TRAFFIC_CLASSES_PER_PIPE - 1); /* Destination IP */
> *queue = ((pdata[QUEUE_OFFSET] >> 8) & 0x0F) &
> (RTE_SCHED_QUEUES_PER_TRAFFIC_CLASS - 1) ; /* Destination IP */
> *color = pdata[COLOR_OFFSET] & 0x03;/* Destination IP */
>
> Thanks & Regards,
> Uday

To understand this, please refer to the explanation (section 23.4) at http://dpdk.org/doc/guides/sample_app_ug/qos_scheduler.html

The above code snippet classifies incoming packets based on their QinQ double VLAN tags and the IP destination address. The subport ID and pipe ID are determined by reading the 12-bit svlan field at SUBPORT_OFFSET and the 12-bit cvlan field at PIPE_OFFSET from the packet header. Traffic class, pipe queue and color are determined by reading specific fields at QUEUE_OFFSET, QUEUE_OFFSET and COLOR_OFFSET (all within the destination IP address) from the packet's header. To read these values from the packet header, the header fields first need to be converted from big endian to CPU byte order.

These values must not exceed the maximums set in the configuration file, so each one is ANDed with a mask derived from the corresponding parameter (for example port_params.n_subports_per_port - 1 or port_params.n_pipes_per_subport - 1) to keep it within range. Masking with "n - 1" only acts as an upper bound when n is a power of two.

For this kind of query, please use users at dpdk.org.

Thanks,
Jasvinder
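The extraction and masking described above can be reproduced without DPDK. The following is a minimal sketch, not the sample application's code: the byte offsets (14 and 18 for the two VLAN TCI fields, 38, 40 and 41 inside the destination IPv4 address) assume a QinQ frame followed directly by an IPv4 header and mirror how the quoted snippet behaves on a little-endian host, and the constants TC_PER_PIPE and QUEUES_PER_TC are assumed values standing in for the RTE_SCHED_* macros.

```c
#include <assert.h>
#include <stdint.h>

#define TC_PER_PIPE   4	/* assumed stand-in for RTE_SCHED_TRAFFIC_CLASSES_PER_PIPE */
#define QUEUES_PER_TC 4	/* assumed stand-in for RTE_SCHED_QUEUES_PER_TRAFFIC_CLASS */

struct sched_class {
	uint32_t subport, pipe, traffic_class, queue, color;
};

/* Read a 16-bit big-endian (network order) field, independent of host order. */
static uint16_t
read_be16(const uint8_t *p)
{
	return (uint16_t)((p[0] << 8) | p[1]);
}

static int
is_pow2(uint32_t x)
{
	return x != 0 && (x & (x - 1)) == 0;
}

/* Classify one QinQ + IPv4 frame; frame must hold at least 42 bytes. */
static struct sched_class
classify(const uint8_t *frame, uint32_t n_subports, uint32_t n_pipes)
{
	struct sched_class c;

	/* "& (n - 1)" is only a valid bound for powers of two. */
	assert(is_pow2(n_subports) && is_pow2(n_pipes));

	/* Low 12 bits of each TCI are the VLAN ID. */
	c.subport = (read_be16(frame + 14) & 0x0FFF) & (n_subports - 1);
	c.pipe = (read_be16(frame + 18) & 0x0FFF) & (n_pipes - 1);

	/* Low nibbles of the last two destination IP bytes. */
	c.traffic_class = (frame[40] & 0x0F) & (TC_PER_PIPE - 1);
	c.queue = (frame[41] & 0x0F) & (QUEUES_PER_TC - 1);

	/* Two low bits of the first destination IP byte. */
	c.color = frame[38] & 0x03;

	return c;
}
```

For example, with 4 subports configured, an outer VLAN ID of 5 maps to subport 5 & 3 = 1, which shows how the mask folds out-of-range IDs back into the configured range instead of rejecting them.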
[dpdk-dev] [PATCH] ethdev: expose link status and speed using xstats
On Wed, Jan 20, 2016 at 9:28 AM, Harry van Haaren wrote:
> This patch exposes link duplex, speed, and status via the
> existing xstats API.
>

I'm slightly confused by this. Why are we exposing operational properties of the chip through an API which I thought was primarily targeting statistics? When I think of statistics and a NIC, I think of values which are monotonically increasing. I think of values that are derived primarily from the packets flowing through the system. I do not think of link state, link speed and duplex, which have nothing to do with packets, and are not monotonic.

Should we not have a separate API to get this type of information? I mean, just because we have a generic "string to uint64_t" map doesn't mean we should toss in anything that can fit into a uint64_t. Would you want to see the MAC address in here as well? If we put in link speed/etc. it seems like we may as well!

Thanks,
Kyle
[dpdk-dev] [PATCH] ethdev: expose link status and speed using xstats
Hi Kyle,

> From: Kyle Larose [mailto:eomereadig at gmail.com]
> On Wed, Jan 20, 2016 at 9:28 AM, Harry van Haaren wrote:
> > This patch exposes link duplex, speed, and status via the
> > existing xstats API.
>
> I'm slightly confused by this. Why are we exposing operational
> properties of the chip through an API which I thought was primarily
> targeting statistics?

In a fault-detection situation, link state is a good item to monitor, just like the rest of the statistics on the NIC.

> When I think of statistics and a NIC, I think of
> values which are monotonically increasing. I think of values that are
> derived primarily from the packets flowing through the system. I do not
> think of link state, link speed and duplex, which have nothing to do
> with packets, and are not monotonic.

Link state and speed seem a good fit to me. I'll admit I'm not sure about duplex, and would be happy to respin the patch without duplex if the community would prefer that.

> Should we not have a separate API to get this type of information? I
> mean, just because we have a generic "string to uint64_t" map doesn't
> mean we should toss in anything that can fit into a uint64_t.

In theory we could create a new API for this, but I think the current xstats API is a good fit for exposing this info, so why create extra APIs? As a client of the DPDK API, I would prefer more statistics in a single API than have to research and implement two or more APIs to retrieve the information to monitor.

I'm working on exposing keep-alive statistics using an xstats style API. I'll send the patches later today so we can discuss them too.

Regards,
-Harry
[dpdk-dev] [PATCH] i40e: fix vlan filtering
Hello,

Yes, you are right. Even though VLAN filtering is configured during initialization most of the time, we should handle the case where multiple MAC addresses are already configured. I will send a v2 patch with this modification, which will also use ether_addr_copy and add debug messages.

Regards,

On 01/20/2016 06:00 AM, Zhang, Helin wrote:
>> -Original Message-
>> From: Julien Meunier [mailto:julien.meunier at 6wind.com]
>> Sent: Tuesday, January 19, 2016 1:19 AM
>> To: Zhang, Helin
>> Cc: dev at dpdk.org
>> Subject: [PATCH] i40e: fix vlan filtering
>>
>> VLAN filtering was always performed, even if hw_vlan_filter was disabled.
>> During device initialization, default filter RTE_MACVLAN_PERFECT_MATCH
>> was applied. In this situation, all incoming VLAN frames were dropped by the
>> card (increase of the register RUPP - Rx Unsupported Protocol).
>>
>> In order to restore default behavior, if HW VLAN filtering is activated, set a
>> filter to match MAC and VLAN. If not, set a filter to only match MAC.
>> >> Signed-off-by: Julien Meunier >> Signed-off-by: David Marchand >> --- >> drivers/net/i40e/i40e_ethdev.c | 39 >> ++- >> drivers/net/i40e/i40e_ethdev.h | 1 + >> 2 files changed, 39 insertions(+), 1 deletion(-) >> >> diff --git a/drivers/net/i40e/i40e_ethdev.c b/drivers/net/i40e/i40e_ethdev.c >> index bf6220d..ef9d578 100644 >> --- a/drivers/net/i40e/i40e_ethdev.c >> +++ b/drivers/net/i40e/i40e_ethdev.c >> @@ -2332,6 +2332,13 @@ i40e_vlan_offload_set(struct rte_eth_dev *dev, >> int mask) >> struct i40e_pf *pf = I40E_DEV_PRIVATE_TO_PF(dev->data- >>> dev_private); >> struct i40e_vsi *vsi = pf->main_vsi; >> >> +if (mask & ETH_VLAN_FILTER_MASK) { >> +if (dev->data->dev_conf.rxmode.hw_vlan_filter) >> +i40e_vsi_config_vlan_filter(vsi, TRUE); >> +else >> +i40e_vsi_config_vlan_filter(vsi, FALSE); >> +} >> + >> if (mask & ETH_VLAN_STRIP_MASK) { >> /* Enable or disable VLAN stripping */ >> if (dev->data->dev_conf.rxmode.hw_vlan_strip) >> @@ -4156,6 +4163,34 @@ fail_mem: >> return NULL; >> } >> >> +/* Configure vlan filter on or off */ >> +int >> +i40e_vsi_config_vlan_filter(struct i40e_vsi *vsi, bool on) { >> +struct i40e_hw *hw = I40E_VSI_TO_HW(vsi); >> +struct i40e_mac_filter_info filter; >> +int ret; >> + >> +rte_memcpy(&filter.mac_addr, >> + (struct ether_addr *)(hw->mac.perm_addr), >> ETH_ADDR_LEN); >> +ret = i40e_vsi_delete_mac(vsi, &filter.mac_addr); >> + >> +if (on) { >> +/* Filter to match MAC and VLAN */ >> +filter.filter_type = RTE_MACVLAN_PERFECT_MATCH; >> +} else { >> +/* Filter to match only MAC */ >> +filter.filter_type = RTE_MAC_PERFECT_MATCH; >> +} >> + >> +ret |= i40e_vsi_add_mac(vsi, &filter); > How would it be if multiple mac addresses has been configured? > I think this might be ignored in the code changes, right? > > Regards, > Helin > >> + >> +if (ret) >> +PMD_DRV_LOG(INFO, "Update VSI failed to %s vlan filter", >> +on ? 
"enable" : "disable"); >> +return ret; >> +} >> + >> /* Configure vlan stripping on or off */ int >> i40e_vsi_config_vlan_stripping(struct i40e_vsi *vsi, bool on) @@ -4203,9 >> +4238,11 @@ i40e_dev_init_vlan(struct rte_eth_dev *dev) { >> struct rte_eth_dev_data *data = dev->data; >> int ret; >> +int mask = 0; >> >> /* Apply vlan offload setting */ >> -i40e_vlan_offload_set(dev, ETH_VLAN_STRIP_MASK); >> +mask = ETH_VLAN_STRIP_MASK | ETH_VLAN_FILTER_MASK; >> +i40e_vlan_offload_set(dev, mask); >> >> /* Apply double-vlan setting, not implemented yet */ >> >> diff --git a/drivers/net/i40e/i40e_ethdev.h >> b/drivers/net/i40e/i40e_ethdev.h index 1f9792b..5505d72 100644 >> --- a/drivers/net/i40e/i40e_ethdev.h >> +++ b/drivers/net/i40e/i40e_ethdev.h >> @@ -551,6 +551,7 @@ void i40e_vsi_queues_unbind_intr(struct i40e_vsi >> *vsi); int i40e_vsi_vlan_pvid_set(struct i40e_vsi *vsi, >> struct i40e_vsi_vlan_pvid_info *info); int >> i40e_vsi_config_vlan_stripping(struct i40e_vsi *vsi, bool on); >> +int i40e_vsi_config_vlan_filter(struct i40e_vsi *vsi, bool on); >> uint64_t i40e_config_hena(uint64_t flags); uint64_t >> i40e_parse_hena(uint64_t flags); enum i40e_status_code >> i40e_fdir_setup_tx_resources(struct i40e_pf *pf); >> -- >> 2.1.4 -- Julien MEUNIER 6WIND
[dpdk-dev] where to find ethernet CRC when stripping is off
Hi all,

I need to get access to the Ethernet CRC of received packets.
To do this, I'm configuring:

    port_conf.rxmode.hw_strip_crc = 0;

Now my question is: how am I supposed to access the Ethernet CRC from a DPDK mbuf?
Is the CRC just the 4 final bytes of the packets?

Is this correct:

    uint32_t crc = rte_pktmbuf_mtod_offset (mymbuf, uint32_t*, mymbuf->pkt_len) ;

?

Thanks,
Francesco Montorsi
[dpdk-dev] [PATCH] ethdev: expose link status and speed using xstats
Hi Harry,

On Wed, Jan 20, 2016 at 9:45 AM, Van Haaren, Harry wrote:
> Hi Kyle,
>
> In theory we could create a new API for this, but I think the current xstats
> API is a good fit for exposing this info, so why create extra APIs? As a
> client of the DPDK API, I would prefer more statistics in a single API than
> have to research and implement two or more APIs to retrieve the information
> to monitor.
>

You create new APIs for many reasons: modularity, simplicity within the API, consistency, etc. My main concern with this proposed change relates to consistency. Previously, each stat had similar semantics. It was a number, representing the number of times something had occurred on a chip. This fact allows you to perform operations like addition, subtraction, etc. and expect that the result will be meaningful for every value in the array.

For example, suppose I wrote a tool to give the "rate" for each of the stats. We could sample these stats periodically, then output the difference between the two samples divided by the time between samples for each stat. A naive implementation, but quite simple.

However, if we start adding values like link speed and state, which are not really numerical, or not monotonic, you can no longer apply the same mathematical operations to them and expect the results to be meaningful. For example, suppose a link went down. The "rate" for that stat would be -1. Does that really make sense? Anyone using this API would need to explicitly filter out the non-stats, or risk nonsensical output.

Let's also consider how to interpret the value. When I look at a stat, there's usually one of two meanings: it's either a number of packets, or it's a number of bytes. We're now adding exceptions to that rule. Link state is a boolean. Link speed is a value in Mbps. Duplex is pretty much an enum.

We already have the rte_eth_link_get function. Why not let users continue to use that? It's well defined, it is simple, and it is consistent.

Thanks,
Kyle
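[Editor's sketch] The naive rate tool described in this message can be illustrated as follows. The struct and function names are hypothetical and are not part of the DPDK xstats API; the point is only that the same arithmetic gives a sensible result for a counter and a meaningless one for a link-status value.

```c
#include <stdint.h>

/* Hypothetical sample layout for illustration; not the DPDK xstats struct. */
struct sample {
	const char *name;
	uint64_t value;
};

/* Per-second rate between two samples of the same statistic.
 * Only meaningful when the value is a monotonically increasing counter. */
static double
stat_rate(const struct sample *prev, const struct sample *cur, double seconds)
{
	/* Signed difference so that a decreasing value shows up as negative. */
	int64_t delta = (int64_t)(cur->value - prev->value);

	return (double)delta / seconds;
}
```

With a packet counter going from 1000 to 1500 over one second this yields 500 packets per second. With a link-status value going from 1 to 0 over the same interval it yields -1, which is the nonsensical "rate" mentioned above.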
[dpdk-dev] [PATCH] ethdev: expose link status and speed using xstats
2016-01-20 10:03, Kyle Larose: > Hi Harry, > > On Wed, Jan 20, 2016 at 9:45 AM, Van Haaren, Harry > wrote: > > Hi Kyle, > > > > > In theory we could create a new API for this, but I think the current > > xstats API is a good fit for exposing this info, so why create extra APIs? > > As a client of the DPDK API, I would prefer more statistics in a single API > > than have to research and implement two or more APIs to retrieve the > > information to monitor. > > > > You create new APIs for many reasons: modularity, simplicitly within > the API, consistency, etc. My main concern with this proposed change > relates to consistency. Previously, each stat had similar semantics. > It was a number, representing the amount of times something had > occurred on a chip. This fact allows you to perform operations like > addition, subtraction/etc and expect that the result will be > meaningful for every value in the array. > > For example, suppose I wrote a tool to give the "rate" for each of the > stats. We could sample these stats periodically, then output the > difference between the two samples divided by the time between samples > for each stat. A naive implementation, but quite simple. > > However, if we start adding values like link speed and state, which > are not really numerical, or not monotonic, you can no longer apply > the same mathematical operations on them and expect them to be > meaningful. For example, suppose a link went down. The "rate" for that > stat would be -1. Does that really make sense? Anyone using this API > would need to explicitly filter out the non-stats, or risk nonsensical > output. > > Let's also consider how to interpret the value. When I look at a stat, > there's usually one of two meanings: it's either a number of packets, > or it's a number of bytes. We're now adding exceptions to that rule. > Link state is a boolean. Link speed is a value in mbps. Duplex is > pretty much an enum. > > We already have the rte_eth_link_get function. 
Why not let users > continue to use that? It's well defined, it is simple, and it is > consistent. +1 Please also consider this work in progress about link speed information: http://dpdk.org/dev/patchwork/patch/7995/
[dpdk-dev] [PATCH 0/4] virtio support for container
Hello,

> For this case, please use --single-file option because it creates much more
> than 8 fds, which can be handled by vhost-user sendmsg().

Thanks, I'm able to verify it by sending an ARP packet from the container to the host on arm64. But sometimes I do see the following message while running l2fwd in the container (as pointed out by Rich):

    EAL: Master lcore 0 is ready (tid=8a7a3000;cpuset=[0])
    EAL: lcore 1 is ready (tid=89cdf050;cpuset=[1])
    Notice: odd number of ports in portmask.
    Lcore 0: RX port 0
    Initializing port 0... PANIC in kick_all_vq():
    TUNSETVNETHDRSZ failed: Inappropriate ioctl for device

How could it be avoided?

Thanks,
Amit.
[dpdk-dev] [PATCH] rte.extvars.mk: allow overriding RTE_SDK_BIN from the environment
Hi Matthew,

RTE_SDK_BIN is an internal variable and should not be overridden.

2016-01-19 21:30, Matthew Hall:
> Currently pktgen-dpdk and many other external apps will fail to compile
> if the build output directory name is not equal to the target name.
>
> This causes problems if you used an alternative build output directory.

Have you installed DPDK somewhere?
Example:
    make install O=mybuild DESTDIR=mylocalinstall
Then you should build your app like this:
    make RTE_SDK=$(readlink -e ../dpdk/mylocalinstall/usr/local/share/dpdk)
[dpdk-dev] [PATCH v2 10/10] pci: place all uio pci device ids in a dedicated section
On Tue, Jan 19, 2016 at 01:35:14PM -0800, Stephen Hemminger wrote: > On Tue, 19 Jan 2016 15:56:14 -0500 > Neil Horman wrote: > > > On Tue, Jan 19, 2016 at 08:10:19AM -0800, Stephen Hemminger wrote: > > > On Tue, 19 Jan 2016 09:29:31 -0500 > > > Neil Horman wrote: > > > > > > > On Tue, Jan 19, 2016 at 08:30:40AM +0100, Thomas Monjalon wrote: > > > > > 2016-01-18 13:30, David Marchand: > > > > > > We could do something à la modinfo, but let's keep it simple for > > > > > > now. > > > > > > > > > > > > With this, you can extract the devices that need to be bound to uio > > > > > > / vfio > > > > > > with tools like objdump : > > > > > > > > > > > > $ objdump -j rte_pci_id_uio -s build/lib/librte_pmd_fm10k.so > > > > > > > > > > > > Contents of section rte_pci_id_uio: > > > > > > 15760 8680a415 8680d015 > > > > > > 15770 8680a515 > > > > > > > > > > Yes we need a modinfo-like tool. > > > > > Currently, the UIO/VFIO binding can be done after parsing the PCI > > > > > device list. > > > > > It is better to define the device ids locally to their drivers but it > > > > > must > > > > > be integrated with an appropriate parsing tool at the same time. > > > > > And more importantly than any tool, the format of these ELF data must > > > > > be > > > > > properly defined, documented and extensible. > > > > > > > > > > Is there someone experimented with such format definition? > > > > > Stephen, you were asking for this change, what is your opinion? > > > > > I remember that Neil was also interested in this change: > > > > > http://dpdk.org/ml/archives/dev/2015-January/012115.html > > > > > Panu, Christian, this change could be related to distribution > > > > > packaging. > > > > > Thanks for helping to move this change forward. > > > > > > > > Yes, I would be interested in seeing this. Is the ask here that > > > > someone do it? > > > > As I recall from the last thread that you reference, I thought David M > > > > was > > > > interested in writing it and soliciting for ideas. 
If that's no longer > > > > the case, > > > > I can take a stab at writing it. > > > > > > > > Neil > > > > > > > > > > If these are libraries is there a way to have a real entry point > > > to dump PCI id's. > > > > > Sure, you could write a method that could be dlsym-ed easily enough to > > fetch an > > array of pci ids, or just print stuff to the console. Not sure that's the best > > way, > > but definitely an option > > Neil > > It is just that reading data with objdump is a kludge likely to get broken. > Not suggesting that we rely on objdump in perpetuity, only that we export the data, rather than a method to access it, so that it can be reached via libelf. Using a function to return the information has implicit issues at the moment (specifically, if you dlopen a DPDK driver, its constructor will attempt to register it with the core libraries). While that's not catastrophic, it means more stuff than you expect gets loaded, which might have weird side effects. Adding a separate section that you could reach via libelf would be nice, I think. Neil
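[Editor's sketch] The idea discussed in this thread, exporting the ID table as data in a dedicated ELF section, can be illustrated roughly as below. This is only a sketch under assumptions: it needs GCC or Clang on an ELF platform with a GNU-style linker, which provides `__start_<name>` and `__stop_<name>` symbols for sections whose names are valid C identifiers. The section name `demo_pci_ids` and the struct are hypothetical and are not the format used by the patch; the three vendor/device pairs are the ones visible in the objdump output quoted above.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical ID record, for illustration only. */
struct pci_id {
	uint16_t vendor;
	uint16_t device;
};

/* Keep the table in its own section; "used" stops the compiler from
 * discarding it even though nothing references it by name. */
static const struct pci_id demo_ids[]
	__attribute__((section("demo_pci_ids"), used)) = {
	{ 0x8086, 0x15a4 },
	{ 0x8086, 0x15d0 },
	{ 0x8086, 0x15a5 },
};

/* Section bounds provided by the linker (assumption: GNU ld or compatible). */
extern const struct pci_id __start_demo_pci_ids[];
extern const struct pci_id __stop_demo_pci_ids[];

/* Number of ID records stored in the section. */
static size_t
count_pci_ids(void)
{
	return (size_t)(__stop_demo_pci_ids - __start_demo_pci_ids);
}
```

Here the section is walked from inside the same program. An external tool of the kind discussed above would instead read the section contents from the file (with objdump, or a libelf-based reader), which avoids loading the driver and running its constructor.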
[dpdk-dev] where to find ethernet CRC when stripping is off
On 01/20/2016 04:02 PM, Montorsi, Francesco wrote:
> Hi all,
>
> I need to get access to the Ethernet CRC of received packets.
> To do this, I'm configuring:
>
> port_conf.rxmode.hw_strip_crc = 0;
>
> Now my question is: how am I supposed to access the Ethernet CRC from a DPDK
> mbuf?
> Is the CRC just the 4 final bytes of the packets?
>
> Is this correct:
>
> uint32_t crc = rte_pktmbuf_mtod_offset (mymbuf, uint32_t*,
> mymbuf->pkt_len) ;
>
> ?
>
> Thanks,
> Francesco Montorsi
>

Hi Francesco,

You would be right... if the PMDs did not transparently strip the CRC in software when hardware CRC stripping is disabled at port configuration (as described above).
See for instance how the function ixgbe_recv_pkts_lro() in file drivers/net/ixgbe/ixgbe_rxtx.c deals with crc_len.

Considering your need, I now think that PMDs should keep the CRCs that are stored in received packets when hardware CRC stripping is disabled by the application, so that the application can access them as needed.

Note that this would require the input packet processing of such DPDK applications to be aware of the CRC presence (+4 in the packet length, for instance).

Let's see what others who might care, if any, think about such a change to the CRC stripping semantics.

Ivan

--
Ivan Boule
6WIND Development Engineer
[dpdk-dev] [PATCH 0/3] Keep-alive stats and doc fixes
This patchset contains: 1. Fix variable naming consistency in sample guide 2. Set last_seen time on core when it gets registered 3. An xstats implementation for last-seen and current core status Harry van Haaren (3): doc: fix keepalive sample app guide eal: add keepalive core register timestamp keepalive: add rte_keepalive_xstats() and example doc/guides/rel_notes/release_2_3.rst| 6 +++ doc/guides/sample_app_ug/keep_alive.rst | 30 +- examples/l2fwd-keepalive/main.c | 22 -- lib/librte_eal/bsdapp/eal/rte_eal_version.map | 7 lib/librte_eal/common/include/rte_keepalive.h | 17 +++- lib/librte_eal/common/rte_keepalive.c | 53 - lib/librte_eal/linuxapp/eal/rte_eal_version.map | 7 7 files changed, 127 insertions(+), 15 deletions(-) -- 2.5.0
[dpdk-dev] [PATCH 1/3] doc: fix keepalive sample app guide
This patch fixes some mismatches between the keepalive code and the docs. Struct names, and descriptions are not in line with the codebase. Fixes: e64833f2273a ("examples/l2fwd-keepalive: add sample application") Signed-off-by: Harry van Haaren --- doc/guides/sample_app_ug/keep_alive.rst | 19 ++- 1 file changed, 10 insertions(+), 9 deletions(-) diff --git a/doc/guides/sample_app_ug/keep_alive.rst b/doc/guides/sample_app_ug/keep_alive.rst index 080811b..1478faf 100644 --- a/doc/guides/sample_app_ug/keep_alive.rst +++ b/doc/guides/sample_app_ug/keep_alive.rst @@ -1,6 +1,6 @@ .. BSD LICENSE -Copyright(c) 2015 Intel Corporation. All rights reserved. +Copyright(c) 2015-2016 Intel Corporation. All rights reserved. All rights reserved. Redistribution and use in source and binary forms, with or without @@ -143,17 +143,17 @@ The Keep-Alive/'Liveliness' conceptual scheme: The following sections provide some explanation of the code aspects that are specific to the Keep Alive sample application. -The heartbeat functionality is initialized with a struct -rte_heartbeat and the callback function to invoke in the +The keepalive functionality is initialized with a struct +rte_keepalive and the callback function to invoke in the case of a timeout. .. code-block:: c rte_global_keepalive_info = rte_keepalive_create(&dead_core, NULL); -if (rte_global_hbeat_info == NULL) +if (rte_global_keepalive_info == NULL) rte_exit(EXIT_FAILURE, "keepalive_create() failed"); -The function that issues the pings hbeat_dispatch_pings() +The function that issues the pings keepalive_dispatch_pings() is configured to run every check_period milliseconds. .. code-block:: c @@ -162,7 +162,8 @@ is configured to run every check_period milliseconds. 
(check_period * rte_get_timer_hz()) / 1000, PERIODICAL, rte_lcore_id(), -&hbeat_dispatch_pings, rte_global_keepalive_info +&rte_keepalive_dispatch_pings, +rte_global_keepalive_info ) != 0 ) rte_exit(EXIT_FAILURE, "Keepalive setup failure.\n"); @@ -173,7 +174,7 @@ functionality and the example random failures. .. code-block:: c -rte_keepalive_mark_alive(&rte_global_hbeat_info); +rte_keepalive_mark_alive(&rte_global_keepalive_info); cur_tsc = rte_rdtsc(); /* Die randomly within 7 secs for demo purposes.. */ @@ -185,7 +186,7 @@ The rte_keepalive_mark_alive function simply sets the core state to alive. .. code-block:: c static inline void -rte_keepalive_mark_alive(struct rte_heartbeat *keepcfg) +rte_keepalive_mark_alive(struct rte_keepalive *keepcfg) { -keepcfg->state_flags[rte_lcore_id()] = 1; +keepcfg->state_flags[rte_lcore_id()] = ALIVE; } -- 2.5.0
[dpdk-dev] [PATCH 2/3] eal: add keepalive core register timestamp
This patch sets a timestamp on each lcore when it is registered for keepalive. This causes the first values read by the monitor to show time since the core was registered, instead of the delta between 0 and the timestamp counter. Signed-off-by: Harry van Haaren --- lib/librte_eal/common/rte_keepalive.c | 5 - 1 file changed, 4 insertions(+), 1 deletion(-) diff --git a/lib/librte_eal/common/rte_keepalive.c b/lib/librte_eal/common/rte_keepalive.c index 736fd0f..5358322 100644 --- a/lib/librte_eal/common/rte_keepalive.c +++ b/lib/librte_eal/common/rte_keepalive.c @@ -38,6 +38,7 @@ #include #include #include +#include static void print_trace(const char *msg, struct rte_keepalive *keepcfg, int idx_core) @@ -108,6 +109,8 @@ rte_keepalive_create(rte_keepalive_failure_callback_t callback, void rte_keepalive_register_core(struct rte_keepalive *keepcfg, const int id_core) { - if (id_core < RTE_KEEPALIVE_MAXCORES) + if (id_core < RTE_KEEPALIVE_MAXCORES) { keepcfg->active_cores[id_core] = 1; + keepcfg->last_alive[id_core] = rte_rdtsc(); + } } -- 2.5.0
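[Editor's sketch] The effect described in this commit message can be shown with a small stand-alone model. The struct, the constant and the function names below are hypothetical stand-ins, not the rte_keepalive implementation, and the timestamp is passed in as a plain integer instead of being read with rte_rdtsc().

```c
#include <stdint.h>

#define DEMO_MAXCORES 4 /* hypothetical limit for this sketch */

struct demo_keepalive {
	uint8_t active_cores[DEMO_MAXCORES];
	uint64_t last_alive[DEMO_MAXCORES];
};

/* Mark a core as monitored and stamp it with the current time, so the
 * first "time since last alive" reading starts from registration.
 * The lower-bound check on id_core is an addition made in this sketch. */
static void
demo_register_core(struct demo_keepalive *k, int id_core, uint64_t now)
{
	if (id_core >= 0 && id_core < DEMO_MAXCORES) {
		k->active_cores[id_core] = 1;
		k->last_alive[id_core] = now;
	}
}

/* Ticks elapsed since the core was last seen alive. */
static uint64_t
demo_ticks_since_alive(const struct demo_keepalive *k, int id_core,
		       uint64_t now)
{
	return now - k->last_alive[id_core];
}
```

Without the stamp at registration, last_alive is still 0 and the first reading equals the raw counter value; with it, the first reading is the time elapsed since the core was registered.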
[dpdk-dev] [PATCH 3/3] keepalive: add rte_keepalive_xstats() and example
This patch adds a function that exposes keepalive statistics re-using the existing rte_eth_xstats struct. The function provides the client API the opportunity to read last-seen and status of each core. Signed-off-by: Harry van Haaren --- doc/guides/rel_notes/release_2_3.rst| 6 doc/guides/sample_app_ug/keep_alive.rst | 11 ++ examples/l2fwd-keepalive/main.c | 22 ++-- lib/librte_eal/bsdapp/eal/rte_eal_version.map | 7 lib/librte_eal/common/include/rte_keepalive.h | 17 - lib/librte_eal/common/rte_keepalive.c | 48 - lib/librte_eal/linuxapp/eal/rte_eal_version.map | 7 7 files changed, 113 insertions(+), 5 deletions(-) diff --git a/doc/guides/rel_notes/release_2_3.rst b/doc/guides/rel_notes/release_2_3.rst index 99de186..9e33aa2 100644 --- a/doc/guides/rel_notes/release_2_3.rst +++ b/doc/guides/rel_notes/release_2_3.rst @@ -4,6 +4,12 @@ DPDK Release 2.3 New Features +* **Keep Alive xstats** + + A function ``rte_keepalive_xstats()`` has been added to the + keepalive header, allowing the retrieval of keepalive statistics + such as last-alive-time and the status of each core registered + for monitoring. The API reflects that of the existing xstats API. Resolved Issues --- diff --git a/doc/guides/sample_app_ug/keep_alive.rst b/doc/guides/sample_app_ug/keep_alive.rst index 1478faf..839e29c 100644 --- a/doc/guides/sample_app_ug/keep_alive.rst +++ b/doc/guides/sample_app_ug/keep_alive.rst @@ -190,3 +190,14 @@ The rte_keepalive_mark_alive function simply sets the core state to alive. { keepcfg->state_flags[rte_lcore_id()] = ALIVE; } + +Keepalive exposes its statistics using an API very similar to the xstats API. +This allows client code to call the function and retrieve the current status +of keepalive, providing information like last-alive time and status per-core +that has keepalive enabled. + +.. 
code-block:: c + +nstats = rte_keepalive_xstats(rte_global_keepalive_info, xstats, nstats); +for (i = 0; i < nstats; i++) +printf("%s : %lu\n", xstats[i].name, xstats[i].value); diff --git a/examples/l2fwd-keepalive/main.c b/examples/l2fwd-keepalive/main.c index f4d52f2..a8f2ba4 100644 --- a/examples/l2fwd-keepalive/main.c +++ b/examples/l2fwd-keepalive/main.c @@ -1,7 +1,7 @@ /*- * BSD LICENSE * - * Copyright(c) 2010-2015 Intel Corporation. All rights reserved. + * Copyright(c) 2010-2016 Intel Corporation. All rights reserved. * All rights reserved. * * Redistribution and use in source and binary forms, with or without @@ -66,6 +66,7 @@ #include #include #include +#include #include #include #include @@ -139,7 +140,7 @@ struct l2fwd_port_statistics port_statistics[RTE_MAX_ETHPORTS]; /* A tsc-based timer responsible for triggering statistics printout */ #define TIMER_MILLISECOND 1 #define MAX_TIMER_PERIOD 86400 /* 1 day max */ -static int64_t timer_period = 10 * TIMER_MILLISECOND * 1000; /* 10 seconds */ +static int64_t timer_period = 1 * TIMER_MILLISECOND * 1000; /* 1 second */ static int64_t check_period = 5; /* default check cycle is 5ms */ /* Keepalive structure */ @@ -189,7 +190,22 @@ print_stats(__attribute__((unused)) struct rte_timer *ptr_timer, total_packets_tx, total_packets_rx, total_packets_dropped); - printf("\n\n"); + printf("\nKeep Alive xstats ==\n"); + + /* Keepalive Xstats */ + unsigned nstats = rte_keepalive_xstats(rte_global_keepalive_info, 0, 0); + struct rte_eth_xstats *xstats = rte_zmalloc( "RTE_KEEPALIVE_XSTATS", + sizeof( struct rte_eth_xstats) * nstats, + RTE_CACHE_LINE_SIZE); + + nstats = rte_keepalive_xstats(rte_global_keepalive_info, xstats, + nstats); + unsigned i; + for (i = 0; i < nstats; i++) + printf("%s\t%lu\n", xstats[i].name, + xstats[i].value); + printf("\n"); + rte_free(xstats); } /* Send the burst of packets on an output interface */ diff --git a/lib/librte_eal/bsdapp/eal/rte_eal_version.map 
b/lib/librte_eal/bsdapp/eal/rte_eal_version.map index 9d7adf1..f5e16a7 100644 --- a/lib/librte_eal/bsdapp/eal/rte_eal_version.map +++ b/lib/librte_eal/bsdapp/eal/rte_eal_version.map @@ -135,3 +135,10 @@ DPDK_2.2 { rte_xen_dom0_supported; } DPDK_2.1; + +DPDK_2.3 { + global: + + rte_keepalive_xstats; + +} DPDK_2.2; diff --git a/lib/librte_eal/common/include/rte_keepalive.h b/lib/librte_eal/common/include/rte_keepalive.h index 02472c0..352dd17 100644 --- a/lib/librte_eal/common/include/rte_keepalive.h +++ b/lib/librte_eal/common/include/rte_kee
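[Editor's sketch] The example code in this patch uses a two-call pattern: call the getter once with no buffer to learn how many entries exist, allocate, then call again to fill the buffer. A minimal stand-alone version of that pattern is sketched below. All names are hypothetical and the getter is a stub, not rte_keepalive_xstats(). The sketch also checks the allocation result and prints the value with PRIu64, assuming the value field is a 64-bit unsigned integer.

```c
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical name/value pair, modelled loosely on an xstats entry. */
struct demo_xstat {
	char name[64];
	uint64_t value;
};

/* Stub getter: returns the total number of entries available and copies
 * at most n of them into out. Passing out == NULL only queries the count. */
static unsigned
demo_xstats_get(struct demo_xstat *out, unsigned n)
{
	static const struct demo_xstat src[] = {
		{ "core_1_last_alive", 12345 },
		{ "core_1_status", 1 },
	};
	const unsigned total = sizeof(src) / sizeof(src[0]);
	unsigned i;

	if (out == NULL)
		return total;
	for (i = 0; i < n && i < total; i++)
		out[i] = src[i];
	return total;
}

/* Query the count, allocate, fill, print. Returns the number of entries
 * printed, or -1 if the allocation failed. */
static int
demo_print_xstats(void)
{
	unsigned cap = demo_xstats_get(NULL, 0);
	struct demo_xstat *xs = calloc(cap ? cap : 1, sizeof(*xs));
	unsigned total, i;

	if (xs == NULL)
		return -1;

	total = demo_xstats_get(xs, cap);
	if (total > cap)
		total = cap; /* never read past what was allocated */

	for (i = 0; i < total; i++)
		printf("%s\t%" PRIu64 "\n", xs[i].name, xs[i].value);

	free(xs);
	return (int)total;
}
```

Clamping the second return value to the allocated size matters if the set of entries can grow between the two calls; whether that can happen for keepalive statistics is not stated in the patch.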
[dpdk-dev] [PATCH] ethdev: expose link status and speed using xstats
> From: Thomas Monjalon [mailto:thomas.monjalon at 6wind.com] > 2016-01-20 10:03, Kyle Larose: > > We already have the rte_eth_link_get function. Why not let users > > continue to use that? It's well defined, it is simple, and it is > > consistent. > > +1 Ok, no problem. I'll mark the link-status patch rejected in Patchwork. I've just sent the Keepalive patchset, patch #3 is of interest regarding this discussion: http://dpdk.org/dev/patchwork/patch/10003/ It adds a function to the API for collecting xstats, meaning it doesn't pollute the rte_eth_xstats_get() output. I'm interested to hear the communities view of this approach. Regards, -Harry
[dpdk-dev] [PKTGEN] fixing weird termio issues that complicate debugging
On 1/20/16, 12:32 AM, "dev on behalf of Matthew Hall" wrote: Hi Matthew, I have some comments below, but will address your full email later when I have a bit more time. >Hello, > >Since the pktgen code is reindented I am finding time to read through it >and experiment and see if I can get it working. > >I have issues with the init process of pktgen. It is difficult to debug >it because the init code does a lot of very scary stuff to the terminal >control / TTY device at inconvenient times in an inconvenient order, and >in the process damages the debug output and damages the screen of your >GDB without doing weird things to run GDB on a different TTY. > >Of course I am willing to contribute patches and not just complain, but >first I need some help to follow what is going on. > >Here is the problematic call-flow with some explanation what went wrong >trying it on some community machines outside of its original environment: > >1) it calls printf("\n%s %s\n", wr_copyright_msg(), wr_powered_by()); >which dumps tons of weird boilerplate of licenses, copyrights, code >creator, etc. > >It is open source and everybody that matters already knows who coded it, >so is this stuff really that important? This gets in the way when you >are trying to work on it and I just have to comment it out. > >2) it calls wr_scrn_setw and tinkers with the windows size very early in >the init which can make your terminal weird > >3) it calls rte_eal_init which produces a lot of nice debug output, >which is fine > >4) it calls pktgen_init_screen, which calls wr_scrn_init, which calls >wr_scrn_erase which destroys the valuable debug output just created in >(c) which is a bad thing > >5) it calls wr_print_copyright and dumps more boilerplate I am not sure >is needed > >6) it logs some helpful messages about the port / descriptor settings >which is fine > >7) it calls the pktgen_config_ports function which can crash in ways you >need the destroyed debug output to fix. 
> >For example in my case that function crashes here: > > if (pktgen.nb_ports == 0) > pktgen_log_panic("*** Did not find any ports to use ***"); This problem is DPDK did not find any ports to use for Pktgen. Please check to make sure you have the right ports attached to gib_uio and they are usable by DPDK. > >8) Later it makes a logo and a splash screen (wr_log, wr_splash_screen). >Is this stuff really needed? This is a ton of output for just starting >up some test program. > >To fix this debug problem I propose some changes which I am happy to >help develop: > >1) decide what of this output we really need here and greatly simplify >how much gets printed out > >2) move wr_scrn_setw right before pktgen_init_screen and after >rte_eal_init to prevent damaging that output > >3) consider how wr_scrn_init is called in pktgen_init_screen, because it >calls wr_scrn_erase which damages output > >4) I think that pktgen_config_ports should be called before all this >weird screen init stuff, so that if it fails you can actually see what >happened there. > >One other random topic... on the long lines of code it looks like there >are some gigantic tab-indents pushing things off to the right still. One >example, maybe there are others or another setting which is needed to >fix all of these: Please use tab stops of 4 instead of 8. > > info->seq_pkt = (pkt_seq_t *)rte_zmalloc_socket(buff, >(sizeof(pkt_seq_t) * NUM_TOTAL_PKTS), > > RTE_CACHE_LINE_SIZE, >rte_socket_id()); > >Thoughts? >Matthew Hall > Regards, Keith
[dpdk-dev] [PKTGEN] fixing weird termio issues that complicate debugging
On 1/20/16, 12:32 AM, "dev on behalf of Matthew Hall" wrote: >Hello, > >Since the pktgen code is reindented I am finding time to read through it >and experiment and see if I can get it working. > >I have issues with the init process of pktgen. It is difficult to debug >it because the init code does a lot of very scary stuff to the terminal >control / TTY device at inconvenient times in an inconvenient order, and >in the process damages the debug output and damages the screen of your >GDB without doing weird things to run GDB on a different TTY. > >Of course I am willing to contribute patches and not just complain, but >first I need some help to follow what is going on. > >Here is the problematic call-flow with some explanation what went wrong >trying it on some community machines outside of its original environment: > >1) it calls printf("\n%s %s\n", wr_copyright_msg(), wr_powered_by()); >which dumps tons of weird boilerplate of licenses, copyrights, code >creator, etc. > >It is open source and everybody that matters already knows who coded it, >so is this stuff really that important? This gets in the way when you >are trying to work on it and I just have to comment it out. One problem is a number of people wanted to steal the code and use in a paid application, so the copyright is some what a requirement. As you may know I do a lot of debugging on Pktgen and I feel they are a nuisance. I can try to see if we can clean up these messages, but do not hold your breath on getting them to be removed. > >2) it calls wr_scrn_setw and tinkers with the windows size very early in >the init which can make your terminal weird > >3) it calls rte_eal_init which produces a lot of nice debug output, >which is fine IMO most of the information from DPDK is not very useful as why do I need to see every lcore line, plus a lot of more useless information. Most of the information could be reduced a couple of lines or only report issues not just a bunch of useless information. 
> >4) it calls pktgen_init_screen, which calls wr_scrn_init, which calls >wr_scrn_erase which destroys the valuable debug output just created in >(c) which is a bad thing The screen init should be scrolling the information off the screen to preserve that info, unless it was changed by mistake. > >5) it calls wr_print_copyright and dumps more boilerplate I am not sure >is needed > >6) it logs some helpful messages about the port / descriptor settings >which is fine > >7) it calls the pktgen_config_ports function which can crash in ways you >need the destroyed debug output to fix. > >For example in my case that function crashes here: > > if (pktgen.nb_ports == 0) > pktgen_log_panic("*** Did not find any ports to use ***"); > >8) Later it makes a logo and a splash screen (wr_log, wr_splash_screen). >Is this stuff really needed? This is a ton of output for just starting >up some test program. > >To fix this debug problem I propose some changes which I am happy to >help develop: > >1) decide what of this output we really need here and greatly simplify >how much gets printed out > >2) move wr_scrn_setw right before pktgen_init_screen and after >rte_eal_init to prevent damaging that output > >3) consider how wr_scrn_init is called in pktgen_init_screen, because it >calls wr_scrn_erase which damages output Again it could be scrolling that information off the screen, just need a large screen scroll buffer. > >4) I think that pktgen_config_ports should be called before all this >weird screen init stuff, so that if it fails you can actually see what >happened there. > >One other random topic... on the long lines of code it looks like there >are some gigantic tab-indents pushing things off to the right still. One >example, maybe there are others or another setting which is needed to >fix all of these: Please use tab stop of 4 instead of 8. IMO tab stop of 8 is so 1970?s and we should not need tab stop of 8 as any system today will work. 
:-) > > info->seq_pkt = (pkt_seq_t *)rte_zmalloc_socket(buff, >(sizeof(pkt_seq_t) * NUM_TOTAL_PKTS), > > RTE_CACHE_LINE_SIZE, >rte_socket_id()); > >Thoughts? >Matthew Hall Improvements to Pktgen are always welcome, but the copyright info is going to be a bit hard to remove, as that was one of the requirements when I open sourced the code. I understand it may be a bit much output. I do not think it is really an issue that causes users to stop using it, as startup is only done once; in my case I may start Pktgen a few times a day for development and it does not seem to slow me down much. :-) > Regards, Keith
[dpdk-dev] [PKTGEN] fixing weird termio issues that complicate debugging
On 1/20/16, 10:26 AM, "dev on behalf of Wiles, Keith" wrote:
>On 1/20/16, 12:32 AM, "dev on behalf of Matthew Hall" on behalf of mhall at mhcomputing.net> wrote:
>
>>Hello,

Please try modifying pktgen-main.c:main() at the top of the function to this:

    wr_scrn_setw(1);      /* Reset the window size, from possible crash run. */
    wr_scrn_pos(100, 1);  /* Move the cursor to the bottom of the screen again */

    printf("\n%s %s\n", wr_copyright_msg(), wr_powered_by());
    fflush(stdout);

    /* call before the rte_eal_init() */
    (void)rte_set_application_usage_hook(pktgen_usage);

Maybe this will fix up most of your issues with DPDK output. I normally set the log-level to 7 to remove most of the DPDK messages.

>> >>Since the pktgen code is reindented I am finding time to read through it >>and experiment and see if I can get it working. >> >>I have issues with the init process of pktgen. It is difficult to debug >>it because the init code does a lot of very scary stuff to the terminal >>control / TTY device at inconvenient times in an inconvenient order, and >>in the process damages the debug output and damages the screen of your >>GDB without doing weird things to run GDB on a different TTY. >> >>Of course I am willing to contribute patches and not just complain, but >>first I need some help to follow what is going on. >> >>Here is the problematic call-flow with some explanation what went wrong >>trying it on some community machines outside of its original environment: >> >>1) it calls printf("\n%s %s\n", wr_copyright_msg(), wr_powered_by()); >>which dumps tons of weird boilerplate of licenses, copyrights, code >>creator, etc. >> >>It is open source and everybody that matters already knows who coded it, >>so is this stuff really that important? This gets in the way when you >>are trying to work on it and I just have to comment it out. > >One problem is a number of people wanted to steal the code and use in a paid >application, so the copyright is some what a requirement. 
As you may know I do >a lot of debugging on Pktgen and I feel they are a nuisance. I can try to see >if we can clean up these messages, but do not hold your breath on getting them >to be removed. >> >>2) it calls wr_scrn_setw and tinkers with the windows size very early in >>the init which can make your terminal weird >> >>3) it calls rte_eal_init which produces a lot of nice debug output, >>which is fine > >IMO most of the information from DPDK is not very useful as why do I need to >see every lcore line, plus a lot of more useless information. Most of the >information could be reduced a couple of lines or only report issues not just >a bunch of useless information. >> >>4) it calls pktgen_init_screen, which calls wr_scrn_init, which calls >>wr_scrn_erase which destroys the valuable debug output just created in >>(c) which is a bad thing > >The screen init should be scrolling the information off the screen to preserve >that info, unless it was changed by mistake. >> >>5) it calls wr_print_copyright and dumps more boilerplate I am not sure >>is needed >> >>6) it logs some helpful messages about the port / descriptor settings >>which is fine >> >>7) it calls the pktgen_config_ports function which can crash in ways you >>need the destroyed debug output to fix. >> >>For example in my case that function crashes here: >> >> if (pktgen.nb_ports == 0) >> pktgen_log_panic("*** Did not find any ports to use ***"); >> >>8) Later it makes a logo and a splash screen (wr_log, wr_splash_screen). >>Is this stuff really needed? This is a ton of output for just starting >>up some test program. 
>> >>To fix this debug problem I propose some changes which I am happy to >>help develop: >> >>1) decide what of this output we really need here and greatly simplify >>how much gets printed out >> >>2) move wr_scrn_setw right before pktgen_init_screen and after >>rte_eal_init to prevent damaging that output >> >>3) consider how wr_scrn_init is called in pktgen_init_screen, because it >>calls wr_scrn_erase which damages output > >Again it could be scrolling that information off the screen, just need a large >screen scroll buffer. >> >>4) I think that pktgen_config_ports should be called before all this >>weird screen init stuff, so that if it fails you can actually see what >>happened there. >> >>One other random topic... on the long lines of code it looks like there >>are some gigantic tab-indents pushing things off to the right still. One >>example, maybe there are others or another setting which is needed to >>fix all of these: > >Please use tab stop of 4 instead of 8. IMO tab stop of 8 is so 1970?s and we >should not need tab stop of 8 as any system today will work. :-) >> >> info->seq_pkt = (pkt_seq_t *)rte_zmalloc_socket(buff, >>(sizeof(pkt_seq_t) * NUM_TOTAL_PKTS), >> >> RTE_CACHE_LINE_SIZE, >>rte_socket_id()); >> >
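Proposals 2) and 4) in the quoted mail amount to a reordering of the init steps. A minimal sketch of that ordering follows. It assumes the port configuration can run before any screen setup, which the thread does not confirm. Every function below is a hypothetical stub that only records that it ran; none of this is pktgen's actual code.

```c
#include <stdio.h>

/* Records the order in which the stubbed init steps run. */
static const char *trace[8];
static int trace_len;

static void step(const char *name) { trace[trace_len++] = name; }

/* Hypothetical stand-ins for rte_eal_init(), pktgen_config_ports(),
 * wr_scrn_setw() and pktgen_init_screen(). */
static void stub_eal_init(void)     { step("eal_init"); }
static int  stub_config_ports(void) { step("config_ports"); return 0; }
static void stub_scrn_setw(void)    { step("scrn_setw"); }
static void stub_init_screen(void)  { step("init_screen"); }

/*
 * Proposed order: EAL first, then port configuration, and only then
 * anything that resizes or erases the terminal.
 */
static int
proposed_init(void)
{
    stub_eal_init();
    if (stub_config_ports() != 0) {
        /* The screen has not been touched yet, so earlier output survives. */
        fprintf(stderr, "port configuration failed\n");
        return -1;
    }
    stub_scrn_setw();
    stub_init_screen();
    return 0;
}
```

With this order, a failure in port configuration returns before the terminal is resized or erased, which is the property the proposal is after.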
[dpdk-dev] where to find ethernet CRC when stripping is off
Hi Ivan,

> -----Original Message-----
> You would be right... if the PMDs did not transparently strip the CRC in
> software when hardware CRC stripping is disabled at port configuration (as
> described above).
> See for instance how the function ixgbe_recv_pkts_lro() in file
> drivers/net/ixgbe/ixgbe_rxtx.c deals with crc_len.

Yeah, I see. However, I wonder what the utility of the hw_strip_crc feature is if in the end it is completely masked from the mbuf user.

To my understanding, looking at that ixgbe code, I think that what I wrote before:

    uint32_t crc = *(rte_pktmbuf_mtod_offset(mymbuf, uint32_t *, mymbuf->pkt_len));

should work, since pkt_len and data_len have the "crc_len" removed, but the CRC itself should still be there. I know it is kind of a hack, but at least for ixgbe that sounds like a possible (temporary) solution for me.

> Considering your need, I think now that PMDs should keep the CRC that are
> stored in received packets when hardware CRC stripping is disabled by the
> application, so that the application can access it as needed.
>

Yes, that would be very useful.

> Note that this would impose that the input packet processing of such DPDK
> applications be aware of the CRC presence (+4 in the packet length , for
> instance).

Or perhaps, to maintain backward compatibility, just a flag inside the mbuf could be set that informs the user that 4 bytes with the CRC can be found at the end of the mbuf packet.

>
> Let's see what others, if any, that might care think about such a change into
> the CRC stripping semantics.

Thanks!
Francesco
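The one-line read quoted above can be sketched without DPDK headers. The sketch assumes a single-segment packet, and it assumes the driver really leaves the 4 CRC bytes in the buffer right after the reported length, which is only Francesco's reading of the ixgbe code. `struct fake_mbuf` and `read_trailing_crc` are hypothetical names, not part of DPDK. The copy uses memcpy rather than a pointer cast because the CRC offset is not guaranteed to be 4-byte aligned.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for the two mbuf fields that matter here;
 * this is not the real struct rte_mbuf. */
struct fake_mbuf {
    uint8_t *data;     /* start of the frame data */
    uint32_t pkt_len;  /* frame length with the 4 CRC bytes already subtracted */
};

/* Copy the 4 bytes that sit directly after the reported frame length.
 * The value is returned in the byte order found in the buffer. */
static uint32_t
read_trailing_crc(const struct fake_mbuf *m)
{
    uint32_t crc;

    memcpy(&crc, m->data + m->pkt_len, sizeof(crc));
    return crc;
}
```

For a multi-segment packet the offset would have to be taken in the last segment using its data_len, so this sketch does not cover that case.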
[dpdk-dev] Future Direction for rte_eth_stats_get()
I see that some of the rte_eth_stats have been marked deprecated in 2.2 that are returned by rte_eth_stats_get(). Applications that utilize any number of device types rely on functions like this one to debug I/O issues. Is there a reason the stats have been deprecated? Why not keep the stats in line with the standard linux practices such as rtnl_link_stats64? Note, using rte_eth_xstats_get() does not help for this particular scenario because a common binary API is needed to communicate through various layers and also provide a consistent view/meaning to users. The xstats is excellent for debugging device specific scenarios but can't help in scenarios where a static view is expected. Thanks, Dave
[dpdk-dev] [PATCH] ip_pipeline: fix cpu socket-id error
On Wed, 20 Jan 2016 11:01:17 + Jasvinder Singh wrote:
> +static inline int
> +app_get_cpu_socket_id(uint32_t pmd_id)
> +{
> +    int status = rte_eth_dev_socket_id(pmd_id);
> +
> +    if (status == -1)
> +        return 0;
> +
> +    return status;
> +

Why not:
    return (status != SOCKET_ID_ANY) ? status : 0;
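For reference, the suggested one-liner dropped into the helper could look like the sketch below. To keep it standalone, `stub_eth_dev_socket_id` is a hypothetical replacement for rte_eth_dev_socket_id(), and SOCKET_ID_ANY is redefined locally as -1 to mirror DPDK's definition.

```c
#include <stdint.h>

/* Local copy of DPDK's value so the sketch builds on its own. */
#define SOCKET_ID_ANY -1

/* Hypothetical stub: port 0 reports "any socket", other ports report socket 1. */
static int
stub_eth_dev_socket_id(uint32_t pmd_id)
{
    return (pmd_id == 0) ? SOCKET_ID_ANY : 1;
}

/* Fall back to socket 0 when the device does not report a NUMA socket. */
static inline int
app_get_cpu_socket_id(uint32_t pmd_id)
{
    int status = stub_eth_dev_socket_id(pmd_id);

    return (status != SOCKET_ID_ANY) ? status : 0;
}
```

The behaviour is the same as the patch; the named constant only makes the meaning of -1 explicit.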
[dpdk-dev] [PATCH] ethdev: expose link status and speed using xstats
On Wed, 20 Jan 2016 16:13:34 +0100 Thomas Monjalon wrote:
> > We already have the rte_eth_link_get function. Why not let users
> > continue to use that? It's well defined, it is simple, and it is
> > consistent.

+1

APIs should not duplicate results (DRY).

That said, it would be useful to have some way to get statistics on the number of link transitions and the time since the last change. Ideally this would be part of rte_eth_link_get(), but that wouldn't be ABI compatible.
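Until something like that exists in the API, an application could keep this bookkeeping itself each time it polls the link state. A minimal sketch follows; the struct and function names are hypothetical and nothing here is part of DPDK. The timestamp unit is whatever the caller passes in.

```c
#include <stdint.h>

/* Hypothetical per-port bookkeeping kept by the application. */
struct link_history {
    int      link_up;      /* last observed state, 0 = down */
    uint64_t transitions;  /* number of up/down changes seen so far */
    uint64_t last_change;  /* timestamp of the most recent change */
};

/* Call after each link poll with the observed state and the current time. */
static void
link_history_update(struct link_history *h, int link_up, uint64_t now)
{
    if (link_up == h->link_up)
        return;
    h->link_up = link_up;
    h->transitions++;
    h->last_change = now;
}

/* Time elapsed since the last observed transition. */
static uint64_t
link_history_age(const struct link_history *h, uint64_t now)
{
    return now - h->last_change;
}
```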
[dpdk-dev] [PATCH 2/4] i40e: split function for input set change of hash and fdir
Hi Jingjing,

As I can see, this patch not only splits fdir functionality from the common fdir/hash code but also removes compatibility with DPDK 2.2, as it deletes I40E_INSET_FLEX_PAYLOAD from the valid fdir input set values. Yes, flexible payload configuration can be set for fdir separately at port initialization, but this is more of a legacy from previous generations of NICs which did not support dynamic input set configuration. I believe it would be better to keep I40E_INSET_FLEX_PAYLOAD valid for the fdir input set, same as in DPDK 2.2. So in legacy mode, when an application has to run on an old NIC and on a new one, only the legacy configuration would be used, but for applications targeting new HW a single point of configuration would be used instead of a mix of the two.

Regards,
Andrey

> -----Original Message-----
> From: Wu, Jingjing
> Sent: Friday, December 25, 2015 8:30 AM
> To: dev at dpdk.org
> Cc: Wu, Jingjing; Zhang, Helin; Chilikin, Andrey; Pei, Yulong
> Subject: [PATCH 2/4] i40e: split function for input set change of hash and fdir
>
> This patch splited function for input set change of hash and fdir, and added a
> new function to set the input set to default when initialization.
> > Signed-off-by: Jingjing Wu > --- > drivers/net/i40e/i40e_ethdev.c | 330 > + > drivers/net/i40e/i40e_ethdev.h | 11 +- > drivers/net/i40e/i40e_fdir.c | 5 +- > 3 files changed, 180 insertions(+), 166 deletions(-) > > diff --git a/drivers/net/i40e/i40e_ethdev.c b/drivers/net/i40e/i40e_ethdev.c > index bf6220d..b919aac 100644 > --- a/drivers/net/i40e/i40e_ethdev.c > +++ b/drivers/net/i40e/i40e_ethdev.c > @@ -262,7 +262,8 @@ > #define I40E_REG_INSET_FLEX_PAYLOAD_WORD7 > 0x0080ULL > /* 8th word of flex payload */ > #define I40E_REG_INSET_FLEX_PAYLOAD_WORD8 > 0x0040ULL > - > +/* all 8 words flex payload */ > +#define I40E_REG_INSET_FLEX_PAYLOAD_WORDS > 0x3FC0ULL > #define I40E_REG_INSET_MASK_DEFAULT 0xULL > > #define I40E_TRANSLATE_INSET 0 > @@ -373,6 +374,7 @@ static int i40e_dev_udp_tunnel_add(struct rte_eth_dev > *dev, > struct rte_eth_udp_tunnel *udp_tunnel); static > int i40e_dev_udp_tunnel_del(struct rte_eth_dev *dev, > struct rte_eth_udp_tunnel *udp_tunnel); > +static void i40e_filter_input_set_init(struct i40e_pf *pf); > static int i40e_ethertype_filter_set(struct i40e_pf *pf, > struct rte_eth_ethertype_filter *filter, > bool add); > @@ -787,6 +789,8 @@ eth_i40e_dev_init(struct rte_eth_dev *dev) >* It should be removed once issues are fixed in NVM. 
>*/ > i40e_flex_payload_reg_init(hw); > + /* Initialize the input set for filters (hash and fd) to default value > */ > + i40e_filter_input_set_init(pf); > > /* Initialize the parameters for adminq */ > i40e_init_adminq_parameter(hw); > @@ -6545,43 +6549,32 @@ i40e_get_valid_input_set(enum i40e_filter_pctype > pctype, >*/ > static const uint64_t valid_fdir_inset_table[] = { > [I40E_FILTER_PCTYPE_FRAG_IPV4] = > - I40E_INSET_IPV4_SRC | I40E_INSET_IPV4_DST | > - I40E_INSET_FLEX_PAYLOAD, > + I40E_INSET_IPV4_SRC | I40E_INSET_IPV4_DST, > [I40E_FILTER_PCTYPE_NONF_IPV4_UDP] = > I40E_INSET_IPV4_SRC | I40E_INSET_IPV4_DST | > - I40E_INSET_SRC_PORT | I40E_INSET_DST_PORT | > - I40E_INSET_FLEX_PAYLOAD, > + I40E_INSET_SRC_PORT | I40E_INSET_DST_PORT, > [I40E_FILTER_PCTYPE_NONF_IPV4_TCP] = > - I40E_INSET_IPV4_SRC | I40E_INSET_IPV4_DST | > - I40E_INSET_SRC_PORT | I40E_INSET_DST_PORT | > - I40E_INSET_FLEX_PAYLOAD, > + I40E_INSET_IPV4_SRC | I40E_INSET_IPV4_DST, > [I40E_FILTER_PCTYPE_NONF_IPV4_SCTP] = > I40E_INSET_IPV4_SRC | I40E_INSET_IPV4_DST | > I40E_INSET_SRC_PORT | I40E_INSET_DST_PORT | > - I40E_INSET_SCTP_VT | I40E_INSET_FLEX_PAYLOAD, > + I40E_INSET_SCTP_VT, > [I40E_FILTER_PCTYPE_NONF_IPV4_OTHER] = > - I40E_INSET_IPV4_SRC | I40E_INSET_IPV4_DST | > - I40E_INSET_FLEX_PAYLOAD, > + I40E_INSET_IPV4_SRC | I40E_INSET_IPV4_DST, > [I40E_FILTER_PCTYPE_FRAG_IPV6] = > - I40E_INSET_IPV6_SRC | I40E_INSET_IPV6_DST | > - I40E_INSET_FLEX_PAYLOAD, > + I40E_INSET_IPV6_SRC | I40E_INSET_IPV6_DST, > [I40E_FILTER_PCTYPE_NONF_IPV6_UDP] = > - I40E_INSET_IPV6_SRC | I40E_INSET_IPV6_DST | > - I40E_INSET_SRC_PORT | I40E_INSET_DST_PORT | > - I40E_INSET_FLEX_PAYLOAD, > + I40E_INSET_IPV6_SRC | I40E_INSET_IPV6_DST, > [I40E_FILTER_PCTYPE_NONF_IPV6_TCP] = > - I40E_INSET_IPV6_SRC | I40E_INSET_IPV6_DST | > - I40E_INS
[dpdk-dev] Problem with Intel i40e XL710 dpdk driver
I found DMAR errors while bringing up the ports other than port 0. Rebooting the kernel with intel_iommu=off fixes it, and the dpdk i40e driver initializes fine for all ports.

For some reason, the intel qcu64e mode change utility doesn't work if I want to change the mode back from 4x10 to 2x40. I tried several times but it never reverts back to 2x40 from 4x10, though that is a different issue.

Regards,
-Karthick

On Thu, Jan 14, 2016 at 6:26 PM, Karthick, A.R. wrote:
> Hi,
> I am seeing a "Failed to init adminq: -54" or admin queue timeouts
> while initializing the admin queue for i40e xl710 intel nic.
> (Intel server is a E5-2670)
>
> First things first.
> I am running the latest firmware.
> The kernel module is not loaded and yes, it works with the i40e kernel
> driver. (latest or otherwise)
> And this problem comes even with dpdk 2.0/2.1 or the latest stable. So
> there's that.
>
> I have done a bunch of debugging and here are my findings.
> With the card configured in 2x40g or 4x10g mode, it _ALWAYS_ works with
> successfully initializing pci function 0 or port 0.
> It always fails to subsequently initialize the rest.
> Even if unbind the igb uio for port 0 and bind only port 1 or port 2,3,4
> in 4x10g mode,
> it fails.
>
> Since it works with the kernel driver, I tried to see if there were
> differences in the way registers are setup for i40e driver in kernel and
> dpdk.
> They look mostly to be the same but obviously there were subtle
> differences.
> From what I could fathom, I couldn't see much and whatever little was
> caught, I tried to keep the dpdk code in sync and it still failed.
>
> While stepping through gdb all the way from eal pci to pci uio map to
> eth_i40e_dev_init,
> to the failure in obtaining the firmware revision for port1 during
> i40e_init_adminq,
> I did confirm that the memory map was right for the pci.
>
> So the hw->hw_addr looks correct for port 1 correlating it to the uio1
> map or the physical address from lspci or kernel driver when using the
> kernel driver which works.
>
> However the admin queue seems to be not processing any request for port 1.
> Note that port 0 always works and its the same code for others with a
> different eal dev/hw instance.
>
> But for other ports like port1, after correctly setting up the adminq
> registers and memory map,
> it always fails to obtain the firmware revision since the i40e_asq_done
> is returning 0 for the
> head register at 0x80300 and doesn't match the next_in_use when starting
> at 1.
> So it always returns pending or false in i40e_asq_done which is retried a
> certain times after resetting the aq by i40e_init_adminq but ultimately
> gives up.
>
> Thoughts and wondering if you guys have seen this and have a fix or patch
> that is not in upstream yet.
>
> Failure enclosed below as mentioned above in detail: (with a 4x10g mode
> for the card but same failure with 2x40g mode as well. No difference. Port
> 0 always succeeds but subsequent ports fail.
> And same result even with port 0 not bound and starting with the
> initialization of port 2,3,4 which always fails.
>
> EAL: lcore 1 is ready (tid=6bd30700;cpuset=[1])
> EAL: PCI device :01:00.0 on NUMA socket 0
> EAL: probe driver: 8086:1521 rte_igb_pmd
> EAL: Not managed by a supported kernel driver, skipped
> EAL: PCI device :01:00.1 on NUMA socket 0
> EAL: probe driver: 8086:1521 rte_igb_pmd
> EAL: Not managed by a supported kernel driver, skipped
> EAL: PCI device :83:00.0 on NUMA socket 1
> EAL: probe driver: 8086:1583 rte_i40e_pmd
> EAL: PCI memory mapped at 0x7f2f8000
> EAL: PCI memory mapped at 0x7f2f8080
> PMD: eth_i40e_dev_init(): FW 4.40 API 1.4 NVM 04.05.03 eetrack 80001dca
> PMD: i40e_pf_parameter_init(): Max supported VSIs:34
> PMD: i40e_pf_parameter_init(): PF queue pairs:64
> PMD: i40e_pf_parameter_init(): Max VMDQ VSI num:34
> PMD: i40e_pf_parameter_init(): VMDQ queue pairs:4
> EAL: PCI device :83:00.1 on NUMA socket 1
> EAL: probe driver: 8086:1583 rte_i40e_pmd
> EAL: PCI memory mapped at 0x7f2f80808000
> EAL: PCI memory mapped at 0x7f2f81008000
> PMD: eth_i40e_dev_init(): Failed to init adminq: -54
> EAL: Error - exiting with code: 1
> Cause: Requested device :83:00.1 cannot be used
>
> Regards,
> -Karthick
>
[dpdk-dev] L3 Forwarding performance of DPDK on virtio
I am running dpdk within a virtual guest as an L3 forwarder.

The VM has two ports connecting to two linux bridges (in turn connecting two physical ports). DPDK is used to forward between these two ports (one port connected to the traffic generator and the other connected to the sink). I used iperf to test the throughput.

If the VM/DPDK is running on passthrough, it can achieve around 10G end-to-end (from traffic generator to sink) throughput. However, if the VM/DPDK is running on virtio (virtio-net-pmd), it achieves just 150M throughput, which is a huge degradation.

On virtio, I also measured the throughput between the traffic generator and its connected port on the VM, as well as the throughput between the sink and its VM port. Both legs show around 7.5G throughput. So I guess forwarding within the VM (from one port to the other) is what kills the performance.

Any suggestion on how I can root cause the poor performance, or any idea on performance tuning techniques for virtio? Thanks a lot!
[dpdk-dev] Status of Linux Foundation
Hello,

I was just reading the following blog post about downscaling community involvement at the Linux Foundation.

http://mjg59.dreamwidth.org/39546.html

I wondered if any of the issues discussed there might be relevant for the governance efforts moving forward on DPDK?

Sincerely,
Matthew.
[dpdk-dev] L3 Forwarding performance of DPDK on virtio
Sorry. It's L2 forwarding. I used testpmd with forwarding mode, like

    testpmd --pci-blacklist :00:05.0 -c f -n 4 -- --portmask 3 -i \
        --total-num-mbufs=2 --nb-cores=3 --mbcache=512 --burst=512 \
        --forward-mode=mac --eth-peer=0,90:e2:ba:9f:95:94 \
        --eth-peer=1,90:e2:ba:9f:95:95

On Wed, Jan 20, 2016 at 5:25 PM, Tan, Jianfeng wrote:
>
> Hello!
>
>
> On 1/21/2016 7:51 AM, Clarylin L wrote:
>
>> I am running dpdk within a virtual guest as a L3 forwarder.
>>
>>
>> The VM has two ports connecting to two linux bridges (in turn connecting
>> two physical ports). DPDK is used to forward between these two ports (one
>> port connected to traffic generator and the other connected to sink). I
>> used iperf to test the throughput.
>>
>>
>> If the VM/DPDK is running on passthrough, it can achieve around 10G
>> end-to-end (from traffic generator to sink) throughput. However if the
>> VM/DPDK is running on virtio (virtio-net-pmd), it achieves just 150M
>> throughput, which is a huge degrade.
>>
>>
>> On the virtio, I also measured the throughput between the traffic
>> generator
>> and its connected port on VM, as well as throughput between the sink and
>> it's VM port. Both legs show around 7.5G throughput. So I guess forwarding
>> within the VM (from one port to the other) would be a big killer of the
>> performance.
>>
>>
>> Any suggestion on how I can root cause the poor performance issue, or any
>> idea on performance tuning techniques for virtio? thanks a lot!
>>
>
> The L3 forwarder, you mentioned, is the l3fwd example in DPDK? If so, I
> doubt it can work well with virtio, see another thread "Add API to get
> packet type info".
>
> Thanks,
> Jianfeng
>
[dpdk-dev] [PKTGEN] fixing weird termio issues that complicate debugging
On 1/20/16 8:26 AM, Wiles, Keith wrote:
> One problem is a number of people wanted to steal the code and use in a paid
> application, so the copyright is somewhat a requirement. As you may know I
> do a lot of debugging on Pktgen and I feel they are a nuisance. I can try to
> see if we can clean up these messages, but do not hold your breath on getting
> them to be removed.

Understood, I am just providing some usability feedback from the community. Any cleanup, however partial it may be for other reasons, will personally aid me in simplicity of debugging and using pktgen to find performance improvements in other community applications and DPDK itself, which is my true end goal here. In particular I need it for all the changes I posted at various points for librte_lpm so I can test all this stuff to make sure it really works.

> IMO most of the information from DPDK is not very useful as why do I need to
> see every lcore line, plus a lot of more useless information. Most of the
> information could be reduced to a couple of lines or only report issues not just
> a bunch of useless information.

DPDK's messages might not be helpful for you, but in my case, the temporary hostile modifications I made based on the writeup sent previously, in order to make these messages visible again, are what allowed me to find and fix the root causes of my inactive port issues. I have been working with DPDK's messages since 2011 and am very familiar with what they mean inside DPDK itself, so they were the only UI of Pktgen familiar to me at all, compared to the rest which is custom stuff I didn't use before.

> The screen init should be scrolling the information off the screen to
> preserve that info, unless it was changed by mistake.

I found a lot of info is being overwritten or lost due to the complex sequence of all these calls. This is what led to my email of questions for you.

> Please use tab stop of 4 instead of 8. IMO tab stop of 8 is so 1970s and we
> should not need tab stop of 8 as any system today will work. :-)

OK. But do note that this convention is different from every other project I've coded on before.

Sincerely,
Matthew.
[dpdk-dev] [PATCH] rte.extvars.mk: allow overriding RTE_SDK_BIN from the environment
On 1/20/16 7:27 AM, Thomas Monjalon wrote:
> Hi Matthew,
>
> RTE_SDK_BIN is an internal variable and should not be overridden.
>
> Have you installed DPDK somewhere? Example:
> make install O=mybuild DESTDIR=mylocalinstall
>
> Then you should build your app like this:
> make RTE_SDK=$(readlink -e ../dpdk/mylocalinstall/usr/local/share/dpdk)

Hello Thomas,

Is the way the make install target really works documented somewhere?

This target did not exist when I first used DPDK in 2011, and since then I saw various documentation on building DPDK in various places, but not that much explanation of what make install actually does. I recall various list threads about changing its behavior as well.

For example, if I look at this apparently most official document:

http://dpdk.org/doc/guides/linux_gsg/build_dpdk.html

It has build examples such as:

    make install T=x86_64-native-linuxapp-gcc

But it does not discuss "O=" or "DESTDIR=" or any other additional options.

From some experiments on my machine, it looks like maybe I could do this:

    make install "T=${RTE_TARGET}" "O=build" "DESTDIR=build"

Is that a valid possibility, to keep it all in one easy directory?

Thanks,
Matthew.
[dpdk-dev] [PKTGEN] [PATCH 1/2] usage_pktgen.rst: multiple instances: clean up section intro
Signed-off-by: Matthew Hall
---
 docs/source/usage_pktgen.rst | 18 +-
 1 file changed, 9 insertions(+), 9 deletions(-)

diff --git a/docs/source/usage_pktgen.rst b/docs/source/usage_pktgen.rst
index 20bd314..efe8aa4 100644
--- a/docs/source/usage_pktgen.rst
+++ b/docs/source/usage_pktgen.rst
@@ -103,15 +103,15 @@ Multiple Instances of Pktgen or other application
 =

 One possible solution I use and if you have enough ports available to use.
-Lets say you need two ports for your application, but you have 4 ports in
-your system. I physically loop back the cables to have port 0 connect to
-port 2 and port 1 connected to port 3. Now I can give two ports to my
-application and two ports to Pktgen.
-
-Setup if pktgen and your application you have to startup each one a bit
-differently to make sure they share the resources like memory and the
-ports. I will use two Pktgen running on the same machine, which just means
-you have to setup your application as one of the applications.
+Let's say you need two ports for your application, but you have 4 ports in
+your system. I physically loop back the cables to have port 0 connect to port
+2 and port 1 connected to port 3. Now I can give two ports to my application
+and two ports to Pktgen.
+
+If you are running pktgen and your application together, you have to start up
+each one a bit differently to make sure they share the resources like memory
+and the ports. I will use two Pktgens running on the same machine, which just
+means you have imagine your application as one of the applications.

 In my machine I have 8 10G ports and 72 lcores between 2 sockets. Plus I have
 1024 hugepages per socket for a total of 2048.
--
2.5.0
[dpdk-dev] [PKTGEN] [PATCH 2/2] usage_pktgen.rst: multiple instances: clarify EAL options needed
Signed-off-by: Matthew Hall
---
 docs/source/usage_pktgen.rst | 15 +++
 1 file changed, 15 insertions(+)

diff --git a/docs/source/usage_pktgen.rst b/docs/source/usage_pktgen.rst
index efe8aa4..223d033 100644
--- a/docs/source/usage_pktgen.rst
+++ b/docs/source/usage_pktgen.rst
@@ -157,4 +157,19 @@ The -m option then assigns lcores to the ports.
 The information from above is taken from two new files pktgen-master.sh and
 pktgen-slave.sh, have a look at them and adjust as you need.

+The following DPDK / EAL options must be configured correctly as well:
+
+* ``-l lcore_id_list``: non-conflicting list of lcores for each app
+
+* ``--master-lcore lcore_id``: non-conflicting master lcore for each app
+
+* ``-m hugepage_mb / --socket-mem hugepage_mb_list``: non-conflicting amount
+of hugepage memory for each app, or for each app on each CPU socket
+
+* ``--no-shconf``: prevents DPDK from claiming a lockfile that breaks
+concurrent use of multiple apps
+
+* ``--file-prefix``: assigns a unique name to the hugepage mmap() files for
+each app
+
 Pktgen can also be configured using the :ref:`commands`.
--
2.5.0
[dpdk-dev] [PKTGEN] additional terminal IO question
If I try using pktgen theme mode (-T) or unmodified, without commenting out some of the stuff I mentioned I disabled for debugging in the previous thread, it seems like it sets the pktgen prompt to be invisible (black text on black??? I'm not sure just what) on my TTY, which has a black background.

If you quit the app it does not reset the colors, so my shell is also invisible until I blindly run the reset command.

Did anybody else try it on a black background? Did anybody else see these issues with it as well?

Matthew.
[dpdk-dev] [PKTGEN] additional terminal IO question
On 1/20/16 10:00 PM, Arnon Warshavsky wrote: > Black background gets me to the blind reset as well. > Pktgen is the only tab I keep with non black background.. Thanks for confirming. Never had this many termio issues before so I was wondering if I just went totally crazy! Matthew.
[dpdk-dev] [PKTGEN] dumb question: how to start packet TX and set the payload
Hello,

I was trying to just use the default PKT file, test/set_seq.pkt, like so:

    sudo "./app/app/${RTE_TARGET}/pktgen" \
        -l 2,3 \
        --master-lcore 2 \
        -n 2 \
        -m 1024 \
        -w 0a:00.1 \
        --no-shconf \
        --file-prefix pktgen \
        -- \
        -P \
        -m 2.0 \
        -f test/set_seq.pkt

After pktgen loaded, port 0 is marked as UP. So I typed "start all" and also tried "str". Sadly, so far, it seems like I could not get this to actually begin sending any packets. At least, no counters are incrementing in the pktgen UI, so I wasn't sure how to check whether it is really sending or not.

The documentation talks about many different commands available, but it doesn't specifically say how to start transmitting the packets based on the content of your *.pkt script file. I'm just trying to figure out what I messed up so that I can write (another) doc patch besides the one I just sent a moment ago.

I was also curious about putting some specific payloads into the packets in pktgen. There are many ways of configuring the packet size, but the documentation doesn't say how and where to set the packet content. This is important for my app as its performance will go up and down depending on whether the L4-L7 data has "interesting" content inside or not.

Sincerely,
Matthew.
[dpdk-dev] [PATCH] eal: add function to check if primary proc alive
On 1/20/16 10:14 PM, Qiu, Michael wrote:
> As we could start up many primaries, how does your secondary process
> work with them?

I just worked on this tonight myself. When running more than one primary (for example pktgen and app), I had to specify:

    --no-shconf --file-prefix pktgen --file-prefix app

Otherwise you get a panic and RTE fails to init. The file-prefix seems to get applied both to the hugepage mmap() files and also to the lockfiles in /var/run:

    $ ls -a /var/run | egrep -i '^\.'
    .
    ..
    .pktgen_hugepage_info
    .rte_config
    .rte_hugepage_info
    .sdn_sensor_hugepage_info

So I think you have to keep the different primary-secondary sets separate using --file-prefix.

Matthew.
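The listing above suggests the prefix is simply spliced into the runtime file names, with "rte" as the default. A small sketch of that naming pattern follows. It is inferred only from the ls output in this mail, the helper name is hypothetical, and the directory DPDK actually uses may differ by version and privileges.

```c
#include <stdio.h>

/* Hypothetical helper: mimics the ".<prefix>_hugepage_info" names seen in
 * the ls output; DPDK builds these paths internally. */
static int
hugepage_info_path(char *buf, size_t len, const char *prefix)
{
    return snprintf(buf, len, "/var/run/.%s_hugepage_info", prefix);
}
```

Two primaries started with different prefixes end up with different file names under this pattern, which is why they stop colliding.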