[dpdk-dev] [PATCH v3 17/25] virtio: Use port IO to get PCI resource.
Hi Changchun,

2015-01-29 15:24, Ouyang Changchun:
> Make virtio not require UIO for some security reasons, this is to match
> 6Wind's virtio-net-pmd.

Thanks for your effort.
I think port IO is a really interesting option but it needs more EAL rework
to be correctly integrated. Then virtio-net-pmd
(http://dpdk.org/browse/virtio-net-pmd/) will be obsolete and moved in a
deprecated area.

> --- a/config/common_linuxapp
> +++ b/config/common_linuxapp
> +# Only for VIRTIO PMD currently
> +CONFIG_RTE_EAL_PORT_IO=n

This is the first problem. We must stop adding new build-time options.
We should be able to choose between PCI mapping and port IO at runtime.

> +/** Device needs port IO(done with /proc/ioports) */
> +#ifdef RTE_EAL_PORT_IO
> +#define RTE_PCI_DRV_PORT_IO 0x0002
> +#endif

A flag should never be ifdef'ed.

> @@ -574,7 +574,10 @@ rte_eal_pci_probe_one_driver(struct rte_pci_driver *dr,
> struct rte_pci_device *d
> 		/* map resources for devices that use igb_uio */
> 		ret = pci_map_device(dev);
> 		if (ret != 0)
> -			return ret;
> +#ifdef RTE_EAL_PORT_IO
> +			if ((dr->drv_flags & RTE_PCI_DRV_PORT_IO) == 0)
> +#endif
> +				return ret;
> 	} else if (dr->drv_flags & RTE_PCI_DRV_FORCE_UNBIND &&
> 			rte_eal_process_type() == RTE_PROC_PRIMARY) {
> 		/* unbind current driver */

Why do you need this ugly return?

> --- a/lib/librte_pmd_virtio/virtio_ethdev.c
> +++ b/lib/librte_pmd_virtio/virtio_ethdev.c
> @@ -961,6 +961,71 @@ static int virtio_resource_init(struct rte_pci_device
> *pci_dev)
> 		start, size);
> 	return 0;
> }
> +
> +#ifdef RTE_EAL_PORT_IO
> +/* Extract port I/O numbers from proc/ioports */
> +static int virtio_resource_init_by_portio(struct rte_pci_device *pci_dev)
> +{
> +	uint16_t start, end;
> +	int size;
> +	FILE *fp;
> +	char *line = NULL;
> +	char pci_id[16];
> +	int found = 0;
> +	size_t linesz;
> +
> +	snprintf(pci_id, sizeof(pci_id), PCI_PRI_FMT,
> +		pci_dev->addr.domain,
> +		pci_dev->addr.bus,
> +		pci_dev->addr.devid,
> +		pci_dev->addr.function);
> +
> +	fp = fopen("/proc/ioports", "r");
> +	if (fp == NULL) {
> +		PMD_INIT_LOG(ERR, "%s(): can't open ioports", __func__);
> +		return -1;
> +	}
> +
> +	while (getdelim(&line, &linesz, '\n', fp) > 0) {
> +		char *ptr = line;
> +		char *left;
> +		int n;
> +
> +		n = strcspn(ptr, ":");
> +		ptr[n] = 0;
> +		left = &ptr[n+1];
> +
> +		while (*left && isspace(*left))
> +			left++;
> +
> +		if (!strncmp(left, pci_id, strlen(pci_id))) {
> +			found = 1;
> +
> +			while (*ptr && isspace(*ptr))
> +				ptr++;
> +
> +			sscanf(ptr, "%04hx-%04hx", &start, &end);
> +			size = end - start + 1;
> +
> +			break;
> +		}
> +	}
> +
> +	free(line);
> +	fclose(fp);
> +
> +	if (!found)
> +		return -1;
> +
> +	pci_dev->mem_resource[0].addr = (void *)(uintptr_t)(uint32_t)start;
> +	pci_dev->mem_resource[0].len = (uint64_t)size;
> +	PMD_INIT_LOG(DEBUG,
> +		"PCI Port IO found start=0x%lx with size=0x%lx",
> +		start, size);
> +	return 0;
> +}
> +#endif

This part should be a Linux EAL service.

> +#ifdef RTE_EAL_PORT_IO
> +static struct eth_driver rte_virtio_pmd = {
> +	{
> +		.name = "rte_virtio_pmd",
> +		.id_table = pci_id_virtio_map,
> +		.drv_flags = RTE_PCI_DRV_NEED_MAPPING | RTE_PCI_DRV_PORT_IO |

Why does it need PCI mapping in port IO mode?

> +		RTE_PCI_DRV_INTR_LSC,
> +	},
> +	.eth_dev_init = eth_virtio_dev_init,
> +	.dev_private_size = sizeof(struct virtio_hw),
> +};
> +#else
> static struct eth_driver rte_virtio_pmd = {
> 	{
> 		.name = "rte_virtio_pmd",

This is the biggest problem.
You are defining port IO as a different driver instead of providing a way
to choose the method for each virtio device.
I think that you should use devargs to configure the pci device.

Thanks
--
Thomas
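
For readers who want to experiment with the /proc/ioports lookup discussed in this review outside of the driver, here is a rough sketch in Python. It is not the EAL service requested above and not code from the patch: the function name find_port_range and the sample text are made up for the illustration, and it only mirrors the matching logic of virtio_resource_init_by_portio() (split at the first colon, compare the owner with the PCI address, read the hex range).

```python
def find_port_range(ioports_text, pci_addr):
    """Return (start, size) of the first I/O port range owned by pci_addr.

    ioports_text is the content of /proc/ioports; pci_addr uses the
    "domain:bus:devid.function" form, e.g. "0000:00:03.0".
    Returns None when no line matches.
    """
    for line in ioports_text.splitlines():
        # Split "  c000-c03f : 0000:00:03.0" at the first colon only;
        # the range part never contains one.
        port_range, sep, owner = line.partition(":")
        if not sep or not owner.strip().startswith(pci_addr):
            continue
        start_s, dash, end_s = port_range.strip().partition("-")
        if not dash:
            continue
        start, end = int(start_s, 16), int(end_s, 16)
        return start, end - start + 1
    return None


# Hypothetical excerpt, shaped like /proc/ioports on a virtio guest.
SAMPLE = """\
0000-0cf7 : PCI Bus 0000:00
  0cf8-0cff : PCI conf1
c000-c03f : 0000:00:03.0
  c000-c03f : virtio-pci
"""

print(find_port_range(SAMPLE, "0000:00:03.0"))  # (49152, 64), i.e. 0xc000 and 0x40
```

Keeping the parsing in one function that takes the file content as a string makes it easy to test without a real device, which is one argument for moving it out of the PMD.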
[dpdk-dev] [PATCH] ixgbe: forbid building vpmd without Rx bulk alloc
CONFIG_RTE_LIBRTE_IXGBE_RX_ALLOW_BULK_ALLOC is a prerequisite of
CONFIG_RTE_IXGBE_INC_VECTOR.

Reported-by: Alexander Belyakov
Signed-off-by: Thomas Monjalon
---
 lib/librte_pmd_ixgbe/Makefile | 4
 1 file changed, 4 insertions(+)

diff --git a/lib/librte_pmd_ixgbe/Makefile b/lib/librte_pmd_ixgbe/Makefile
index 3588047..33c17db 100644
--- a/lib/librte_pmd_ixgbe/Makefile
+++ b/lib/librte_pmd_ixgbe/Makefile
@@ -114,4 +114,8 @@ DEPDIRS-$(CONFIG_RTE_LIBRTE_IXGBE_PMD) += lib/librte_eal lib/librte_ether
 DEPDIRS-$(CONFIG_RTE_LIBRTE_IXGBE_PMD) += lib/librte_mempool lib/librte_mbuf
 DEPDIRS-$(CONFIG_RTE_LIBRTE_IXGBE_PMD) += lib/librte_net lib/librte_malloc

+ifeq ($(CONFIG_RTE_IXGBE_INC_VECTOR)$(CONFIG_RTE_LIBRTE_IXGBE_RX_ALLOW_BULK_ALLOC),yn)
+$(error The ixgbe vpmd depends on Rx bulk alloc)
+endif
+
 include $(RTE_SDK)/mk/rte.lib.mk
--
2.2.2
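
The ifeq test in this patch works by concatenating the two option values and comparing the result with "yn". A small Python sketch of the same truth table, for illustration only (the helper name vpmd_config_error is made up, and both options are assumed to hold either "y" or "n"):

```python
def vpmd_config_error(inc_vector, rx_bulk_alloc):
    """Mirror the Makefile test: concatenate both values, compare with "yn"."""
    return inc_vector + rx_bulk_alloc == "yn"


# Enumerate the four combinations of the two y/n options.
for vec in "yn":
    for bulk in "yn":
        state = "error" if vpmd_config_error(vec, bulk) else "ok"
        print("INC_VECTOR=%s RX_ALLOW_BULK_ALLOC=%s -> %s" % (vec, bulk, state))
```

Only the combination "vector enabled, bulk alloc disabled" is rejected; the other three build as before.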
[dpdk-dev] [PATCH] ixgbe: Fix an unnecessary check in vf rss
Hi PawelX > -Original Message- > From: Wodkowski, PawelX > Sent: Friday, January 30, 2015 12:14 AM > To: Ouyang, Changchun; Thomas Monjalon; Richardson, Bruce > Cc: dev at dpdk.org > Subject: RE: [dpdk-dev] [PATCH] ixgbe: Fix an unnecessary check in vf rss > > > -Original Message- > > From: dev [mailto:dev-bounces at dpdk.org] On Behalf Of Ouyang, > Changchun > > Sent: Wednesday, January 28, 2015 2:35 AM > > To: Thomas Monjalon > > Cc: dev at dpdk.org > > Subject: Re: [dpdk-dev] [PATCH] ixgbe: Fix an unnecessary check in vf > > rss > > > > Hi Thomas, > > > > > -Original Message- > > > From: Thomas Monjalon [mailto:thomas.monjalon at 6wind.com] > > > Sent: Tuesday, January 27, 2015 8:13 PM > > > To: Ouyang, Changchun > > > Cc: dev at dpdk.org > > > Subject: Re: [dpdk-dev] [PATCH] ixgbe: Fix an unnecessary check in > > > vf rss > > > > > > > To follow up the comments from Wodkowski, PawelX, remove this > > > > unnecessary check, as check_mq_mode has already check the queue > > > number > > > > in device configure stage, if the queue number of vf is not > > > > correct, it will return error code and exit, so it doesn't need > > > > check again here in device start stage(note: pf_host_configure is > > > > called in device start > > > stage). > > > > > > > > This fixes commit 42d2f78abcb77ecb769be4149df550308169ef0f > > > > > > > > Signed-off-by: Changchun Ouyang > > > > > > Suggested-by: Pawel Wodkowski > > > Fixes: 42d2f78abcb77 ("configure VF RSS") > > > > > > Applied > > > > > > > Thanks very much for the applying! > > > > > Changchun, as you are working on ixgbe, maybe you would like to > > > review some ixgbe patches from others? > > > > > > > No problem, I will try to do it when my bandwidth allows me to do it, > > :-) Thanks Changchun > > Actually I was suggesting exactly opposite direction. Main issue is that the > sriov field in rte_eth_dev_data is only used by igb and ixgbe drivers. In > addition > rte_eth_dev_check_mq_mode() is specialized for ixgbe driver. 
> > I am thinking about moving sriov from rte_eth_dev_data to driver's private > structure or at least move rte_eth_dev_check_mq_mode() to struct > eth_dev_ops as optional driver configuration step. > > What do you think about both steps? Good idea! I prefer moving rte_eth_dev_check_mq_mode() to eth_dev_ops as an optional driver configuration step. The reason is that other Ethernet device types may need the same kind of check in the future, or may even need to adjust some queue-number values in their own way. I can help review your patch after you send it out. Thanks for working on this enhancement. Changchun
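
To make the proposal in this thread concrete, here is a minimal sketch of an optional per-driver configuration step, written in Python for brevity. It is not DPDK code: EthDevOps, dev_configure and ixgbe_like_check are hypothetical names, and the queue limit is invented for the example. It only shows the shape of the idea, where the generic layer calls the driver's check when the driver provides one.

```python
class EthDevOps:
    """Per-driver operations table; check_mq_mode is an optional entry."""

    def __init__(self, check_mq_mode=None):
        self.check_mq_mode = check_mq_mode


def dev_configure(ops, nb_rx_q, nb_tx_q):
    """Generic configure step: run the driver's check only if it exists."""
    if ops.check_mq_mode is not None:
        ret = ops.check_mq_mode(nb_rx_q, nb_tx_q)
        if ret != 0:
            return ret
    return 0


def ixgbe_like_check(nb_rx_q, nb_tx_q):
    """Driver-specific check; the limit of 4 queues is an assumption."""
    max_q = 4
    if nb_rx_q > max_q or nb_tx_q > max_q:
        return -22  # -EINVAL
    return 0


print(dev_configure(EthDevOps(), 8, 8))                  # no check registered
print(dev_configure(EthDevOps(ixgbe_like_check), 8, 8))  # rejected by the driver
```

A driver without SR-IOV specifics simply leaves the entry empty, so the generic code no longer carries ixgbe-only logic.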
[dpdk-dev] [PATCH v3 00/18] ACL: New AVX2 classify method and several other enhancements.
Tested-by: Jingguo Fu - Tested Commit: 17f520d2cff8d69962824f28810f36e949a7184d - OS: Ubuntu14.04 3.13.0-24-generic - GCC: gcc version 4.8.2 - CPU: Intel(R) Xeon(R) CPU E5-2680 v2 @ 2.80GHz - NIC: Intel Corporation 82599ES 10-Gigabit SFI/SFP+ [8086:10fb] (rev 01) - Default x86_64-native-linuxapp-gcc configuration - Total 5 cases, 5 passed, 0 failed - Case: l3fwdACL_ACL_rule Description: l3fwd Access Control match ACL rule test Command / instruction: Add ACL rules: echo '' > /root/rule_ipv4.db echo 'R0.0.0.0/0 0.0.0.0/0 0 : 65535 0 : 65535 0x00/0x00 1' >> /root/rule_ipv4.db echo '' > /root/rule_ipv6.db echo 'R0:0:0:0:0:0:0:0/0 0:0:0:0:0:0:0:0/0 0 : 65535 0 : 65535 0x00/0x00 1' >> /root/rule_ipv6.db echo '' > /root/rule_ipv4.db echo @200.10.0.1/32 0.0.0.0/0 0 : 65535 0 : 65535 0x00/0x00 >> /root/rule_ipv4.db echo R0.0.0.0/0 0.0.0.0/0 0 : 65535 0 : 65535 0x00/0x00 1 >> /root/rule_ipv4.db Start l3fwd-ACL with rule_ipv4 and rule_ipv6 config # ./examples/l3fwd-ACL/build/l3fwd-ACL -c 0x3c1e03c1e -n 4 -- -p 0x3 --config="(0,0,2),(1,0,3)" --rule_ipv4="/root/rule_ipv4.db" --rule_ipv6="/root/rule_ipv6.db" Send packets by Scapy according to ACL rule Expected result: Application can filter packets by ACL rules Test Result: PASSED - Case: l3fwdACL_exact_route Description: l3fwd Access Control match Exact route rule test Command / instruction: Add ACL rules: echo '' > /root/rule_ipv4.db echo 'R0.0.0.0/0 0.0.0.0/0 0 : 65535 0 : 65535 0x00/0x00 1' >> /root/rule_ipv4.db echo '' > /root/rule_ipv6.db echo 'R0:0:0:0:0:0:0:0/0 0:0:0:0:0:0:0:0/0 0 : 65535 0 : 65535 0x00/0x00 1' >> /root/rule_ipv6.db echo '' > /root/rule_ipv4.db echo R200.10.0.1/32 100.10.0.1/32 11 : 11 101 : 101 0x06/0xff 0 >> /root/rule_ipv4.db echo R200.20.0.1/32 100.20.0.1/32 12 : 12 102 : 102 0x06/0xff 1 >> /root/rule_ipv4.db Start l3fwd-ACL with rule_ipv4 and rule_ipv6 config # ./examples/l3fwd-ACL/build/l3fwd-ACL -c 0x3c1e03c1e -n 4 -- -p 0x3 --config="(0,0,2),(1,0,3)" --rule_ipv4="/root/rule_ipv4.db" 
--rule_ipv6="/root/rule_ipv6.db" Send packets by Scapy according to route rule Expected result: ACL rule can filter packets Test Result: PASSED - Case: l3fwdACL_invalid Description: l3fwd Access Control handle Invalid rule test Command / instruction: Add ACL rules: echo '' > /root/rule_ipv4.db echo 'R0.0.0.0/0 0.0.0.0/0 0 : 65535 0 : 65535 0x00/0x00 1' >> /root/rule_ipv4.db echo '' > /root/rule_ipv6.db echo 'R0:0:0:0:0:0:0:0/0 0:0:0:0:0:0:0:0/0 0 : 65535 0 : 65535 0x00/0x00 1' >> /root/rule_ipv6.db echo '' > /root/rule_ipv4.db echo R0.0.0.0/0 0.0.0.0/0 12 : 11 0 : 65535 0x00/0x00 0 >> /root/rule_ipv4.db echo R0.0.0.0/0 0.0.0.0/0 0 : 65535 0 : 65535 0x00/0x00 1 >> /root/rule_ipv4.db Start l3fwd-ACL with rule_ipv4 and rule_ipv6 config # ./examples/l3fwd-ACL/build/l3fwd-ACL -c 0x3c1e03c1e -n 4 -- -p 0x3 --config="(0,0,2),(1,0,3)" --rule_ipv4="/root/rule_ipv4.db" --rule_ipv6="/root/rule_ipv6.db" Send packets by Scapy according to invalid rule Expected result: ACL rule can filter packets Test Result: PASSED - Case: l3fwdACL_lpm_route Description: l3fwd Access Control match Lpm route rule test Command / instruction: Add ACL rules: echo '' > /root/rule_ipv4.db echo 'R0.0.0.0/0 0.0.0.0/0 0 : 65535 0 : 65535 0x00/0x00 1' >> /root/rule_ipv4.db echo '' > /root/rule_ipv6.db echo 'R0:0:0:0:0:0:0:0/0 0:0:0:0:0:0:0:0/0 0 : 65535 0 : 65535 0x00/0x00 1' >> /root/rule_ipv6.db echo '' > /root/rule_ipv4.db echo R0.0.0.0/0 1.1.1.0/24 0 : 65535 0 : 65535 0x00/0x00 0 >> /root/rule_ipv4.db echo R0.0.0.0/0 2.1.1.0/24 0 : 65535 0 : 65535 0x00/0x00 1 >> /root/rule_ipv4.db Start l3fwd-ACL with rule_ipv4 and rule_ipv6 config # ./examples/l3fwd-ACL/build/l3fwd-ACL -c 0x3c1e03c1e -n 4 -- -p 0x3 --config="(0,0,2),(1,0,3)" --rule_ipv4="/root/rule_ipv4.db" --rule_ipv6="/root/rule_ipv6.db" Send packets by Scapy according to lpm route rule Expected result: ACL rule can filter packets Test Result: PASSED - Case: l3fwdACL_scalar Description: l3fwd Access Control match with Scalar function test Command 
/ instruction: Add ACL rules: echo '' > /root/rule_ipv4.db echo 'R0.0.0.0/0 0.0.0.0/0 0 : 65535 0 : 65535 0x00/0x00 1' >> /root/rule_ipv4.db echo '' > /root/rule_ipv6.db echo 'R0:0:0:0:0:0:0:0/0 0:0:0:0:0:0:0:0/0 0 : 65535 0 : 65535 0x00/0x00 1' >> /root/rule_ipv6.db echo '' > /root/rule_ipv4.db echo @200.10.0
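
The rule lines used in the test cases above follow a fixed field order: source prefix, destination prefix, source port range, destination port range, protocol/mask and, for route entries, a forwarding port. A rough Python sketch of a parser for that layout, assuming (as in the l3fwd-acl sample) that '@' introduces an ACL rule and 'R' a route entry; parse_rule is a made-up helper, not part of the test suite:

```python
def parse_rule(line):
    """Parse one l3fwd-acl style rule line into a dict.

    Raises ValueError when a port range has low > high, which is the
    kind of entry the l3fwdACL_invalid case writes on purpose.
    """
    tok = line.split()
    kind = {"@": "acl", "R": "route"}[tok[0][0]]
    sport = (int(tok[2]), int(tok[4]))   # tok[3] is the ':' separator
    dport = (int(tok[5]), int(tok[7]))   # tok[6] is the ':' separator
    for low, high in (sport, dport):
        if low > high:
            raise ValueError("port range low > high: %d : %d" % (low, high))
    proto, proto_mask = (int(x, 16) for x in tok[8].split("/"))
    return {
        "kind": kind,
        "src": tok[0][1:],
        "dst": tok[1],
        "sport": sport,
        "dport": dport,
        "proto": proto,
        "proto_mask": proto_mask,
        # Only route entries carry a forwarding port.
        "fwd_port": int(tok[9]) if kind == "route" else None,
    }


rule = parse_rule("R200.10.0.1/32 100.10.0.1/32 11 : 11 101 : 101 0x06/0xff 0")
print(rule["kind"], rule["src"], rule["dport"], rule["proto"], rule["fwd_port"])
```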
[dpdk-dev] [PATCH] igb: integrate flex filter to new API
changes: igb: remove old functions that deal with flex filter igb: add new functions that deal with flex filter(fit for new API) testpmd: change the entry for flex filter in cmdline testpmd: change function call to get flex filter in config doc: change doc that describes how to use flex filter related functions Signed-off-by: Zhida Zang --- app/test-pmd/cmdline.c | 244 ++-- app/test-pmd/config.c | 37 ++-- app/test-pmd/testpmd.h | 2 +- doc/guides/testpmd_app_ug/testpmd_funcs.rst | 56 ++--- lib/librte_ether/rte_eth_ctrl.h | 20 ++ lib/librte_pmd_e1000/e1000_ethdev.h | 20 ++ lib/librte_pmd_e1000/igb_ethdev.c | 331 +--- 7 files changed, 410 insertions(+), 300 deletions(-) diff --git a/app/test-pmd/cmdline.c b/app/test-pmd/cmdline.c index 4beb404..2f99d23 100644 --- a/app/test-pmd/cmdline.c +++ b/app/test-pmd/cmdline.c @@ -691,14 +691,19 @@ static void cmd_help_long_parsed(void *parsed_result, "get_syn_filter (port_id) " "get syn filter info.\n\n" - "add_flex_filter (port_id) len (len_value) bytes (bytes_string) mask (mask_value)" - " priority (prio_value) queue (queue_id) index (idx)\n" + "flex_filter add (port_id) len (len_value)" + " bytes (bytes_string) mask (mask_value)" + " priority (prio_value) queue (queue_id)\n" "add a flex filter.\n\n" - "remove_flex_filter (port_id) index (idx)\n" - "remove a flex filter.\n\n" + "flex_filter del (port_id) len (len_value)" + " bytes (bytes_string) mask (mask_value)" + " priority (prio_value)\n" + "del a flex filter.\n\n" - "get_flex_filter (port_id) index (idx)\n" + "flex_filter get (port_id) len (len_value)" + " bytes (bytes_string) mask (mask_value)" + "priority (prio_value)\n" "get info of a flex filter.\n\n" "flow_director_filter (port_id) (add|del)" @@ -7723,9 +7728,10 @@ cmdline_parse_inst_t cmd_get_5tuple_filter = { }, }; -/* *** ADD/REMOVE A flex FILTER *** */ +/* *** Add/Del/Get flex filter *** */ struct cmd_flex_filter_result { cmdline_fixed_string_t filter; + cmdline_fixed_string_t ops; uint8_t port_id; 
cmdline_fixed_string_t len; uint8_t len_value; @@ -7737,8 +7743,6 @@ struct cmd_flex_filter_result { uint8_t priority_value; cmdline_fixed_string_t queue; uint16_t queue_id; - cmdline_fixed_string_t index; - uint16_t index_value; }; static int xdigit2val(unsigned char c) @@ -7759,105 +7763,101 @@ cmd_flex_filter_parsed(void *parsed_result, __attribute__((unused)) void *data) { int ret = 0; - struct rte_flex_filter filter; + struct rte_eth_flex_filter filter; struct cmd_flex_filter_result *res = parsed_result; char *bytes_ptr, *mask_ptr; uint16_t len, i, j; char c; - int val, mod = 0; - uint32_t dword = 0; + int val; uint8_t byte = 0; uint8_t hex = 0; - if (!strcmp(res->filter, "add_flex_filter")) { - if (res->len_value > 128) { - printf("the len exceed the max length 128\n"); - return; - } - memset(&filter, 0, sizeof(struct rte_flex_filter)); - filter.len = res->len_value; - filter.priority = res->priority_value; - bytes_ptr = res->bytes_value; - mask_ptr = res->mask_value; - - j = 0; -/* translate bytes string to uint_32 array. */ - if (bytes_ptr[0] == '0' && ((bytes_ptr[1] == 'x') || - (bytes_ptr[1] == 'X'))) - bytes_ptr += 2; - len = strnlen(bytes_ptr, res->len_value * 2); - if (len == 0 || (len % 8 != 0)) { - printf("please check len and bytes input\n"); + if (res->len_value > 128) { + printf("the len exceed the max length 128\n"); + return; + } + memset(&filter, 0, sizeof(struct rte_eth_flex_filter)); + filter.len = res->len_value; + filter.priority = res->priority_value; + filter.queue = res->queue_id; + bytes_ptr = res->bytes_value; + mask_ptr = res->mask_value; + + j = 0; +/* translate bytes string to uint_32 array. */ + if (bytes_ptr[0] == '0' && ((bytes_ptr[1] == 'x') || + (bytes_ptr[1] == 'X'))) + bytes_ptr += 2; + len = strnlen
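
The command handler above turns the "bytes" argument, a hex string with an optional 0x prefix, into raw pattern bytes. As the diff is cut off here, the following Python sketch is only an illustration of that conversion under stated assumptions: parse_flex_bytes is a made-up name, the 128-byte limit comes from the check visible in the patch, and requiring exactly two hex digits per byte is a choice made for this example.

```python
MAX_FLEX_LEN = 128  # limit checked by the patch ("max length 128")


def parse_flex_bytes(bytes_string, length):
    """Convert a hex string such as "0x0011aaff" into `length` raw bytes."""
    if length > MAX_FLEX_LEN:
        raise ValueError("len exceeds the max length %d" % MAX_FLEX_LEN)
    # Accept an optional 0x / 0X prefix, as the C code does.
    if bytes_string[:2] in ("0x", "0X"):
        bytes_string = bytes_string[2:]
    digits = bytes_string[:length * 2]
    if length == 0 or len(digits) != length * 2:
        raise ValueError("please check len and bytes input")
    # bytes.fromhex raises ValueError on non-hex characters.
    return bytes.fromhex(digits)


print(parse_flex_bytes("0x0011aaff", 4))
```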
[dpdk-dev] [PATCH] scripts: enable extended tag of PCIe
As 'extended tag' of PCIe needs to be enabled for i40e high performance, Linux command of 'setpci' can be used to check and set the corresponding bit of 'extended tag' of PCIe configuration space. The script is to check and set the right bit in PCIe configuration space to enable 'extended tag'. Signed-off-by: Zhida Zang --- tools/set_pci.py | 124 +++ 1 file changed, 124 insertions(+) create mode 100755 tools/set_pci.py diff --git a/tools/set_pci.py b/tools/set_pci.py new file mode 100755 index 000..e242efb --- /dev/null +++ b/tools/set_pci.py @@ -0,0 +1,124 @@ +#! /usr/bin/python +import sys +import os +import subprocess +import getopt +from os.path import basename + +# The register to check if extended tag is supported or not. +PCI_DEV_CAP_REG = 0xA4 +# The control register which contains the bit to enable/disable 'extended tag'. +PCI_DEV_CTRL_REG = 0xA8 +# The mask of 'extended tag' in capability register. +PCI_DEV_CAP_EXT_TAG_MASK = 0x20 +# The mask of 'extended tag' in control register. +PCI_DEV_CTRL_EXT_TAG_MASK = 0x100 + +dev_ids = {} +flag = "Set" + + +def usage(): +'''Print usage information for the program''' +argv0 = basename(sys.argv[0]) +print """ +Usage: +-- + +%(argv0)s [options] DEVICE1 DEVICE2 + +where DEVICE1, DEVICE2 etc, are specified via PCI "domain:bus:slot.func" syntax +or "bus:slot.func" syntax. For devices bound to Linux kernel drivers, they may +also be referred to by Linux interface name e.g. eth0, eth1, em0, em1, etc. 
+ +Options: +--help, --usage: +Display usage information and quit + +-s --set: +Set the following pci device + +-u --Unset: +Unset the following pci device + +Examples: +- +To set pci 0a:00.0 +%(argv0)s -s 0a:00.0 +%(argv0)s --set 0a:00.0 + +To unset :01:00.0 +%(argv0)s -u :01:00.0 +%(argv0)s --unset :01:00.0 + +To set :02:00.0 and :02:00.1 +%(argv0)s -s 02:00.0 02:00.1 + +""" % locals() # replace items from local variables + + +def parse_args(): +global flag +global dev_ids +if len(sys.argv) <= 1: +usage() +sys.exit(0) +try: +opts, dev_ids = getopt.getopt( +sys.argv[1:], +"su", +["help", "usage", "set", "unset"] +) +except getopt.GetoptError, error: +print str(error) +print "Run '%s --usage' for further information" % sys.argv[0] +sys.exit(1) + +for opt, arg in opts: +if opt == "--help" or opt == "--usage": +usage() +sys.exit(0) +if opt == "-s" or opt == "--set": +flag = "Set" +if opt == "-u" or opt == "--unset": +flag = "Unset" + + +def check_output(args, stderr=None): +'''Run a command and capture its output''' +return subprocess.Popen( +args, +stdout=subprocess.PIPE, +stderr=stderr +).communicate()[0] + + +def set_pci(): +if len(dev_ids) == 0: +print "Error: No devices specified." +print "Run '%s --usage' for further information" % sys.argv[0] +sys.exit(1) +param_cap = "%x.W" % PCI_DEV_CAP_REG +for k in range(len(dev_ids)): +val = check_output(["setpci", "-s", dev_ids[k], param_cap]) +if (not (int(val, 16) & PCI_DEV_CAP_EXT_TAG_MASK)): +print dev_ids[k], "Not supported" +continue +if (int(val, 16) & PCI_DEV_CTRL_EXT_TAG_MASK): +continue +param_ctrl = "%x.W" % PCI_DEV_CTRL_REG +val = check_output(["setpci", "-s", dev_ids[k], param_ctrl]) +if flag == "Set": +val = int(val, 16) | PCI_DEV_CTRL_EXT_TAG_MASK +else: +val = int(val, 16) & ~PCI_DEV_CTRL_EXT_TAG_MASK +param_ctrl = "%x.W=%x" % (PCI_DEV_CTRL_REG, val) +check_output(["setpci", "-s", dev_ids[k], param_ctrl]) + + +def main(): +parse_args() +set_pci() + +if __name__ == "__main__": +main() -- 1.9.3
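
The register handling in this script can be separated from the setpci calls, which makes it easier to reason about. A minimal Python 3 sketch, assuming the two masks defined in the patch and reading the enable bit from the control register value; next_ctrl_value is a hypothetical helper, not part of the posted script:

```python
# Masks as defined in the posted script.
PCI_DEV_CAP_EXT_TAG_MASK = 0x20    # "extended tag supported" bit (capability)
PCI_DEV_CTRL_EXT_TAG_MASK = 0x100  # "extended tag enable" bit (control)


def next_ctrl_value(cap, ctrl, enable):
    """Return the control-register value to write, or None.

    cap and ctrl are the 16-bit values read from the capability and
    control registers. None means nothing has to be written: either the
    device does not support extended tags or the bit is already in the
    requested state.
    """
    if not cap & PCI_DEV_CAP_EXT_TAG_MASK:
        return None
    if enable:
        wanted = ctrl | PCI_DEV_CTRL_EXT_TAG_MASK
    else:
        wanted = ctrl & ~PCI_DEV_CTRL_EXT_TAG_MASK
    return None if wanted == ctrl else wanted


# setpci prints register values as hex strings; these two are invented.
new = next_ctrl_value(int("8fe4", 16), int("2810", 16), enable=True)
print("%x" % new)  # 2910
```

With the decision isolated like this, the calling code only needs to read two registers, call the helper, and issue a write when the result is not None.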
[dpdk-dev] [PATCH 00/18] lib/librte_pmd_fm10k : fm10k pmd driver
From: "Chen Jing D(Mark)"

The patch set adds a poll mode driver for the host interface of Intel Red Rock
Canyon silicon, which integrates NIC and switch functionality. The patch set
includes the following features:

1. Basic RX/TX functions for PF/VF.
2. Interrupt handling mechanism for PF/VF.
3. Per-queue start/stop functions for PF/VF.
4. Mailbox handling between PF/VF and PF/Switch Manager.
5. Receive Side Scaling (RSS) for PF/VF.
6. Scatter receive function for PF/VF.
7. RETA update/query for PF/VF.
8. VLAN filter set for PF.
9. Link status query for PF/VF.

Jeff Shaw (18):
  fm10k: add base driver
  Change config/ files to add macros for fm10k
  fm10k: Add empty fm10k files
  fm10k: add fm10k device id
  fm10k: Add code to register fm10k pmd PF driver
  fm10k: add reta update/requery functions
  fm10k: add rx_queue_setup/release function
  fm10k: add tx_queue_setup/release function
  fm10k: add RX/TX single queue start/stop function
  fm10k: add dev start/stop functions
  fm10k: add receive and transmit function
  fm10k: add PF RSS support
  fm10k: Add scatter receive function
  fm10k: add function to set vlan
  fm10k: Add SRIOV-VF support
  fm10k: add PF and VF interrupt handling function
  Change lib/Makefile to add fm10k driver into compile list.
Change mk/rte.app.mk to add fm10k lib into link config/common_bsdapp|9 + config/common_linuxapp |9 + lib/Makefile|1 + lib/librte_eal/common/include/rte_pci_dev_ids.h | 22 + lib/librte_pmd_fm10k/Makefile | 96 + lib/librte_pmd_fm10k/SHARED/fm10k_api.c | 327 lib/librte_pmd_fm10k/SHARED/fm10k_api.h | 60 + lib/librte_pmd_fm10k/SHARED/fm10k_common.c | 573 ++ lib/librte_pmd_fm10k/SHARED/fm10k_common.h | 52 + lib/librte_pmd_fm10k/SHARED/fm10k_mbx.c | 2186 +++ lib/librte_pmd_fm10k/SHARED/fm10k_mbx.h | 329 lib/librte_pmd_fm10k/SHARED/fm10k_osdep.h | 116 ++ lib/librte_pmd_fm10k/SHARED/fm10k_pf.c | 1877 +++ lib/librte_pmd_fm10k/SHARED/fm10k_pf.h | 152 ++ lib/librte_pmd_fm10k/SHARED/fm10k_tlv.c | 914 ++ lib/librte_pmd_fm10k/SHARED/fm10k_tlv.h | 199 ++ lib/librte_pmd_fm10k/SHARED/fm10k_type.h| 925 ++ lib/librte_pmd_fm10k/SHARED/fm10k_vf.c | 586 ++ lib/librte_pmd_fm10k/SHARED/fm10k_vf.h | 91 + lib/librte_pmd_fm10k/fm10k.h| 293 +++ lib/librte_pmd_fm10k/fm10k_ethdev.c | 1846 +++ lib/librte_pmd_fm10k/fm10k_logs.h | 66 + lib/librte_pmd_fm10k/fm10k_rxtx.c | 427 + mk/rte.app.mk |4 + 24 files changed, 11160 insertions(+), 0 deletions(-) create mode 100644 lib/librte_pmd_fm10k/Makefile create mode 100644 lib/librte_pmd_fm10k/SHARED/fm10k_api.c create mode 100644 lib/librte_pmd_fm10k/SHARED/fm10k_api.h create mode 100644 lib/librte_pmd_fm10k/SHARED/fm10k_common.c create mode 100644 lib/librte_pmd_fm10k/SHARED/fm10k_common.h create mode 100644 lib/librte_pmd_fm10k/SHARED/fm10k_mbx.c create mode 100644 lib/librte_pmd_fm10k/SHARED/fm10k_mbx.h create mode 100644 lib/librte_pmd_fm10k/SHARED/fm10k_osdep.h create mode 100644 lib/librte_pmd_fm10k/SHARED/fm10k_pf.c create mode 100644 lib/librte_pmd_fm10k/SHARED/fm10k_pf.h create mode 100644 lib/librte_pmd_fm10k/SHARED/fm10k_tlv.c create mode 100644 lib/librte_pmd_fm10k/SHARED/fm10k_tlv.h create mode 100644 lib/librte_pmd_fm10k/SHARED/fm10k_type.h create mode 100644 lib/librte_pmd_fm10k/SHARED/fm10k_vf.c create mode 100644 
lib/librte_pmd_fm10k/SHARED/fm10k_vf.h create mode 100644 lib/librte_pmd_fm10k/fm10k.h create mode 100644 lib/librte_pmd_fm10k/fm10k_ethdev.c create mode 100644 lib/librte_pmd_fm10k/fm10k_logs.h create mode 100644 lib/librte_pmd_fm10k/fm10k_rxtx.c -- 1.7.7.6
[dpdk-dev] [PATCH 02/18] Change config/ files to add macros for fm10k
From: Jeff Shaw Change config/common_bsdapp and config/common_linuxapp, add macros to control fm10k pmd driver compile for linux and bsd. Signed-off-by: Jeff Shaw Signed-off-by: Chen Jing D(Mark) --- config/common_bsdapp |9 + config/common_linuxapp |9 + 2 files changed, 18 insertions(+), 0 deletions(-) diff --git a/config/common_bsdapp b/config/common_bsdapp index 9177db1..b2fa259 100644 --- a/config/common_bsdapp +++ b/config/common_bsdapp @@ -182,6 +182,15 @@ CONFIG_RTE_LIBRTE_I40E_QUEUE_NUM_PER_VM=4 CONFIG_RTE_LIBRTE_I40E_ITR_INTERVAL=-1 # +# Compile burst-oriented FM10K PMD +# +CONFIG_RTE_LIBRTE_FM10K_PMD=y +CONFIG_RTE_LIBRTE_FM10K_DEBUG=n +CONFIG_RTE_LIBRTE_FM10K_DEBUG_RX=n +CONFIG_RTE_LIBRTE_FM10K_DEBUG_TX=n +CONFIG_RTE_LIBRTE_FM10K_RX_OLFLAGS_ENABLE=y + +# # Compile burst-oriented Cisco ENIC PMD driver # CONFIG_RTE_LIBRTE_ENIC_PMD=y diff --git a/config/common_linuxapp b/config/common_linuxapp index 2f9643b..3705e80 100644 --- a/config/common_linuxapp +++ b/config/common_linuxapp @@ -180,6 +180,15 @@ CONFIG_RTE_LIBRTE_I40E_QUEUE_NUM_PER_VM=4 CONFIG_RTE_LIBRTE_I40E_ITR_INTERVAL=-1 # +# Compile burst-oriented FM10K PMD +# +CONFIG_RTE_LIBRTE_FM10K_PMD=y +CONFIG_RTE_LIBRTE_FM10K_DEBUG=n +CONFIG_RTE_LIBRTE_FM10K_DEBUG_RX=n +CONFIG_RTE_LIBRTE_FM10K_DEBUG_TX=n +CONFIG_RTE_LIBRTE_FM10K_RX_OLFLAGS_ENABLE=y + +# # Compile burst-oriented Cisco ENIC PMD driver # CONFIG_RTE_LIBRTE_ENIC_PMD=y -- 1.7.7.6
[dpdk-dev] [PATCH 04/18] fm10k: add fm10k device id
From: Jeff Shaw Add fm10k device ID list into rte_pci_dev_ids.h. Signed-off-by: Jeff Shaw Signed-off-by: Chen Jing D(Mark) --- lib/librte_eal/common/include/rte_pci_dev_ids.h | 22 ++ 1 files changed, 22 insertions(+), 0 deletions(-) diff --git a/lib/librte_eal/common/include/rte_pci_dev_ids.h b/lib/librte_eal/common/include/rte_pci_dev_ids.h index c922de9..f54800e 100644 --- a/lib/librte_eal/common/include/rte_pci_dev_ids.h +++ b/lib/librte_eal/common/include/rte_pci_dev_ids.h @@ -132,6 +132,14 @@ #define RTE_PCI_DEV_ID_DECL_VMXNET3(vend, dev) #endif +#ifndef RTE_PCI_DEV_ID_DECL_FM10K +#define RTE_PCI_DEV_ID_DECL_FM10K(vend, dev) +#endif + +#ifndef RTE_PCI_DEV_ID_DECL_FM10KVF +#define RTE_PCI_DEV_ID_DECL_FM10KVF(vend, dev) +#endif + #ifndef PCI_VENDOR_ID_INTEL /** Vendor ID used by Intel devices */ #define PCI_VENDOR_ID_INTEL 0x8086 @@ -474,6 +482,12 @@ RTE_PCI_DEV_ID_DECL_I40E(PCI_VENDOR_ID_INTEL, I40E_DEV_ID_QSFP_B) RTE_PCI_DEV_ID_DECL_I40E(PCI_VENDOR_ID_INTEL, I40E_DEV_ID_QSFP_C) RTE_PCI_DEV_ID_DECL_I40E(PCI_VENDOR_ID_INTEL, I40E_DEV_ID_10G_BASE_T) +/*** Physical FM10K devices from fm10k_type.h ***/ + +#define FM10K_DEV_ID_PF 0x15A4 + +RTE_PCI_DEV_ID_DECL_FM10K(PCI_VENDOR_ID_INTEL, FM10K_DEV_ID_PF) + /** Virtual IGB devices from e1000_hw.h **/ #define E1000_DEV_ID_82576_VF 0x10CA @@ -526,6 +540,12 @@ RTE_PCI_DEV_ID_DECL_VIRTIO(PCI_VENDOR_ID_QUMRANET, QUMRANET_DEV_ID_VIRTIO) RTE_PCI_DEV_ID_DECL_VMXNET3(PCI_VENDOR_ID_VMWARE, VMWARE_DEV_ID_VMXNET3) +/*** Virtual FM10K devices from fm10k_type.h ***/ + +#define FM10K_DEV_ID_VF 0x15A5 + +RTE_PCI_DEV_ID_DECL_FM10KVF(PCI_VENDOR_ID_INTEL, FM10K_DEV_ID_VF) + /* * Undef all RTE_PCI_DEV_ID_DECL_* here. */ @@ -538,3 +558,5 @@ RTE_PCI_DEV_ID_DECL_VMXNET3(PCI_VENDOR_ID_VMWARE, VMWARE_DEV_ID_VMXNET3) #undef RTE_PCI_DEV_ID_DECL_I40EVF #undef RTE_PCI_DEV_ID_DECL_VIRTIO #undef RTE_PCI_DEV_ID_DECL_VMXNET3 +#undef RTE_PCI_DEV_ID_DECL_FM10K +#undef RTE_PCI_DEV_ID_DECL_FM10KVF \ No newline at end of file -- 1.7.7.6
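
The two IDs added by this patch end up in the driver's PCI ID table, where a probed device is matched on its (vendor, device) pair. A small Python sketch of that lookup, for illustration only: the numeric values are the ones from the patch, while the table layout, the labels and match_fm10k are assumptions made for the example.

```python
PCI_VENDOR_ID_INTEL = 0x8086
FM10K_DEV_ID_PF = 0x15A4
FM10K_DEV_ID_VF = 0x15A5

# (vendor, device) -> label; stands in for the table that the
# RTE_PCI_DEV_ID_DECL_FM10K / _FM10KVF declarations feed in C.
FM10K_ID_TABLE = {
    (PCI_VENDOR_ID_INTEL, FM10K_DEV_ID_PF): "fm10k PF",
    (PCI_VENDOR_ID_INTEL, FM10K_DEV_ID_VF): "fm10k VF",
}


def match_fm10k(vendor_id, device_id):
    """Return the matching label, or None for an unsupported device."""
    return FM10K_ID_TABLE.get((vendor_id, device_id))


print(match_fm10k(0x8086, 0x15A4))
print(match_fm10k(0x8086, 0x10FB))
```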
[dpdk-dev] [PATCH 05/18] fm10k: Add code to register fm10k pmd PF driver
From: Jeff Shaw

1. Add function to scan and initialize fm10k PF device.
2. Add implementation to register fm10k pmd PF driver.
3. Add 3 functions fm10k_dev_configure, fm10k_stats_get and
   fm10k_stats_reset.

Signed-off-by: Jeff Shaw
Signed-off-by: Chen Jing D(Mark)
---
 lib/librte_pmd_fm10k/fm10k_ethdev.c | 339 +++
 1 files changed, 339 insertions(+), 0 deletions(-)

diff --git a/lib/librte_pmd_fm10k/fm10k_ethdev.c b/lib/librte_pmd_fm10k/fm10k_ethdev.c
index e69de29..400d841 100644
--- a/lib/librte_pmd_fm10k/fm10k_ethdev.c
+++ b/lib/librte_pmd_fm10k/fm10k_ethdev.c
@@ -0,0 +1,339 @@
+/*-
+ * BSD LICENSE
+ *
+ * Copyright(c) 2013-2014 Intel Corporation. All rights reserved.
+ * All rights reserved.
+ *
+ * Redistribution and use in source and binary forms, with or without
+ * modification, are permitted provided that the following conditions
+ * are met:
+ *
+ * * Redistributions of source code must retain the above copyright
+ * notice, this list of conditions and the following disclaimer.
+ * * Redistributions in binary form must reproduce the above copyright
+ * notice, this list of conditions and the following disclaimer in
+ * the documentation and/or other materials provided with the
+ * distribution.
+ * * Neither the name of Intel Corporation nor the names of its
+ * contributors may be used to endorse or promote products derived
+ * from this software without specific prior written permission.
+ *
+ * THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+ * "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+ * LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
+ * A PARTICULAR PURPOSE ARE DISCLAIMED.
IN NO EVENT SHALL THE COPYRIGHT + * OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, + * SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT + * LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, + * DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY + * THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT + * (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE + * OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. + */ + +#include +#include +#include +#include +#include +#include + +#include "fm10k.h" +#include "SHARED/fm10k_api.h" + +/* Default delay to acquire mailbox lock */ +#define FM10K_MBXLOCK_DELAY_US 20 + +static void +fm10k_mbx_initlock(struct fm10k_hw *hw) +{ + rte_spinlock_init(FM10K_DEV_PRIVATE_TO_MBXLOCK(hw->back)); +} + +static void +fm10k_mbx_lock(struct fm10k_hw *hw) +{ + while (!rte_spinlock_trylock(FM10K_DEV_PRIVATE_TO_MBXLOCK(hw->back))) + rte_delay_us(FM10K_MBXLOCK_DELAY_US); +} + +static void +fm10k_mbx_unlock(struct fm10k_hw *hw) +{ + rte_spinlock_unlock(FM10K_DEV_PRIVATE_TO_MBXLOCK(hw->back)); +} + +static int +fm10k_dev_configure(struct rte_eth_dev *dev) +{ + PMD_FUNC_TRACE(); + + if (dev->data->dev_conf.rxmode.hw_strip_crc == 0) + PMD_LOG(WARNING, "fm10k always strip CRC"); + + return 0; +} + +static void +fm10k_stats_get(struct rte_eth_dev *dev, struct rte_eth_stats *stats) +{ + uint64_t ipackets, opackets, ibytes, obytes; + struct fm10k_hw *hw = + FM10K_DEV_PRIVATE_TO_HW(dev->data->dev_private); + struct fm10k_hw_stats *hw_stats = + FM10K_DEV_PRIVATE_TO_STATS(dev->data->dev_private); + int i; + PMD_FUNC_TRACE(); + + fm10k_update_hw_stats(hw, hw_stats); + + ipackets = opackets = ibytes = obytes = 0; + for (i = 0; (i < RTE_ETHDEV_QUEUE_STAT_CNTRS) && + (i < FM10K_MAX_QUEUES_PF); ++i) { + stats->q_ipackets[i] = hw_stats->q[i].rx_packets.count; + stats->q_opackets[i] = hw_stats->q[i].tx_packets.count; + 
stats->q_ibytes[i] = hw_stats->q[i].rx_bytes.count; + stats->q_obytes[i] = hw_stats->q[i].tx_bytes.count; + ipackets += stats->q_ipackets[i]; + opackets += stats->q_opackets[i]; + ibytes += stats->q_ibytes[i]; + obytes += stats->q_obytes[i]; + } + stats->ipackets = ipackets; + stats->opackets = opackets; + stats->ibytes = ibytes; + stats->obytes = obytes; +} + +static void +fm10k_stats_reset(struct rte_eth_dev *dev) +{ + struct fm10k_hw *hw = FM10K_DEV_PRIVATE_TO_HW(dev->data->dev_private); + struct fm10k_hw_stats *hw_stats = + FM10K_DEV_PRIVATE_TO_STATS(dev->data->dev_private); + PMD_FUNC_TRACE(); + + memset(hw_stats, 0, sizeof(*hw_stats)); + fm10k_rebind_hw_stats(hw, hw_stats); +} + +/* Mailbox message handler in VF */ +static const struct fm10k_msg_data fm10k_msgdata_vf[] = { + FM10K_TLV_MSG_TEST_HANDLER(fm10k_tlv_msg_test), + FM1
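
fm10k_stats_get() above copies the per-queue hardware counters into the stats structure and sums them into device totals, stopping at whichever of RTE_ETHDEV_QUEUE_STAT_CNTRS and FM10K_MAX_QUEUES_PF is smaller. A short Python sketch of that aggregation, for illustration only: aggregate_queue_stats and the dict layout are made up, and the extra bound on the list length exists only to keep the example safe.

```python
def aggregate_queue_stats(queues, stat_cntrs, max_queues):
    """Return (per_queue, totals) for the queues that fit both limits.

    queues is a list of dicts with the keys ipackets, opackets, ibytes
    and obytes. stat_cntrs and max_queues play the role of
    RTE_ETHDEV_QUEUE_STAT_CNTRS and FM10K_MAX_QUEUES_PF.
    """
    totals = {"ipackets": 0, "opackets": 0, "ibytes": 0, "obytes": 0}
    per_queue = []
    # The C loop has two bounds; len(queues) is added for this sketch.
    n = min(stat_cntrs, max_queues, len(queues))
    for q in queues[:n]:
        per_queue.append(dict(q))
        for key in totals:
            totals[key] += q[key]
    return per_queue, totals


# Invented counter values for three queues.
hw_queues = [
    {"ipackets": 10, "opackets": 5, "ibytes": 1000, "obytes": 500},
    {"ipackets": 20, "opackets": 7, "ibytes": 2000, "obytes": 700},
    {"ipackets": 30, "opackets": 9, "ibytes": 3000, "obytes": 900},
]
print(aggregate_queue_stats(hw_queues, stat_cntrs=2, max_queues=128)[1])
```

Note that queues beyond the smaller limit are not included in the totals, which is also what the loop in the patch does.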
[dpdk-dev] [PATCH 01/18] fm10k: add base driver
From: Jeff Shaw

The base driver is developed and maintained by the Intel ND team and provides
basic functional services for Intel Red Rock Canyon silicon. Suggestions for
bug fixes and improvements within this directory are welcome, but the changes
themselves need to be made and updated by that team.

Signed-off-by: Chen Jing D(Mark)
---
 lib/librte_pmd_fm10k/SHARED/fm10k_api.c | 327 +
 lib/librte_pmd_fm10k/SHARED/fm10k_api.h | 60 +
 lib/librte_pmd_fm10k/SHARED/fm10k_common.c | 573
 lib/librte_pmd_fm10k/SHARED/fm10k_common.h | 52 +
 lib/librte_pmd_fm10k/SHARED/fm10k_mbx.c | 2186
 lib/librte_pmd_fm10k/SHARED/fm10k_mbx.h | 329 +
 lib/librte_pmd_fm10k/SHARED/fm10k_osdep.h | 116 ++
 lib/librte_pmd_fm10k/SHARED/fm10k_pf.c | 1877
 lib/librte_pmd_fm10k/SHARED/fm10k_pf.h | 152 ++
 lib/librte_pmd_fm10k/SHARED/fm10k_tlv.c | 914
 lib/librte_pmd_fm10k/SHARED/fm10k_tlv.h | 199 +++
 lib/librte_pmd_fm10k/SHARED/fm10k_type.h | 925
 lib/librte_pmd_fm10k/SHARED/fm10k_vf.c | 586
 lib/librte_pmd_fm10k/SHARED/fm10k_vf.h | 91 ++
 14 files changed, 8387 insertions(+), 0 deletions(-)
 create mode 100644 lib/librte_pmd_fm10k/SHARED/fm10k_api.c
 create mode 100644 lib/librte_pmd_fm10k/SHARED/fm10k_api.h
 create mode 100644 lib/librte_pmd_fm10k/SHARED/fm10k_common.c
 create mode 100644 lib/librte_pmd_fm10k/SHARED/fm10k_common.h
 create mode 100644 lib/librte_pmd_fm10k/SHARED/fm10k_mbx.c
 create mode 100644 lib/librte_pmd_fm10k/SHARED/fm10k_mbx.h
 create mode 100644 lib/librte_pmd_fm10k/SHARED/fm10k_osdep.h
 create mode 100644 lib/librte_pmd_fm10k/SHARED/fm10k_pf.c
 create mode 100644 lib/librte_pmd_fm10k/SHARED/fm10k_pf.h
 create mode 100644 lib/librte_pmd_fm10k/SHARED/fm10k_tlv.c
 create mode 100644 lib/librte_pmd_fm10k/SHARED/fm10k_tlv.h
 create mode 100644 lib/librte_pmd_fm10k/SHARED/fm10k_type.h
 create mode 100644 lib/librte_pmd_fm10k/SHARED/fm10k_vf.c
 create mode 100644 lib/librte_pmd_fm10k/SHARED/fm10k_vf.h

diff --git a/lib/librte_pmd_fm10k/SHARED/fm10k_api.c b/lib/librte_pmd_fm10k/SHARED/fm10k_api.c
new file mode 100644
index 000..56f2914
--- /dev/null
+++
b/lib/librte_pmd_fm10k/SHARED/fm10k_api.c @@ -0,0 +1,327 @@ +/*** + +Copyright (c) 2013 - 2014, Intel Corporation +All rights reserved. + +Redistribution and use in source and binary forms, with or without +modification, are permitted provided that the following conditions are met: + + 1. Redistributions of source code must retain the above copyright notice, +this list of conditions and the following disclaimer. + + 2. Redistributions in binary form must reproduce the above copyright +notice, this list of conditions and the following disclaimer in the +documentation and/or other materials provided with the distribution. + + 3. Neither the name of the Intel Corporation nor the names of its +contributors may be used to endorse or promote products derived from +this software without specific prior written permission. + +THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" +AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE +IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE +ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT OWNER OR CONTRIBUTORS BE +LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR +CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF +SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS +INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN +CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) +ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE +POSSIBILITY OF SUCH DAMAGE. + +***/ + +#include "fm10k_api.h" +#include "fm10k_common.h" + +/** + * fm10k_set_mac_type - Sets MAC type + * @hw: pointer to the HW structure + * + * This function sets the mac type of the adapter based on the + * vendor ID and device ID stored in the hw structure. 
+ **/ +s32 fm10k_set_mac_type(struct fm10k_hw *hw) +{ + s32 ret_val = FM10K_SUCCESS; + + DEBUGFUNC("fm10k_set_mac_type"); + + if (hw->vendor_id != FM10K_INTEL_VENDOR_ID) { + ERROR_REPORT2(FM10K_ERROR_UNSUPPORTED, +"Unsupported vendor id: %x\n", hw->vendor_id); + return FM10K_ERR_DEVICE_NOT_SUPPORTED; + } + + switch (hw->device_id) { + case FM10K_DEV_ID_PF: + hw->mac.type = fm10k_mac_pf; + break; + case FM10K_DEV_ID_VF: + hw->mac.type = fm10k_mac_vf; + break; + default: + ret_val = FM10K_ERR_DEVICE_NOT_SUPPORTED; + ERROR_REPORT2(FM10K_ERROR_UNSUPPORTED, +
[dpdk-dev] [PATCH 03/18] fm10k: Add empty fm10k files
From: Jeff Shaw Define macros and basic data structure. Define rte_log wrapper functions. Signed-off-by: Jeff Shaw Signed-off-by: Chen Jing D(Mark) --- lib/librte_pmd_fm10k/Makefile | 96 lib/librte_pmd_fm10k/fm10k.h | 224 + lib/librte_pmd_fm10k/fm10k_logs.h | 66 +++ 3 files changed, 386 insertions(+), 0 deletions(-) create mode 100644 lib/librte_pmd_fm10k/Makefile create mode 100644 lib/librte_pmd_fm10k/fm10k.h create mode 100644 lib/librte_pmd_fm10k/fm10k_ethdev.c create mode 100644 lib/librte_pmd_fm10k/fm10k_logs.h create mode 100644 lib/librte_pmd_fm10k/fm10k_rxtx.c diff --git a/lib/librte_pmd_fm10k/Makefile b/lib/librte_pmd_fm10k/Makefile new file mode 100644 index 000..3d76387 --- /dev/null +++ b/lib/librte_pmd_fm10k/Makefile @@ -0,0 +1,96 @@ +# BSD LICENSE +# +# Copyright(c) 2013-2014 Intel Corporation. All rights reserved. +# All rights reserved. +# +# Redistribution and use in source and binary forms, with or without +# modification, are permitted provided that the following conditions +# are met: +# +# * Redistributions of source code must retain the above copyright +# notice, this list of conditions and the following disclaimer. +# * Redistributions in binary form must reproduce the above copyright +# notice, this list of conditions and the following disclaimer in +# the documentation and/or other materials provided with the +# distribution. +# * Neither the name of Intel Corporation nor the names of its +# contributors may be used to endorse or promote products derived +# from this software without specific prior written permission. +# +# THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS +# "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT +# LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR +# A PARTICULAR PURPOSE ARE DISCLAIMED. 
IN NO EVENT SHALL THE COPYRIGHT +# OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, +# SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT +# LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, +# DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY +# THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT +# (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE +# OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. + +include $(RTE_SDK)/mk/rte.vars.mk + +# +# library name +# +LIB = librte_pmd_fm10k.a + +CFLAGS += -O3 +CFLAGS += $(WERROR_FLAGS) + +ifeq ($(CC), icc) +# +# CFLAGS for icc +# +CFLAGS_BASE_DRIVER = -wd174 -wd593 -wd869 -wd981 -wd2259 + +else ifeq ($(CC), clang) +# +## CFLAGS for clang +# +CFLAGS_BASE_DRIVER = -Wno-unused-parameter -Wno-unused-value +CFLAGS_BASE_DRIVER += -Wno-strict-aliasing -Wno-format-extra-args +CFLAGS_BASE_DRIVER += -Wno-unused-variable -Wno-unused-but-set-variable +CFLAGS_BASE_DRIVER += -Wno-missing-field-initializers + +else +# +# CFLAGS for gcc +# +ifneq ($(shell test $(GCC_MAJOR_VERSION) -le 4 -a $(GCC_MINOR_VERSION) -le 3 && echo 1), 1) +CFLAGS += -Wno-deprecated +endif +CFLAGS_BASE_DRIVER = -Wno-unused-parameter -Wno-unused-value +CFLAGS_BASE_DRIVER += -Wno-strict-aliasing -Wno-format-extra-args +CFLAGS_BASE_DRIVER += -Wno-unused-variable -Wno-unused-but-set-variable +CFLAGS_BASE_DRIVER += -Wno-missing-field-initializers +endif + +# +# Add extra flags for base driver source files to disable warnings in them +# +BASE_DRIVER_OBJS=$(patsubst %.c,%.o,$(notdir $(wildcard $(RTE_SDK)/lib/librte_pmd_fm10k/SHARED/*.c))) +$(foreach obj, $(BASE_DRIVER_OBJS), $(eval CFLAGS_$(obj)+=$(CFLAGS_BASE_DRIVER))) + +VPATH += $(RTE_SDK)/lib/librte_pmd_fm10k/SHARED + +# +# all source are stored in SRCS-y +# +SRCS-$(CONFIG_RTE_LIBRTE_FM10K_PMD) += fm10k_ethdev.c +SRCS-$(CONFIG_RTE_LIBRTE_FM10K_PMD) += fm10k_rxtx.c + 
+SRCS-$(CONFIG_RTE_LIBRTE_FM10K_PMD) += fm10k_pf.c +SRCS-$(CONFIG_RTE_LIBRTE_FM10K_PMD) += fm10k_tlv.c +SRCS-$(CONFIG_RTE_LIBRTE_FM10K_PMD) += fm10k_common.c +SRCS-$(CONFIG_RTE_LIBRTE_FM10K_PMD) += fm10k_mbx.c +SRCS-$(CONFIG_RTE_LIBRTE_FM10K_PMD) += fm10k_vf.c +SRCS-$(CONFIG_RTE_LIBRTE_FM10K_PMD) += fm10k_api.c + +# this lib depends upon: +DEPDIRS-$(CONFIG_RTE_LIBRTE_FM10K_PMD) += lib/librte_eal lib/librte_ether +DEPDIRS-$(CONFIG_RTE_LIBRTE_FM10K_PMD) += lib/librte_mempool lib/librte_mbuf +DEPDIRS-$(CONFIG_RTE_LIBRTE_FM10K_PMD) += lib/librte_net lib/librte_malloc + +include $(RTE_SDK)/mk/rte.lib.mk diff --git a/lib/librte_pmd_fm10k/fm10k.h b/lib/librte_pmd_fm10k/fm10k.h new file mode 100644 index 000..9b2d3da --- /dev/null +++ b/lib/librte_pmd_fm10k/fm10k.h @@ -0,0 +1,224 @@ +/*- + * BSD LICENSE + * + * Copyright(c) 2013-2014 Intel Corporation. All rights reserved. + * All rights reserved. + * + * Redistribution and use in source and binary forms, with or without + * modification, ar
[dpdk-dev] [PATCH 06/18] fm10k: add reta update/query functions
From: Jeff Shaw 1. Add fm10k_reta_update and fm10k_reta_query functions. 2. Add fm10k_link_update and fm10k_dev_infos_get functions. Signed-off-by: Jeff Shaw Signed-off-by: Chen Jing D(Mark) --- lib/librte_pmd_fm10k/fm10k_ethdev.c | 161 +++ 1 files changed, 161 insertions(+), 0 deletions(-) diff --git a/lib/librte_pmd_fm10k/fm10k_ethdev.c b/lib/librte_pmd_fm10k/fm10k_ethdev.c index 400d841..991d6ee 100644 --- a/lib/librte_pmd_fm10k/fm10k_ethdev.c +++ b/lib/librte_pmd_fm10k/fm10k_ethdev.c @@ -44,6 +44,10 @@ /* Default delay to acquire mailbox lock */ #define FM10K_MBXLOCK_DELAY_US 20 +/* Number of chars per uint32 type */ +#define CHARS_PER_UINT32 (sizeof(uint32_t)) +#define BIT_MASK_PER_UINT32 ((1 << CHARS_PER_UINT32) - 1) + static void fm10k_mbx_initlock(struct fm10k_hw *hw) { @@ -74,6 +78,22 @@ fm10k_dev_configure(struct rte_eth_dev *dev) return 0; } +static int +fm10k_link_update(struct rte_eth_dev *dev, + __rte_unused int wait_to_complete) +{ + PMD_FUNC_TRACE(); + + /* The host-interface link is always up. The speed is ~50Gbps per Gen3 +* x8 PCIe interface. For now, we leave the speed undefined since there +* is no 50Gbps Ethernet. 
*/ + dev->data->dev_link.link_speed = 0; + dev->data->dev_link.link_duplex = ETH_LINK_FULL_DUPLEX; + dev->data->dev_link.link_status = 1; + + return 0; +} + static void fm10k_stats_get(struct rte_eth_dev *dev, struct rte_eth_stats *stats) { @@ -117,6 +137,143 @@ fm10k_stats_reset(struct rte_eth_dev *dev) fm10k_rebind_hw_stats(hw, hw_stats); } +static void +fm10k_dev_infos_get(struct rte_eth_dev *dev, + struct rte_eth_dev_info *dev_info) +{ + struct fm10k_hw *hw = FM10K_DEV_PRIVATE_TO_HW(dev->data->dev_private); + PMD_FUNC_TRACE(); + + dev_info->min_rx_bufsize = FM10K_MIN_RX_BUF_SIZE; + dev_info->max_rx_pktlen = FM10K_MAX_PKT_SIZE; + dev_info->max_rx_queues = hw->mac.max_queues; + dev_info->max_tx_queues = hw->mac.max_queues; + dev_info->max_mac_addrs = 1; + dev_info->max_hash_mac_addrs = 0; + dev_info->max_vfs= FM10K_MAX_VF_NUM; + dev_info->max_vmdq_pools = ETH_64_POOLS; + dev_info->rx_offload_capa = + DEV_RX_OFFLOAD_IPV4_CKSUM | + DEV_RX_OFFLOAD_UDP_CKSUM | + DEV_RX_OFFLOAD_TCP_CKSUM; + dev_info->tx_offload_capa= 0; + dev_info->reta_size = FM10K_MAX_RSS_INDICES; + + dev_info->default_rxconf = (struct rte_eth_rxconf) { + .rx_thresh = { + .pthresh = FM10K_DEFAULT_RX_PTHRESH, + .hthresh = FM10K_DEFAULT_RX_HTHRESH, + .wthresh = FM10K_DEFAULT_RX_WTHRESH, + }, + .rx_free_thresh = FM10K_RX_FREE_THRESH_DEFAULT(0), + .rx_drop_en = 0, + }; + + dev_info->default_txconf = (struct rte_eth_txconf) { + .tx_thresh = { + .pthresh = FM10K_DEFAULT_TX_PTHRESH, + .hthresh = FM10K_DEFAULT_TX_HTHRESH, + .wthresh = FM10K_DEFAULT_TX_WTHRESH, + }, + .tx_free_thresh = FM10K_TX_FREE_THRESH_DEFAULT(0), + .tx_rs_thresh = FM10K_TX_RS_THRESH_DEFAULT(0), + .txq_flags = ETH_TXQ_FLAGS_NOMULTSEGS | + ETH_TXQ_FLAGS_NOOFFLOADS, + }; + +} + +static int +fm10k_reta_update(struct rte_eth_dev *dev, + struct rte_eth_rss_reta_entry64 *reta_conf, + uint16_t reta_size) +{ + struct fm10k_hw *hw = FM10K_DEV_PRIVATE_TO_HW(dev->data->dev_private); + uint16_t i, j, idx, shift; + uint8_t mask; + uint32_t reta; + + 
PMD_FUNC_TRACE(); + + if (reta_size > FM10K_MAX_RSS_INDICES) { + PMD_LOG(ERR, "The size of hash lookup table configured " + "(%d) doesn't match the number hardware can supported " + "(%d)\n", reta_size, FM10K_MAX_RSS_INDICES); + return -EINVAL; + } + + /* +* Update Redirection Table RETA[n], n=0..31. The redirection table has +* 128-entries in 32 registers +*/ + for (i = 0; i < FM10K_MAX_RSS_INDICES; i += CHARS_PER_UINT32) { + idx = i / RTE_RETA_GROUP_SIZE; + shift = i % RTE_RETA_GROUP_SIZE; + mask = (uint8_t)((reta_conf[idx].mask >> shift) & + BIT_MASK_PER_UINT32); + if (mask == 0) + continue; + + reta = 0; + if (mask != BIT_MASK_PER_UINT32) + reta = FM10K_READ_REG(hw, FM10K_RETA(0, i >> 2)); + + for (j = 0; j < CHARS_PER_UINT32; j++) { + if (mask & (0x1 << j)) { + if (mask != 0xF) +
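For readers following the index arithmetic in fm10k_reta_update, a minimal standalone sketch of how the 64-bit per-group mask is sliced into one 4-bit mask per 32-bit register may help. This is not the driver code: struct reta_group is a hypothetical stand-in for struct rte_eth_rss_reta_entry64, GROUP_SIZE assumes RTE_RETA_GROUP_SIZE is 64, and the byte placement in reta_reg_merge (entry j in byte j) is an assumption, because the quoted hunk is cut off before that step.

```c
#include <assert.h>
#include <stdint.h>

#define GROUP_SIZE      64u  /* assumed value of RTE_RETA_GROUP_SIZE */
#define ENTRIES_PER_REG 4u   /* CHARS_PER_UINT32 in the patch */
#define REG_MASK        ((1u << ENTRIES_PER_REG) - 1u)  /* 0xF */

/* Hypothetical stand-in for struct rte_eth_rss_reta_entry64. */
struct reta_group {
	uint64_t mask;              /* bit n set: update entry n of this group */
	uint8_t  reta[GROUP_SIZE];  /* queue index for each entry */
};

/* Four-bit update mask for the register that holds entries i .. i+3. */
static uint8_t
reta_reg_mask(const struct reta_group *conf, uint16_t i)
{
	assert(i % ENTRIES_PER_REG == 0);
	return (uint8_t)((conf[i / GROUP_SIZE].mask >> (i % GROUP_SIZE)) &
			 REG_MASK);
}

/*
 * Merge the selected entries into the current register value.
 * Assumption: entry j of a register occupies byte j (bits 8j .. 8j+7).
 */
static uint32_t
reta_reg_merge(const struct reta_group *conf, uint16_t i, uint32_t old)
{
	uint8_t mask = reta_reg_mask(conf, i);
	const uint8_t *entries = &conf[i / GROUP_SIZE].reta[i % GROUP_SIZE];
	uint32_t reg = old;
	unsigned int j;

	for (j = 0; j < ENTRIES_PER_REG; j++) {
		if (mask & (1u << j)) {
			/* clear the byte for entry j, then write the new value */
			reg &= ~((uint32_t)0xFF << (8 * j));
			reg |= (uint32_t)entries[j] << (8 * j);
		}
	}
	return reg;
}
```

With a mask of 0x5 only entries 0 and 2 of the register are replaced and the other two bytes keep their old value, which is why the patch reads the register back first whenever the mask is not 0xF.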
[dpdk-dev] [PATCH 07/18] fm10k: add rx_queue_setup/release function
From: Jeff Shaw Add fm10k_rx_queue_setup and fm10k_rx_queue_release functions. Signed-off-by: Jeff Shaw Signed-off-by: Chen Jing D(Mark) --- lib/librte_pmd_fm10k/fm10k_ethdev.c | 253 +++ 1 files changed, 253 insertions(+), 0 deletions(-) diff --git a/lib/librte_pmd_fm10k/fm10k_ethdev.c b/lib/librte_pmd_fm10k/fm10k_ethdev.c index 991d6ee..32388cd 100644 --- a/lib/librte_pmd_fm10k/fm10k_ethdev.c +++ b/lib/librte_pmd_fm10k/fm10k_ethdev.c @@ -41,6 +41,7 @@ #include "fm10k.h" #include "SHARED/fm10k_api.h" +#define FM10K_RX_BUFF_ALIGN 512 /* Default delay to acquire mailbox lock */ #define FM10K_MBXLOCK_DELAY_US 20 @@ -67,6 +68,46 @@ fm10k_mbx_unlock(struct fm10k_hw *hw) rte_spinlock_unlock(FM10K_DEV_PRIVATE_TO_MBXLOCK(hw->back)); } +/* + * clean queue, descriptor rings, free software buffers used when stopping + * device. + */ +static inline void +rx_queue_clean(struct fm10k_rx_queue *q) +{ + union fm10k_rx_desc zero = {.q = {0, 0, 0, 0} }; + uint32_t i; + PMD_FUNC_TRACE(); + + /* zero descriptor rings */ + for (i = 0; i < q->nb_desc; ++i) + q->hw_ring[i] = zero; + + /* free software buffers */ + for (i = 0; i < q->nb_desc; ++i) { + if (q->sw_ring[i]) { + rte_pktmbuf_free_seg(q->sw_ring[i]); + q->sw_ring[i] = NULL; + } + } +} + +/* + * free all queue memory used when releasing the queue (i.e. configure) + */ +static inline void +rx_queue_free(struct fm10k_rx_queue *q) +{ + PMD_FUNC_TRACE(); + if (q) { + PMD_LOG(DEBUG, "Freeing rx queue %p", q); + rx_queue_clean(q); + if (q->sw_ring) + rte_free(q->sw_ring); + rte_free(q); + } +} + static int fm10k_dev_configure(struct rte_eth_dev *dev) { @@ -183,6 +224,216 @@ fm10k_dev_infos_get(struct rte_eth_dev *dev, } +static inline int +check_nb_desc(uint16_t min, uint16_t max, uint16_t mult, uint16_t request) +{ + if ((request < min) || (request > max) || ((request % mult) != 0)) + return -1; + else + return 0; +} + +/* + * Create a memzone for hardware descriptor rings. 
Malloc cannot be used since + * the physical address is required. If the memzone is already created, then + * this function returns a pointer to the existing memzone. + */ +static inline const struct rte_memzone * +allocate_hw_ring(const char *driver_name, const char *ring_name, + uint8_t port_id, uint16_t queue_id, int socket_id, + uint32_t size, uint32_t align) +{ + char name[RTE_MEMZONE_NAMESIZE]; + const struct rte_memzone *mz; + + snprintf(name, sizeof(name), "%s_%s_%d_%d_%d", +driver_name, ring_name, port_id, queue_id, socket_id); + + /* return the memzone if it already exists */ + mz = rte_memzone_lookup(name); + if (mz) + return mz; + +#ifdef RTE_LIBRTE_XEN_DOM0 + return rte_memzone_reserve_bounded(name, size, socket_id, 0, align, + RTE_PGSIZE_2M); +#else + return rte_memzone_reserve_aligned(name, size, socket_id, 0, align); +#endif +} + +static inline int +check_thresh(uint16_t min, uint16_t max, uint16_t div, uint16_t request) +{ + if ((request < min) || (request > max) || ((div % request) != 0)) + return -1; + else + return 0; +} + +static inline int +handle_rxconf(struct fm10k_rx_queue *q, const struct rte_eth_rxconf *conf) +{ + uint16_t rx_free_thresh; + + if (conf->rx_free_thresh == 0) + rx_free_thresh = FM10K_RX_FREE_THRESH_DEFAULT(q); + else + rx_free_thresh = conf->rx_free_thresh; + + /* make sure the requested threshold satisfies the constraints */ + if (check_thresh(FM10K_RX_FREE_THRESH_MIN(q), + FM10K_RX_FREE_THRESH_MAX(q), + FM10K_RX_FREE_THRESH_DIV(q), + rx_free_thresh)) { + PMD_LOG(ERR, "rx_free_thresh (%u) must be " + "less than or equal to %u, " + "greater than or equal to %u, " + "and a divisor of %u", + rx_free_thresh, FM10K_RX_FREE_THRESH_MAX(q), + FM10K_RX_FREE_THRESH_MIN(q), + FM10K_RX_FREE_THRESH_DIV(q)); + return (-EINVAL); + } + + q->alloc_thresh = rx_free_thresh; + q->drop_en = conf->rx_drop_en; + q->rx_deferred_start = conf->rx_deferred_start; + + return 0; +} + +/* + * Hardware requires specific alignment for Rx packet buffers. 
At + * least one of the following two conditions must be satisfied. + * 1. Address is 512B aligned + * 2. Address is 8B aligned and buffer does not cross 4K boundary. + * + * As such, the driver may need to adjust the DMA address within the +
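The two range helpers in this patch look alike but test opposite divisibility relations: check_nb_desc wants the request to be a multiple of a value, check_thresh wants the request to be a divisor of a value. A small standalone sketch of both follows. The request == 0 guard in check_thresh and the assert on mult are additions made here so the sketch cannot divide by zero; the patch itself relies on the caller substituting a default first.

```c
#include <assert.h>
#include <stdint.h>

/* 0 if min <= request <= max and request is a multiple of mult, else -1. */
static int
check_nb_desc(uint16_t min, uint16_t max, uint16_t mult, uint16_t request)
{
	assert(mult != 0);  /* sketch-only precondition */
	if (request < min || request > max || (request % mult) != 0)
		return -1;
	return 0;
}

/*
 * 0 if min <= request <= max and request divides div evenly, else -1.
 * The request == 0 check is an addition in this sketch to avoid a
 * division by zero when min is also 0.
 */
static int
check_thresh(uint16_t min, uint16_t max, uint16_t div, uint16_t request)
{
	if (request == 0 || request < min || request > max ||
	    (div % request) != 0)
		return -1;
	return 0;
}
```

For example, with purely illustrative limits, check_nb_desc(32, 4096, 8, 100) fails because 100 is not a multiple of 8, while check_thresh(1, 511, 512, 48) fails because 48 does not divide 512.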
[dpdk-dev] [PATCH 08/18] fm10k: add tx_queue_setup/release function
From: Jeff Shaw Add fm10k_tx_queue_setup and fm10k_tx_queue_release functions. Signed-off-by: Jeff Shaw Signed-off-by: Chen Jing D(Mark) --- lib/librte_pmd_fm10k/fm10k_ethdev.c | 203 +++ 1 files changed, 203 insertions(+), 0 deletions(-) diff --git a/lib/librte_pmd_fm10k/fm10k_ethdev.c b/lib/librte_pmd_fm10k/fm10k_ethdev.c index 32388cd..2decf30 100644 --- a/lib/librte_pmd_fm10k/fm10k_ethdev.c +++ b/lib/librte_pmd_fm10k/fm10k_ethdev.c @@ -108,6 +108,48 @@ rx_queue_free(struct fm10k_rx_queue *q) } } +/* + * clean queue, descriptor rings, free software buffers used when stopping + * device + */ +static inline void +tx_queue_clean(struct fm10k_tx_queue *q) +{ + struct fm10k_tx_desc zero = {0, 0, 0, 0, 0, 0}; + uint32_t i; + PMD_FUNC_TRACE(); + + /* zero descriptor rings */ + for (i = 0; i < q->nb_desc; ++i) + q->hw_ring[i] = zero; + + /* free software buffers */ + for (i = 0; i < q->nb_desc; ++i) { + if (q->sw_ring[i]) { + rte_pktmbuf_free_seg(q->sw_ring[i]); + q->sw_ring[i] = NULL; + } + } +} + +/* + * free all queue memory used when releasing the queue (i.e. 
configure) + */ +static inline void +tx_queue_free(struct fm10k_tx_queue *q) +{ + PMD_FUNC_TRACE(); + if (q) { + PMD_LOG(DEBUG, "Freeing tx queue %p", q); + tx_queue_clean(q); + if (q->rs_tracker.list) + rte_free(q->rs_tracker.list); + if (q->sw_ring) + rte_free(q->sw_ring); + rte_free(q); + } +} + static int fm10k_dev_configure(struct rte_eth_dev *dev) { @@ -434,6 +476,165 @@ fm10k_rx_queue_release(void *queue) rx_queue_free(queue); } +static inline int +handle_txconf(struct fm10k_tx_queue *q, const struct rte_eth_txconf *conf) +{ + uint16_t tx_free_thresh; + uint16_t tx_rs_thresh; + + /* constraint MACROs require that tx_free_thresh is configured +* before tx_rs_thresh */ + if (conf->tx_free_thresh == 0) + tx_free_thresh = FM10K_TX_FREE_THRESH_DEFAULT(q); + else + tx_free_thresh = conf->tx_free_thresh; + + /* make sure the requested threshold satisfies the constraints */ + if (check_thresh(FM10K_TX_FREE_THRESH_MIN(q), + FM10K_TX_FREE_THRESH_MAX(q), + FM10K_TX_FREE_THRESH_DIV(q), + tx_free_thresh)) { + PMD_LOG(ERR, "tx_free_thresh (%u) must be " + "less than or equal to %u, " + "greater than or equal to %u, " + "and a divisor of %u", + tx_free_thresh, FM10K_TX_FREE_THRESH_MAX(q), + FM10K_TX_FREE_THRESH_MIN(q), + FM10K_TX_FREE_THRESH_DIV(q)); + return (-EINVAL); + } + + q->free_thresh = tx_free_thresh; + + if (conf->tx_rs_thresh == 0) + tx_rs_thresh = FM10K_TX_RS_THRESH_DEFAULT(q); + else + tx_rs_thresh = conf->tx_rs_thresh; + + q->tx_deferred_start = conf->tx_deferred_start; + + /* make sure the requested threshold satisfies the constraints */ + if (check_thresh(FM10K_TX_RS_THRESH_MIN(q), + FM10K_TX_RS_THRESH_MAX(q), + FM10K_TX_RS_THRESH_DIV(q), + tx_rs_thresh)) { + PMD_LOG(ERR, "tx_rs_thresh (%u) must be " + "less than or equal to %u, " + "greater than or equal to %u, " + "and a divisor of %u", + tx_rs_thresh, FM10K_TX_RS_THRESH_MAX(q), + FM10K_TX_RS_THRESH_MIN(q), + FM10K_TX_RS_THRESH_DIV(q)); + return (-EINVAL); + } + + q->rs_thresh = tx_rs_thresh; + + return 
0; +} + +static int +fm10k_tx_queue_setup(struct rte_eth_dev *dev, uint16_t queue_id, + uint16_t nb_desc, unsigned int socket_id, + const struct rte_eth_txconf *conf) +{ + struct fm10k_hw *hw = FM10K_DEV_PRIVATE_TO_HW(dev->data->dev_private); + struct fm10k_tx_queue *q; + const struct rte_memzone *mz; + PMD_FUNC_TRACE(); + + /* make sure a valid number of descriptors have been requested */ + if (check_nb_desc(FM10K_MIN_TX_DESC, FM10K_MAX_TX_DESC, + FM10K_MULT_TX_DESC, nb_desc)) { + PMD_LOG(ERR, "Number of Tx descriptors (%u) must be " + "less than or equal to %lu, " + "greater than or equal to %u, " + "and a multiple of %u", + nb_desc, FM10K_MAX_TX_DESC, FM10K_MIN_TX_DESC, + FM10K_MULT_TX_DESC); + return (-EINVAL); + } + + /* +* if
[dpdk-dev] [PATCH 09/18] fm10k: add RX/TX single queue start/stop function
From: Jeff Shaw 1. Add 4 functions fm10k_dev_rx_queue_start, fm10k_dev_rx_queue_stop, fm10k_dev_tx_queue_start, and fm10k_dev_tx_queue_stop. 2. verify Rx packet buffer alignment is valid. Hardware requires specific alignment for Rx packet buffers. At least one of the following two conditions must be satisfied. 1. Address is 512B aligned 2. Address is 8B aligned and buffer does not cross 4K boundary. Alignment is checked by the driver when the Rx queue is reset. It is assumed that if an entire descriptor ring can be filled with buffers containing valid alignment, then all buffers in that mempool have valid address alignment. It is the responsibility of the user to ensure all buffers have valid alignment, as it is the user who creates the mempool. It is assumed the buffer needs only to store a maximum size Ethernet frame. Signed-off-by: Jeff Shaw Signed-off-by: Chen Jing D(Mark) --- lib/librte_pmd_fm10k/fm10k.h| 59 ++ lib/librte_pmd_fm10k/fm10k_ethdev.c | 214 +++ 2 files changed, 273 insertions(+), 0 deletions(-) diff --git a/lib/librte_pmd_fm10k/fm10k.h b/lib/librte_pmd_fm10k/fm10k.h index 9b2d3da..6e9effa 100644 --- a/lib/librte_pmd_fm10k/fm10k.h +++ b/lib/librte_pmd_fm10k/fm10k.h @@ -221,4 +221,63 @@ static inline uint16_t fifo_remove(struct fifo *fifo) fifo->tail = fifo->list; return val; } + +static inline void +fm10k_pktmbuf_reset(struct rte_mbuf *mb, uint8_t in_port) +{ + rte_mbuf_refcnt_set(mb, 1); + mb->next = NULL; + mb->nb_segs = 1; + + /* enforce 512B alignment on default Rx virtual addresses */ + mb->data_off = (uint16_t)(RTE_PTR_ALIGN((char *)mb->buf_addr + + RTE_PKTMBUF_HEADROOM, FM10K_RX_DATABUF_ALIGN) + - (char *)mb->buf_addr); + mb->port = in_port; +} + +/* + * Verify Rx packet buffer alignment is valid. + * + * Hardware requires specific alignment for Rx packet buffers. At + * least one of the following two conditions must be satisfied. + * 1. Address is 512B aligned + * 2. Address is 8B aligned and buffer does not cross 4K boundary. 
+ * + * Return 1 if buffer alignment satisfies at least one condition, + * otherwise return 0. + * + * Note: Alignment is checked by the driver when the Rx queue is reset. It + * is assumed that if an entire descriptor ring can be filled with + * buffers containing valid alignment, then all buffers in that mempool + * have valid address alignment. It is the responsibility of the user + * to ensure all buffers have valid alignment, as it is the user who + * creates the mempool. + * Note: It is assumed the buffer needs only to store a maximum size Ethernet + * frame. + */ +static inline int +fm10k_addr_alignment_valid(struct rte_mbuf *mb) +{ + uint64_t addr = MBUF_DMA_ADDR_DEFAULT(mb); + uint64_t boundary1, boundary2; + + /* 512B aligned? */ + if (RTE_ALIGN(addr, 512) == addr) + return 1; + + /* 8B aligned, and max Ethernet frame would not cross a 4KB boundary? */ + if (RTE_ALIGN(addr, 8) == addr) { + boundary1 = RTE_ALIGN_FLOOR(addr, 4096); + boundary2 = RTE_ALIGN_FLOOR(addr + ETHER_MAX_VLAN_FRAME_LEN, + 4096); + if (boundary1 == boundary2) + return 1; + } + + /* use RTE_LOG directly to make sure this error is seen */ + RTE_LOG(ERR, PMD, "%s(): Error: Invalid buffer alignment\n", __func__); + + return 0; +} #endif diff --git a/lib/librte_pmd_fm10k/fm10k_ethdev.c b/lib/librte_pmd_fm10k/fm10k_ethdev.c index 2decf30..b4b49cd 100644 --- a/lib/librte_pmd_fm10k/fm10k_ethdev.c +++ b/lib/librte_pmd_fm10k/fm10k_ethdev.c @@ -69,6 +69,43 @@ fm10k_mbx_unlock(struct fm10k_hw *hw) } /* + * reset queue to initial state, allocate software buffers used when starting + * device. 
+ * return 0 on success + * return -ENOMEM if buffers cannot be allocated + * return -EINVAL if buffers do not satisfy alignment condition + */ +static inline int +rx_queue_reset(struct fm10k_rx_queue *q) +{ + uint64_t dma_addr; + int i, diag; + PMD_FUNC_TRACE(); + + diag = rte_mempool_get_bulk(q->mp, (void **)q->sw_ring, q->nb_desc); + if (diag != 0) + return -ENOMEM; + + for (i = 0; i < q->nb_desc; ++i) { + fm10k_pktmbuf_reset(q->sw_ring[i], q->port_id); + if (!fm10k_addr_alignment_valid(q->sw_ring[i])) { + rte_mempool_put_bulk(q->mp, (void **)q->sw_ring, + q->nb_desc); + return -EINVAL; + } + dma_addr = MBUF_DMA_ADDR_DEFAULT(q->sw_ring[i]); + q->hw_ring[i].q.pkt_addr = dma_addr; + q->hw_ring[i].q.hdr_addr = dma_addr; + } + + q->next_dd = 0; +
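The alignment rule described in this patch can be exercised without any DPDK headers. The sketch below restates it on a plain 64-bit address. MAX_FRAME_LEN is an assumption standing in for ETHER_MAX_VLAN_FRAME_LEN (taken here to be 1522 bytes), and rx_addr_alignment_valid is a hypothetical name, not the driver function.

```c
#include <stdint.h>

#define MAX_FRAME_LEN 1522u  /* assumption: ETHER_MAX_VLAN_FRAME_LEN */
#define PAGE_4K       4096u

/*
 * Return 1 if addr satisfies at least one hardware condition:
 *   1. addr is 512B aligned, or
 *   2. addr is 8B aligned and a maximum-size frame starting at addr
 *      stays within the same 4KB page.
 * Return 0 otherwise.
 */
static int
rx_addr_alignment_valid(uint64_t addr)
{
	uint64_t first_page, last_page;

	if ((addr % 512u) == 0)
		return 1;

	if ((addr % 8u) == 0) {
		first_page = addr & ~(uint64_t)(PAGE_4K - 1);
		last_page = (addr + MAX_FRAME_LEN) & ~(uint64_t)(PAGE_4K - 1);
		if (first_page == last_page)
			return 1;
	}

	return 0;
}
```

Under these assumptions 0x1008 is accepted (8B aligned, frame ends at 0x15FA in the same page), 0x1F08 is rejected (the frame would run into the page at 0x2000), and 0x1E00 is accepted through the 512B rule even though a frame there crosses a page boundary.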
[dpdk-dev] [PATCH 10/18] fm10k: add dev start/stop functions
From: Jeff Shaw 1. Add function to initialize single RX queue. 2. Add function to initialize single TX queue. 3. Add fm10k_dev_start, fm10k_dev_stop and fm10k_dev_close functions. Signed-off-by: Jeff Shaw Signed-off-by: Chen Jing D(Mark) --- lib/librte_pmd_fm10k/fm10k_ethdev.c | 220 +++ 1 files changed, 220 insertions(+), 0 deletions(-) diff --git a/lib/librte_pmd_fm10k/fm10k_ethdev.c b/lib/librte_pmd_fm10k/fm10k_ethdev.c index b4b49cd..3cf5e25 100644 --- a/lib/librte_pmd_fm10k/fm10k_ethdev.c +++ b/lib/librte_pmd_fm10k/fm10k_ethdev.c @@ -49,6 +49,8 @@ #define CHARS_PER_UINT32 (sizeof(uint32_t)) #define BIT_MASK_PER_UINT32 ((1 << CHARS_PER_UINT32) - 1) +static void fm10k_close_mbx_service(struct fm10k_hw *hw); + static void fm10k_mbx_initlock(struct fm10k_hw *hw) { @@ -268,6 +270,98 @@ fm10k_dev_configure(struct rte_eth_dev *dev) } static int +fm10k_dev_tx_init(struct rte_eth_dev *dev) +{ + struct fm10k_hw *hw = FM10K_DEV_PRIVATE_TO_HW(dev->data->dev_private); + int i, ret; + struct fm10k_tx_queue *txq; + uint64_t base_addr; + uint32_t size; + + /* Disable TXINT to avoid possible interrupt */ + for (i = 0; i < hw->mac.max_queues; i++) + FM10K_WRITE_REG(hw, FM10K_TXINT(i), + 3 << FM10K_TXINT_TIMER_SHIFT); + + /* Setup TX queue */ + for (i = 0; i < dev->data->nb_tx_queues; ++i) { + txq = dev->data->tx_queues[i]; + base_addr = txq->hw_ring_phys_addr; + size = txq->nb_desc * sizeof(struct fm10k_tx_desc); + + /* disable queue to avoid issues while updating state */ + ret = tx_queue_disable(hw, i); + if (ret) { + PMD_LOG(ERR, "failed to disable queue %d\n", i); + return -1; + } + + /* set location and size for descriptor ring */ + FM10K_WRITE_REG(hw, FM10K_TDBAL(i), + base_addr & 0xULL); + FM10K_WRITE_REG(hw, FM10K_TDBAH(i), + base_addr >> (CHAR_BIT * sizeof(uint32_t))); + FM10K_WRITE_REG(hw, FM10K_TDLEN(i), size); + } + return 0; +} + +static int +fm10k_dev_rx_init(struct rte_eth_dev *dev) +{ + struct fm10k_hw *hw = FM10K_DEV_PRIVATE_TO_HW(dev->data->dev_private); + int 
i, ret; + struct fm10k_rx_queue *rxq; + uint64_t base_addr; + uint32_t size; + uint32_t rxdctl = FM10K_RXDCTL_WRITE_BACK_MIN_DELAY; + uint16_t buf_size; + struct rte_pktmbuf_pool_private *mbp_priv; + + /* Disable RXINT to avoid possible interrupt */ + for (i = 0; i < hw->mac.max_queues; i++) + FM10K_WRITE_REG(hw, FM10K_RXINT(i), + 3 << FM10K_RXINT_TIMER_SHIFT); + + /* Setup RX queues */ + for (i = 0; i < dev->data->nb_rx_queues; ++i) { + rxq = dev->data->rx_queues[i]; + base_addr = rxq->hw_ring_phys_addr; + size = rxq->nb_desc * sizeof(union fm10k_rx_desc); + + /* disable queue to avoid issues while updating state */ + ret = rx_queue_disable(hw, i); + if (ret) { + PMD_LOG(ERR, "failed to disable queue %d\n", i); + return -1; + } + + /* Setup the Base and Length of the Rx Descriptor Ring */ + FM10K_WRITE_REG(hw, FM10K_RDBAL(i), + base_addr & 0xULL); + FM10K_WRITE_REG(hw, FM10K_RDBAH(i), + base_addr >> (CHAR_BIT * sizeof(uint32_t))); + FM10K_WRITE_REG(hw, FM10K_RDLEN(i), size); + + /* Configure the Rx buffer size for one buff without split */ + mbp_priv = rte_mempool_get_priv(rxq->mp); + buf_size = (uint16_t) (mbp_priv->mbuf_data_room_size - + RTE_PKTMBUF_HEADROOM); + FM10K_WRITE_REG(hw, FM10K_SRRCTL(i), + buf_size >> FM10K_SRRCTL_BSIZEPKT_SHIFT); + + /* Enable drop on empty, it's RO for VF */ + if (hw->mac.type == fm10k_mac_pf && rxq->drop_en) + rxdctl |= FM10K_RXDCTL_DROP_ON_EMPTY; + + FM10K_WRITE_REG(hw, FM10K_RXDCTL(i), rxdctl); + FM10K_WRITE_FLUSH(hw); + } + + return 0; +} + +static int fm10k_dev_rx_queue_start(struct rte_eth_dev *dev, uint16_t rx_queue_id) { struct fm10k_hw *hw = FM10K_DEV_PRIVATE_TO_HW(dev->data->dev_private); @@ -371,6 +465,122 @@ fm10k_dev_tx_queue_stop(struct rte_eth_dev *dev, uint16_t tx_queue_id) return 0; } +/* fls = find last set bit = 32 minus the number of leading zeros */ +#ifndef fls +#define fls(x) (((x) == 0) ? 0 : (32 - __builtin_clz((x +#endif +#define BSIZEPKT_RO
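Two small pieces of arithmetic in this patch are easy to check in isolation: splitting the 64-bit ring base address into the low and high register halves, and the fls ("find last set") helper. The sketch below is a standalone illustration, not the driver code. The low-half mask is written as 0x00000000ffffffffULL on the assumption that this is what the archived hunk intended, the function names are hypothetical, and __builtin_clz is a GCC/Clang builtin.

```c
#include <limits.h>
#include <stdint.h>

/* Low 32 bits of a descriptor ring base address (TDBAL/RDBAL value). */
static uint32_t
ring_base_low(uint64_t base_addr)
{
	/* assumed mask: keep only the lower 32 bits */
	return (uint32_t)(base_addr & 0x00000000ffffffffULL);
}

/* High 32 bits of a descriptor ring base address (TDBAH/RDBAH value). */
static uint32_t
ring_base_high(uint64_t base_addr)
{
	return (uint32_t)(base_addr >> (CHAR_BIT * sizeof(uint32_t)));
}

/*
 * fls: 1-based position of the most significant set bit, 0 for x == 0.
 * Equivalent to 32 minus the number of leading zeros.
 */
static unsigned int
fls32(uint32_t x)
{
	return x == 0 ? 0u : 32u - (unsigned int)__builtin_clz(x);
}
```

The x == 0 case is handled separately because __builtin_clz is undefined for a zero argument.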
[dpdk-dev] [PATCH 11/18] fm10k: add receive and transmit function
From: Jeff Shaw 1. Add fm10k_recv_pkts and fm10k_xmit_pkts functions. 2. Link app function pointer to actual fm10k recv/xmit functions. Signed-off-by: Jeff Shaw Signed-off-by: Chen Jing D(Mark) --- lib/librte_pmd_fm10k/fm10k.h|7 + lib/librte_pmd_fm10k/fm10k_ethdev.c |2 + lib/librte_pmd_fm10k/fm10k_rxtx.c | 299 +++ 3 files changed, 308 insertions(+), 0 deletions(-) diff --git a/lib/librte_pmd_fm10k/fm10k.h b/lib/librte_pmd_fm10k/fm10k.h index 6e9effa..bf8b132 100644 --- a/lib/librte_pmd_fm10k/fm10k.h +++ b/lib/librte_pmd_fm10k/fm10k.h @@ -280,4 +280,11 @@ fm10k_addr_alignment_valid(struct rte_mbuf *mb) return 0; } + +/* Rx and Tx prototypes */ +uint16_t fm10k_recv_pkts(void *rx_queue, struct rte_mbuf **rx_pkts, + uint16_t nb_pkts); + +uint16_t fm10k_xmit_pkts(void *tx_queue, struct rte_mbuf **tx_pkts, + uint16_t nb_pkts); #endif diff --git a/lib/librte_pmd_fm10k/fm10k_ethdev.c b/lib/librte_pmd_fm10k/fm10k_ethdev.c index 3cf5e25..9907906 100644 --- a/lib/librte_pmd_fm10k/fm10k_ethdev.c +++ b/lib/librte_pmd_fm10k/fm10k_ethdev.c @@ -1224,6 +1224,8 @@ eth_fm10k_dev_init(__rte_unused struct eth_driver *eth_drv, PMD_FUNC_TRACE(); dev->dev_ops = &fm10k_eth_dev_ops; + dev->rx_pkt_burst = &fm10k_recv_pkts; + dev->tx_pkt_burst = &fm10k_xmit_pkts; /* only initialize in the primary process */ if (rte_eal_process_type() != RTE_PROC_PRIMARY) diff --git a/lib/librte_pmd_fm10k/fm10k_rxtx.c b/lib/librte_pmd_fm10k/fm10k_rxtx.c index e69de29..952bb0e 100644 --- a/lib/librte_pmd_fm10k/fm10k_rxtx.c +++ b/lib/librte_pmd_fm10k/fm10k_rxtx.c @@ -0,0 +1,299 @@ +/*- + * BSD LICENSE + * + * Copyright(c) 2013-2014 Intel Corporation. All rights reserved. + * All rights reserved. + * + * Redistribution and use in source and binary forms, with or without + * modification, are permitted provided that the following conditions + * are met: + * + * * Redistributions of source code must retain the above copyright + * notice, this list of conditions and the following disclaimer. 
+ * * Redistributions in binary form must reproduce the above copyright + * notice, this list of conditions and the following disclaimer in + * the documentation and/or other materials provided with the + * distribution. + * * Neither the name of Intel Corporation nor the names of its + * contributors may be used to endorse or promote products derived + * from this software without specific prior written permission. + * + * THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS + * "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT + * LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR + * A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT + * OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, + * SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT + * LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, + * DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY + * THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT + * (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE + * OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. 
+ */ +#include +#include +#include "fm10k.h" +#include "SHARED/fm10k_type.h" + +#ifdef RTE_PMD_PACKET_PREFETCH +#define rte_packet_prefetch(p) rte_prefetch1(p) +#else +#define rte_packet_prefetch(p) do {} while (0) +#endif + +static inline void dump_rxd(union fm10k_rx_desc *rxd) +{ +#ifndef RTE_LIBRTE_FM10K_DEBUG_RX + RTE_SET_USED(rxd); +#endif + PMD_LOG_RX(DEBUG, "+|+"); + PMD_LOG_RX(DEBUG, "| GLORT | PKT HDR & TYPE |"); + PMD_LOG_RX(DEBUG, "| 0x%08x | 0x%08x |", rxd->d.glort, + rxd->d.data); + PMD_LOG_RX(DEBUG, "+|+"); + PMD_LOG_RX(DEBUG, "| VLAN & LEN | STATUS |"); + PMD_LOG_RX(DEBUG, "| 0x%08x | 0x%08x |", rxd->d.vlan_len, + rxd->d.staterr); + PMD_LOG_RX(DEBUG, "+|+"); + PMD_LOG_RX(DEBUG, "|RESERVED|RSS_HASH|"); + PMD_LOG_RX(DEBUG, "| 0x%08x | 0x%08x |", 0, rxd->d.rss); + PMD_LOG_RX(DEBUG, "+|+"); + PMD_LOG_RX(DEBUG, "|TIME TAG |"); + PMD_LOG_RX(DEBUG, "| 0x%016lx|", rxd->q.timestamp); + PMD_LOG_RX(DEBUG, "+|+"); +} + +static inline void +rx_desc_to_ol_flags(struct rte_mbuf *m, const union fm10k_rx_desc *d) +{ + uint16_t ptype; + static const uint16_t pt_lut[] = { 0, + PKT_RX_IPV4_HDR, PKT_RX_IPV4_HDR_EXT, + PKT_RX_IPV6_HDR, PKT_R
[dpdk-dev] [PATCH 12/18] fm10k: add PF RSS support
From: Jeff Shaw 1. Configure RSS in fm10k_dev_rx_init function. 2. Add fm10k_rss_hash_update and fm10k_rss_hash_conf_get to get and inquery RSS configuration. Signed-off-by: Jeff Shaw Signed-off-by: Chen Jing D(Mark) --- lib/librte_pmd_fm10k/fm10k_ethdev.c | 156 +++ 1 files changed, 156 insertions(+), 0 deletions(-) diff --git a/lib/librte_pmd_fm10k/fm10k_ethdev.c b/lib/librte_pmd_fm10k/fm10k_ethdev.c index 9907906..4711047 100644 --- a/lib/librte_pmd_fm10k/fm10k_ethdev.c +++ b/lib/librte_pmd_fm10k/fm10k_ethdev.c @@ -269,6 +269,78 @@ fm10k_dev_configure(struct rte_eth_dev *dev) return 0; } +static void +fm10k_dev_mq_rx_configure(struct rte_eth_dev *dev) +{ + struct fm10k_hw *hw = FM10K_DEV_PRIVATE_TO_HW(dev->data->dev_private); + struct rte_eth_conf *dev_conf = &dev->data->dev_conf; + uint32_t mrqc, *key, i, reta, j; + uint64_t hf; + +#define RSS_KEY_SIZE 40 + static uint8_t rss_intel_key[RSS_KEY_SIZE] = { + 0x6D, 0x5A, 0x56, 0xDA, 0x25, 0x5B, 0x0E, 0xC2, + 0x41, 0x67, 0x25, 0x3D, 0x43, 0xA3, 0x8F, 0xB0, + 0xD0, 0xCA, 0x2B, 0xCB, 0xAE, 0x7B, 0x30, 0xB4, + 0x77, 0xCB, 0x2D, 0xA3, 0x80, 0x30, 0xF2, 0x0C, + 0x6A, 0x42, 0xB7, 0x3B, 0xBE, 0xAC, 0x01, 0xFA, + }; + + if (dev->data->nb_rx_queues == 1 || + dev_conf->rxmode.mq_mode != ETH_MQ_RX_RSS || + dev_conf->rx_adv_conf.rss_conf.rss_hf == 0) + return; + + /* random key is rss_intel_key (default) or user provided (rss_key) */ + if (dev_conf->rx_adv_conf.rss_conf.rss_key == NULL) + key = (uint32_t *)rss_intel_key; + else + key = (uint32_t *)dev_conf->rx_adv_conf.rss_conf.rss_key; + + /* Now fill our hash function seeds, 4 bytes at a time */ + for (i = 0; i < RSS_KEY_SIZE / sizeof(*key); ++i) + FM10K_WRITE_REG(hw, FM10K_RSSRK(0, i), key[i]); + + /* +* Fill in redirection table +* The byte-swap is needed because NIC registers are in +* little-endian order. 
+*/ + reta = 0; + for (i = 0, j = 0; i < FM10K_RETA_SIZE; i++, j++) { + if (j == dev->data->nb_rx_queues) + j = 0; + reta = (reta << CHAR_BIT) | j; + if ((i & 3) == 3) + FM10K_WRITE_REG(hw, FM10K_RETA(0, i >> 2), + rte_bswap32(reta)); + } + + /* +* Generate RSS hash based on packet types, TCP/UDP +* port numbers and/or IPv4/v6 src and dst addresses +*/ + hf = dev_conf->rx_adv_conf.rss_conf.rss_hf; + mrqc = 0; + mrqc |= (hf & ETH_RSS_IPV4_TCP)? FM10K_MRQC_TCP_IPV4 : 0; + mrqc |= (hf & ETH_RSS_IPV4)? FM10K_MRQC_IPV4 : 0; + mrqc |= (hf & ETH_RSS_IPV6)? FM10K_MRQC_IPV6 : 0; + mrqc |= (hf & ETH_RSS_IPV6_EX) ? FM10K_MRQC_IPV6 : 0; + mrqc |= (hf & ETH_RSS_IPV6_TCP)? FM10K_MRQC_TCP_IPV6 : 0; + mrqc |= (hf & ETH_RSS_IPV6_TCP_EX) ? FM10K_MRQC_TCP_IPV6 : 0; + mrqc |= (hf & ETH_RSS_IPV4_UDP)? FM10K_MRQC_UDP_IPV4 : 0; + mrqc |= (hf & ETH_RSS_IPV6_UDP)? FM10K_MRQC_UDP_IPV6 : 0; + mrqc |= (hf & ETH_RSS_IPV6_UDP_EX) ? FM10K_MRQC_UDP_IPV6 : 0; + + if (mrqc == 0) { + PMD_LOG(ERR, "Specified RSS mode 0x%"PRIx64"is not supported\n", + hf); + return; + } + + FM10K_WRITE_REG(hw, FM10K_MRQC(0), mrqc); +} + static int fm10k_dev_tx_init(struct rte_eth_dev *dev) { @@ -358,6 +430,8 @@ fm10k_dev_rx_init(struct rte_eth_dev *dev) FM10K_WRITE_FLUSH(hw); } + /* Configure RSS if applicable */ + fm10k_dev_mq_rx_configure(dev); return 0; } @@ -1146,6 +1220,86 @@ fm10k_reta_query(struct rte_eth_dev *dev, return 0; } +static int +fm10k_rss_hash_update(struct rte_eth_dev *dev, + struct rte_eth_rss_conf *rss_conf) +{ + struct fm10k_hw *hw = FM10K_DEV_PRIVATE_TO_HW(dev->data->dev_private); + uint32_t *key = (uint32_t *)rss_conf->rss_key; + uint32_t mrqc; + uint64_t hf = rss_conf->rss_hf; + int i; + + PMD_FUNC_TRACE(); + + if (rss_conf->rss_key_len < FM10K_RSSRK_SIZE * + FM10K_RSSRK_ENTRIES_PER_REG) + return -EINVAL; + + if (hf == 0) + return -EINVAL; + + mrqc = 0; + mrqc |= (hf & ETH_RSS_IPV4_TCP)? FM10K_MRQC_TCP_IPV4 : 0; + mrqc |= (hf & ETH_RSS_IPV4)? FM10K_MRQC_IPV4 : 0; + mrqc |= (hf & ETH_RSS_IPV6)? 
FM10K_MRQC_IPV6 : 0; + mrqc |= (hf & ETH_RSS_IPV6_EX) ? FM10K_MRQC_IPV6 : 0; + mrqc |= (hf & ETH_RSS_IPV6_TCP)? FM10K_MRQC_TCP_IPV6 : 0; + mrqc |= (hf & ETH_RSS_IPV6_TCP_EX) ? FM10K_MRQC_TCP_IPV6 : 0; + mrqc |= (hf & ETH_RSS_IPV4_UDP)? FM10K_MRQC_UDP_IPV4 :
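The redirection-table loop in fm10k_dev_mq_rx_configure() above can be tried outside the driver. The following is only a sketch under stated assumptions: RETA_ENTRIES is a made-up size (the real FM10K_RETA_SIZE is not shown in this message), bswap32() is a portable stand-in for rte_bswap32(), and the registers are a plain array instead of FM10K_WRITE_REG() targets. It shows that four 8-bit queue indexes are packed per 32-bit register, assigned round-robin, and that after the byte swap entry i ends up in byte (i & 3) of its register, least significant byte first.

```c
#include <stdint.h>

/* Hypothetical table size for illustration only. */
#define RETA_ENTRIES 128
#define RETA_REGS    (RETA_ENTRIES / 4)

/* Portable 32-bit byte swap, standing in for rte_bswap32(). */
static uint32_t bswap32(uint32_t v)
{
	return (v >> 24) | ((v >> 8) & 0x0000ff00u) |
	       ((v << 8) & 0x00ff0000u) | (v << 24);
}

/*
 * Spread nb_rx_queues queue indexes round-robin over the table,
 * packing four entries into each 32-bit register.
 */
static void reta_fill(uint32_t regs[RETA_REGS], uint16_t nb_rx_queues)
{
	uint32_t reta = 0;
	unsigned int i, j;

	for (i = 0, j = 0; i < RETA_ENTRIES; i++, j++) {
		if (j == nb_rx_queues)
			j = 0;
		/* Older entries shift out of the top byte after four steps. */
		reta = (reta << 8) | j;
		if ((i & 3) == 3)
			regs[i >> 2] = bswap32(reta);
	}
}
```

With three queues the entries are 0,1,2,0,1,2,... so the first register holds 0,1,2,0 and the second 1,2,0,1, which is why the pattern does not repeat per register unless the queue count divides four.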
[dpdk-dev] [PATCH 13/18] fm10k: Add scatter receive function
From: Jeff Shaw 1. Add fm10k_recv_scattered_pkts function to receive jumbo frame and multi-segment packets. 2. Configure correct receive function in rx_init and dev_init. Signed-off-by: Jeff Shaw Signed-off-by: Chen Jing D(Mark) --- lib/librte_pmd_fm10k/fm10k.h|3 + lib/librte_pmd_fm10k/fm10k_ethdev.c | 15 lib/librte_pmd_fm10k/fm10k_rxtx.c | 128 +++ 3 files changed, 146 insertions(+), 0 deletions(-) diff --git a/lib/librte_pmd_fm10k/fm10k.h b/lib/librte_pmd_fm10k/fm10k.h index bf8b132..8bdefad 100644 --- a/lib/librte_pmd_fm10k/fm10k.h +++ b/lib/librte_pmd_fm10k/fm10k.h @@ -285,6 +285,9 @@ fm10k_addr_alignment_valid(struct rte_mbuf *mb) uint16_t fm10k_recv_pkts(void *rx_queue, struct rte_mbuf **rx_pkts, uint16_t nb_pkts); +uint16_t fm10k_recv_scattered_pkts(void *rx_queue, + struct rte_mbuf **rx_pkts, uint16_t nb_pkts); + uint16_t fm10k_xmit_pkts(void *tx_queue, struct rte_mbuf **tx_pkts, uint16_t nb_pkts); #endif diff --git a/lib/librte_pmd_fm10k/fm10k_ethdev.c b/lib/librte_pmd_fm10k/fm10k_ethdev.c index 4711047..b231c31 100644 --- a/lib/librte_pmd_fm10k/fm10k_ethdev.c +++ b/lib/librte_pmd_fm10k/fm10k_ethdev.c @@ -422,6 +422,13 @@ fm10k_dev_rx_init(struct rte_eth_dev *dev) FM10K_WRITE_REG(hw, FM10K_SRRCTL(i), buf_size >> FM10K_SRRCTL_BSIZEPKT_SHIFT); + /* It adds dual VLAN length for supporting dual VLAN */ + if ((dev->data->dev_conf.rxmode.max_rx_pkt_len + + 2 * FM10K_VLAN_TAG_SIZE) > buf_size){ + dev->data->scattered_rx = 1; + dev->rx_pkt_burst = fm10k_recv_scattered_pkts; + } + /* Enable drop on empty, it's RO for VF */ if (hw->mac.type == fm10k_mac_pf && rxq->drop_en) rxdctl |= FM10K_RXDCTL_DROP_ON_EMPTY; @@ -430,6 +437,11 @@ fm10k_dev_rx_init(struct rte_eth_dev *dev) FM10K_WRITE_FLUSH(hw); } + if (dev->data->dev_conf.rxmode.enable_scatter) { + dev->rx_pkt_burst = fm10k_recv_scattered_pkts; + dev->data->scattered_rx = 1; + } + /* Configure RSS if applicable */ fm10k_dev_mq_rx_configure(dev); return 0; @@ -1383,6 +1395,9 @@ eth_fm10k_dev_init(__rte_unused struct 
eth_driver *eth_drv, dev->rx_pkt_burst = &fm10k_recv_pkts; dev->tx_pkt_burst = &fm10k_xmit_pkts; + if (dev->data->scattered_rx) + dev->rx_pkt_burst = &fm10k_recv_scattered_pkts; + /* only initialize in the primary process */ if (rte_eal_process_type() != RTE_PROC_PRIMARY) return 0; diff --git a/lib/librte_pmd_fm10k/fm10k_rxtx.c b/lib/librte_pmd_fm10k/fm10k_rxtx.c index 952bb0e..0400b69 100644 --- a/lib/librte_pmd_fm10k/fm10k_rxtx.c +++ b/lib/librte_pmd_fm10k/fm10k_rxtx.c @@ -177,6 +177,134 @@ fm10k_recv_pkts(void *rx_queue, struct rte_mbuf **rx_pkts, return count; } +uint16_t +fm10k_recv_scattered_pkts(void *rx_queue, struct rte_mbuf **rx_pkts, + uint16_t nb_pkts) +{ + struct rte_mbuf *mbuf; + union fm10k_rx_desc desc; + struct fm10k_rx_queue *q = rx_queue; + uint16_t count = 0; + uint16_t nb_rcv, nb_seg; + int alloc = 0; + uint16_t next_dd; + struct rte_mbuf *first_seg = q->pkt_first_seg; + struct rte_mbuf *last_seg = q->pkt_last_seg; + + next_dd = q->next_dd; + nb_rcv = 0; + + nb_seg = RTE_MIN(nb_pkts, q->alloc_thresh); + for (count = 0; count < nb_seg; count++) { + mbuf = q->sw_ring[next_dd]; + desc = q->hw_ring[next_dd]; + if (!(desc.d.staterr & FM10K_RXD_STATUS_DD)) + break; +#ifdef RTE_LIBRTE_FM10K_DEBUG_RX + dump_rxd(&desc); +#endif + + if (++next_dd == q->nb_desc) { + next_dd = 0; + alloc = 1; + } + + /* Prefetch next mbuf while processing current one. */ + rte_prefetch0(q->sw_ring[next_dd]); + + /* +* When next RX descriptor is on a cache-line boundary, +* prefetch the next 4 RX descriptors and the next 8 pointers +* to mbufs. +*/ + if ((next_dd & 0x3) == 0) { + rte_prefetch0(&q->hw_ring[next_dd]); + rte_prefetch0(&q->sw_ring[next_dd]); + } + + /* Fill data length */ + rte_pktmbuf_data_len(mbuf) = desc.w.length; + + /* +* If this is the first buffer of the received packet, +* set the pointer to the first mbuf of the packet and +* initialize its context. +* Otherwise, update the total length and the number of segments +
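The rule that selects the scattered receive path in fm10k_dev_rx_init() can be written as a small predicate. This is a sketch, not driver code: needs_scattered_rx() is a hypothetical helper, and VLAN_TAG_SIZE is assumed to be 4 bytes (the size of an 802.1Q tag), since the value of FM10K_VLAN_TAG_SIZE is not shown in this message.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed tag size; the driver uses FM10K_VLAN_TAG_SIZE here. */
#define VLAN_TAG_SIZE 4u

/*
 * Scattered RX is needed when the application asks for it, or when
 * the largest frame plus two VLAN tags does not fit in one buffer.
 */
static bool needs_scattered_rx(uint32_t max_rx_pkt_len, uint32_t buf_size,
			       bool enable_scatter)
{
	if (enable_scatter)
		return true;
	return max_rx_pkt_len + 2u * VLAN_TAG_SIZE > buf_size;
}
```

In the patch the size check runs once per RX queue, so a single queue with small buffers is enough to switch the whole port to fm10k_recv_scattered_pkts.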
[dpdk-dev] [PATCH 14/18] fm10k: add function to set vlan
From: Jeff Shaw Add fm10k_vlan_filter_set to set vlan. Signed-off-by: Jeff Shaw Signed-off-by: Chen Jing D(Mark) --- lib/librte_pmd_fm10k/fm10k_ethdev.c | 14 ++ 1 files changed, 14 insertions(+), 0 deletions(-) diff --git a/lib/librte_pmd_fm10k/fm10k_ethdev.c b/lib/librte_pmd_fm10k/fm10k_ethdev.c index b231c31..daa687e 100644 --- a/lib/librte_pmd_fm10k/fm10k_ethdev.c +++ b/lib/librte_pmd_fm10k/fm10k_ethdev.c @@ -772,6 +772,19 @@ fm10k_dev_infos_get(struct rte_eth_dev *dev, } +static int +fm10k_vlan_filter_set(struct rte_eth_dev *dev, uint16_t vlan_id, int on) +{ + struct fm10k_hw *hw = FM10K_DEV_PRIVATE_TO_HW(dev->data->dev_private); + PMD_FUNC_TRACE(); + + /* @todo - add support for the VF */ + if (hw->mac.type != fm10k_mac_pf) + return -ENOTSUP; + + return fm10k_update_vlan(hw, vlan_id, 0, on); +} + static inline int check_nb_desc(uint16_t min, uint16_t max, uint16_t mult, uint16_t request) { @@ -1369,6 +1382,7 @@ static struct eth_dev_ops fm10k_eth_dev_ops = { .stats_get = fm10k_stats_get, .stats_reset = fm10k_stats_reset, .dev_infos_get = fm10k_dev_infos_get, + .vlan_filter_set = fm10k_vlan_filter_set, .rx_queue_start = fm10k_dev_rx_queue_start, .rx_queue_stop = fm10k_dev_rx_queue_stop, .tx_queue_start = fm10k_dev_tx_queue_start, -- 1.7.7.6
[dpdk-dev] [PATCH 16/18] fm10k: add PF and VF interrupt handling function
From: Jeff Shaw 1. Add 2 interrupt handling functions, one for PF and one for VF. 2. Enable interrupt after completing initialization of NIC. Signed-off-by: Jeff Shaw Signed-off-by: Chen Jing D(Mark) --- lib/librte_pmd_fm10k/fm10k_ethdev.c | 268 +++ 1 files changed, 268 insertions(+), 0 deletions(-) diff --git a/lib/librte_pmd_fm10k/fm10k_ethdev.c b/lib/librte_pmd_fm10k/fm10k_ethdev.c index 40e3a2b..685fa8f 100644 --- a/lib/librte_pmd_fm10k/fm10k_ethdev.c +++ b/lib/librte_pmd_fm10k/fm10k_ethdev.c @@ -1325,6 +1325,256 @@ fm10k_rss_hash_conf_get(struct rte_eth_dev *dev, return 0; } +static void +fm10k_dev_enable_intr_pf(struct rte_eth_dev *dev) +{ + struct fm10k_hw *hw = FM10K_DEV_PRIVATE_TO_HW(dev->data->dev_private); + uint32_t int_map = FM10K_INT_MAP_IMMEDIATE; + + /* Bind all local non-queue interrupt to vector 0 */ + int_map |= 0; + + FM10K_WRITE_REG(hw, FM10K_INT_MAP(fm10k_int_Mailbox), int_map); + FM10K_WRITE_REG(hw, FM10K_INT_MAP(fm10k_int_PCIeFault), int_map); + FM10K_WRITE_REG(hw, FM10K_INT_MAP(fm10k_int_SwitchUpDown), int_map); + FM10K_WRITE_REG(hw, FM10K_INT_MAP(fm10k_int_SwitchEvent), int_map); + FM10K_WRITE_REG(hw, FM10K_INT_MAP(fm10k_int_SRAM), int_map); + FM10K_WRITE_REG(hw, FM10K_INT_MAP(fm10k_int_VFLR), int_map); + + /* Enable misc causes */ + FM10K_WRITE_REG(hw, FM10K_EIMR, FM10K_EIMR_ENABLE(PCA_FAULT) | + FM10K_EIMR_ENABLE(THI_FAULT) | + FM10K_EIMR_ENABLE(FUM_FAULT) | + FM10K_EIMR_ENABLE(MAILBOX) | + FM10K_EIMR_ENABLE(SWITCHREADY) | + FM10K_EIMR_ENABLE(SWITCHNOTREADY) | + FM10K_EIMR_ENABLE(SRAMERROR) | + FM10K_EIMR_ENABLE(VFLR)); + + /* Enable ITR 0 */ + FM10K_WRITE_REG(hw, FM10K_ITR(0), FM10K_ITR_AUTOMASK | + FM10K_ITR_MASK_CLEAR); + FM10K_WRITE_FLUSH(hw); +} + +static void +fm10k_dev_enable_intr_vf(struct rte_eth_dev *dev) +{ + struct fm10k_hw *hw = FM10K_DEV_PRIVATE_TO_HW(dev->data->dev_private); + uint32_t int_map = FM10K_INT_MAP_IMMEDIATE; + + /* Bind all local non-queue interrupt to vector 0 */ + int_map |= 0; + + /* Only INT 0 availiable, 
other 15 are reserved. */ + FM10K_WRITE_REG(hw, FM10K_VFINT_MAP, int_map); + + /* Enable ITR 0 */ + FM10K_WRITE_REG(hw, FM10K_VFITR(0), FM10K_ITR_AUTOMASK | + FM10K_ITR_MASK_CLEAR); + FM10K_WRITE_FLUSH(hw); +} + +static int +fm10k_dev_handle_fault(struct fm10k_hw *hw, uint32_t eicr) +{ + struct fm10k_fault fault; + int err; + const char *estr = "Unknown error"; + + /* Process PCA fault */ + if (eicr & FM10K_EIMR_PCA_FAULT) { + err = fm10k_get_fault(hw, FM10K_PCA_FAULT, &fault); + if (err) + goto error; + switch (fault.type) { + case PCA_NO_FAULT: + estr = "PCA_NO_FAULT"; break; + case PCA_UNMAPPED_ADDR: + estr = "PCA_UNMAPPED_ADDR"; break; + case PCA_BAD_QACCESS_PF: + estr = "PCA_BAD_QACCESS_PF"; break; + case PCA_BAD_QACCESS_VF: + estr = "PCA_BAD_QACCESS_VF"; break; + case PCA_MALICIOUS_REQ: + estr = "PCA_MALICIOUS_REQ"; break; + case PCA_POISONED_TLP: + estr = "PCA_POISONED_TLP"; break; + case PCA_TLP_ABORT: + estr = "PCA_TLP_ABORT"; break; + default: + goto error; + } + PMD_LOG(ERR, "%s: %s(%d) Addr:0x%"PRIu64" Spec: 0x%x", + estr, fault.func ? "VF" : "PF", fault.func, + fault.address, fault.specinfo); + } + + /* Process THI fault */ + if (eicr & FM10K_EIMR_THI_FAULT) { + err = fm10k_get_fault(hw, FM10K_THI_FAULT, &fault); + if (err) + goto error; + switch (fault.type) { + case THI_NO_FAULT: + estr = "THI_NO_FAULT"; break; + case THI_MAL_DIS_Q_FAULT: + estr = "THI_MAL_DIS_Q_FAULT"; break; + default: + goto error; + } + PMD_LOG(ERR, "%s: %s(%d) Addr:0x%"PRIu64" Spec: 0x%x", + estr, fault.func ? "VF" : "PF", fault.func, + fault.address, fault.specinfo); + } + + /* Process FUM fault */ + if (eicr & FM10K_EIMR_FUM_FAULT) { + err = fm10k_get_fault(hw, FM10K_FUM_FAULT, &fault); + if (err) + goto error; +
[dpdk-dev] [PATCH 17/18] Change lib/Makefile to add fm10k driver into compile list.
From: Jeff Shaw Signed-off-by: Jeff Shaw Signed-off-by: Chen Jing D(Mark) --- lib/Makefile |1 + 1 files changed, 1 insertions(+), 0 deletions(-) diff --git a/lib/Makefile b/lib/Makefile index 0ffc982..b1f3860 100644 --- a/lib/Makefile +++ b/lib/Makefile @@ -43,6 +43,7 @@ DIRS-$(CONFIG_RTE_LIBRTE_ETHER) += librte_ether DIRS-$(CONFIG_RTE_LIBRTE_E1000_PMD) += librte_pmd_e1000 DIRS-$(CONFIG_RTE_LIBRTE_IXGBE_PMD) += librte_pmd_ixgbe DIRS-$(CONFIG_RTE_LIBRTE_I40E_PMD) += librte_pmd_i40e +DIRS-$(CONFIG_RTE_LIBRTE_FM10K_PMD) += librte_pmd_fm10k DIRS-$(CONFIG_RTE_LIBRTE_ENIC_PMD) += librte_pmd_enic DIRS-$(CONFIG_RTE_LIBRTE_PMD_BOND) += librte_pmd_bond DIRS-$(CONFIG_RTE_LIBRTE_PMD_RING) += librte_pmd_ring -- 1.7.7.6
[dpdk-dev] [PATCH 18/18] Change mk/rte.app.mk to add fm10k lib into link
From: Jeff Shaw Signed-off-by: Jeff Shaw Signed-off-by: Chen Jing D(Mark) --- mk/rte.app.mk |4 1 files changed, 4 insertions(+), 0 deletions(-) diff --git a/mk/rte.app.mk b/mk/rte.app.mk index 4294d9a..87d8763 100644 --- a/mk/rte.app.mk +++ b/mk/rte.app.mk @@ -211,6 +211,10 @@ ifeq ($(CONFIG_RTE_LIBRTE_I40E_PMD),y) LDLIBS += -lrte_pmd_i40e endif +ifeq ($(CONFIG_RTE_LIBRTE_FM10K_PMD),y) +LDLIBS += -lrte_pmd_fm10k +endif + ifeq ($(CONFIG_RTE_LIBRTE_IXGBE_PMD),y) LDLIBS += -lrte_pmd_ixgbe endif -- 1.7.7.6
[dpdk-dev] [PATCH 15/18] fm10k: Add SRIOV-VF support
From: Jeff Shaw The fm10k PMD supports both PF and VF devices with a single copy of the code. This works because the NIC maps registers that have the same function in the PF and the VF to the same PCI I/O address. The PF and VF drivers therefore use the same addresses to access their own registers, and the HW translates each request to the correct unit. For functionality that is unique to the PF, the driver checks the current driver type and behaves accordingly. Signed-off-by: Jeff Shaw Signed-off-by: Chen Jing D(Mark) --- lib/librte_pmd_fm10k/fm10k_ethdev.c |1 + 1 files changed, 1 insertions(+), 0 deletions(-) diff --git a/lib/librte_pmd_fm10k/fm10k_ethdev.c b/lib/librte_pmd_fm10k/fm10k_ethdev.c index daa687e..40e3a2b 100644 --- a/lib/librte_pmd_fm10k/fm10k_ethdev.c +++ b/lib/librte_pmd_fm10k/fm10k_ethdev.c @@ -1541,6 +1541,7 @@ eth_fm10k_dev_init(__rte_unused struct eth_driver *eth_drv, */ static struct rte_pci_id pci_id_fm10k_map[] = { #define RTE_PCI_DEV_ID_DECL_FM10K(vend, dev) { RTE_PCI_DEVICE(vend, dev) }, +#define RTE_PCI_DEV_ID_DECL_FM10KVF(vend, dev) { RTE_PCI_DEVICE(vend, dev) }, #include "rte_pci_dev_ids.h" { .vendor_id = 0, /* sentinel */ }, }; -- 1.7.7.6
[dpdk-dev] [DISCUSSION] : ERROR while running vhost example in dpdk-1.8
On 2015/1/30 0:48, Srinivasreddy R wrote: > EAL: 512 hugepages of size 2097152 reserved, but no mounted hugetlbfs found > for that size Maybe you haven't mounted hugetlbfs. -- Regards, Haifeng
[dpdk-dev] [PATCH v5 00/13] Port Hotplug Framework
This patch series adds a dynamic port hotplug framework to DPDK. With these patches, DPDK apps can attach or detach ports at runtime.

The basic concept of port hotplug is as follows.

- DPDK apps are responsible for managing ports.
  Only DPDK apps know which ports are attached or detached at any given moment. The port hotplug framework is implemented to allow DPDK apps to manage ports. For example, when a DPDK app calls the port attach function, the attached port number is returned. A DPDK app can also detach a port by port number.

- Kernel support is needed for attaching or detaching physical device ports.
  To attach a new physical device port, the device must first be recognized by a userspace direct I/O framework in the kernel. Then DPDK apps can call the port hotplug functions to attach ports. For detaching, the steps are reversed.

- Before detaching ports, the ports must be stopped and closed.
  The DPDK application must call rte_eth_dev_stop() and rte_eth_dev_close() before detaching ports. These functions call the finalization code of the PMDs. But so far, no PMD frees all resources allocated by initialization, so PMDs need to be fixed to support port hotplug. 'RTE_PCI_DRV_DETACHABLE' is a new flag indicating that a PMD supports detaching. Without this flag, detaching will fail.

- Legacy DPDK apps must not be affected.
  No DPDK EAL behavior is changed if the port hotplug functions aren't called, so all legacy DPDK apps still work without modification.

And a few limitations.

- The port hotplug functions are not thread safe. DPDK apps should handle this.
- Only Linux and igb_uio are supported so far. BSD and VFIO are not supported. I will send VFIO patches at least, but I don't have a plan to submit a BSD patch so far.

Here are the port hotplug APIs.
---
/**
 * Attach a new device.
 *
 * @param devargs
 *  A pointer to a string describing the new device
 *  to be attached. The string should be a pci address like
 *  ':01:00.0' or a virtual device name like 'eth_pcap0'.
 * @param port_id
 *  A pointer to a port identifier actually attached.
 * @return
 *  0 on success and port_id is filled, negative on error
 */
int rte_eal_dev_attach(const char *devargs, uint8_t *port_id);

/**
 * Detach a device.
 *
 * @param port_id
 *  The port identifier of the device to detach.
 * @param devname
 *  A pointer to a device name actually detached.
 * @return
 *  0 on success and devname is filled, negative on error
 */
int rte_eal_dev_detach(uint8_t port_id, char *devname);
---

This patch series is for the DPDK EAL. For DPDK apps to use the port hotplug functions, each PMD should be fixed to support the 'RTE_PCI_DRV_DETACHABLE' flag. Please check the patch for the pcap PMD. Also please check the testpmd patch. It shows how to fix legacy applications to support the port hotplug feature.

PATCH v5 changes
- Add a runtime check of the passthrough driver type, like vfio-pci, igb_uio and uio_pci_generic. This was done by Qiu, Michael. Thanks a lot.
- Change function names as below.
  - rte_eal_dev_find_and_invoke() to rte_eal_vdev_find_and_invoke().
  - rte_eal_dev_invoke() to rte_eal_vdev_invoke().
- Add code to handle the return value of rte_eal_devargs_remove().
- Fix pci address format in rte_eal_dev_detach().
- Remove RTE_EAL_INVOKE_TYPE_UNKNOWN, because it's unused.
- Change function definition of rte_eal_devargs_remove().
- Fix pci_unmap_device() to check pt_driver.
- Fix return value of the functions below.
  - rte_eth_dev_get_changed_port().
  - rte_eth_dev_get_port_by_addr().
- Change parameters of rte_eth_dev_validate_port() to clean up code.
- Fix pci_scan_one to handle pt_driver correctly.
(Thanks to Qiu, Michael for the above suggestions)

PATCH v4 changes
- Merge patches to make review easier.
- Fix indent of 'if' statement.
- Fix calculation method of eal_compare_pci_addr().
- Fix header file declaration.
- Add header file to determine if hotplug can be enabled.
(Thanks to Qiu, Michael)
- Use braces with 'for' loop.
- Add parameter checking.
- Fix sanity check code
- Fix comments of rte_eth_dev_type.
- Change function names.
(Thanks to Iremonger, Bernard)

PATCH v3 changes:
- Fix enum definition used in rte_ethdev.c.
(Thanks to Zhang, Helin)

PATCH v2 changes:
- Replace rte_eal_dev_attach_pdev(), rte_eal_dev_detach_pdev, rte_eal_dev_attach_vdev() and rte_eal_dev_detach_vdev() with rte_eal_dev_attach() and rte_eal_dev_detach().
- Add parameter values checking.
- Refashion a few functions.
(Thanks to Iremonger, Bernard)

PATCH v1 Changes:
- Fix error checking code of librte_eth APIs.
- Fix issue that port from pcap PMD cannot be detached correctly.
- Fix issue that testpmd could hang after forwarding, if attaching and detaching are repeated.
- Fix if-condition of rte_eth_dev_get_port_by_addr().
(Thank
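The ordering rule in the cover letter (stop, then close, then detach, and only for PMDs that set the detachable flag) can be modelled as a tiny state machine. This is a toy model and not the DPDK API: struct port, port_stop(), port_close() and port_detach() are hypothetical names invented for illustration, and the `detachable` field merely stands in for RTE_PCI_DRV_DETACHABLE on the owning PMD.

```c
#include <stdbool.h>

/* Toy lifecycle states; not taken from DPDK. */
enum port_state { PORT_STARTED, PORT_STOPPED, PORT_CLOSED, PORT_DETACHED };

struct port {
	enum port_state state;
	bool detachable; /* models RTE_PCI_DRV_DETACHABLE */
};

/* Each step succeeds (0) only from the state that must precede it. */
static int port_stop(struct port *p)
{
	if (p->state != PORT_STARTED)
		return -1;
	p->state = PORT_STOPPED;
	return 0;
}

static int port_close(struct port *p)
{
	if (p->state != PORT_STOPPED)
		return -1;
	p->state = PORT_CLOSED;
	return 0;
}

static int port_detach(struct port *p)
{
	/* Detach fails unless the PMD opted in and the port is closed. */
	if (!p->detachable || p->state != PORT_CLOSED)
		return -1;
	p->state = PORT_DETACHED;
	return 0;
}
```

An application following the cover letter would make the corresponding three calls in this order and treat a negative return from the detach call as "port still attached".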
[dpdk-dev] [PATCH v5 01/13] eal_pci: Add flag to hold kernel driver type
From: Michael Qiu Currently, DPDK has no way to know which type of kernel driver (vfio-pci/igb_uio/uio_pci_generic) a device uses. It can only check statically whether vfio is enabled or not. It is really useful to have this flag, because different driver types need to be handled differently at runtime, for example for PCI memory mapping, port hotplug, and so on. This patch adds a flag field to the pci device to solve the above issue. Signed-off-by: Michael Qiu Signed-off-by: Tetsuya Mukawa --- lib/librte_eal/common/include/rte_pci.h | 8 + lib/librte_eal/linuxapp/eal/eal_pci.c | 53 +++-- 2 files changed, 59 insertions(+), 2 deletions(-) diff --git a/lib/librte_eal/common/include/rte_pci.h b/lib/librte_eal/common/include/rte_pci.h index 66ed793..7b48b55 100644 --- a/lib/librte_eal/common/include/rte_pci.h +++ b/lib/librte_eal/common/include/rte_pci.h @@ -139,6 +139,13 @@ struct rte_pci_addr { struct rte_devargs; +enum rte_pt_driver { + RTE_PT_UNKNOWN = 0, + RTE_PT_IGB_UIO = 1, + RTE_PT_VFIO = 2, + RTE_PT_UIO_GENERIC = 3, +}; + /** * A structure describing a PCI device. */ @@ -152,6 +159,7 @@ struct rte_pci_device { uint16_t max_vfs; /**< sriov enable if not zero */ int numa_node; /**< NUMA node connection */ struct rte_devargs *devargs;/**< Device user arguments */ + enum rte_pt_driver pt_driver; /**< Driver of passthrough */ }; /** Any PCI device identifier (vendor, device, ...) 
*/ diff --git a/lib/librte_eal/linuxapp/eal/eal_pci.c b/lib/librte_eal/linuxapp/eal/eal_pci.c index b5f5410..bd3f77d 100644 --- a/lib/librte_eal/linuxapp/eal/eal_pci.c +++ b/lib/librte_eal/linuxapp/eal/eal_pci.c @@ -97,6 +97,35 @@ error: return -1; } +static int +pci_get_kernel_driver_by_path(const char *filename, char *dri_name) +{ + int count; + char path[PATH_MAX]; + char *name; + + if (!filename || !dri_name) + return -1; + + count = readlink(filename, path, PATH_MAX); + if (count >= PATH_MAX) + return -1; + + /* For device does not have a driver */ + if (count < 0) + return 1; + + path[count] = '\0'; + + name = strrchr(path, '/'); + if (name) { + strncpy(dri_name, name + 1, strlen(name + 1) + 1); + return 0; + } + + return -1; +} + void * pci_find_max_end_va(void) { @@ -222,11 +251,12 @@ pci_scan_one(const char *dirname, uint16_t domain, uint8_t bus, char filename[PATH_MAX]; unsigned long tmp; struct rte_pci_device *dev; + char driver[PATH_MAX]; + int ret; dev = malloc(sizeof(*dev)); - if (dev == NULL) { + if (dev == NULL) return -1; - } memset(dev, 0, sizeof(*dev)); dev->addr.domain = domain; @@ -298,6 +328,25 @@ pci_scan_one(const char *dirname, uint16_t domain, uint8_t bus, return -1; } + /* parse driver */ + snprintf(filename, sizeof(filename), "%s/driver", dirname); + ret = pci_get_kernel_driver_by_path(filename, driver); + if (!ret) { + if (!strcmp(driver, "vfio-pci")) + dev->pt_driver = RTE_PT_VFIO; + else if (!strcmp(driver, "igb_uio")) + dev->pt_driver = RTE_PT_IGB_UIO; + else if (!strcmp(driver, "uio_pci_generic")) + dev->pt_driver = RTE_PT_UIO_GENERIC; + else + dev->pt_driver = RTE_PT_UNKNOWN; + } else if (ret < 0) { + RTE_LOG(ERR, EAL, "Fail to get kernel driver\n"); + free(dev); + return -1; + } else + dev->pt_driver = RTE_PT_UNKNOWN; + /* device is valid, add in list (sorted) */ if (TAILQ_EMPTY(&pci_device_list)) { TAILQ_INSERT_TAIL(&pci_device_list, dev, next); -- 1.9.1
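The driver detection added to pci_scan_one() boils down to two steps: resolve the device's "driver" symlink, then map the last path component to an enum. The sketch below reproduces that idea as a standalone program fragment. It is not the patch's code: enum pt_driver, pt_driver_from_name(), pt_driver_from_link() and LINK_BUF_SIZE are hypothetical names, and unlike the patch it reads at most LINK_BUF_SIZE - 1 bytes so that there is always room for the terminating NUL. The return convention (0 found, 1 no driver bound, -1 error) follows the patch.

```c
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

#define LINK_BUF_SIZE 4096 /* assumed large enough for a sysfs link target */

enum pt_driver { PT_UNKNOWN = 0, PT_IGB_UIO, PT_VFIO, PT_UIO_GENERIC };

/* Map a kernel driver name to the passthrough driver type. */
static enum pt_driver pt_driver_from_name(const char *name)
{
	if (strcmp(name, "vfio-pci") == 0)
		return PT_VFIO;
	if (strcmp(name, "igb_uio") == 0)
		return PT_IGB_UIO;
	if (strcmp(name, "uio_pci_generic") == 0)
		return PT_UIO_GENERIC;
	return PT_UNKNOWN;
}

/*
 * Resolve a ".../driver" symlink and classify its last component.
 * Returns 0 on success, 1 if the link cannot be read (treated as
 * "no driver bound"), -1 on bad arguments.
 */
static int pt_driver_from_link(const char *link_path, enum pt_driver *out)
{
	char target[LINK_BUF_SIZE];
	const char *base;
	ssize_t n;

	if (link_path == NULL || out == NULL)
		return -1;

	n = readlink(link_path, target, sizeof(target) - 1);
	if (n < 0) {
		*out = PT_UNKNOWN;
		return 1;
	}
	target[n] = '\0'; /* readlink() does not terminate the buffer */

	base = strrchr(target, '/');
	*out = pt_driver_from_name(base != NULL ? base + 1 : target);
	return 0;
}
```

On a Linux host the link to pass would be the device's sysfs entry with "/driver" appended, as pci_scan_one() builds it from dirname.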
[dpdk-dev] [PATCH v5 02/13] eal_pci: pci memory map work with driver type
From: Michael Qiu With the driver type flag in struct rte_pci_device, we no longer need to always map uio devices with the vfio related functions when vfio is enabled. Signed-off-by: Michael Qiu Signed-off-by: Tetsuya Mukawa --- lib/librte_eal/linuxapp/eal/eal_pci.c | 30 +- 1 file changed, 17 insertions(+), 13 deletions(-) diff --git a/lib/librte_eal/linuxapp/eal/eal_pci.c b/lib/librte_eal/linuxapp/eal/eal_pci.c index bd3f77d..c0ca5a5 100644 --- a/lib/librte_eal/linuxapp/eal/eal_pci.c +++ b/lib/librte_eal/linuxapp/eal/eal_pci.c @@ -549,25 +549,29 @@ pci_config_space_set(struct rte_pci_device *dev) static int pci_map_device(struct rte_pci_device *dev) { - int ret, mapped = 0; + int ret = -1; /* try mapping the NIC resources using VFIO if it exists */ + switch (dev->pt_driver) { + case RTE_PT_VFIO: #ifdef VFIO_PRESENT - if (pci_vfio_is_enabled()) { - ret = pci_vfio_map_resource(dev); - if (ret == 0) - mapped = 1; - else if (ret < 0) - return ret; - } + if (pci_vfio_is_enabled()) + ret = pci_vfio_map_resource(dev); #endif - /* map resources for devices that use igb_uio */ - if (!mapped) { + break; + case RTE_PT_IGB_UIO: + case RTE_PT_UIO_GENERIC: + /* map resources for devices that use uio */ ret = pci_uio_map_resource(dev); - if (ret != 0) - return ret; + break; + default: + RTE_LOG(DEBUG, EAL, " Not managed by known pt driver," + " skipped\n"); + ret = 1; + break; } - return 0; + + return ret; } /* -- 1.9.1
[dpdk-dev] [PATCH v5 03/13] eal/pci, ethdev: Remove assumption that port will not be detached
To remove assumption, do like followings. This patch adds "RTE_PCI_DRV_DETACHABLE" to drv_flags of rte_pci_driver structure. The flags indicates the driver can detach devices at runtime. Also remove assumption that port will not be detached. To remove the assumption. - Add 'attached' member to rte_eth_dev structure. This member is used for indicating the port is attached, or not. - Add rte_eth_dev_allocate_new_port(). This function is used for allocating new port. v5: - Change paramters of rte_eth_dev_validate_port() to cleanup code. v4: - Use braces with 'for' loop. - Fix indent of 'if' statement. Signed-off-by: Tetsuya Mukawa --- lib/librte_eal/common/include/rte_pci.h | 2 + lib/librte_ether/rte_ethdev.c | 454 +--- lib/librte_ether/rte_ethdev.h | 5 + 3 files changed, 186 insertions(+), 275 deletions(-) diff --git a/lib/librte_eal/common/include/rte_pci.h b/lib/librte_eal/common/include/rte_pci.h index 7b48b55..7f2d699 100644 --- a/lib/librte_eal/common/include/rte_pci.h +++ b/lib/librte_eal/common/include/rte_pci.h @@ -207,6 +207,8 @@ struct rte_pci_driver { #define RTE_PCI_DRV_FORCE_UNBIND 0x0004 /** Device driver supports link state interrupt */ #define RTE_PCI_DRV_INTR_LSC 0x0008 +/** Device driver supports detaching capability */ +#define RTE_PCI_DRV_DETACHABLE 0x0010 /**< Internal use only - Macro used by pci addr parsing functions **/ #define GET_PCIADDR_FIELD(in, fd, lim, dlm) \ diff --git a/lib/librte_ether/rte_ethdev.c b/lib/librte_ether/rte_ethdev.c index ea3a1fb..d70854f 100644 --- a/lib/librte_ether/rte_ethdev.c +++ b/lib/librte_ether/rte_ethdev.c @@ -175,6 +175,16 @@ enum { STAT_QMAP_RX }; +enum { + DEV_INVALID = 0, + DEV_VALID, +}; + +enum { + DEV_DISCONNECTED = 0, + DEV_CONNECTED +}; + static inline void rte_eth_dev_data_alloc(void) { @@ -201,19 +211,34 @@ rte_eth_dev_allocated(const char *name) { unsigned i; - for (i = 0; i < nb_ports; i++) { - if (strcmp(rte_eth_devices[i].data->name, name) == 0) + for (i = 0; i < RTE_MAX_ETHPORTS; i++) { + if 
((rte_eth_devices[i].attached == DEV_CONNECTED) && + strcmp(rte_eth_devices[i].data->name, name) == 0) return &rte_eth_devices[i]; } return NULL; } +static uint8_t +rte_eth_dev_allocate_new_port(void) +{ + unsigned i; + + for (i = 0; i < RTE_MAX_ETHPORTS; i++) { + if (rte_eth_devices[i].attached == DEV_DISCONNECTED) + return i; + } + return RTE_MAX_ETHPORTS; +} + struct rte_eth_dev * rte_eth_dev_allocate(const char *name) { + uint8_t port_id; struct rte_eth_dev *eth_dev; - if (nb_ports == RTE_MAX_ETHPORTS) { + port_id = rte_eth_dev_allocate_new_port(); + if (port_id == RTE_MAX_ETHPORTS) { PMD_DEBUG_TRACE("Reached maximum number of Ethernet ports\n"); return NULL; } @@ -226,10 +251,12 @@ rte_eth_dev_allocate(const char *name) return NULL; } - eth_dev = &rte_eth_devices[nb_ports]; - eth_dev->data = &rte_eth_dev_data[nb_ports]; + eth_dev = &rte_eth_devices[port_id]; + eth_dev->data = &rte_eth_dev_data[port_id]; snprintf(eth_dev->data->name, sizeof(eth_dev->data->name), "%s", name); - eth_dev->data->port_id = nb_ports++; + eth_dev->data->port_id = port_id; + eth_dev->attached = DEV_CONNECTED; + nb_ports++; return eth_dev; } @@ -283,6 +310,7 @@ rte_eth_dev_init(struct rte_pci_driver *pci_drv, (unsigned) pci_dev->id.device_id); if (rte_eal_process_type() == RTE_PROC_PRIMARY) rte_free(eth_dev->data->dev_private); + eth_dev->attached = DEV_DISCONNECTED; nb_ports--; return diag; } @@ -308,10 +336,28 @@ rte_eth_driver_register(struct eth_driver *eth_drv) rte_eal_pci_register(ð_drv->pci_drv); } +enum { + NONE_TRACE = 0, + TRACE +}; + +static int +rte_eth_dev_validate_port(uint8_t port_id, int trace) +{ + if (port_id >= RTE_MAX_ETHPORTS || + rte_eth_devices[port_id].attached != DEV_CONNECTED) { + if (trace) { + PMD_DEBUG_TRACE("Invalid port_id=%d\n", port_id); + } + return DEV_INVALID; + } else + return DEV_VALID; +} + int rte_eth_dev_socket_id(uint8_t port_id) { - if (port_id >= nb_ports) + if (rte_eth_dev_validate_port(port_id, NONE_TRACE) == DEV_INVALID) return -1; return 
rte_eth_devices[port_id].pci_dev->numa_node; } @@ -369,10 +415,8 @@ rte_eth_dev_rx_queue_start(uint8_t port_id, uint16_t rx_queue_id) * in a multi-process setup*/ PROC_PRIMARY_OR_ERR_RET(-E_RTE_SECONDARY); - if (port_id >= nb_ports) { - PMD_DEBUG_TRACE("Invalid por
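The reason this patch replaces every `port_id >= nb_ports` check with rte_eth_dev_validate_port() is that port ids stop being contiguous once ports can be detached. A minimal sketch of the slot scheme, with hypothetical names (MAX_PORTS stands in for RTE_MAX_ETHPORTS, and port_attach(), port_detach() and port_is_valid() are not functions from the patch):

```c
#define MAX_PORTS 32 /* stands in for RTE_MAX_ETHPORTS */

enum { DEV_DISCONNECTED = 0, DEV_CONNECTED };

struct port_slot {
	int attached;
};

static struct port_slot slots[MAX_PORTS];
static unsigned int nb_ports;

/* Return the first free slot, or MAX_PORTS when the table is full. */
static unsigned int port_find_free(void)
{
	unsigned int i;

	for (i = 0; i < MAX_PORTS; i++) {
		if (slots[i].attached == DEV_DISCONNECTED)
			return i;
	}
	return MAX_PORTS;
}

/* A port id is valid only if its slot is currently attached. */
static int port_is_valid(unsigned int id)
{
	return id < MAX_PORTS && slots[id].attached == DEV_CONNECTED;
}

static int port_attach(void)
{
	unsigned int id = port_find_free();

	if (id == MAX_PORTS)
		return -1;
	slots[id].attached = DEV_CONNECTED;
	nb_ports++;
	return (int)id;
}

static int port_detach(unsigned int id)
{
	if (!port_is_valid(id))
		return -1;
	slots[id].attached = DEV_DISCONNECTED;
	nb_ports--;
	return 0;
}
```

After attaching ports 0, 1 and 2 and detaching port 1, nb_ports is 2 while port 2 is still attached, so comparing a port id against nb_ports would wrongly reject it. The next attach reuses slot 1.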
[dpdk-dev] [PATCH v5 04/13] eal/pci: Consolidate pci address comparison APIs
This patch replaces pci_addr_comparison() and memcmp() of pci addresses by
eal_compare_pci_addr().

v5:
- Fix pci_scan_one to handle pt_driver correctly.
v4:
- Fix calculation method of eal_compare_pci_addr().
- Add parameter checking.

Signed-off-by: Tetsuya Mukawa
---
 lib/librte_eal/bsdapp/eal/eal_pci.c       | 25 ---
 lib/librte_eal/common/eal_common_pci.c    |  2 +-
 lib/librte_eal/common/include/rte_pci.h   | 34 +++
 lib/librte_eal/linuxapp/eal/eal_pci.c     | 25 ---
 lib/librte_eal/linuxapp/eal/eal_pci_uio.c |  2 +-
 5 files changed, 54 insertions(+), 34 deletions(-)

diff --git a/lib/librte_eal/bsdapp/eal/eal_pci.c b/lib/librte_eal/bsdapp/eal/eal_pci.c
index 74ecce7..c844d58 100644
--- a/lib/librte_eal/bsdapp/eal/eal_pci.c
+++ b/lib/librte_eal/bsdapp/eal/eal_pci.c
@@ -270,20 +270,6 @@ pci_uio_map_resource(struct rte_pci_device *dev)
 	return (0);
 }
 
-/* Compare two PCI device addresses. */
-static int
-pci_addr_comparison(struct rte_pci_addr *addr, struct rte_pci_addr *addr2)
-{
-	uint64_t dev_addr = (addr->domain << 24) + (addr->bus << 16) + (addr->devid << 8) + addr->function;
-	uint64_t dev_addr2 = (addr2->domain << 24) + (addr2->bus << 16) + (addr2->devid << 8) + addr2->function;
-
-	if (dev_addr > dev_addr2)
-		return 1;
-	else
-		return 0;
-}
-
-
 /* Scan one pci sysfs entry, and fill the devices list from it. */
 static int
 pci_scan_one(int dev_pci_fd, struct pci_conf *conf)
@@ -356,13 +342,20 @@ pci_scan_one(int dev_pci_fd, struct pci_conf *conf)
 	}
 	else {
 		struct rte_pci_device *dev2 = NULL;
+		int ret;
 
 		TAILQ_FOREACH(dev2, &pci_device_list, next) {
-			if (pci_addr_comparison(&dev->addr, &dev2->addr))
+			ret = eal_compare_pci_addr(&dev->addr, &dev2->addr);
+			if (ret > 0)
 				continue;
-			else {
+			else if (ret < 0) {
 				TAILQ_INSERT_BEFORE(dev2, dev, next);
 				return 0;
+			} else { /* already registered */
+				/* update pt_driver */
+				dev2->pt_driver = dev->pt_driver;
+				free(dev);
+				return 0;
 			}
 		}
 		TAILQ_INSERT_TAIL(&pci_device_list, dev, next);
diff --git a/lib/librte_eal/common/eal_common_pci.c b/lib/librte_eal/common/eal_common_pci.c
index f3c7f71..a89f5c3 100644
--- a/lib/librte_eal/common/eal_common_pci.c
+++ b/lib/librte_eal/common/eal_common_pci.c
@@ -93,7 +93,7 @@ static struct rte_devargs *pci_devargs_lookup(struct rte_pci_device *dev)
 		if (devargs->type != RTE_DEVTYPE_BLACKLISTED_PCI &&
 			devargs->type != RTE_DEVTYPE_WHITELISTED_PCI)
 			continue;
-		if (!memcmp(&dev->addr, &devargs->pci.addr, sizeof(dev->addr)))
+		if (!eal_compare_pci_addr(&dev->addr, &devargs->pci.addr))
 			return devargs;
 	}
 	return NULL;
diff --git a/lib/librte_eal/common/include/rte_pci.h b/lib/librte_eal/common/include/rte_pci.h
index 7f2d699..4814cd7 100644
--- a/lib/librte_eal/common/include/rte_pci.h
+++ b/lib/librte_eal/common/include/rte_pci.h
@@ -269,6 +269,40 @@ eal_parse_pci_DomBDF(const char *input, struct rte_pci_addr *dev_addr)
 }
 #undef GET_PCIADDR_FIELD
 
+/* Compare two PCI device addresses. */
+/**
+ * Utility function to compare two PCI device addresses.
+ *
+ * @param addr
+ *	The PCI Bus-Device-Function address to compare
+ * @param addr2
+ *	The PCI Bus-Device-Function address to compare
+ * @return
+ *	0 on equal PCI address.
+ *	Positive on addr is greater than addr2.
+ *	Negative on addr is less than addr2, or error.
+ */
+static inline int
+eal_compare_pci_addr(struct rte_pci_addr *addr, struct rte_pci_addr *addr2)
+{
+	uint64_t dev_addr, dev_addr2;
+
+	if ((addr == NULL) || (addr2 == NULL))
+		return -1;
+
+	dev_addr = (addr->domain << 24) | (addr->bus << 16) |
+				(addr->devid << 8) | addr->function;
+	dev_addr2 = (addr2->domain << 24) | (addr2->bus << 16) |
+				(addr2->devid << 8) | addr2->function;
+
+	if (dev_addr > dev_addr2)
+		return 1;
+	else if (dev_addr < dev_addr2)
+		return -1;
+	else
+		return 0;
+}
+
 /**
  * Probe the PCI bus for registered drivers.
  *
diff --git a/lib/librte_eal/linuxapp/eal/eal_pci.c b/lib/librte_eal/linuxapp/eal/eal_pci.c
index c0ca5a5..d847102 100644
--- a/lib/librte_eal/linuxapp/eal/eal_pci.c
+++ b/lib/librte_eal/linuxapp/eal/eal_pci.c
@@ -229,20 +229,6 @@ error:
 	return -1;
 }
 
-/* Compa
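As a side note on the comparison helper in this patch, here is a minimal standalone sketch of a three-way PCI address compare. It assumes a struct with the same field widths as rte_pci_addr (16-bit domain, 8-bit bus, devid and function); `pci_addr_sketch`, `pci_addr_key` and `pci_addr_cmp` are hypothetical names, not DPDK API. Each field is widened to uint64_t before shifting, since a 16-bit domain promoted to int and shifted left by 24 can overflow.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for struct rte_pci_addr (same field widths assumed). */
struct pci_addr_sketch {
	uint16_t domain;
	uint8_t bus;
	uint8_t devid;
	uint8_t function;
};

/* Pack domain:bus:devid.function into one key, widening before each shift. */
static uint64_t
pci_addr_key(const struct pci_addr_sketch *a)
{
	return ((uint64_t)a->domain << 24) | ((uint64_t)a->bus << 16) |
		((uint64_t)a->devid << 8) | (uint64_t)a->function;
}

/* Return <0, 0 or >0 when a sorts before, equal to, or after b. */
static int
pci_addr_cmp(const struct pci_addr_sketch *a, const struct pci_addr_sketch *b)
{
	uint64_t ka, kb;

	/* NULL is treated as a caller bug here, not as "less than". */
	assert(a != NULL && b != NULL);
	ka = pci_addr_key(a);
	kb = pci_addr_key(b);
	return (ka > kb) - (ka < kb);
}
```

Unlike the patch, which returns -1 for a NULL argument, this sketch keeps a negative result reserved for "sorts before", so a sorted-insert loop cannot mistake an error for an ordering.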
[dpdk-dev] [PATCH v5 05/13] ethdev: Add rte_eth_dev_free to free specified device
This patch adds rte_eth_dev_free(). The function changes the attached
status of the device that has the specified name.

v4:
- Add parameter checking.

Signed-off-by: Tetsuya Mukawa
---
 lib/librte_ether/rte_ethdev.c | 20
 lib/librte_ether/rte_ethdev.h | 11 +++
 2 files changed, 31 insertions(+)

diff --git a/lib/librte_ether/rte_ethdev.c b/lib/librte_ether/rte_ethdev.c
index d70854f..0f3094f 100644
--- a/lib/librte_ether/rte_ethdev.c
+++ b/lib/librte_ether/rte_ethdev.c
@@ -260,6 +260,26 @@ rte_eth_dev_allocate(const char *name)
 	return eth_dev;
 }
 
+struct rte_eth_dev *
+rte_eth_dev_free(const char *name)
+{
+	struct rte_eth_dev *eth_dev;
+
+	if (name == NULL)
+		return NULL;
+
+	eth_dev = rte_eth_dev_allocated(name);
+	if (eth_dev == NULL) {
+		PMD_DEBUG_TRACE("Ethernet Device with name %s doesn't exist!\n",
+				name);
+		return NULL;
+	}
+
+	eth_dev->attached = 0;
+	nb_ports--;
+	return eth_dev;
+}
+
 static int
 rte_eth_dev_init(struct rte_pci_driver *pci_drv,
 		 struct rte_pci_device *pci_dev)
diff --git a/lib/librte_ether/rte_ethdev.h b/lib/librte_ether/rte_ethdev.h
index ca101f5..6add058 100644
--- a/lib/librte_ether/rte_ethdev.h
+++ b/lib/librte_ether/rte_ethdev.h
@@ -1627,6 +1627,17 @@ extern uint8_t rte_eth_dev_count(void);
  */
 struct rte_eth_dev *rte_eth_dev_allocate(const char *name);
 
+/**
+ * Function for internal use by dummy drivers primarily, e.g. ring-based
+ * driver.
+ * Free the specified ethdev and returns the pointer to that slot.
+ *
+ * @param	name	Unique identifier name for each Ethernet device
+ * @return
+ *   - Slot in the rte_dev_devices array for the freed device;
+ */
+struct rte_eth_dev *rte_eth_dev_free(const char *name);
+
 struct eth_driver;
 /**
  * @internal
-- 
1.9.1
[dpdk-dev] [PATCH v5 06/13] eal, ethdev: Add a function and function pointers to close ether device
The patch adds a function pointer to the rte_pci_driver and eth_driver
structures. These function pointers are used when ports are detached.
The patch also adds rte_eth_dev_uninit(). So far it is not called from
anywhere, but it will be called once the port hotplug function is
implemented.

v4:
- Add parameter checking.
- Change function names.

Signed-off-by: Tetsuya Mukawa
---
 lib/librte_eal/common/include/rte_pci.h |  7 ++
 lib/librte_ether/rte_ethdev.c           | 40 +
 lib/librte_ether/rte_ethdev.h           | 24
 3 files changed, 71 insertions(+)

diff --git a/lib/librte_eal/common/include/rte_pci.h b/lib/librte_eal/common/include/rte_pci.h
index 4814cd7..87ca4cf 100644
--- a/lib/librte_eal/common/include/rte_pci.h
+++ b/lib/librte_eal/common/include/rte_pci.h
@@ -189,12 +189,19 @@ struct rte_pci_driver;
 typedef int (pci_devinit_t)(struct rte_pci_driver *, struct rte_pci_device *);
 
 /**
+ * Uninitialisation function for the driver called during hotplugging.
+ */
+typedef int (pci_devuninit_t)(
+	struct rte_pci_driver *, struct rte_pci_device *);
+
+/**
  * A structure describing a PCI driver.
  */
 struct rte_pci_driver {
 	TAILQ_ENTRY(rte_pci_driver) next;	/**< Next in list. */
 	const char *name;			/**< Driver name. */
 	pci_devinit_t *devinit;			/**< Device init. function. */
+	pci_devuninit_t *devuninit;		/**< Device uninit function. */
 	struct rte_pci_id *id_table;		/**< ID table, NULL terminated. */
 	uint32_t drv_flags;			/**< Flags contolling handling of device. */
 };
diff --git a/lib/librte_ether/rte_ethdev.c b/lib/librte_ether/rte_ethdev.c
index 0f3094f..fd19140 100644
--- a/lib/librte_ether/rte_ethdev.c
+++ b/lib/librte_ether/rte_ethdev.c
@@ -335,6 +335,45 @@ rte_eth_dev_init(struct rte_pci_driver *pci_drv,
 	return diag;
 }
 
+static int
+rte_eth_dev_uninit(struct rte_pci_driver *pci_drv,
+		   struct rte_pci_device *pci_dev)
+{
+	struct eth_driver *eth_drv;
+	struct rte_eth_dev *eth_dev;
+	char ethdev_name[RTE_ETH_NAME_MAX_LEN];
+
+	if ((pci_drv == NULL) || (pci_dev == NULL))
+		return -EINVAL;
+
+	/* Create unique Ethernet device name using PCI address */
+	snprintf(ethdev_name, RTE_ETH_NAME_MAX_LEN, "%d:%d.%d",
+			pci_dev->addr.bus, pci_dev->addr.devid,
+			pci_dev->addr.function);
+
+	eth_dev = rte_eth_dev_free(ethdev_name);
+	if (eth_dev == NULL)
+		return -ENODEV;
+
+	eth_drv = (struct eth_driver *)pci_drv;
+
+	/* Invoke PMD device uninit function */
+	if (*eth_drv->eth_dev_uninit)
+		(*eth_drv->eth_dev_uninit)(eth_drv, eth_dev);
+
+	/* init user callbacks */
+	TAILQ_INIT(&(eth_dev->callbacks));
+
+	if (rte_eal_process_type() == RTE_PROC_PRIMARY)
+		rte_free(eth_dev->data->dev_private);
+
+	eth_dev->pci_dev = NULL;
+	eth_dev->driver = NULL;
+	eth_dev->data = NULL;
+
+	return 0;
+}
+
 /**
  * Register an Ethernet [Poll Mode] driver.
  *
@@ -353,6 +392,7 @@ void
 rte_eth_driver_register(struct eth_driver *eth_drv)
 {
 	eth_drv->pci_drv.devinit = rte_eth_dev_init;
+	eth_drv->pci_drv.devuninit = rte_eth_dev_uninit;
 	rte_eal_pci_register(&eth_drv->pci_drv);
 }
 
diff --git a/lib/librte_ether/rte_ethdev.h b/lib/librte_ether/rte_ethdev.h
index 6add058..0b4c27c 100644
--- a/lib/librte_ether/rte_ethdev.h
+++ b/lib/librte_ether/rte_ethdev.h
@@ -1675,6 +1675,27 @@ typedef int (*eth_dev_init_t)(struct eth_driver *eth_drv,
 
 /**
  * @internal
+ * Finalization function of an Ethernet driver invoked for each matching
+ * Ethernet PCI device detected during the PCI closing phase.
+ *
+ * @param eth_drv
+ *   The pointer to the [matching] Ethernet driver structure supplied by
+ *   the PMD when it registered itself.
+ * @param eth_dev
+ *   The *eth_dev* pointer is the address of the *rte_eth_dev* structure
+ *   associated with the matching device and which have been [automatically]
+ *   allocated in the *rte_eth_devices* array.
+ * @return
+ *   - 0: Success, the device is properly finalized by the driver.
+ *        In particular, the driver MUST free the *dev_ops* pointer
+ *        of the *eth_dev* structure.
+ *   - <0: Error code of the device initialization failure.
+ */
+typedef int (*eth_dev_uninit_t)(struct eth_driver *eth_drv,
+				  struct rte_eth_dev *eth_dev);
+
+/**
+ * @internal
  * The structure associated with a PMD Ethernet driver.
  *
  * Each Ethernet driver acts as a PCI driver and is represented by a generic
@@ -1684,11 +1705,14 @@ typedef int (*eth_dev_init_t)(struct eth_driver *eth_drv,
  *
  * - The *eth_dev_init* function invoked for each matching PCI device.
  *
+ * - The *eth_dev_uninit* function invoked for each matching PCI devi
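The init/uninit callback pairing introduced here can be illustrated with a small standalone sketch. All names below (`sketch_dev`, `sketch_driver`, `sketch_detach`) are hypothetical and only mimic the shape of the patch; the assumption is that a driver without an uninit callback simply cannot be detached, which is one possible policy and not necessarily the one DPDK adopted.

```c
#include <errno.h>
#include <stddef.h>

/* Hypothetical device and driver types, for illustration only. */
struct sketch_dev {
	int attached;
	int private_data;
};

typedef int (*sketch_init_t)(struct sketch_dev *);
typedef int (*sketch_uninit_t)(struct sketch_dev *);

struct sketch_driver {
	const char *name;
	sketch_init_t init;
	sketch_uninit_t uninit;	/* optional: NULL means "cannot detach" */
};

/* Example uninit callback that releases the (fake) private data. */
static int
sketch_uninit_ok(struct sketch_dev *dev)
{
	dev->private_data = 0;
	return 0;
}

/* Call the driver's uninit hook, then mark the device as detached. */
static int
sketch_detach(const struct sketch_driver *drv, struct sketch_dev *dev)
{
	int ret;

	if (drv == NULL || dev == NULL)
		return -EINVAL;
	if (drv->uninit == NULL)
		return -ENOTSUP;

	ret = drv->uninit(dev);
	if (ret < 0)
		return ret;	/* leave the device attached on failure */

	dev->attached = 0;
	return 0;
}
```

The sketch checks the callback pointer and its return value before touching the device state, so a failed uninit leaves the slot untouched.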
[dpdk-dev] [PATCH v5 10/13] eal/pci: Cleanup pci driver initialization code
- Add rte_eal_pci_close_one_driver()
  The function closes the specified driver and device.
- Add pci_invoke_all_drivers()
  The function is based on pci_probe_all_drivers(), but it can close
  drivers as well as probe them.
- Add pci_close_all_drivers()
  The function tries to find a driver for the specified device, and then
  closes the driver.
- Add rte_eal_pci_probe_one() and rte_eal_pci_close_one()
  The functions probe and close a device. Each one first tries to find
  a device that has the specified PCI address, then probes or closes
  that device.

v5:
- Remove RTE_EAL_INVOKE_TYPE_UNKNOWN, because it's unused.
v4:
- Fix parameter checking.
- Fix indent of 'if' statement.

Signed-off-by: Tetsuya Mukawa
---
 lib/librte_eal/common/eal_common_pci.c  | 90 +
 lib/librte_eal/common/eal_private.h     | 24 +
 lib/librte_eal/common/include/rte_pci.h | 33
 lib/librte_eal/linuxapp/eal/eal_pci.c   | 69 +
 4 files changed, 206 insertions(+), 10 deletions(-)

diff --git a/lib/librte_eal/common/eal_common_pci.c b/lib/librte_eal/common/eal_common_pci.c
index a89f5c3..7c9b8c5 100644
--- a/lib/librte_eal/common/eal_common_pci.c
+++ b/lib/librte_eal/common/eal_common_pci.c
@@ -99,19 +99,27 @@ static struct rte_devargs *pci_devargs_lookup(struct rte_pci_device *dev)
 	return NULL;
 }
 
-/*
- * If vendor/device ID match, call the devinit() function of all
- * registered driver for the given device. Return -1 if initialization
- * failed, return 1 if no driver is found for this device.
- */
 static int
-pci_probe_all_drivers(struct rte_pci_device *dev)
+pci_invoke_all_drivers(struct rte_pci_device *dev,
+		enum rte_eal_invoke_type type)
 {
 	struct rte_pci_driver *dr = NULL;
-	int rc;
+	int rc = 0;
+
+	if ((dev == NULL) || (type >= RTE_EAL_INVOKE_TYPE_MAX))
+		return -1;
 
 	TAILQ_FOREACH(dr, &pci_driver_list, next) {
-		rc = rte_eal_pci_probe_one_driver(dr, dev);
+		switch (type) {
+		case RTE_EAL_INVOKE_TYPE_PROBE:
+			rc = rte_eal_pci_probe_one_driver(dr, dev);
+			break;
+		case RTE_EAL_INVOKE_TYPE_CLOSE:
+			rc = rte_eal_pci_close_one_driver(dr, dev);
+			break;
+		default:
+			return -1;
+		}
 		if (rc < 0)
 			/* negative value is an error */
 			return -1;
@@ -123,6 +131,66 @@ pci_probe_all_drivers(struct rte_pci_device *dev)
 	return 1;
 }
 
+#ifdef ENABLE_HOTPLUG
+static int
+rte_eal_pci_invoke_one(struct rte_pci_addr *addr,
+		enum rte_eal_invoke_type type)
+{
+	struct rte_pci_device *dev = NULL;
+	int ret = 0;
+
+	if ((addr == NULL) || (type >= RTE_EAL_INVOKE_TYPE_MAX))
+		return -1;
+
+	TAILQ_FOREACH(dev, &pci_device_list, next) {
+		if (eal_compare_pci_addr(&dev->addr, addr))
+			continue;
+
+		ret = pci_invoke_all_drivers(dev, type);
+		if (ret < 0)
+			goto invoke_err_return;
+
+		if (type == RTE_EAL_INVOKE_TYPE_CLOSE)
+			goto remove_dev;
+
+		return 0;
+	}
+
+	return -1;
+
+invoke_err_return:
+	RTE_LOG(WARNING, EAL, "Requested device " PCI_PRI_FMT
+			" cannot be used\n", dev->addr.domain, dev->addr.bus,
+			dev->addr.devid, dev->addr.function);
+	return -1;
+
+remove_dev:
+	TAILQ_REMOVE(&pci_device_list, dev, next);
+	return 0;
+}
+
+
+/*
+ * Find the pci device specified by pci address, then invoke probe function of
+ * the driver of the device.
+ */
+int
+rte_eal_pci_probe_one(struct rte_pci_addr *addr)
+{
+	return rte_eal_pci_invoke_one(addr, RTE_EAL_INVOKE_TYPE_PROBE);
+}
+
+/*
+ * Find the pci device specified by pci address, then invoke close function of
+ * the driver of the device.
+ */
+int
+rte_eal_pci_close_one(struct rte_pci_addr *addr)
+{
+	return rte_eal_pci_invoke_one(addr, RTE_EAL_INVOKE_TYPE_CLOSE);
+}
+#endif /* ENABLE_HOTPLUG */
+
 /*
  * Scan the content of the PCI bus, and call the devinit() function for
  * all registered drivers that have a matching entry in its id_table
@@ -148,10 +216,12 @@ rte_eal_pci_probe(void)
 
 		/* probe all or only whitelisted devices */
 		if (probe_all)
-			ret = pci_probe_all_drivers(dev);
+			ret = pci_invoke_all_drivers(dev,
+					RTE_EAL_INVOKE_TYPE_PROBE);
 		else if (devargs != NULL &&
 			devargs->type == RTE_DEVTYPE_WHITELISTED_PCI)
-			ret = pci_probe_all_drivers(dev);
+			ret = pci_invoke_all_drivers(dev,
+
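The probe/close dispatch in pci_invoke_all_drivers() can be sketched in isolation as follows. This is a simplified model using an array instead of a TAILQ; the `sketch_*` names are hypothetical, and the return convention (0 handled, positive "not my device", negative error) is taken from the comments in the patch.

```c
#include <stddef.h>

enum sketch_invoke_type {
	SKETCH_INVOKE_PROBE,
	SKETCH_INVOKE_CLOSE,
	SKETCH_INVOKE_MAX
};

/* Each callback returns 0 = handled, >0 = not my device, <0 = error. */
struct sketch_ops {
	int (*probe)(int dev_id);
	int (*close)(int dev_id);
};

/* Example callback: this fake driver only recognises device 7. */
static int
sketch_match_7(int dev_id)
{
	return dev_id == 7 ? 0 : 1;
}

/*
 * Try every driver in turn. Returns 0 when one handled the device,
 * 1 when no driver matched, and -1 on error or an invalid type.
 */
static int
sketch_invoke_all(const struct sketch_ops *ops, size_t n, int dev_id,
		enum sketch_invoke_type type)
{
	size_t i;

	for (i = 0; i < n; i++) {
		int rc;

		switch (type) {
		case SKETCH_INVOKE_PROBE:
			rc = ops[i].probe(dev_id);
			break;
		case SKETCH_INVOKE_CLOSE:
			rc = ops[i].close(dev_id);
			break;
		default:
			return -1;
		}
		if (rc < 0)
			return -1;	/* negative value is an error */
		if (rc > 0)
			continue;	/* driver does not support the device */
		return 0;
	}
	return 1;
}
```

Keeping the switch inside the loop mirrors the patch; passing the callback itself as a parameter would be an alternative that avoids the enum altogether.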
[dpdk-dev] [PATCH v5 11/13] ethdev: Add one dev_type paramerter to rte_eth_dev_allocate
This new parameter records the device type, i.e. physical or virtual.
The port detach procedure differs between physical and virtual devices,
so this parameter lets the detach function know the device type of the
port.

v4:
- Fix comments of rte_eth_dev_type.

Signed-off-by: Tetsuya Mukawa
---
 app/test/virtual_pmd.c                       |  2 +-
 lib/librte_ether/rte_ethdev.c                | 14 --
 lib/librte_ether/rte_ethdev.h                | 25 -
 lib/librte_pmd_af_packet/rte_eth_af_packet.c |  2 +-
 lib/librte_pmd_bond/rte_eth_bond_api.c       |  2 +-
 lib/librte_pmd_pcap/rte_eth_pcap.c           |  2 +-
 lib/librte_pmd_ring/rte_eth_ring.c           |  2 +-
 lib/librte_pmd_xenvirt/rte_eth_xenvirt.c     |  2 +-
 8 files changed, 42 insertions(+), 9 deletions(-)

diff --git a/app/test/virtual_pmd.c b/app/test/virtual_pmd.c
index 9fac95d..8d3a5ff 100644
--- a/app/test/virtual_pmd.c
+++ b/app/test/virtual_pmd.c
@@ -556,7 +556,7 @@ virtual_ethdev_create(const char *name, struct ether_addr *mac_addr,
 		goto err;
 
 	/* reserve an ethdev entry */
-	eth_dev = rte_eth_dev_allocate(name);
+	eth_dev = rte_eth_dev_allocate(name, RTE_ETH_DEV_PHYSICAL);
 	if (eth_dev == NULL)
 		goto err;
 
diff --git a/lib/librte_ether/rte_ethdev.c b/lib/librte_ether/rte_ethdev.c
index 2acafc7..0c7fbb1 100644
--- a/lib/librte_ether/rte_ethdev.c
+++ b/lib/librte_ether/rte_ethdev.c
@@ -232,7 +232,7 @@ rte_eth_dev_allocate_new_port(void)
 }
 
 struct rte_eth_dev *
-rte_eth_dev_allocate(const char *name)
+rte_eth_dev_allocate(const char *name, enum rte_eth_dev_type type)
 {
 	uint8_t port_id;
 	struct rte_eth_dev *eth_dev;
@@ -256,6 +256,7 @@ rte_eth_dev_allocate(const char *name)
 	snprintf(eth_dev->data->name, sizeof(eth_dev->data->name), "%s", name);
 	eth_dev->data->port_id = port_id;
 	eth_dev->attached = DEV_CONNECTED;
+	eth_dev->dev_type = type;
 	nb_ports++;
 	return eth_dev;
 }
@@ -276,6 +277,7 @@ rte_eth_dev_free(const char *name)
 	}
 
 	eth_dev->attached = 0;
+	eth_dev->dev_type = RTE_ETH_DEV_UNKNOWN;
 	nb_ports--;
 	return eth_dev;
 }
@@ -296,7 +298,7 @@ rte_eth_dev_init(struct rte_pci_driver *pci_drv,
 	snprintf(ethdev_name, RTE_ETH_NAME_MAX_LEN, "%d:%d.%d",
 			pci_dev->addr.bus, pci_dev->addr.devid, pci_dev->addr.function);
 
-	eth_dev = rte_eth_dev_allocate(ethdev_name);
+	eth_dev = rte_eth_dev_allocate(ethdev_name, RTE_ETH_DEV_PHYSICAL);
 	if (eth_dev == NULL)
 		return -ENOMEM;
 
@@ -428,6 +430,14 @@ rte_eth_dev_count(void)
 	return (nb_ports);
 }
 
+enum rte_eth_dev_type
+rte_eth_dev_get_device_type(uint8_t port_id)
+{
+	if (rte_eth_dev_validate_port(port_id, NONE_TRACE) == DEV_INVALID)
+		return -1;
+	return rte_eth_devices[port_id].dev_type;
+}
+
 void
 rte_eth_dev_save(struct rte_eth_dev *devs)
 {
diff --git a/lib/librte_ether/rte_ethdev.h b/lib/librte_ether/rte_ethdev.h
index ebc48b0..83a4000 100644
--- a/lib/librte_ether/rte_ethdev.h
+++ b/lib/librte_ether/rte_ethdev.h
@@ -1522,6 +1522,17 @@ struct eth_dev_ops {
 };
 
 /**
+ * The eth device type
+ */
+enum rte_eth_dev_type {
+	RTE_ETH_DEV_UNKNOWN,	/**< unknown device type */
+	RTE_ETH_DEV_PHYSICAL,
+		/**< Physical function and Virtual function devices of NIC */
+	RTE_ETH_DEV_VIRTUAL,	/**< non hardware device */
+	RTE_ETH_DEV_MAX		/**< max value of this enum */
+};
+
+/**
  * @internal
  * The generic data structure associated with each ethernet device.
  *
@@ -1540,6 +1551,7 @@ struct rte_eth_dev {
 	struct rte_pci_device *pci_dev; /**< PCI info. supplied by probing */
 	struct rte_eth_dev_cb_list callbacks; /**< User application callbacks */
 	uint8_t attached; /**< Flag indicating the port is attached */
+	enum rte_eth_dev_type dev_type; /**< Flag indicating the device type */
 };
 
 struct rte_eth_dev_sriov {
@@ -1617,6 +1629,15 @@ extern uint8_t rte_eth_dev_count(void);
 
 /**
  * Function for internal use by port hotplug functions.
+ * Get the device type to know whether the device is physical or virtual.
+ * @param	port_id	The pointer to the port id
+ * @return
+ *   - Device type.
+ */
+extern enum rte_eth_dev_type rte_eth_dev_get_device_type(uint8_t port_id);
+
+/**
+ * Function for internal use by port hotplug functions.
  * Copies current ethdev structures to the specified pointer.
  *
  * @param	devs	The pointer to the ethdev structures
@@ -1702,10 +1723,12 @@ extern struct rte_eth_dev *rte_eth_dev_allocated(const char *name);
  * to that slot for the driver to use.
  *
  * @param	name	Unique identifier name for each Ethernet device
+ * @param	type	Device type of this Ethernet device
  * @return
  *   - Slot in the rte_dev_devices array for a new device;
  */
-struct rte_eth_dev *rte_eth_dev_allocate(const
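How a detach path might branch on the stored device type can be sketched as below. The enum, table and function names are hypothetical stand-ins for rte_eth_dev_type and the port table. One deliberate difference from the patch is assumed here: an invalid port yields the "unknown" enumerator rather than -1, so callers never see a value outside the enum.

```c
#include <stdint.h>

/* Hypothetical mirror of the device-type enum added by the patch. */
enum sketch_dev_type {
	SKETCH_DEV_UNKNOWN,
	SKETCH_DEV_PHYSICAL,
	SKETCH_DEV_VIRTUAL
};

#define SKETCH_MAX_PORTS 4

/* One type per port slot; zero-initialised, i.e. all UNKNOWN. */
static enum sketch_dev_type sketch_types[SKETCH_MAX_PORTS];

/* Return the type of a port, or UNKNOWN for an out-of-range port id. */
static enum sketch_dev_type
sketch_get_type(uint8_t port)
{
	if (port >= SKETCH_MAX_PORTS)
		return SKETCH_DEV_UNKNOWN;
	return sketch_types[port];
}

/* Pick the detach path; the returned string just names the path taken. */
static const char *
sketch_detach_path(uint8_t port)
{
	switch (sketch_get_type(port)) {
	case SKETCH_DEV_PHYSICAL:
		return "pci";
	case SKETCH_DEV_VIRTUAL:
		return "vdev";
	default:
		return "none";
	}
}
```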
[dpdk-dev] [PATCH v5 07/13] ethdev: Add functions that will be used by port hotplug functions
The patch adds the following functions.

- rte_eth_dev_save()
  The function saves the current rte_eth_dev structures.
- rte_eth_dev_get_changed_port()
  The function receives the saved rte_eth_dev structures, then compares
  them with the current values to find which port was actually attached
  or detached.
- rte_eth_dev_get_addr_by_port()
  The function returns the pci address of an ethdev specified by port
  identifier.
- rte_eth_dev_get_port_by_addr()
  The function returns the port identifier of an ethdev specified by pci
  address.
- rte_eth_dev_get_name_by_port()
  The function returns the unique identifier name of an ethdev specified
  by port identifier.
- rte_eth_dev_check_detachable()
  The function returns whether a PMD supports the detach function.

The patch also changes the scope of rte_eth_dev_allocated() to global,
because the function will be called by virtual PMDs to support port
hotplug.

v5:
- Fix return value of below functions.
  rte_eth_dev_get_changed_port().
  rte_eth_dev_get_port_by_addr().
v4:
- Add parameter checking.
v3:
- Fix if-condition bug while comparing pci addresses.
- Add error checking codes.

Reported-by: Mark Enright
Signed-off-by: Tetsuya Mukawa
---
 lib/librte_ether/rte_ethdev.c | 98 ++-
 lib/librte_ether/rte_ethdev.h | 80 +++
 2 files changed, 177 insertions(+), 1 deletion(-)

diff --git a/lib/librte_ether/rte_ethdev.c b/lib/librte_ether/rte_ethdev.c
index fd19140..2acafc7 100644
--- a/lib/librte_ether/rte_ethdev.c
+++ b/lib/librte_ether/rte_ethdev.c
@@ -206,7 +206,7 @@ rte_eth_dev_data_alloc(void)
 				RTE_MAX_ETHPORTS * sizeof(*rte_eth_dev_data));
 }
 
-static struct rte_eth_dev *
+struct rte_eth_dev *
 rte_eth_dev_allocated(const char *name)
 {
 	unsigned i;
@@ -428,6 +428,102 @@ rte_eth_dev_count(void)
 	return (nb_ports);
 }
 
+void
+rte_eth_dev_save(struct rte_eth_dev *devs)
+{
+	if (devs == NULL)
+		return;
+
+	/* save current rte_eth_devices */
+	memcpy(devs, rte_eth_devices,
+			sizeof(struct rte_eth_dev) * RTE_MAX_ETHPORTS);
+}
+
+int
+rte_eth_dev_get_changed_port(struct rte_eth_dev *devs, uint8_t *port_id)
+{
+	if ((devs == NULL) || (port_id == NULL))
+		return -EINVAL;
+
+	/* check which port was attached or detached */
+	for (*port_id = 0; *port_id < RTE_MAX_ETHPORTS; (*port_id)++, devs++) {
+		if (rte_eth_devices[*port_id].attached ^ devs->attached)
+			return 0;
+	}
+	return -ENODEV;
+}
+
+int
+rte_eth_dev_get_addr_by_port(uint8_t port_id, struct rte_pci_addr *addr)
+{
+	if (rte_eth_dev_validate_port(port_id, TRACE) == DEV_INVALID)
+		return -EINVAL;
+
+	if (addr == NULL) {
+		PMD_DEBUG_TRACE("Null pointer is specified\n");
+		return -EINVAL;
+	}
+
+	*addr = rte_eth_devices[port_id].pci_dev->addr;
+	return 0;
+}
+
+int
+rte_eth_dev_get_port_by_addr(struct rte_pci_addr *addr, uint8_t *port_id)
+{
+	struct rte_pci_addr *tmp;
+
+	if ((addr == NULL) || (port_id == NULL)) {
+		PMD_DEBUG_TRACE("Null pointer is specified\n");
+		return -EINVAL;
+	}
+
+	for (*port_id = 0; *port_id < RTE_MAX_ETHPORTS; (*port_id)++) {
+		if (!rte_eth_devices[*port_id].attached)
+			continue;
+		if (!rte_eth_devices[*port_id].pci_dev)
+			continue;
+		tmp = &rte_eth_devices[*port_id].pci_dev->addr;
+		if (eal_compare_pci_addr(tmp, addr) == 0)
+			return 0;
+	}
+	return -ENODEV;
+}
+
+int
+rte_eth_dev_get_name_by_port(uint8_t port_id, char *name)
+{
+	char *tmp;
+
+	if (rte_eth_dev_validate_port(port_id, TRACE) == DEV_INVALID)
+		return -EINVAL;
+
+	if (name == NULL) {
+		PMD_DEBUG_TRACE("Null pointer is specified\n");
+		return -EINVAL;
+	}
+
+	/* shouldn't check 'rte_eth_devices[i].data',
+	 * because it might be overwritten by VDEV PMD */
+	tmp = rte_eth_dev_data[port_id].name;
+	strncpy(name, tmp, strlen(tmp) + 1);
+	return 0;
+}
+
+int
+rte_eth_dev_check_detachable(uint8_t port_id)
+{
+	uint32_t drv_flags;
+
+	if (port_id >= RTE_MAX_ETHPORTS) {
+		PMD_DEBUG_TRACE("Invalid port_id=%d\n", port_id);
+		return -EINVAL;
+	}
+
+	drv_flags = rte_eth_devices[port_id].driver->pci_drv.drv_flags;
+	return !(drv_flags & RTE_PCI_DRV_DETACHABLE);
+}
+
 static int
 rte_eth_dev_rx_queue_config(struct rte_eth_dev *dev, uint16_t nb_queues)
 {
diff --git a/lib/librte_ether/rte_ethdev.h b/lib/librte_ether/rte_ethdev.h
index 0b4c27c..ebc48b0 100644
--- a/lib/librte_ether/rte_ethdev.h
+++ b/lib/librte_ether/rte_ethdev.h
@@ -1616,6 +1616,86 @@ extern struct
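The save-then-diff idea behind rte_eth_dev_save() and rte_eth_dev_get_changed_port() can be shown with a reduced model that tracks only the attached flag. The `sketch_*` names and the table size are assumptions made for the example.

```c
#include <stdint.h>
#include <string.h>

#define SKETCH_MAX_PORTS 8

/* Reduced port record: only the flag that the diff inspects. */
struct sketch_port {
	uint8_t attached;
};

/* Global port table, standing in for rte_eth_devices[]. */
static struct sketch_port sketch_ports[SKETCH_MAX_PORTS];

/* Copy the whole table into a caller-provided snapshot. */
static void
sketch_save(struct sketch_port *snap)
{
	memcpy(snap, sketch_ports, sizeof(sketch_ports));
}

/*
 * Return the index of the first port whose attached flag differs from
 * the snapshot, or -1 when nothing changed.
 */
static int
sketch_changed_port(const struct sketch_port *snap)
{
	int i;

	for (i = 0; i < SKETCH_MAX_PORTS; i++) {
		if (sketch_ports[i].attached != snap[i].attached)
			return i;
	}
	return -1;
}
```

As in the patch, only the first difference is reported, so the approach presumes that exactly one port changes between the snapshot and the check.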
[dpdk-dev] [PATCH v5 08/13] eal/linux/pci: Add functions for unmapping igb_uio resources
The patch adds functions for unmapping igb_uio resources. The patch only
applies to Linux with igb_uio. VFIO and BSD are not supported.

v5:
- Fix pci_unmap_device() to check pt_driver.
v4:
- Add parameter checking.
- Add header file to determine if hotplug can be enabled.

Signed-off-by: Tetsuya Mukawa
---
 lib/librte_eal/common/Makefile                  |  1 +
 lib/librte_eal/common/include/rte_dev_hotplug.h | 44 +
 lib/librte_eal/linuxapp/eal/eal_pci.c           | 44 +
 lib/librte_eal/linuxapp/eal/eal_pci_init.h      |  8 +++
 lib/librte_eal/linuxapp/eal/eal_pci_uio.c       | 65 +
 5 files changed, 162 insertions(+)
 create mode 100644 lib/librte_eal/common/include/rte_dev_hotplug.h

diff --git a/lib/librte_eal/common/Makefile b/lib/librte_eal/common/Makefile
index 52c1a5f..db7cc93 100644
--- a/lib/librte_eal/common/Makefile
+++ b/lib/librte_eal/common/Makefile
@@ -41,6 +41,7 @@ INC += rte_eal_memconfig.h rte_malloc_heap.h
 INC += rte_hexdump.h rte_devargs.h rte_dev.h
 INC += rte_common_vect.h
 INC += rte_pci_dev_feature_defs.h rte_pci_dev_features.h
+INC += rte_dev_hotplug.h
 
 ifeq ($(CONFIG_RTE_INSECURE_FUNCTION_WARNING),y)
 INC += rte_warnings.h
diff --git a/lib/librte_eal/common/include/rte_dev_hotplug.h b/lib/librte_eal/common/include/rte_dev_hotplug.h
new file mode 100644
index 000..b333e0f
--- /dev/null
+++ b/lib/librte_eal/common/include/rte_dev_hotplug.h
@@ -0,0 +1,44 @@
+/*-
+ *   BSD LICENSE
+ *
+ *   Copyright(c) 2015 IGEL Co.,LTd.
+ *   All rights reserved.
+ *
+ *   Redistribution and use in source and binary forms, with or without
+ *   modification, are permitted provided that the following conditions
+ *   are met:
+ *
+ *     * Redistributions of source code must retain the above copyright
+ *       notice, this list of conditions and the following disclaimer.
+ *     * Redistributions in binary form must reproduce the above copyright
+ *       notice, this list of conditions and the following disclaimer in
+ *       the documentation and/or other materials provided with the
+ *       distribution.
+ *     * Neither the name of IGEL Co.,Ltd. nor the names of its
+ *       contributors may be used to endorse or promote products derived
+ *       from this software without specific prior written permission.
+ *
+ *   THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+ *   "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+ *   LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
+ *   A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
+ *   OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
+ *   SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
+ *   LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
+ *   DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
+ *   THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+ *   (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
+ *   OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+ */
+
+#ifndef _RTE_DEV_HOTPLUG_H_
+#define _RTE_DEV_HOTPLUG_H_
+
+/*
+ * determine if hotplug can be enabled on the system
+ */
+#if defined(RTE_LIBRTE_EAL_HOTPLUG) && defined(RTE_LIBRTE_EAL_LINUXAPP)
+#define ENABLE_HOTPLUG
+#endif /* RTE_LIBRTE_EAL_HOTPLUG & RTE_LIBRTE_EAL_LINUXAPP */
+
+#endif /* _RTE_DEV_HOTPLUG_H_ */
diff --git a/lib/librte_eal/linuxapp/eal/eal_pci.c b/lib/librte_eal/linuxapp/eal/eal_pci.c
index d847102..c3b7917 100644
--- a/lib/librte_eal/linuxapp/eal/eal_pci.c
+++ b/lib/librte_eal/linuxapp/eal/eal_pci.c
@@ -166,6 +166,25 @@ pci_map_resource(void *requested_addr, int fd, off_t offset, size_t size)
 	return mapaddr;
 }
 
+#ifdef ENABLE_HOTPLUG
+/* unmap a particular resource */
+void
+pci_unmap_resource(void *requested_addr, size_t size)
+{
+	if (requested_addr == NULL)
+		return;
+
+	/* Unmap the PCI memory resource of device */
+	if (munmap(requested_addr, size)) {
+		RTE_LOG(ERR, EAL, "%s(): cannot munmap(%p, 0x%lx): %s\n",
+			__func__, requested_addr, (unsigned long)size,
+			strerror(errno));
+	} else
+		RTE_LOG(DEBUG, EAL, " PCI memory mapped at %p\n",
+				requested_addr);
+}
+#endif /* ENABLE_HOTPLUG */
+
 /* parse the "resource" sysfs file */
 #define IORESOURCE_MEM 0x0200
 
@@ -567,6 +586,31 @@ pci_map_device(struct rte_pci_device *dev)
 	return ret;
 }
 
+#ifdef ENABLE_HOTPLUG
+static void
+pci_unmap_device(struct rte_pci_device *dev)
+{
+	if (dev == NULL)
+		return;
+
+	/* try unmapping the NIC resources using VFIO if it exists */
+	switch (dev->pt_driver) {
+	case RTE_PT_VFIO:
+		RTE_LOG(ERR, EAL, "Hotplug doesn't support vfio yet\n");
+		break;
+	case RTE_P
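The unmap step can be tried outside DPDK with a small POSIX sketch. It maps /dev/zero instead of a UIO resource file, which is purely an assumption made so that the example runs without hardware; `sketch_map_zero` and `sketch_unmap` are hypothetical names. Unlike pci_unmap_resource() in the patch, the sketch returns an error code so that the caller can react to a failed munmap().

```c
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

/* Map `size` bytes of /dev/zero; returns MAP_FAILED on any error. */
static void *
sketch_map_zero(size_t size)
{
	void *addr;
	int fd = open("/dev/zero", O_RDWR);

	if (fd < 0)
		return MAP_FAILED;
	addr = mmap(NULL, size, PROT_READ | PROT_WRITE, MAP_PRIVATE, fd, 0);
	close(fd);	/* the mapping stays valid after the fd is closed */
	return addr;
}

/* Unmap a region; returns 0 on success or a negative errno value. */
static int
sketch_unmap(void *addr, size_t size)
{
	if (addr == NULL || addr == MAP_FAILED)
		return -EINVAL;

	if (munmap(addr, size) != 0) {
		int err = errno;

		fprintf(stderr, "munmap(%p, 0x%zx) failed: %s\n",
			addr, size, strerror(err));
		return -err;
	}
	return 0;
}
```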
[dpdk-dev] [PATCH v5 09/13] eal/pci: Add a function to remove the entry of devargs list
The function removes the specified devargs entry from devargs_list.
Also the patch adds sanity checking to rte_eal_devargs_add().

v5:
- Change function definition of rte_eal_devargs_remove().
v4:
- Fix sanity check code.

Signed-off-by: Tetsuya Mukawa
---
 lib/librte_eal/common/eal_common_devargs.c  | 60 +
 lib/librte_eal/common/include/rte_devargs.h | 21 ++
 2 files changed, 81 insertions(+)

diff --git a/lib/librte_eal/common/eal_common_devargs.c b/lib/librte_eal/common/eal_common_devargs.c
index 4c7d11a..5b1ac8e 100644
--- a/lib/librte_eal/common/eal_common_devargs.c
+++ b/lib/librte_eal/common/eal_common_devargs.c
@@ -44,6 +44,35 @@
 struct rte_devargs_list devargs_list =
 	TAILQ_HEAD_INITIALIZER(devargs_list);
 
+
+/* find a entry specified by pci address or device name */
+static struct rte_devargs *
+rte_eal_devargs_find(enum rte_devtype devtype, void *args)
+{
+	struct rte_devargs *devargs;
+
+	if (args == NULL)
+		return NULL;
+
+	TAILQ_FOREACH(devargs, &devargs_list, next) {
+		switch (devtype) {
+		case RTE_DEVTYPE_WHITELISTED_PCI:
+		case RTE_DEVTYPE_BLACKLISTED_PCI:
+			if (eal_compare_pci_addr(&devargs->pci.addr, args) == 0)
+				goto found;
+			break;
+		case RTE_DEVTYPE_VIRTUAL:
+			if (memcmp(&devargs->virtual.drv_name, args,
+					strlen((char *)args)) == 0)
+				goto found;
+			break;
+		}
+	}
+	return NULL;
+found:
+	return devargs;
+}
+
 /* store a whitelist parameter for later parsing */
 int
 rte_eal_devargs_add(enum rte_devtype devtype, const char *devargs_str)
@@ -87,6 +116,12 @@ rte_eal_devargs_add(enum rte_devtype devtype, const char *devargs_str)
 			free(devargs);
 			return -1;
 		}
+		/* make sure there is no same entry */
+		if (rte_eal_devargs_find(devtype, &devargs->pci.addr)) {
+			RTE_LOG(ERR, EAL,
+				"device already registered: <%s>\n", buf);
+			return -1;
+		}
 		break;
 	case RTE_DEVTYPE_VIRTUAL:
 		/* save driver name */
@@ -98,6 +133,12 @@ rte_eal_devargs_add(enum rte_devtype devtype, const char *devargs_str)
 			free(devargs);
 			return -1;
 		}
+		/* make sure there is no same entry */
+		if (rte_eal_devargs_find(devtype, &devargs->virtual.drv_name)) {
+			RTE_LOG(ERR, EAL,
+				"device already registered: <%s>\n", buf);
+			return -1;
+		}
 		break;
 	}
 
@@ -105,6 +146,25 @@ rte_eal_devargs_add(enum rte_devtype devtype, const char *devargs_str)
 	return 0;
 }
 
+/* remove it from the devargs_list */
+int
+rte_eal_devargs_remove(enum rte_devtype devtype, void *args)
+{
+	struct rte_devargs *devargs;
+
+	if (args == NULL)
+		return -EINVAL;
+
+	devargs = rte_eal_devargs_find(devtype, args);
+	if (devargs == NULL) {
+		RTE_LOG(ERR, EAL, "device not found\n");
+		return -ENODEV;
+	}
+
+	TAILQ_REMOVE(&devargs_list, devargs, next);
+	return 0;
+}
+
 /* count the number of devices of a specified type */
 unsigned int
 rte_eal_devargs_type_count(enum rte_devtype devtype)
diff --git a/lib/librte_eal/common/include/rte_devargs.h b/lib/librte_eal/common/include/rte_devargs.h
index 9f9c98f..b5ad4b3 100644
--- a/lib/librte_eal/common/include/rte_devargs.h
+++ b/lib/librte_eal/common/include/rte_devargs.h
@@ -123,6 +123,27 @@ extern struct rte_devargs_list devargs_list;
 int rte_eal_devargs_add(enum rte_devtype devtype, const char *devargs_str);
 
 /**
+ * Remove a device from the user device list
+ *
+ * For PCI devices, the format of arguments string is "PCI_ADDR". It shouldn't
+ * involves parameters for the device. Example: "08:00.1".
+ *
+ * For virtual devices, the format of arguments string is "DRIVER_NAME*". It
+ * shouldn't involves parameters for the device. Example: "eth_ring". The
+ * validity of the driver name is not checked by this function, it is done
+ * when closing the drivers.
+ *
+ * @param devtype
+ *   The type of the device.
+ * @param name
+ *   The name of the device.
+ *
+ * @return
+ *   - 0 on success, negative on error
+ */
+int rte_eal_devargs_remove(enum rte_devtype devtype, void *args);
+
+/**
  * Count the number of user devices of a specified type
  *
  * @param devtype
-- 
1.9.1
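The find, add and remove operations on a TAILQ-based argument list can be sketched independently of DPDK as follows. The `sketch_*` names and the 32-byte name field are assumptions. Two choices differ from the patch and are made only for the sake of the example: the duplicate check happens before allocating, so nothing needs freeing on that path, and a removed entry is freed.

```c
#include <stdlib.h>
#include <string.h>
#include <sys/queue.h>

/* Hypothetical list entry, keyed by name only. */
struct sketch_arg {
	TAILQ_ENTRY(sketch_arg) next;
	char name[32];
};

TAILQ_HEAD(sketch_arg_list, sketch_arg);
static struct sketch_arg_list sketch_args =
	TAILQ_HEAD_INITIALIZER(sketch_args);

/* Return the entry with exactly this name, or NULL. */
static struct sketch_arg *
sketch_find(const char *name)
{
	struct sketch_arg *a;

	TAILQ_FOREACH(a, &sketch_args, next) {
		if (strcmp(a->name, name) == 0)
			return a;
	}
	return NULL;
}

/* Append a new entry; rejects NULL, over-long and duplicate names. */
static int
sketch_add(const char *name)
{
	struct sketch_arg *a;

	if (name == NULL || strlen(name) >= sizeof(a->name))
		return -1;
	if (sketch_find(name) != NULL)
		return -1;	/* checked before allocating anything */

	a = calloc(1, sizeof(*a));
	if (a == NULL)
		return -1;
	strcpy(a->name, name);	/* length was validated above */
	TAILQ_INSERT_TAIL(&sketch_args, a, next);
	return 0;
}

/* Unlink and free the entry with this name. */
static int
sketch_remove(const char *name)
{
	struct sketch_arg *a;

	if (name == NULL)
		return -1;
	a = sketch_find(name);
	if (a == NULL)
		return -1;
	TAILQ_REMOVE(&sketch_args, a, next);
	free(a);
	return 0;
}
```

The lookup uses an exact strcmp() match. A comparison limited to strlen() of the search key, as in the virtual-device branch of the patch, would also match longer names that merely start with that key.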
[dpdk-dev] [PATCH v5 12/13] eal/pci: Add rte_eal_dev_attach/detach() functions
These functions are used for attaching or detaching a port. When rte_eal_dev_attach() is called, the function tries to realize the device name as pci address. If this is done successfully, rte_eal_dev_attach() will attach physical device port. If not, attaches virtual devive port. When rte_eal_dev_detach() is called, the function gets the device type of this port to know whether the port is came from physical or virtual. And then specific detaching function will be called. v5: - Change function names like below. rte_eal_dev_find_and_invoke() to rte_eal_vdev_find_and_invoke(). rte_eal_dev_invoke() to rte_eal_vdev_invoke(). - Add code to handle a return value of rte_eal_devargs_remove(). - Fix pci address format in rte_eal_dev_detach(). v4: - Fix comment. - Add error checking. - Fix indent of 'if' statement. - Change function name. Signed-off-by: Tetsuya Mukawa --- lib/librte_eal/common/eal_common_dev.c | 274 lib/librte_eal/common/eal_private.h | 11 ++ lib/librte_eal/common/include/rte_dev.h | 33 lib/librte_eal/linuxapp/eal/Makefile| 1 + lib/librte_eal/linuxapp/eal/eal_pci.c | 6 +- 5 files changed, 322 insertions(+), 3 deletions(-) diff --git a/lib/librte_eal/common/eal_common_dev.c b/lib/librte_eal/common/eal_common_dev.c index eae5656..e3a3f54 100644 --- a/lib/librte_eal/common/eal_common_dev.c +++ b/lib/librte_eal/common/eal_common_dev.c @@ -32,10 +32,13 @@ * OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. 
*/ +#include +#include #include #include #include +#include #include #include #include @@ -107,3 +110,274 @@ rte_eal_dev_init(void) } return 0; } + +/* So far, DPDK hotplug function only supports linux */ +#ifdef ENABLE_HOTPLUG +static void +rte_eal_vdev_invoke(struct rte_driver *driver, + struct rte_devargs *devargs, enum rte_eal_invoke_type type) +{ + if ((driver == NULL) || (devargs == NULL)) + return; + + switch (type) { + case RTE_EAL_INVOKE_TYPE_PROBE: + driver->init(devargs->virtual.drv_name, devargs->args); + break; + case RTE_EAL_INVOKE_TYPE_CLOSE: + driver->uninit(devargs->virtual.drv_name, devargs->args); + break; + default: + break; + } +} + +static int +rte_eal_vdev_find_and_invoke(const char *name, int type) +{ + struct rte_devargs *devargs; + struct rte_driver *driver; + + if (name == NULL) + return -EINVAL; + + /* call the init function for each virtual device */ + TAILQ_FOREACH(devargs, &devargs_list, next) { + + if (devargs->type != RTE_DEVTYPE_VIRTUAL) + continue; + + if (strncmp(name, devargs->virtual.drv_name, strlen(name))) + continue; + + TAILQ_FOREACH(driver, &dev_driver_list, next) { + if (driver->type != PMD_VDEV) + continue; + + /* search a driver prefix in virtual device name */ + if (!strncmp(driver->name, devargs->virtual.drv_name, + strlen(driver->name))) { + rte_eal_vdev_invoke(driver, devargs, type); + break; + } + } + + if (driver == NULL) { + RTE_LOG(WARNING, EAL, "no driver found for %s\n", + devargs->virtual.drv_name); + } + return 0; + } + return 1; +} + +/* attach the new physical device, then store port_id of the device */ +static int +rte_eal_dev_attach_pdev(struct rte_pci_addr *addr, uint8_t *port_id) +{ + uint8_t new_port_id; + struct rte_eth_dev devs[RTE_MAX_ETHPORTS]; + + if ((addr == NULL) || (port_id == NULL)) + goto err; + + /* save current port status */ + rte_eth_dev_save(devs); + /* re-construct pci_device_list */ + if (rte_eal_pci_scan()) + goto err; + /* invoke probe func of the driver can handle the new device 
*/ + if (rte_eal_pci_probe_one(addr)) + goto err; + /* get port_id enabled by above procedures */ + if (rte_eth_dev_get_changed_port(devs, &new_port_id)) + goto err; + + *port_id = new_port_id; + return 0; +err: + RTE_LOG(ERR, EAL, "Drver, cannot attach the device\n"); + return -1; +} + +/* detach the new physical device, then store pci_addr of the device */ +static int +rte_eal_dev_detach_pdev(uint8_t port_id, struct rte_pci_addr *addr) +{ + struct rte_pci_addr freed_addr; + struct rte_pci_addr vp; + + if (addr == NULL) + goto err; + + /* check whether the driver supports detach feature, or not */ + if (rte_eth_dev_check_detachable(port_id)) +
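The commit message above says that attach first tries to read the identifier as a PCI address and falls back to a virtual device otherwise. A minimal sketch of that decision in plain C follows. It is only an illustration: struct pci_addr_sketch, parse_pci_addr_sketch() and classify_device_name() are hypothetical names, not the helpers this patch or the EAL actually uses.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical stand-in for a PCI address; not the DPDK struct rte_pci_addr. */
struct pci_addr_sketch {
	unsigned int domain;
	unsigned int bus;
	unsigned int devid;
	unsigned int function;
};

enum dev_kind { DEV_KIND_PHYSICAL, DEV_KIND_VIRTUAL };

/*
 * Try to read "domain:bus:devid.function" (hex fields) from name.
 * Returns 0 and fills *addr on success, -1 if name is not a full PCI address.
 */
static int
parse_pci_addr_sketch(const char *name, struct pci_addr_sketch *addr)
{
	int consumed = 0;

	if (name == NULL || addr == NULL)
		return -1;

	/* %n records how many characters were consumed, so trailing junk is rejected */
	if (sscanf(name, "%4x:%2x:%2x.%1x%n", &addr->domain, &addr->bus,
		   &addr->devid, &addr->function, &consumed) != 4)
		return -1;
	if ((size_t)consumed != strlen(name))
		return -1;
	return 0;
}

/* Anything that does not parse as a PCI address is treated as a virtual device name. */
static enum dev_kind
classify_device_name(const char *name, struct pci_addr_sketch *addr)
{
	return parse_pci_addr_sketch(name, addr) == 0 ?
		DEV_KIND_PHYSICAL : DEV_KIND_VIRTUAL;
}
```

With this sketch, an identifier such as "0000:02:00.0" is classified as physical, while "eth_pcap0,iface=eth0" falls through to the virtual path.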
[dpdk-dev] [PATCH v5 13/13] eal: Enable port hotplug framework in Linux
The patch enables CONFIG_RTE_LIBRTE_EAL_HOTPLUG in Linux configuration.

Signed-off-by: Tetsuya Mukawa
---
 config/common_linuxapp | 5 +++++
 1 file changed, 5 insertions(+)

diff --git a/config/common_linuxapp b/config/common_linuxapp
index 2f9643b..27d05be 100644
--- a/config/common_linuxapp
+++ b/config/common_linuxapp
@@ -114,6 +114,11 @@ CONFIG_RTE_PCI_MAX_READ_REQUEST_SIZE=0
 CONFIG_RTE_LIBRTE_EAL_LINUXAPP=y

 #
+# Compile Environment Abstraction Layer to support hotplug
+#
+CONFIG_RTE_LIBRTE_EAL_HOTPLUG=y
+
+#
 # Compile Environment Abstraction Layer to support Vmware TSC map
 #
 CONFIG_RTE_LIBRTE_EAL_VMWARE_TSC_MAP_SUPPORT=y
--
1.9.1
[dpdk-dev] [PATCH v5] librte_pmd_pcap: Add port hotplug support
This patch adds finalization code to free resources allocated by the PMD. v4: - Change function name. Signed-off-by: Tetsuya Mukawa --- lib/librte_pmd_pcap/rte_eth_pcap.c | 40 ++ 1 file changed, 40 insertions(+) diff --git a/lib/librte_pmd_pcap/rte_eth_pcap.c b/lib/librte_pmd_pcap/rte_eth_pcap.c index af7fae8..9263eab 100644 --- a/lib/librte_pmd_pcap/rte_eth_pcap.c +++ b/lib/librte_pmd_pcap/rte_eth_pcap.c @@ -498,6 +498,13 @@ static struct eth_dev_ops ops = { .stats_reset = eth_stats_reset, }; +static struct eth_driver rte_pcap_pmd = { + .pci_drv = { + .name = "rte_pcap_pmd", + .drv_flags = RTE_PCI_DRV_DETACHABLE, + }, +}; + /* * Function handler that opens the pcap file for reading a stores a * reference of it for use it later on. @@ -713,6 +720,10 @@ rte_pmd_init_internals(const char *name, const unsigned nb_rx_queues, if (*eth_dev == NULL) goto error; + /* check length of device name */ + if ((strlen((*eth_dev)->data->name) + 1) > sizeof(data->name)) + goto error; + /* now put it all together * - store queue data in internals, * - store numa_node info in pci_driver @@ -739,10 +750,13 @@ rte_pmd_init_internals(const char *name, const unsigned nb_rx_queues, data->nb_tx_queues = (uint16_t)nb_tx_queues; data->dev_link = pmd_link; data->mac_addrs = &eth_addr; + strncpy(data->name, + (*eth_dev)->data->name, strlen((*eth_dev)->data->name)); (*eth_dev)->data = data; (*eth_dev)->dev_ops = &ops; (*eth_dev)->pci_dev = pci_dev; + (*eth_dev)->driver = &rte_pcap_pmd; return 0; @@ -927,10 +941,36 @@ rte_pmd_pcap_devinit(const char *name, const char *params) } +static int +rte_pmd_pcap_devuninit(const char *name, const char *params __rte_unused) +{ + struct rte_eth_dev *eth_dev = NULL; + + RTE_LOG(INFO, PMD, "Closing pcap ethdev on numa socket %u\n", + rte_socket_id()); + + if (name == NULL) + return -1; + + /* reserve an ethdev entry */ + eth_dev = rte_eth_dev_allocated(name); + if (eth_dev == NULL) + return -1; + + rte_free(eth_dev->data->dev_private); + rte_free(eth_dev->data); 
+ rte_free(eth_dev->pci_dev); + + rte_eth_dev_free(name); + + return 0; +} + static struct rte_driver pmd_pcap_drv = { .name = "eth_pcap", .type = PMD_VDEV, .init = rte_pmd_pcap_devinit, + .uninit = rte_pmd_pcap_devuninit, }; PMD_REGISTER_DRIVER(pmd_pcap_drv); -- 1.9.1
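The patch above checks the device name length and then copies it with strncpy(), passing strlen() of the source as the count. In C, that call copies no terminating NUL, so the result is only a valid string if the destination was zeroed beforehand (not visible in the quoted hunk). A minimal sketch of a check-then-copy helper that writes its own terminator follows; copy_dev_name() and NAME_MAX_SKETCH are hypothetical names for illustration, not part of the patch.

```c
#include <stdio.h>
#include <string.h>

#define NAME_MAX_SKETCH 32 /* assumed buffer size, for illustration only */

/*
 * Copy src into dst (capacity dst_size), refusing names that do not fit.
 * Returns 0 on success, -1 if an argument is invalid or src is too long.
 * On success dst is always NUL-terminated.
 */
static int
copy_dev_name(char *dst, size_t dst_size, const char *src)
{
	if (dst == NULL || src == NULL || dst_size == 0)
		return -1;
	/* +1 accounts for the terminating NUL, as in the patch's length check */
	if (strlen(src) + 1 > dst_size)
		return -1;
	/* snprintf() writes the terminator itself */
	snprintf(dst, dst_size, "%s", src);
	return 0;
}
```

The design choice is simply to keep the bound check and the copy in one place, so the terminator never depends on how the destination was allocated.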
[dpdk-dev] [PATCH v5] testpmd: Add port hotplug support
The patch introduces the following commands. - port attach [ident] - port detach [port_id] - attach: attaching a port - detach: detaching a port - ident: PCI address of a physical device, or device name and parameters of a virtual device. (ex. :02:00.0, eth_pcap0,iface=eth0) - port_id: port identifier v5: - Add testpmd documentation. (Thanks to Iremonger, Bernard) v4: - Fix strings of command help. Signed-off-by: Tetsuya Mukawa --- app/test-pmd/cmdline.c | 133 +++ app/test-pmd/config.c | 116 +--- app/test-pmd/parameters.c | 22 ++- app/test-pmd/testpmd.c | 199 +--- app/test-pmd/testpmd.h | 18 ++- doc/guides/testpmd_app_ug/testpmd_funcs.rst | 57 6 files changed, 415 insertions(+), 130 deletions(-) diff --git a/app/test-pmd/cmdline.c b/app/test-pmd/cmdline.c index 4beb404..2f813d8 100644 --- a/app/test-pmd/cmdline.c +++ b/app/test-pmd/cmdline.c @@ -572,6 +572,12 @@ static void cmd_help_long_parsed(void *parsed_result, "port close (port_id|all)\n" "Close all ports or port_id.\n\n" + "port attach (ident)\n" + "Attach physical or virtual dev by pci address or virtual device name\n\n" + + "port detach (port_id)\n" + "Detach physical or virtual dev by port_id\n\n" + "port config (port_id|all)" " speed (10|100|1000|1|4|auto)" " duplex (half|full|auto)\n" @@ -848,6 +854,89 @@ cmdline_parse_inst_t cmd_operate_specific_port = { }, }; +/* *** attach a specificied port *** */ +struct cmd_operate_attach_port_result { + cmdline_fixed_string_t port; + cmdline_fixed_string_t keyword; + cmdline_fixed_string_t identifier; +}; + +static void cmd_operate_attach_port_parsed(void *parsed_result, + __attribute__((unused)) struct cmdline *cl, + __attribute__((unused)) void *data) +{ + struct cmd_operate_attach_port_result *res = parsed_result; + + if (!strcmp(res->keyword, "attach")) + attach_port(res->identifier); + else + printf("Unknown parameter\n"); +} + +cmdline_parse_token_string_t cmd_operate_attach_port_port = + TOKEN_STRING_INITIALIZER(struct cmd_operate_attach_port_result, + port, 
"port"); +cmdline_parse_token_string_t cmd_operate_attach_port_keyword = + TOKEN_STRING_INITIALIZER(struct cmd_operate_attach_port_result, + keyword, "attach"); +cmdline_parse_token_string_t cmd_operate_attach_port_identifier = + TOKEN_STRING_INITIALIZER(struct cmd_operate_attach_port_result, + identifier, NULL); + +cmdline_parse_inst_t cmd_operate_attach_port = { + .f = cmd_operate_attach_port_parsed, + .data = NULL, + .help_str = "port attach identifier, " + "identifier: pci address or virtual dev name", + .tokens = { + (void *)&cmd_operate_attach_port_port, + (void *)&cmd_operate_attach_port_keyword, + (void *)&cmd_operate_attach_port_identifier, + NULL, + }, +}; + +/* *** detach a specificied port *** */ +struct cmd_operate_detach_port_result { + cmdline_fixed_string_t port; + cmdline_fixed_string_t keyword; + uint8_t port_id; +}; + +static void cmd_operate_detach_port_parsed(void *parsed_result, + __attribute__((unused)) struct cmdline *cl, + __attribute__((unused)) void *data) +{ + struct cmd_operate_detach_port_result *res = parsed_result; + + if (!strcmp(res->keyword, "detach")) + detach_port(res->port_id); + else + printf("Unknown parameter\n"); +} + +cmdline_parse_token_string_t cmd_operate_detach_port_port = + TOKEN_STRING_INITIALIZER(struct cmd_operate_detach_port_result, + port, "port"); +cmdline_parse_token_string_t cmd_operate_detach_port_keyword = + TOKEN_STRING_INITIALIZER(struct cmd_operate_detach_port_result, + keyword, "detach"); +cmdline_parse_token_num_t cmd_operate_detach_port_port_id = + TOKEN_NUM_INITIALIZER(struct cmd_operate_detach_port_result, + port_id, UINT8); + +cmdline_parse_inst_t cmd_operate_detach_port = { + .f = cmd_operate_detach_port_parsed, + .data = NULL, + .help_str = "port detach port_id", + .tokens = { + (void *)&cmd_operate_detach_port_port, + (void *)&cmd_operate_detach_port_keyword, + (void *)&cmd_operate_detach_port_port_id, + NULL, + }, +}; + /* *** configure speed for all
[dpdk-dev] [DISCUSSION] : ERROR while running vhost example in dpdk-1.8
thanks for your reply . even I face the same issue .any pointers to proceed .. ./build/app/vhost-switch -c f -n 4 -- -p 0x1 --dev-basename usvhost-1 --stats 2 EAL: Detected lcore 0 as core 0 on socket 0 EAL: Detected lcore 1 as core 1 on socket 0 EAL: Detected lcore 2 as core 2 on socket 0 EAL: Detected lcore 3 as core 3 on socket 0 EAL: Detected lcore 4 as core 0 on socket 0 EAL: Detected lcore 5 as core 1 on socket 0 EAL: Detected lcore 6 as core 2 on socket 0 EAL: Detected lcore 7 as core 3 on socket 0 EAL: Support maximum 128 logical core(s) by configuration. EAL: Detected 8 lcore(s) EAL: cannot open VFIO container, error 2 (No such file or directory) EAL: VFIO support could not be initialized EAL: Setting up memory... EAL: Ask a virtual area of 0x2 bytes EAL: Virtual area found at 0x7fb6c000 (size = 0x2) EAL: Ask a virtual area of 0x1a0 bytes EAL: Virtual area found at 0x7fb8f560 (size = 0x1a0) EAL: Ask a virtual area of 0x20 bytes EAL: Virtual area found at 0x7fb8f520 (size = 0x20) EAL: Ask a virtual area of 0x20 bytes EAL: Virtual area found at 0x7fb8f4e0 (size = 0x20) EAL: Ask a virtual area of 0x6c0 bytes EAL: Virtual area found at 0x7fb8ee00 (size = 0x6c0) EAL: Ask a virtual area of 0x20 bytes EAL: Virtual area found at 0x7fb8edc0 (size = 0x20) EAL: Ask a virtual area of 0x20 bytes EAL: Virtual area found at 0x7fb8ed80 (size = 0x20) EAL: Ask a virtual area of 0x20 bytes EAL: Virtual area found at 0x7fb8ed40 (size = 0x20) EAL: Ask a virtual area of 0x9e0 bytes EAL: Virtual area found at 0x7fb8e340 (size = 0x9e0) EAL: Ask a virtual area of 0x1900 bytes EAL: Virtual area found at 0x7fb8ca20 (size = 0x1900) EAL: Ask a virtual area of 0x20 bytes EAL: Virtual area found at 0x7fb8c9e0 (size = 0x20) EAL: Ask a virtual area of 0x20 bytes EAL: Virtual area found at 0x7fb8c9a0 (size = 0x20) EAL: Ask a virtual area of 0x1000 bytes EAL: Virtual area found at 0x7fb6afe0 (size = 0x1000) EAL: Ask a virtual area of 0x3c0 bytes EAL: Virtual area found at 0x7fb8c5c0 (size = 
0x3c0) EAL: Ask a virtual area of 0x20 bytes EAL: Virtual area found at 0x7fb8c580 (size = 0x20) EAL: Requesting 8 pages of size 1024MB from socket 0 EAL: Requesting 512 pages of size 2MB from socket 0 EAL: TSC frequency is ~3092840 KHz EAL: Master core 0 is ready (tid=f83c0880) PMD: ENICPMD trace: rte_enic_pmd_init EAL: Core 3 is ready (tid=c3ded700) EAL: Core 2 is ready (tid=c45ee700) EAL: Core 1 is ready (tid=c4def700) EAL: PCI device :01:00.0 on NUMA socket -1 EAL: probe driver: 8086:1521 rte_igb_pmd EAL: PCI memory mapped at 0x7fb8f700 EAL: PCI memory mapped at 0x7fb8f710 PMD: eth_igb_dev_init(): port_id 0 vendorID=0x8086 deviceID=0x1521 EAL: PCI device :01:00.1 on NUMA socket -1 EAL: probe driver: 8086:1521 rte_igb_pmd EAL: :01:00.1 not managed by UIO driver, skipping EAL: PCI device :03:00.0 on NUMA socket -1 EAL: probe driver: 8086:10d3 rte_em_pmd EAL: :03:00.0 not managed by UIO driver, skipping EAL: PCI device :04:00.0 on NUMA socket -1 EAL: probe driver: 8086:10d3 rte_em_pmd EAL: :04:00.0 not managed by UIO driver, skipping pf queue num: 0, configured vmdq pool num: 8, each vmdq pool has 1 queues PMD: eth_igb_rx_queue_setup(): sw_ring=0x7fb8f60f7e00 hw_ring=0x7fb8f5228580 dma_addr=0x36628580 PMD: eth_igb_rx_queue_setup(): sw_ring=0x7fb8f60f5d00 hw_ring=0x7fb8f5238580 dma_addr=0x36638580 PMD: eth_igb_rx_queue_setup(): sw_ring=0x7fb8f60f3c00 hw_ring=0x7fb8f5248580 dma_addr=0x36648580 PMD: eth_igb_rx_queue_setup(): sw_ring=0x7fb8f60f1b00 hw_ring=0x7fb8f5258580 dma_addr=0x36658580 PMD: eth_igb_rx_queue_setup(): sw_ring=0x7fb8f60efa00 hw_ring=0x7fb8f5268580 dma_addr=0x36668580 PMD: eth_igb_rx_queue_setup(): sw_ring=0x7fb8f60ed900 hw_ring=0x7fb8f5278580 dma_addr=0x36678580 PMD: eth_igb_rx_queue_setup(): sw_ring=0x7fb8f60eb800 hw_ring=0x7fb8f5288580 dma_addr=0x36688580 PMD: eth_igb_rx_queue_setup(): sw_ring=0x7fb8f60e9700 hw_ring=0x7fb8f5298580 dma_addr=0x36698580 PMD: eth_igb_tx_queue_setup(): To improve 1G driver performance, consider setting the TX WTHRESH 
value to 4, 8, or 16. PMD: eth_igb_tx_queue_setup(): sw_ring=0x7fb8f60e7600 hw_ring=0x7fb8f52a8580 dma_addr=0x366a8580 PMD: eth_igb_tx_queue_setup(): To improve 1G driver performance, consider setting the TX WTHRESH value to 4, 8, or 16. PMD: eth_igb_tx_queue_setup(): sw_ring=0x7fb8f60e5500 hw_ring=0x7fb8f52b8580 dma_addr=0x366b8580 PMD: eth_igb_tx_queue_setup(): To improve 1G driver performance, consider setting the TX WTHRESH value to 4, 8, or 16. PMD: eth_igb_tx_queue_setup(): sw_ring=0x7fb8f60e3400 hw_ring=0x7fb8f52c8580 dma_addr=0x366c8580 PMD: eth_igb_tx_queue_setup(): To improve 1G driver performance, consider setting the TX WTHRESH value to 4, 8, or 16. PMD: eth_igb_tx_queue_setup(): sw_ring=0x7fb8f60e1300 hw_ring=0x7fb8f52d8580 dma_a
[dpdk-dev] [PATCH v2 4/4] lib/librte_eal: Optimized memcpy in arch/x86/rte_memcpy.h for both SSE and AVX platforms
Hey Konstantin, This method does reduce code size but lead to significant performance drop. I think we need to keep the original code. Thanks Zhihong (John) > -Original Message- > From: Ananyev, Konstantin > Sent: Thursday, January 29, 2015 11:18 PM > To: Wang, Zhihong; dev at dpdk.org > Subject: RE: [dpdk-dev] [PATCH v2 4/4] lib/librte_eal: Optimized memcpy in > arch/x86/rte_memcpy.h for both SSE and AVX platforms > > Hi Zhihong, > > > -Original Message- > > From: dev [mailto:dev-bounces at dpdk.org] On Behalf Of Zhihong Wang > > Sent: Thursday, January 29, 2015 2:39 AM > > To: dev at dpdk.org > > Subject: [dpdk-dev] [PATCH v2 4/4] lib/librte_eal: Optimized memcpy in > > arch/x86/rte_memcpy.h for both SSE and AVX platforms > > > > Main code changes: > > > > 1. Differentiate architectural features based on CPU flags > > > > a. Implement separated move functions for SSE/AVX/AVX2 to make > > full utilization of cache bandwidth > > > > b. Implement separated copy flow specifically optimized for target > > architecture > > > > 2. Rewrite the memcpy function "rte_memcpy" > > > > a. Add store aligning > > > > b. Add load aligning based on architectural features > > > > c. Put block copy loop into inline move functions for better > > control of instruction order > > > > d. Eliminate unnecessary MOVs > > > > 3. Rewrite the inline move functions > > > > a. Add move functions for unaligned load cases > > > > b. Change instruction order in copy loops for better pipeline > > utilization > > > > c. Use intrinsics instead of assembly code > > > > 4. 
Remove slow glibc call for constant copies > > > > Signed-off-by: Zhihong Wang > > --- > > .../common/include/arch/x86/rte_memcpy.h | 680 > +++-- > > 1 file changed, 509 insertions(+), 171 deletions(-) > > > > diff --git a/lib/librte_eal/common/include/arch/x86/rte_memcpy.h > > b/lib/librte_eal/common/include/arch/x86/rte_memcpy.h > > index fb9eba8..7b2d382 100644 > > --- a/lib/librte_eal/common/include/arch/x86/rte_memcpy.h > > +++ b/lib/librte_eal/common/include/arch/x86/rte_memcpy.h > > @@ -34,166 +34,189 @@ > > #ifndef _RTE_MEMCPY_X86_64_H_ > > #define _RTE_MEMCPY_X86_64_H_ > > > > +/** > > + * @file > > + * > > + * Functions for SSE/AVX/AVX2 implementation of memcpy(). > > + */ > > + > > +#include > > #include > > #include > > -#include > > +#include > > > > #ifdef __cplusplus > > extern "C" { > > #endif > > > > -#include "generic/rte_memcpy.h" > > +/** > > + * Copy bytes from one location to another. The locations must not > overlap. > > + * > > + * @note This is implemented as a macro, so it's address should not > > +be taken > > + * and care is needed as parameter expressions may be evaluated > multiple times. > > + * > > + * @param dst > > + * Pointer to the destination of the data. > > + * @param src > > + * Pointer to the source data. > > + * @param n > > + * Number of bytes to copy. > > + * @return > > + * Pointer to the destination data. > > + */ > > +static inline void * > > +rte_memcpy(void *dst, const void *src, size_t n) > > +__attribute__((always_inline)); > > > > -#ifdef __INTEL_COMPILER > > -#pragma warning(disable:593) /* Stop unused variable warning (reg_a > > etc). */ -#endif > > +#ifdef RTE_MACHINE_CPUFLAG_AVX2 > > > > +/** > > + * AVX2 implementation below > > + */ > > + > > +/** > > + * Copy 16 bytes from one location to another, > > + * locations should not overlap. 
> > + */ > > static inline void > > rte_mov16(uint8_t *dst, const uint8_t *src) { > > - __m128i reg_a; > > - asm volatile ( > > - "movdqu (%[src]), %[reg_a]\n\t" > > - "movdqu %[reg_a], (%[dst])\n\t" > > - : [reg_a] "=x" (reg_a) > > - : [src] "r" (src), > > - [dst] "r"(dst) > > - : "memory" > > - ); > > + __m128i xmm0; > > + > > + xmm0 = _mm_loadu_si128((const __m128i *)src); > > + _mm_storeu_si128((__m128i *)dst, xmm0); > > } > > > > +/** > > + * Copy 32 bytes from one location to another, > > + * locations should not overlap. > > + */ > > static inline void > > rte_mov32(uint8_t *dst, const uint8_t *src) { > > - __m128i reg_a, reg_b; > > - asm volatile ( > > - "movdqu (%[src]), %[reg_a]\n\t" > > - "movdqu 16(%[src]), %[reg_b]\n\t" > > - "movdqu %[reg_a], (%[dst])\n\t" > > - "movdqu %[reg_b], 16(%[dst])\n\t" > > - : [reg_a] "=x" (reg_a), > > - [reg_b] "=x" (reg_b) > > - : [src] "r" (src), > > - [dst] "r"(dst) > > - : "memory" > > - ); > > -} > > + __m256i ymm0; > > > > -static inline void > > -rte_mov48(uint8_t *dst, const uint8_t *src) -{ > > - __m128i reg_a, reg_b, reg_c; > > - asm volatile ( > > - "movdqu (%[src]), %[reg_a]\n\t" > > - "movdqu 16(%[src]), %[reg_b]\n\t" > > - "movdqu 32(%[src]), %[reg_c]\n\t" > > - "movdqu %[reg_a], (%[dst])\n\t" > > -
[dpdk-dev] [DISCUSSION] : ERROR while running vhost example in dpdk-1.8
hi, May be I am missing something regarding hugetlbfs . I performed below steps for hugetlbfs . I am running on Ubuntu 14.04.1 LTS. cat /proc/cmdline BOOT_IMAGE=/boot/vmlinuz-3.13.0-24-generic root=UUID=628ff32b-dede-4b47-bd13-893c13c18d00 ro quiet splash hugepagesz=2M hugepages=512 default_hugepagesz=1G hugepagesz=1G hugepages=8 vt.handoff=7 mount -t hugetlbfs nodev /mnt/huge echo 512 > /sys/kernel/mm/hugepages/hugepages-2048kB/ nr_hugepages mount -t hugetlbfs nodev /mnt/huge -o pagesize=2M thanks , Srinivas. On Fri, Jan 30, 2015 at 11:19 AM, Srinivasreddy R < srinivasreddy4390 at gmail.com> wrote: > thanks for your reply . even I face the same issue .any pointers to > proceed .. > > > ./build/app/vhost-switch -c f -n 4 -- -p 0x1 --dev-basename usvhost-1 > --stats 2 > EAL: Detected lcore 0 as core 0 on socket 0 > EAL: Detected lcore 1 as core 1 on socket 0 > EAL: Detected lcore 2 as core 2 on socket 0 > EAL: Detected lcore 3 as core 3 on socket 0 > EAL: Detected lcore 4 as core 0 on socket 0 > EAL: Detected lcore 5 as core 1 on socket 0 > EAL: Detected lcore 6 as core 2 on socket 0 > EAL: Detected lcore 7 as core 3 on socket 0 > EAL: Support maximum 128 logical core(s) by configuration. > EAL: Detected 8 lcore(s) > EAL: cannot open VFIO container, error 2 (No such file or directory) > EAL: VFIO support could not be initialized > EAL: Setting up memory... 
> EAL: Ask a virtual area of 0x2 bytes > EAL: Virtual area found at 0x7fb6c000 (size = 0x2) > EAL: Ask a virtual area of 0x1a0 bytes > EAL: Virtual area found at 0x7fb8f560 (size = 0x1a0) > EAL: Ask a virtual area of 0x20 bytes > EAL: Virtual area found at 0x7fb8f520 (size = 0x20) > EAL: Ask a virtual area of 0x20 bytes > EAL: Virtual area found at 0x7fb8f4e0 (size = 0x20) > EAL: Ask a virtual area of 0x6c0 bytes > EAL: Virtual area found at 0x7fb8ee00 (size = 0x6c0) > EAL: Ask a virtual area of 0x20 bytes > EAL: Virtual area found at 0x7fb8edc0 (size = 0x20) > EAL: Ask a virtual area of 0x20 bytes > EAL: Virtual area found at 0x7fb8ed80 (size = 0x20) > EAL: Ask a virtual area of 0x20 bytes > EAL: Virtual area found at 0x7fb8ed40 (size = 0x20) > EAL: Ask a virtual area of 0x9e0 bytes > EAL: Virtual area found at 0x7fb8e340 (size = 0x9e0) > EAL: Ask a virtual area of 0x1900 bytes > EAL: Virtual area found at 0x7fb8ca20 (size = 0x1900) > EAL: Ask a virtual area of 0x20 bytes > EAL: Virtual area found at 0x7fb8c9e0 (size = 0x20) > EAL: Ask a virtual area of 0x20 bytes > EAL: Virtual area found at 0x7fb8c9a0 (size = 0x20) > EAL: Ask a virtual area of 0x1000 bytes > EAL: Virtual area found at 0x7fb6afe0 (size = 0x1000) > EAL: Ask a virtual area of 0x3c0 bytes > EAL: Virtual area found at 0x7fb8c5c0 (size = 0x3c0) > EAL: Ask a virtual area of 0x20 bytes > EAL: Virtual area found at 0x7fb8c580 (size = 0x20) > EAL: Requesting 8 pages of size 1024MB from socket 0 > EAL: Requesting 512 pages of size 2MB from socket 0 > EAL: TSC frequency is ~3092840 KHz > EAL: Master core 0 is ready (tid=f83c0880) > PMD: ENICPMD trace: rte_enic_pmd_init > EAL: Core 3 is ready (tid=c3ded700) > EAL: Core 2 is ready (tid=c45ee700) > EAL: Core 1 is ready (tid=c4def700) > EAL: PCI device :01:00.0 on NUMA socket -1 > EAL: probe driver: 8086:1521 rte_igb_pmd > EAL: PCI memory mapped at 0x7fb8f700 > EAL: PCI memory mapped at 0x7fb8f710 > PMD: eth_igb_dev_init(): port_id 0 vendorID=0x8086 
deviceID=0x1521 > EAL: PCI device :01:00.1 on NUMA socket -1 > EAL: probe driver: 8086:1521 rte_igb_pmd > EAL: :01:00.1 not managed by UIO driver, skipping > EAL: PCI device :03:00.0 on NUMA socket -1 > EAL: probe driver: 8086:10d3 rte_em_pmd > EAL: :03:00.0 not managed by UIO driver, skipping > EAL: PCI device :04:00.0 on NUMA socket -1 > EAL: probe driver: 8086:10d3 rte_em_pmd > EAL: :04:00.0 not managed by UIO driver, skipping > pf queue num: 0, configured vmdq pool num: 8, each vmdq pool has 1 queues > PMD: eth_igb_rx_queue_setup(): sw_ring=0x7fb8f60f7e00 > hw_ring=0x7fb8f5228580 dma_addr=0x36628580 > PMD: eth_igb_rx_queue_setup(): sw_ring=0x7fb8f60f5d00 > hw_ring=0x7fb8f5238580 dma_addr=0x36638580 > PMD: eth_igb_rx_queue_setup(): sw_ring=0x7fb8f60f3c00 > hw_ring=0x7fb8f5248580 dma_addr=0x36648580 > PMD: eth_igb_rx_queue_setup(): sw_ring=0x7fb8f60f1b00 > hw_ring=0x7fb8f5258580 dma_addr=0x36658580 > PMD: eth_igb_rx_queue_setup(): sw_ring=0x7fb8f60efa00 > hw_ring=0x7fb8f5268580 dma_addr=0x36668580 > PMD: eth_igb_rx_queue_setup(): sw_ring=0x7fb8f60ed900 > hw_ring=0x7fb8f5278580 dma_addr=0x36678580 > PMD: eth_igb_rx_queue_setup(): sw_ring=0x7fb8f60eb800 > hw_ring=0x7fb8f5288580 dma_addr=0x36688580 > PMD: eth_igb_rx_queue_setup(): sw_ring=0x7fb8f60e9700 > hw_ring=0x7fb8f5298580 dma_addr=0x36698580 > PMD: eth_igb_tx_queue_setup(): To improve 1G driver perfo
[dpdk-dev] [PATCH 04/17] ixgbe: support of unified packet type
Hi Bruce > -Original Message- > From: Richardson, Bruce > Sent: Friday, January 30, 2015 7:30 AM > To: Zhang, Helin > Cc: dev at dpdk.org; Cao, Waterman; Liang, Cunming; Liu, Jijiang; Ananyev, > Konstantin > Subject: Re: [PATCH 04/17] ixgbe: support of unified packet type > > On Thu, Jan 29, 2015 at 11:15:52AM +0800, Helin Zhang wrote: > > To unify packet types among all PMDs, bit masks of packet type for > > ol_flags are replaced by unified packet type for Vector PMD. > > > > Two suggestions on the commit log: > 1. Can you add scalar and vector into the titles to make it clear how this > patch > and the previous ones differ 2. Can you add a note calling out performance > impacts for this patch. If no performance impacts, then please note that for > reviewers. OK. That will be in the v2 patches. Thanks for the good comments! Regards, Helin > > /Bruce > > > Signed-off-by: Cunming Liang > > Signed-off-by: Helin Zhang > > --- > > lib/librte_pmd_ixgbe/ixgbe_rxtx_vec.c | 39 > > +++ > > 1 file changed, 21 insertions(+), 18 deletions(-) > > > > diff --git a/lib/librte_pmd_ixgbe/ixgbe_rxtx_vec.c > > b/lib/librte_pmd_ixgbe/ixgbe_rxtx_vec.c > > index b54cb19..b3cf7dd 100644 > > --- a/lib/librte_pmd_ixgbe/ixgbe_rxtx_vec.c > > +++ b/lib/librte_pmd_ixgbe/ixgbe_rxtx_vec.c > > @@ -134,44 +134,35 @@ ixgbe_rxq_rearm(struct igb_rx_queue *rxq) > > */ > > #ifdef RTE_IXGBE_RX_OLFLAGS_ENABLE > > > > -#define OLFLAGS_MASK ((uint16_t)(PKT_RX_VLAN_PKT | > PKT_RX_IPV4_HDR |\ > > -PKT_RX_IPV4_HDR_EXT | PKT_RX_IPV6_HDR |\ > > -PKT_RX_IPV6_HDR_EXT)) > > -#define OLFLAGS_MASK_V (((uint64_t)OLFLAGS_MASK << 48) | \ > > - ((uint64_t)OLFLAGS_MASK << 32) | \ > > - ((uint64_t)OLFLAGS_MASK << 16) | \ > > - ((uint64_t)OLFLAGS_MASK)) > > -#define PTYPE_SHIFT(1) > > +#define OLFLAGS_MASK_V (((uint64_t)PKT_RX_VLAN_PKT << 48) | \ > > + ((uint64_t)PKT_RX_VLAN_PKT << 32) | \ > > + ((uint64_t)PKT_RX_VLAN_PKT << 16) | \ > > + ((uint64_t)PKT_RX_VLAN_PKT)) > > #define VTAG_SHIFT (3) > > > > static inline void > 
> desc_to_olflags_v(__m128i descs[4], struct rte_mbuf **rx_pkts) { > > - __m128i ptype0, ptype1, vtag0, vtag1; > > + __m128i vtag0, vtag1; > > union { > > uint16_t e[4]; > > uint64_t dword; > > } vol; > > > > - ptype0 = _mm_unpacklo_epi16(descs[0], descs[1]); > > - ptype1 = _mm_unpacklo_epi16(descs[2], descs[3]); > > vtag0 = _mm_unpackhi_epi16(descs[0], descs[1]); > > vtag1 = _mm_unpackhi_epi16(descs[2], descs[3]); > > > > - ptype1 = _mm_unpacklo_epi32(ptype0, ptype1); > > vtag1 = _mm_unpacklo_epi32(vtag0, vtag1); > > - > > - ptype1 = _mm_slli_epi16(ptype1, PTYPE_SHIFT); > > vtag1 = _mm_srli_epi16(vtag1, VTAG_SHIFT); > > > > - ptype1 = _mm_or_si128(ptype1, vtag1); > > - vol.dword = _mm_cvtsi128_si64(ptype1) & OLFLAGS_MASK_V; > > + vol.dword = _mm_cvtsi128_si64(vtag1) & OLFLAGS_MASK_V; > > > > rx_pkts[0]->ol_flags = vol.e[0]; > > rx_pkts[1]->ol_flags = vol.e[1]; > > rx_pkts[2]->ol_flags = vol.e[2]; > > rx_pkts[3]->ol_flags = vol.e[3]; > > } > > + > > #else > > #define desc_to_olflags_v(desc, rx_pkts) do {} while (0) #endif @@ > > -204,6 +195,8 @@ _recv_raw_pkts_vec(struct igb_rx_queue *rxq, struct > rte_mbuf **rx_pkts, > > 0/* ignore pkt_type field */ > > ); > > __m128i dd_check, eop_check; > > + __m128i desc_mask = _mm_set_epi32(0x, 0x, > > + 0x, 0x07F0); > > > > if (unlikely(nb_pkts < RTE_IXGBE_VPMD_RX_BURST)) > > return 0; > > @@ -239,7 +232,8 @@ _recv_raw_pkts_vec(struct igb_rx_queue *rxq, > struct rte_mbuf **rx_pkts, > > 0xFF, 0xFF, /* skip high 16 bits pkt_len, zero out */ > > 13, 12, /* octet 12~13, low 16 bits pkt_len */ > > 13, 12, /* octet 12~13, 16 bits data_len */ > > - 0xFF, 0xFF /* skip pkt_type field */ > > + 1, /* octet 1, 8 bits pkt_type field */ > > + 0/* octet 0, 4 bits offset 4 pkt_type field */ > > ); > > > > /* Cache is empty -> need to scan the buffer rings, but first move > > @@ -248,6 +242,7 @@ _recv_raw_pkts_vec(struct igb_rx_queue *rxq, > > struct rte_mbuf **rx_pkts, > > > > /* > > * A. load 4 packet in one loop > > +* [A*. 
mask out 4 unused dirty field in desc] > > * B. copy 4 mbuf point from swring to rx_pkts > > * C. calc the number of DD bits among the 4 packets > > * [C*. extract the end-of-packet bit, if requested] @@ -289,6 > > +284,14 @@ _recv_raw_pkts_vec(struct igb_rx_queue *rxq, struct rte_mbuf > **rx_pkts, > > /* B.2 copy 2 mbuf point into rx_pkts */ > > _mm_st
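The OLFLAGS_MASK_V macro in the quoted patch replicates one 16-bit flag mask into all four 16-bit lanes of a 64-bit word, so that the flags of four packets can be masked in a single AND and then written out per packet. A scalar sketch of that idea follows, assuming nothing beyond standard C. VLAN_FLAG_SKETCH is a placeholder value (the real PKT_RX_VLAN_PKT comes from the DPDK mbuf header), and the helper names are hypothetical. The sketch uses shifts rather than the patch's union so that it does not depend on byte order.

```c
#include <stdint.h>

/* Placeholder flag value for illustration; not taken from the DPDK headers. */
#define VLAN_FLAG_SKETCH ((uint16_t)0x0001)

/* Replicate a 16-bit mask into all four 16-bit lanes of a 64-bit word. */
static inline uint64_t
replicate16(uint16_t mask)
{
	return ((uint64_t)mask << 48) | ((uint64_t)mask << 32) |
	       ((uint64_t)mask << 16) | (uint64_t)mask;
}

/* Extract lane i (0 = least significant 16 bits) from a packed 64-bit word. */
static inline uint16_t
lane16(uint64_t packed, unsigned int i)
{
	return (uint16_t)(packed >> (16 * i));
}

/* Mask four packed per-packet flag words at once and scatter them to out[0..3]. */
static inline void
unpack_flags(uint64_t packed, uint16_t mask, uint16_t out[4])
{
	uint64_t masked = packed & replicate16(mask);
	unsigned int i;

	for (i = 0; i < 4; i++)
		out[i] = lane16(masked, i);
}
```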
[dpdk-dev] Increasing MAX_RX_QUEUE_PER_LCORE value
I don't see any example DPDK apps using a value greater than 16 for MAX_RX_QUEUE_PER_LCORE. Is there a specific reason (say, performance) why it is set to 16? Thanks, Srini
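For context on the question, here is a sketch of how a constant like this is typically used in the sample applications: it sizes a fixed per-lcore array of RX ports, so the limit is a compile-time capacity rather than a hardware property. This is modeled loosely on the l2fwd-style per-lcore configuration; the struct and function names below are hypothetical, and the sketch says nothing about why 16 was chosen.

```c
#include <stddef.h>

#define MAX_RX_QUEUE_PER_LCORE 16

/* Hypothetical per-lcore configuration, sized by the compile-time limit. */
struct lcore_queue_conf_sketch {
	unsigned int n_rx_port;
	unsigned int rx_port_list[MAX_RX_QUEUE_PER_LCORE];
};

/* Assign one more RX port to an lcore; returns -1 once the array is full. */
static int
lcore_add_rx_port(struct lcore_queue_conf_sketch *qconf, unsigned int port_id)
{
	if (qconf == NULL || qconf->n_rx_port >= MAX_RX_QUEUE_PER_LCORE)
		return -1;
	qconf->rx_port_list[qconf->n_rx_port++] = port_id;
	return 0;
}
```

In a layout like this, raising the constant only grows the array and lengthens the list one lcore has to poll.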
[dpdk-dev] [PATCH 00/12] qemu vhost-user support
vhost-user supports passing vring information to a separate vhost-enabled process, normally a user space vSwitch, through a Unix domain socket.

In previous DPDK versions, we implemented a user space character device driver, vhost-cuse, in the user space DPDK process. Vring information is passed to the driver through ioctl calls, including the eventfds for interrupt injection and host notification. With that approach we need a kernel module to copy those fds from qemu into our process, and some tricks to map guest memory.

(TODO: kickfd/callfd is reversed which causes confusion)

Known issue in the vhost-user implementation in QEMU, reported by haifeng.lin at huawei.com:
* QEMU doesn't send correct memory region information with multiple numa node configuration
  http://lists.gnu.org/archive/html/qemu-devel/2014-12/msg01454.html

Thanks Tetsuya for reporting the issue that "FD_ISSET would crash when receive -1 as fd on Ubuntu 14.04".

Huawei Xie (12):
  enable VIRTIO_NET_F_CTRL_RX
  create vhost_cuse directory and move vhost-net-cdev.c into vhost_cuse
  rename vhost-net-cdev.h to vhost-net.h
  move fd copying(from qemu process into vhost process) to eventfd_copy.c
  copy host_memory_map from virtio-net.c to a new file virtio-net-cdev.c
  make host_memory_map more generic
  split set_memory_table into two parts
  add select based event driven processing
  free memory when receive new set_memory_table message
  vhost user support
  support dev->ifname in vhost-user
  support calling rte_vhost_driver_register after rte_vhost_driver_session_start

 lib/librte_vhost/Makefile | 8 +-
 lib/librte_vhost/rte_virtio_net.h | 5 +-
 lib/librte_vhost/vhost-net-cdev.c | 389 -
 lib/librte_vhost/vhost-net-cdev.h | 113 ---
 lib/librte_vhost/vhost-net.h | 121 +++
 lib/librte_vhost/vhost_cuse/eventfd_copy.c | 89 +
 lib/librte_vhost/vhost_cuse/eventfd_copy.h | 40 +++
 lib/librte_vhost/vhost_cuse/vhost-net-cdev.c | 417 +++
 lib/librte_vhost/vhost_cuse/virtio-net-cdev.c | 401 ++
 lib/librte_vhost/vhost_cuse/virtio-net-cdev.h | 48 +++
 lib/librte_vhost/vhost_rxtx.c | 2 +-
 lib/librte_vhost/vhost_user/fd_man.c | 234 +
 lib/librte_vhost/vhost_user/fd_man.h | 66
 lib/librte_vhost/vhost_user/vhost-net-user.c | 469 ++
 lib/librte_vhost/vhost_user/vhost-net-user.h | 106 ++
 lib/librte_vhost/vhost_user/virtio-net-user.c | 322 ++
 lib/librte_vhost/vhost_user/virtio-net-user.h | 49 +++
 lib/librte_vhost/virtio-net.c | 455 +++--
 lib/librte_vhost/virtio-net.h | 43 +++
 19 files changed, 2460 insertions(+), 917 deletions(-)
 delete mode 100644 lib/librte_vhost/vhost-net-cdev.c
 delete mode 100644 lib/librte_vhost/vhost-net-cdev.h
 create mode 100644 lib/librte_vhost/vhost-net.h
 create mode 100644 lib/librte_vhost/vhost_cuse/eventfd_copy.c
 create mode 100644 lib/librte_vhost/vhost_cuse/eventfd_copy.h
 create mode 100644 lib/librte_vhost/vhost_cuse/vhost-net-cdev.c
 create mode 100644 lib/librte_vhost/vhost_cuse/virtio-net-cdev.c
 create mode 100644 lib/librte_vhost/vhost_cuse/virtio-net-cdev.h
 create mode 100644 lib/librte_vhost/vhost_user/fd_man.c
 create mode 100644 lib/librte_vhost/vhost_user/fd_man.h
 create mode 100644 lib/librte_vhost/vhost_user/vhost-net-user.c
 create mode 100644 lib/librte_vhost/vhost_user/vhost-net-user.h
 create mode 100644 lib/librte_vhost/vhost_user/virtio-net-user.c
 create mode 100644 lib/librte_vhost/vhost_user/virtio-net-user.h
 create mode 100644 lib/librte_vhost/virtio-net.h
--
1.8.1.4
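The reason a Unix domain socket removes the need for a kernel module to copy fds is that such a socket can carry file descriptors as ancillary data (SCM_RIGHTS), and the kernel installs a new descriptor in the receiving process. The sketch below shows only that general POSIX/Linux technique. send_fd() and recv_fd() are hypothetical helper names and are not the functions added by this series.

```c
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <sys/uio.h>
#include <unistd.h>

/* Send one file descriptor over a connected Unix domain socket. Returns 0 or -1. */
static int
send_fd(int sock, int fd)
{
	char dummy = 'F'; /* one byte of ordinary data travels with the fd */
	struct iovec iov = { .iov_base = &dummy, .iov_len = 1 };
	union {
		char buf[CMSG_SPACE(sizeof(int))];
		struct cmsghdr align; /* forces correct alignment of buf */
	} ctrl;
	struct msghdr msg;
	struct cmsghdr *cmsg;

	memset(&msg, 0, sizeof(msg));
	memset(&ctrl, 0, sizeof(ctrl));
	msg.msg_iov = &iov;
	msg.msg_iovlen = 1;
	msg.msg_control = ctrl.buf;
	msg.msg_controllen = sizeof(ctrl.buf);

	cmsg = CMSG_FIRSTHDR(&msg);
	cmsg->cmsg_level = SOL_SOCKET;
	cmsg->cmsg_type = SCM_RIGHTS;
	cmsg->cmsg_len = CMSG_LEN(sizeof(int));
	memcpy(CMSG_DATA(cmsg), &fd, sizeof(int));

	return sendmsg(sock, &msg, 0) == 1 ? 0 : -1;
}

/* Receive one file descriptor; returns the new fd, or -1 on failure. */
static int
recv_fd(int sock)
{
	char dummy;
	struct iovec iov = { .iov_base = &dummy, .iov_len = 1 };
	union {
		char buf[CMSG_SPACE(sizeof(int))];
		struct cmsghdr align;
	} ctrl;
	struct msghdr msg;
	struct cmsghdr *cmsg;
	int fd = -1;

	memset(&msg, 0, sizeof(msg));
	memset(&ctrl, 0, sizeof(ctrl));
	msg.msg_iov = &iov;
	msg.msg_iovlen = 1;
	msg.msg_control = ctrl.buf;
	msg.msg_controllen = sizeof(ctrl.buf);

	if (recvmsg(sock, &msg, 0) <= 0)
		return -1;

	/* only accept a single SCM_RIGHTS descriptor */
	cmsg = CMSG_FIRSTHDR(&msg);
	if (cmsg == NULL || cmsg->cmsg_level != SOL_SOCKET ||
	    cmsg->cmsg_type != SCM_RIGHTS ||
	    cmsg->cmsg_len != CMSG_LEN(sizeof(int)))
		return -1;

	memcpy(&fd, CMSG_DATA(cmsg), sizeof(int));
	return fd;
}
```

A quick way to try it is a socketpair() plus a pipe(): send the pipe's write end across the socket pair, write through the received descriptor, and read the byte back from the pipe's read end.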
[dpdk-dev] [PATCH 01/12] lib/librte_vhost: enable VIRTIO_NET_F_CTRL_RX
VIRTIO_NET_F_CTRL_RX is dependent on VIRTIO_NET_F_CTRL_VQ. Observed that the virtio-net driver in the guest would crash with only CTRL_RX enabled.

In virtnet_send_command:

	/* Caller should know better */
	BUG_ON(!virtio_has_feature(vi->vdev, VIRTIO_NET_F_CTRL_VQ) ||
		(out + in > VIRTNET_SEND_COMMAND_SG_MAX));

Signed-off-by: Huawei Xie
---
 lib/librte_vhost/virtio-net.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/lib/librte_vhost/virtio-net.c b/lib/librte_vhost/virtio-net.c
index b041849..52b4957 100644
--- a/lib/librte_vhost/virtio-net.c
+++ b/lib/librte_vhost/virtio-net.c
@@ -73,7 +73,8 @@ static struct virtio_net_config_ll *ll_root;

 /* Features supported by this lib. */
 #define VHOST_SUPPORTED_FEATURES ((1ULL << VIRTIO_NET_F_MRG_RXBUF) | \
-				(1ULL << VIRTIO_NET_F_CTRL_RX))
+				(1ULL << VIRTIO_NET_F_CTRL_VQ) | \
+				(1ULL << VIRTIO_NET_F_CTRL_RX))
 static uint64_t VHOST_FEATURES = VHOST_SUPPORTED_FEATURES;

 /* Line size for reading maps file. */
--
1.8.1.4
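The dependency described in this patch can be written down as a small consistency check on a feature mask: CTRL_RX may only be offered together with CTRL_VQ. The sketch below is an illustration only. features_consistent() is a hypothetical helper, not something this patch adds, and the bit positions are the ones from the virtio network device specification.

```c
#include <stdbool.h>
#include <stdint.h>

/* Bit positions as defined by the virtio network device specification. */
#define VIRTIO_NET_F_MRG_RXBUF 15
#define VIRTIO_NET_F_CTRL_VQ   17
#define VIRTIO_NET_F_CTRL_RX   18

/* Hypothetical helper: true if CTRL_RX is not offered without CTRL_VQ. */
static bool
features_consistent(uint64_t features)
{
	bool has_vq = (features & (1ULL << VIRTIO_NET_F_CTRL_VQ)) != 0;
	bool has_rx = (features & (1ULL << VIRTIO_NET_F_CTRL_RX)) != 0;

	return !has_rx || has_vq;
}
```

Applied to the two masks in the diff, the old VHOST_SUPPORTED_FEATURES value fails this check and the new one passes.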
[dpdk-dev] [PATCH 02/12] lib/librte_vhost: separate vhost cuse driver from vhost common logic
Create the vhost_cuse directory and move vhost-net-cdev.c into it. The vhost-cuse driver will be split into two parts: cuse driver specific message handling and common vhost message handling logic. The cuse driver specific message handling is in the vhost_cuse directory. A vhost ioctl message is pre-processed there and then sent to the virtio-net (virtio-net.c) module if necessary. Some message handling terminates in vhost-cuse or vhost-user. virtio-net.c provides common message handling for both vhost-cuse and vhost-user. Signed-off-by: Huawei Xie --- lib/librte_vhost/Makefile | 4 +- lib/librte_vhost/vhost-net-cdev.c | 389 --- lib/librte_vhost/vhost_cuse/vhost-net-cdev.c | 389 +++ 3 files changed, 391 insertions(+), 391 deletions(-) delete mode 100644 lib/librte_vhost/vhost-net-cdev.c create mode 100644 lib/librte_vhost/vhost_cuse/vhost-net-cdev.c diff --git a/lib/librte_vhost/Makefile b/lib/librte_vhost/Makefile index c008d64..0b2f08f 100644 --- a/lib/librte_vhost/Makefile +++ b/lib/librte_vhost/Makefile @@ -34,10 +34,10 @@ include $(RTE_SDK)/mk/rte.vars.mk # library name LIB = librte_vhost.a -CFLAGS += $(WERROR_FLAGS) -I$(SRCDIR) -O3 -D_FILE_OFFSET_BITS=64 -lfuse +CFLAGS += $(WERROR_FLAGS) -I$(SRCDIR) -I vhost_cuse -O3 -D_FILE_OFFSET_BITS=64 -lfuse LDFLAGS += -lfuse # all source are stored in SRCS-y -SRCS-$(CONFIG_RTE_LIBRTE_VHOST) := vhost-net-cdev.c virtio-net.c vhost_rxtx.c +SRCS-$(CONFIG_RTE_LIBRTE_VHOST) := vhost_cuse/vhost-net-cdev.c virtio-net.c vhost_rxtx.c # install includes SYMLINK-$(CONFIG_RTE_LIBRTE_VHOST)-include += rte_virtio_net.h diff --git a/lib/librte_vhost/vhost-net-cdev.c b/lib/librte_vhost/vhost-net-cdev.c deleted file mode 100644 index 57c76cb..000 --- a/lib/librte_vhost/vhost-net-cdev.c +++ /dev/null @@ -1,389 +0,0 @@ -/*- - * BSD LICENSE - * - * Copyright(c) 2010-2014 Intel Corporation. All rights reserved. - * All rights reserved. 
- * - * Redistribution and use in source and binary forms, with or without - * modification, are permitted provided that the following conditions - * are met: - * - * * Redistributions of source code must retain the above copyright - * notice, this list of conditions and the following disclaimer. - * * Redistributions in binary form must reproduce the above copyright - * notice, this list of conditions and the following disclaimer in - * the documentation and/or other materials provided with the - * distribution. - * * Neither the name of Intel Corporation nor the names of its - * contributors may be used to endorse or promote products derived - * from this software without specific prior written permission. - * - * THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS - * "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT - * LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR - * A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT - * OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, - * SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT - * LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, - * DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY - * THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT - * (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE - * OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. 
- */ - -#include -#include -#include -#include -#include -#include -#include - -#include -#include -#include -#include - -#include "vhost-net-cdev.h" - -#define FUSE_OPT_DUMMY "\0\0" -#define FUSE_OPT_FORE "-f\0\0" -#define FUSE_OPT_NOMULTI "-s\0\0" - -static const uint32_t default_major = 231; -static const uint32_t default_minor = 1; -static const char cuse_device_name[] = "/dev/cuse"; -static const char default_cdev[] = "vhost-net"; - -static struct fuse_session *session; -static struct vhost_net_device_ops const *ops; - -/* - * Returns vhost_device_ctx from given fuse_req_t. The index is populated later - * when the device is added to the device linked list. - */ -static struct vhost_device_ctx -fuse_req_to_vhost_ctx(fuse_req_t req, struct fuse_file_info *fi) -{ - struct vhost_device_ctx ctx; - struct fuse_ctx const *const req_ctx = fuse_req_ctx(req); - - ctx.pid = req_ctx->pid; - ctx.fh = fi->fh; - - return ctx; -} - -/* - * When the device is created in QEMU it gets initialised here and - * added to the device linked list. - */ -static void -vhost_net_open(fuse_req_t req, struct fuse_file_info *fi) -{ - struct vhost_device_ctx ctx = fuse_req_to_vhost_ctx(req, fi); - int err = 0; - - err = ops->new_device(ctx); - if (err == -1) { - fuse_reply_err(req, EPERM); - return; -
[dpdk-dev] [PATCH 03/12] lib/librte_vhost: rename vhost-net-cdev.h to vhost-net.h
This file defines common operations provided by virtio-net(.c). Signed-off-by: Huawei Xie --- lib/librte_vhost/vhost-net-cdev.h| 113 --- lib/librte_vhost/vhost-net.h | 113 +++ lib/librte_vhost/vhost_cuse/vhost-net-cdev.c | 2 +- lib/librte_vhost/vhost_rxtx.c| 2 +- lib/librte_vhost/virtio-net.c| 2 +- 5 files changed, 116 insertions(+), 116 deletions(-) delete mode 100644 lib/librte_vhost/vhost-net-cdev.h create mode 100644 lib/librte_vhost/vhost-net.h diff --git a/lib/librte_vhost/vhost-net-cdev.h b/lib/librte_vhost/vhost-net-cdev.h deleted file mode 100644 index 03a5c57..000 --- a/lib/librte_vhost/vhost-net-cdev.h +++ /dev/null @@ -1,113 +0,0 @@ -/*- - * BSD LICENSE - * - * Copyright(c) 2010-2014 Intel Corporation. All rights reserved. - * All rights reserved. - * - * Redistribution and use in source and binary forms, with or without - * modification, are permitted provided that the following conditions - * are met: - * - * * Redistributions of source code must retain the above copyright - * notice, this list of conditions and the following disclaimer. - * * Redistributions in binary form must reproduce the above copyright - * notice, this list of conditions and the following disclaimer in - * the documentation and/or other materials provided with the - * distribution. - * * Neither the name of Intel Corporation nor the names of its - * contributors may be used to endorse or promote products derived - * from this software without specific prior written permission. - * - * THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS - * "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT - * LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR - * A PARTICULAR PURPOSE ARE DISCLAIMED. 
IN NO EVENT SHALL THE COPYRIGHT - * OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, - * SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT - * LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, - * DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY - * THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT - * (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE - * OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. - */ - -#ifndef _VHOST_NET_CDEV_H_ -#define _VHOST_NET_CDEV_H_ -#include -#include -#include -#include -#include - -#include - -/* Macros for printing using RTE_LOG */ -#define RTE_LOGTYPE_VHOST_CONFIG RTE_LOGTYPE_USER1 -#define RTE_LOGTYPE_VHOST_DATA RTE_LOGTYPE_USER1 - -#ifdef RTE_LIBRTE_VHOST_DEBUG -#define VHOST_MAX_PRINT_BUFF 6072 -#define LOG_LEVEL RTE_LOG_DEBUG -#define LOG_DEBUG(log_type, fmt, args...) RTE_LOG(DEBUG, log_type, fmt, ##args) -#define PRINT_PACKET(device, addr, size, header) do { \ - char *pkt_addr = (char *)(addr); \ - unsigned int index; \ - char packet[VHOST_MAX_PRINT_BUFF]; \ - \ - if ((header)) \ - snprintf(packet, VHOST_MAX_PRINT_BUFF, "(%"PRIu64") Header size %d: ", (device->device_fh), (size)); \ - else \ - snprintf(packet, VHOST_MAX_PRINT_BUFF, "(%"PRIu64") Packet size %d: ", (device->device_fh), (size)); \ - for (index = 0; index < (size); index++) { \ - snprintf(packet + strnlen(packet, VHOST_MAX_PRINT_BUFF), VHOST_MAX_PRINT_BUFF - strnlen(packet, VHOST_MAX_PRINT_BUFF), \ - "%02hhx ", pkt_addr[index]); \ - } \ - snprintf(packet + strnlen(packet, VHOST_MAX_PRINT_BUFF), VHOST_MAX_PRINT_BUFF - strnlen(packet, VHOST_MAX_PRINT_BUFF), "\n"); \ - \ - LOG_DEBUG(VHOST_DATA, "%s", packet); \ -} while (0) -#else -#define LOG_LEVEL RTE_LOG_INFO -#define LOG_DEBUG(log_type, fmt, args...) 
do {} while (0)
-#define PRINT_PACKET(device, addr, size, header) do {} while (0)
-#endif
-
-
-/*
- * Structure used to identify device context.
- */
-struct vhost_device_ctx {
-	pid_t		pid;	/* PID of process calling the IOCTL. */
-	uint64_t	fh;	/* Populated with fi->fh to track the device index. */
-};
-
-/*
- * Structure contains function pointers to be defined in virtio-net.c. These
- * functions are called in CUSE context and are used to configure devices.
- */
-struct vhost_net_device_ops {
-	int (*new_device)(struct vhost_device_ctx);
-	void (*destroy_device)(struct vhost_device_ctx);
-
-	int (*get_features)(struct vhost_device_ctx, uint64_t *);
-	int (*set_features)(struct vhost_device_ctx, uint64_t *);
-
-	int (*set_mem_table)(struct vhost_device_ctx, const void *, uint32_t);
-
-	int (*set_vring_num)(struct vhost_device_ctx, struct vhost_vring_state *);
-	int (*set_vring_addr)(struct vhost_device_ctx, struct vhost_vring_addr
[dpdk-dev] [PATCH 05/12] lib/librte_vhost: copy host_memory_map from virtio-net.c to a new file virtio-net-cdev.c
Signed-off-by: Huawei Xie --- lib/librte_vhost/vhost_cuse/virtio-net-cdev.c | 258 ++ 1 file changed, 258 insertions(+) create mode 100644 lib/librte_vhost/vhost_cuse/virtio-net-cdev.c diff --git a/lib/librte_vhost/vhost_cuse/virtio-net-cdev.c b/lib/librte_vhost/vhost_cuse/virtio-net-cdev.c new file mode 100644 index 000..fbfc403 --- /dev/null +++ b/lib/librte_vhost/vhost_cuse/virtio-net-cdev.c @@ -0,0 +1,258 @@ +/*- + * BSD LICENSE + * + * Copyright(c) 2010-2014 Intel Corporation. All rights reserved. + * All rights reserved. + * + * Redistribution and use in source and binary forms, with or without + * modification, are permitted provided that the following conditions + * are met: + * + * * Redistributions of source code must retain the above copyright + * notice, this list of conditions and the following disclaimer. + * * Redistributions in binary form must reproduce the above copyright + * notice, this list of conditions and the following disclaimer in + * the documentation and/or other materials provided with the + * distribution. + * * Neither the name of Intel Corporation nor the names of its + * contributors may be used to endorse or promote products derived + * from this software without specific prior written permission. + * + * THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS + * "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT + * LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR + * A PARTICULAR PURPOSE ARE DISCLAIMED. 
IN NO EVENT SHALL THE COPYRIGHT + * OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, + * SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT + * LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, + * DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY + * THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT + * (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE + * OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. + */ + +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include + +#include + +#include "vhost-net.h" + +/* Line size for reading maps file. */ +static const uint32_t BUFSIZE = PATH_MAX; + +/* Size of prot char array in procmap. */ +#define PROT_SZ 5 + +/* Number of elements in procmap struct. */ +#define PROCMAP_SZ 8 + +/* Structure containing information gathered from maps file. */ +struct procmap { + uint64_t va_start; /* Start virtual address in file. */ + uint64_t len; /* Size of file. */ + uint64_t pgoff; /* Not used. */ + uint32_t maj; /* Not used. */ + uint32_t min; /* Not used. */ + uint32_t ino; /* Not used. */ + char prot[PROT_SZ]; /* Not used. */ + char fname[PATH_MAX]; /* File name. */ +}; + +/* + * Locate the file containing QEMU's memory space and + * map it to our address space. + */ +static int +host_memory_map(struct virtio_net *dev, struct virtio_memory *mem, + pid_t pid, uint64_t addr) +{ + struct dirent *dptr = NULL; + struct procmap procmap; + DIR *dp = NULL; + int fd; + int i; + char memfile[PATH_MAX]; + char mapfile[PATH_MAX]; + char procdir[PATH_MAX]; + char resolved_path[PATH_MAX]; + char *path = NULL; + FILE *fmap; + void *map; + uint8_t found = 0; + char line[BUFSIZE]; + char dlm[] = "- : "; + char *str, *sp, *in[PROCMAP_SZ]; + char *end = NULL; + + /* Path where mem files are located. 
*/ + snprintf(procdir, PATH_MAX, "/proc/%u/fd/", pid); + /* Maps file used to locate mem file. */ + snprintf(mapfile, PATH_MAX, "/proc/%u/maps", pid); + + fmap = fopen(mapfile, "r"); + if (fmap == NULL) { + RTE_LOG(ERR, VHOST_CONFIG, + "(%"PRIu64") Failed to open maps file for pid %d\n", + dev->device_fh, pid); + return -1; + } + + /* Read through maps file until we find out base_address. */ + while (fgets(line, BUFSIZE, fmap) != 0) { + str = line; + errno = 0; + /* Split line into fields. */ + for (i = 0; i < PROCMAP_SZ; i++) { + in[i] = strtok_r(str, &dlm[i], &sp); + if ((in[i] == NULL) || (errno != 0)) { + fclose(fmap); + return -1; + } + str = NULL; + } + + /* Convert/Copy each field as needed. */ + procmap.va_start = strtoull(in[0], &end, 16); + if ((in[0] == '\0') || (end == NULL) || (*
[dpdk-dev] [PATCH 06/12] lib/librte_vhost: make host_memory_map more generic
This function accepts a virtual address and the pid of the QEMU process, and
maps the memory at that address into the address space of the current (vhost)
process. The memory behind the virtual address should be backed by a file
(normally a hugepage file), and the virtual address should be the starting
address of that mapping.

Signed-off-by: Huawei Xie
---
 lib/librte_vhost/vhost_cuse/virtio-net-cdev.c | 43 +--
 1 file changed, 20 insertions(+), 23 deletions(-)

diff --git a/lib/librte_vhost/vhost_cuse/virtio-net-cdev.c b/lib/librte_vhost/vhost_cuse/virtio-net-cdev.c
index fbfc403..58ac3dd 100644
--- a/lib/librte_vhost/vhost_cuse/virtio-net-cdev.c
+++ b/lib/librte_vhost/vhost_cuse/virtio-net-cdev.c
@@ -75,8 +75,8 @@ struct procmap {
  * map it to our address space.
  */
 static int
-host_memory_map(struct virtio_net *dev, struct virtio_memory *mem,
-	pid_t pid, uint64_t addr)
+host_memory_map(pid_t pid, uint64_t addr,
+	uint64_t *mapped_address, uint64_t *mapped_size)
 {
 	struct dirent *dptr = NULL;
 	struct procmap procmap;
@@ -104,8 +104,8 @@ host_memory_map(struct virtio_net *dev, struct virtio_memory *mem,
 	fmap = fopen(mapfile, "r");
 	if (fmap == NULL) {
 		RTE_LOG(ERR, VHOST_CONFIG,
-			"(%"PRIu64") Failed to open maps file for pid %d\n",
-			dev->device_fh, pid);
+			"Failed to open maps file for pid %d\n",
+			pid);
 		return -1;
 	}
@@ -179,8 +179,8 @@ host_memory_map(struct virtio_net *dev, struct virtio_memory *mem,
 	if (!found) {
 		RTE_LOG(ERR, VHOST_CONFIG,
-			"(%"PRIu64") Failed to find memory file in pid %d maps file\n",
-			dev->device_fh, pid);
+			"Failed to find memory file in pid %d maps file\n",
+			pid);
 		return -1;
 	}
@@ -188,8 +188,8 @@ host_memory_map(struct virtio_net *dev, struct virtio_memory *mem,
 	dp = opendir(procdir);
 	if (dp == NULL) {
 		RTE_LOG(ERR, VHOST_CONFIG,
-			"(%"PRIu64") Cannot open pid %d process directory\n",
-			dev->device_fh, pid);
+			"Cannot open pid %d process directory\n",
+			pid);
 		return -1;
 	}
@@ -202,8 +202,7 @@ host_memory_map(struct virtio_net *dev, struct virtio_memory *mem,
 		path = realpath(memfile, resolved_path);
 		if ((path == NULL) && (strlen(resolved_path) == 0)) {
 			RTE_LOG(ERR, VHOST_CONFIG,
-				"(%"PRIu64") Failed to resolve fd directory\n",
-				dev->device_fh);
+				"Failed to resolve fd directory\n");
 			closedir(dp);
 			return -1;
 		}
@@ -218,8 +217,8 @@ host_memory_map(struct virtio_net *dev, struct virtio_memory *mem,
 	if (found == 0) {
 		RTE_LOG(ERR, VHOST_CONFIG,
-			"(%"PRIu64") Failed to find memory file for pid %d\n",
-			dev->device_fh, pid);
+			"Failed to find memory file for pid %d\n",
+			pid);
 		return -1;
 	}
 	/* Open the shared memory file and map the memory into this process. */
@@ -227,32 +226,30 @@ host_memory_map(struct virtio_net *dev, struct virtio_memory *mem,
 	if (fd == -1) {
 		RTE_LOG(ERR, VHOST_CONFIG,
-			"(%"PRIu64") Failed to open %s for pid %d\n",
-			dev->device_fh, memfile, pid);
+			"Failed to open %s for pid %d\n",
+			memfile, pid);
 		return -1;
 	}
 	map = mmap(0, (size_t)procmap.len, PROT_READ|PROT_WRITE,
-		MAP_POPULATE|MAP_SHARED, fd, 0);
+		MAP_POPULATE|MAP_SHARED, fd, 0);
 	close(fd);
 	if (map == MAP_FAILED) {
 		RTE_LOG(ERR, VHOST_CONFIG,
-			"(%"PRIu64") Error mapping the file %s for pid %d\n",
-			dev->device_fh, memfile, pid);
+			"Error mapping the file %s for pid %d\n",
+			memfile, pid);
 		return -1;
 	}
 	/* Store the memory address and size in the device data structure */
-	mem->mapped_address = (uint64_t)(uintptr_t)map;
-	mem->mapped_size = procmap.len;
+	*mapped_address = (uint64_t)(uintptr_t)map;
+	*mapped_size = procmap.len;
 	LOG_DEBUG(VHOST_CONFIG,
-		"(%"PRIu64") Mem File: %s->%s - Size: %llu - VA: %p\n",
-		dev->device_fh,
+		"Mem File: %s->%s - Size: %llu - VA: %p\n",
 		memfile, resolved_path,
-		(unsigned long long)mem->mapped_size, map);
+		(unsigned long long)*mapped_size, map);
 	return 0;
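The lookup that host_memory_map performs on /proc/<pid>/maps can be sketched
as below. This is a simplified illustration using sscanf rather than the
strtok_r field splitting in the patch; parse_maps_line() and
maps_line_matches() are hypothetical names, not functions of librte_vhost.

```c
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper: parse one line of /proc/<pid>/maps of the form
 *   start-end perms offset major:minor inode [path]
 * Returns 0 and fills start, len and path on success, -1 on a malformed line.
 */
static int
parse_maps_line(const char *line, uint64_t *start, uint64_t *len,
	char *path, size_t path_sz)
{
	uint64_t va_start, va_end;
	int off = -1;

	if (path_sz == 0)
		return -1;

	/* %n records the offset at which the optional path begins. */
	if (sscanf(line, "%" SCNx64 "-%" SCNx64 " %*s %*s %*s %*s %n",
			&va_start, &va_end, &off) != 2 || off < 0)
		return -1;
	if (va_end < va_start)
		return -1;

	*start = va_start;
	*len = va_end - va_start;
	snprintf(path, path_sz, "%s", line + off);
	path[strcspn(path, "\n")] = '\0';	/* drop the trailing newline */
	return 0;
}

/*
 * A mapping qualifies only when the requested address is its starting
 * address, which is the precondition stated in the commit message.
 */
static int
maps_line_matches(const char *line, uint64_t addr, uint64_t *len,
	char *path, size_t path_sz)
{
	uint64_t start;

	if (parse_maps_line(line, &start, len, path, path_sz) != 0)
		return 0;
	return start == addr;
}
```

A caller would read the maps file line by line, stop at the first match, and
then open and mmap the backing file, returning the result through
mapped_address and mapped_size as the reworked prototype does.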
[dpdk-dev] [PATCH 04/12] lib/librte_vhost: move fd copying(from qemu process into vhost process) to eventfd_copy.c
eventfd copy is vhost-cuse specific. vhost-user doesn't need eventfd kernel module to copy fds between processes. fd is automatically installed into target process through unix domain socket. Signed-off-by: Huawei Xie --- lib/librte_vhost/Makefile| 2 +- lib/librte_vhost/vhost_cuse/eventfd_copy.c | 89 lib/librte_vhost/vhost_cuse/eventfd_copy.h | 40 + lib/librte_vhost/vhost_cuse/vhost-net-cdev.c | 41 + lib/librte_vhost/virtio-net.c| 57 +- 5 files changed, 163 insertions(+), 66 deletions(-) create mode 100644 lib/librte_vhost/vhost_cuse/eventfd_copy.c create mode 100644 lib/librte_vhost/vhost_cuse/eventfd_copy.h diff --git a/lib/librte_vhost/Makefile b/lib/librte_vhost/Makefile index 0b2f08f..6fe7471 100644 --- a/lib/librte_vhost/Makefile +++ b/lib/librte_vhost/Makefile @@ -37,7 +37,7 @@ LIB = librte_vhost.a CFLAGS += $(WERROR_FLAGS) -I$(SRCDIR) -I vhost_cuse -O3 -D_FILE_OFFSET_BITS=64 -lfuse LDFLAGS += -lfuse # all source are stored in SRCS-y -SRCS-$(CONFIG_RTE_LIBRTE_VHOST) := vhost_cuse/vhost-net-cdev.c virtio-net.c vhost_rxtx.c +SRCS-$(CONFIG_RTE_LIBRTE_VHOST) := vhost_cuse/vhost-net-cdev.c vhost_cuse/eventfd_copy.c virtio-net.c vhost_rxtx.c # install includes SYMLINK-$(CONFIG_RTE_LIBRTE_VHOST)-include += rte_virtio_net.h diff --git a/lib/librte_vhost/vhost_cuse/eventfd_copy.c b/lib/librte_vhost/vhost_cuse/eventfd_copy.c new file mode 100644 index 000..f2ed04e --- /dev/null +++ b/lib/librte_vhost/vhost_cuse/eventfd_copy.c @@ -0,0 +1,89 @@ +/*- + * BSD LICENSE + * + * Copyright(c) 2010-2014 Intel Corporation. All rights reserved. + * All rights reserved. + * + * Redistribution and use in source and binary forms, with or without + * modification, are permitted provided that the following conditions + * are met: + * + * * Redistributions of source code must retain the above copyright + * notice, this list of conditions and the following disclaimer. 
+ * * Redistributions in binary form must reproduce the above copyright + * notice, this list of conditions and the following disclaimer in + * the documentation and/or other materials provided with the + * distribution. + * * Neither the name of Intel Corporation nor the names of its + * contributors may be used to endorse or promote products derived + * from this software without specific prior written permission. + * + * THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS + * "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT + * LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR + * A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT + * OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, + * SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT + * LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, + * DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY + * THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT + * (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE + * OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. + */ + +#include +#include +#include +#include +#include +#include + +#include + +#include "eventfd_link/eventfd_link.h" +#include "eventfd_copy.h" +#include "vhost-net.h" + +static const char eventfd_cdev[] = "/dev/eventfd-link"; + +/* + * This function uses the eventfd_link kernel module to copy an eventfd file + * descriptor provided by QEMU in to our process space. + */ +int +eventfd_copy(int target_fd, int target_pid) +{ + int eventfd_link, ret; + struct eventfd_copy eventfd_copy; + int fd = eventfd(0, EFD_NONBLOCK | EFD_CLOEXEC); + + if (fd == -1) + return -1; + + /* Open the character device to the kernel module. */ + /* TODO: check this earlier rather than fail until VM boots! 
*/ + eventfd_link = open(eventfd_cdev, O_RDWR); + if (eventfd_link < 0) { + RTE_LOG(ERR, VHOST_CONFIG, + "eventfd_link module is not loaded\n"); + close(fd); + return -1; + } + + eventfd_copy.source_fd = fd; + eventfd_copy.target_fd = target_fd; + eventfd_copy.target_pid = target_pid; + /* Call the IOCTL to copy the eventfd. */ + ret = ioctl(eventfd_link, EVENTFD_COPY, &eventfd_copy); + close(eventfd_link); + + if (ret < 0) { + RTE_LOG(ERR, VHOST_CONFIG, + "EVENTFD_COPY ioctl failed\n"); + close(fd); + return -1; + } + + return fd; +} + diff --git a/lib/librte_vhost/vhost_cuse/eventfd_copy.h b/lib/librte_vhost/vhost_cuse/eventfd_copy.h new file mode 100644 index 000..5f7307c --- /dev/null +++ b/lib/librte_vhost/vho
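For comparison with the eventfd_link ioctl above, the unix domain socket
mechanism mentioned in the commit message (an fd installed directly into the
receiving process) is SCM_RIGHTS ancillary data. The following is a minimal
sketch of that mechanism only; send_fd() and recv_fd() are hypothetical
helpers and do not reproduce the vhost-user message format.

```c
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <sys/uio.h>

/* Control buffer sized and aligned for exactly one file descriptor. */
union fd_ctrl {
	struct cmsghdr hdr;
	char buf[CMSG_SPACE(sizeof(int))];
};

/* Hypothetical helper: pass fd to the peer of a unix socket. 0 on success. */
static int
send_fd(int sock, int fd)
{
	char dummy = 0;	/* at least one byte of real data must be sent */
	struct iovec iov = { .iov_base = &dummy, .iov_len = 1 };
	union fd_ctrl ctrl;
	struct msghdr msg;
	struct cmsghdr *cmsg;

	memset(&ctrl, 0, sizeof(ctrl));
	memset(&msg, 0, sizeof(msg));
	msg.msg_iov = &iov;
	msg.msg_iovlen = 1;
	msg.msg_control = ctrl.buf;
	msg.msg_controllen = sizeof(ctrl.buf);

	cmsg = CMSG_FIRSTHDR(&msg);
	cmsg->cmsg_level = SOL_SOCKET;
	cmsg->cmsg_type = SCM_RIGHTS;
	cmsg->cmsg_len = CMSG_LEN(sizeof(int));
	memcpy(CMSG_DATA(cmsg), &fd, sizeof(int));

	return sendmsg(sock, &msg, 0) == 1 ? 0 : -1;
}

/* Hypothetical helper: receive one fd; the kernel installs it for us. */
static int
recv_fd(int sock)
{
	char dummy;
	struct iovec iov = { .iov_base = &dummy, .iov_len = 1 };
	union fd_ctrl ctrl;
	struct msghdr msg;
	struct cmsghdr *cmsg;
	int fd = -1;

	memset(&ctrl, 0, sizeof(ctrl));
	memset(&msg, 0, sizeof(msg));
	msg.msg_iov = &iov;
	msg.msg_iovlen = 1;
	msg.msg_control = ctrl.buf;
	msg.msg_controllen = sizeof(ctrl.buf);

	if (recvmsg(sock, &msg, 0) <= 0)
		return -1;

	cmsg = CMSG_FIRSTHDR(&msg);
	if (cmsg == NULL || cmsg->cmsg_level != SOL_SOCKET ||
	    cmsg->cmsg_type != SCM_RIGHTS ||
	    cmsg->cmsg_len != CMSG_LEN(sizeof(int)))
		return -1;

	memcpy(&fd, CMSG_DATA(cmsg), sizeof(int));
	return fd;
}
```

The received descriptor is a new number in the receiver that refers to the
same open file, so no kernel module is needed to copy it between processes.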
[dpdk-dev] [PATCH 08/12] lib/librte_vhost: add select based event driven processing
For more generic event-driven processing, refer to http://libevent.org/

Signed-off-by: Huawei Xie
---
 lib/librte_vhost/vhost_user/fd_man.c | 207 +++
 lib/librte_vhost/vhost_user/fd_man.h |  64 +++
 2 files changed, 271 insertions(+)
 create mode 100644 lib/librte_vhost/vhost_user/fd_man.c
 create mode 100644 lib/librte_vhost/vhost_user/fd_man.h

diff --git a/lib/librte_vhost/vhost_user/fd_man.c b/lib/librte_vhost/vhost_user/fd_man.c
new file mode 100644
index 000..929fbc3
--- /dev/null
+++ b/lib/librte_vhost/vhost_user/fd_man.c
@@ -0,0 +1,207 @@
+/*-
+ * BSD LICENSE
+ *
+ * Copyright(c) 2010-2014 Intel Corporation. All rights reserved.
+ * All rights reserved.
+ *
+ * Redistribution and use in source and binary forms, with or without
+ * modification, are permitted provided that the following conditions
+ * are met:
+ *
+ * * Redistributions of source code must retain the above copyright
+ * notice, this list of conditions and the following disclaimer.
+ * * Redistributions in binary form must reproduce the above copyright
+ * notice, this list of conditions and the following disclaimer in
+ * the documentation and/or other materials provided with the
+ * distribution.
+ * * Neither the name of Intel Corporation nor the names of its
+ * contributors may be used to endorse or promote products derived
+ * from this software without specific prior written permission.
+ *
+ * THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+ * "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+ * LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
+ * A PARTICULAR PURPOSE ARE DISCLAIMED.
IN NO EVENT SHALL THE COPYRIGHT + * OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, + * SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT + * LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, + * DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY + * THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT + * (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE + * OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. + */ + +#include +#include +#include +#include +#include +#include +#include +#include + +#include + +#include "fd_man.h" + +/** + * Returns the index in the fdset for a given fd. + * If fd is -1, it means to search for a free entry. + * @return + * index for the fd, or -1 if fd isn't in the fdset. + */ +static int +fdset_find_fd(struct fdset *pfdset, int fd) +{ + int i; + + if (pfdset == NULL) + return -1; + + for (i = 0; i < MAX_FDS && pfdset->fd[i].fd != fd; i++) + ; + + return i == MAX_FDS ? -1 : i; +} + +static int +fdset_find_free_slot(struct fdset *pfdset) +{ + return fdset_find_fd(pfdset, -1); +} + +static void +fdset_add_fd(struct fdset *pfdset, int idx, int fd, + fd_cb rcb, fd_cb wcb, void *dat) +{ + struct fdentry *pfdentry; + + if (pfdset == NULL || idx >= MAX_FDS) + return; + + pfdentry = &pfdset->fd[idx]; + pfdentry->fd = fd; + pfdentry->rcb = rcb; + pfdentry->wcb = wcb; + pfdentry->dat = dat; +} + +/** + * Fill the read/write fd_set with the fds in the fdset. + * @return + * the maximum fds filled in the read/write fd_set. 
+ */ +static int +fdset_fill(fd_set *rfset, fd_set *wfset, struct fdset *pfdset) +{ + struct fdentry *pfdentry; + int i, maxfds = -1; + int num = MAX_FDS; + + if (pfdset == NULL) + return -1; + + for (i = 0; i < num; i++) { + pfdentry = &pfdset->fd[i]; + if (pfdentry->fd != -1) { + int added = 0; + if (pfdentry->rcb && rfset) { + FD_SET(pfdentry->fd, rfset); + added = 1; + } + if (pfdentry->wcb && wfset) { + FD_SET(pfdentry->fd, wfset); + added = 1; + } + if (added) + maxfds = pfdentry->fd < maxfds ? + maxfds : pfdentry->fd; + } + } + return maxfds; +} + +void +fdset_init(struct fdset *pfdset) +{ + int i; + + if (pfdset == NULL) + return; + + for (i = 0; i < MAX_FDS; i++) + pfdset->fd[i].fd = -1; + pfdset->num = 0; +} + +/** + * Register the fd in the fdset with read/write handler and context. + */ +int +fdset_add(struct fdset *pfdset, int fd, fd_cb rcb, fd_cb wcb, void *dat) +{ + int i; + + if (pfdset == NULL || fd == -1) + return -1; + + /* Find a free slot in the l
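The way fdset_fill feeds a select() call can be sketched with a reduced,
self-contained version. The types and names below (demo_entry, demo_cb,
demo_poll_once, demo_count_cb) are hypothetical stand-ins, since fd_man.h is
not shown here; only read callbacks are handled, and free slots are marked
with fd == -1 as in the patch.

```c
#include <stddef.h>
#include <sys/select.h>

/* Hypothetical callback type; the real fd_cb is declared in fd_man.h. */
typedef void (*demo_cb)(int fd, void *dat);

struct demo_entry {
	int fd;		/* -1 marks a free slot */
	demo_cb rcb;	/* called when fd becomes readable */
	void *dat;	/* opaque context handed back to rcb */
};

/* Example callback: counts invocations through the context pointer. */
static void
demo_count_cb(int fd, void *dat)
{
	(void)fd;
	(*(int *)dat)++;
}

/*
 * Fill an fd_set from the table, wait up to timeout_ms, then dispatch the
 * read callbacks. Returns the number of callbacks run, or -1 on error.
 */
static int
demo_poll_once(struct demo_entry *tbl, int n, int timeout_ms)
{
	struct timeval tv = {
		.tv_sec = timeout_ms / 1000,
		.tv_usec = (timeout_ms % 1000) * 1000,
	};
	fd_set rfds;
	int i, maxfd = -1, fired = 0;

	FD_ZERO(&rfds);
	for (i = 0; i < n; i++) {
		if (tbl[i].fd == -1 || tbl[i].rcb == NULL)
			continue;
		FD_SET(tbl[i].fd, &rfds);
		if (tbl[i].fd > maxfd)
			maxfd = tbl[i].fd;
	}
	if (maxfd == -1)
		return 0;	/* nothing registered */

	/* select() takes the highest fd plus one, not the number of fds. */
	if (select(maxfd + 1, &rfds, NULL, NULL, &tv) < 0)
		return -1;

	for (i = 0; i < n; i++) {
		if (tbl[i].fd != -1 && tbl[i].rcb != NULL &&
		    FD_ISSET(tbl[i].fd, &rfds)) {
			tbl[i].rcb(tbl[i].fd, tbl[i].dat);
			fired++;
		}
	}
	return fired;
}
```

An event loop would call demo_poll_once() repeatedly; the timeout keeps the
loop responsive to entries added or removed between iterations.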
[dpdk-dev] [PATCH 07/12] lib/librte_vhost: split set_memory_table into two parts
set_memory_table message is processed in two places. * cuse_set_memory_table in virtio-net-cdev.c * set_memory_table in virtio-net.c vhost-cuse or vhost-user receives set_memory_region message from qemu, maps guest memory into current process, prepares valid memory regions, and then passes valid regions to set_memory_table ops provided by virtio-net. Signed-off-by: Huawei Xie --- lib/librte_vhost/Makefile | 2 +- lib/librte_vhost/vhost-net.h | 5 +- lib/librte_vhost/vhost_cuse/vhost-net-cdev.c | 7 +- lib/librte_vhost/vhost_cuse/virtio-net-cdev.c | 85 +++ lib/librte_vhost/vhost_cuse/virtio-net-cdev.h | 45 lib/librte_vhost/virtio-net.c | 306 +- 6 files changed, 145 insertions(+), 305 deletions(-) create mode 100644 lib/librte_vhost/vhost_cuse/virtio-net-cdev.h diff --git a/lib/librte_vhost/Makefile b/lib/librte_vhost/Makefile index 6fe7471..92ab9a6 100644 --- a/lib/librte_vhost/Makefile +++ b/lib/librte_vhost/Makefile @@ -37,7 +37,7 @@ LIB = librte_vhost.a CFLAGS += $(WERROR_FLAGS) -I$(SRCDIR) -I vhost_cuse -O3 -D_FILE_OFFSET_BITS=64 -lfuse LDFLAGS += -lfuse # all source are stored in SRCS-y -SRCS-$(CONFIG_RTE_LIBRTE_VHOST) := vhost_cuse/vhost-net-cdev.c vhost_cuse/eventfd_copy.c virtio-net.c vhost_rxtx.c +SRCS-$(CONFIG_RTE_LIBRTE_VHOST) := vhost_cuse/vhost-net-cdev.c vhost_cuse/virtio-net-cdev.c vhost_cuse/eventfd_copy.c virtio-net.c vhost_rxtx.c # install includes SYMLINK-$(CONFIG_RTE_LIBRTE_VHOST)-include += rte_virtio_net.h diff --git a/lib/librte_vhost/vhost-net.h b/lib/librte_vhost/vhost-net.h index 03a5c57..11737cc 100644 --- a/lib/librte_vhost/vhost-net.h +++ b/lib/librte_vhost/vhost-net.h @@ -41,6 +41,8 @@ #include +#define VHOST_MEMORY_MAX_NREGIONS 8 + /* Macros for printing using RTE_LOG */ #define RTE_LOGTYPE_VHOST_CONFIG RTE_LOGTYPE_USER1 #define RTE_LOGTYPE_VHOST_DATA RTE_LOGTYPE_USER1 @@ -92,7 +94,8 @@ struct vhost_net_device_ops { int (*get_features)(struct vhost_device_ctx, uint64_t *); int (*set_features)(struct vhost_device_ctx, uint64_t *); - int 
(*set_mem_table)(struct vhost_device_ctx, const void *, uint32_t); + int (*set_mem_table)(struct vhost_device_ctx, + const struct virtio_memory_regions *, uint32_t nregions); int (*set_vring_num)(struct vhost_device_ctx, struct vhost_vring_state *); int (*set_vring_addr)(struct vhost_device_ctx, struct vhost_vring_addr *); diff --git a/lib/librte_vhost/vhost_cuse/vhost-net-cdev.c b/lib/librte_vhost/vhost_cuse/vhost-net-cdev.c index e7794b0..72609a3 100644 --- a/lib/librte_vhost/vhost_cuse/vhost-net-cdev.c +++ b/lib/librte_vhost/vhost_cuse/vhost-net-cdev.c @@ -44,6 +44,7 @@ #include #include +#include "virtio-net-cdev.h" #include "vhost-net.h" #include "eventfd_copy.h" @@ -57,7 +58,7 @@ static const char cuse_device_name[] = "/dev/cuse"; static const char default_cdev[] = "vhost-net"; static struct fuse_session *session; -static struct vhost_net_device_ops const *ops; +struct vhost_net_device_ops const *ops; /* * Returns vhost_device_ctx from given fuse_req_t. The index is populated later @@ -247,8 +248,8 @@ vhost_net_ioctl(fuse_req_t req, int cmd, void *arg, break; default: - result = ops->set_mem_table(ctx, - in_buf, mem_temp.nregions); + result = cuse_set_mem_table(ctx, in_buf, + mem_temp.nregions); if (result) fuse_reply_err(req, EINVAL); else diff --git a/lib/librte_vhost/vhost_cuse/virtio-net-cdev.c b/lib/librte_vhost/vhost_cuse/virtio-net-cdev.c index 58ac3dd..edcbc10 100644 --- a/lib/librte_vhost/vhost_cuse/virtio-net-cdev.c +++ b/lib/librte_vhost/vhost_cuse/virtio-net-cdev.c @@ -47,7 +47,11 @@ #include +#include "rte_virtio_net.h" #include "vhost-net.h" +#include "virtio-net-cdev.h" + +extern struct vhost_net_device_ops const *ops; /* Line size for reading maps file. 
*/ static const uint32_t BUFSIZE = PATH_MAX; @@ -253,3 +257,84 @@ host_memory_map(pid_t pid, uint64_t addr, return 0; } + +int +cuse_set_mem_table(struct vhost_device_ctx ctx, + const struct vhost_memory *mem_regions_addr, uint32_t nregions) +{ + uint64_t size = offsetof(struct vhost_memory, regions); + uint32_t idx, valid_regions; + struct virtio_memory_regions regions[VHOST_MEMORY_MAX_NREGIONS]; + struct vhost_memory_region *mem_regions = (void *)(uintptr_t) + ((uint64_t)(uintptr_t)mem_regions_addr + size); + uint64_t base_address = 0, mapped_address, mapped_size; + + for (idx = 0; idx < nregions; idx++) { + regions[idx].guest_phys_address = + mem_regions[idx].guest_phys_addr; + regions[idx].guest_phys_address_end = +
[dpdk-dev] [PATCH 09/12] lib/librte_vhost: free memory when receive new set_memory_table message in vhost-cuse
Signed-off-by: Huawei Xie
---
 lib/librte_vhost/vhost_cuse/virtio-net-cdev.c | 12 ++--
 lib/librte_vhost/virtio-net.h                 | 43 +++
 2 files changed, 53 insertions(+), 2 deletions(-)
 create mode 100644 lib/librte_vhost/virtio-net.h

diff --git a/lib/librte_vhost/vhost_cuse/virtio-net-cdev.c b/lib/librte_vhost/vhost_cuse/virtio-net-cdev.c
index edcbc10..1d2c403 100644
--- a/lib/librte_vhost/vhost_cuse/virtio-net-cdev.c
+++ b/lib/librte_vhost/vhost_cuse/virtio-net-cdev.c
@@ -50,8 +50,7 @@
 #include "rte_virtio_net.h"
 #include "vhost-net.h"
 #include "virtio-net-cdev.h"
-
-extern struct vhost_net_device_ops const *ops;
+#include "virtio-net.h"
 /* Line size for reading maps file. */
 static const uint32_t BUFSIZE = PATH_MAX;
@@ -268,6 +267,7 @@ cuse_set_mem_table(struct vhost_device_ctx ctx,
 	struct vhost_memory_region *mem_regions = (void *)(uintptr_t)
 		((uint64_t)(uintptr_t)mem_regions_addr + size);
 	uint64_t base_address = 0, mapped_address, mapped_size;
+	struct virtio_net *dev;
 	for (idx = 0; idx < nregions; idx++) {
 		regions[idx].guest_phys_address =
@@ -335,6 +335,14 @@ cuse_set_mem_table(struct vhost_device_ctx ctx,
 			regions[idx].guest_phys_address;
 	}
+	dev = get_device(ctx);
+	if (dev && dev->mem && dev->mem->mapped_address) {
+		munmap((void *)(uintptr_t)dev->mem->mapped_address,
+			(size_t)dev->mem->mapped_size);
+		free(dev->mem);
+		dev->mem = NULL;
+	}
+
 	ops->set_mem_table(ctx, &regions[0], valid_regions);
 	return 0;
 }
diff --git a/lib/librte_vhost/virtio-net.h b/lib/librte_vhost/virtio-net.h
new file mode 100644
index 000..75fb57e
--- /dev/null
+++ b/lib/librte_vhost/virtio-net.h
@@ -0,0 +1,43 @@
+/*-
+ * BSD LICENSE
+ *
+ * Copyright(c) 2010-2014 Intel Corporation. All rights reserved.
+ * All rights reserved.
+ * + * Redistribution and use in source and binary forms, with or without + * modification, are permitted provided that the following conditions + * are met: + * + * * Redistributions of source code must retain the above copyright + * notice, this list of conditions and the following disclaimer. + * * Redistributions in binary form must reproduce the above copyright + * notice, this list of conditions and the following disclaimer in + * the documentation and/or other materials provided with the + * distribution. + * * Neither the name of Intel Corporation nor the names of its + * contributors may be used to endorse or promote products derived + * from this software without specific prior written permission. + * + * THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS + * "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT + * LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR + * A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT + * OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, + * SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT + * LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, + * DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY + * THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT + * (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE + * OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. + */ + +#ifndef _VIRTIO_NET_H +#define _VIRTIO_NET_H + +#include "vhost-net.h" +#include "rte_virtio_net.h" + +struct virtio_net_device_ops const *notify_ops; +struct virtio_net *get_device(struct vhost_device_ctx ctx); + +#endif -- 1.8.1.4
[dpdk-dev] [PATCH 11/12] lib/librte_vhost: set dev->ifname in vhost-user
for vhost-cuse, ifname is the name of the tap device for vhost-user, ifname is the name of the unix domain socket path * provide a common set_ifname ops in virtio-net.c * redefine the size of dev->ifname to fit both vhost-cuse and vhost-user Signed-off-by: Huawei Xie --- lib/librte_vhost/rte_virtio_net.h | 3 +- lib/librte_vhost/vhost-net.h | 3 ++ lib/librte_vhost/vhost_cuse/vhost-net-cdev.c | 8 +++- lib/librte_vhost/vhost_cuse/virtio-net-cdev.c | 53 ++ lib/librte_vhost/vhost_cuse/virtio-net-cdev.h | 3 ++ lib/librte_vhost/vhost_user/vhost-net-user.c | 7 +++ lib/librte_vhost/virtio-net.c | 63 +-- 7 files changed, 95 insertions(+), 45 deletions(-) diff --git a/lib/librte_vhost/rte_virtio_net.h b/lib/librte_vhost/rte_virtio_net.h index 46c2072..611a3d4 100644 --- a/lib/librte_vhost/rte_virtio_net.h +++ b/lib/librte_vhost/rte_virtio_net.h @@ -100,7 +100,8 @@ struct virtio_net { uint64_tfeatures; /**< Negotiated feature set. */ uint64_tdevice_fh; /**< device identifier. */ uint32_tflags; /**< Device flags. Only used to check if device is running on data core. */ - charifname[IFNAMSIZ]; /**< Name of the tap device. */ +#define IF_NAME_SZ (PATH_MAX > IFNAMSIZ ? PATH_MAX : IFNAMSIZ) + charifname[IF_NAME_SZ]; /**< Name of the tap device or socket path. 
*/ void*priv; /**< private context */ } __rte_cache_aligned; diff --git a/lib/librte_vhost/vhost-net.h b/lib/librte_vhost/vhost-net.h index 94b359f..d125a05 100644 --- a/lib/librte_vhost/vhost-net.h +++ b/lib/librte_vhost/vhost-net.h @@ -93,6 +93,9 @@ struct vhost_net_device_ops { int (*new_device)(struct vhost_device_ctx); void (*destroy_device)(struct vhost_device_ctx); + void (*set_ifname)(struct vhost_device_ctx, + const char *if_name, unsigned int if_len); + int (*get_features)(struct vhost_device_ctx, uint64_t *); int (*set_features)(struct vhost_device_ctx, uint64_t *); diff --git a/lib/librte_vhost/vhost_cuse/vhost-net-cdev.c b/lib/librte_vhost/vhost_cuse/vhost-net-cdev.c index 72609a3..6b68abf 100644 --- a/lib/librte_vhost/vhost_cuse/vhost-net-cdev.c +++ b/lib/librte_vhost/vhost_cuse/vhost-net-cdev.c @@ -196,7 +196,13 @@ vhost_net_ioctl(fuse_req_t req, int cmd, void *arg, case VHOST_NET_SET_BACKEND: LOG_DEBUG(VHOST_CONFIG, "(%"PRIu64") IOCTL: VHOST_NET_SET_BACKEND\n", ctx.fh); - VHOST_IOCTL_R(struct vhost_vring_file, file, ops->set_backend); + if (!in_buf) { + VHOST_IOCTL_RETRY(sizeof(file), 0); + break; + } + file = *(const struct vhost_vring_file *)in_buf; + result = cuse_set_backend(ctx, &file); + fuse_reply_ioctl(req, result, NULL, 0); break; case VHOST_GET_FEATURES: diff --git a/lib/librte_vhost/vhost_cuse/virtio-net-cdev.c b/lib/librte_vhost/vhost_cuse/virtio-net-cdev.c index 1d2c403..b420ca9 100644 --- a/lib/librte_vhost/vhost_cuse/virtio-net-cdev.c +++ b/lib/librte_vhost/vhost_cuse/virtio-net-cdev.c @@ -43,6 +43,10 @@ #include #include #include +#include +#include +#include +#include #include #include @@ -51,6 +55,7 @@ #include "vhost-net.h" #include "virtio-net-cdev.h" #include "virtio-net.h" +#include "eventfd_copy.h" /* Line size for reading maps file. 
*/ static const uint32_t BUFSIZE = PATH_MAX; @@ -346,3 +351,51 @@ cuse_set_mem_table(struct vhost_device_ctx ctx, ops->set_mem_table(ctx, &regions[0], valid_regions); return 0; } + +/* + * Function to get the tap device name from the provided file descriptor and + * save it in the device structure. + */ +static int +get_ifname(struct vhost_device_ctx ctx, struct virtio_net *dev, int tap_fd, int pid) +{ + int fd_tap; + struct ifreq ifr; + uint32_t ifr_size; + int ret; + + fd_tap = eventfd_copy(tap_fd, pid); + if (fd_tap < 0) + return -1; + + ret = ioctl(fd_tap, TUNGETIFF, &ifr); + + if (close(fd_tap) < 0) + RTE_LOG(ERR, VHOST_CONFIG, + "(%"PRIu64") fd close failed\n", + dev->device_fh); + + if (ret >= 0) { + ifr_size = strnlen(ifr.ifr_name, sizeof(ifr.ifr_name)); + ops->set_ifname(ctx, ifr.ifr_name, ifr_size); + } else + RTE_LOG(ERR, VHOST_CONFIG, + "(%"PRIu64") TUNGETIFF ioctl failed\n", + dev->device_fh); + + return 0; +} + +int cuse_set_backend(struct vhost_device_ctx ctx, struct vhost_vring_file *file) +{ + struct virtio_net *dev; + + dev = get_device(ctx); + if (dev == NU
[dpdk-dev] [PATCH 12/12] lib/librte_vhost: support calling rte_vhost_driver_register after rte_vhost_driver_session_start
add mutex to protect fdset Signed-off-by: Huawei Xie --- lib/librte_vhost/vhost_user/fd_man.c | 39 +++- lib/librte_vhost/vhost_user/fd_man.h | 2 ++ lib/librte_vhost/vhost_user/vhost-net-user.c | 19 +- 3 files changed, 48 insertions(+), 12 deletions(-) diff --git a/lib/librte_vhost/vhost_user/fd_man.c b/lib/librte_vhost/vhost_user/fd_man.c index 929fbc3..e86615d 100644 --- a/lib/librte_vhost/vhost_user/fd_man.c +++ b/lib/librte_vhost/vhost_user/fd_man.c @@ -145,6 +145,8 @@ fdset_add(struct fdset *pfdset, int fd, fd_cb rcb, fd_cb wcb, void *dat) if (pfdset == NULL || fd == -1) return -1; + pthread_mutex_lock(&pfdset->fd_mutex); + /* Find a free slot in the list. */ i = fdset_find_free_slot(pfdset); if (i == -1) @@ -153,6 +155,8 @@ fdset_add(struct fdset *pfdset, int fd, fd_cb rcb, fd_cb wcb, void *dat) fdset_add_fd(pfdset, i, fd, rcb, wcb, dat); pfdset->num++; + pthread_mutex_unlock(&pfdset->fd_mutex); + return 0; } @@ -164,12 +168,19 @@ fdset_del(struct fdset *pfdset, int fd) { int i; + if (pfdset == NULL || fd == -1) + return; + + pthread_mutex_lock(&pfdset->fd_mutex); + i = fdset_find_fd(pfdset, fd); if (i != -1 && fd != -1) { pfdset->fd[i].fd = -1; pfdset->fd[i].rcb = pfdset->fd[i].wcb = NULL; pfdset->num--; } + + pthread_mutex_unlock(&pfdset->fd_mutex); } /** @@ -183,6 +194,9 @@ fdset_event_dispatch(struct fdset *pfdset) int i, maxfds; struct fdentry *pfdentry; int num = MAX_FDS; + fd_cb rcb, wcb; + void *dat; + int fd; if (pfdset == NULL) return; @@ -190,18 +204,31 @@ fdset_event_dispatch(struct fdset *pfdset) while (1) { FD_ZERO(&rfds); FD_ZERO(&wfds); + pthread_mutex_lock(&pfdset->fd_mutex); + maxfds = fdset_fill(&rfds, &wfds, pfdset); - if (maxfds == -1) - return; + if (maxfds == -1) { + pthread_mutex_unlock(&pfdset->fd_mutex); + sleep(1); + continue; + } + + pthread_mutex_unlock(&pfdset->fd_mutex); select(maxfds + 1, &rfds, &wfds, NULL, NULL); for (i = 0; i < num; i++) { + pthread_mutex_lock(&pfdset->fd_mutex); pfdentry = &pfdset->fd[i]; - if (pfdentry->fd
>= 0 && FD_ISSET(pfdentry->fd, &rfds) && pfdentry->rcb) - pfdentry->rcb(pfdentry->fd, pfdentry->dat); - if (pfdentry->fd >= 0 && FD_ISSET(pfdentry->fd, &wfds) && pfdentry->wcb) - pfdentry->wcb(pfdentry->fd, pfdentry->dat); + fd = pfdentry->fd; + rcb = pfdentry->rcb; + wcb = pfdentry->wcb; + dat = pfdentry->dat; + pthread_mutex_unlock(&pfdset->fd_mutex); + if (fd >= 0 && FD_ISSET(fd, &rfds) && rcb) + rcb(fd, dat); + if (fd >= 0 && FD_ISSET(fd, &wfds) && wcb) + wcb(fd, dat); } } } diff --git a/lib/librte_vhost/vhost_user/fd_man.h b/lib/librte_vhost/vhost_user/fd_man.h index 26b4619..4ebae57 100644 --- a/lib/librte_vhost/vhost_user/fd_man.h +++ b/lib/librte_vhost/vhost_user/fd_man.h @@ -34,6 +34,7 @@ #ifndef _FD_MAN_H_ #define _FD_MAN_H_ #include +#include #define MAX_FDS 1024 @@ -48,6 +49,7 @@ struct fdentry { struct fdset { struct fdentry fd[MAX_FDS]; + pthread_mutex_t fd_mutex; int num;/* current fd number of this fdset */ }; diff --git a/lib/librte_vhost/vhost_user/vhost-net-user.c b/lib/librte_vhost/vhost_user/vhost-net-user.c index 44ef398..e6df8a8 100644 --- a/lib/librte_vhost/vhost_user/vhost-net-user.c +++ b/lib/librte_vhost/vhost_user/vhost-net-user.c @@ -41,6 +41,7 @@ #include #include #include +#include #include #include @@ -60,10 +61,18 @@ struct connfd_ctx { }; #define MAX_VHOST_SERVER 1024 -static struct { +struct _vhost_server { struct vhost_server *server[MAX_VHOST_SERVER]; - struct fdset fdset; /**< The fd list this vhost server manages. */ -} g_vhost_server; + struct fdset fdset; +}; + +static struct _vhost_server g_vhost_server = { + .fdset = { + .fd = { [0 ... MAX_FDS - 1] = {-1, NULL, NULL, NULL} }, + .fd_mutex = PTHREAD_MUTEX_INITIALIZER, + .num = 0 + }, +}; static int vserver_idx; @@ -423,10 +432,8 @@ rte_vhost_driver_register(const char *path) { struct vhost_server *vserver; - if (vserver_idx == 0) { -
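The dispatch loop in this patch copies each entry's fields while holding fd_mutex and calls the callbacks only after releasing it. A minimal standalone sketch of that pattern follows; it is not the patch's code. `struct table`, `dispatch_slot` and `remove_fd` are hypothetical names, the table is fixed at four slots, and the FD_ISSET readiness check is left out to keep the example short.

```c
#include <pthread.h>
#include <stddef.h>

typedef void (*fd_cb)(int fd, void *dat);

/* Hypothetical, simplified counterpart of an fdset entry. */
struct entry {
	int fd;
	fd_cb rcb;
	void *dat;
};

struct table {
	struct entry e[4];
	pthread_mutex_t lock;
};

/* Copy slot i under the lock, then call the callback with the lock
 * released, so the callback may take the lock itself (for example to
 * remove its own entry) without deadlocking.
 * Returns 1 if a callback ran, 0 otherwise. */
static int dispatch_slot(struct table *t, int i)
{
	struct entry snap;

	pthread_mutex_lock(&t->lock);
	snap = t->e[i];
	pthread_mutex_unlock(&t->lock);

	if (snap.fd >= 0 && snap.rcb != NULL) {
		snap.rcb(snap.fd, snap.dat);
		return 1;
	}
	return 0;
}

/* Mark the slot holding fd as free; safe to call from a callback. */
static void remove_fd(struct table *t, int fd)
{
	int i;

	pthread_mutex_lock(&t->lock);
	for (i = 0; i < 4; i++) {
		if (t->e[i].fd == fd) {
			t->e[i].fd = -1;
			t->e[i].rcb = NULL;
		}
	}
	pthread_mutex_unlock(&t->lock);
}
```

Holding the lock across the callback would deadlock as soon as a handler called `remove_fd` on a non-recursive mutex, which is presumably why the patch snapshots the fields first.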
[dpdk-dev] [PATCH 10/12] lib/librte_vhost: vhost user support
In rte_vhost_driver_register(), vhost unix domain socket listener fd is created and added to the selected fdset. In rte_vhost_driver_session_start(), fds in the fdset are checked for processing. If there is new connection on listener fd from qemu, connection fd accepted is added to the selected fdset. The listener and connection fds in the fdset are then both checked there. When there is message on the connection fd, its callback vserver_message_handler is called to process the vhost messages. To support identifying which virtio is from which guest VM, rte_vhost_driver_register is allowed to be called multiple times to specify different socket path for different virtio device. The socket path is then set in the virtio_net device. Signed-off-by: Huawei Xie --- lib/librte_vhost/Makefile | 8 +- lib/librte_vhost/rte_virtio_net.h | 2 + lib/librte_vhost/vhost-net.h | 4 +- lib/librte_vhost/vhost_user/vhost-net-user.c | 455 ++ lib/librte_vhost/vhost_user/vhost-net-user.h | 106 ++ lib/librte_vhost/vhost_user/virtio-net-user.c | 322 ++ lib/librte_vhost/vhost_user/virtio-net-user.h | 49 +++ lib/librte_vhost/virtio-net.c | 26 +- 8 files changed, 957 insertions(+), 15 deletions(-) create mode 100644 lib/librte_vhost/vhost_user/vhost-net-user.c create mode 100644 lib/librte_vhost/vhost_user/vhost-net-user.h create mode 100644 lib/librte_vhost/vhost_user/virtio-net-user.c create mode 100644 lib/librte_vhost/vhost_user/virtio-net-user.h diff --git a/lib/librte_vhost/Makefile b/lib/librte_vhost/Makefile index 92ab9a6..22319b8 100644 --- a/lib/librte_vhost/Makefile +++ b/lib/librte_vhost/Makefile @@ -34,10 +34,14 @@ include $(RTE_SDK)/mk/rte.vars.mk # library name LIB = librte_vhost.a -CFLAGS += $(WERROR_FLAGS) -I$(SRCDIR) -I vhost_cuse -O3 -D_FILE_OFFSET_BITS=64 -lfuse +CFLAGS += $(WERROR_FLAGS) -I$(SRCDIR) -O3 -D_FILE_OFFSET_BITS=64 +CFLAGS += -I vhost_cuse -lfuse +CFLAGS += -I vhost_user LDFLAGS += -lfuse # all source are stored in SRCS-y -SRCS-$(CONFIG_RTE_LIBRTE_VHOST) := 
vhost_cuse/vhost-net-cdev.c vhost_cuse/virtio-net-cdev.c vhost_cuse/eventfd_copy.c virtio-net.c vhost_rxtx.c +SRCS-$(CONFIG_RTE_LIBRTE_VHOST) := virtio-net.c vhost_rxtx.c +#SRCS-$(CONFIG_RTE_LIBRTE_VHOST) += vhost_cuse/vhost-net-cdev.c vhost_cuse/virtio-net-cdev.c vhost_cuse/eventfd_copy.c +SRCS-$(CONFIG_RTE_LIBRTE_VHOST) += vhost_user/vhost-net-user.c vhost_user/virtio-net-user.c vhost_user/fd_man.c # install includes SYMLINK-$(CONFIG_RTE_LIBRTE_VHOST)-include += rte_virtio_net.h diff --git a/lib/librte_vhost/rte_virtio_net.h b/lib/librte_vhost/rte_virtio_net.h index 0bf07c7..46c2072 100644 --- a/lib/librte_vhost/rte_virtio_net.h +++ b/lib/librte_vhost/rte_virtio_net.h @@ -50,6 +50,8 @@ #include #include +#define VHOST_MEMORY_MAX_NREGIONS 8 + /* Used to indicate that the device is running on a data core */ #define VIRTIO_DEV_RUNNING 1 diff --git a/lib/librte_vhost/vhost-net.h b/lib/librte_vhost/vhost-net.h index 11737cc..94b359f 100644 --- a/lib/librte_vhost/vhost-net.h +++ b/lib/librte_vhost/vhost-net.h @@ -41,7 +41,9 @@ #include -#define VHOST_MEMORY_MAX_NREGIONS 8 +#include "rte_virtio_net.h" + +extern struct vhost_net_device_ops const *ops; /* Macros for printing using RTE_LOG */ #define RTE_LOGTYPE_VHOST_CONFIG RTE_LOGTYPE_USER1 diff --git a/lib/librte_vhost/vhost_user/vhost-net-user.c b/lib/librte_vhost/vhost_user/vhost-net-user.c new file mode 100644 index 000..ff83511 --- /dev/null +++ b/lib/librte_vhost/vhost_user/vhost-net-user.c @@ -0,0 +1,455 @@ +/*- + * BSD LICENSE + * + * Copyright(c) 2010-2014 Intel Corporation. All rights reserved. + * All rights reserved. + * + * Redistribution and use in source and binary forms, with or without + * modification, are permitted provided that the following conditions + * are met: + * + * * Redistributions of source code must retain the above copyright + * notice, this list of conditions and the following disclaimer. 
+ * * Redistributions in binary form must reproduce the above copyright + * notice, this list of conditions and the following disclaimer in + * the documentation and/or other materials provided with the + * distribution. + * * Neither the name of Intel Corporation nor the names of its + * contributors may be used to endorse or promote products derived + * from this software without specific prior written permission. + * + * THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS + * "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT + * LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR + * A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT + * OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, + * SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT + *
[dpdk-dev] [PATCH] examples/vhost: fix segfault when link_vmdq fails
Signed-off-by: Huawei Xie
---
 examples/vhost/main.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/examples/vhost/main.c b/examples/vhost/main.c
index 04f0118..3a35359 100644
--- a/examples/vhost/main.c
+++ b/examples/vhost/main.c
@@ -1308,8 +1308,8 @@ switch_worker(__attribute__((unused)) void *arg)
 			/* If this is the first received packet we need to learn the MAC and setup VMDQ */
 			if (unlikely(vdev->ready == DEVICE_MAC_LEARNING) && tx_count) {
 				if (vdev->remove || (link_vmdq(vdev, pkts_burst[0]) == -1)) {
-					while (tx_count--)
-						rte_pktmbuf_free(pkts_burst[tx_count]);
+					while (tx_count)
+						rte_pktmbuf_free(pkts_burst[--tx_count]);
 				}
 			}
 			while (tx_count)
-- 
1.8.1.4
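The commit message does not say why the old loop crashes, so here is a small standalone sketch of the difference between the two forms. It assumes tx_count is an unsigned counter (its declaration is not visible in the hunk); `release`, `free_post_decrement` and `free_pre_decrement` are hypothetical names, and `release` only counts calls in place of rte_pktmbuf_free().

```c
#include <stdint.h>

/* Hypothetical stand-in for rte_pktmbuf_free(): only counts calls. */
static unsigned released;

static void release(int pkt)
{
	(void)pkt;
	released++;
}

/* Old form: the post-decrement in the condition runs once more when
 * the loop exits, so an unsigned counter wraps past zero.
 * Returns the counter value left behind. */
static uint32_t free_post_decrement(const int *pkts, uint32_t tx_count)
{
	while (tx_count--)
		release(pkts[tx_count]);
	return tx_count;
}

/* Patched form: the decrement happens only inside the body, so the
 * loop exits with the counter at exactly zero. */
static uint32_t free_pre_decrement(const int *pkts, uint32_t tx_count)
{
	while (tx_count)
		release(pkts[--tx_count]);
	return tx_count;
}
```

Both loops free the same buffers. Only the value left in the counter differs, and that value is what the `while (tx_count)` following the block in the hunk goes on to test.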
[dpdk-dev] [PATCH] examples/vhost: fix segfault when link_vmdq fails
> -Original Message- > From: dev [mailto:dev-bounces at dpdk.org] On Behalf Of Huawei Xie > Sent: Friday, January 30, 2015 3:14 PM > To: dev at dpdk.org > Subject: [dpdk-dev] [PATCH] examples/vhost: fix segfault when link_vmdq > fails > > Signed-off-by: Huawei Xie Acked-by: Changchun Ouyang > --- > examples/vhost/main.c | 4 ++-- > 1 file changed, 2 insertions(+), 2 deletions(-) > > diff --git a/examples/vhost/main.c b/examples/vhost/main.c index > 04f0118..3a35359 100644 > --- a/examples/vhost/main.c > +++ b/examples/vhost/main.c > @@ -1308,8 +1308,8 @@ switch_worker(__attribute__((unused)) void *arg) > /* If this is the first received packet we need > to learn the MAC and setup VMDQ */ > if (unlikely(vdev->ready == > DEVICE_MAC_LEARNING) && tx_count) { > if (vdev->remove || > (link_vmdq(vdev, pkts_burst[0]) == -1)) { > - while (tx_count--) > - > rte_pktmbuf_free(pkts_burst[tx_count]); > + while (tx_count) > + > rte_pktmbuf_free(pkts_burst[--tx_count]); > } > } > while (tx_count) > -- > 1.8.1.4
[dpdk-dev] [PATCH] vhost: notify guest to fill buffer when there is no buffer
Haifeng Could you give more information so that we can reproduce your issue? Thanks. 1. What's your dpdk package, based on which branch, with Huawei's vhost-user's patches? 2. What's your step and command to launch vhost sample? 3. What is mz? Your internal tool? I can't yum install mz or download mz tool. 4. As to your test scenario, I understand it in this way: virtio1 in VM1, virtio2 in VM2, then let virtio1 send packages to virtio2, the problem is that after 3 hours, virtio2 can't receive packets, but virtio1 is still sending packets, am I right? So mz is like a packet generator to send packets, right? -Original Message- From: dev [mailto:dev-boun...@dpdk.org] On Behalf Of Linhaifeng Sent: Thursday, January 29, 2015 9:51 PM To: Xie, Huawei; dev at dpdk.org Subject: Re: [dpdk-dev] [PATCH] vhost: notify guest to fill buffer when there is no buffer On 2015/1/29 21:00, Xie, Huawei wrote: > > >> -Original Message- >> From: Linhaifeng [mailto:haifeng.lin at huawei.com] >> Sent: Thursday, January 29, 2015 8:39 PM >> To: Xie, Huawei; dev at dpdk.org >> Subject: Re: [dpdk-dev] [PATCH] vhost: notify guest to fill buffer >> when there is no buffer >> >> >> >> On 2015/1/29 18:39, Xie, Huawei wrote: >> - if (count == 0) + /* If there is no buffers we should notify guest to fill. + * This is need when guest use virtio_net driver(not pmd). + */ + if (count == 0) { + if (!(vq->avail->flags & VRING_AVAIL_F_NO_INTERRUPT)) + eventfd_write((int)vq->kickfd, 1); return 0; + } >>> >>> Haifeng: >>> Is it the root cause and is it protocol required? >>> Could you give a detailed description for that scenario? >>> >> >> I use mz to send data from one VM1 to VM2.The two VM use virtio-net driver. >> VM1 execute follow script: >> for((i=0;i<9;i++)); >> do >> mz eth0 -t udp -A 1.1.1.1 -B 1.1.1.2 -a 00:00:00:00:00:01 -b >> 00:00:00:00:00:02 -c >> 1000 -p 512 >> sleep 4 >> done >> >> VM2 execute follow command to watch: >> watch -d ifconfig >> >> After many hours VM2 stop to receive data. 
>> >> Could you test it ? > > > We could try next week after I send the whole patch. > How many hours? Is it reproducible at your side? I inject packets through > packet generator to guest for more than ten hours, haven't met issues. About three hours. What kind of driver you used in guest?virtio-net-pmd or virtio-net? > As I said in another mail sent to you, could you dump the status of vring if > you still have the spot? How to dump the status of vring in guest? > Could you please also reply to that mail? > Which mail? > For the patch, if we have no root cause, I prefer not to apply it, so that we > don't send more interrupts than needed to guest to affect performance. I found that if we add this notify the performance is better(growth of 100kpps when use 64byte UDP packets) > People could temporarily apply this patch as a work around. > > Or anyone > OK.I'm also not sure about this bug.I think i should do something to found the real reason. > >> -- >> Regards, >> Haifeng > > > -- Regards, Haifeng
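For readers following the thread, the check in the proposed patch can be sketched on its own as below. This is only an illustration under assumptions: `notify_if_wanted` and `notify_fd` are hypothetical names, the flag is redefined locally with the value VRING_AVAIL_F_NO_INTERRUPT has in the virtio ring layout, and a plain Linux eventfd stands in for the descriptor shared with the guest.

```c
#include <stdint.h>
#include <sys/eventfd.h>

/* Same value as VRING_AVAIL_F_NO_INTERRUPT in the virtio ring layout. */
#define AVAIL_F_NO_INTERRUPT 1

/* Signal the guest only if it has not asked to suppress interrupts.
 * Returns 1 if a notification was written, 0 if it was suppressed,
 * -1 if the write failed. */
static int notify_if_wanted(uint16_t avail_flags, int notify_fd)
{
	if (avail_flags & AVAIL_F_NO_INTERRUPT)
		return 0;
	return eventfd_write(notify_fd, 1) == 0 ? 1 : -1;
}
```

A poll-mode guest driver would normally keep the flag set, so the extra write only affects interrupt-driven guests such as the kernel virtio-net driver mentioned in the thread.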
[dpdk-dev] [PATCH v2 0/6] new ntuple filter replaces 2tuple and 5tuple filters
> -Original Message- > From: De Lara Guarch, Pablo > Sent: Wednesday, January 28, 2015 10:29 PM > To: Wu, Jingjing; dev at dpdk.org > Cc: Cao, Min; Xu, HuilongX > Subject: RE: [PATCH v2 0/6] new ntuple filter replaces 2tuple and 5tuple > filters > > > > > -Original Message- > > From: Wu, Jingjing > > Sent: Thursday, January 22, 2015 7:38 AM > > To: dev at dpdk.org > > Cc: Wu, Jingjing; De Lara Guarch, Pablo; Cao, Min; Xu, HuilongX > > Subject: [PATCH v2 0/6] new ntuple filter replaces 2tuple and 5tuple > > filters > > > > v2 changes: > > - remove the code which is already applied in patch "Integrate ethertype > > filter in igb/ixgbe driver to new API". > > - modify commands' description in doc testpmd_funcs.rst. > > > > The patch set uses filter_ctrl API to replace old 2tuple and 5tuple filter > > APIs. > > It defines ntuple filter to combine 2tuple and 5tuple types. > > It uses new functions and structure to replace old ones in igb/ixgbe > > driver, new commands to replace old ones in testpmd, and removes the > old APIs. > > It removes the filter's index parameters from user interface, only the > > filter's key and assigned queue are visible to user. 
> > > > Jingjing Wu (6): > > ethdev: define ntuple filter type and its structure > > ixgbe: ntuple filter functions replace old ones for 5tuple filter > > e1000: ntuple filter functions replace old ones for 2tuple and 5tuple > > filter > > testpmd: new commands for ntuple filter > > ethdev: remove old APIs and structures of 5tuple and 2tuple filters > > doc: commands changed in testpmd_funcs for 2tuple amd 5tuple filter > > > > app/test-pmd/cmdline.c | 406 ++--- > > app/test-pmd/config.c | 65 --- > > doc/guides/testpmd_app_ug/testpmd_funcs.rst | 99 +--- > > lib/librte_ether/rte_eth_ctrl.h | 57 ++ > > lib/librte_ether/rte_ethdev.c | 116 > > lib/librte_ether/rte_ethdev.h | 192 -- > > lib/librte_pmd_e1000/e1000_ethdev.h | 69 ++- > > lib/librte_pmd_e1000/igb_ethdev.c | 869 +++--- > --- > > --- > > lib/librte_pmd_ixgbe/ixgbe_ethdev.c | 468 +++ > > lib/librte_pmd_ixgbe/ixgbe_ethdev.h | 52 +- > > 10 files changed, 1300 insertions(+), 1093 deletions(-) > > > > -- > > 1.9.3 > > Acked-by: Pablo de Lara > > Just mind that the last patch (changing the documentation) does not apply > properly, as there was another patch (from you I think), that modifies that > document. > Could you send another version of the last patch? > Not sure if that's OK or if it is better to send the full patchset again. > Thank you, Pablo. Yes. It's due to another patch of doc change is applied before this one. But I think many patches will update this the document. Let's wait to see whether there are more comments about this patch set at first.
[dpdk-dev] [PATCH v6 5/6] enicpmd: DPDK-ENIC PMD interface
Hi, ssujith > -Original Message- > From: Sujith Sankar (ssujith) [mailto:ssujith at cisco.com] > Sent: Tuesday, December 30, 2014 12:46 PM > To: Wu, Jingjing; dev at dpdk.org > Cc: Prasad Rao (prrao) > Subject: Re: [dpdk-dev] [PATCH v6 5/6] enicpmd: DPDK-ENIC PMD interface > > > > On 29/12/14 1:45 pm, "Wu, Jingjing" wrote: > > >Hi, ssujith > > > >> + .tx_queue_release = enicpmd_dev_tx_queue_release, > >> + .dev_led_on = NULL, > >> + .dev_led_off = NULL, > >> + .flow_ctrl_get= NULL, > >> + .flow_ctrl_set= NULL, > >> + .priority_flow_ctrl_set = NULL, > >> + .mac_addr_add = enicpmd_add_mac_addr, > >> + .mac_addr_remove = enicpmd_remove_mac_addr, > >> + .fdir_add_signature_filter= NULL, > >> + .fdir_update_signature_filter = NULL, > >> + .fdir_remove_signature_filter = NULL, > >> + .fdir_infos_get = enicpmd_fdir_info_get, > >> + .fdir_add_perfect_filter = enicpmd_fdir_add_perfect_filter, > >> + .fdir_update_perfect_filter = enicpmd_fdir_add_perfect_filter, > >> + .fdir_remove_perfect_filter = enicpmd_fdir_remove_perfect_filter, > >> + .fdir_set_masks = NULL, > >> +}; > >> + > > > >I found that in perfect fdir is also supported in enic driver. > > > >During the R1.8 development, we defined a new dev_ops call filter_ctrl, > >which can be used to control kinds of filters, flow director is > >included too. Which is mentioned in > >http://www.dpdk.org/ml/archives/dev/2014-September/005179.html . > >In R1.8, filter_ctrl is only used by i40e driver. And we also planned > >use it in the existing ixgbe/e1000 driver in the next days. The old > >APIs such as fdir_add_perfect_filter, fdir_remove_perfect_filter can be > >replaced then. > > > Hi Jingjing, > Thanks for the info and the link. I shall take a look at it. > It looks like bringing in one interface for all filter related operations. > I believe ENIC should also move to it. > > Thanks, > -Sujith > Just let you know that, I already sent the patch to migrate the flow director in ixgbe driver to new filter_ctrl API. 
http://www.dpdk.org/ml/archives/dev/2015-January/011830.html To avoid compile error and influence to enic driver. I didn't remove the old APIs and structures in rte_ethdev. I think they can be removed when the migration in enic driver is done. Do you have any plan for this? Thank u! Jingjing > > > >So, do you have any plan to migrate the fdir in enic to the filter_ctrl > >API? > > > >Jingjing > > > >Thanks! > > > >> +struct enic *enicpmd_list_head = NULL; > >> +/* Initialize the driver > >> + * It returns 0 on success. > >> + */ > >> +static int eth_enicpmd_dev_init( > >> + __attribute__((unused))struct eth_driver *eth_drv, > >> + struct rte_eth_dev *eth_dev) > >> +{ > >> + struct rte_pci_device *pdev; > >> + struct rte_pci_addr *addr; > >> + struct enic *enic = pmd_priv(eth_dev); > >> + > >> + ENICPMD_FUNC_TRACE(); > >> + > >> + enic->rte_dev = eth_dev; > >> + eth_dev->dev_ops = &enicpmd_eth_dev_ops; > >> + eth_dev->rx_pkt_burst = &enicpmd_recv_pkts; > >> + eth_dev->tx_pkt_burst = &enicpmd_xmit_pkts; > >> + > >> + pdev = eth_dev->pci_dev; > >> + enic->pdev = pdev; > >> + addr = &pdev->addr; > >> + > >> + snprintf(enic->bdf_name, ENICPMD_BDF_LENGTH, > >> "%04x:%02x:%02x.%x", > >> + addr->domain, addr->bus, addr->devid, addr->function); > >> + > >> + return enic_probe(enic); > >> +} > >> + > >> +static struct eth_driver rte_enic_pmd = { > >> + { > >> + .name = "rte_enic_pmd", > >> + .id_table = pci_id_enic_map, > >> + .drv_flags = RTE_PCI_DRV_NEED_MAPPING, > >> + }, > >> + .eth_dev_init = eth_enicpmd_dev_init, > >> + .dev_private_size = sizeof(struct enic), }; > >> + > >> +/* Driver initialization routine. > >> + * Invoked once at EAL init time. > >> + * Register as the [Poll Mode] Driver of Cisco ENIC device. 
> >> + */ > >> +int rte_enic_pmd_init(const char *name __rte_unused, > >> + const char *params __rte_unused) > >> +{ > >> + ENICPMD_FUNC_TRACE(); > >> + > >> + rte_eth_driver_register(&rte_enic_pmd); > >> + return 0; > >> +} > >> + > >> +static struct rte_driver rte_enic_driver = { > >> + .type = PMD_PDEV, > >> + .init = rte_enic_pmd_init, > >> +}; > >> + > >> +PMD_REGISTER_DRIVER(rte_enic_driver); > >> + > >> -- > >> 1.9.1 > >
[dpdk-dev] Regarding UDP checksum offload
Hi,

On 01/29/2015 01:56 PM, Prashant Upadhyaya wrote:
> Another thing you can do is to retry on the latest stable dpdk which
> is known to work (see csumonly.c in test-pmd).
>
> Let me add further, I am _just_ doing the UDP checksum offload and not
> the IP hdr checksum offload. I calculate and set IP header checksum by
> my own code. I hope that this is acceptable and does not interfere with
> UDP checksum offload
>
> This should not be a problem.
>
> Indeed it worked with DPDK1.7 and then I retried with DPDK1.6 and it
> worked there too.
> Must have been some mistake at my end, may be I did not clean properly
> when I was experimenting with some values of l2_len.
> Sorry for the botheration to the list.
>
> While we are at it, a quick question -- in case I have an mbuf chain
> whose payloads constitute a UDP packet, should I setup the ol_flags and
> the l2_len, l3_len fields only in the first mbuf header of the chain or
> in all the mbuf headers of the chain ?

Only the first mbuf is required. This is the case for all offload infos
like flags, tso, ... and it's the same in rx.

Regards,
Olivier
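To make the answer concrete, here is a minimal sketch of filling in offload metadata on the head of a chain only. It does not use the DPDK headers: `struct seg`, `TX_UDP_CKSUM_FLAG` and `request_udp_cksum` are hypothetical stand-ins, and the flag bit is a placeholder rather than the real DPDK value.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for an mbuf segment; not the DPDK struct. */
struct seg {
	struct seg *next;
	uint64_t ol_flags;
	uint16_t l2_len;
	uint16_t l3_len;
};

/* Placeholder bit for the example; not DPDK's flag value. */
#define TX_UDP_CKSUM_FLAG (1ULL << 0)

/* Request UDP checksum offload for a whole chain by filling in the
 * head segment only; later segments are left untouched. */
static void request_udp_cksum(struct seg *head, uint16_t l2_len,
			      uint16_t l3_len)
{
	head->ol_flags |= TX_UDP_CKSUM_FLAG;
	head->l2_len = l2_len;
	head->l3_len = l3_len;
}
```

With real mbufs the same idea applies to the fields named in the question (ol_flags, l2_len, l3_len) on the first mbuf of the chain.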
[dpdk-dev] [PATCH 1/3] librte_reorder: New reorder library
> From: Neil Horman [mailto:nhorman at tuxdriver.com] > Sent: Thursday, January 29, 2015 8:40 PM > To: Gonzalez Monroy, Sergio > Cc: Thomas Monjalon; Pattan, Reshma; dev at dpdk.org > Subject: Re: [dpdk-dev] [PATCH 1/3] librte_reorder: New reorder library > > On Thu, Jan 29, 2015 at 05:35:09PM +, Gonzalez Monroy, Sergio wrote: > > Hi Thomas, > > > > > From: dev [mailto:dev-bounces at dpdk.org] On Behalf Of Thomas > Monjalon > > > Sent: Tuesday, January 20, 2015 8:01 AM > > > To: Pattan, Reshma > > > Cc: dev at dpdk.org > > > Subject: Re: [dpdk-dev] [PATCH 1/3] librte_reorder: New reorder > > > library > > > > > > Hi, > > > > > > 2015-01-07 16:39, Reshma Pattan: > > > > 1)New library to provide reordering of out of ordered > > > > mbufs based on sequence number of mbuf. Library uses > > > > reorder > > > buffer structure > > > > which in tern uses two circular buffers called ready and > > > > order > buffers. > > > > *rte_reorder_create API creates instance of reorder buffer. > > > > *rte_reorder_init API initializes given reorder buffer > > > > instance. > > > > *rte_reorder_reset API resets given reorder buffer instance. > > > > *rte_reorder_insert API inserts the mbuf into order circular > buffer. > > > > *rte_reorder_fill_overflow moves mbufs from order > > > > buffer to ready > > > buffer > > > > to accomodate early packets in order buffer. > > > > *rte_reorder_drain API provides draining facility to fetch > > > > out > > > > reordered mbufs from order and ready buffers. > > > > > > > > Signed-off-by: Reshma Pattan > > > > Signed-off-by: Richardson Bruce > > > > > > > > > > I think 2 things are missing in this patchset: > > > > > > 1) Could you show some performance numbers to compare a simple > > > forwarding with and without this library, in the commit log? > > > > > I'm not allowed to provide specific performance numbers. > Can you elaborate on this? Why can you not provide specific performance > numbers from your testing? 
Is there some concern over the validity of the > measurements? Hi Neil, As far as I know, that is exactly the reason. Any Intel specific performance data goes through a performance validation team before being released. Thanks, Sergio
[dpdk-dev] [PATCH] examples/vhost: fix segfault when link_vmdq fails
> > Signed-off-by: Huawei Xie > > Acked-by: Changchun Ouyang Applied Thanks -- Thomas
[dpdk-dev] [PATCH] kni: optimizing the rte_kni_rx_burst
2014-11-26 22:20, Thomas Monjalon:
> Ping
>
> 2014-11-11 23:58, Thomas Monjalon:
> > Is there anyone interested in KNI to review this patch please?
> >
> >
> > 2014-07-23 12:15, Hemant Agrawal:
> > > The current implementation of rte_kni_rx_burst polls the fifo for buffers.
> > > Irrespective of success or failure, it allocates the mbufs and tries to put
> > > them into the alloc_q.
> > > If the buffers are not added to alloc_q, it frees them.
> > > This wastes lots of cpu cycles in allocating and freeing the buffers if
> > > alloc_q is full.
> > >
> > > The logic has been changed to:
> > > 1. Initially allocate and add buffers (burst size) to alloc_q
> > > 2. Add buffers to alloc_q only when you are pulling out the buffers.
> > >
> > > Signed-off-by: Hemant Agrawal

From http://dpdk.org/ml/archives/dev/2015-January/011771.html,
Jay said
"The patch looks good from a DPDK 1.6r2 viewpoint. We saw the same behavior in our app and ended up avoiding it higher in the stack (in our code)."

Reviewed-by: Jay Rolette

-- 
Thomas
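The two-step logic described in the commit message can be modelled with plain counters, as in the sketch below. This is only a toy model under assumptions, not the KNI code: `struct free_q`, `refill`, `peer_consume` and `on_rx_burst` are hypothetical names, and buffers are represented by counts rather than real mbufs.

```c
/* Hypothetical model of alloc_q: a bounded queue of free buffers. */
struct free_q {
	unsigned count;    /* buffers currently queued */
	unsigned capacity; /* maximum the queue can hold */
	unsigned allocs;   /* total allocations performed so far */
};

/* Allocate and queue up to n buffers, never more than fit, so that
 * nothing is allocated only to be freed again.
 * Returns the number actually added. */
static unsigned refill(struct free_q *q, unsigned n)
{
	unsigned room = q->capacity - q->count;

	if (n > room)
		n = room;
	q->allocs += n;
	q->count += n;
	return n;
}

/* The other side takes n buffers out of the queue to carry packets. */
static void peer_consume(struct free_q *q, unsigned n)
{
	if (n > q->count)
		n = q->count;
	q->count -= n;
}

/* Receive path: replenish only as many buffers as packets were just
 * pulled out; an empty poll allocates nothing. */
static unsigned on_rx_burst(struct free_q *q, unsigned got)
{
	return got ? refill(q, got) : 0;
}
```

In this model step 1 corresponds to one initial `refill(q, burst_size)` call, and step 2 to calling `on_rx_burst` with the number of packets each poll returned.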
[dpdk-dev] [PATCH 00/12] qemu vhost-user support
On 2015/01/30 15:36, Huawei Xie wrote: > vhost-user supports passing vring information to a seperate vhost enabled > process, normally a user space vSwitch, through unix domain socket. > > In previous DPDK version, we implement a user space character device driver > vhost-cuse in user space DPDK process. vring informations are passed to the > driver through ioctl call, including eventfds for interrupt injection and > host notification. We need to develop a kernel module to copy that fd from > qemu into our process. We also need some trick to map guest memory. > (TODO: kickfd/callfd is reversed which causes confusion) > > known issue in vhost-user implementation in QEMU, reported by haifeng.lin at > huawei.com > * QEMU doesn't send correct memory region information with multiple numa node > configuration > http://lists.gnu.org/archive/html/qemu-devel/2014-12/msg01454.html > > Thanks Tetsuya for reporting the issue that "FD_ISSET would crash when > receive -1 > as fd on Ubuntu 14.04". > > Huawei Xie (12): > enable VIRTIO_NET_F_CTRL_RX > create vhost_cuse directory and move vhost-net-cdev.c into vhost_cuse > rename vhost-net-cdev.h to vhost-net.h > move fd copying(from qemu process into vhost process) to eventfd_copy.c > copy host_memory_map from virtio-net.c to a new file virtio-net-cdev.c > make host_memory_map more generic > split set_memory_table into two parts > add select based event driven processing > free memory when receive new set_memory_table message > vhost user support > support dev->ifname in vhost-user > support calling rte_vhost_driver_register after > rte_vhost_driver_session_start > > lib/librte_vhost/Makefile | 8 +- > lib/librte_vhost/rte_virtio_net.h | 5 +- > lib/librte_vhost/vhost-net-cdev.c | 389 - > lib/librte_vhost/vhost-net-cdev.h | 113 --- > lib/librte_vhost/vhost-net.h | 121 +++ > lib/librte_vhost/vhost_cuse/eventfd_copy.c| 89 + > lib/librte_vhost/vhost_cuse/eventfd_copy.h| 40 +++ > lib/librte_vhost/vhost_cuse/vhost-net-cdev.c | 417 +++ > 
lib/librte_vhost/vhost_cuse/virtio-net-cdev.c | 401 ++ > lib/librte_vhost/vhost_cuse/virtio-net-cdev.h | 48 +++ > lib/librte_vhost/vhost_rxtx.c | 2 +- > lib/librte_vhost/vhost_user/fd_man.c | 234 + > lib/librte_vhost/vhost_user/fd_man.h | 66 > lib/librte_vhost/vhost_user/vhost-net-user.c | 469 > ++ > lib/librte_vhost/vhost_user/vhost-net-user.h | 106 ++ > lib/librte_vhost/vhost_user/virtio-net-user.c | 322 ++ > lib/librte_vhost/vhost_user/virtio-net-user.h | 49 +++ > lib/librte_vhost/virtio-net.c | 455 +++-- > lib/librte_vhost/virtio-net.h | 43 +++ > 19 files changed, 2460 insertions(+), 917 deletions(-) > delete mode 100644 lib/librte_vhost/vhost-net-cdev.c > delete mode 100644 lib/librte_vhost/vhost-net-cdev.h > create mode 100644 lib/librte_vhost/vhost-net.h > create mode 100644 lib/librte_vhost/vhost_cuse/eventfd_copy.c > create mode 100644 lib/librte_vhost/vhost_cuse/eventfd_copy.h > create mode 100644 lib/librte_vhost/vhost_cuse/vhost-net-cdev.c > create mode 100644 lib/librte_vhost/vhost_cuse/virtio-net-cdev.c > create mode 100644 lib/librte_vhost/vhost_cuse/virtio-net-cdev.h > create mode 100644 lib/librte_vhost/vhost_user/fd_man.c > create mode 100644 lib/librte_vhost/vhost_user/fd_man.h > create mode 100644 lib/librte_vhost/vhost_user/vhost-net-user.c > create mode 100644 lib/librte_vhost/vhost_user/vhost-net-user.h > create mode 100644 lib/librte_vhost/vhost_user/virtio-net-user.c > create mode 100644 lib/librte_vhost/vhost_user/virtio-net-user.h > create mode 100644 lib/librte_vhost/virtio-net.h >

Hi Xie,

It seems "checkpatch.pl" reports warnings in some patches. I guess some of them can be fixed.

Thanks,
Tetsuya
[dpdk-dev] [PATCH 01/12] lib/librte_vhost: enable VIRTIO_NET_F_CTRL_RX
On 2015/01/30 15:36, Huawei Xie wrote: > VIRTIO_NET_F_CTRL_RX is dependent on VIRTIO_NET_F_CTRL_VQ. > > Observed that virtio-net driver in guest would crash with only CTRL_RX > enabled. > > In virtnet_send_command: > > /* Caller should know better */ > BUG_ON(!virtio_has_feature(vi->vdev, VIRTIO_NET_F_CTRL_VQ) || > (out + in > VIRTNET_SEND_COMMAND_SG_MAX)); > > Signed-off-by: Huawei Xie > --- > lib/librte_vhost/virtio-net.c | 3 ++- > 1 file changed, 2 insertions(+), 1 deletion(-) > > diff --git a/lib/librte_vhost/virtio-net.c b/lib/librte_vhost/virtio-net.c > index b041849..52b4957 100644 > --- a/lib/librte_vhost/virtio-net.c > +++ b/lib/librte_vhost/virtio-net.c > @@ -73,7 +73,8 @@ static struct virtio_net_config_ll *ll_root; > > /* Features supported by this lib. */ > #define VHOST_SUPPORTED_FEATURES ((1ULL << VIRTIO_NET_F_MRG_RXBUF) | \ > - (1ULL << VIRTIO_NET_F_CTRL_RX)) > + (1ULL << VIRTIO_NET_F_CTRL_VQ) | \ > + (1ULL << VIRTIO_NET_F_CTRL_RX)) > static uint64_t VHOST_FEATURES = VHOST_SUPPORTED_FEATURES; > > /* Line size for reading maps file. */ Hi Xie, Could you please check the code below? - examples/vhost/main.c - case 'P': promiscuous = 1; vmdq_conf_default.rx_adv_conf.vmdq_rx_conf.rx_mode = ETH_VMDQ_ACCEPT_BROADCAST | ETH_VMDQ_ACCEPT_MULTICAST; rte_vhost_feature_enable(1ULL << VIRTIO_NET_F_CTRL_RX); VIRTIO_NET_F_CTRL_RX is always enabled by this patch. So if 'P' isn't specified in the vhost example, does it need to be disabled? Thanks, Tetsuya
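The dependency described in this patch (CTRL_RX must not be offered without CTRL_VQ) can be expressed as a small check on a feature mask. The following is only a sketch: features_are_consistent() is a hypothetical helper, not a librte_vhost function, and the bit positions are the ones given by the virtio specification.

```c
#include <stdint.h>

/* Feature bit positions as defined by the virtio specification. */
#define VIRTIO_NET_F_MRG_RXBUF 15
#define VIRTIO_NET_F_CTRL_VQ   17
#define VIRTIO_NET_F_CTRL_RX   18

/*
 * Hypothetical helper (not part of librte_vhost): returns 1 when the
 * feature mask is self-consistent, i.e. CTRL_RX is only offered
 * together with CTRL_VQ, and 0 otherwise.
 */
static int
features_are_consistent(uint64_t features)
{
	const uint64_t ctrl_vq = 1ULL << VIRTIO_NET_F_CTRL_VQ;
	const uint64_t ctrl_rx = 1ULL << VIRTIO_NET_F_CTRL_RX;

	if ((features & ctrl_rx) && !(features & ctrl_vq))
		return 0;
	return 1;
}
```

A mask with only CTRL_RX set is rejected, which corresponds to the configuration reported to crash the guest driver; the mask built by the patched VHOST_SUPPORTED_FEATURES passes.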
[dpdk-dev] [PATCH] MAINTAINERS: claim responsibility for Link Bonding PMD
Signed-off-by: Declan Doherty --- MAINTAINERS | 1 + 1 file changed, 1 insertion(+) diff --git a/MAINTAINERS b/MAINTAINERS index 5fccdbb..743fa49 100644 --- a/MAINTAINERS +++ b/MAINTAINERS @@ -161,6 +161,7 @@ Drivers --- Link bonding +M: Declan Doherty F: lib/librte_pmd_bond/ F: doc/guides/prog_guide/link_bonding_poll_mode_drv_lib.rst F: app/test/test_link_bonding.c -- 1.9.3
[dpdk-dev] [PATCH 05/12] lib/librte_vhost: copy host_memory_map from virtio-net.c to a new file virtio-net-cdev.c
On 2015/01/30 15:36, Huawei Xie wrote: > Signed-off-by: Huawei Xie > --- > lib/librte_vhost/vhost_cuse/virtio-net-cdev.c | 258 > ++ > 1 file changed, 258 insertions(+) > create mode 100644 lib/librte_vhost/vhost_cuse/virtio-net-cdev.c > > diff --git a/lib/librte_vhost/vhost_cuse/virtio-net-cdev.c > b/lib/librte_vhost/vhost_cuse/virtio-net-cdev.c > new file mode 100644 > index 000..fbfc403 > --- /dev/null > +++ b/lib/librte_vhost/vhost_cuse/virtio-net-cdev.c > @@ -0,0 +1,258 @@ > +/*- > + * BSD LICENSE > + * > + * Copyright(c) 2010-2014 Intel Corporation. All rights reserved. > + * All rights reserved. > + * > + * Redistribution and use in source and binary forms, with or without > + * modification, are permitted provided that the following conditions > + * are met: > + * > + * * Redistributions of source code must retain the above copyright > + * notice, this list of conditions and the following disclaimer. > + * * Redistributions in binary form must reproduce the above copyright > + * notice, this list of conditions and the following disclaimer in > + * the documentation and/or other materials provided with the > + * distribution. > + * * Neither the name of Intel Corporation nor the names of its > + * contributors may be used to endorse or promote products derived > + * from this software without specific prior written permission. > + * > + * THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS > + * "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT > + * LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR > + * A PARTICULAR PURPOSE ARE DISCLAIMED. 
IN NO EVENT SHALL THE COPYRIGHT > + * OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, > + * SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT > + * LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, > + * DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY > + * THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT > + * (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE > + * OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. > + */ > + > +#include > +#include > +#include > +#include > +#include > +#include > +#include > +#include > +#include > +#include > +#include > +#include > +#include > + > +#include > + > +#include "vhost-net.h" > + > +/* Line size for reading maps file. */ > +static const uint32_t BUFSIZE = PATH_MAX; > + > +/* Size of prot char array in procmap. */ > +#define PROT_SZ 5 > + > +/* Number of elements in procmap struct. */ > +#define PROCMAP_SZ 8 > + > +/* Structure containing information gathered from maps file. */ > +struct procmap { > + uint64_t va_start; /* Start virtual address in file. */ > + uint64_t len; /* Size of file. */ > + uint64_t pgoff; /* Not used. */ > + uint32_t maj; /* Not used. */ > + uint32_t min; /* Not used. */ > + uint32_t ino; /* Not used. */ > + char prot[PROT_SZ]; /* Not used. */ > + char fname[PATH_MAX]; /* File name. */ > +}; > + > +/* > + * Locate the file containing QEMU's memory space and > + * map it to our address space. > + */ > +static int > +host_memory_map(struct virtio_net *dev, struct virtio_memory *mem, > + pid_t pid, uint64_t addr) > +{ Hi Xie, This patch only copy host_memory_map() to a new file. And actually the original function is removed at below patch. - "[PATCH 07/12] lib/librte_vhost: split set_memory_table into two parts" Is it difficult to remove and copy the function in this patch? 
Thanks, Tetsuya > + struct dirent *dptr = NULL; > + struct procmap procmap; > + DIR *dp = NULL; > + int fd; > + int i; > + char memfile[PATH_MAX]; > + char mapfile[PATH_MAX]; > + char procdir[PATH_MAX]; > + char resolved_path[PATH_MAX]; > + char *path = NULL; > + FILE *fmap; > + void *map; > + uint8_t found = 0; > + char line[BUFSIZE]; > + char dlm[] = "- : "; > + char *str, *sp, *in[PROCMAP_SZ]; > + char *end = NULL; > + > + /* Path where mem files are located. */ > + snprintf(procdir, PATH_MAX, "/proc/%u/fd/", pid); > + /* Maps file used to locate mem file. */ > + snprintf(mapfile, PATH_MAX, "/proc/%u/maps", pid); > + > + fmap = fopen(mapfile, "r"); > + if (fmap == NULL) { > + RTE_LOG(ERR, VHOST_CONFIG, > + "(%"PRIu64") Failed to open maps file for pid %d\n", > + dev->device_fh, pid); > + return -1; > + } > + > + /* Read through maps file until we find out base_address. */ > + while (fgets(line, BUFSIZE, fmap) != 0) { > + str = line; > + errno = 0; > + /* Split line into fields. */ > + for (i = 0
[dpdk-dev] [PATCH] ixgbe: Fix an unnecessary check in vf rss
> -Original Message- > From: Ouyang, Changchun > Sent: Friday, January 30, 2015 2:20 AM > To: Wodkowski, PawelX; Thomas Monjalon; Richardson, Bruce > Cc: dev at dpdk.org; Ouyang, Changchun > Subject: RE: [dpdk-dev] [PATCH] ixgbe: Fix an unnecessary check in vf rss > > Hi PawelX > > > -Original Message- > > From: Wodkowski, PawelX > > Sent: Friday, January 30, 2015 12:14 AM > > To: Ouyang, Changchun; Thomas Monjalon; Richardson, Bruce > > Cc: dev at dpdk.org > > Subject: RE: [dpdk-dev] [PATCH] ixgbe: Fix an unnecessary check in vf rss > > > > > -Original Message- > > > From: dev [mailto:dev-bounces at dpdk.org] On Behalf Of Ouyang, > > Changchun > > > Sent: Wednesday, January 28, 2015 2:35 AM > > > To: Thomas Monjalon > > > Cc: dev at dpdk.org > > > Subject: Re: [dpdk-dev] [PATCH] ixgbe: Fix an unnecessary check in vf > > > rss > > > > > > Hi Thomas, > > > > > > > -Original Message- > > > > From: Thomas Monjalon [mailto:thomas.monjalon at 6wind.com] > > > > Sent: Tuesday, January 27, 2015 8:13 PM > > > > To: Ouyang, Changchun > > > > Cc: dev at dpdk.org > > > > Subject: Re: [dpdk-dev] [PATCH] ixgbe: Fix an unnecessary check in > > > > vf rss > > > > > > > > > To follow up the comments from Wodkowski, PawelX, remove this > > > > > unnecessary check, as check_mq_mode has already check the queue > > > > number > > > > > in device configure stage, if the queue number of vf is not > > > > > correct, it will return error code and exit, so it doesn't need > > > > > check again here in device start stage(note: pf_host_configure is > > > > > called in device start > > > > stage). > > > > > > > > > > This fixes commit 42d2f78abcb77ecb769be4149df550308169ef0f > > > > > > > > > > Signed-off-by: Changchun Ouyang > > > > > > > > Suggested-by: Pawel Wodkowski > > > > Fixes: 42d2f78abcb77 ("configure VF RSS") > > > > > > > > Applied > > > > > > > > > > Thanks very much for the applying! 
> > > > > > > Changchun, as you are working on ixgbe, maybe you would like to > > > > review some ixgbe patches from others? > > > > > > > > > > No problem, I will try to do it when my bandwidth allows me to do it, > > > :-) Thanks Changchun > > > > Actually I was suggesting exactly opposite direction. Main issue is that the > > sriov field in rte_eth_dev_data is only used by igb and ixgbe drivers. In > > addition > > rte_eth_dev_check_mq_mode() is specialized for ixgbe driver. > > > > I am thinking about moving sriov from rte_eth_dev_data to driver's private > > structure or at least move rte_eth_dev_check_mq_mode() to struct > > eth_dev_ops as optional driver configuration step. > > > > What do you think about both steps? > > Good opinion! > I prefer to move rte_eth_dev_check_mq_mode to eth_dev_ops as optional > driver configure, > The reason is that in future other eth type may also need such kind of check > or > even refine some queue number values by their own way, > I can help review your patch after you send out. > Thanks for your enhancing that. > What about moving sriov from rte_eth_dev_data to driver's private structure? Pawel
[dpdk-dev] [PATCH v3] test: fix missing NULL pointer checks
2015-01-27 13:06, Neil Horman: > On Tue, Jan 27, 2015 at 04:44:53PM +0100, Daniel Mrzyglod wrote: > > In test_sched, we are missing NULL pointer checks after create_mempool() > > and rte_pktmbuf_alloc(). Add in these checks using TEST_ASSERT_NOT_NULL > > macros. > > > > VERIFY macro was removed and replaced by standard test ASSERTS from > > "test.h" header. > > This provides additional information to track when the failure occurred. > > > > v3 changes: > > - remove VERIFY macro > > - fix spelling error. > > - change improper comment > > > > v2 changes: > > - Replace all VERIFY macros instances by proper TEST_ASSERT* macros. > > - fix description > > > > v1 changes: > > - first iteration of patch using VERIFY macro. > > > > Signed-off-by: Daniel Mrzyglod > > These TEST_ASSERT macros are no better than the VERIFY macro, they contain > exactly the same return issue that I outlined in my first post on the subject. Neil, you are suggesting to rework the assert macros of the unit tests. It should be another patch. Here, Daniel is improving the sched test with existing macros. I think it should be applied. -- Thomas
[dpdk-dev] [PATCH] Added missing extern 'C' decls in mode4 header files
Hi Pawel, > Signed-off-by: Pawel Wodkowski > --- > lib/librte_pmd_bond/rte_eth_bond_8023ad.h |8 > lib/librte_pmd_bond/rte_eth_bond_8023ad_private.h |8 Why adding extern C in a private header file? -- Thomas
[dpdk-dev] [PATCH 1/2] rte_ethdev: update link status (speed, duplex, link_up) after rte_eth_dev_start
Jia, any news on this patchset? 2014-11-12 03:57, Zhang, Helin: > Hi Jia > > > -Original Message- > > From: dev [mailto:dev-bounces at dpdk.org] On Behalf Of Jia Yu > > Sent: Saturday, November 8, 2014 1:32 AM > > To: dev at dpdk.org > > Subject: [dpdk-dev] [PATCH 1/2] rte_ethdev: update link status (speed, > > duplex, > > link_up) after rte_eth_dev_start > > > > Since LSR interrupt is disabled by pmd drivers, link status in > > rte_eth_device is > > always down. > If LSC interrupt is disabled by default, it will poll the link status during > the initialization > or in dev_start, and then the link status should be correct. If I am not > wrong. > > > Bond slave_configure() enables LSR interrupt on devices to get notification > > if link > > status changes. However, the LSC interrupt at device start time is still > > lost. > Before enabling interrupt for LSC, the link status should be polled. So after > the port > startup, the link status should be there. > > > > > In this fix, call link_update to read link status from hardware register at > > device > > start time. > Could you help explain this code change a bit more? Why do we need it? > > > > > Issue: > > Change-Id: Ib57a1c9114f922485c7b0f4338bfe7b3d3f87d65 > > Signed-off-by: Jia Yu > > --- > > lib/librte_ether/rte_ethdev.c | 4 > > 1 file changed, 4 insertions(+) > > > > diff --git a/lib/librte_ether/rte_ethdev.c b/lib/librte_ether/rte_ethdev.c > > index > > ff1c769..6c01b02 100644 > > --- a/lib/librte_ether/rte_ethdev.c > > +++ b/lib/librte_ether/rte_ethdev.c > > @@ -869,6 +869,10 @@ rte_eth_dev_start(uint8_t port_id) > > > > rte_eth_dev_config_restore(port_id); > > > > + if (dev->data->dev_conf.intr_conf.lsc != 0) { > > + FUNC_PTR_OR_ERR_RET(*dev->dev_ops->link_update, -ENOTSUP); > > + (*dev->dev_ops->link_update)(dev, 0); > > + } > > return 0; > > } > > > > -- > > 1.9.1 > > Regards, > Helin
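For readers following the discussion, the behaviour the quoted hunk adds can be sketched in isolation. Everything below is a simplified stand-in: struct sketch_dev, struct sketch_dev_ops and sketch_dev_start() are hypothetical names, not the ethdev API, and the NULL check only approximates what FUNC_PTR_OR_ERR_RET appears to do in the patch.

```c
#include <errno.h>
#include <stddef.h>

/* Hypothetical, simplified stand-ins for the ethdev structures. */
struct sketch_dev;

struct sketch_dev_ops {
	int (*link_update)(struct sketch_dev *dev, int wait_to_complete);
};

struct sketch_dev {
	const struct sketch_dev_ops *ops;
	int lsc_enabled; /* mirrors dev_conf.intr_conf.lsc */
	int link_up;     /* cached link status */
};

/*
 * Refresh the cached link status once at start time when LSC
 * interrupts are requested, so that a link which is already up is not
 * reported as down until the next interrupt arrives.
 */
static int
sketch_dev_start(struct sketch_dev *dev)
{
	if (dev->lsc_enabled != 0) {
		if (dev->ops == NULL || dev->ops->link_update == NULL)
			return -ENOTSUP;
		dev->ops->link_update(dev, 0);
	}
	return 0;
}

/* Example callback: pretend the hardware register reports "link up". */
static int
sketch_link_update(struct sketch_dev *dev, int wait_to_complete)
{
	(void)wait_to_complete;
	dev->link_up = 1;
	return 0;
}
```

Whether this extra poll is needed is exactly what Helin questions above; the sketch only shows the mechanism, not a verdict on the patch.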
[dpdk-dev] [PATCH] vhost: notify guest to fill buffer when there is no buffer
On 2015/1/30 16:20, Xu, Qian Q wrote: > Haifeng > Could you give more information so that we can reproduce your issue? Thanks. > 1. What's your dpdk package, based on which branch, with Huawei's > vhost-user's patches? Not with Huawei's patches.I implement a demo before Huawei's patches with OVDK's vhost_dequeue_burst and vhost_enqueue_burst. Now I'm trying to run vhost-user with dpdk vhost example(master branch). > 2. What's your step and command to launch vhost sample? BTW.How to run vhost example with vm2vm mode? Is VM2VM means i can send packet from vm1 to vm2? I setup with follow steps but can't send packet in VM: mount -t hugetlbfs nodev /mnt/huge -o pagesize=1G mount -t hugetlbfs nodev /dev/hugepages -o pagesize=2M echo 8192 > /sys/kernel/mm/hugepages/hugepages-2048kB/nr_hugepages modprobe uio insmod ${RTE_SDK}/x86_64-native-linuxapp-gcc/kmod/igb_uio.ko dpdk_nic_bind.py -b igb_uio 82:00.0 82:00.1 rmmod vhost_net modprobe cuse insmod ${RTE_SDK}/lib/librte_vhost/eventfd_link/eventfd_link.ko ${RTE_SDK}/examples/vhost/build/app/vhost-switch -c 0x300 -n 4 --huge-dir /mnt/huge -m 2048 -- -p 0x1 --vm2vm 1 qemu-wrap.py -enable-kvm -mem-path /mnt/huge/ -mem-prealloc -smp 2 \ -netdev tap,id=hostnet1,vhost=on,ifname=port0 -device virtio-net-pci,netdev=hostnet1,id=net1,mac=00:00:00:00:00:01 -hda /mnt/sdb/linhf/vm1.img -m 2048 -vnc :0 qemu-wrap.py -enable-kvm -mem-path /mnt/huge/ -mem-prealloc -smp 2 \ -netdev tap,id=hostnet1,vhost=on,ifname=port0 -device virtio-net-pci,netdev=hostnet1,id=net1,mac=00:00:00:00:00:02 -hda /mnt/sdb/linhf/vm2.img -m 2048 -vnc :1 > 3. What is mz? Your internal tool? I can't yum install mz or download mz > tool. http://www.perihel.at/sec/mz/ > 4. As to your test scenario, I understand it in this way: virtio1 in VM1, > virtio2 in VM2, then let virtio1 send packages to virtio2, the problem is > that after 3 hours, virtio2 can't receive packets, but virtio1 is still > sending packets, am I right? 
So mz is like a packet generator to send > packets, right? Yes,you are right. > > > -Original Message- > From: dev [mailto:dev-bounces at dpdk.org] On Behalf Of Linhaifeng > Sent: Thursday, January 29, 2015 9:51 PM > To: Xie, Huawei; dev at dpdk.org > Subject: Re: [dpdk-dev] [PATCH] vhost: notify guest to fill buffer when there > is no buffer > > > > On 2015/1/29 21:00, Xie, Huawei wrote: >> >> >>> -Original Message- >>> From: Linhaifeng [mailto:haifeng.lin at huawei.com] >>> Sent: Thursday, January 29, 2015 8:39 PM >>> To: Xie, Huawei; dev at dpdk.org >>> Subject: Re: [dpdk-dev] [PATCH] vhost: notify guest to fill buffer >>> when there is no buffer >>> >>> >>> >>> On 2015/1/29 18:39, Xie, Huawei wrote: >>> > - if (count == 0) > + /* If there is no buffers we should notify guest to fill. > + * This is need when guest use virtio_net driver(not pmd). > + */ > + if (count == 0) { > + if (!(vq->avail->flags & > VRING_AVAIL_F_NO_INTERRUPT)) > + eventfd_write((int)vq->kickfd, 1); > return 0; > + } Haifeng: Is it the root cause and is it protocol required? Could you give a detailed description for that scenario? >>> >>> I use mz to send data from one VM1 to VM2.The two VM use virtio-net driver. >>> VM1 execute follow script: >>> for((i=0;i<9;i++)); >>> do >>> mz eth0 -t udp -A 1.1.1.1 -B 1.1.1.2 -a 00:00:00:00:00:01 -b >>> 00:00:00:00:00:02 -c >>> 1000 -p 512 >>> sleep 4 >>> done >>> >>> VM2 execute follow command to watch: >>> watch -d ifconfig >>> >>> After many hours VM2 stop to receive data. >>> >>> Could you test it ? >> >> >> We could try next week after I send the whole patch. >> How many hours? Is it reproducible at your side? I inject packets through >> packet generator to guest for more than ten hours, haven't met issues. > > About three hours. > What kind of driver you used in guest?virtio-net-pmd or virtio-net? > > >> As I said in another mail sent to you, could you dump the status of vring >> if you still have the spot? 
> > How to dump the status of vring in guest? > >> Could you please also reply to that mail? >> > > Which mail? > > >> For the patch, if we have no root cause, I prefer not to apply it, so that >> we don't send more interrupts than needed to guest to affect performance. > > I found that if we add this notify the performance is better (growth of > 100kpps when using 64-byte UDP packets) > >> People could temporarily apply this patch as a work around. >> >> Or anyone >> > > OK. I'm also not sure about this bug. I think I should do something to find > the real reason. > >> >>> -- >>> Regards, >>> Haifeng >> >> >> > -- Regards, Haifeng
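The change under discussion in this thread (signal the guest when no available buffers are found, unless the guest has suppressed interrupts) can be sketched as a standalone function. notify_guest_if_empty() and its parameters are hypothetical names for illustration; only eventfd_write() from glibc and the VRING_AVAIL_F_NO_INTERRUPT flag value come from real interfaces.

```c
#include <stdint.h>
#include <sys/eventfd.h>

/* Flag value from the virtio ring definition: guest asks not to be interrupted. */
#define VRING_AVAIL_F_NO_INTERRUPT 1

/*
 * Hypothetical helper: when the backend finds no available buffers,
 * signal the guest through the eventfd unless the guest suppressed
 * interrupts. Returns 1 if a notification was written, 0 if none was
 * needed, and -1 if the write failed.
 */
static int
notify_guest_if_empty(uint32_t avail_count, uint16_t avail_flags, int fd)
{
	if (avail_count != 0)
		return 0;
	if (avail_flags & VRING_AVAIL_F_NO_INTERRUPT)
		return 0;
	return eventfd_write(fd, 1) == 0 ? 1 : -1;
}
```

As Huawei Xie notes, every such write is an extra interrupt for the guest, so without a confirmed root cause this is better read as a workaround than as required protocol behaviour.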
[dpdk-dev] [PATCH] examples: new txburst application
Hi Bhavesh, 2014-11-18 10:32, Bhavesh Davda: > Test application to transmit 32-packet bursts of 220-byte UDP packets every > 50 us, approximating 240,000 pps. We found it useful for testing hypervisor > performance for a transmit-heavy but bursty workload in a VM with DPDK. > > Signed-off-by: Bhavesh Davda There was no review of your patch. Maybe you should explain why you think it should be integrated as an example. What is new compared to other examples? Thanks -- Thomas
[dpdk-dev] [PATCH v2 4/4] lib/librte_eal: Optimized memcpy in arch/x86/rte_memcpy.h for both SSE and AVX platforms
Hey Zhihong, > -Original Message- > From: Wang, Zhihong > Sent: Friday, January 30, 2015 5:57 AM > To: Ananyev, Konstantin; dev at dpdk.org > Subject: RE: [dpdk-dev] [PATCH v2 4/4] lib/librte_eal: Optimized memcpy in > arch/x86/rte_memcpy.h for both SSE and AVX platforms > > Hey Konstantin, > > This method does reduce code size but lead to significant performance drop. > I think we need to keep the original code. Sure, no point to make it slower. Thanks for trying it anyway. Konstantin > > > Thanks > Zhihong (John) > > > > -Original Message- > > From: Ananyev, Konstantin > > Sent: Thursday, January 29, 2015 11:18 PM > > To: Wang, Zhihong; dev at dpdk.org > > Subject: RE: [dpdk-dev] [PATCH v2 4/4] lib/librte_eal: Optimized memcpy in > > arch/x86/rte_memcpy.h for both SSE and AVX platforms > > > > Hi Zhihong, > > > > > -Original Message- > > > From: dev [mailto:dev-bounces at dpdk.org] On Behalf Of Zhihong Wang > > > Sent: Thursday, January 29, 2015 2:39 AM > > > To: dev at dpdk.org > > > Subject: [dpdk-dev] [PATCH v2 4/4] lib/librte_eal: Optimized memcpy in > > > arch/x86/rte_memcpy.h for both SSE and AVX platforms > > > > > > Main code changes: > > > > > > 1. Differentiate architectural features based on CPU flags > > > > > > a. Implement separated move functions for SSE/AVX/AVX2 to make > > > full utilization of cache bandwidth > > > > > > b. Implement separated copy flow specifically optimized for target > > > architecture > > > > > > 2. Rewrite the memcpy function "rte_memcpy" > > > > > > a. Add store aligning > > > > > > b. Add load aligning based on architectural features > > > > > > c. Put block copy loop into inline move functions for better > > > control of instruction order > > > > > > d. Eliminate unnecessary MOVs > > > > > > 3. Rewrite the inline move functions > > > > > > a. Add move functions for unaligned load cases > > > > > > b. Change instruction order in copy loops for better pipeline > > > utilization > > > > > > c. 
Use intrinsics instead of assembly code > > > > > > 4. Remove slow glibc call for constant copies > > > > > > Signed-off-by: Zhihong Wang > > > --- > > > .../common/include/arch/x86/rte_memcpy.h | 680 > > +++-- > > > 1 file changed, 509 insertions(+), 171 deletions(-) > > > > > > diff --git a/lib/librte_eal/common/include/arch/x86/rte_memcpy.h > > > b/lib/librte_eal/common/include/arch/x86/rte_memcpy.h > > > index fb9eba8..7b2d382 100644 > > > --- a/lib/librte_eal/common/include/arch/x86/rte_memcpy.h > > > +++ b/lib/librte_eal/common/include/arch/x86/rte_memcpy.h > > > @@ -34,166 +34,189 @@ > > > #ifndef _RTE_MEMCPY_X86_64_H_ > > > #define _RTE_MEMCPY_X86_64_H_ > > > > > > +/** > > > + * @file > > > + * > > > + * Functions for SSE/AVX/AVX2 implementation of memcpy(). > > > + */ > > > + > > > +#include > > > #include > > > #include > > > -#include > > > +#include > > > > > > #ifdef __cplusplus > > > extern "C" { > > > #endif > > > > > > -#include "generic/rte_memcpy.h" > > > +/** > > > + * Copy bytes from one location to another. The locations must not > > overlap. > > > + * > > > + * @note This is implemented as a macro, so it's address should not > > > +be taken > > > + * and care is needed as parameter expressions may be evaluated > > multiple times. > > > + * > > > + * @param dst > > > + * Pointer to the destination of the data. > > > + * @param src > > > + * Pointer to the source data. > > > + * @param n > > > + * Number of bytes to copy. > > > + * @return > > > + * Pointer to the destination data. > > > + */ > > > +static inline void * > > > +rte_memcpy(void *dst, const void *src, size_t n) > > > +__attribute__((always_inline)); > > > > > > -#ifdef __INTEL_COMPILER > > > -#pragma warning(disable:593) /* Stop unused variable warning (reg_a > > > etc). 
*/ -#endif > > > +#ifdef RTE_MACHINE_CPUFLAG_AVX2 > > > > > > +/** > > > + * AVX2 implementation below > > > + */ > > > + > > > +/** > > > + * Copy 16 bytes from one location to another, > > > + * locations should not overlap. > > > + */ > > > static inline void > > > rte_mov16(uint8_t *dst, const uint8_t *src) { > > > - __m128i reg_a; > > > - asm volatile ( > > > - "movdqu (%[src]), %[reg_a]\n\t" > > > - "movdqu %[reg_a], (%[dst])\n\t" > > > - : [reg_a] "=x" (reg_a) > > > - : [src] "r" (src), > > > - [dst] "r"(dst) > > > - : "memory" > > > - ); > > > + __m128i xmm0; > > > + > > > + xmm0 = _mm_loadu_si128((const __m128i *)src); > > > + _mm_storeu_si128((__m128i *)dst, xmm0); > > > } > > > > > > +/** > > > + * Copy 32 bytes from one location to another, > > > + * locations should not overlap. > > > + */ > > > static inline void > > > rte_mov32(uint8_t *dst, const uint8_t *src) { > > > - __m128i reg_a, reg_b; > > > - asm volatile ( > > > - "movdqu (%[src]), %[reg_a]\n\t" > > > - "movdqu 16(%[src]), %[reg_b]\n\t" > > > - "movdqu %[
[dpdk-dev] [PATCH] mk: allow application to override clean
2015-01-29 22:36, Stephen Hemminger: > On Thu, 29 Jan 2015 02:52:45 -0800 > Thomas Monjalon wrote: > > > Hi Stephen, > > > > 2015-01-28 12:00, Olivier MATZ: > > > Hi Stephen, > > > > > > On 01/23/2015 07:19 AM, stephen at networkplumber.org wrote: > > > > From: Stephen Hemminger > > > > > > > > In some cases application may want to have additional rules > > > > for clean. This can be handled by allowing the double colon > > > > form of rule. > > > > > > > > > > > > https://www.gnu.org/software/make/manual/html_node/Double_002dColon.html > > > > > > There is already a way to do that in dpdk makefiles: you can add > > > the following code in your application Makefile, before the line > > > that includes $(RTE_SDK)/mk/rte.app.mk: > > > > > > POSTCLEAN += my_clean > > > > > > .PHONY: my_clean > > > my_clean: > > > @echo executed after clean > > > > Does it fit with your needs? > > Should we revert your patch? Double-colon rules were avoided in DPDK. > > "Double-colon rules are somewhat obscure and not often very useful" > > > > Works for me. So the commit is now reverted: http://dpdk.org/browse/dpdk/commit/?id=785e1a0932b67136 -- Thomas
[dpdk-dev] [PATCH 0/2] new headroom stats library and example application
> -Original Message- > From: Neil Horman [mailto:nhorman at tuxdriver.com] > Sent: Thursday, January 29, 2015 8:13 PM > To: Wodkowski, PawelX > Cc: dev at dpdk.org > Subject: Re: [dpdk-dev] [PATCH 0/2] new headroom stats library and example > application > > On Thu, Jan 29, 2015 at 05:10:36PM +, Wodkowski, PawelX wrote: > > > -Original Message- > > > From: Neil Horman [mailto:nhorman at tuxdriver.com] > > > Sent: Thursday, January 29, 2015 2:25 PM > > > To: Wodkowski, PawelX > > > Cc: dev at dpdk.org > > > Subject: Re: [dpdk-dev] [PATCH 0/2] new headroom stats library and > example > > > application > > > > > > On Thu, Jan 29, 2015 at 12:50:04PM +0100, Pawel Wodkowski wrote: > > > > Hi community, > > > > I would like to introduce library for measuring load of some arbitrary > > > > jobs. > It > > > > can be used to profile every kind of job sets on any arbitrary > > > > execution unit. > > > > In provided l2fwd-headroom example I demonstrate how to use this library > to > > > > profile packet forwarding (job set is froward, flush and stats) on > > > > LCores > > > > (execution unit). This example does no limit possible schemes on which > > > > this > > > > library can be used. 
> > > > > > > > Pawel Wodkowski (2): > > > > librte_headroom: New library for checking core/system/app load > > > > examples: introduce new l2fwd-headroom example > > > > > > > > config/common_bsdapp |6 + > > > > config/common_linuxapp |6 + > > > > examples/Makefile |1 + > > > > examples/l2fwd-headroom/Makefile | 51 +++ > > > > examples/l2fwd-headroom/main.c | 875 > > > > > > > lib/Makefile |1 + > > > > lib/librte_headroom/Makefile | 50 +++ > > > > lib/librte_headroom/rte_headroom.c | 368 +++ > > > > lib/librte_headroom/rte_headroom.h | 481 > > > > mk/rte.app.mk |4 + > > > > 10 files changed, 1843 insertions(+) > > > > create mode 100644 examples/l2fwd-headroom/Makefile > > > > create mode 100644 examples/l2fwd-headroom/main.c > > > > create mode 100644 lib/librte_headroom/Makefile > > > > create mode 100644 lib/librte_headroom/rte_headroom.c > > > > create mode 100644 lib/librte_headroom/rte_headroom.h > > > > > > > > -- > > > > 1.7.9.5 > > > > > > > > > > > > > > Whats the advantage of this library over the other tools to preform the > > > same > > > function. > > > > Hi Neil, > > > > Good point, what is advantage over perf. Answer is: this library does not > > supposed to be a perf competition and is not for profiling app in the way > > perf > does. > > It is an small and fast extension. It's main task is to manage job list to > > invoke > > them exactly when needed and provide some basic stats about application idle > > time (whatever programmer will consider the idle) and busy time. > > > > For example: > > application might decide to add remove some jobs to/from LCore(s) > dynamically > > basing on current idle time (ex: move job from one core to another). > > > > Also application might have some information's about traffic type it handles > > and provide own algorithm to calculate invocation time (it can also > dynamically > > switch between those algorithms only replacing handlers). 
> > > > > Perf can provide all the information in this library, and do so > > > without having to directly modify the source for the execution unit under > test > > > > Yes, perf can provide that information but it can't handle the case when > > you are polling for packets too fast or too slow and waste time getting only > couple > > of them. Library will adjust time when it execute job basing on value this > > job > > returned previously. Code modifications are not so deep, as you can see > comparing > > l2fwd vs l2fwd-headroom app. > > > > For example in application I introduced, when forward job return less than > > MAX_PKT_BURST execution period will be increased. If it return more it will > decrease > > execution period. Stats provided for that can be used to determine if > application is > > behaving correctly and if there is a time for handling another port (what > > did for > tests). > > > You're still re-inventing the wheel here, and I don't see any advantage to > doing > so. If the goal of the library is to profile the run time of a task, then you > have perf and systemtap for such purposes. If the goal is to create a job > scheduler that allows you to track multiple parallel tasks, and adjust their > execution, there are several pre-existing libraries that any application > programmer can already leverage to do just that (Berkeley UPC or libtask to > name > just two examples). Truthfully, on a dedicated cpu, you could just as easily > create multiple child processes running at SCHED_RR and set their priorities > accordingly. > > I don't see why we need another library to do what severa
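The period adjustment Pawel describes in the quoted text (lengthen the execution period when the forward job returns fewer than MAX_PKT_BURST packets, shorten it otherwise) might look roughly like the sketch below. adjust_period() is a hypothetical function, not the librte_headroom API; the bounds, the step and the value 32 for MAX_PKT_BURST are assumptions chosen for illustration.

```c
#include <stdint.h>

/* Assumed burst size; the real value comes from the application. */
#define MAX_PKT_BURST 32

/* Hypothetical bounds and step for the polling period, in microseconds. */
#define PERIOD_MIN_US  10
#define PERIOD_MAX_US  1000
#define PERIOD_STEP_US 10

/*
 * Return the next execution period given the current one and the
 * number of packets the last forward job returned.
 */
static uint32_t
adjust_period(uint32_t period_us, uint32_t pkts_returned)
{
	if (pkts_returned < MAX_PKT_BURST) {
		/* Burst not full: the job ran too early, wait longer next time. */
		period_us += PERIOD_STEP_US;
		if (period_us > PERIOD_MAX_US)
			period_us = PERIOD_MAX_US;
	} else {
		/* Full burst: packets are queuing up, run the job sooner. */
		if (period_us > PERIOD_MIN_US + PERIOD_STEP_US)
			period_us -= PERIOD_STEP_US;
		else
			period_us = PERIOD_MIN_US;
	}
	return period_us;
}
```

A fixed additive step is the simplest possible policy; the message says applications may supply their own algorithm, so this is only one option.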
[dpdk-dev] [PATCH] Added missing extern 'C' decls in mode4 header files
> -Original Message- > From: Thomas Monjalon [mailto:thomas.monjalon at 6wind.com] > Sent: Friday, January 30, 2015 11:21 AM > To: Wodkowski, PawelX > Cc: dev at dpdk.org > Subject: Re: [dpdk-dev] [PATCH] Added missing extern 'C' decls in mode4 header > files > > Hi Pawel, > > > Signed-off-by: Pawel Wodkowski > > --- > > lib/librte_pmd_bond/rte_eth_bond_8023ad.h |8 > > lib/librte_pmd_bond/rte_eth_bond_8023ad_private.h |8 > > Why adding extern C in a private header file? > > -- > Thomas To be consistent with rte_eth_bond_private.h where it is included. -- Pawel
[dpdk-dev] [PATCH 0/4] Link Bonding mode 6 support (ALB)
This patchset adds support for link bonding mode 6. Additionally it changes the arp_hdr structure definition. A basic example is also introduced. In this example, bonding configures each client's ARP table so that packets from each client are received on a different slave; mode 6 uses a round-robin policy to assign a slave to each client IP address. Michal Jastrzebski (4): net: changed arp_hdr struct declaration. bond: added link bonding mode 6 implementation. bond: add debug info for mode 6 link bonding bond: added example application for link bonding mode 6. app/test-pmd/icmpecho.c| 27 +- config/common_linuxapp |2 +- examples/bond/Makefile | 57 ++ examples/bond/main.c | 790 examples/bond/main.h | 46 ++ lib/librte_net/rte_arp.h | 13 +- lib/librte_pmd_bond/Makefile |1 + lib/librte_pmd_bond/rte_eth_bond.h |9 + lib/librte_pmd_bond/rte_eth_bond_alb.c | 251 + lib/librte_pmd_bond/rte_eth_bond_alb.h | 109 lib/librte_pmd_bond/rte_eth_bond_api.c |6 + lib/librte_pmd_bond/rte_eth_bond_args.c|1 + lib/librte_pmd_bond/rte_eth_bond_pmd.c | 355 - lib/librte_pmd_bond/rte_eth_bond_private.h |2 + 14 files changed, 1623 insertions(+), 46 deletions(-) create mode 100644 examples/bond/Makefile create mode 100644 examples/bond/main.c create mode 100644 examples/bond/main.h create mode 100644 lib/librte_pmd_bond/rte_eth_bond_alb.c create mode 100644 lib/librte_pmd_bond/rte_eth_bond_alb.h -- 1.7.9.5
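The round-robin assignment of slaves to client IP addresses mentioned in the cover letter can be illustrated with a small table. This is a sketch under stated assumptions: struct sketch_alb, struct sketch_client and sketch_alb_slave_for() are hypothetical names, and the actual data structures in rte_eth_bond_alb.c are not shown in this message and may differ.

```c
#include <stdint.h>

#define SKETCH_MAX_CLIENTS 16

/* Hypothetical client table: one entry per peer IP address seen so far. */
struct sketch_client {
	uint32_t ip;       /* peer IPv4 address */
	uint8_t slave_idx; /* slave chosen for this peer */
};

struct sketch_alb {
	struct sketch_client clients[SKETCH_MAX_CLIENTS];
	uint32_t nb_clients;
	uint8_t nb_slaves;
	uint8_t next_slave; /* round-robin cursor */
};

/*
 * Return the slave index serving client_ip, assigning the next slave in
 * round-robin order the first time the address is seen.
 * Returns -1 if there are no slaves or the table is full.
 */
static int
sketch_alb_slave_for(struct sketch_alb *alb, uint32_t client_ip)
{
	uint32_t i;
	int chosen;

	if (alb->nb_slaves == 0)
		return -1;
	for (i = 0; i < alb->nb_clients; i++)
		if (alb->clients[i].ip == client_ip)
			return alb->clients[i].slave_idx;
	if (alb->nb_clients == SKETCH_MAX_CLIENTS)
		return -1;

	chosen = alb->next_slave;
	alb->clients[alb->nb_clients].ip = client_ip;
	alb->clients[alb->nb_clients].slave_idx = (uint8_t)chosen;
	alb->nb_clients++;
	alb->next_slave = (uint8_t)((alb->next_slave + 1) % alb->nb_slaves);
	return chosen;
}
```

With two slaves, three new clients would be assigned slaves 0, 1 and 0 in turn, and a client seen before keeps its original slave.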
[dpdk-dev] [PATCH 1/4] net: changed arp_hdr struct declaration.
Changed MAC address type from uint8_t[6] to struct ether_addr and IP address type from uint8_t[4] to uint32_t. Also removed union from arp_hdr struct. Updated test-pmd to match new arp_hdr version. Signed-off-by: Maciej Gajdzica --- app/test-pmd/icmpecho.c | 27 ++- lib/librte_net/rte_arp.h | 13 ++--- 2 files changed, 16 insertions(+), 24 deletions(-) diff --git a/app/test-pmd/icmpecho.c b/app/test-pmd/icmpecho.c index 08ea01d..010c5a9 100644 --- a/app/test-pmd/icmpecho.c +++ b/app/test-pmd/icmpecho.c @@ -371,18 +371,14 @@ reply_to_icmp_echo_rqsts(struct fwd_stream *fs) continue; } if (verbose_level > 0) { - memcpy(&eth_addr, - arp_h->arp_data.arp_ip.arp_sha, 6); + ether_addr_copy(&arp_h->arp_data.arp_sha, &eth_addr); ether_addr_dump("sha=", &eth_addr); - memcpy(&ip_addr, - arp_h->arp_data.arp_ip.arp_sip, 4); + ip_addr = arp_h->arp_data.arp_sip; ipv4_addr_dump(" sip=", ip_addr); printf("\n"); - memcpy(&eth_addr, - arp_h->arp_data.arp_ip.arp_tha, 6); + ether_addr_copy(&arp_h->arp_data.arp_tha, &eth_addr); ether_addr_dump("tha=", &eth_addr); - memcpy(&ip_addr, - arp_h->arp_data.arp_ip.arp_tip, 4); + ip_addr = arp_h->arp_data.arp_tip; ipv4_addr_dump(" tip=", ip_addr); printf("\n"); } @@ -402,17 +398,14 @@ reply_to_icmp_echo_rqsts(struct fwd_stream *fs) &eth_h->s_addr); arp_h->arp_op = rte_cpu_to_be_16(ARP_OP_REPLY); - memcpy(&eth_addr, arp_h->arp_data.arp_ip.arp_tha, 6); - memcpy(arp_h->arp_data.arp_ip.arp_tha, - arp_h->arp_data.arp_ip.arp_sha, 6); - memcpy(arp_h->arp_data.arp_ip.arp_sha, - &eth_h->s_addr, 6); + ether_addr_copy(&arp_h->arp_data.arp_tha, &eth_addr); + ether_addr_copy(&arp_h->arp_data.arp_sha, &arp_h->arp_data.arp_tha); + ether_addr_copy(&eth_addr, &arp_h->arp_data.arp_sha); /* Swap IP addresses in ARP payload */ - memcpy(&ip_addr, arp_h->arp_data.arp_ip.arp_sip, 4); - memcpy(arp_h->arp_data.arp_ip.arp_sip, - arp_h->arp_data.arp_ip.arp_tip, 4); - memcpy(arp_h->arp_data.arp_ip.arp_tip, &ip_addr, 4); + ip_addr = arp_h->arp_data.arp_sip; + arp_h->arp_data.arp_sip = arp_h->arp_data.arp_tip; 
+ arp_h->arp_data.arp_tip = ip_addr; pkts_burst[nb_replies++] = pkt; continue; } diff --git a/lib/librte_net/rte_arp.h b/lib/librte_net/rte_arp.h index c7b0e51..72108a1 100644 --- a/lib/librte_net/rte_arp.h +++ b/lib/librte_net/rte_arp.h @@ -39,6 +39,7 @@ */ #include +#include #ifdef __cplusplus extern "C" { @@ -48,10 +49,10 @@ extern "C" { * ARP header IPv4 payload. */ struct arp_ipv4 { - uint8_t arp_sha[6]; /* sender hardware address */ - uint8_t arp_sip[4]; /* sender IP address */ - uint8_t arp_tha[6]; /* target hardware address */ - uint8_t arp_tip[4]; /* target IP address */ + struct ether_addr arp_sha; /* sender hardware address */ + uint32_t arp_sip; /* sender IP address */ + struct ether_addr arp_tha; /* target hardware address */ + uint32_t arp_tip; /* target IP address */ } __attribute__((__packed__)); /** @@ -72,9 +73,7 @@ struct arp_hdr { #defineARP_OP_INVREQUEST 8 /* request to identify peer */ #defineARP_OP_INVREPLY 9 /* response identifying peer */ - union { - struct arp_ipv4 arp_ip; - } arp_data; + struct arp_ipv4 arp_data; } __attribute__((__packed__)); #ifdef __cplusplus -- 1.7.9.5
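As a rough illustration of what the type change buys, here is a minimal standalone sketch. The names `mac_addr`, `arp_ipv4_sketch` and `arp_swap_endpoints` are hypothetical stand-ins and not the DPDK definitions; they only mirror the packed layout shown in the patch, and the packed attribute assumes GCC or Clang.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for struct ether_addr: six raw bytes. */
struct mac_addr {
	uint8_t addr_bytes[6];
} __attribute__((__packed__));

/* Hypothetical mirror of the patched arp_ipv4 layout. */
struct arp_ipv4_sketch {
	struct mac_addr arp_sha; /* sender hardware address */
	uint32_t arp_sip;        /* sender IP address (network order) */
	struct mac_addr arp_tha; /* target hardware address */
	uint32_t arp_tip;        /* target IP address (network order) */
} __attribute__((__packed__));

/* The wire format must stay 6 + 4 + 6 + 4 = 20 bytes. */
_Static_assert(sizeof(struct arp_ipv4_sketch) == 20, "ARP IPv4 payload size");

/*
 * Turn a request payload into a reply payload: the old sender becomes
 * the target and our_mac becomes the new sender hardware address.
 */
static void
arp_swap_endpoints(struct arp_ipv4_sketch *p, const struct mac_addr *our_mac)
{
	uint32_t ip;

	assert(p != NULL && our_mac != NULL);
	p->arp_tha = p->arp_sha; /* struct assignment replaces memcpy(..., 6) */
	p->arp_sha = *our_mac;
	ip = p->arp_sip;         /* integer swap replaces memcpy(..., 4) */
	p->arp_sip = p->arp_tip;
	p->arp_tip = ip;
}
```

With typed fields the compiler checks the operand sizes, so the hand-written 6 and 4 length arguments disappear.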
[dpdk-dev] [PATCH 2/4] bond: added link bonding mode 6 implementation.
This mode includes adaptive TLB and receive load balancing (RLB). In RLB
the bonding driver intercepts ARP replies sent by the local system and
overwrites their source MAC address, so that different peers send data to
the server on different slave interfaces. When the local system sends an
ARP request, the driver saves the IP information from it. When the ARP
reply from that peer is received, its MAC is stored, one of the slave MACs
is assigned, and an ARP reply is sent to that peer.

Signed-off-by: Maciej Gajdzica
---
 lib/librte_pmd_bond/Makefile               |   1 +
 lib/librte_pmd_bond/rte_eth_bond.h         |   9 +
 lib/librte_pmd_bond/rte_eth_bond_alb.c     | 251
 lib/librte_pmd_bond/rte_eth_bond_alb.h     | 109
 lib/librte_pmd_bond/rte_eth_bond_api.c     |   6 +
 lib/librte_pmd_bond/rte_eth_bond_args.c    |   1 +
 lib/librte_pmd_bond/rte_eth_bond_pmd.c     | 231 ++---
 lib/librte_pmd_bond/rte_eth_bond_private.h |   2 +
 8 files changed, 589 insertions(+), 21 deletions(-)
 create mode 100644 lib/librte_pmd_bond/rte_eth_bond_alb.c
 create mode 100644 lib/librte_pmd_bond/rte_eth_bond_alb.h

diff --git a/lib/librte_pmd_bond/Makefile b/lib/librte_pmd_bond/Makefile
index cdff126..d111f0c 100644
--- a/lib/librte_pmd_bond/Makefile
+++ b/lib/librte_pmd_bond/Makefile
@@ -46,6 +46,7 @@ SRCS-$(CONFIG_RTE_LIBRTE_PMD_BOND) += rte_eth_bond_api.c
 SRCS-$(CONFIG_RTE_LIBRTE_PMD_BOND) += rte_eth_bond_pmd.c
 SRCS-$(CONFIG_RTE_LIBRTE_PMD_BOND) += rte_eth_bond_args.c
 SRCS-$(CONFIG_RTE_LIBRTE_PMD_BOND) += rte_eth_bond_8023ad.c
+SRCS-$(CONFIG_RTE_LIBRTE_PMD_BOND) += rte_eth_bond_alb.c

 ifeq ($(CONFIG_RTE_MBUF_REFCNT),n)
 $(info WARNING: Link Bonding Broadcast mode is disabled because it needs MBUF_REFCNT.)
diff --git a/lib/librte_pmd_bond/rte_eth_bond.h b/lib/librte_pmd_bond/rte_eth_bond.h
index 7177983..13581cb 100644
--- a/lib/librte_pmd_bond/rte_eth_bond.h
+++ b/lib/librte_pmd_bond/rte_eth_bond.h
@@ -101,6 +101,15 @@ extern "C" {
  * This mode provides an adaptive transmit load balancing. It dynamically
  * changes the transmitting slave, according to the computed load. Statistics
  * are collected in 100ms intervals and scheduled every 10ms */
+#define BONDING_MODE_ALB (6)
+/**< Adaptive Load Balancing (Mode 6)
+ * This mode includes adaptive TLB and receive load balancing (RLB). In RLB the
+ * bonding driver intercepts ARP replies send by local system and overwrites its
+ * source MAC address, so that different peers send data to the server on
+ * different slave interfaces. When local system sends ARP request, it saves IP
+ * information from it. When ARP reply from that peer is received, its MAC is
+ * stored, one of slave MACs assigned and ARP reply send to that peer.
+ */

 /* Balance Mode Transmit Policies */
 #define BALANCE_XMIT_POLICY_LAYER2 (0)
diff --git a/lib/librte_pmd_bond/rte_eth_bond_alb.c b/lib/librte_pmd_bond/rte_eth_bond_alb.c
new file mode 100644
index 0000000..449b2f8
--- /dev/null
+++ b/lib/librte_pmd_bond/rte_eth_bond_alb.c
@@ -0,0 +1,251 @@
+#include "rte_eth_bond_private.h"
+#include "rte_eth_bond_alb.h"
+
+static inline uint8_t
+simple_hash(uint8_t *hash_start, int hash_size)
+{
+	int i;
+	uint8_t hash;
+
+	hash = 0;
+	for (i = 0; i < hash_size; ++i)
+		hash ^= hash_start[i];
+
+	return hash;
+}
+
+static uint8_t
+calculate_slave(struct bond_dev_private *internals)
+{
+	uint8_t idx;
+
+	idx = (internals->mode6.last_slave + 1)%internals->active_slave_count;
+	return internals->active_slaves[idx];
+}
+
+int
+bond_mode_alb_enable(struct rte_eth_dev *bond_dev)
+{
+	struct bond_dev_private *internals = bond_dev->data->dev_private;
+	struct client_data *hash_table = internals->mode6.client_table;
+
+	uint16_t element_size;
+	char mem_name[RTE_ETH_NAME_MAX_LEN];
+	int socket_id = bond_dev->pci_dev->numa_node;
+
+	/* Fill hash table with initial values */
+	memset(hash_table, 0, sizeof(struct client_data) * ALB_HASH_TABLE_SIZE);
+
+	internals->mode6.last_slave = ALB_NULL_INDEX;
+	internals->mode6.ntt = 0;
+
+	/* Initialize memory pool for ARP packets to send */
+	if (internals->mode6.mempool == NULL) {
+		/*
+		 * 256 is size of ETH header, ARP header and nested VLAN headers.
+		 * The value is chosen to be cache aligned.
+		 */
+		element_size = 256 + sizeof(struct rte_mbuf) + RTE_PKTMBUF_HEADROOM;
+		snprintf(mem_name, sizeof(mem_name), "%s_MODE6", bond_dev->data->name);
+		internals->mode6.mempool = rte_mempool_create(mem_name,
+				512 * RTE_MAX_ETHPORTS,
+				element_size,
+				RTE_MEMPOOL_CACHE_MAX_SIZE >= 32 ?
+						32 : RTE_MEMPOOL_CACHE_MAX_SIZE,
+				sizeof(struct rte_pktmbu
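The two small helpers at the top of the patch, an XOR hash over a byte range and a round-robin choice of the next slave, can be tried in isolation. The following is a minimal sketch under assumptions: `xor_hash`, `rr_state` and `next_slave` are hypothetical names, the state struct is a reduced stand-in for `bond_dev_private`, and storing the chosen index inside `next_slave` is a simplification that the patch may handle elsewhere.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* XOR-fold a byte range into one byte, like simple_hash() in the patch. */
static uint8_t
xor_hash(const uint8_t *start, size_t len)
{
	uint8_t hash = 0;
	size_t i;

	for (i = 0; i < len; ++i)
		hash ^= start[i];
	return hash;
}

/* Hypothetical, reduced view of the bonding state needed here. */
struct rr_state {
	uint8_t last_slave;       /* index used for the previous client */
	uint8_t active_count;     /* number of active slaves, must be > 0 */
	uint8_t active_slaves[8]; /* port ids of the active slaves */
};

/* Pick the next slave round-robin and remember the chosen index. */
static uint8_t
next_slave(struct rr_state *st)
{
	assert(st->active_count > 0 && st->active_count <= 8);
	st->last_slave = (uint8_t)((st->last_slave + 1) % st->active_count);
	return st->active_slaves[st->last_slave];
}
```

An 8-bit XOR hash yields at most 256 buckets, which is cheap to compute per packet but gives no protection against collisions between clients whose address bytes XOR to the same value.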
[dpdk-dev] [PATCH 3/4] bond: add debug info for mode 6 link bonding
Signed-off-by: Michal Jastrzebski
---
 config/common_linuxapp                 |   2 +-
 lib/librte_pmd_bond/rte_eth_bond_pmd.c | 124
 2 files changed, 125 insertions(+), 1 deletion(-)

diff --git a/config/common_linuxapp b/config/common_linuxapp
index 2f9643b..1cc2d7e 100644
--- a/config/common_linuxapp
+++ b/config/common_linuxapp
@@ -220,7 +220,7 @@ CONFIG_RTE_LIBRTE_PMD_PCAP=n
 # Compile link bonding PMD library
 #
 CONFIG_RTE_LIBRTE_PMD_BOND=y
-
+CONFIG_RTE_LIBRTE_BOND_DEBUG_ALB=n
 #
 # Compile software PMD backed by AF_PACKET sockets (Linux only)
 #
diff --git a/lib/librte_pmd_bond/rte_eth_bond_pmd.c b/lib/librte_pmd_bond/rte_eth_bond_pmd.c
index b0525cc..348c653 100644
--- a/lib/librte_pmd_bond/rte_eth_bond_pmd.c
+++ b/lib/librte_pmd_bond/rte_eth_bond_pmd.c
@@ -208,6 +208,78 @@ bond_ethdev_rx_burst_8023ad(void *queue, struct rte_mbuf **bufs,
 	return num_rx_total;
 }

+#ifdef RTE_LIBRTE_BOND_DEBUG_ALB
+uint32_t burstnumberRX;
+uint32_t burstnumberTX;
+
+static void
+arp_op_name(uint16_t arp_op, char *buf)
+{
+	switch (arp_op) {
+	case ARP_OP_REQUEST:
+		snprintf(buf, sizeof("ARP Request"), "%s", "ARP Request");
+		return;
+	case ARP_OP_REPLY:
+		snprintf(buf, sizeof("ARP Reply"), "%s", "ARP Reply");
+		return;
+	case ARP_OP_REVREQUEST:
+		snprintf(buf, sizeof("Reverse ARP Request"), "%s", "Reverse ARP Request");
+		return;
+	case ARP_OP_REVREPLY:
+		snprintf(buf, sizeof("Reverse ARP Reply"), "%s", "Reverse ARP Reply");
+		return;
+	case ARP_OP_INVREQUEST:
+		snprintf(buf, sizeof("Peer Identify Request"), "%s", "Peer Identify Request");
+		return;
+	case ARP_OP_INVREPLY:
+		snprintf(buf, sizeof("Peer Identify Reply"), "%s", "Peer Identify Reply");
+		return;
+	default:
+		break;
+	}
+	snprintf(buf, sizeof("Unknown"), "%s", "Unknown");
+	return;
+}
+#define MaxIPv4String 16
+static void
+ipv4_addr_to_dot(uint32_t be_ipv4_addr, char *buf, uint8_t buf_size)
+{
+	uint32_t ipv4_addr;
+
+	ipv4_addr = rte_be_to_cpu_32(be_ipv4_addr);
+	snprintf(buf, buf_size, "%d.%d.%d.%d", (ipv4_addr >> 24) & 0xFF,
+		(ipv4_addr >> 16) & 0xFF, (ipv4_addr >> 8) & 0xFF,
+		ipv4_addr & 0xFF);
+}
+
+#define MODE6_DEBUG(info, src_ip, dst_ip, eth_h, arp_op, port, burstnumber) \
+	RTE_LOG(DEBUG, PMD, info \
+		"port:%d " \
+		"SrcMAC:%02X:%02X:%02X:%02X:%02X:%02X " \
+		"SrcIP:%s " \
+		"DstMAC:%02X:%02X:%02X:%02X:%02X:%02X " \
+		"DstIP:%s " \
+		"%s " \
+		"%d\n", \
+		port, \
+		eth_h->s_addr.addr_bytes[0], \
+		eth_h->s_addr.addr_bytes[1], \
+		eth_h->s_addr.addr_bytes[2], \
+		eth_h->s_addr.addr_bytes[3], \
+		eth_h->s_addr.addr_bytes[4], \
+		eth_h->s_addr.addr_bytes[5], \
+		src_ip, \
+		eth_h->d_addr.addr_bytes[0], \
+		eth_h->d_addr.addr_bytes[1], \
+		eth_h->d_addr.addr_bytes[2], \
+		eth_h->d_addr.addr_bytes[3], \
+		eth_h->d_addr.addr_bytes[4], \
+		eth_h->d_addr.addr_bytes[5], \
+		dst_ip, \
+		arp_op, \
+		++burstnumber)
+#endif
+
 static uint16_t
 bond_ethdev_rx_burst_alb(void *queue, struct rte_mbuf **bufs, uint16_t nb_pkts)
 {
@@ -222,6 +294,13 @@ bond_ethdev_rx_burst_alb(void *queue, struct rte_mbuf **bufs, uint16_t nb_pkts)
 	int i;

 	nb_recv_pkts = bond_ethdev_rx_burst(queue, bufs, nb_pkts);
+#ifdef RTE_LIBRTE_BOND_DEBUG_ALB
+	struct arp_hdr *arp_h;
+	struct ipv4_hdr *ipv4_h;
+	char src_ip[16];
+	char dst_ip[16];
+	char ArpOp[24];
+#endif

 	for (i = 0; i < nb_recv_pkts; i++) {
 		eth_h = rte_pktmbuf_mtod(bufs[i], struct ether_hdr *);
@@ -229,8 +308,23 @@ bond_ethdev_rx_burst_alb(void *queue, struct rte_mbuf **bufs, uint16_t nb_pkts)
 		ether_type = get_vlan_ethertype(eth_h);

 		if (ether_type == rte_cpu_to_be_16(ETHER_TYPE_ARP)) {
+#ifdef RTE_LIBRTE_BOND_DEBUG_ALB
+			arp_h = (struct arp_hdr *)((char *)(eth_h + 1) + offset);
+			ipv4_addr_to_dot(arp_h->arp_data.arp_sip, src_ip, MaxIPv4String);
+			ipv4_addr_to_dot(arp_h->arp_data.arp_tip, dst_ip, MaxIPv4String);
+			arp_op_name(rte_be_to_cpu_16(arp_h->arp_op), ArpOp);
+			MODE6_DEBUG("RX ARP:", src_ip, dst_ip, eth_h, ArpOp, bufs[i]->port, burstnumberRX);
+#endif
 			bond_mode_alb_arp_recv(eth_h, offset, internals);
 		}
+#ifdef RTE_LIBRTE_BOND_DEBUG_ALB
+		else if (ether_type ==
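The dotted-decimal helper in this patch is easy to exercise on its own. Below is a minimal sketch, assuming a hypothetical `ipv4_to_dot` that takes the address already in host byte order (the patch converts with `rte_be_to_cpu_32` first, which is left out here to avoid the DPDK dependency). Unlike the patch it formats with `%u`, since the operands are unsigned, and it reports truncation.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

#define IPV4_STR_MAX 16 /* "255.255.255.255" plus the terminating NUL */

/*
 * Format an IPv4 address given in host byte order as dotted decimal.
 * Returns 0 on success, -1 if buf is too small for the result.
 */
static int
ipv4_to_dot(uint32_t addr, char *buf, size_t buf_size)
{
	int n;

	assert(buf != NULL);
	n = snprintf(buf, buf_size, "%u.%u.%u.%u",
		     (unsigned)((addr >> 24) & 0xFF),
		     (unsigned)((addr >> 16) & 0xFF),
		     (unsigned)((addr >> 8) & 0xFF),
		     (unsigned)(addr & 0xFF));
	return (n < 0 || (size_t)n >= buf_size) ? -1 : 0;
}
```

Passing the real buffer size to `snprintf`, rather than the size of the string literal as `arp_op_name` does, keeps the bound tied to the destination and not to the source.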
[dpdk-dev] [PATCH 4/4] bond: added example application for link bonding mode 6.
Signed-off-by: Michal Jastrzebski Signed-off-by: Maciej Gajdzica --- examples/bond/Makefile | 57 examples/bond/main.c | 790 examples/bond/main.h | 46 +++ 3 files changed, 893 insertions(+) create mode 100644 examples/bond/Makefile create mode 100644 examples/bond/main.c create mode 100644 examples/bond/main.h diff --git a/examples/bond/Makefile b/examples/bond/Makefile new file mode 100644 index 000..9262249 --- /dev/null +++ b/examples/bond/Makefile @@ -0,0 +1,57 @@ +# BSD LICENSE +# +# Copyright(c) 2010-2014 Intel Corporation. All rights reserved. +# All rights reserved. +# +# Redistribution and use in source and binary forms, with or without +# modification, are permitted provided that the following conditions +# are met: +# +# * Redistributions of source code must retain the above copyright +# notice, this list of conditions and the following disclaimer. +# * Redistributions in binary form must reproduce the above copyright +# notice, this list of conditions and the following disclaimer in +# the documentation and/or other materials provided with the +# distribution. +# * Neither the name of Intel Corporation nor the names of its +# contributors may be used to endorse or promote products derived +# from this software without specific prior written permission. +# +# THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS +# "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT +# LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR +# A PARTICULAR PURPOSE ARE DISCLAIMED. 
IN NO EVENT SHALL THE COPYRIGHT +# OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, +# SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT +# LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, +# DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY +# THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT +# (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE +# OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. + +ifeq ($(RTE_SDK),) +$(error "Please define RTE_SDK environment variable") +endif + +# Default target, can be overriden by command line or environment +RTE_TARGET ?= x86_64-native-linuxapp-gcc + +include $(RTE_SDK)/mk/rte.vars.mk + +# binary name +APP = bond_app + +# all source are stored in SRCS-y +SRCS-y := main.c + +CFLAGS += $(WERROR_FLAGS) + +# workaround for a gcc bug with noreturn attribute +# http://gcc.gnu.org/bugzilla/show_bug.cgi?id=12603 +ifeq ($(CONFIG_RTE_TOOLCHAIN_GCC),y) +CFLAGS_main.o += -Wno-return-type +endif + +CFLAGS += -O3 + +include $(RTE_SDK)/mk/rte.extapp.mk diff --git a/examples/bond/main.c b/examples/bond/main.c new file mode 100644 index 000..57cc672 --- /dev/null +++ b/examples/bond/main.c @@ -0,0 +1,790 @@ +/*- + * BSD LICENSE + * + * Copyright(c) 2010-2014 Intel Corporation. All rights reserved. + * All rights reserved. + * + * Redistribution and use in source and binary forms, with or without + * modification, are permitted provided that the following conditions + * are met: + * + * * Redistributions of source code must retain the above copyright + * notice, this list of conditions and the following disclaimer. + * * Redistributions in binary form must reproduce the above copyright + * notice, this list of conditions and the following disclaimer in + * the documentation and/or other materials provided with the + * distribution. 
+ * * Neither the name of Intel Corporation nor the names of its + * contributors may be used to endorse or promote products derived + * from this software without specific prior written permission. + * + * THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS + * "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT + * LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR + * A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT + * OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, + * SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT + * LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, + * DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY + * THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT + * (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE + * OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. + */ + +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include + +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#inclu
[dpdk-dev] [PATCH 3/4] bond: add debug info for mode 6 link bonding
> -----Original Message-----
> From: dev [mailto:dev-bounces at dpdk.org] On Behalf Of Michal Jastrzebski
> Sent: Friday, January 30, 2015 11:58 AM
> To: dev at dpdk.org
> Subject: [dpdk-dev] [PATCH 3/4] bond: add debug info for mode 6 link bonding
>
> Signed-off-by: Michal Jastrzebski
> ---
>  config/common_linuxapp                 |   2 +-
>  lib/librte_pmd_bond/rte_eth_bond_pmd.c | 124
>  2 files changed, 125 insertions(+), 1 deletion(-)
>
> [...]
[dpdk-dev] [PATCH] acl: remove standalone header
> -----Original Message-----
> From: dev [mailto:dev-bounces at dpdk.org] On Behalf Of Thomas Monjalon
> Sent: Thursday, January 29, 2015 10:42 PM
> To: dev at dpdk.org
> Subject: [dpdk-dev] [PATCH] acl: remove standalone header
>
> This is a duplication of some EAL parts for a standalone packaging
> which is not documented.
> Packaging should be done outside of DPDK.
>
> Signed-off-by: Thomas Monjalon
> ---

Acked-by: Konstantin Ananyev

Thanks for fixing it for me.
[dpdk-dev] [PATCH 4/4] bond: added example application for link bonding mode 6.
> -----Original Message-----
> From: dev [mailto:dev-bounces at dpdk.org] On Behalf Of Michal Jastrzebski
> Sent: Friday, January 30, 2015 11:58 AM
> To: dev at dpdk.org
> Subject: [dpdk-dev] [PATCH 4/4] bond: added example application for link bonding mode 6.
>
> Signed-off-by: Michal Jastrzebski
> Signed-off-by: Maciej Gajdzica
> ---
>  examples/bond/Makefile |  57
>  examples/bond/main.c   | 790
>  examples/bond/main.h   |  46 +++
>  3 files changed, 893 insertions(+)
>  create mode 100644 examples/bond/Makefile
>  create mode 100644 examples/bond/main.c
>  create mode 100644 examples/bond/main.h
>
> [...]
[dpdk-dev] [PATCH] Added missing extern 'C' decls in mode4 header files
On 30/01/15 10:56, Wodkowski, PawelX wrote:
>> -----Original Message-----
>> From: Thomas Monjalon [mailto:thomas.monjalon at 6wind.com]
>> Sent: Friday, January 30, 2015 11:21 AM
>> To: Wodkowski, PawelX
>> Cc: dev at dpdk.org
>> Subject: Re: [dpdk-dev] [PATCH] Added missing extern 'C' decls in mode4 header files
>>
>> Hi Pawel,
>>
>>> Signed-off-by: Pawel Wodkowski
>>> ---
>>>  lib/librte_pmd_bond/rte_eth_bond_8023ad.h         | 8
>>>  lib/librte_pmd_bond/rte_eth_bond_8023ad_private.h | 8
>>
>> Why adding extern C in a private header file?
>>
>> --
>> Thomas
>
> To be consistent with rte_eth_bond_private.h where it is included.
>

We only need the decls on the public headers exported by the
librte_pmd_bond makefile, so there is no need to modify
rte_eth_bond_private.h as it should never be linked to directly by
external code.
[dpdk-dev] mmap failed: Cannot allocate memory when init dpdk eal
Hi all,

I am hitting the following mmap failure when initializing the DPDK EAL:

Fri Jan 30 09:03:29 2015:EAL: Setting up memory...
Fri Jan 30 09:03:34 2015:EAL: map_all_hugepages(): mmap failed: Cannot allocate memory
Fri Jan 30 09:03:34 2015:EAL: Failed to mmap 2 MB hugepages
Fri Jan 30 09:03:34 2015:EAL: Cannot init memory

Before I run the demo, the free hugepages on my host are:

cat /proc/meminfo
MemTotal:       132117056 kB
MemFree:        122040292 kB
Buffers:           10984 kB
Cached:           123056 kB
SwapCached:            0 kB
Active:           120812 kB
Inactive:          85860 kB
Active(anon):      79488 kB
Inactive(anon):      364 kB
Active(file):      41324 kB
Inactive(file):    85496 kB
Unevictable:       23576 kB
Mlocked:           23576 kB
SwapTotal:             0 kB
SwapFree:              0 kB
Dirty:              2576 kB
Writeback:             0 kB
AnonPages:         96236 kB
Mapped:            19936 kB
Shmem:               552 kB
Slab:             101344 kB
SReclaimable:      24164 kB
SUnreclaim:        77180 kB
KernelStack:        2544 kB
PageTables:         4180 kB
NFS_Unstable:          0 kB
Bounce:                0 kB
WritebackTmp:          0 kB
CommitLimit:    61864224 kB
Committed_AS:     585844 kB
VmallocTotal:   34359738367 kB
VmallocUsed:      518656 kB
VmallocChunk:   34292133264 kB
HardwareCorrupted:     0 kB
AnonHugePages:      4096 kB
HugePages_Total:    4096
HugePages_Free:     4096
HugePages_Rsvd:        0
HugePages_Surp:        0
Hugepagesize:       2048 kB
DirectMap4k:       96256 kB
DirectMap2M:     6178816 kB
DirectMap1G:    127926272 kB

After the demo has executed, the hugepages look like this:

cat /proc/meminfo
MemTotal:       132117056 kB
MemFree:        117325180 kB
Buffers:           33508 kB
Cached:           721912 kB
SwapCached:            0 kB
Active:          4217712 kB
Inactive:         540956 kB
Active(anon):    4019068 kB
Inactive(anon):   121136 kB
Active(file):     198644 kB
Inactive(file):   419820 kB
Unevictable:       23908 kB
Mlocked:           23908 kB
SwapTotal:             0 kB
SwapFree:              0 kB
Dirty:              2856 kB
Writeback:             0 kB
AnonPages:       4035184 kB
Mapped:           160292 kB
Shmem:            122100 kB
Slab:             177908 kB
SReclaimable:      64808 kB
SUnreclaim:       113100 kB
KernelStack:        7560 kB
PageTables:        62128 kB
NFS_Unstable:          0 kB
Bounce:                0 kB
WritebackTmp:          0 kB
CommitLimit:    61864224 kB
Committed_AS:    8789664 kB
VmallocTotal:   34359738367 kB
VmallocUsed:      527296 kB
VmallocChunk:   34292122604 kB
HardwareCorrupted:     0 kB
AnonHugePages:    262144 kB
HugePages_Total:    4096
HugePages_Free:     2048
HugePages_Rsvd:        0
HugePages_Surp:        0
Hugepagesize:       2048 kB
DirectMap4k:      141312 kB
DirectMap2M:     9279488 kB
DirectMap1G:    124780544 kB

Only the hugepages belonging to node1 were mapped. I was told that the
host (running a 64-bit OS) cannot allocate memory, even though node0
still has 2048 free hugepages. Why?
Has anyone encountered a similar problem?
Any response will be appreciated!
Thanks!
[dpdk-dev] [RFC PATCH v2 01/14] enable VIRTIO_NET_F_CTRL_RX
On 2015/1/26 11:20, Huawei Xie wrote:
> VIRTIO_NET_F_CTRL_RX is dependent on VIRTIO_NET_F_CTRL_VQ. Observed
> that the virtio-net driver in the guest would crash with only CTRL_RX
> enabled.
>
> In virtnet_send_command:
>
> 	/* Caller should know better */
> 	BUG_ON(!virtio_has_feature(vi->vdev, VIRTIO_NET_F_CTRL_VQ) ||
> 		(out + in > VIRTNET_SEND_COMMAND_SG_MAX));
>
> Signed-off-by: Huawei Xie
> ---
>  lib/librte_vhost/virtio-net.c | 3 ++-
>  1 file changed, 2 insertions(+), 1 deletion(-)
>
> diff --git a/lib/librte_vhost/virtio-net.c b/lib/librte_vhost/virtio-net.c
> index b041849..52b4957 100644
> --- a/lib/librte_vhost/virtio-net.c
> +++ b/lib/librte_vhost/virtio-net.c
> @@ -73,7 +73,8 @@ static struct virtio_net_config_ll *ll_root;
>
>  /* Features supported by this lib. */
>  #define VHOST_SUPPORTED_FEATURES ((1ULL << VIRTIO_NET_F_MRG_RXBUF) | \
> -				(1ULL << VIRTIO_NET_F_CTRL_RX))
> +				(1ULL << VIRTIO_NET_F_CTRL_VQ) | \
> +				(1ULL << VIRTIO_NET_F_CTRL_RX))
>  static uint64_t VHOST_FEATURES = VHOST_SUPPORTED_FEATURES;
>
>  /* Line size for reading maps file. */
>

Hi Xie,

What would happen without the VIRTIO_NET_F_CTRL_VQ and
VIRTIO_NET_F_CTRL_RX features? Why add the two features?

--
Regards,
Haifeng
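The dependency described in the patch can be expressed as a small check on the feature mask. This is only a sketch: `features_consistent` and `demo_check` are hypothetical helpers, not part of librte_vhost, and the bit numbers are the ones the virtio network device specification assigns to these features.

```c
#include <assert.h>
#include <stdint.h>

/* Feature bit numbers from the virtio network device specification. */
#define VIRTIO_NET_F_MRG_RXBUF 15
#define VIRTIO_NET_F_CTRL_VQ   17
#define VIRTIO_NET_F_CTRL_RX   18

/*
 * Hypothetical helper: CTRL_RX commands travel over the control
 * virtqueue, so offering CTRL_RX without CTRL_VQ is inconsistent.
 * Returns 1 if the feature set is consistent, 0 otherwise.
 */
static int
features_consistent(uint64_t features)
{
	int has_vq = (int)((features >> VIRTIO_NET_F_CTRL_VQ) & 1);
	int has_rx = (int)((features >> VIRTIO_NET_F_CTRL_RX) & 1);

	return !has_rx || has_vq;
}

/* The pre-patch mask fails the check, the patched mask passes it. */
static void
demo_check(void)
{
	uint64_t before = (1ULL << VIRTIO_NET_F_MRG_RXBUF) |
			  (1ULL << VIRTIO_NET_F_CTRL_RX);
	uint64_t after = before | (1ULL << VIRTIO_NET_F_CTRL_VQ);

	assert(!features_consistent(before));
	assert(features_consistent(after));
}
```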
[dpdk-dev] [PATCH] Added missing extern 'C' decls in mode4 header files
> -----Original Message-----
> From: Doherty, Declan
> Sent: Friday, January 30, 2015 12:42 PM
> To: Wodkowski, PawelX; Thomas Monjalon
> Cc: dev at dpdk.org
> Subject: Re: [dpdk-dev] [PATCH] Added missing extern 'C' decls in mode4 header files
>
> On 30/01/15 10:56, Wodkowski, PawelX wrote:
> >> -----Original Message-----
> >> From: Thomas Monjalon [mailto:thomas.monjalon at 6wind.com]
> >> Sent: Friday, January 30, 2015 11:21 AM
> >> To: Wodkowski, PawelX
> >> Cc: dev at dpdk.org
> >> Subject: Re: [dpdk-dev] [PATCH] Added missing extern 'C' decls in mode4 header files
> >>
> >> Hi Pawel,
> >>
> >>> Signed-off-by: Pawel Wodkowski
> >>> ---
> >>>  lib/librte_pmd_bond/rte_eth_bond_8023ad.h         | 8
> >>>  lib/librte_pmd_bond/rte_eth_bond_8023ad_private.h | 8
> >>
> >> Why adding extern C in a private header file?
> >>
> >> --
> >> Thomas
> >
> > To be consistent with rte_eth_bond_private.h where it is included.
> >
>
> We only need the decls on the public headers exported by the
> librte_pmd_bond makefile, so there is no need to modify
> rte_eth_bond_private.h as it should never be linked to directly by
> external code.

I modified rte_eth_bond_8023ad_private.h not rte_eth_bond_private.h.
In rte_eth_bond_private.h those declarations are present already.
If so those declarations should be removed from rte_eth_bond_private.h.

I can do this in v2 if you accept this.
[dpdk-dev] [PATCH] Added missing extern 'C' decls in mode4 header files
On 30/01/15 12:11, Wodkowski, PawelX wrote:
> > -----Original Message-----
> > From: Doherty, Declan
> > Sent: Friday, January 30, 2015 12:42 PM
> > To: Wodkowski, PawelX; Thomas Monjalon
> > Cc: dev at dpdk.org
> > Subject: Re: [dpdk-dev] [PATCH] Added missing extern 'C' decls in mode4 header files
> >
> > On 30/01/15 10:56, Wodkowski, PawelX wrote:
> >>> -----Original Message-----
> >>> From: Thomas Monjalon [mailto:thomas.monjalon at 6wind.com]
> >>> Sent: Friday, January 30, 2015 11:21 AM
> >>> To: Wodkowski, PawelX
> >>> Cc: dev at dpdk.org
> >>> Subject: Re: [dpdk-dev] [PATCH] Added missing extern 'C' decls in mode4 header files
> >>>
> >>> Hi Pawel,
> >>>
> >>>> Signed-off-by: Pawel Wodkowski
> >>>> ---
> >>>>  lib/librte_pmd_bond/rte_eth_bond_8023ad.h         | 8
> >>>>  lib/librte_pmd_bond/rte_eth_bond_8023ad_private.h | 8
> >>>
> >>> Why adding extern C in a private header file?
> >>>
> >>> --
> >>> Thomas
> >>
> >> To be consistent with rte_eth_bond_private.h where it is included.
> >>
> >
> > We only need the decls on the public headers exported by the
> > librte_pmd_bond makefile, so there is no need to modify
> > rte_eth_bond_private.h as it should never be linked to directly by
> > external code.
>
> I modified rte_eth_bond_8023ad_private.h not rte_eth_bond_private.h.
> In rte_eth_bond_private.h those declarations are present already.
> If so those declarations should be removed from rte_eth_bond_private.h.
>
> I can do this in v2 if you accept this.
>

Sure, that sounds good to me.
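For readers unfamiliar with the guards under discussion, the usual shape in a public C header is sketched below. The function `bond_sketch_mode_is_valid` is hypothetical and exists only to give the guard something to wrap; header and source are shown in one block so the sketch compiles as a single C file.

```c
/* --- what a public header would contain --- */
#ifdef __cplusplus
extern "C" {
#endif

/*
 * Hypothetical public function. Inside the guard a C++ compiler keeps
 * the symbol name unmangled, so it links against the C library build.
 */
int bond_sketch_mode_is_valid(int mode);

#ifdef __cplusplus
}
#endif

/* --- what the C source file would contain --- */
int
bond_sketch_mode_is_valid(int mode)
{
	/* Modes 0 through 6; mode 6 is the ALB mode from this digest. */
	return mode >= 0 && mode <= 6;
}
```

A header that is only ever included by the library's own C sources gains nothing from the guard, which is the point made in the thread about private headers.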
[dpdk-dev] mmap failed: Cannot allocate memory when init dpdk eal
On 2015/1/30 19:40, zhangsha (A) wrote:
> Hi all,
>
> I am suffering from the following "mmap failed" problem when initializing the DPDK EAL.
>
> Fri Jan 30 09:03:29 2015:EAL: Setting up memory...
> Fri Jan 30 09:03:34 2015:EAL: map_all_hugepages(): mmap failed: Cannot allocate memory
> Fri Jan 30 09:03:34 2015:EAL: Failed to mmap 2 MB hugepages
> Fri Jan 30 09:03:34 2015:EAL: Cannot init memory
>
> Before I run the demo, the free hugepages of my host are:
>
> cat /proc/meminfo
> MemTotal: 132117056 kB
> MemFree: 122040292 kB
> Buffers: 10984 kB
> Cached: 123056 kB
> SwapCached: 0 kB
> Active: 120812 kB
> Inactive: 85860 kB
> Active(anon): 79488 kB
> Inactive(anon): 364 kB
> Active(file): 41324 kB
> Inactive(file): 85496 kB
> Unevictable: 23576 kB
> Mlocked: 23576 kB
> SwapTotal: 0 kB
> SwapFree: 0 kB
> Dirty: 2576 kB
> Writeback: 0 kB
> AnonPages: 96236 kB
> Mapped: 19936 kB
> Shmem: 552 kB
> Slab: 101344 kB
> SReclaimable: 24164 kB
> SUnreclaim: 77180 kB
> KernelStack: 2544 kB
> PageTables: 4180 kB
> NFS_Unstable: 0 kB
> Bounce: 0 kB
> WritebackTmp: 0 kB
> CommitLimit: 61864224 kB
> Committed_AS: 585844 kB
> VmallocTotal: 34359738367 kB
> VmallocUsed: 518656 kB
> VmallocChunk: 34292133264 kB
> HardwareCorrupted: 0 kB
> AnonHugePages: 4096 kB
> HugePages_Total: 4096
> HugePages_Free: 4096
> HugePages_Rsvd: 0
> HugePages_Surp: 0
> Hugepagesize: 2048 kB
> DirectMap4k: 96256 kB
> DirectMap2M: 6178816 kB
> DirectMap1G: 127926272 kB
>
> And after the demo executed, I got the hugepages like this:
>
> cat /proc/meminfo
> MemTotal: 132117056 kB
> MemFree: 117325180 kB
> Buffers: 33508 kB
> Cached: 721912 kB
> SwapCached: 0 kB
> Active: 4217712 kB
> Inactive: 540956 kB
> Active(anon): 4019068 kB
> Inactive(anon): 121136 kB
> Active(file): 198644 kB
> Inactive(file): 419820 kB
> Unevictable: 23908 kB
> Mlocked: 23908 kB
> SwapTotal: 0 kB
> SwapFree: 0 kB
> Dirty: 2856 kB
> Writeback: 0 kB
> AnonPages: 4035184 kB
> Mapped: 160292 kB
> Shmem: 122100 kB
> Slab: 177908 kB
> SReclaimable: 64808 kB
> SUnreclaim: 113100 kB
> KernelStack: 7560 kB
> PageTables: 62128 kB
> NFS_Unstable: 0 kB
> Bounce: 0 kB
> WritebackTmp: 0 kB
> CommitLimit: 61864224 kB
> Committed_AS: 8789664 kB
> VmallocTotal: 34359738367 kB
> VmallocUsed: 527296 kB
> VmallocChunk: 34292122604 kB
> HardwareCorrupted: 0 kB
> AnonHugePages: 262144 kB
> HugePages_Total: 4096
> HugePages_Free: 2048
> HugePages_Rsvd: 0
> HugePages_Surp: 0
> Hugepagesize: 2048 kB
> DirectMap4k: 141312 kB
> DirectMap2M: 9279488 kB
> DirectMap1G: 124780544 kB
>
> Only the hugepages belonging to node1 were mapped. I was told the host (running a 64-bit OS) cannot allocate memory while node0 still has 2048 free hugepages. Why?
> Has anyone encountered a similar problem?
> Any response will be appreciated!
> Thanks!
>

How do you tell the kernel not to allocate memory on node0? I guess node0 and node1 both have 2048 hugepages and you want to mmap 4096 hugepages. So you can mmap 2048 hugepages on node1. After this step you cannot mmap any more hugepage files, because you told the kernel not to allocate memory on node0.

--
Regards,
Haifeng
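One way to check the guess above is to look at per-node hugepage counters instead of the system-wide totals in /proc/meminfo. The following is a minimal sketch, assuming a Linux host that exposes the usual sysfs NUMA files and uses 2 MB hugepages. The helper names `read_count`, `free_hugepages_2m` and `print_free_hugepages` are illustrative only and are not part of DPDK.

```c
#include <stdio.h>

/*
 * Read a single non-negative integer from a sysfs-style text file.
 * Returns the value, or -1 if the file cannot be opened or parsed.
 */
long read_count(const char *path)
{
	FILE *fp = fopen(path, "r");
	long value = -1;

	if (fp == NULL)
		return -1;
	if (fscanf(fp, "%ld", &value) != 1 || value < 0)
		value = -1;
	fclose(fp);
	return value;
}

/*
 * Free 2 MB hugepages on one NUMA node, or -1 if the node (or the
 * 2048 kB page size) is not present on this host.
 */
long free_hugepages_2m(unsigned int node)
{
	char path[128];

	snprintf(path, sizeof(path),
		 "/sys/devices/system/node/node%u/hugepages/"
		 "hugepages-2048kB/free_hugepages", node);
	return read_count(path);
}

/* Print the per-node counters; stops at the first node that is missing. */
void print_free_hugepages(void)
{
	unsigned int node;
	long n;

	for (node = 0; (n = free_hugepages_2m(node)) >= 0; node++)
		printf("node%u: %ld free 2 MB hugepages\n", node, n);
}
```

If node0 still reports 2048 free pages while the process runs under a memory policy restricted to node1 (for example one set with numactl), the mmap failure would be consistent with that policy rather than with a real shortage of hugepages.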
[dpdk-dev] [PATCH v2 0/4] New Reorder Library
This series introduces the new reorder library along with unit tests, a sample app, and a new entry in the programmer's guide describing the library.

The library provides reordering of mbufs based on their sequence number. As mentioned in the patch describing the library, one use case is the packet distributor. The distributor receives packets, assigns them a sequence number and sends them to the workers. The workers process those packets and return them to the distributor. The distributor collects out-of-order packets from the workers and uses this library to reorder the packets based on the sequence number they were assigned.

v2:
- add programmer's guide entry describing the library
- use malloc instead of memzone to allocate memory
- modify create and init implementation: init takes a reorder buffer as input, and create reserves memory and calls init
- update unit tests

Sergio Gonzalez Monroy (4):
  reorder: new reorder library
  app: New reorder unit test
  examples: new sample app packet_ordering
  doc: new reorder library description

 app/test/Makefile                              |   2 +
 app/test/test_reorder.c                        | 393 +++
 config/common_bsdapp                           |   5 +
 config/common_linuxapp                         |   5 +
 doc/guides/prog_guide/index.rst                |   1 +
 doc/guides/prog_guide/reorder_lib.rst          | 115 +
 examples/packet_ordering/Makefile              |  50 ++
 examples/packet_ordering/main.c                | 637 +
 lib/Makefile                                   |   1 +
 lib/librte_eal/common/include/rte_tailq_elem.h |   2 +
 lib/librte_mbuf/rte_mbuf.h                     |   3 +
 lib/librte_reorder/Makefile                    |  50 ++
 lib/librte_reorder/rte_reorder.c               | 416
 lib/librte_reorder/rte_reorder.h               | 181 +++
 mk/rte.app.mk                                  |   4 +
 15 files changed, 1865 insertions(+)
 create mode 100644 app/test/test_reorder.c
 create mode 100644 doc/guides/prog_guide/reorder_lib.rst
 create mode 100644 examples/packet_ordering/Makefile
 create mode 100644 examples/packet_ordering/main.c
 create mode 100644 lib/librte_reorder/Makefile
 create mode 100644 lib/librte_reorder/rte_reorder.c
 create mode 100644 lib/librte_reorder/rte_reorder.h

--
1.9.3
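To make the idea of reordering by sequence number concrete, here is a small standalone sketch. It is not the librte_reorder API from this series: `struct pkt`, `struct reorder_win`, the `reorder_win_*` functions and the window size are all hypothetical names chosen for illustration, and `struct pkt` merely stands in for an mbuf carrying a `seqn` field. The sketch keeps a fixed window of slots indexed by sequence number and only releases packets once the next expected one has arrived. Unlike a production buffer, it simply rejects packets outside the window and would stall on a lost packet.

```c
#include <stddef.h>
#include <stdint.h>

#define REORDER_SLOTS 8u	/* hypothetical window size; must divide 2^32 */

struct pkt {			/* stand-in for an mbuf; only seqn matters */
	uint32_t seqn;
};

struct reorder_win {
	struct pkt *slot[REORDER_SLOTS];
	uint32_t min_seqn;	/* sequence number expected next */
};

void reorder_win_init(struct reorder_win *w, uint32_t first_seqn)
{
	size_t i;

	for (i = 0; i < REORDER_SLOTS; i++)
		w->slot[i] = NULL;
	w->min_seqn = first_seqn;
}

/* Returns 0 on success, -1 if seqn is outside the window or a duplicate. */
int reorder_win_insert(struct reorder_win *w, struct pkt *p)
{
	/* Unsigned subtraction stays correct across seqn wrap-around. */
	uint32_t offset = p->seqn - w->min_seqn;
	struct pkt **s;

	if (offset >= REORDER_SLOTS)
		return -1;
	s = &w->slot[p->seqn % REORDER_SLOTS];
	if (*s != NULL)
		return -1;
	*s = p;
	return 0;
}

/* Copies up to max in-order packets to out[]; returns how many were copied. */
size_t reorder_win_drain(struct reorder_win *w, struct pkt **out, size_t max)
{
	size_t n = 0;

	while (n < max) {
		struct pkt **s = &w->slot[w->min_seqn % REORDER_SLOTS];

		if (*s == NULL)
			break;	/* next expected packet has not arrived yet */
		out[n++] = *s;
		*s = NULL;
		w->min_seqn++;
	}
	return n;
}
```

In the distributor use case described above, the sender would stamp each packet with an increasing `seqn` before handing it to the workers, and the collector would call the insert and drain steps as packets come back.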
[dpdk-dev] [PATCH v2 1/4] reorder: new reorder library
This library provides reordering capability for out of order mbufs based on a sequence number in the mbuf structure.

Signed-off-by: Reshma Pattan
Signed-off-by: Richardson Bruce
Signed-off-by: Sergio Gonzalez Monroy
---
 config/common_bsdapp                           |   5 +
 config/common_linuxapp                         |   5 +
 lib/Makefile                                   |   1 +
 lib/librte_eal/common/include/rte_tailq_elem.h |   2 +
 lib/librte_mbuf/rte_mbuf.h                     |   3 +
 lib/librte_reorder/Makefile                    |  50 +++
 lib/librte_reorder/rte_reorder.c               | 416 +
 lib/librte_reorder/rte_reorder.h               | 181 +++
 8 files changed, 663 insertions(+)
 create mode 100644 lib/librte_reorder/Makefile
 create mode 100644 lib/librte_reorder/rte_reorder.c
 create mode 100644 lib/librte_reorder/rte_reorder.h

diff --git a/config/common_bsdapp b/config/common_bsdapp
index 9177db1..e3e0e94 100644
--- a/config/common_bsdapp
+++ b/config/common_bsdapp
@@ -334,6 +334,11 @@ CONFIG_RTE_SCHED_PORT_N_GRINDERS=8
 CONFIG_RTE_LIBRTE_DISTRIBUTOR=y
 #
+# Compile the reorder library
+#
+CONFIG_RTE_LIBRTE_REORDER=y
+
+#
 # Compile librte_port
 #
 CONFIG_RTE_LIBRTE_PORT=y
diff --git a/config/common_linuxapp b/config/common_linuxapp
index 2f9643b..b5ec730 100644
--- a/config/common_linuxapp
+++ b/config/common_linuxapp
@@ -342,6 +342,11 @@ CONFIG_RTE_SCHED_PORT_N_GRINDERS=8
 CONFIG_RTE_LIBRTE_DISTRIBUTOR=y
 #
+# Compile the reorder library
+#
+CONFIG_RTE_LIBRTE_REORDER=y
+
+#
 # Compile librte_port
 #
 CONFIG_RTE_LIBRTE_PORT=y
diff --git a/lib/Makefile b/lib/Makefile
index 0ffc982..5919d32 100644
--- a/lib/Makefile
+++ b/lib/Makefile
@@ -65,6 +65,7 @@ DIRS-$(CONFIG_RTE_LIBRTE_DISTRIBUTOR) += librte_distributor
 DIRS-$(CONFIG_RTE_LIBRTE_PORT) += librte_port
 DIRS-$(CONFIG_RTE_LIBRTE_TABLE) += librte_table
 DIRS-$(CONFIG_RTE_LIBRTE_PIPELINE) += librte_pipeline
+DIRS-$(CONFIG_RTE_LIBRTE_REORDER) += librte_reorder
 ifeq ($(CONFIG_RTE_EXEC_ENV_LINUXAPP),y)
 DIRS-$(CONFIG_RTE_LIBRTE_KNI) += librte_kni
diff --git a/lib/librte_eal/common/include/rte_tailq_elem.h b/lib/librte_eal/common/include/rte_tailq_elem.h
index f74fc7c..3013869 100644
--- a/lib/librte_eal/common/include/rte_tailq_elem.h
+++ b/lib/librte_eal/common/include/rte_tailq_elem.h
@@ -84,6 +84,8 @@ rte_tailq_elem(RTE_TAILQ_ACL, "RTE_ACL")
 rte_tailq_elem(RTE_TAILQ_DISTRIBUTOR, "RTE_DISTRIBUTOR")
+rte_tailq_elem(RTE_TAILQ_REORDER, "RTE_REORDER")
+
 rte_tailq_end(RTE_TAILQ_NUM)
 #undef rte_tailq_elem
diff --git a/lib/librte_mbuf/rte_mbuf.h b/lib/librte_mbuf/rte_mbuf.h
index 16059c6..ed27eb8 100644
--- a/lib/librte_mbuf/rte_mbuf.h
+++ b/lib/librte_mbuf/rte_mbuf.h
@@ -262,6 +262,9 @@ struct rte_mbuf {
 uint32_t usr; /**< User defined tags. See @rte_distributor_process */
 } hash; /**< hash information */
+ /* sequence number - field used in distributor and reorder library */
+ uint32_t seqn;
+
 /* second cache line - fields only used in slow path or on TX */
 MARKER cacheline1 __rte_cache_aligned;
diff --git a/lib/librte_reorder/Makefile b/lib/librte_reorder/Makefile
new file mode 100644
index 000..12b916f
--- /dev/null
+++ b/lib/librte_reorder/Makefile
@@ -0,0 +1,50 @@
+# BSD LICENSE
+#
+# Copyright(c) 2010-2014 Intel Corporation. All rights reserved.
+# All rights reserved.
+#
+# Redistribution and use in source and binary forms, with or without
+# modification, are permitted provided that the following conditions
+# are met:
+#
+# * Redistributions of source code must retain the above copyright
+# notice, this list of conditions and the following disclaimer.
+# * Redistributions in binary form must reproduce the above copyright
+# notice, this list of conditions and the following disclaimer in
+# the documentation and/or other materials provided with the
+# distribution.
+# * Neither the name of Intel Corporation nor the names of its
+# contributors may be used to endorse or promote products derived
+# from this software without specific prior written permission.
+#
+# THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+# "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+# LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
+# A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
+# OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
+# SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
+# LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
+# DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
+# THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+# (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
+# OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH
[dpdk-dev] [PATCH v2 3/4] examples: new sample app packet_ordering
This new app makes use of the librte_reorder library. It requires at least 3 lcores for RX, Workers (1 or more) and TX threads. Communication between RX-Workers and Workers-TX is done by using rings.

The flow of mbufs is the following:
* The RX thread gets mbufs from the driver, sets their sequence numbers and enqueues them in a ring.
* Workers dequeue mbufs from the ring, do some 'work' and enqueue the mbufs in a ring.
* The TX thread dequeues mbufs from the ring, inserts them in the reorder buffer, drains mbufs from the reorder buffer and sends them to the driver.

Signed-off-by: Reshma Pattan
Signed-off-by: Sergio Gonzalez Monroy
---
 examples/packet_ordering/Makefile |  50 +++
 examples/packet_ordering/main.c   | 637 ++
 2 files changed, 687 insertions(+)
 create mode 100644 examples/packet_ordering/Makefile
 create mode 100644 examples/packet_ordering/main.c

diff --git a/examples/packet_ordering/Makefile b/examples/packet_ordering/Makefile
new file mode 100644
index 000..44bd2e1
--- /dev/null
+++ b/examples/packet_ordering/Makefile
@@ -0,0 +1,50 @@
+# BSD LICENSE
+#
+# Copyright(c) 2010-2014 Intel Corporation. All rights reserved.
+# All rights reserved.
+#
+# Redistribution and use in source and binary forms, with or without
+# modification, are permitted provided that the following conditions
+# are met:
+#
+# * Redistributions of source code must retain the above copyright
+# notice, this list of conditions and the following disclaimer.
+# * Redistributions in binary form must reproduce the above copyright
+# notice, this list of conditions and the following disclaimer in
+# the documentation and/or other materials provided with the
+# distribution.
+# * Neither the name of Intel Corporation nor the names of its
+# contributors may be used to endorse or promote products derived
+# from this software without specific prior written permission.
+#
+# THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+# "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+# LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
+# A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
+# OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
+# SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
+# LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
+# DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
+# THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+# (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
+# OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+
+ifeq ($(RTE_SDK),)
+$(error "Please define RTE_SDK environment variable")
+endif
+
+# Default target, can be overridden by command line or environment
+RTE_TARGET ?= x86_64-ivshmem-linuxapp-gcc
+
+include $(RTE_SDK)/mk/rte.vars.mk
+
+# binary name
+APP = packet_ordering
+
+# all source are stored in SRCS-y
+SRCS-y := main.c
+
+CFLAGS += -O3
+CFLAGS += $(WERROR_FLAGS)
+
+include $(RTE_SDK)/mk/rte.extapp.mk
diff --git a/examples/packet_ordering/main.c b/examples/packet_ordering/main.c
new file mode 100644
index 000..8b65275
--- /dev/null
+++ b/examples/packet_ordering/main.c
@@ -0,0 +1,637 @@
+/*-
+ * BSD LICENSE
+ *
+ * Copyright(c) 2010-2014 Intel Corporation. All rights reserved.
+ * All rights reserved.
+ *
+ * Redistribution and use in source and binary forms, with or without
+ * modification, are permitted provided that the following conditions
+ * are met:
+ *
+ * * Redistributions of source code must retain the above copyright
+ * notice, this list of conditions and the following disclaimer.
+ * * Redistributions in binary form must reproduce the above copyright
+ * notice, this list of conditions and the following disclaimer in
+ * the documentation and/or other materials provided with the
+ * distribution.
+ * * Neither the name of Intel Corporation nor the names of its
+ * contributors may be used to endorse or promote products derived
+ * from this software without specific prior written permission.
+ *
+ * THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+ * "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+ * LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
+ * A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
+ * OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
+ * SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
+ * LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
+ * DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
+ * THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+ * (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE U