[dpdk-dev] vmware vmxnet3-usermap AND DPDK VMXNET3 PMD
Hi,

2014-11-07 16:53, Patel, Rashmin N:
> Yes, you're right DPDK VMXNET3-PMD in /lib/librte_pmd_vmxnet3 does not
> support mbuf chaining today. But it's a standalone bsd driver just like
> any other pmd in that directory, it does not need vmxnet3-usermap.ko module.
>
> Now there is another vmxnet3 solution in a separate branch as a plugin,
> which must have vmxnet3-usermap.ko linux module(1), and a user space
> interface piece(2) to tie it to any DPDK application in the main branch.
> (1) and (2) makes the solution which is known as vmxnet3-plugin. It's been
> there for a long time just like virtio-plugin, I don't know who uses it,
> but community can *reply* here if there is still any need of a separate
> solution that way.

Coming back to last year, 6WIND developed some PMDs for virtio and vmxnet3.
Later, Intel developed their own version using the uio framework.
The versions in the main repository are the Intel ones, whereas the original
ones from 6WIND are released as extensions.
For completeness, it must be noted that Brocade worked on their own approach
to vmxnet3 and contributed to the virtio PMD.

It's now time to merge all these implementations.
The 6WIND implementations show that it's possible to avoid the uio framework.
The virtio-net-pmd uses port access granted by iopl().
The vmxnet3-usermap reuses VMware's kernel module with a special mode for
memory mapping. It was a pre-bifurcated logic.

> I'm in favor of consolidating all those versions into one elegant solution
> by grabbing the best features from all of them and maintaining one copy.
> I'm sure that developers contributing from VMware would also support that
> idea, because it then makes it easy to maintain, debug and fix bugs, and
> most importantly avoids such confusion in the future.

> -Original Message-
> From: dev [mailto:dev-bounces at dpdk.org] On Behalf Of Aziz Hajee
> Sent: Thursday, November 06, 2014 5:47 PM
> To: dev at dpdk.org
> Subject: [dpdk-dev] vmware vmxnet3-usermap AND DPDK VMXNET3 PMD
>
> I am using the dpdk1.6.0r1
> I could not find a complete clarification, sorry if missed.
> VMXNET3 PMD
>
> I have enabled the VMXNET3 PMD in the dpdk.
> # Compile burst-oriented VMXNET3 PMD driver #
>
> CONFIG_RTE_LIBRTE_VMXNET3_PMD=y
> CONFIG_RTE_LIBRTE_VMXNET3_DEBUG_INIT=y
> CONFIG_RTE_LIBRTE_VMXNET3_DEBUG_RX=n
> CONFIG_RTE_LIBRTE_VMXNET3_DEBUG_TX=n
> CONFIG_RTE_LIBRTE_VMXNET3_DEBUG_TX_FREE=n
>
> The Intel DPDK VMXNET3 PMD driver does not support mbuf chaining, and I have
> to set NOMULTSEGS for the vmxnet3 interface init to succeed:
> tx_conf.txq_flags = ETH_TXQ_FLAGS_NOMULTSEGS
> Is there a later version of DPDK that supports multiseg for the dpdk
> VMXNET3 PMD?
>
> vmware vmxnet3-usermap AND DPDK VMXNET3 PMD
> =
> Is the vmxnet3-usermap.ko module driver also needed? (It appears that I
> need it; otherwise the EAL initialisation fails.)
> sudo insmod ./vmxnet3-usermap.ko enable_shm=2,2 num_rqs=1,1 num_rxds=2048
> num_txds=2048
>
> I do not understand: if the VMXNET3 PMD is there, what is the purpose of
> vmxnet3-usermap.ko?
>
> From some responses I saw that the following ifdef RTE_EAL_UNBIND_PORTS
> also needs to be removed in lib/librte_pmd_vmxnet3/vmxnet3_ethdev.c?
> {
> .name = "rte_vmxnet3_pmd",
> .id_table = pci_id_vmxnet3_map,
> -#ifdef RTE_EAL_UNBIND_PORTS
> +// #ifdef RTE_EAL_UNBIND_PORTS
> .drv_flags = RTE_PCI_DRV_NEED_IGB_UIO,
> -#endif
> +// #endif
> },
> .eth_dev_init = eth_vmxnet3_dev_init,
> .dev_private_size = sizeof(struct vmxnet3_adapter),
>
> thanks,
> -aziz
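For readers hitting the same mbuf-chaining limitation, here is a minimal
sketch (not from the original thread; the function name, queue 0 and the
512-descriptor ring size are illustrative) of what the NOMULTSEGS setting
mentioned above amounts to at TX queue setup time:

#include <string.h>
#include <rte_ethdev.h>

/* Illustrative sketch: refuse multi-segment mbufs at TX queue setup,
 * since the vmxnet3 PMD of this era cannot transmit mbuf chains. */
static int
setup_tx_queue_nomultsegs(uint8_t port_id)
{
	struct rte_eth_txconf tx_conf;

	memset(&tx_conf, 0, sizeof(tx_conf));
	tx_conf.txq_flags = ETH_TXQ_FLAGS_NOMULTSEGS;
	return rte_eth_tx_queue_setup(port_id, 0, 512,
			rte_eth_dev_socket_id(port_id), &tx_conf);
}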
[dpdk-dev] [PATCH v2 0/2] examples/vmdq: support new VMDQ API
This patch supports the new VMDQ API in the vmdq example.

v2 changes:
* code rebase
* allow the app to specify a num_pools different from max_nb_pools
* fix serious coding style (cs) issues

Huawei Xie (2):
  support new VMDQ API in vmdq example
  fix cs issues in vmdq example

 examples/vmdq/main.c | 233 ++-
 1 file changed, 139 insertions(+), 94 deletions(-)

--
1.8.1.4
[dpdk-dev] [PATCH v2 1/2] examples/vmdq: support new VMDQ API
This patch supports the new VMDQ API in the vmdq example. Besides, it allows
users to specify a num_pools different from max_nb_pools, thus the polling
thread needn't poll the queues of all pools. Due to an i40e implementation
issue, there is no default mac for a VMDQ pool, so the app needs to specify
the mac address for each pool explicitly.

Signed-off-by: Huawei Xie
---
 examples/vmdq/main.c | 169 +++
 1 file changed, 103 insertions(+), 66 deletions(-)

diff --git a/examples/vmdq/main.c b/examples/vmdq/main.c
index c51e2fb..5a2305f 100644
--- a/examples/vmdq/main.c
+++ b/examples/vmdq/main.c
@@ -144,6 +144,13 @@ const uint16_t vlan_tags[] = {
 	48, 49, 50, 51, 52, 53, 54, 55,
 	56, 57, 58, 59, 60, 61, 62, 63,
 };
+const uint16_t num_vlans = RTE_DIM(vlan_tags);
+static uint16_t num_pf_queues, num_vmdq_queues;
+static uint16_t vmdq_pool_base, vmdq_queue_base;
+/* pool mac addr template, pool mac addr is like: 52 54 00 12 port# pool# */
+static struct ether_addr pool_addr_template = {
+	.addr_bytes = {0x52, 0x54, 0x00, 0x12, 0x00, 0x00}
+};

 /* ethernet addresses of ports */
 static struct ether_addr vmdq_ports_eth_addr[RTE_MAX_ETHPORTS];

@@ -163,22 +170,9 @@ get_eth_conf(struct rte_eth_conf *eth_conf, uint32_t num_pools)
 	unsigned i;

 	conf.nb_queue_pools = (enum rte_eth_nb_pools)num_pools;
+	conf.nb_pool_maps = num_pools;
 	conf.enable_default_pool = 0;
 	conf.default_pool = 0; /* set explicit value, even if not used */
-	switch (num_pools) {
-	/* For 10G NIC like 82599, 128 is valid for queue number */
-	case MAX_POOL_NUM_10G:
-		num_queues = MAX_QUEUE_NUM_10G;
-		conf.nb_pool_maps = MAX_POOL_MAP_NUM_10G;
-		break;
-	/* For 1G NIC like i350, 82580 and 82576, 8 is valid for queue number */
-	case MAX_POOL_NUM_1G:
-		num_queues = MAX_QUEUE_NUM_1G;
-		conf.nb_pool_maps = MAX_POOL_MAP_NUM_1G;
-		break;
-	default:
-		return -1;
-	}

 	for (i = 0; i < conf.nb_pool_maps; i++){
 		conf.pool_map[i].vlan_id = vlan_tags[ i ];
@@ -192,40 +186,6 @@ get_eth_conf(struct rte_eth_conf *eth_conf, uint32_t num_pools)
 }

 /*
- * Validate the pool number accrording to the max pool number gotten form dev_info
- * If the pool number is invalid, give the error message and return -1
- */
-static inline int
-validate_num_pools(uint32_t max_nb_pools)
-{
-	if (num_pools > max_nb_pools) {
-		printf("invalid number of pools\n");
-		return -1;
-	}
-
-	switch (max_nb_pools) {
-	/* For 10G NIC like 82599, 64 is valid for pool number */
-	case MAX_POOL_NUM_10G:
-		if (num_pools != MAX_POOL_NUM_10G) {
-			printf("invalid number of pools\n");
-			return -1;
-		}
-		break;
-	/* For 1G NIC like i350, 82580 and 82576, 8 is valid for pool number */
-	case MAX_POOL_NUM_1G:
-		if (num_pools != MAX_POOL_NUM_1G) {
-			printf("invalid number of pools\n");
-			return -1;
-		}
-		break;
-	default:
-		return -1;
-	}
-
-	return 0;
-}
-
-/*
  * Initialises a given port using global settings and with the rx buffers
  * coming from the mbuf_pool passed as parameter
  */
@@ -235,26 +195,57 @@ port_init(uint8_t port, struct rte_mempool *mbuf_pool)
 	struct rte_eth_dev_info dev_info;
 	struct rte_eth_rxconf *rxconf;
 	struct rte_eth_conf port_conf;
-	uint16_t rxRings, txRings = (uint16_t)rte_lcore_count();
+	uint16_t rxRings, txRings;
 	const uint16_t rxRingSize = RTE_TEST_RX_DESC_DEFAULT,
 		txRingSize = RTE_TEST_TX_DESC_DEFAULT;
 	int retval;
 	uint16_t q;
+	uint16_t queues_per_pool;
 	uint32_t max_nb_pools;

 	/* The max pool number from dev_info will be used to validate the pool number specified in cmd line */
 	rte_eth_dev_info_get (port, &dev_info);
 	max_nb_pools = (uint32_t)dev_info.max_vmdq_pools;
-	retval = validate_num_pools(max_nb_pools);
-	if (retval < 0)
-		return retval;
-
-	retval = get_eth_conf(&port_conf, num_pools);
+	/*
+	 * We allow to process part of VMDQ pools specified by num_pools in
+	 * command line.
+	 */
+	if (num_pools > max_nb_pools) {
+		printf("num_pools %d >max_nb_pools %d\n",
+			num_pools, max_nb_pools);
+		return -1;
+	}
+	retval = get_eth_conf(&port_conf, max_nb_pools);
 	if (retval < 0)
 		return retval;
+	/*
+	 * NIC queues are divided into pf queues and vmdq queues.
+	 */
+	/* There is assumption here all ports have the same configuration! */
+	num_pf_queues = dev_info.max_rx_queues - dev_info.v
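As an aside, a small sketch (an illustration, not part of the patch) of the
per-pool MAC derivation described by the pool_addr_template comment above:
byte 4 carries the port number and byte 5 the pool number, and the result can
then be registered for that pool with rte_eth_dev_mac_addr_add().

#include <rte_ether.h>
#include <rte_ethdev.h>

/* Sketch only: derive the MAC 52:54:00:12:<port>:<pool> from the template
 * and register it for the given VMDQ pool. Assumes pool_addr_template from
 * the patch above is in scope. */
static int
set_pool_mac(uint8_t port, uint8_t pool)
{
	struct ether_addr mac = pool_addr_template;

	mac.addr_bytes[4] = port;
	mac.addr_bytes[5] = pool;
	return rte_eth_dev_mac_addr_add(port, &mac, pool);
}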
[dpdk-dev] [PATCH v2 2/2] examples/vmdq: fix cs issues in vmdq example
Signed-off-by: Huawei Xie --- examples/vmdq/main.c | 64 +--- 1 file changed, 36 insertions(+), 28 deletions(-) diff --git a/examples/vmdq/main.c b/examples/vmdq/main.c index 5a2305f..e60b671 100644 --- a/examples/vmdq/main.c +++ b/examples/vmdq/main.c @@ -92,7 +92,7 @@ #define INVALID_PORT_ID 0xFF /* mask of enabled ports */ -static uint32_t enabled_port_mask = 0; +static uint32_t enabled_port_mask; /* number of pools (if user does not specify any, 8 by default */ static uint32_t num_queues = 8; @@ -129,10 +129,10 @@ static const struct rte_eth_conf vmdq_conf_default = { static unsigned lcore_ids[RTE_MAX_LCORE]; static uint8_t ports[RTE_MAX_ETHPORTS]; -static unsigned num_ports = 0; /**< The number of ports specified in command line */ +static unsigned num_ports; /**< The number of ports specified in command line */ /* array used for printing out statistics */ -volatile unsigned long rxPackets[ MAX_QUEUES ] = {0}; +volatile unsigned long rxPackets[MAX_QUEUES] = {0}; const uint16_t vlan_tags[] = { 0, 1, 2, 3, 4, 5, 6, 7, @@ -161,8 +161,11 @@ static struct ether_addr vmdq_ports_eth_addr[RTE_MAX_ETHPORTS]; #define MAX_POOL_MAP_NUM_1G 32 #define MAX_POOL_NUM_10G 64 #define MAX_POOL_NUM_1G 8 -/* Builds up the correct configuration for vmdq based on the vlan tags array - * given above, and determine the queue number and pool map number according to valid pool number */ +/* + * Builds up the correct configuration for vmdq based on the vlan tags array + * given above, and determine the queue number and pool map number according to + * valid pool number + */ static inline int get_eth_conf(struct rte_eth_conf *eth_conf, uint32_t num_pools) { @@ -174,8 +177,8 @@ get_eth_conf(struct rte_eth_conf *eth_conf, uint32_t num_pools) conf.enable_default_pool = 0; conf.default_pool = 0; /* set explicit value, even if not used */ - for (i = 0; i < conf.nb_pool_maps; i++){ - conf.pool_map[i].vlan_id = vlan_tags[ i ]; + for (i = 0; i < conf.nb_pool_maps; i++) { + conf.pool_map[i].vlan_id = vlan_tags[i]; conf.pool_map[i].pools = (1UL << (i % num_pools)); } @@ -202,8 +205,11 @@ port_init(uint8_t port, struct rte_mempool *mbuf_pool) uint16_t queues_per_pool; uint32_t max_nb_pools; - /* The max pool number from dev_info will be used to validate the pool number specified in cmd line */ - rte_eth_dev_info_get (port, &dev_info); + /* +* The max pool number from dev_info will be used to validate the pool +* number specified in cmd line +*/ + rte_eth_dev_info_get(port, &dev_info); max_nb_pools = (uint32_t)dev_info.max_vmdq_pools; /* * We allow to process part of VMDQ pools specified by num_pools in @@ -234,7 +240,8 @@ port_init(uint8_t port, struct rte_mempool *mbuf_pool) num_pf_queues, num_pools, queues_per_pool); printf("vmdq queue base: %d pool base %d\n", vmdq_queue_base, vmdq_pool_base); - if (port >= rte_eth_dev_count()) return -1; + if (port >= rte_eth_dev_count()) + return -1; /* * Though in this example, we only receive packets from the first queue @@ -253,7 +260,7 @@ port_init(uint8_t port, struct rte_mempool *mbuf_pool) rte_eth_dev_info_get(port, &dev_info); rxconf = &dev_info.default_rxconf; rxconf->rx_drop_en = 1; - for (q = 0; q < rxRings; q ++) { + for (q = 0; q < rxRings; q++) { retval = rte_eth_rx_queue_setup(port, q, rxRingSize, rte_eth_dev_socket_id(port), rxconf, @@ -264,7 +271,7 @@ port_init(uint8_t port, struct rte_mempool *mbuf_pool) } } - for (q = 0; q < txRings; q ++) { + for (q = 0; q < txRings; q++) { retval = rte_eth_tx_queue_setup(port, q, txRingSize, rte_eth_dev_socket_id(port), NULL); @@ 
-380,7 +387,8 @@ vmdq_parse_args(int argc, char **argv) }; /* Parse command line */ - while ((opt = getopt_long(argc, argv, "p:",long_option,&option_index)) != EOF) { + while ((opt = getopt_long(argc, argv, "p:", long_option, + &option_index)) != EOF) { switch (opt) { /* portmask */ case 'p': @@ -392,7 +400,7 @@ vmdq_parse_args(int argc, char **argv) } break; case 0: - if (vmdq_parse_num_pools(optarg) == -1){ + if (vmdq_parse_num_pools(optarg) == -1) { printf("invalid number of pools\n"); vmdq_usage(prgname); return -1; @@ -405,14 +4
[dpdk-dev] [PATCH v2 1/2] lib/librte_pmd_i40e: set vlan filter fix
">> 5" rather than ">> 4" Signed-off-by: Huawei Xie --- lib/librte_pmd_i40e/i40e_ethdev.c | 7 ++- 1 file changed, 2 insertions(+), 5 deletions(-) diff --git a/lib/librte_pmd_i40e/i40e_ethdev.c b/lib/librte_pmd_i40e/i40e_ethdev.c index 5074262..c0cf3cf 100644 --- a/lib/librte_pmd_i40e/i40e_ethdev.c +++ b/lib/librte_pmd_i40e/i40e_ethdev.c @@ -4048,14 +4048,11 @@ i40e_set_vlan_filter(struct i40e_vsi *vsi, { uint32_t vid_idx, vid_bit; -#define UINT32_BIT_MASK 0x1F -#define VALID_VLAN_BIT_MASK 0xFFF /* VFTA is 32-bits size array, each element contains 32 vlan bits, Find the * element first, then find the bits it belongs to */ - vid_idx = (uint32_t) ((vlan_id & VALID_VLAN_BIT_MASK) >> - sizeof(uint32_t)); - vid_bit = (uint32_t) (1 << (vlan_id & UINT32_BIT_MASK)); + vid_idx = (uint32_t) ((vlan_id >> 5 ) & 0x7F); + vid_bit = (uint32_t) (1 << (vlan_id & 0x1F)); if (on) vsi->vfta[vid_idx] |= vid_bit; -- 1.8.1.4
[dpdk-dev] [PATCH v2 2/2] lib/librte_pmd_i40e: add I40E_VFTA_IDX and I40E_VFTA_BIT macros for VFTA related operation
Add two macros I40E_VFTA_IDX and I40E_VFTA_BIT for VFTA manipulation.
Add vlan_id check in vlan filter search and set function.

Signed-off-by: Huawei Xie
---
 lib/librte_pmd_i40e/i40e_ethdev.c | 17 ++---
 lib/librte_pmd_i40e/i40e_ethdev.h | 9 +
 2 files changed, 19 insertions(+), 7 deletions(-)

diff --git a/lib/librte_pmd_i40e/i40e_ethdev.c b/lib/librte_pmd_i40e/i40e_ethdev.c
index c0cf3cf..245460f 100644
--- a/lib/librte_pmd_i40e/i40e_ethdev.c
+++ b/lib/librte_pmd_i40e/i40e_ethdev.c
@@ -4033,8 +4033,11 @@ i40e_find_vlan_filter(struct i40e_vsi *vsi,
 {
 	uint32_t vid_idx, vid_bit;

-	vid_idx = (uint32_t) ((vlan_id >> 5) & 0x7F);
-	vid_bit = (uint32_t) (1 << (vlan_id & 0x1F));
+	if (vlan_id > ETH_VLAN_ID_MAX)
+		return 0;
+
+	vid_idx = I40E_VFTA_IDX(vlan_id);
+	vid_bit = I40E_VFTA_BIT(vlan_id);

 	if (vsi->vfta[vid_idx] & vid_bit)
 		return 1;
@@ -4048,11 +4051,11 @@ i40e_set_vlan_filter(struct i40e_vsi *vsi,
 {
 	uint32_t vid_idx, vid_bit;

-	/* VFTA is 32-bits size array, each element contains 32 vlan bits, Find the
-	 * element first, then find the bits it belongs to
-	 */
-	vid_idx = (uint32_t) ((vlan_id >> 5 ) & 0x7F);
-	vid_bit = (uint32_t) (1 << (vlan_id & 0x1F));
+	if (vlan_id > ETH_VLAN_ID_MAX)
+		return;
+
+	vid_idx = I40E_VFTA_IDX(vlan_id);
+	vid_bit = I40E_VFTA_BIT(vlan_id);

 	if (on)
 		vsi->vfta[vid_idx] |= vid_bit;
diff --git a/lib/librte_pmd_i40e/i40e_ethdev.h b/lib/librte_pmd_i40e/i40e_ethdev.h
index 96361c2..4f2c16a 100644
--- a/lib/librte_pmd_i40e/i40e_ethdev.h
+++ b/lib/librte_pmd_i40e/i40e_ethdev.h
@@ -50,6 +50,15 @@
 #define I40E_DEFAULT_QP_NUM_FDIR 64
 #define I40E_UINT32_BIT_SIZE	(CHAR_BIT * sizeof(uint32_t))
 #define I40E_VFTA_SIZE		(4096 / I40E_UINT32_BIT_SIZE)
+/*
+ * vlan_id is a 12 bit number.
+ * The VFTA array is actually a 4096 bit array, 128 of 32bit elements.
+ * 2^5 = 32. The val of lower 5 bits specifies the bit in the 32bit element.
+ * The higher 7 bit val specifies VFTA array index.
+ */
+#define I40E_VFTA_BIT(vlan_id)	(1 << ((vlan_id) & 0x1F))
+#define I40E_VFTA_IDX(vlan_id)	((vlan_id) >> 5)
+
 /* Default TC traffic in case DCB is not enabled */
 #define I40E_DEFAULT_TCMAP	0x1
--
1.8.1.4
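A quick standalone sanity check of the macros (illustrative only, mirroring
the definitions above; it uses 1u rather than 1 to avoid a signed shift at
bit 31) confirms that the 4096 VLAN IDs map one-to-one onto the 128 x 32 bits
of the table:

#include <assert.h>
#include <limits.h>
#include <stdint.h>

#define I40E_UINT32_BIT_SIZE	(CHAR_BIT * sizeof(uint32_t))
#define I40E_VFTA_SIZE		(4096 / I40E_UINT32_BIT_SIZE)
#define I40E_VFTA_BIT(vlan_id)	(1u << ((vlan_id) & 0x1F))
#define I40E_VFTA_IDX(vlan_id)	((vlan_id) >> 5)

int main(void)
{
	uint32_t vfta[I40E_VFTA_SIZE] = {0};
	unsigned int i;

	for (i = 0; i < 4096; i++) {
		assert(I40E_VFTA_IDX(i) < I40E_VFTA_SIZE);
		vfta[I40E_VFTA_IDX(i)] |= I40E_VFTA_BIT(i);
	}
	for (i = 0; i < I40E_VFTA_SIZE; i++)
		assert(vfta[i] == 0xFFFFFFFFu); /* every bit set exactly once */
	return 0;
}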
[dpdk-dev] [PATCH v2 0/2] lib/librte_pmd_i40e: set vlan filter fix
This patchset fixes the "set vlan filter" issue.

v2 changes:
* add two macros I40E_VFTA_IDX and I40E_VFTA_BIT for VFTA array operation.

Huawei Xie (2):
  vlan id set fix
  add I40E_VFTA_IDX and I40E_VFTA_BIT macros for VFTA related operation

 lib/librte_pmd_i40e/i40e_ethdev.c | 20 ++--
 lib/librte_pmd_i40e/i40e_ethdev.h | 9 +
 2 files changed, 19 insertions(+), 10 deletions(-)

--
1.8.1.4
[dpdk-dev] vmware vmxnet3-usermap AND DPDK VMXNET3 PMD
I wasn't sure about why anyone would "avoid using UIO just for Virtio even
though other devices need uio/vfio in DPDK." I mean, I haven't seen anyone
using it that way, and performance-wise they both are similar. But anyway,
here are the major reasons we had the Intel versions:

1. Intel Virtio-PMD version
- was elegantly integrated in the DPDK code tree and, to keep consistency, it
uses the UIO/VFIO framework just like any other PMD in DPDK/lib

2. Intel VMXNET3-PMD
- does not depend on any other module (i.e. the vmxnet3 plugin depends on the
vmxnet3-usermap.ko module)
- was elegantly integrated in the DPDK code tree and, to keep consistency, it
uses the UIO/VFIO framework just like any other PMD in DPDK/lib
- is the version VMware contributes to today, as they think it's a better
idea, and they would provide ESXi support for it if they need to

Thanks,
Rashmin

-Original Message-
From: Thomas Monjalon [mailto:thomas.monja...@6wind.com]
Sent: Sunday, November 09, 2014 4:18 PM
To: dev at dpdk.org
Cc: Patel, Rashmin N; Aziz Hajee
Subject: Re: [dpdk-dev] vmware vmxnet3-usermap AND DPDK VMXNET3 PMD

Hi,

2014-11-07 16:53, Patel, Rashmin N:
> Yes, you're right DPDK VMXNET3-PMD in /lib/librte_pmd_vmxnet3 does not
> support mbuf chaining today. But it's a standalone bsd driver just
> like any other pmd in that directory, it does not need vmxnet3-usermap.ko
> module.
>
> Now there is another vmxnet3 solution in a separate branch as a
> plugin, which must have vmxnet3-usermap.ko linux module(1), and a user
> space interface piece(2) to tie it to any DPDK application in the main
> branch.
> (1) and (2) makes the solution which is known as vmxnet3-plugin. It's
> been there for a long time just like virtio-plugin, I don't know who
> uses it, but community can *reply* here if there is still any need of
> a separate solution that way.

Coming back to last year, 6WIND developed some PMDs for virtio and vmxnet3.
Later, Intel developed their own version using the uio framework.
The versions in the main repository are the Intel ones, whereas the original
ones from 6WIND are released as extensions.
For completeness, it must be noted that Brocade worked on their own approach
to vmxnet3 and contributed to the virtio PMD.

It's now time to merge all these implementations.
The 6WIND implementations show that it's possible to avoid the uio framework.
The virtio-net-pmd uses port access granted by iopl().
The vmxnet3-usermap reuses VMware's kernel module with a special mode for
memory mapping. It was a pre-bifurcated logic.

> I'm in favor of consolidating all those versions into one elegant
> solution by grabbing the best features from all of them and maintaining
> one copy. I'm sure that developers contributing from VMware would also
> support that idea, because it then makes it easy to maintain, debug and
> fix bugs, and most importantly avoids such confusion in the future.

> -Original Message-
> From: dev [mailto:dev-bounces at dpdk.org] On Behalf Of Aziz Hajee
> Sent: Thursday, November 06, 2014 5:47 PM
> To: dev at dpdk.org
> Subject: [dpdk-dev] vmware vmxnet3-usermap AND DPDK VMXNET3 PMD
>
> I am using the dpdk1.6.0r1
> I could not find a complete clarification, sorry if missed.
> VMXNET3 PMD
>
> I have enabled the VMXNET3 PMD in the dpdk.
> # Compile burst-oriented VMXNET3 PMD driver #
>
> CONFIG_RTE_LIBRTE_VMXNET3_PMD=y
> CONFIG_RTE_LIBRTE_VMXNET3_DEBUG_INIT=y
> CONFIG_RTE_LIBRTE_VMXNET3_DEBUG_RX=n
> CONFIG_RTE_LIBRTE_VMXNET3_DEBUG_TX=n
> CONFIG_RTE_LIBRTE_VMXNET3_DEBUG_TX_FREE=n
>
> The Intel DPDK VMXNET3 PMD driver does not support mbuf chaining, and I have
> to set NOMULTSEGS for the vmxnet3 interface init to succeed:
> tx_conf.txq_flags = ETH_TXQ_FLAGS_NOMULTSEGS
> Is there a later version of DPDK that supports multiseg for the dpdk
> VMXNET3 PMD?
>
> vmware vmxnet3-usermap AND DPDK VMXNET3 PMD
> =
> Is the vmxnet3-usermap.ko module driver also needed? (It appears that I
> need it; otherwise the EAL initialisation fails.)
> sudo insmod ./vmxnet3-usermap.ko enable_shm=2,2 num_rqs=1,1
> num_rxds=2048 num_txds=2048
>
> I do not understand: if the VMXNET3 PMD is there, what is the purpose of
> vmxnet3-usermap.ko?
>
> From some responses I saw that the following ifdef RTE_EAL_UNBIND_PORTS
> also needs to be removed in lib/librte_pmd_vmxnet3/vmxnet3_ethdev.c?
> {
> .name = "rte_vmxnet3_pmd",
> .id_table = pci_id_vmxnet3_map,
> -#ifdef RTE_EAL_UNBIND_PORTS
> +// #ifdef RTE_EAL_UNBIND_PORTS
> .drv_flags = RTE_PCI_DRV_NEED_IGB_UIO,
> -#endif
> +// #endif
> },
> .eth_dev_init = eth_vmxnet3_dev_init,
> .dev_private_size = sizeof(struct vmxnet3_adapter),
>
> thanks,
> -aziz
[dpdk-dev] [PATCH v2 0/2] examples/vmdq: support new VMDQ API
> -Original Message- > From: dev [mailto:dev-bounces at dpdk.org] On Behalf Of Huawei Xie > Sent: Monday, November 10, 2014 8:30 AM > To: dev at dpdk.org > Subject: [dpdk-dev] [PATCH v2 0/2] examples/vmdq: support new VMDQ > API > > This patch supports new VMDQ API in vmdq example. > > v2 changes: > * code rebase > * allow app to specify num_pools different with max_nb_pools > * fix serious cs issues > > > Huawei Xie (2): > support new VMDQ API in vmdq example > fix cs issues in vmdq example > > examples/vmdq/main.c | 233 ++ > - > 1 file changed, 139 insertions(+), 94 deletions(-) > > -- > 1.8.1.4 Acked-by : Jing Chen
[dpdk-dev] [PATCH v2 2/2] lib/librte_pmd_i40e: add I40E_VFTA_IDX and I40E_VFTA_BIT macros for VFTA related operation
Hi Huawei > -Original Message- > From: dev [mailto:dev-bounces at dpdk.org] On Behalf Of Huawei Xie > Sent: Monday, November 10, 2014 10:46 AM > To: dev at dpdk.org > Subject: [dpdk-dev] [PATCH v2 2/2] lib/librte_pmd_i40e: add I40E_VFTA_IDX and > I40E_VFTA_BIT macros for VFTA related operation > > Add two macros I40E_VFTA_IDX and I40E_VFTA_BIT for VFTA manipulation. > Add vlan_id check in vlan filter search and set function. > > Signed-off-by: Huawei Xie > --- > lib/librte_pmd_i40e/i40e_ethdev.c | 17 ++--- > lib/librte_pmd_i40e/i40e_ethdev.h | 9 + > 2 files changed, 19 insertions(+), 7 deletions(-) > > diff --git a/lib/librte_pmd_i40e/i40e_ethdev.c > b/lib/librte_pmd_i40e/i40e_ethdev.c > index c0cf3cf..245460f 100644 > --- a/lib/librte_pmd_i40e/i40e_ethdev.c > +++ b/lib/librte_pmd_i40e/i40e_ethdev.c > @@ -4033,8 +4033,11 @@ i40e_find_vlan_filter(struct i40e_vsi *vsi, { > uint32_t vid_idx, vid_bit; > > - vid_idx = (uint32_t) ((vlan_id >> 5) & 0x7F); > - vid_bit = (uint32_t) (1 << (vlan_id & 0x1F)); > + if (vlan_id > ETH_VLAN_ID_MAX) > + return 0; > + > + vid_idx = I40E_VFTA_IDX(vlan_id); > + vid_bit = I40E_VFTA_BIT(vlan_id); > > if (vsi->vfta[vid_idx] & vid_bit) > return 1; > @@ -4048,11 +4051,11 @@ i40e_set_vlan_filter(struct i40e_vsi *vsi, { > uint32_t vid_idx, vid_bit; > > - /* VFTA is 32-bits size array, each element contains 32 vlan bits, Find > the > - * element first, then find the bits it belongs to > - */ > - vid_idx = (uint32_t) ((vlan_id >> 5 ) & 0x7F); > - vid_bit = (uint32_t) (1 << (vlan_id & 0x1F)); > + if (vlan_id > ETH_VLAN_ID_MAX) > + return; > + > + vid_idx = I40E_VFTA_IDX(vlan_id); > + vid_bit = I40E_VFTA_BIT(vlan_id); > > if (on) > vsi->vfta[vid_idx] |= vid_bit; > diff --git a/lib/librte_pmd_i40e/i40e_ethdev.h > b/lib/librte_pmd_i40e/i40e_ethdev.h > index 96361c2..4f2c16a 100644 > --- a/lib/librte_pmd_i40e/i40e_ethdev.h > +++ b/lib/librte_pmd_i40e/i40e_ethdev.h > @@ -50,6 +50,15 @@ > #define I40E_DEFAULT_QP_NUM_FDIR 64 > #define I40E_UINT32_BIT_SIZE (CHAR_BIT * sizeof(uint32_t)) > #define I40E_VFTA_SIZE(4096 / I40E_UINT32_BIT_SIZE) > +/* > + * vlan_id is a 12 bit number. > + * The VFTA array is actually a 4096 bit array, 128 of 32bit elements. > + * 2^5 = 32. The val of lower 5 bits specifies the bit in the 32bit element. > + * The higher 7 bit val specifies VFTA array index. > + */ > +#define I40E_VFTA_BIT(vlan_id)(1 << ((vlan_id) & 0x1F)) > +#define I40E_VFTA_IDX(vlan_id)((vlan_id) >> 5) Why not define the 0x1f and 5 more meaningful in macros? Why define it in this header file? It seems that only used in i40e_ethdev.c. > + > /* Default TC traffic in case DCB is not enabled */ > #define I40E_DEFAULT_TCMAP0x1 > > -- > 1.8.1.4 Regards, Helin
[dpdk-dev] [PATCH v2 1/2] lib/librte_pmd_i40e: set vlan filter fix
> -Original Message- > From: dev [mailto:dev-bounces at dpdk.org] On Behalf Of Huawei Xie > Sent: Monday, November 10, 2014 10:46 AM > To: dev at dpdk.org > Subject: [dpdk-dev] [PATCH v2 1/2] lib/librte_pmd_i40e: set vlan filter fix > > ">> 5" rather than ">> 4" > > Signed-off-by: Huawei Xie > --- > lib/librte_pmd_i40e/i40e_ethdev.c | 7 ++- > 1 file changed, 2 insertions(+), 5 deletions(-) > > diff --git a/lib/librte_pmd_i40e/i40e_ethdev.c > b/lib/librte_pmd_i40e/i40e_ethdev.c > index 5074262..c0cf3cf 100644 > --- a/lib/librte_pmd_i40e/i40e_ethdev.c > +++ b/lib/librte_pmd_i40e/i40e_ethdev.c > @@ -4048,14 +4048,11 @@ i40e_set_vlan_filter(struct i40e_vsi *vsi, { > uint32_t vid_idx, vid_bit; > > -#define UINT32_BIT_MASK 0x1F > -#define VALID_VLAN_BIT_MASK 0xFFF > /* VFTA is 32-bits size array, each element contains 32 vlan bits, Find > the >* element first, then find the bits it belongs to >*/ > - vid_idx = (uint32_t) ((vlan_id & VALID_VLAN_BIT_MASK) >> > - sizeof(uint32_t)); > - vid_bit = (uint32_t) (1 << (vlan_id & UINT32_BIT_MASK)); > + vid_idx = (uint32_t) ((vlan_id >> 5 ) & 0x7F); > + vid_bit = (uint32_t) (1 << (vlan_id & 0x1F)); I don't understand why remove macros and use numeric instead? > > if (on) > vsi->vfta[vid_idx] |= vid_bit; > -- > 1.8.1.4 Regards, Helin
[dpdk-dev] [RFC PATCH 7/7] lib/librte_vhost: Add vhost-user implementation
Hi XIe, (2014/11/08 6:25), Xie, Huawei wrote: > How about using client/server model and select/poll event handing mechanism > rather than poll? > The polling could cause periodic jitter. > Sounds nice. I will change like your comment. Thanks, Tetsuya
[dpdk-dev] [RFC PATCH 3/7] lib/librte_vhost: Add an abstraction layer tointerpret messages
Hi Xie,

(2014/11/08 5:43), Xie, Huawei wrote:
>> -struct vhost_net_device_ops const *get_virtio_net_callbacks(void);
>> +struct vhost_net_device_ops const *get_virtio_net_callbacks(
>> +	vhost_driver_type_t type);
> Tetsuya:
> I feel currently it is better we still keep the common
> get_virtio_net_callbacks().
> For the message flow from control layer 1 (cuse ioctl or user sock message
> recv/xmit) ---> cuse/user local message handling layer 2 ---> common virtio
> message handling layer 3:
> Layer 1 and layer 2 belong to one module. It is that module's choice whether
> to implement callbacks between internal layer1 and layer2. We don't need to
> force that.
> Besides, even if that module wants to define the ops between layer 1 and
> layer2, the interface could be different between cuse/user.
> Refer to the following code for user:
>
> vhost-user-server.c:
> case VHOST_USER_SET_MEM_TABLE:
> user_set_mem_table(ctx, &msg)
>
> virtio-net-user.c:
> user_set_mem_table(struct vhost_device_ctx ctx, struct VhostUserMsg *pmsg)
> {
>
>
>
> ops->set_mem_table(ctx, regions, memory.nregions);
> }
>
> I may misunderstand what you say, please let me know in the case.

I guess it's difficult to remove 'vhost_driver_type_t' from
'get_virtio_net_callbacks()'.
In the original vhost example code, there are 2 layers related with
initialization, as you mentioned.
+ Layer1: cuse ioctl handling layer.
+ Layer2: vhost-cuse( = vhost-net) message handling layer.

Layer1 needs function pointers to call Layer2 functions.
'get_virtio_net_callbacks()' is used for that purpose.

My RFC is based on the above, but Layer1/2 are abstracted to hide vhost-cuse
and vhost-user.
+ Layer1: device control abstraction layer.
-- Layer1-a: cuse ioctl handling layer.
-- Layer1-b: unix domain socket handling layer.
+ Layer2: message handling abstraction layer.
-- Layer2-a: vhost-cuse(vhost-net) message handling layer.
-- Layer2-b: vhost-user message handling layer.

Still Layer1 needs function pointers of Layer2.
So, anyway, we still need to implement 'get_virtio_net_callbacks()'.

Also, as you mentioned, function definitions and behavior are different
between Layer2-a and Layer2-b, like 'user_set_mem_table()'.
Because of this, 'get_virtio_net_callbacks()' needs to return the correct
function pointers to Layer1.
So I guess 'get_virtio_net_callbacks()' needs 'vhost_driver_type_t' to know
which function pointers are needed by Layer1.

If someone wants to implement a new vhost-backend, of course they can
implement Layer2 and Layer1 together.
In that case, they don't need to call 'get_virtio_net_callbacks()'.
Also they can reuse an existing Layer2 implementation by calling
'get_virtio_net_callbacks()' with an existing driver type, or they can
implement a new Layer2 implementation for the new vhost-backend.

BTW, the name of 'vhost_driver_type_t' is redundant; I will change the name.

Tetsuya
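To make the structure under discussion concrete, a hedged sketch of how
'get_virtio_net_callbacks()' could dispatch on the driver type. The real
lib/librte_vhost types differ (for instance, vhost_device_ctx is passed by
value there); the enum values and handler signatures here are placeholders.

#include <stdint.h>

struct vhost_device_ctx; /* opaque in this sketch */

typedef enum {
	VHOST_DRIVER_CUSE, /* Layer2-a: vhost-cuse message handling */
	VHOST_DRIVER_USER, /* Layer2-b: vhost-user message handling */
} vhost_driver_type_t;

struct vhost_net_device_ops {
	int (*new_device)(struct vhost_device_ctx *);
	void (*destroy_device)(struct vhost_device_ctx *);
	int (*set_mem_table)(struct vhost_device_ctx *, const void *, uint32_t);
	/* ... the remaining vhost message handlers ... */
};

static const struct vhost_net_device_ops cuse_ops = { 0 /* Layer2-a handlers */ };
static const struct vhost_net_device_ops user_ops = { 0 /* Layer2-b handlers */ };

const struct vhost_net_device_ops *
get_virtio_net_callbacks(vhost_driver_type_t type)
{
	return (type == VHOST_DRIVER_USER) ? &user_ops : &cuse_ops;
}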
[dpdk-dev] [PATCH v8 10/10] app/testpmd:test VxLAN Tx checksum offload
> -Original Message-
> From: Olivier MATZ [mailto:olivier.matz at 6wind.com]
> Sent: Wednesday, November 5, 2014 6:28 PM
> To: Liu, Jijiang
> Cc: dev at dpdk.org
> Subject: Re: [dpdk-dev] [PATCH v8 10/10] app/testpmd:test VxLAN Tx checksum
> offload
>
> Hi Jijiang,
>
> Thank you for your answer. Please find some comments below.
>
> Another thing is surprising me.
> - if PKT_TX_VXLAN_CKSUM is not set (legacy use case), then the
>   driver uses l2_len and l3_len to offload inner IP/UDP/TCP checksums.

If the flag is not set, that implies it is not a VXLAN packet, and TX
checksum offload is done as for a regular packet.

> - if PKT_TX_VXLAN_CKSUM is set, then the driver has to use
>   inner_l{23}_len instead of l{23}_len for the same operation.

Your understanding is not fully correct. The l{23}_len is still used for TX
checksum offload; please refer to the i40e_txd_enable_checksum()
implementation.

> Adding PKT_TX_VXLAN_CKSUM changes the semantic of l2_len and l3_len.
> To fix this, I suggest to remove the new fields inner_l{23}_len then add
> outer_l{23}_len instead. Therefore, the semantic of l2_len and l3_len would
> not change, and a driver would always use the same field for a specific
> offload.

Oh...

> For my TSO development, I will follow the current semantic.

For TSO, you still can use l{2,3}_len. When I develop tunneling TSO, I will
use inner_l3_len/inner_l4_len.
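For readers following along, a rough sketch of the field usage being debated.
This is heavily hedged: the exact semantics are the very subject of this
thread, and the flag and field names follow the v8 patchset under discussion,
not any final API.

#include <rte_mbuf.h>
#include <rte_ether.h>
#include <rte_ip.h>

/* Sketch only: one reading of the v8 semantics for a VXLAN-encapsulated
 * IPv4/UDP packet -- l2_len/l3_len describe the outer headers, and
 * inner_l2_len/inner_l3_len the encapsulated ones. */
static void
set_vxlan_tx_offload(struct rte_mbuf *m)
{
	m->ol_flags |= PKT_TX_VXLAN_CKSUM;
	m->l2_len = sizeof(struct ether_hdr);       /* outer Ethernet */
	m->l3_len = sizeof(struct ipv4_hdr);        /* outer IPv4 */
	m->inner_l2_len = sizeof(struct ether_hdr); /* inner Ethernet */
	m->inner_l3_len = sizeof(struct ipv4_hdr);  /* inner IPv4 */
}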
[dpdk-dev] [PATCH v2 1/2] lib/librte_pmd_i40e: set vlan filter fix
> -Original Message- > From: Zhang, Helin > Sent: Sunday, November 09, 2014 10:09 PM > To: Xie, Huawei; dev at dpdk.org > Subject: RE: [dpdk-dev] [PATCH v2 1/2] lib/librte_pmd_i40e: set vlan filter > fix > > > > > -Original Message- > > From: dev [mailto:dev-bounces at dpdk.org] On Behalf Of Huawei Xie > > Sent: Monday, November 10, 2014 10:46 AM > > To: dev at dpdk.org > > Subject: [dpdk-dev] [PATCH v2 1/2] lib/librte_pmd_i40e: set vlan filter fix > > > > ">> 5" rather than ">> 4" > > > > Signed-off-by: Huawei Xie > > --- > > lib/librte_pmd_i40e/i40e_ethdev.c | 7 ++- > > 1 file changed, 2 insertions(+), 5 deletions(-) > > > > diff --git a/lib/librte_pmd_i40e/i40e_ethdev.c > > b/lib/librte_pmd_i40e/i40e_ethdev.c > > index 5074262..c0cf3cf 100644 > > --- a/lib/librte_pmd_i40e/i40e_ethdev.c > > +++ b/lib/librte_pmd_i40e/i40e_ethdev.c > > @@ -4048,14 +4048,11 @@ i40e_set_vlan_filter(struct i40e_vsi *vsi, { > > uint32_t vid_idx, vid_bit; > > > > -#define UINT32_BIT_MASK 0x1F > > -#define VALID_VLAN_BIT_MASK 0xFFF > > /* VFTA is 32-bits size array, each element contains 32 vlan bits, Find > > the > > * element first, then find the bits it belongs to > > */ > > - vid_idx = (uint32_t) ((vlan_id & VALID_VLAN_BIT_MASK) >> > > - sizeof(uint32_t)); > > - vid_bit = (uint32_t) (1 << (vlan_id & UINT32_BIT_MASK)); > > + vid_idx = (uint32_t) ((vlan_id >> 5 ) & 0x7F); > > + vid_bit = (uint32_t) (1 << (vlan_id & 0x1F)); > I don't understand why remove macros and use numeric instead? Those macros are wrongly defined. Correct macros are defined in second patch. > > > > > if (on) > > vsi->vfta[vid_idx] |= vid_bit; > > -- > > 1.8.1.4 > > Regards, > Helin
[dpdk-dev] [PATCH v2 2/2] lib/librte_pmd_i40e: add I40E_VFTA_IDX and I40E_VFTA_BIT macros for VFTA related operation
> -Original Message- > From: Zhang, Helin > Sent: Sunday, November 09, 2014 10:08 PM > To: Xie, Huawei; dev at dpdk.org > Subject: RE: [dpdk-dev] [PATCH v2 2/2] lib/librte_pmd_i40e: add I40E_VFTA_IDX > and I40E_VFTA_BIT macros for VFTA related operation > > Hi Huawei > > > -Original Message- > > From: dev [mailto:dev-bounces at dpdk.org] On Behalf Of Huawei Xie > > Sent: Monday, November 10, 2014 10:46 AM > > To: dev at dpdk.org > > Subject: [dpdk-dev] [PATCH v2 2/2] lib/librte_pmd_i40e: add I40E_VFTA_IDX > and > > I40E_VFTA_BIT macros for VFTA related operation > > > > Add two macros I40E_VFTA_IDX and I40E_VFTA_BIT for VFTA manipulation. > > Add vlan_id check in vlan filter search and set function. > > > > Signed-off-by: Huawei Xie > > --- > > lib/librte_pmd_i40e/i40e_ethdev.c | 17 ++--- > > lib/librte_pmd_i40e/i40e_ethdev.h | 9 + > > 2 files changed, 19 insertions(+), 7 deletions(-) > > > > diff --git a/lib/librte_pmd_i40e/i40e_ethdev.c > > b/lib/librte_pmd_i40e/i40e_ethdev.c > > index c0cf3cf..245460f 100644 > > --- a/lib/librte_pmd_i40e/i40e_ethdev.c > > +++ b/lib/librte_pmd_i40e/i40e_ethdev.c > > @@ -4033,8 +4033,11 @@ i40e_find_vlan_filter(struct i40e_vsi *vsi, { > > uint32_t vid_idx, vid_bit; > > > > - vid_idx = (uint32_t) ((vlan_id >> 5) & 0x7F); > > - vid_bit = (uint32_t) (1 << (vlan_id & 0x1F)); > > + if (vlan_id > ETH_VLAN_ID_MAX) > > + return 0; > > + > > + vid_idx = I40E_VFTA_IDX(vlan_id); > > + vid_bit = I40E_VFTA_BIT(vlan_id); > > > > if (vsi->vfta[vid_idx] & vid_bit) > > return 1; > > @@ -4048,11 +4051,11 @@ i40e_set_vlan_filter(struct i40e_vsi *vsi, { > > uint32_t vid_idx, vid_bit; > > > > - /* VFTA is 32-bits size array, each element contains 32 vlan bits, Find > > the > > -* element first, then find the bits it belongs to > > -*/ > > - vid_idx = (uint32_t) ((vlan_id >> 5 ) & 0x7F); > > - vid_bit = (uint32_t) (1 << (vlan_id & 0x1F)); > > + if (vlan_id > ETH_VLAN_ID_MAX) > > + return; > > + > > + vid_idx = I40E_VFTA_IDX(vlan_id); > > + vid_bit = I40E_VFTA_BIT(vlan_id); > > > > if (on) > > vsi->vfta[vid_idx] |= vid_bit; > > diff --git a/lib/librte_pmd_i40e/i40e_ethdev.h > > b/lib/librte_pmd_i40e/i40e_ethdev.h > > index 96361c2..4f2c16a 100644 > > --- a/lib/librte_pmd_i40e/i40e_ethdev.h > > +++ b/lib/librte_pmd_i40e/i40e_ethdev.h > > @@ -50,6 +50,15 @@ > > #define I40E_DEFAULT_QP_NUM_FDIR 64 > > #define I40E_UINT32_BIT_SIZE (CHAR_BIT * sizeof(uint32_t)) > > #define I40E_VFTA_SIZE(4096 / I40E_UINT32_BIT_SIZE) > > +/* > > + * vlan_id is a 12 bit number. > > + * The VFTA array is actually a 4096 bit array, 128 of 32bit elements. > > + * 2^5 = 32. The val of lower 5 bits specifies the bit in the 32bit > > element. > > + * The higher 7 bit val specifies VFTA array index. > > + */ > > +#define I40E_VFTA_BIT(vlan_id)(1 << ((vlan_id) & 0x1F)) > > +#define I40E_VFTA_IDX(vlan_id)((vlan_id) >> 5) > Why not define the 0x1f and 5 more meaningful in macros? > > Why define it in this header file? It seems that only used in i40e_ethdev.c. > It is a macro for i40e common functionality, as I40E_VFTA_SIZE macro. > > + > > /* Default TC traffic in case DCB is not enabled */ > > #define I40E_DEFAULT_TCMAP0x1 > > > > -- > > 1.8.1.4 > > Regards, > Helin
[dpdk-dev] [RFC PATCH 3/7] lib/librte_vhost: Add an abstraction layer tointerpret messages
> -Original Message-
> From: Tetsuya Mukawa [mailto:mukawa at igel.co.jp]
> Sent: Sunday, November 09, 2014 10:13 PM
> To: Xie, Huawei; dev at dpdk.org
> Cc: nakajima.yoshihiro at lab.ntt.co.jp; masutani.hitoshi at lab.ntt.co.jp
> Subject: Re: [dpdk-dev] [RFC PATCH 3/7] lib/librte_vhost: Add an abstraction
> layer tointerpret messages
>
> Hi Xie,
>
> (2014/11/08 5:43), Xie, Huawei wrote:
> >> -struct vhost_net_device_ops const *get_virtio_net_callbacks(void);
> >> +struct vhost_net_device_ops const *get_virtio_net_callbacks(
> >> +	vhost_driver_type_t type);
> > Tetsuya:
> > I feel currently it is better we still keep the common
> > get_virtio_net_callbacks().
> > For the message flow from control layer 1 (cuse ioctl or user sock message
> > recv/xmit) ---> cuse/user local message handling layer 2 ---> common
> > virtio message handling layer 3:
> > Layer 1 and layer 2 belong to one module. It is that module's choice
> > whether to implement callbacks between internal layer1 and layer2.
> > We don't need to force that.
> > Besides, even if that module wants to define the ops between layer 1 and
> > layer2, the interface could be different between cuse/user.
> > Refer to the following code for user:
> >
> > vhost-user-server.c:
> > case VHOST_USER_SET_MEM_TABLE:
> > user_set_mem_table(ctx, &msg)
> >
> > virtio-net-user.c:
> > user_set_mem_table(struct vhost_device_ctx ctx, struct VhostUserMsg *pmsg)
> > {
> >
> >
> >
> > ops->set_mem_table(ctx, regions, memory.nregions);
> > }
> >
> > I may misunderstand what you say, please let me know in the case.
> I guess it's difficult to remove 'vhost_driver_type_t' from
> 'get_virtio_net_callbacks()'.
> In the original vhost example code, there are 2 layers related with
> initialization, as you mentioned.
> + Layer1: cuse ioctl handling layer.
> + Layer2: vhost-cuse( = vhost-net) message handling layer.
>
> Layer1 needs function pointers to call Layer2 functions.
> 'get_virtio_net_callbacks()' is used for that purpose.
>
> My RFC is based on the above, but Layer1/2 are abstracted to hide vhost-cuse
> and vhost-user.
> + Layer1: device control abstraction layer.
> -- Layer1-a: cuse ioctl handling layer.
> -- Layer1-b: unix domain socket handling layer.
> + Layer2: message handling abstraction layer.
> -- Layer2-a: vhost-cuse(vhost-net) message handling layer.
> -- Layer2-b: vhost-user message handling layer.
>
> Still Layer1 needs function pointers of Layer2.
> So, anyway, we still need to implement 'get_virtio_net_callbacks()'.
>
> Also, as you mentioned, function definitions and behavior are different
> between Layer2-a and Layer2-b, like 'user_set_mem_table()'.
> Because of this, 'get_virtio_net_callbacks()' needs to return the correct
> function pointers to Layer1.
> So I guess 'get_virtio_net_callbacks()' needs 'vhost_driver_type_t' to
> know which function pointers are needed by Layer1.

Here all layer 2 implementations are required to return the same type of
vhost_net_device_ops function pointers to layer 1, so layer 1 needs to do
some kind of preprocessing of its message, or wrap some private message ctx
in, like vhost_device_ctx, and then pass the message to layer 2.
But as we have a more common layer 3, the virtio-net layer, how about we put
the common message handlers in the virtio net layer as much as possible, and
each layer 2 only does the local message preprocessing and then passes a
common message format to layer 3?
I think we at least need to define function pointers between layer 2 and
layer 3. Layer 1 and layer 2 actually are sub-layers of the same layer.
It is that layer's (cuse/user) implementation's choice whether to provide an
interface between them, and the interface could be different in terms of
function prototype.
Let us say we are to implement a new vhost: I only care about the common
interface provided by layer 3. I don't want to register another set of
callbacks for my driver which are used by myself only.
Let us think more about this.

> If someone wants to implement a new vhost-backend, of course they can
> implement Layer2 and Layer1 together.
> In that case, they don't need to call 'get_virtio_net_callbacks()'.
> Also they can reuse an existing Layer2 implementation by calling
> 'get_virtio_net_callbacks()' with an existing driver type, or they can
> implement a new Layer2 implementation for the new vhost-backend.
>
> BTW, the name of 'vhost_driver_type_t' is redundant; I will change the name.
>
> Tetsuya
[dpdk-dev] White listing a virtual device
Hi Nicolas,

> Thanks for your reply. The -w option is the same as --pci-whitelist
> mentioned in my first email. Declaring a virtual device with --vdev
> means that I want to use it, but there doesn't seem to be a way to say
> that I want to use only that device. Clearly the white list option is
> the way to specify this, but if virtual devices are excluded from
> -w/--pci-whitelist you can't only white list the virtual devices.
>
> I want to be able to have the same command line arguments across several
> systems under test without having to know where the physical devices are
> (to black list them).
>
> My issue is not that I don't want to black list the physical devices,
> it's just that I want to white list the virtual ones. I don't see why
> that option is not available.

What about using the --no-pci option? It would blacklist all physical
devices (as PCI devices are the only ones supported today).

Regards,
Olivier
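For example (an illustrative command line, not from the thread; the pcap
vdev and its iface argument are just one possible virtual device):

./testpmd -c 0x3 -n 4 --no-pci --vdev=eth_pcap0,iface=eth0 -- -i

With --no-pci the EAL skips the PCI scan entirely, so only the --vdev
devices are probed -- effectively a whitelist containing only virtual
devices.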
[dpdk-dev] [RFC PATCH 7/7] lib/librte_vhost: Add vhost-user implementation
Tetsuya:
I already did this, :), and will publish the code for review after I do some
cleanup next week.

> -Original Message-
> From: Tetsuya Mukawa [mailto:mukawa at igel.co.jp]
> Sent: Sunday, November 09, 2014 10:11 PM
> To: Xie, Huawei; dev at dpdk.org
> Cc: nakajima.yoshihiro at lab.ntt.co.jp; masutani.hitoshi at lab.ntt.co.jp
> Subject: Re: [dpdk-dev] [RFC PATCH 7/7] lib/librte_vhost: Add vhost-user
> implementation
>
> Hi Xie,
>
> (2014/11/08 6:25), Xie, Huawei wrote:
> > How about using a client/server model and a select/poll event handling
> > mechanism rather than polling?
> > The polling could cause periodic jitter.
>
> Sounds nice. I will change it as per your comment.
>
> Thanks,
> Tetsuya
[dpdk-dev] [PATCH v2 0/2] lib/librte_pmd_i40e: set vlan filter fix
> -Original Message- > From: dev [mailto:dev-bounces at dpdk.org] On Behalf Of Huawei Xie > Sent: Monday, November 10, 2014 10:46 AM > To: dev at dpdk.org > Subject: [dpdk-dev] [PATCH v2 0/2] lib/librte_pmd_i40e: set vlan filter fix > > This patchset fixes "set vlan filter" issue. > > v2 changes: > * add two macros I40E_VFTA_IDX and I40E_VFTA_BIT for VFTA array > operation. > > Huawei Xie (2): > vlan id set fix > add I40E_VFTA_IDX and I40E_VFTA_BIT macros for VFTA related operation > > lib/librte_pmd_i40e/i40e_ethdev.c | 20 ++-- > lib/librte_pmd_i40e/i40e_ethdev.h | 9 + > 2 files changed, 19 insertions(+), 10 deletions(-) > > -- > 1.8.1.4 Acked-by : Jing Chen
[dpdk-dev] [RFC PATCH 3/7] lib/librte_vhost: Add an abstraction layer tointerpret messages
Hi Xie,

(2014/11/10 17:07), Xie, Huawei wrote:
> Here all layer 2 implementations are required to return the same type of
> vhost_net_device_ops function pointers to layer 1, so layer 1 needs to do
> some kind of preprocessing of its message, or wrap some private message ctx
> in, like vhost_device_ctx, and then pass the message to layer 2.
> But as we have a more common layer 3, the virtio-net layer, how about we
> put the common message handlers in the virtio net layer as much as
> possible, and each layer 2 only does the local message preprocessing and
> then passes a common message format to layer 3?
> I think we at least need to define function pointers between layer 2 and
> layer 3.
> Layer 1 and layer 2 actually are sub-layers of the same layer. It is that
> layer's (cuse/user) implementation's choice whether to provide an interface
> between them, and the interface could be different in terms of function
> prototype.
> Let us say we are to implement a new vhost: I only care about the common
> interface provided by layer 3. I don't want to register another set of
> callbacks for my driver which are used by myself only.
> Let us think more about this.

With my RFC implementation, sometimes Layer1 directly calls Layer2-a or
Layer2-b functions. It may be a bit faster, but it may not be well
abstracted, because Layer1 sometimes bypasses the virtio common layer.
Anyway, I guess it's nice to change the implementation as you mentioned.
We don't need speed during initialization. Let's take the well-abstracted
implementation.

Thanks,
Tetsuya
[dpdk-dev] [RFC PATCH 7/7] lib/librte_vhost: Add vhost-user implementation
Hi Xie,

(2014/11/10 17:18), Xie, Huawei wrote:
> Tetsuya:
> I already did this, :), and will publish the code for review after I do
> some cleanup next week.

I appreciate it. I guess your implementation assumes that all the vhost-user
functions you implemented are called by the virtio common layer. Is that
right?
If so, I will change the abstraction layer implementation by this week or
early next week.
(Please also check the email related to 'get_virtio_net_callbacks()'.)

Thanks,
Tetsuya
[dpdk-dev] [PATCH v4 00/10] VM Power Management
Hi Thomas, > Hi Alan, > > Did you make any progress in Qemu/KVM community? > We need to be sync'ed up with them to be sure we share the same goal. > I want also to avoid using a solution which doesn't fit with their plan. > Remember that we already had this problem with ivshmem which was > planned to be dropped. > > Thanks > -- > Thomas > > > 2014-10-16 15:21, Carew, Alan: > > Hi Thomas, > > > > > > However with a DPDK solution it would be possible to re-use the > message bus > > > > to pass information like device stats, application state, D-state > > > > requests > > > > etc. to the host and allow for management layer(e.g. OpenStack) to > make > > > > informed decisions. > > > > > > I think that management informations should be transmitted in a > management > > > channel. Such solution should exist in OpenStack. > > > > Perhaps it does, but this solution is not exclusive to OpenStack and just a > potential use case. > > > > > > > > > Also, the scope of adding power management to qemu/KVM would be > huge; > > > > while the easier path is not always the best and the problem of power > > > > management in VMs is both a DPDK problem (given that librte_power > only > > > > worked on the host) and a general virtualization problem that would be > > > > better solved by those with direct knowledge of Qemu/KVM > architecture > > > > and influence on the direction of the Qemu project. > > > > > > Being a huge effort is not an argument. > > > > I agree completely and was implied by what followed the conjunction. > > > > > Please check with Qemu community, they'll welcome it. > > > > > > > As it stands, the host backend is simply an example application that can > > > > be replaced by a VMM or Orchestration layer, by using Virtio-Serial it > has > > > > obvious leanings to Qemu, but even this could be easily swapped out > for > > > > XenBus, IVSHMEM, IP etc. > > > > > > > > If power management is to be eventually supported by Hypervisors > directly > > > > then we could also enable to option to switch to that environment, > currently > > > > the librte_power implementations (VM or Host) can be selected > dynamically > > > > (environment auto-detection) or explicitly via rte_power_set_env(), > adding > > > > an arbitrary number of environments is relatively easy. > > > > > > Yes, you are adding a new layer to workaround hypervisor lacks. And this > layer > > > will handle native support when it will exist. But if you implement native > > > support now, we don't need this extra layer. > > > > Indeed, but we have a solution implemented now and yes it is a > workaround, that is until Hypervisors support such functionality. It is > possible > that whatever solutions for power management present themselves in the > future may require workarounds also, us-vhost is an example of such a > workaround introduced to DPDK. > > > > > > > > > I hope this helps to clarify the approach. > > > > > > Thanks for your explanation. > > > > Thanks for the feedback. > > > > > > > > -- > > > Thomas > > > > Alan. Unfortunately, I have not yet received any feedback: http://lists.nongnu.org/archive/html/qemu-devel/2014-11/msg01103.html Alan.
[dpdk-dev] [PATCH] app, examples: remove references to drivers config
> > These references to drivers break the layering isolation between
> > application and drivers.
> >
> > Signed-off-by: Thomas Monjalon
> Acked-by: Helin Zhang
> With minor changes suggested: 'devices' -> 'device'?

Applied with suggested changes.

Thanks
--
Thomas
[dpdk-dev] [PATCH v2] librte_cmdline: FreeBSD Fix oveflow when size of command result structure is greater than BUFSIZ
When using test-pmd with flow director in FreeBSD, the application will
segfault/bus error while parsing the command line. This is due to how each
command's result structure is represented during parsing, where the offset
for each token's value is stored in a character array
(char result_buf[BUFSIZ]) in cmdline_parse()
(./lib/librte_cmdline/cmdline_parse.c).

The overflow occurs when BUFSIZ is less than the size of a command's result
structure; in this case "struct cmd_pkt_filter_result"
(app/test-pmd/cmdline.c) is 1088 bytes and BUFSIZ on FreeBSD is 1024 bytes,
as opposed to 8192 bytes on Linux.

The problem can be reproduced by running test-pmd on FreeBSD:
./testpmd -c 0x3 -n 4 -- -i --portmask=0x3 --pkt-filter-mode=perfect
And adding a filter:
add_perfect_filter 0 udp src 192.168.0.0 1024 dst 192.168.0.0 1024
flexbytes 0x800 vlan 0 queue 0 soft 0x17

This patch removes the OS dependency on BUFSIZ and instead defines and uses
a library #define CMDLINE_PARSE_RESULT_BUFSIZE 8192. Boundary checking was
added to ensure this buffer cannot overflow; an error message is produced
instead.

Suggested-by: Olivier MATZ
http://git.droids-corp.org/?p=libcmdline.git;a=commitdiff;h=b1d5b169352e57df3fc14c51ffad4b83f3e5613f

Signed-off-by: Alan Carew
---
 lib/librte_cmdline/cmdline_parse.c | 22 +++---
 lib/librte_cmdline/cmdline_parse.h | 3 +++
 2 files changed, 18 insertions(+), 7 deletions(-)

diff --git a/lib/librte_cmdline/cmdline_parse.c b/lib/librte_cmdline/cmdline_parse.c
index 940480d..f86f163 100644
--- a/lib/librte_cmdline/cmdline_parse.c
+++ b/lib/librte_cmdline/cmdline_parse.c
@@ -138,7 +138,7 @@ nb_common_chars(const char * s1, const char * s2)
  */
 static int
 match_inst(cmdline_parse_inst_t *inst, const char *buf,
-	   unsigned int nb_match_token, void * result_buf)
+	   unsigned int nb_match_token, void *result_buf, unsigned result_buf_size)
 {
 	unsigned int token_num=0;
 	cmdline_parse_token_hdr_t * token_p;
@@ -162,10 +162,18 @@ match_inst(cmdline_parse_inst_t *inst, const char *buf,
 		if ( isendofline(*buf) || iscomment(*buf) )
 			break;

-		if (result_buf)
+		if (result_buf) {
+			if (token_hdr.offset > result_buf_size) {
+				printf("Parse error(%s:%d): Token offset(%u) exceeds maximum "
+					"size(%u)\n", __FILE__, __LINE__, token_hdr.offset,
+					result_buf_size);
+				return -ENOBUFS;
+			}
+
 			n = token_hdr.ops->parse(token_p, buf,
 				(char *)result_buf + token_hdr.offset);
+		}
 		else
 			n = token_hdr.ops->parse(token_p, buf, NULL);

@@ -219,7 +227,7 @@ cmdline_parse(struct cmdline *cl, const char * buf)
 	unsigned int inst_num=0;
 	cmdline_parse_inst_t *inst;
 	const char *curbuf;
-	char result_buf[BUFSIZ];
+	char result_buf[CMDLINE_PARSE_RESULT_BUFSIZE];
 	void (*f)(void *, struct cmdline *, void *) = NULL;
 	void *data = NULL;
 	int comment = 0;
@@ -280,7 +288,7 @@ cmdline_parse(struct cmdline *cl, const char * buf)
 		debug_printf("INST %d\n", inst_num);

 		/* fully parsed */
-		tok = match_inst(inst, buf, 0, result_buf);
+		tok = match_inst(inst, buf, 0, result_buf, sizeof(result_buf));

 		if (tok > 0) /* we matched at least one token */
 			err = CMDLINE_PARSE_BAD_ARGS;

@@ -377,10 +385,10 @@ cmdline_complete(struct cmdline *cl, const char *buf, int *state,
 		inst = ctx[inst_num];
 		while (inst) {
 			/* parse the first tokens of the inst */
-			if (nb_token && match_inst(inst, buf, nb_token, NULL))
+			if (nb_token && match_inst(inst, buf, nb_token, NULL, 0))
 				goto next;

-			debug_printf("instruction match \n");
+			debug_printf("instruction match\n");
 			token_p = inst->tokens[nb_token];
 			if (token_p)
 				memcpy(&token_hdr, token_p, sizeof(token_hdr));

@@ -471,7 +479,7 @@ cmdline_complete(struct cmdline *cl, const char *buf, int *state,
 		/* we need to redo it */
 		inst = ctx[inst_num];
-		if (nb_token && match_inst(inst, buf, nb_token, NULL))
+		if (nb_token && match_inst(inst, buf, nb_token, NULL, 0))
 			goto next2;

 		token_p = inst->tokens[nb_token];
diff --git a/lib/librte_cmdline/cmdline_parse.h b/lib/librte_cmdline/cmdline_parse.h
index f18836d..dae53ba 100644
--- a/lib/librte_cmdline/cmdline_parse.h
+++ b/lib/librte_cmdline/
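A defensive measure an application could add on top of this patch — a
sketch only, assuming RTE_BUILD_BUG_ON from rte_common.h, with
cmd_pkt_filter_result standing in for whatever result struct the
application defines — is a compile-time check that its largest result
structure still fits:

#include <rte_common.h>

/* Sketch: fails the build if the command's result struct can no longer
 * fit in cmdline_parse()'s buffer (e.g. after adding fields to it).
 * RTE_BUILD_BUG_ON must appear at function scope. */
static inline void
check_result_struct_fits(void)
{
	RTE_BUILD_BUG_ON(sizeof(struct cmd_pkt_filter_result) >
			CMDLINE_PARSE_RESULT_BUFSIZE);
}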
[dpdk-dev] [PATCH 0/7] Cisco Systems Inc. VIC Ethernet PMD - ENIC PMD
Thomas,

This patch is based on 1.7.1. I thought that was the latest. And I got the
diff from origin.
What made you feel that the patch is from 1.7?

Regards,
-Sujith

On 07/11/14 9:17 pm, "Thomas Monjalon" wrote:

>Sujith,
>
>It seems that this PMD is based on DPDK 1.7.
>Could you rebase it on HEAD?
>
>Thank you
>--
>Thomas
[dpdk-dev] [PATCH] librte_kni: Add buildtime checks for rte_kni_mbuf and rte_mbuf
> Adding this check is to avoid breakage from future data structure changes.
>
> Signed-off-by: Jia Yu

Excellent idea!

Acked-by: Thomas Monjalon

Applied

Thanks
--
Thomas
[dpdk-dev] [PATCH] kni: fix build
> > Since commit 08b563ffb19 ("mbuf: replace data pointer by an offset"),
> > KNI vhost compilation (CONFIG_RTE_KNI_VHOST=y) was broken.
> >
> > rte_pktmbuf_mtod() is not used in the kernel context but is replaced
> > by a simple addition of the base address and the offset.
> >
> > Signed-off-by: Thomas Monjalon
>
> Acked-by: Olivier Matz

Applied
--
Thomas
[dpdk-dev] [PATCH 0/2] rte_ethdev fix/improvement
Hi Jia,

2014-11-07 09:31, Jia Yu:
> This patch series includes a fix and an improvement to the rte_ethdev lib.

New enhancements won't be integrated in release 1.8. But fixes are welcome.
The problem is that it's not easy to track partially applied patchsets.
So it would be simpler if you sent your fix separately.

Thanks
--
Thomas
[dpdk-dev] [PATCH] lib: include rte_memory.h for __rte_cache_aligned
2014-11-07 09:28, Jia Yu:
> Include rte_memory.h for lib files that use the __rte_cache_aligned
> attribute.

Could you please explain what the error was?
As I suspect it's a fix, it would be clearer to start your title with "fix".

Thanks
--
Thomas
[dpdk-dev] [PATCH 0/7] Cisco Systems Inc. VIC Ethernet PMD - ENIC PMD
Thomas,

It is our pleasure to be part of the community and to be contributing to it.
Looking forward to a healthy and fruitful association.

Thanks and Regards,
-Sujith

On 07/11/14 4:39 pm, "Thomas Monjalon" wrote:

>2014-11-08 01:35, Sujith Sankar:
>> ENIC PMD is the poll-mode driver for the Cisco Systems Inc. VIC to be
>> used with the DPDK suite.
>
>Great to see you on board!
>
>Thank you for contributing a new driver.
>
>--
>Thomas
[dpdk-dev] [PATCH] eal: map PCI memory resources after hugepages
Hi Liang,

I don't think that overriding the value passed to pci_map_resource as an
argument is the way to go. While it results in less code, it looks weird, in
my opinion at least, as I believe tracking the correctness of the address
being requested should be the responsibility of the caller, i.e. either the
UIO or the VFIO code. Which is why I keep insisting that you make
requested_pci_addr global to the linuxapp EAL PCI section and put it into
include/eal_pci_init.h.

Would you mind if I made a patch for this issue based on your code?

Thanks,
Anatoly

-Original Message-
From: Liang Xu [mailto:liang...@cinfotech.cn]
Sent: Saturday, November 8, 2014 3:32 AM
To: dev at dpdk.org
Cc: Burakov, Anatoly; thomas.monjalon at 6wind.com
Subject: [PATCH] eal: map PCI memory resources after hugepages

A multi-process DPDK application must mmap hugepages and PCI resources into
the same virtual addresses. By default the virtual addresses are chosen by
the primary process automatically when calling mmap. But sometimes the
chosen virtual addresses aren't usable in the secondary process, such as
when the secondary process is linked with more libraries than the primary
process and a library has been mapped at that virtual address.

The command line parameter 'base-virtaddr' has been added for this
situation. If it's configured, the hugepages will be mapped at this base
address. But the virtual addresses of mapped PCI resources still don't
honor the parameter. In that case "EAL: pci_map_resource(): cannot mmap"
will be got.

This patch tries to map PCI resources after hugepages, so the error can be
resolved by setting base-virtaddr to a free virtual address space.

Signed-off-by: Liang Xu
---
 lib/librte_eal/linuxapp/eal/eal_pci.c | 32 +++-
 1 file changed, 31 insertions(+), 1 deletion(-)

diff --git a/lib/librte_eal/linuxapp/eal/eal_pci.c b/lib/librte_eal/linuxapp/eal/eal_pci.c
index ddb0535..502eef2 100644
--- a/lib/librte_eal/linuxapp/eal/eal_pci.c
+++ b/lib/librte_eal/linuxapp/eal/eal_pci.c
@@ -97,14 +97,42 @@ error:
 	return -1;
 }

+static void *
+pci_find_max_end_va(void)
+{
+	const struct rte_memseg *seg = rte_eal_get_physmem_layout();
+	const struct rte_memseg *last = seg;
+	unsigned i = 0;
+
+	for (i = 0; i < RTE_MAX_MEMSEG; i++, seg++) {
+		if (seg->addr == NULL)
+			break;
+
+		if (seg->addr > last->addr)
+			last = seg;
+
+	}
+	return RTE_PTR_ADD(last->addr, last->len);
+}
+
 /* map a particular resource from a file */
 void *
 pci_map_resource(void *requested_addr, int fd, off_t offset, size_t size)
 {
 	void *mapaddr;

+	/* By default the PCI memory resource will be mapped after hugepages */
+	static void *default_map_addr;
+	if (NULL == requested_addr) {
+		if (NULL == default_map_addr)
+			default_map_addr = pci_find_max_end_va();
+		mapaddr = default_map_addr;
+	} else {
+		mapaddr = requested_addr;
+	}
+
 	/* Map the PCI memory resource of device */
-	mapaddr = mmap(requested_addr, size, PROT_READ | PROT_WRITE,
+	mapaddr = mmap(mapaddr, size, PROT_READ | PROT_WRITE,
 			MAP_SHARED, fd, offset);
 	if (mapaddr == MAP_FAILED ||
 			(requested_addr != NULL && mapaddr != requested_addr)) {
@@ -114,6 +142,8 @@ pci_map_resource(void *requested_addr, int fd, off_t offset, size_t size)
 			strerror(errno), mapaddr);
 		goto fail;
 	}
+	if (NULL == requested_addr)
+		default_map_addr = RTE_PTR_ADD(mapaddr, size);

 	RTE_LOG(DEBUG, EAL, "  PCI memory mapped at %p\n", mapaddr);

--
1.9.1
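For completeness, an illustrative pair of invocations (the application names
and the address value are examples only, not from the thread). The primary
is started with a base-virtaddr chosen to be free in the secondary process
as well; the secondary then attaches to the hugepages — and, with this
patch, to the PCI resources mapped after them — at the same addresses:

./primary_app -c 0x3 -n 4 --base-virtaddr=0x2000000000
./secondary_app -c 0xc -n 4 --proc-type=secondary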
[dpdk-dev] [PATCH 3/7] ENIC PMD Makefile
Neil,

If I move the DPDK patch that accommodates ENIC PMD (that is the one that patches lib/Makefile) to the last in the series, builds between commits would succeed, wouldn't it? Moving that to the last is anyway needed.

Thanks,
-Sujith

On 07/11/14 9:16 pm, "Sujith Sankar (ssujith)" wrote:

>Hi Neil,
>
>Thanks for the comments. I shall work on the modifications that you have
>suggested and get back with V2.
>
>Regards,
>-Sujith
>
>On 07/11/14 5:04 pm, "Neil Horman" wrote:
>
>>On Sat, Nov 08, 2014 at 01:35:43AM +0530, Sujith Sankar wrote:
>>> Signed-off-by: Sujith Sankar
>>> ---
>>>  lib/librte_pmd_enic/Makefile | 66
>>>  1 file changed, 66 insertions(+)
>>>  create mode 100644 lib/librte_pmd_enic/Makefile
>>>
>>> diff --git a/lib/librte_pmd_enic/Makefile b/lib/librte_pmd_enic/Makefile
>>> new file mode 100644
>>> index 000..7605a8f
>>> --- /dev/null
>>> +++ b/lib/librte_pmd_enic/Makefile
>>> @@ -0,0 +1,66 @@
>>> +# BSD LICENSE
>>> +#
>>> +# Copyright(c) 2010-2013 Intel Corporation. All rights reserved.
>>> +# All rights reserved.
>>> +#
>>> +# Redistribution and use in source and binary forms, with or without
>>> +# modification, are permitted provided that the following conditions
>>> +# are met:
>>> +#
>>> +#   * Redistributions of source code must retain the above copyright
>>> +#     notice, this list of conditions and the following disclaimer.
>>> +#   * Redistributions in binary form must reproduce the above copyright
>>> +#     notice, this list of conditions and the following disclaimer in
>>> +#     the documentation and/or other materials provided with the
>>> +#     distribution.
>>> +#   * Neither the name of Intel Corporation nor the names of its
>>> +#     contributors may be used to endorse or promote products derived
>>> +#     from this software without specific prior written permission.
>>> +#
>>> +# THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
>>> +# "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
>>> +# LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
>>> +# A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
>>> +# OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
>>> +# SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
>>> +# LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
>>> +# DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
>>> +# THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
>>> +# (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
>>> +# OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
>>> +#
>>> +
>>> +include $(RTE_SDK)/mk/rte.vars.mk
>>> +
>>> +#
>>> +# library name
>>> +#
>>> +LIB = librte_pmd_enic.a
>>> +
>>> +CFLAGS += -I$(RTE_SDK)/lib/librte_hash/
>>> +CFLAGS += -O3 -Wno-deprecated
>>> +
>>> +VPATH += $(RTE_SDK)/lib/librte_pmd_enic/src
>>> +
>>> +#
>>> +# all source are stored in SRCS-y
>>> +#
>>> +SRCS-$(CONFIG_RTE_LIBRTE_ENIC_PMD) += enic_main.c
>>> +SRCS-$(CONFIG_RTE_LIBRTE_ENIC_PMD) += enic_clsf.c
>>> +SRCS-$(CONFIG_RTE_LIBRTE_ENIC_PMD) += vnic_cq.c
>>> +SRCS-$(CONFIG_RTE_LIBRTE_ENIC_PMD) += vnic_wq.c
>>> +SRCS-$(CONFIG_RTE_LIBRTE_ENIC_PMD) += vnic_dev.c
>>> +SRCS-$(CONFIG_RTE_LIBRTE_ENIC_PMD) += vnic_intr.c
>>> +SRCS-$(CONFIG_RTE_LIBRTE_ENIC_PMD) += vnic_rq.c
>>> +SRCS-$(CONFIG_RTE_LIBRTE_ENIC_PMD) += enic_etherdev.c
>>> +SRCS-$(CONFIG_RTE_LIBRTE_ENIC_PMD) += enic_res.c
>>> +SRCS-$(CONFIG_RTE_LIBRTE_ENIC_PMD) += vnic_rss.c
>>> +
>>> +
>>> +# this lib depends upon:
>>> +DEPDIRS-$(CONFIG_RTE_LIBRTE_ENIC_PMD) += lib/librte_eal lib/librte_ether
>>> +DEPDIRS-$(CONFIG_RTE_LIBRTE_ENIC_PMD) += lib/librte_mempool lib/librte_mbuf
>>> +DEPDIRS-$(CONFIG_RTE_LIBRTE_ENIC_PMD) += lib/librte_net lib/librte_malloc
>>> +
>>> +include $(RTE_SDK)/mk/rte.lib.mk
>>> +
>>> --
>>> 1.9.1
>>>
>>
>>Make this the last patch in your series, and merge it with the chunk from
>>the last patch that adds the enic directory to the lib/Makefile, so that a
>>bisect will build between these commits.
>>
>>Neil
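As an aside for anyone trying the series: the SRCS-$(CONFIG_RTE_LIBRTE_ENIC_PMD) lines above imply a matching build-time switch, following the usual CONFIG_RTE_LIBRTE_*_PMD pattern. Enabling the new PMD would then look something like the line below; the config file location and default value are assumptions, not something stated in the patch:

    # in the build configuration, e.g. config/common_linuxapp (assumed)
    CONFIG_RTE_LIBRTE_ENIC_PMD=y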
[dpdk-dev] [PATCH] eal: map PCI memory resources after hugepages
It is a default value used when requested_addr doesn't exist, not an override. When pci_map_resource is called in the primary process, requested_addr is NULL, and the default value is provided by default_map_addr. When pci_map_resource is called in the secondary process, requested_addr exists, so nothing is changed.

------------------------------------------------------------------
From: Burakov, Anatoly
Time: 2014 Nov 10 (Mon) 17:54
To: Liang Xu, dev at dpdk.org
Cc: thomas.monjalon at 6wind.com
Subject: RE: [PATCH] eal: map PCI memory resources after hugepages

Hi Liang

I don't think that overriding the value passed to pci_map_resource as an argument is the way to go. While it results in less code, it looks weird, in my opinion at least, as I believe tracking the correctness of the address being requested should be the responsibility of the caller, i.e. either the UIO or VFIO code. Which is why I keep insisting that you make requested_pci_addr global to the linuxapp EAL PCI section and put it into include/eal_pci_init.h.

Would you mind if I made a patch for this issue based on your code?

Thanks,
Anatoly
[dpdk-dev] [PATCH 0/7] Cisco Systems Inc. VIC Ethernet PMD - ENIC PMD
2014-11-10 09:27, Sujith Sankar:
> On 07/11/14 9:17 pm, "Thomas Monjalon" wrote:
> >It seems that this PMD is based on DPDK 1.7.
> >Could you rebase it on HEAD?
>
> This patch is based on 1.7.1. Thought that is the latest. And I got the
> diff from origin.
> What made you feel that the patch is from 1.7?

By saying 1.7, I meant 1.7.0 or 1.7.1. In the current HEAD (future 1.8.0), there are a lot of changes which make your PMD incompatible. That's why the rule is to base patches on the latest HEAD.

Thanks for your efforts.
--
Thomas
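For reference, rebasing a series onto the latest HEAD before resending is typically done like this (remote and branch names are illustrative):

    git fetch origin
    git rebase origin/master my-enic-series
    # resolve any conflicts, then regenerate the patch files
    git format-patch origin/master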
[dpdk-dev] [PATCH] eal: map PCI memory resources after hugepages
Of course you can take this job. Thank you for your help.

------------------------------------------------------------------
From: Liang Xu
Time: 2014 Nov 10 (Mon) 18:01
To: Burakov, Anatoly, dev at dpdk.org
Cc: thomas.monjalon at 6wind.com
Subject: Re: [PATCH] eal: map PCI memory resources after hugepages

It is a default value used when requested_addr doesn't exist, not an override. When pci_map_resource is called in the primary process, requested_addr is NULL, and the default value is provided by default_map_addr. When pci_map_resource is called in the secondary process, requested_addr exists, so nothing is changed.
[dpdk-dev] [PATCH 0/7] Cisco Systems Inc. VIC Ethernet PMD - ENIC PMD
Thanks for the clear response. I'll take a look at it.

Regards,
-Sujith

On 10/11/14 3:33 pm, "Thomas Monjalon" wrote:
>2014-11-10 09:27, Sujith Sankar:
>> On 07/11/14 9:17 pm, "Thomas Monjalon" wrote:
>> >It seems that this PMD is based on DPDK 1.7.
>> >Could you rebase it on HEAD?
>>
>> This patch is based on 1.7.1. Thought that is the latest. And I got the
>> diff from origin.
>> What made you feel that the patch is from 1.7?
>
>By saying 1.7, I meant 1.7.0 or 1.7.1.
>In the current HEAD (future 1.8.0), there are a lot of changes which make
>your PMD incompatible. That's why the rule is to base patches on the
>latest HEAD.
>
>Thanks for your efforts.
>--
>Thomas
[dpdk-dev] [PATCH 4/7] VNIC common code
Thomas,

These are files common to all flavours (user-space as well as kernel mode) of VIC drivers. I shall add this info to the commit logs. Let me rework the directory structure too.

Thanks,
-Sujith

On 07/11/14 9:21 pm, "Thomas Monjalon" wrote:
>2014-11-08 01:35, Sujith Sankar:
>>  lib/librte_pmd_enic/src/cq_desc.h       |  122
>>  lib/librte_pmd_enic/src/cq_enet_desc.h  |  257
>>  lib/librte_pmd_enic/src/rq_enet_desc.h  |   72 +++
>>  lib/librte_pmd_enic/src/vnic_cq.c       |  113
>>  lib/librte_pmd_enic/src/vnic_cq.h       |  148 +
>>  lib/librte_pmd_enic/src/vnic_dev.c      | 1077 +++
>>  lib/librte_pmd_enic/src/vnic_dev.h      |  198 ++
>>  lib/librte_pmd_enic/src/vnic_devcmd.h   |  770 ++
>>  lib/librte_pmd_enic/src/vnic_enet.h     |   74 +++
>>  lib/librte_pmd_enic/src/vnic_intr.c     |   79 +++
>>  lib/librte_pmd_enic/src/vnic_intr.h     |  122
>>  lib/librte_pmd_enic/src/vnic_nic.h      |   84 +++
>>  lib/librte_pmd_enic/src/vnic_resource.h |   93 +++
>>  lib/librte_pmd_enic/src/vnic_rq.c       |  242 +++
>>  lib/librte_pmd_enic/src/vnic_rq.h       |  278
>>  lib/librte_pmd_enic/src/vnic_rss.c      |   81 +++
>>  lib/librte_pmd_enic/src/vnic_rss.h      |   57 ++
>>  lib/librte_pmd_enic/src/vnic_stats.h    |   82 +++
>>  lib/librte_pmd_enic/src/vnic_wq.c       |  241 +++
>>  lib/librte_pmd_enic/src/vnic_wq.h       |  279
>>  lib/librte_pmd_enic/src/wq_enet_desc.h  |  110
>>  21 files changed, 4579 insertions(+)
>
>What is the status of these files?
>Are they copied from somewhere?
>Please explain in the commit log.
>
>Could you move them in a subdirectory vnic/ or base/?
>
>Please could you remove the src/ subdirectory level?
>
>Thanks
>--
>Thomas
[dpdk-dev] [PATCH] Add in_flight_bitmask so as to use full 32 bits of tag.
User applications are advised to set the newly introduced union field mbuf->hash.usr as the flow id, which is a uint32_t. With the introduction of in_flight_bitmask, the whole 32 bits of the tag can be used.

Furthermore, this patch fixes the integer overflow when finding the matched tags.

Note that currently librte_distributor supports up to 64 worker threads. If more workers are needed, the size of in_flight_bitmask and the algorithm of finding the matched tag must be revised.

Signed-off-by: Qinglai Xiao
---
 app/test/test_distributor.c              | 18 ++--
 app/test/test_distributor_perf.c         |  4 +-
 lib/librte_distributor/rte_distributor.c | 45 +
 lib/librte_distributor/rte_distributor.h |  3 ++
 lib/librte_mbuf/rte_mbuf.h               |  1 +
 5 files changed, 47 insertions(+), 24 deletions(-)

diff --git a/app/test/test_distributor.c b/app/test/test_distributor.c
index ce06436..9e8c06d 100644
--- a/app/test/test_distributor.c
+++ b/app/test/test_distributor.c
@@ -120,7 +120,7 @@ sanity_test(struct rte_distributor *d, struct rte_mempool *p)
 	/* now set all hash values in all buffers to zero, so all pkts go to the
 	 * one worker thread */
 	for (i = 0; i < BURST; i++)
-		bufs[i]->hash.rss = 0;
+		bufs[i]->hash.usr = 0;
 
 	rte_distributor_process(d, bufs, BURST);
 	rte_distributor_flush(d);
@@ -142,7 +142,7 @@ sanity_test(struct rte_distributor *d, struct rte_mempool *p)
 	if (rte_lcore_count() >= 3) {
 		clear_packet_count();
 		for (i = 0; i < BURST; i++)
-			bufs[i]->hash.rss = (i & 1) << 8;
+			bufs[i]->hash.usr = (i & 1) << 8;
 
 		rte_distributor_process(d, bufs, BURST);
 		rte_distributor_flush(d);
@@ -167,7 +167,7 @@ sanity_test(struct rte_distributor *d, struct rte_mempool *p)
 	 * so load gets distributed */
 	clear_packet_count();
 	for (i = 0; i < BURST; i++)
-		bufs[i]->hash.rss = i;
+		bufs[i]->hash.usr = i;
 
 	rte_distributor_process(d, bufs, BURST);
 	rte_distributor_flush(d);
@@ -199,7 +199,7 @@ sanity_test(struct rte_distributor *d, struct rte_mempool *p)
 		return -1;
 	}
 	for (i = 0; i < BIG_BATCH; i++)
-		many_bufs[i]->hash.rss = i << 2;
+		many_bufs[i]->hash.usr = i << 2;
 
 	for (i = 0; i < BIG_BATCH/BURST; i++) {
 		rte_distributor_process(d, &many_bufs[i*BURST], BURST);
@@ -280,7 +280,7 @@ sanity_test_with_mbuf_alloc(struct rte_distributor *d, struct rte_mempool *p)
 		while (rte_mempool_get_bulk(p, (void *)bufs, BURST) < 0)
 			rte_distributor_process(d, NULL, 0);
 		for (j = 0; j < BURST; j++) {
-			bufs[j]->hash.rss = (i+j) << 1;
+			bufs[j]->hash.usr = (i+j) << 1;
 			rte_mbuf_refcnt_set(bufs[j], 1);
 		}
 
@@ -359,7 +359,7 @@ sanity_test_with_worker_shutdown(struct rte_distributor *d,
 	/* now set all hash values in all buffers to zero, so all pkts go to the
 	 * one worker thread */
 	for (i = 0; i < BURST; i++)
-		bufs[i]->hash.rss = 0;
+		bufs[i]->hash.usr = 0;
 
 	rte_distributor_process(d, bufs, BURST);
 	/* at this point, we will have processed some packets and have a full
@@ -372,7 +372,7 @@ sanity_test_with_worker_shutdown(struct rte_distributor *d,
 		return -1;
 	}
 	for (i = 0; i < BURST; i++)
-		bufs[i]->hash.rss = 0;
+		bufs[i]->hash.usr = 0;
 
 	/* get worker zero to quit */
 	zero_quit = 1;
@@ -416,7 +416,7 @@ test_flush_with_worker_shutdown(struct rte_distributor *d,
 	/* now set all hash values in all buffers to zero, so all pkts go to the
 	 * one worker thread */
 	for (i = 0; i < BURST; i++)
-		bufs[i]->hash.rss = 0;
+		bufs[i]->hash.usr = 0;
 
 	rte_distributor_process(d, bufs, BURST);
 	/* at this point, we will have processed some packets and have a full
@@ -488,7 +488,7 @@ quit_workers(struct rte_distributor *d, struct rte_mempool *p)
 	zero_quit = 0;
 	quit = 1;
 	for (i = 0; i < num_workers; i++)
-		bufs[i]->hash.rss = i << 1;
+		bufs[i]->hash.usr = i << 1;
 	rte_distributor_process(d, bufs, num_workers);
 
 	rte_mempool_put_bulk(p, (void *)bufs, num_workers);
diff --git a/app/test/test_distributor_perf.c b/app/test/test_distributor_perf.c
index b04864c..48ee344 100644
--- a/app/test/test_distributor_perf.c
+++ b/app/test/test_distributor_perf.c
@@ -159,7 +159,7 @@ perf_test(struct rte_distributor *d, struct rte_mempool *p)
 	}
 	/* ensure we have different hash value for each pkt */
 	for (i = 0; i < BURST; i++)
-		bufs[i]->hash.rss = i;
+		bufs
[dpdk-dev] Ports not detected by IGB_UIO in DPDK 1.7.1 in QEMU_KVM environment
On Fri, Nov 07, 2014 at 11:26:08PM +0530, Manoj Viswanath wrote:
> Hi Bruce,
>
> Please find my comment inlined.
>
> On Fri, Nov 7, 2014 at 9:00 PM, Bruce Richardson <bruce.richardson at intel.com> wrote:
>
> > On Fri, Nov 07, 2014 at 08:31:34PM +0530, Manoj Viswanath wrote:
> > > Hi Bruce,
> > >
> > > I was not doing anything specific for binding the NICs to IGB_UIO (like
> > > invoking the "dpdk_nic_bind.py" script explicitly) when using my application
> > > with DPDK 1.6.0. The e1000 devices assigned via virt-manager to the VM were
> > > automatically getting picked up and initialized by IGB_UIO within each VM.
> > >
> > > The same is not working with DPDK 1.7.1 now.
> > >
> > > I tried exporting the "dpdk_nic_bind.py" script into my VM (running DPDK
> > > 1.7.1) and tried to check the status. The emulated devices were shown as
> > > neither bound to kernel nor to IGB_UIO, as evident from below output:
> > >
> > > Network devices using DPDK-compatible driver
> > > ============================================
> > >
> > > Network devices using kernel driver
> > > ===================================
> > > 0000:00:03.0 'Virtio network device' if= drv=virtio-pci unused=igb_uio
> > >
> > > Other network devices
> > > =====================
> > > 0000:00:04.0 '82540EM Gigabit Ethernet Controller' unused=igb_uio
> > > 0000:00:05.0 '82540EM Gigabit Ethernet Controller' unused=igb_uio
> > >
> > > When I tried to forcefully bind the NICs using the "--bind=igb_uio" option
> >
> > Was there any output of the dpdk_nic_bind script? What does the output of
> > it with --status show afterwards?
>
> [MANOJ]
> Yes. Please refer below output:
>
> Network devices using DPDK-compatible driver
> ============================================
> 0000:00:04.0 '82540EM Gigabit Ethernet Controller' drv=igb_uio unused=
> 0000:00:05.0 '82540EM Gigabit Ethernet Controller' drv=igb_uio unused=
>
> Network devices using kernel driver
> ===================================
> 0000:00:03.0 'Virtio network device' if= drv=virtio-pci unused=igb_uio
>
> Other network devices
> =====================
>
> However, when I start the DPDK application, I am getting the error log as
> indicated in the earlier mail.
>
> The difference with DPDK 1.6.1 is that at the same stage IGB_UIO has
> already bound the assigned devices without having to explicitly run
> "dpdk_nic_bind.py". Please find below the application log when run with
> DPDK 1.6.0:
>
> Network devices using DPDK-compatible driver
> ============================================
> 0000:00:04.0 '82540EM Gigabit Ethernet Controller' drv=igb_uio unused=
> 0000:00:08.0 '82540EM Gigabit Ethernet Controller' drv=igb_uio unused=
>
> Network devices using kernel driver
> ===================================
> 0000:00:03.0 'Virtio network device' if= drv=virtio-pci unused=igb_uio
>
> Other network devices
> =====================
>
> Kindly note that in both cases, logs have been taken after loading IGB_UIO
> prior to starting the DPDK application.
> [/MANOJ]
>
> Regards,

Ok, so it appears that after running dpdk_nic_bind to bind the devices to igb_uio the differences between 1.6 and 1.7 are resolved for that part. The reason why you explicitly need to bind the devices in 1.7 is due to this commit which removes the pci id table from the igb_uio driver.

http://dpdk.org/browse/dpdk/commit/?id=629395b063e8278a05ea41908d1152fa68df098c

As for the other errors you are seeing, I'm not sure of the cause, though they may be related to interrupt support for changes in link status. Can you perhaps use a debugger and find out what the file descriptor in question refers to?

/Bruce
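For reference, the explicit bind step that 1.7 now requires looks like this, using the PCI addresses from the status output above (the script path may differ between releases):

    ./tools/dpdk_nic_bind.py --bind=igb_uio 0000:00:04.0 0000:00:05.0
    ./tools/dpdk_nic_bind.py --status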
[dpdk-dev] [PATCH] Add in_flight_bitmask so as to use full 32 bits of tag.
On Mon, Nov 10, 2014 at 12:38:32PM +0200, Qinglai Xiao wrote:
> User applications are advised to set the newly introduced union field
> mbuf->hash.usr as the flow id, which is a uint32_t.
> With the introduction of in_flight_bitmask, the whole 32 bits of tag can
> be used.
>
> Furthermore, this patch fixes the integer overflow when finding the
> matched tags.
>
> Note that currently librte_distributor supports up to 64 worker
> threads. If more workers are needed, the size of in_flight_bitmask and
> the algorithm of finding the matched tag must be revised.
>
> Signed-off-by: Qinglai Xiao

Hi,

this would probably be better as two patches rather than one. One patch to add the hash.usr field, and then use it in the distributor. The change to the distributor to add the bit mask to allow all bits of the tag to be used should then go as a separate patch.

Regards,
/Bruce
[dpdk-dev] [PATCH] eal: map PCI memory resources after hugepages
By the way, pci_map_resource checks that mapaddr == requested_addr, so you must provide a usable requested_addr. That's the reason I modified pci_map_resource().

------------------------------------------------------------------
From: Liang Xu
Time: 2014 Nov 10 (Mon) 18:04
To: Burakov, Anatoly, dev at dpdk.org
Subject: Re: [dpdk-dev] [PATCH] eal: map PCI memory resources after hugepages

Of course you can take this job. Thank you for your help.
[dpdk-dev] [PATCH 3/7] ENIC PMD Makefile
On Mon, Nov 10, 2014 at 09:59:45AM, Sujith Sankar (ssujith) wrote:
> Neil,
>
> If I move the DPDK patch that accommodates ENIC PMD (that is the one that
> patches lib/Makefile) to the last in the series, builds between commits
> would succeed, wouldn't it? Moving that to the last is anyway needed.
>

correct, yes.
Neil

> Thanks,
> -Sujith
[dpdk-dev] [PATCH v7] eal: map PCI memory resources after hugepages
A multi-process DPDK application must mmap hugepages and PCI resources into the same virtual address space. By default the virtual addresses are chosen by the primary process automatically when calling mmap. But sometimes the chosen virtual addresses aren't usable in the secondary process - for example, the secondary process is linked with more libraries than the primary process, and a library occupies the same address space that the primary process has requested for PCI mappings.

This patch makes EAL map PCI BARs right after the hugepages (instead of the location chosen by mmap) in virtual memory.

Signed-off-by: Anatoly Burakov
Signed-off-by: Liang Xu
---
 lib/librte_eal/linuxapp/eal/eal_pci.c              | 19 +++
 lib/librte_eal/linuxapp/eal/eal_pci_uio.c          |  9 +-
 lib/librte_eal/linuxapp/eal/eal_pci_vfio.c         | 13 ++-
 lib/librte_eal/linuxapp/eal/include/eal_pci_init.h |  6 ++
 4 files changed, 44 insertions(+), 3 deletions(-)

diff --git a/lib/librte_eal/linuxapp/eal/eal_pci.c b/lib/librte_eal/linuxapp/eal/eal_pci.c
index 5fe3961..dae8739 100644
--- a/lib/librte_eal/linuxapp/eal/eal_pci.c
+++ b/lib/librte_eal/linuxapp/eal/eal_pci.c
@@ -97,6 +97,25 @@ error:
 	return -1;
 }
 
+void *
+pci_find_max_end_va(void)
+{
+	const struct rte_memseg *seg = rte_eal_get_physmem_layout();
+	const struct rte_memseg *last = seg;
+	unsigned i = 0;
+
+	for (i = 0; i < RTE_MAX_MEMSEG; i++, seg++) {
+		if (seg->addr == NULL)
+			break;
+
+		if (seg->addr > last->addr)
+			last = seg;
+
+	}
+	return RTE_PTR_ADD(last->addr, last->len);
+}
+
+
 /* map a particular resource from a file */
 void *
 pci_map_resource(void *requested_addr, int fd, off_t offset, size_t size)
diff --git a/lib/librte_eal/linuxapp/eal/eal_pci_uio.c b/lib/librte_eal/linuxapp/eal/eal_pci_uio.c
index 7e62266..5090bf1 100644
--- a/lib/librte_eal/linuxapp/eal/eal_pci_uio.c
+++ b/lib/librte_eal/linuxapp/eal/eal_pci_uio.c
@@ -48,6 +48,8 @@
 static int pci_parse_sysfs_value(const char *filename, uint64_t *val);
 
+void *pci_map_addr = NULL;
+
 #define OFF_MAX ((uint64_t)(off_t)-1)
 
 static int
@@ -371,10 +373,15 @@ pci_uio_map_resource(struct rte_pci_device *dev)
 		if (maps[j].addr != NULL)
 			fail = 1;
 		else {
-			mapaddr = pci_map_resource(NULL, fd, (off_t)offset,
+			if (pci_map_addr == NULL)
+				pci_map_addr = pci_find_max_end_va();
+
+			mapaddr = pci_map_resource(pci_map_addr, fd, (off_t)offset,
 					(size_t)maps[j].size);
 			if (mapaddr == NULL)
 				fail = 1;
+
+			pci_map_addr = RTE_PTR_ADD(pci_map_addr, maps[j].size);
 		}
 
 		if (fail) {
diff --git a/lib/librte_eal/linuxapp/eal/eal_pci_vfio.c b/lib/librte_eal/linuxapp/eal/eal_pci_vfio.c
index c776ddc..fb6ee7a 100644
--- a/lib/librte_eal/linuxapp/eal/eal_pci_vfio.c
+++ b/lib/librte_eal/linuxapp/eal/eal_pci_vfio.c
@@ -720,8 +720,17 @@ pci_vfio_map_resource(struct rte_pci_device *dev)
 		if (i == msix_bar)
 			continue;
 
-		bar_addr = pci_map_resource(maps[i].addr, vfio_dev_fd, reg.offset,
-				reg.size);
+		if (internal_config.process_type == RTE_PROC_PRIMARY) {
+			if (pci_map_addr == NULL)
+				pci_map_addr = pci_find_max_end_va();
+
+			bar_addr = pci_map_resource(pci_map_addr, vfio_dev_fd, reg.offset,
+					reg.size);
+			pci_map_addr = RTE_PTR_ADD(pci_map_addr, reg.size);
+		} else {
+			bar_addr = pci_map_resource(maps[i].addr, vfio_dev_fd, reg.offset,
+					reg.size);
+		}
 
 		if (bar_addr == NULL) {
 			RTE_LOG(ERR, EAL, "  %s mapping BAR%i failed: %s\n", pci_addr, i,
diff --git a/lib/librte_eal/linuxapp/eal/include/eal_pci_init.h b/lib/librte_eal/linuxapp/eal/include/eal_pci_init.h
index d758bee..1070eb8 100644
--- a/lib/librte_eal/linuxapp/eal/include/eal_pci_init.h
+++ b/lib/librte_eal/linuxapp/eal/include/eal_pci_init.h
@@ -59,6 +59,12 @@ struct mapped_pci_resource {
 TAILQ_HEAD(mapped_pci_res_list, mapped_pci_resource);
 extern struct mapped_pci_res_list *pci_res_list;
 
+/*
+ * Helper function to map PCI resources right after hugepages in virtual memory
+ */
+extern void *pci_map_addr;
+void *pci_find_max_end_va(void);
+
 void *pci_map_resource(void *requested_addr, int fd, off_t offset,
 		size_t size);
--
1.8.1.4
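As a usage note, with this patch the multi-process failure described in the commit message is worked around by starting the primary process with base-virtaddr pointing at a range that is also free in the secondary process; hugepages and then PCI BARs are mapped from that address upwards. The application name, core masks, and address below are purely illustrative:

    # primary: hugepages, then PCI BARs, mapped starting at base-virtaddr
    ./my_app --proc-type=primary --base-virtaddr=0x7f0000000000 -c 0x3 -n 4
    # secondary: attaches to the primary's mappings at the same addresses
    ./my_app --proc-type=secondary -c 0xc -n 4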
[dpdk-dev] [PATCH v8 10/10] app/testpmd:test VxLAN Tx checksum offload
Hi Olivier,

> From: dev [mailto:dev-bounces at dpdk.org] On Behalf Of Olivier MATZ
> Sent: Friday, November 07, 2014 5:16 PM
> To: Yong Wang; Liu, Jijiang
> Cc: dev at dpdk.org
> Subject: Re: [dpdk-dev] [PATCH v8 10/10] app/testpmd:test VxLAN Tx checksum offload
>
> Hello Yong,
>
> On 11/07/2014 01:43 AM, Yong Wang wrote:
> >>> As to HW TX checksum offload, do you have special requirement for
> >>> implementing TSO?
> >
> >> Yes. TSO implies TX TCP and IP checksum offload.
> >
> > Is this a general requirement or something specific to ixgbe/i40e? FWIW,
> > the vmxnet3 device does not support tx IP checksum offload but does support
> > TSO. In that case, we cannot leave the IP checksum field as 0 (the correct
> > checksum needs to be filled in the header) before passing it to the NIC
> > when TSO is enabled.
>
> This is a good question because we need to define the proper API that
> will work on other PMDs in the future.
>
> Indeed, there is a hardware specificity in ixgbe: when TSO is enabled,
> the IP checksum flag must also be passed to the driver if it's IPv4.
> From the 82599 datasheets (7.2.3.2.4 Advanced Transmit Data Descriptor):
>
>   IXSM (bit 0) - Insert IP Checksum: This field indicates that IP
>   checksum must be inserted. In IPv6 mode, it must be reset to 0b.
>   If DCMD.TSE and TUCMD.IPV4 are set, IXSM must be set as well.
>   If this bit is set, the packet should at least contain an
>   IP header.
>
> If we allow the user to give the TSO flag without the IP checksum
> flag in mbuf flags, the ixgbe driver would have to set the IP checksum
> flag in hardware descriptors if the packet is IPv4. The driver would
> have to parse the IP header: this is not a problem as we already need
> it for TCP checksum.
>
> To summarize, I think we have 3 options when transmitting a packet to be
> segmented using TSO:
>
> - set IP checksum to 0 in the application: in this case, it would
>   require additional work in virtual drivers if the peer expects
>   to receive a packet with a valid IP checksum. But I'm wondering
>   what is the need for calculating a checksum when transmitting on
>   a virtual device (the peer receiving the packet knows that the
>   packet is not corrupted as it comes from memory). Moreover, if the
>   device advertises TSO, I assume it can also advertise IP checksum
>   offload.
>
> - calculate the IP checksum in the application. It would take additional
>   cycles although it may not be needed as the driver probably knows
>   how to calculate it.
>
> - if the driver supports both TSO and IP checksum, the 2 flags MUST
>   be given to the driver and the IP checksum must be set to 0 and the
>   checksum cannot be calculated in software. If the driver only
>   supports TSO, the checksum has to be calculated in software.
>
> Currently, I chose the first solution, but I'm open to change the
> design. Maybe the 3rd one is also a good solution.
>
> By the way, we had the same kind of discussion with Konstantin [1]
> about what to do with the TCP checksum. My feeling is that setting it
> to the pseudo-header checksum is the best we can do:
> - linux does that
> - many hardware requires that (this is not the case for ixgbe, which
>   needs a pshdr checksum without the IP len)
> - it can be reused if received by a virtual device and sent to a
>   physical device supporting TSO

Yes, I remember that discussion. I still think we had better avoid any read/write access to the packet data inside the PMD TX routine (packet header parsing and/or pseudo-header checksum calculations).

As I said before - if different HW have different requirements of what has to be recalculated for HW TX offloads - why not introduce a new function dev_prep_tx(portid, queueid, mbuf[], num)? PMD developers can put all necessary calculations/updates of the packet data and related mbuf fields inside that function. It would then be a PMD responsibility to provide that function, and an app layer responsibility to call it for mbufs with TX offload flags before calling tx_burst().

Konstantin

> Best regards,
> Olivier
>
> [1] http://dpdk.org/ml/archives/dev/2014-May/002766.html
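To make the proposal concrete, here is a rough sketch of how such a hook could look from the application side. Note that dev_prep_tx does not exist in DPDK at this point - the prototype below is purely hypothetical, as proposed above; only rte_eth_tx_burst() is a real API:

#include <rte_mbuf.h>
#include <rte_ethdev.h>

/* Hypothetical per-PMD prepare hook, as proposed: fix up packet headers
 * (e.g. pseudo-header checksum) for mbufs carrying TX offload flags,
 * returning the number of packets that are ready for transmission. */
uint16_t dev_prep_tx(uint8_t port_id, uint16_t queue_id,
		struct rte_mbuf **tx_pkts, uint16_t nb_pkts);

static inline uint16_t
send_burst(uint8_t port, uint16_t queue, struct rte_mbuf **pkts, uint16_t n)
{
	/* the app layer calls the prepare hook for offloaded mbufs... */
	uint16_t nb_ready = dev_prep_tx(port, queue, pkts, n);

	/* ...and only then hands the burst to the driver */
	return rte_eth_tx_burst(port, queue, pkts, nb_ready);
}

The design point this illustrates is the one Konstantin makes: all data-touching work happens once, in a well-defined place chosen by the application, instead of inside every PMD's hot TX path.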
[dpdk-dev] [PATCH] Add in_flight_bitmask so as to use full 32 bits of tag.
OK thx Bruce, I will make 2 commits with one cover-letter.

On Mon, Nov 10, 2014 at 1:09 PM, Bruce Richardson <bruce.richardson at intel.com> wrote:

> On Mon, Nov 10, 2014 at 12:38:32PM +0200, Qinglai Xiao wrote:
> > User applications are advised to set the newly introduced union field
> > mbuf->hash.usr as the flow id, which is a uint32_t.
> > With the introduction of in_flight_bitmask, the whole 32 bits of tag can
> > be used.
> >
> > Furthermore, this patch fixes the integer overflow when finding the
> > matched tags.
> >
> > Note that currently librte_distributor supports up to 64 worker
> > threads. If more workers are needed, the size of in_flight_bitmask and
> > the algorithm of finding the matched tag must be revised.
> >
> > Signed-off-by: Qinglai Xiao
>
> Hi,
>
> this would probably be better as two patches rather than one. One patch to
> add the hash.usr field, and then use it in the distributor. The change to
> the distributor to add the bit mask to allow all bits of the tag to be used
> should then go as a separate patch.
>
> Regards,
> /Bruce
[dpdk-dev] [PATCH v2 0/2] Add in_flight_bitmask so as to use full 32 bits of tag
The patch series extends the tags used by librte_distributor from 31 bits to 32 bits. Besides, it fixes the integer overflow in the algorithm of finding matched tags.

The newly introduced union field rte_mbuf.hash.usr stands as the flow identifier. User applications are advised to set this field for each mbuf before calling rte_distributor_process.

Qinglai Xiao (2):
  Add new union field usr in mbuf->hash.
  Add in_flight_bitmask so as to use full 32 bits of tag.

 app/test/test_distributor.c              | 18 ++--
 app/test/test_distributor_perf.c         |  4 +-
 lib/librte_distributor/rte_distributor.c | 45 ++
 lib/librte_distributor/rte_distributor.h |  4 ++
 lib/librte_mbuf/rte_mbuf.h               |  1 +
 5 files changed, 49 insertions(+), 23 deletions(-)
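As a minimal sketch of the intended usage (compute_flow_id() is a hypothetical application helper; distributor creation and mempool handling are omitted):

#include <rte_mbuf.h>
#include <rte_distributor.h>

/* hypothetical application helper returning a 32-bit flow id,
 * e.g. a hash of the packet's 5-tuple */
extern uint32_t compute_flow_id(const struct rte_mbuf *m);

static void
distribute_burst(struct rte_distributor *d, struct rte_mbuf **bufs, unsigned n)
{
	unsigned i;

	/* tag every mbuf before handing the burst to the distributor;
	 * packets carrying equal tags are serialized onto the same worker */
	for (i = 0; i < n; i++)
		bufs[i]->hash.usr = compute_flow_id(bufs[i]);

	rte_distributor_process(d, bufs, n);
}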
[dpdk-dev] [PATCH v2 1/2] Add new union field usr in mbuf->hash.
This field is added for librte_distributor. Users of librte_distributor are advised to set the value of mbuf->hash.usr before calling rte_distributor_process. The value of usr is the tag which stands as the identifier of the flow.

Signed-off-by: Qinglai Xiao
---
 app/test/test_distributor.c              | 18 +-
 app/test/test_distributor_perf.c         |  4 ++--
 lib/librte_distributor/rte_distributor.c |  2 +-
 lib/librte_mbuf/rte_mbuf.h               |  1 +
 4 files changed, 13 insertions(+), 12 deletions(-)

diff --git a/app/test/test_distributor.c b/app/test/test_distributor.c
index ce06436..9e8c06d 100644
--- a/app/test/test_distributor.c
+++ b/app/test/test_distributor.c
@@ -120,7 +120,7 @@ sanity_test(struct rte_distributor *d, struct rte_mempool *p)
 	/* now set all hash values in all buffers to zero, so all pkts go to the
 	 * one worker thread */
 	for (i = 0; i < BURST; i++)
-		bufs[i]->hash.rss = 0;
+		bufs[i]->hash.usr = 0;
 
 	rte_distributor_process(d, bufs, BURST);
 	rte_distributor_flush(d);
@@ -142,7 +142,7 @@ sanity_test(struct rte_distributor *d, struct rte_mempool *p)
 	if (rte_lcore_count() >= 3) {
 		clear_packet_count();
 		for (i = 0; i < BURST; i++)
-			bufs[i]->hash.rss = (i & 1) << 8;
+			bufs[i]->hash.usr = (i & 1) << 8;
 
 		rte_distributor_process(d, bufs, BURST);
 		rte_distributor_flush(d);
@@ -167,7 +167,7 @@ sanity_test(struct rte_distributor *d, struct rte_mempool *p)
 	 * so load gets distributed */
 	clear_packet_count();
 	for (i = 0; i < BURST; i++)
-		bufs[i]->hash.rss = i;
+		bufs[i]->hash.usr = i;
 
 	rte_distributor_process(d, bufs, BURST);
 	rte_distributor_flush(d);
@@ -199,7 +199,7 @@ sanity_test(struct rte_distributor *d, struct rte_mempool *p)
 		return -1;
 	}
 	for (i = 0; i < BIG_BATCH; i++)
-		many_bufs[i]->hash.rss = i << 2;
+		many_bufs[i]->hash.usr = i << 2;
 
 	for (i = 0; i < BIG_BATCH/BURST; i++) {
 		rte_distributor_process(d, &many_bufs[i*BURST], BURST);
@@ -280,7 +280,7 @@ sanity_test_with_mbuf_alloc(struct rte_distributor *d, struct rte_mempool *p)
 		while (rte_mempool_get_bulk(p, (void *)bufs, BURST) < 0)
 			rte_distributor_process(d, NULL, 0);
 		for (j = 0; j < BURST; j++) {
-			bufs[j]->hash.rss = (i+j) << 1;
+			bufs[j]->hash.usr = (i+j) << 1;
 			rte_mbuf_refcnt_set(bufs[j], 1);
 		}
 
@@ -359,7 +359,7 @@ sanity_test_with_worker_shutdown(struct rte_distributor *d,
 	/* now set all hash values in all buffers to zero, so all pkts go to the
 	 * one worker thread */
 	for (i = 0; i < BURST; i++)
-		bufs[i]->hash.rss = 0;
+		bufs[i]->hash.usr = 0;
 
 	rte_distributor_process(d, bufs, BURST);
 	/* at this point, we will have processed some packets and have a full
@@ -372,7 +372,7 @@ sanity_test_with_worker_shutdown(struct rte_distributor *d,
 		return -1;
 	}
 	for (i = 0; i < BURST; i++)
-		bufs[i]->hash.rss = 0;
+		bufs[i]->hash.usr = 0;
 
 	/* get worker zero to quit */
 	zero_quit = 1;
@@ -416,7 +416,7 @@ test_flush_with_worker_shutdown(struct rte_distributor *d,
 	/* now set all hash values in all buffers to zero, so all pkts go to the
 	 * one worker thread */
 	for (i = 0; i < BURST; i++)
-		bufs[i]->hash.rss = 0;
+		bufs[i]->hash.usr = 0;
 
 	rte_distributor_process(d, bufs, BURST);
 	/* at this point, we will have processed some packets and have a full
@@ -488,7 +488,7 @@ quit_workers(struct rte_distributor *d, struct rte_mempool *p)
 	zero_quit = 0;
 	quit = 1;
 	for (i = 0; i < num_workers; i++)
-		bufs[i]->hash.rss = i << 1;
+		bufs[i]->hash.usr = i << 1;
 	rte_distributor_process(d, bufs, num_workers);
 
 	rte_mempool_put_bulk(p, (void *)bufs, num_workers);
diff --git a/app/test/test_distributor_perf.c b/app/test/test_distributor_perf.c
index b04864c..48ee344 100644
--- a/app/test/test_distributor_perf.c
+++ b/app/test/test_distributor_perf.c
@@ -159,7 +159,7 @@ perf_test(struct rte_distributor *d, struct rte_mempool *p)
 	}
 	/* ensure we have different hash value for each pkt */
 	for (i = 0; i < BURST; i++)
-		bufs[i]->hash.rss = i;
+		bufs[i]->hash.usr = i;
 
 	start = rte_rdtsc();
 	for (i = 0; i < (1hash.usr = i << 1;
 	rte_distributor_process(d, bufs, num_workers);
 
 	rte_mempool_put_bulk(p, (void *)bufs, num_workers);
diff --git a/lib/librte_distributor/rte_distributor.c b/lib/libr
[dpdk-dev] [PATCH v2 2/2] Add in_flight_bitmask so as to use full 32 bits of tag.
With the introduction of in_flight_bitmask, the whole 32 bits of the tag can be used. Furthermore, this patch fixes the integer overflow when finding the matched tags.

Note that currently librte_distributor supports up to 64 worker threads. If more workers are needed, the size of in_flight_bitmask and the algorithm of finding the matched tag must be revised.

Signed-off-by: Qinglai Xiao
---
 lib/librte_distributor/rte_distributor.c | 45 ++
 lib/librte_distributor/rte_distributor.h |  4 ++
 2 files changed, 37 insertions(+), 12 deletions(-)

diff --git a/lib/librte_distributor/rte_distributor.c b/lib/librte_distributor/rte_distributor.c
index 3dfec4a..3dfccae 100644
--- a/lib/librte_distributor/rte_distributor.c
+++ b/lib/librte_distributor/rte_distributor.c
@@ -92,7 +92,13 @@ struct rte_distributor {
 	unsigned num_workers;                 /**< Number of workers polling */
 
 	uint32_t in_flight_tags[RTE_MAX_LCORE];
-	/**< Tracks the tag being processed per core, 0 == no pkt */
+	/**< Tracks the tag being processed per core */
+	uint64_t in_flight_bitmask;
+	/**< on/off bits for in-flight tags.
+	 * Note that if RTE_MAX_LCORE is larger than 64 then
+	 * the bitmask has to expand.
+	 */
+
 	struct rte_distributor_backlog backlog[RTE_MAX_LCORE];
 
 	union rte_distributor_buffer bufs[RTE_MAX_LCORE];
@@ -189,6 +195,7 @@ static inline void
 handle_worker_shutdown(struct rte_distributor *d, unsigned wkr)
 {
 	d->in_flight_tags[wkr] = 0;
+	d->in_flight_bitmask &= ~(1UL << wkr);
 	d->bufs[wkr].bufptr64 = 0;
 	if (unlikely(d->backlog[wkr].count != 0)) {
 		/* On return of a packet, we need to move the
@@ -211,7 +218,10 @@ handle_worker_shutdown(struct rte_distributor *d, unsigned wkr)
 			pkts[i] = (void *)((uintptr_t)(bl->pkts[idx] >>
 					RTE_DISTRIB_FLAG_BITS));
 		}
-		/* recursive call */
+		/* recursive call.
+		 * Note that the tags were set before first level call
+		 * to rte_distributor_process.
+		 */
 		rte_distributor_process(d, pkts, i);
 		bl->count = bl->start = 0;
 	}
@@ -242,6 +252,7 @@ process_returns(struct rte_distributor *d)
 			else {
 				d->bufs[wkr].bufptr64 = RTE_DISTRIB_GET_BUF;
 				d->in_flight_tags[wkr] = 0;
+				d->in_flight_bitmask &= ~(1UL << wkr);
 			}
 			oldbuf = data >> RTE_DISTRIB_FLAG_BITS;
 		} else if (data & RTE_DISTRIB_RETURN_BUF) {
@@ -284,14 +295,18 @@ rte_distributor_process(struct rte_distributor *d,
 			next_value = (((int64_t)(uintptr_t)next_mb)
 					<< RTE_DISTRIB_FLAG_BITS);
 			/*
-			 * Set the low bit on the tag, so we can guarantee that
-			 * we never store a tag value of zero. That means we can
-			 * use the zero-value to indicate that no packet is
-			 * being processed by a worker.
+			 * User is advised to set the tag value for each
+			 * mbuf before calling rte_distributor_process.
+			 * User defined tags are used to identify flows,
+			 * or sessions.
 			 */
-			new_tag = (next_mb->hash.usr | 1);
+			new_tag = next_mb->hash.usr;
 
-			uint32_t match = 0;
+			/*
+			 * Note that if RTE_MAX_LCORE is larger than 64 then
+			 * the size of match has to be expanded.
+			 */
+			uint64_t match = 0;
 			unsigned i;
 			/*
 			 * to scan for a match use "xor" and "not" to get a 0/1
@@ -303,9 +318,12 @@ rte_distributor_process(struct rte_distributor *d,
 				match |= (!(d->in_flight_tags[i] ^ new_tag)
 					<< i);
 
+			/* Only turned-on bits are considered as match */
+			match &= d->in_flight_bitmask;
+
 			if (match) {
 				next_mb = NULL;
-				unsigned worker = __builtin_ctz(match);
+				unsigned worker = __builtin_ctzl(match);
 				if (add_to_backlog(&d->backlog[worker],
 						next_value) < 0)
 					next_idx--;
@@ -322,6 +340,7 @@ rte_distributor_process(struct rte_distributor *d,
 				else
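To see why the bitmask matters, here is the matching logic from the patch isolated into a standalone sketch; the worker count and tag values are made up for illustration:

#include <stdint.h>
#include <stdio.h>

int main(void)
{
	/* per-worker tag currently being processed (as in the patch) */
	uint32_t in_flight_tags[4] = { 7, 0, 42, 7 };
	/* only workers 0 and 2 actually hold a packet; without the mask,
	 * idle worker 3 (whose stale tag happens to equal the new tag)
	 * would false-match, as would any worker on tag value 0 */
	uint64_t in_flight_bitmask = (1UL << 0) | (1UL << 2);

	uint32_t new_tag = 7;
	uint64_t match = 0;
	unsigned i;

	/* xor + not yields 1 for each worker whose tag equals new_tag */
	for (i = 0; i < 4; i++)
		match |= ((uint64_t)!(in_flight_tags[i] ^ new_tag)) << i;

	match &= in_flight_bitmask;	/* only live workers count */

	if (match)	/* prints "queue behind worker 0" */
		printf("queue behind worker %u\n",
				(unsigned)__builtin_ctzl(match));
	return 0;
}

Because match is now 64 bits wide, the first-match lookup uses __builtin_ctzl rather than __builtin_ctz, consistent with the up-to-64-workers limit stated in the commit message.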
[dpdk-dev] [PATCH v2 1/2] Add new union field usr in mbuf->hash.
On Mon, Nov 10, 2014 at 02:52:46PM +0200, Qinglai Xiao wrote: > This field is added for librte_distributor. User of librte_distributor > is advocated to set value of mbuf->hash.usr before calling > rte_distributor_process. The value of usr is the tag which stands as > identifier of flow. > > Signed-off-by: Qinglai Xiao Acked-by: Bruce Richardson > --- > app/test/test_distributor.c | 18 +- > app/test/test_distributor_perf.c |4 ++-- > lib/librte_distributor/rte_distributor.c |2 +- > lib/librte_mbuf/rte_mbuf.h |1 + > 4 files changed, 13 insertions(+), 12 deletions(-) > > diff --git a/app/test/test_distributor.c b/app/test/test_distributor.c > index ce06436..9e8c06d 100644 > --- a/app/test/test_distributor.c > +++ b/app/test/test_distributor.c > @@ -120,7 +120,7 @@ sanity_test(struct rte_distributor *d, struct rte_mempool > *p) > /* now set all hash values in all buffers to zero, so all pkts go to the >* one worker thread */ > for (i = 0; i < BURST; i++) > - bufs[i]->hash.rss = 0; > + bufs[i]->hash.usr = 0; > > rte_distributor_process(d, bufs, BURST); > rte_distributor_flush(d); > @@ -142,7 +142,7 @@ sanity_test(struct rte_distributor *d, struct rte_mempool > *p) > if (rte_lcore_count() >= 3) { > clear_packet_count(); > for (i = 0; i < BURST; i++) > - bufs[i]->hash.rss = (i & 1) << 8; > + bufs[i]->hash.usr = (i & 1) << 8; > > rte_distributor_process(d, bufs, BURST); > rte_distributor_flush(d); > @@ -167,7 +167,7 @@ sanity_test(struct rte_distributor *d, struct rte_mempool > *p) >* so load gets distributed */ > clear_packet_count(); > for (i = 0; i < BURST; i++) > - bufs[i]->hash.rss = i; > + bufs[i]->hash.usr = i; > > rte_distributor_process(d, bufs, BURST); > rte_distributor_flush(d); > @@ -199,7 +199,7 @@ sanity_test(struct rte_distributor *d, struct rte_mempool > *p) > return -1; > } > for (i = 0; i < BIG_BATCH; i++) > - many_bufs[i]->hash.rss = i << 2; > + many_bufs[i]->hash.usr = i << 2; > > for (i = 0; i < BIG_BATCH/BURST; i++) { > rte_distributor_process(d, &many_bufs[i*BURST], BURST); > @@ -280,7 +280,7 @@ sanity_test_with_mbuf_alloc(struct rte_distributor *d, > struct rte_mempool *p) > while (rte_mempool_get_bulk(p, (void *)bufs, BURST) < 0) > rte_distributor_process(d, NULL, 0); > for (j = 0; j < BURST; j++) { > - bufs[j]->hash.rss = (i+j) << 1; > + bufs[j]->hash.usr = (i+j) << 1; > rte_mbuf_refcnt_set(bufs[j], 1); > } > > @@ -359,7 +359,7 @@ sanity_test_with_worker_shutdown(struct rte_distributor > *d, > /* now set all hash values in all buffers to zero, so all pkts go to the >* one worker thread */ > for (i = 0; i < BURST; i++) > - bufs[i]->hash.rss = 0; > + bufs[i]->hash.usr = 0; > > rte_distributor_process(d, bufs, BURST); > /* at this point, we will have processed some packets and have a full > @@ -372,7 +372,7 @@ sanity_test_with_worker_shutdown(struct rte_distributor > *d, > return -1; > } > for (i = 0; i < BURST; i++) > - bufs[i]->hash.rss = 0; > + bufs[i]->hash.usr = 0; > > /* get worker zero to quit */ > zero_quit = 1; > @@ -416,7 +416,7 @@ test_flush_with_worker_shutdown(struct rte_distributor *d, > /* now set all hash values in all buffers to zero, so all pkts go to the >* one worker thread */ > for (i = 0; i < BURST; i++) > - bufs[i]->hash.rss = 0; > + bufs[i]->hash.usr = 0; > > rte_distributor_process(d, bufs, BURST); > /* at this point, we will have processed some packets and have a full > @@ -488,7 +488,7 @@ quit_workers(struct rte_distributor *d, struct > rte_mempool *p) > zero_quit = 0; > quit = 1; > for (i = 0; i < num_workers; i++) > - bufs[i]->hash.rss = i << 1; 
> + bufs[i]->hash.usr = i << 1; > rte_distributor_process(d, bufs, num_workers); > > rte_mempool_put_bulk(p, (void *)bufs, num_workers); > diff --git a/app/test/test_distributor_perf.c > b/app/test/test_distributor_perf.c > index b04864c..48ee344 100644 > --- a/app/test/test_distributor_perf.c > +++ b/app/test/test_distributor_perf.c > @@ -159,7 +159,7 @@ perf_test(struct rte_distributor *d, struct rte_mempool > *p) > } > /* ensure we have different hash value for each pkt */ > for (i = 0; i < BURST; i++) > - bufs[i]->hash.rss = i; > + bufs[i]->hash.usr = i; > > start = rte_rdtsc(); > for (i = 0; i < (1<<ITER_POWER); i += BURST) > @@ -198,7 +198,7 @@ quit_workers(struct rte_distributor *d, struct rte_mempool *p)
[dpdk-dev] [PATCH v7] eal: map PCI memory resources after hugepages
Nak, there are issues with the patch. There is another patch already, but I'll submit it whenever Liang verifies it works with his setup. Thanks, Anatoly -Original Message- From: dev [mailto:dev-boun...@dpdk.org] On Behalf Of Anatoly Burakov Sent: Monday, November 10, 2014 11:35 AM To: dev at dpdk.org Subject: [dpdk-dev] [PATCH v7] eal: map PCI memory resources after hugepages Multi-process DPDK applications must mmap hugepages and PCI resources into the same virtual address space. By default the virtual addresses are chosen by the primary process automatically when calling mmap. But sometimes the chosen virtual addresses aren't usable in the secondary process - for example, when the secondary process is linked with more libraries than the primary process, and a library occupies the address space that the primary process requested for PCI mappings. This patch makes EAL map PCI BARs right after the hugepages (instead of a location chosen by mmap) in virtual memory. Signed-off-by: Anatoly Burakov Signed-off-by: Liang Xu --- lib/librte_eal/linuxapp/eal/eal_pci.c | 19 +++ lib/librte_eal/linuxapp/eal/eal_pci_uio.c | 9 - lib/librte_eal/linuxapp/eal/eal_pci_vfio.c | 13 +++-- lib/librte_eal/linuxapp/eal/include/eal_pci_init.h | 6 ++ 4 files changed, 44 insertions(+), 3 deletions(-) diff --git a/lib/librte_eal/linuxapp/eal/eal_pci.c b/lib/librte_eal/linuxapp/eal/eal_pci.c index 5fe3961..dae8739 100644 --- a/lib/librte_eal/linuxapp/eal/eal_pci.c +++ b/lib/librte_eal/linuxapp/eal/eal_pci.c @@ -97,6 +97,25 @@ error: return -1; } +void * +pci_find_max_end_va(void) +{ + const struct rte_memseg *seg = rte_eal_get_physmem_layout(); + const struct rte_memseg *last = seg; + unsigned i = 0; + + for (i = 0; i < RTE_MAX_MEMSEG; i++, seg++) { + if (seg->addr == NULL) + break; + + if (seg->addr > last->addr) + last = seg; + + } + return RTE_PTR_ADD(last->addr, last->len); +} + + /* map a particular resource from a file */ void * pci_map_resource(void *requested_addr, int fd, off_t offset, size_t size) diff --git a/lib/librte_eal/linuxapp/eal/eal_pci_uio.c b/lib/librte_eal/linuxapp/eal/eal_pci_uio.c index 7e62266..5090bf1 100644 --- a/lib/librte_eal/linuxapp/eal/eal_pci_uio.c +++ b/lib/librte_eal/linuxapp/eal/eal_pci_uio.c @@ -48,6 +48,8 @@ static int pci_parse_sysfs_value(const char *filename, uint64_t *val); +void *pci_map_addr = NULL; + #define OFF_MAX ((uint64_t)(off_t)-1) static int @@ -371,10 +373,15 @@ pci_uio_map_resource(struct rte_pci_device *dev) if (maps[j].addr != NULL) fail = 1; else { - mapaddr = pci_map_resource(NULL, fd, (off_t)offset, + if (pci_map_addr == NULL) + pci_map_addr = pci_find_max_end_va(); + + mapaddr = pci_map_resource(pci_map_addr, fd, (off_t)offset, (size_t)maps[j].size); if (mapaddr == NULL) fail = 1; + + pci_map_addr = RTE_PTR_ADD(pci_map_addr, maps[j].size); } if (fail) { diff --git a/lib/librte_eal/linuxapp/eal/eal_pci_vfio.c b/lib/librte_eal/linuxapp/eal/eal_pci_vfio.c index c776ddc..fb6ee7a 100644 --- a/lib/librte_eal/linuxapp/eal/eal_pci_vfio.c +++ b/lib/librte_eal/linuxapp/eal/eal_pci_vfio.c @@ -720,8 +720,17 @@ pci_vfio_map_resource(struct rte_pci_device *dev) if (i == msix_bar) continue; - bar_addr = pci_map_resource(maps[i].addr, vfio_dev_fd, reg.offset, - reg.size); + if (internal_config.process_type == RTE_PROC_PRIMARY) { + if (pci_map_addr == NULL) + pci_map_addr = pci_find_max_end_va(); + + bar_addr = pci_map_resource(pci_map_addr, vfio_dev_fd, reg.offset, + reg.size); + pci_map_addr = RTE_PTR_ADD(pci_map_addr, reg.size); + } else { + bar_addr = 
pci_map_resource(maps[i].addr, vfio_dev_fd, reg.offset, + reg.size); + } if (bar_addr == NULL) { RTE_LOG(ERR, EAL, " %s mapping BAR%i failed: %s\n", pci_addr, i, diff --git a/lib/librte_eal/linuxapp/eal/include/eal_pci_init.h b/lib/librte_eal/linuxapp/eal/include/eal_pci_init.h index d758bee..1070eb8 100644 --- a/lib/librte_eal/linuxapp/eal/include/eal_pci_init.h +++ b/lib/librte_eal/linuxapp/eal/include/eal_pci_init.h @@ -59,6 +59,12 @@ struct mapped_pci
[dpdk-dev] [PATCH v2 2/2] Add in_flight_bitmask so as to use full 32 bits of tag.
On Mon, Nov 10, 2014 at 02:52:47PM +0200, Qinglai Xiao wrote: > With introduction of in_flight_bitmask, the whole 32 bits of tag can be > used. Further more, this patch fixed the integer overflow when finding > the matched tags. > Note that currently librte_distributor supports up to 64 worker threads. > If more workers are needed, the size of in_flight_bitmask and the > algorithm of finding matched tag must be revised. > > Signed-off-by: Qinglai Xiao > --- > lib/librte_distributor/rte_distributor.c | 45 > ++ > lib/librte_distributor/rte_distributor.h |4 ++ > 2 files changed, 37 insertions(+), 12 deletions(-) > > diff --git a/lib/librte_distributor/rte_distributor.c > b/lib/librte_distributor/rte_distributor.c > index 3dfec4a..3dfccae 100644 > --- a/lib/librte_distributor/rte_distributor.c > +++ b/lib/librte_distributor/rte_distributor.c > @@ -92,7 +92,13 @@ struct rte_distributor { > unsigned num_workers; /**< Number of workers polling */ > > uint32_t in_flight_tags[RTE_MAX_LCORE]; > - /**< Tracks the tag being processed per core, 0 == no pkt */ > + /**< Tracks the tag being processed per core */ > + uint64_t in_flight_bitmask; > + /**< on/off bits for in-flight tags. > + * Note that if RTE_MAX_LCORE is larger than 64 then > + * the bitmask has to expand. > + */ I would suggest for this that we break the link with RTE_MAX_LCORE. Instead, we can just enforce a hard limit on the distributor that it can only work with 64 worker cores. That should avoid any complications. I would suggest we do a further check in the create function something like the below: if (num_workers >= sizeof(d->in_flight_bitmask) * CHAR_BIT) { rte_errno = . } > + > struct rte_distributor_backlog backlog[RTE_MAX_LCORE]; > > union rte_distributor_buffer bufs[RTE_MAX_LCORE]; > @@ -189,6 +195,7 @@ static inline void > handle_worker_shutdown(struct rte_distributor *d, unsigned wkr) > { > d->in_flight_tags[wkr] = 0; > + d->in_flight_bitmask &= ~(1UL << wkr); > d->bufs[wkr].bufptr64 = 0; > if (unlikely(d->backlog[wkr].count != 0)) { > /* On return of a packet, we need to move the > @@ -211,7 +218,10 @@ handle_worker_shutdown(struct rte_distributor *d, > unsigned wkr) > pkts[i] = (void *)((uintptr_t)(bl->pkts[idx] >> > RTE_DISTRIB_FLAG_BITS)); > } > - /* recursive call */ > + /* recursive call. > + * Note that the tags were set before first level call > + * to rte_distributor_process. > + */ > rte_distributor_process(d, pkts, i); > bl->count = bl->start = 0; > } > @@ -242,6 +252,7 @@ process_returns(struct rte_distributor *d) > else { > d->bufs[wkr].bufptr64 = RTE_DISTRIB_GET_BUF; > d->in_flight_tags[wkr] = 0; > + d->in_flight_bitmask &= ~(1UL << wkr); > } > oldbuf = data >> RTE_DISTRIB_FLAG_BITS; > } else if (data & RTE_DISTRIB_RETURN_BUF) { > @@ -284,14 +295,18 @@ rte_distributor_process(struct rte_distributor *d, > next_value = (((int64_t)(uintptr_t)next_mb) > << RTE_DISTRIB_FLAG_BITS); > /* > - * Set the low bit on the tag, so we can guarantee that > - * we never store a tag value of zero. That means we can > - * use the zero-value to indicate that no packet is > - * being processed by a worker. > + * User is advocated to set tag vaue for each > + * mbuf before calling rte_distributor_process. > + * User defined tags are used to identify flows, > + * or sessions. >*/ > - new_tag = (next_mb->hash.usr | 1); > + new_tag = next_mb->hash.usr; > > - uint32_t match = 0; > + /* > + * Note that if RTE_MAX_LCORE is larger than 64 then > + * the size of match has to be expanded. 
> + */ > + uint64_t match = 0; > unsigned i; > /* >* to scan for a match use "xor" and "not" to get a 0/1 > @@ -303,9 +318,12 @@ rte_distributor_process(struct rte_distributor *d, > match |= (!(d->in_flight_tags[i] ^ new_tag) > << i); > > + /* Only turned-on bits are considered as match */ > + match &= d->in_flight_bitmas
[dpdk-dev] building shared library
Hi, is it possible to build a dpdk app as a shared library? I tried to put 'include $(RTE_SDK)/mk/rte.extshared.mk' in my Makefile (and define SHARED) and it builds .so lib, but all rte_* symbols are undefined. After that i tried adding: LDLIBS += -lrte_eal -lrte_mbuf -lrte_cmdline -lrte_timer -lrte_mempool -lrte_ring -lrte_pmd_ring -lethdev -lrte_malloc And now almost all symbols in .so file are defined (missing only rte_hexdump). I thought this was gonna be it. But after using this library, pci probe-ing fails since I don't have any pmd drivers registered, and rte_eth_dev_count() returns 0. But how are drivers supposed to be registered? When I use gdb with regular dpdk app (not shared library), I can see this: #0 0x0046fab0 in rte_eal_driver_register () #1 0x00418fb7 in devinitfn_bond_drv () #2 0x004f15ed in __libc_csu_init () #3 0x76efee55 in __libc_start_main (main=0x41ee65 , argc=1, argv=0x7fffe4f8, init=0x4f15a0 <__libc_csu_init>, fini=, rtld_fini=, stack_end=0x7fffe4e8) at libc-start.c:246 #4 0x0041953c in _start () Ok, if I'm not mistaken, it seems driver registration is called before main. How is this accomplished? Cause in shared library build, I don't have this before main() and after rte_eal_init() (since driver list is empty) everything else fails. Any suggestions please? I'd really appreciate it... BR, Newman P.
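The backtrace above shows the mechanism: in this era of DPDK, each PMD registers itself through a GCC constructor function that the C runtime executes before main(), which is why devinitfn_bond_drv appears under __libc_csu_init. A rough sketch of what the PMD_REGISTER_DRIVER macro in rte_dev.h expands to, with simplified names:

	#include <rte_dev.h>

	/* hypothetical driver entry point, called during rte_eal_init() */
	static int my_pmd_init(const char *name, const char *args);

	static struct rte_driver my_pmd_drv = {
		.type = PMD_PDEV,
		.init = my_pmd_init,
	};

	/* runs at load time, before main(), so the driver is already on
	 * the EAL driver list when devices are probed */
	static void __attribute__((constructor))
	devinitfn_my_pmd_drv(void)
	{
		rte_eal_driver_register(&my_pmd_drv);
	}

This also suggests the likely cause of the problem: when the static rte_pmd_* archives are linked into a shared library, the linker drops object files that nothing references, so these constructors never make it into the .so. Wrapping the DPDK libraries in -Wl,--whole-archive ... -Wl,--no-whole-archive at link time is the usual workaround.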
[dpdk-dev] [PATCH v3] Add in_flight_bitmask so as to use full 32 bits of tag.
With introduction of in_flight_bitmask, the whole 32 bits of tag can be used. Furthermore, this patch fixes the integer overflow when finding the matched tags. The maximum number of workers is now defined as 64, which is the length of a double-word. The link between the number of workers and RTE_MAX_LCORE is now removed. A compile-time check is added to ensure that RTE_DISTRIB_MAX_WORKERS is less than or equal to the size of a double-word. Signed-off-by: Qinglai Xiao --- lib/librte_distributor/rte_distributor.c | 64 ++ lib/librte_distributor/rte_distributor.h |4 ++ 2 files changed, 51 insertions(+), 17 deletions(-) diff --git a/lib/librte_distributor/rte_distributor.c b/lib/librte_distributor/rte_distributor.c index 3dfec4a..2c5d61c 100644 --- a/lib/librte_distributor/rte_distributor.c +++ b/lib/librte_distributor/rte_distributor.c @@ -62,6 +62,13 @@ #define RTE_DISTRIB_RETURNS_MASK (RTE_DISTRIB_MAX_RETURNS - 1) /** + * Maximum number of workers allowed. + * Be careful when increasing the limit, because it is limited by how we track + * in-flight tags. See @in_flight_bitmask and @rte_distributor_process + */ +#define RTE_DISTRIB_MAX_WORKERS 64 + +/** * Buffer structure used to pass the pointer data between cores. This is cache * line aligned, but to improve performance and prevent adjacent cache-line * prefetches of buffers for other workers, e.g. when worker 1's buffer is on @@ -91,11 +98,17 @@ struct rte_distributor { char name[RTE_DISTRIBUTOR_NAMESIZE]; /**< Name of the ring. */ unsigned num_workers; /**< Number of workers polling */ - uint32_t in_flight_tags[RTE_MAX_LCORE]; - /**< Tracks the tag being processed per core, 0 == no pkt */ - struct rte_distributor_backlog backlog[RTE_MAX_LCORE]; + uint32_t in_flight_tags[RTE_DISTRIB_MAX_WORKERS]; + /**< Tracks the tag being processed per core */ + uint64_t in_flight_bitmask; + /**< on/off bits for in-flight tags. +* Note that if RTE_DISTRIB_MAX_WORKERS is larger than 64 then +* the bitmask has to expand. +*/ + + struct rte_distributor_backlog backlog[RTE_DISTRIB_MAX_WORKERS]; - union rte_distributor_buffer bufs[RTE_MAX_LCORE]; + union rte_distributor_buffer bufs[RTE_DISTRIB_MAX_WORKERS]; struct rte_distributor_returned_pkts returns; }; @@ -189,6 +202,7 @@ static inline void handle_worker_shutdown(struct rte_distributor *d, unsigned wkr) { d->in_flight_tags[wkr] = 0; + d->in_flight_bitmask &= ~(1UL << wkr); d->bufs[wkr].bufptr64 = 0; if (unlikely(d->backlog[wkr].count != 0)) { /* On return of a packet, we need to move the @@ -211,7 +225,10 @@ handle_worker_shutdown(struct rte_distributor *d, unsigned wkr) pkts[i] = (void *)((uintptr_t)(bl->pkts[idx] >> RTE_DISTRIB_FLAG_BITS)); } - /* recursive call */ + /* recursive call. +* Note that the tags were set before first level call +* to rte_distributor_process. +*/ rte_distributor_process(d, pkts, i); bl->count = bl->start = 0; } @@ -242,6 +259,7 @@ process_returns(struct rte_distributor *d) else { d->bufs[wkr].bufptr64 = RTE_DISTRIB_GET_BUF; d->in_flight_tags[wkr] = 0; + d->in_flight_bitmask &= ~(1UL << wkr); } oldbuf = data >> RTE_DISTRIB_FLAG_BITS; } else if (data & RTE_DISTRIB_RETURN_BUF) { @@ -284,14 +302,18 @@ rte_distributor_process(struct rte_distributor *d, next_value = (((int64_t)(uintptr_t)next_mb) << RTE_DISTRIB_FLAG_BITS); /* -* Set the low bit on the tag, so we can guarantee that -* we never store a tag value of zero. That means we can -* use the zero-value to indicate that no packet is -* being processed by a worker. 
+* The user is advised to set the tag value for each +* mbuf before calling rte_distributor_process. +* User-defined tags are used to identify flows, +* or sessions. */ - new_tag = (next_mb->hash.usr | 1); + new_tag = next_mb->hash.usr; - uint32_t match = 0; + /* +* Note that if RTE_DISTRIB_MAX_WORKERS is larger than 64 +* then the size of match has to be expanded. +*/ + uint64_t match = 0
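The compile-time check mentioned in the commit message does not appear in the hunks quoted above; as a sketch, such a guard (together with the run-time check Bruce suggested for the create function) could look like this, though the exact form in the patch may differ:

	#include <errno.h>
	#include <limits.h>
	#include <rte_common.h>	/* RTE_BUILD_BUG_ON */
	#include <rte_errno.h>

	/* sketch only: RTE_DISTRIB_MAX_WORKERS comes from the patch above */
	static int
	check_worker_count(unsigned num_workers)
	{
		/* build fails if the limit ever outgrows the 64-bit bitmask */
		RTE_BUILD_BUG_ON(RTE_DISTRIB_MAX_WORKERS >
				sizeof(uint64_t) * CHAR_BIT);

		if (num_workers >= RTE_DISTRIB_MAX_WORKERS) {
			rte_errno = EINVAL;
			return -1;
		}
		return 0;
	}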
[dpdk-dev] [PATCH v3] Add in_flight_bitmask so as to use full 32 bits of tag.
On Mon, Nov 10, 2014 at 04:44:02PM +0200, Qinglai Xiao wrote: > With introduction of in_flight_bitmask, the whole 32 bits of tag can be > used. Further more, this patch fixed the integer overflow when finding > the matched tags. > The maximum number workers is now defined as 64, which is length of > double-word. The link between number of workers and RTE_MAX_LCORE is > now removed. Compile time check is added to ensure the > RTE_DISTRIB_MAX_WORKERS is less than or equal to size of double-word. > > Signed-off-by: Qinglai Xiao Looks good to me. Just before I ack this, have you checked to see if there is any performance impact? /Bruce > --- > lib/librte_distributor/rte_distributor.c | 64 > ++ > lib/librte_distributor/rte_distributor.h |4 ++ > 2 files changed, 51 insertions(+), 17 deletions(-) > > diff --git a/lib/librte_distributor/rte_distributor.c > b/lib/librte_distributor/rte_distributor.c > index 3dfec4a..2c5d61c 100644 > --- a/lib/librte_distributor/rte_distributor.c > +++ b/lib/librte_distributor/rte_distributor.c > @@ -62,6 +62,13 @@ > #define RTE_DISTRIB_RETURNS_MASK (RTE_DISTRIB_MAX_RETURNS - 1) > > /** > + * Maximum number of workers allowed. > + * Be aware of increasing the limit, becaus it is limited by how we track > + * in-flight tags. See @in_flight_bitmask and @rte_distributor_process > + */ > +#define RTE_DISTRIB_MAX_WORKERS 64 > + > +/** > * Buffer structure used to pass the pointer data between cores. This is > cache > * line aligned, but to improve performance and prevent adjacent cache-line > * prefetches of buffers for other workers, e.g. when worker 1's buffer is on > @@ -91,11 +98,17 @@ struct rte_distributor { > char name[RTE_DISTRIBUTOR_NAMESIZE]; /**< Name of the ring. */ > unsigned num_workers; /**< Number of workers polling */ > > - uint32_t in_flight_tags[RTE_MAX_LCORE]; > - /**< Tracks the tag being processed per core, 0 == no pkt */ > - struct rte_distributor_backlog backlog[RTE_MAX_LCORE]; > + uint32_t in_flight_tags[RTE_DISTRIB_MAX_WORKERS]; > + /**< Tracks the tag being processed per core */ > + uint64_t in_flight_bitmask; > + /**< on/off bits for in-flight tags. > + * Note that if RTE_DISTRIB_MAX_WORKERS is larger than 64 then > + * the bitmask has to expand. > + */ > + > + struct rte_distributor_backlog backlog[RTE_DISTRIB_MAX_WORKERS]; > > - union rte_distributor_buffer bufs[RTE_MAX_LCORE]; > + union rte_distributor_buffer bufs[RTE_DISTRIB_MAX_WORKERS]; > > struct rte_distributor_returned_pkts returns; > }; > @@ -189,6 +202,7 @@ static inline void > handle_worker_shutdown(struct rte_distributor *d, unsigned wkr) > { > d->in_flight_tags[wkr] = 0; > + d->in_flight_bitmask &= ~(1UL << wkr); > d->bufs[wkr].bufptr64 = 0; > if (unlikely(d->backlog[wkr].count != 0)) { > /* On return of a packet, we need to move the > @@ -211,7 +225,10 @@ handle_worker_shutdown(struct rte_distributor *d, > unsigned wkr) > pkts[i] = (void *)((uintptr_t)(bl->pkts[idx] >> > RTE_DISTRIB_FLAG_BITS)); > } > - /* recursive call */ > + /* recursive call. > + * Note that the tags were set before first level call > + * to rte_distributor_process. 
> + */ > rte_distributor_process(d, pkts, i); > bl->count = bl->start = 0; > } > @@ -242,6 +259,7 @@ process_returns(struct rte_distributor *d) > else { > d->bufs[wkr].bufptr64 = RTE_DISTRIB_GET_BUF; > d->in_flight_tags[wkr] = 0; > + d->in_flight_bitmask &= ~(1UL << wkr); > } > oldbuf = data >> RTE_DISTRIB_FLAG_BITS; > } else if (data & RTE_DISTRIB_RETURN_BUF) { > @@ -284,14 +302,18 @@ rte_distributor_process(struct rte_distributor *d, > next_value = (((int64_t)(uintptr_t)next_mb) > << RTE_DISTRIB_FLAG_BITS); > /* > - * Set the low bit on the tag, so we can guarantee that > - * we never store a tag value of zero. That means we can > - * use the zero-value to indicate that no packet is > - * being processed by a worker. > + * User is advocated to set tag vaue for each > + * mbuf before calling rte_distributor_process. > + * User defined tags are used to identify flows, > + * or sessions. >*/ > - new_tag = (next_mb->hash.usr | 1); > + new_tag = next_mb->hash.usr; > > -
[dpdk-dev] [PATCH v3] Add in_flight_bitmask so as to use full 32 bits of tag.
Hi Bruce, Sorry I didn't. I will run a performance test tomorrow. thx & rgds, -qinglai On Mon, Nov 10, 2014 at 5:13 PM, Bruce Richardson < bruce.richardson at intel.com> wrote: > On Mon, Nov 10, 2014 at 04:44:02PM +0200, Qinglai Xiao wrote: > > With introduction of in_flight_bitmask, the whole 32 bits of tag can be > > used. Further more, this patch fixed the integer overflow when finding > > the matched tags. > > The maximum number workers is now defined as 64, which is length of > > double-word. The link between number of workers and RTE_MAX_LCORE is > > now removed. Compile time check is added to ensure the > > RTE_DISTRIB_MAX_WORKERS is less than or equal to size of double-word. > > > > Signed-off-by: Qinglai Xiao > > Looks good to me. > Just before I ack this, have you checked to see if there is any > performance impact? > > /Bruce > > > --- > > lib/librte_distributor/rte_distributor.c | 64 > ++ > > lib/librte_distributor/rte_distributor.h |4 ++ > > 2 files changed, 51 insertions(+), 17 deletions(-) > > > > diff --git a/lib/librte_distributor/rte_distributor.c > b/lib/librte_distributor/rte_distributor.c > > index 3dfec4a..2c5d61c 100644 > > --- a/lib/librte_distributor/rte_distributor.c > > +++ b/lib/librte_distributor/rte_distributor.c > > @@ -62,6 +62,13 @@ > > #define RTE_DISTRIB_RETURNS_MASK (RTE_DISTRIB_MAX_RETURNS - 1) > > > > /** > > + * Maximum number of workers allowed. > > + * Be aware of increasing the limit, becaus it is limited by how we > track > > + * in-flight tags. See @in_flight_bitmask and @rte_distributor_process > > + */ > > +#define RTE_DISTRIB_MAX_WORKERS 64 > > + > > +/** > > * Buffer structure used to pass the pointer data between cores. This > is cache > > * line aligned, but to improve performance and prevent adjacent > cache-line > > * prefetches of buffers for other workers, e.g. when worker 1's buffer > is on > > @@ -91,11 +98,17 @@ struct rte_distributor { > > char name[RTE_DISTRIBUTOR_NAMESIZE]; /**< Name of the ring. */ > > unsigned num_workers; /**< Number of workers > polling */ > > > > - uint32_t in_flight_tags[RTE_MAX_LCORE]; > > - /**< Tracks the tag being processed per core, 0 == no pkt > */ > > - struct rte_distributor_backlog backlog[RTE_MAX_LCORE]; > > + uint32_t in_flight_tags[RTE_DISTRIB_MAX_WORKERS]; > > + /**< Tracks the tag being processed per core */ > > + uint64_t in_flight_bitmask; > > + /**< on/off bits for in-flight tags. > > + * Note that if RTE_DISTRIB_MAX_WORKERS is larger than 64 > then > > + * the bitmask has to expand. > > + */ > > + > > + struct rte_distributor_backlog backlog[RTE_DISTRIB_MAX_WORKERS]; > > > > - union rte_distributor_buffer bufs[RTE_MAX_LCORE]; > > + union rte_distributor_buffer bufs[RTE_DISTRIB_MAX_WORKERS]; > > > > struct rte_distributor_returned_pkts returns; > > }; > > @@ -189,6 +202,7 @@ static inline void > > handle_worker_shutdown(struct rte_distributor *d, unsigned wkr) > > { > > d->in_flight_tags[wkr] = 0; > > + d->in_flight_bitmask &= ~(1UL << wkr); > > d->bufs[wkr].bufptr64 = 0; > > if (unlikely(d->backlog[wkr].count != 0)) { > > /* On return of a packet, we need to move the > > @@ -211,7 +225,10 @@ handle_worker_shutdown(struct rte_distributor *d, > unsigned wkr) > > pkts[i] = (void *)((uintptr_t)(bl->pkts[idx] >> > > RTE_DISTRIB_FLAG_BITS)); > > } > > - /* recursive call */ > > + /* recursive call. > > + * Note that the tags were set before first level call > > + * to rte_distributor_process. 
> > + */ > > rte_distributor_process(d, pkts, i); > > bl->count = bl->start = 0; > > } > > @@ -242,6 +259,7 @@ process_returns(struct rte_distributor *d) > > else { > > d->bufs[wkr].bufptr64 = > RTE_DISTRIB_GET_BUF; > > d->in_flight_tags[wkr] = 0; > > + d->in_flight_bitmask &= ~(1UL << wkr); > > } > > oldbuf = data >> RTE_DISTRIB_FLAG_BITS; > > } else if (data & RTE_DISTRIB_RETURN_BUF) { > > @@ -284,14 +302,18 @@ rte_distributor_process(struct rte_distributor *d, > > next_value = (((int64_t)(uintptr_t)next_mb) > > << RTE_DISTRIB_FLAG_BITS); > > /* > > - * Set the low bit on the tag, so we can guarantee > that > > - * we never store a tag value of zero. That means > we can > > - * use the zero-value to indicate that no packet is > > - * being processed by a worker. > > + * User is a
[dpdk-dev] [PATCH v8 10/10] app/testpmd:test VxLAN Tx checksum offload
Hello Konstantin, >> By the way, we had the same kind of discussion with Konstantin [1] >> about what to do with the TCP checksum. My feeling is that setting it >> to the pseudo-header checksum is the best we can do: >> - linux does that >> - many hardware requires that (this is not the case for ixgbe, which >>need a pshdr checksum without the IP len) >> - it can be reused if received by a virtual device and sent to a >>physical device supporting TSO > > Yes, I remember that discussion. > I still think we better avoid any read/write access of the packet data inside > PMD TX routine. > (packet header parsing and/or pseudo-header checksum calculations). > As I said before - if different HW have different requirements of what have > to be recalculated for HW TX offloads - > why not introduce a new function dev_prep_tx(portid, queueid, mbuf[], num)? > PMD developer can put all necessary calculations/updates of the packet data > and related mbuf fields inside that function. > It would be then a PMD responsibility to provide that function and it would > be an app layer responsibility to call it for > mbufs with TX offload flags before calling tx_burst(). I think I understand your point: you don't want to touch the packet in the PMD because the lcore that transmits the packet can be different from the one that built it. In this case (i.e. a pipeline case), reading or writing the packet can produce a cache miss, is that correct? From an API perspective, it looks a bit more complex to have to call dev_prep_tx() before sending the packets if they have been flagged for offload processing. But I admit I have no other argument. I'll be happy to have more comments from other people on the list. I'm sending a first version of the patchset now as it's ready; it does not take this comment into account, but I'm open to adding it in a v2 if there is a consensus on this. Now, knowing that: - adding dev_prep_tx() will also concern hw checksum (TCP L4 checksum already requires setting the TCP pseudo header checksum), so adding this will change the API of an existing feature - TSO is a new feature expected for 1.8 (which should be out soon) Do you think we need to include this for 1.8 or can we postpone your proposition for after the 1.8 release? Thank you for your comments, Regards, Olivier >> [1] http://dpdk.org/ml/archives/dev/2014-May/002766.html
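To make the proposal concrete, this is roughly what Konstantin's idea would look like from the application side. rte_eth_dev_prep_tx() is purely hypothetical here; no such function exists in DPDK at this point:

	/* hypothetical API from the discussion above -- not a real DPDK call.
	 * The application, not the PMD, triggers the per-hardware header
	 * fixups (e.g. pseudo-header checksum) for mbufs carrying TX offload
	 * flags, on the lcore that actually transmits. */
	static uint16_t
	send_with_offloads(uint8_t port, uint16_t queue,
			struct rte_mbuf **pkts, uint16_t nb_pkts)
	{
		rte_eth_dev_prep_tx(port, queue, pkts, nb_pkts); /* hypothetical */
		return rte_eth_tx_burst(port, queue, pkts, nb_pkts);
	}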
[dpdk-dev] [PATCH 00/12] add TSO support
This series adds TSO support in the ixgbe DPDK driver. This is the third version of the series, but as the previous version [1] was posted several months ago and included a mbuf rework that is now in mainline, it can be considered as a new patch series. I'm open to comments on this patchset, especially on the API (see [2]). This series first fixes some bugs that were discovered during development, adds some changes to the mbuf API (new l4_len and tso_segsz fields), adds TSO support in ixgbe, reworks the testpmd csum forward engine, and finally adds TSO support in testpmd so it can be validated. The new fields added in mbuf try to be generic enough to apply to other hardware in the future. To delegate the TCP segmentation to the hardware, the user has to: - set the PKT_TX_TCP_SEG flag in mbuf->ol_flags (this flag implies PKT_TX_TCP_CKSUM) - if it's IPv4, set the PKT_TX_IP_CKSUM flag and write the IP checksum to 0 in the packet - fill the mbuf offload information: l2_len, l3_len, l4_len, tso_segsz - calculate the pseudo header checksum and set it in the TCP header, as required when doing hardware TCP checksum offload The test report will be added as an answer to this cover letter and could be linked from the relevant commits. [1] http://dpdk.org/ml/archives/dev/2014-May/002537.html [2] http://dpdk.org/ml/archives/dev/2014-November/007940.html Olivier Matz (12): igb/ixgbe: fix IP checksum calculation ixgbe: fix remaining pkt_flags variable size to 64 bits mbuf: move vxlan_cksum flag definition at the proper place mbuf: add help about TX checksum flags mbuf: remove too specific PKT_TX_OFFLOAD_MASK definition mbuf: add functions to get the name of an ol_flag mbuf: generic support for TCP segmentation offload ixgbe: support TCP segmentation offload testpmd: fix use of offload flags in testpmd testpmd: rework csum forward engine testpmd: support TSO in csum forward engine testpmd: add a verbose mode csum forward engine app/test-pmd/cmdline.c | 243 ++-- app/test-pmd/config.c | 15 +- app/test-pmd/csumonly.c | 740 +++- app/test-pmd/macfwd.c | 5 +- app/test-pmd/macswap.c | 5 +- app/test-pmd/rxonly.c | 36 +- app/test-pmd/testpmd.c | 3 +- app/test-pmd/testpmd.h | 24 +- app/test-pmd/txonly.c | 9 +- examples/ipv4_multicast/main.c | 3 +- lib/librte_mbuf/rte_mbuf.h | 130 +-- lib/librte_pmd_e1000/igb_rxtx.c | 16 +- lib/librte_pmd_ixgbe/ixgbe_ethdev.c | 3 +- lib/librte_pmd_ixgbe/ixgbe_rxtx.c | 222 --- lib/librte_pmd_ixgbe/ixgbe_rxtx.h | 19 +- 15 files changed, 921 insertions(+), 552 deletions(-) -- 2.1.0
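As a concrete illustration of the checklist above, a minimal sketch for a contiguous IPv4/TCP packet could look as follows. get_ipv4_psd_sum() stands in for a pseudo-header checksum helper such as the one in testpmd's csumonly.c; Ethernet without VLAN and an IPv4 header without options are assumed:

	#include <rte_mbuf.h>
	#include <rte_ether.h>
	#include <rte_ip.h>
	#include <rte_tcp.h>

	uint16_t get_ipv4_psd_sum(struct ipv4_hdr *ip_hdr); /* assumed helper */

	static void
	prepare_tso(struct rte_mbuf *m, uint16_t tso_segsz)
	{
		struct ether_hdr *eth = rte_pktmbuf_mtod(m, struct ether_hdr *);
		struct ipv4_hdr *ip = (struct ipv4_hdr *)(eth + 1);
		struct tcp_hdr *tcp = (struct tcp_hdr *)((char *)ip +
				sizeof(struct ipv4_hdr));

		/* fill the mbuf offload information */
		m->l2_len = sizeof(struct ether_hdr);
		m->l3_len = sizeof(struct ipv4_hdr);	 /* no IP options */
		m->l4_len = (tcp->data_off & 0xf0) >> 2; /* TCP hdr length */
		m->tso_segsz = tso_segsz;

		/* IPv4: write the IP checksum as 0, the hardware recomputes
		 * it for each segment */
		ip->hdr_checksum = 0;

		/* seed the TCP checksum with the pseudo-header checksum */
		tcp->cksum = get_ipv4_psd_sum(ip);

		/* request IP checksum and segmentation (implies TCP cksum) */
		m->ol_flags |= PKT_TX_IP_CKSUM | PKT_TX_TCP_SEG;
	}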
[dpdk-dev] [PATCH 02/12] ixgbe: fix remaining pkt_flags variable size to 64 bits
Since commit 4332beee9 "mbuf: expand ol_flags field to 64-bits", the packet flags are now 64 bits wide. Some occurrences were forgotten in the ixgbe driver. Signed-off-by: Olivier Matz --- lib/librte_pmd_ixgbe/ixgbe_rxtx.c | 12 ++-- 1 file changed, 6 insertions(+), 6 deletions(-) diff --git a/lib/librte_pmd_ixgbe/ixgbe_rxtx.c b/lib/librte_pmd_ixgbe/ixgbe_rxtx.c index 78be7e6..042ee8a 100644 --- a/lib/librte_pmd_ixgbe/ixgbe_rxtx.c +++ b/lib/librte_pmd_ixgbe/ixgbe_rxtx.c @@ -817,7 +817,7 @@ end_of_tx: static inline uint64_t rx_desc_hlen_type_rss_to_pkt_flags(uint32_t hl_tp_rs) { - uint16_t pkt_flags; + uint64_t pkt_flags; static uint64_t ip_pkt_types_map[16] = { 0, PKT_RX_IPV4_HDR, PKT_RX_IPV4_HDR_EXT, PKT_RX_IPV4_HDR_EXT, @@ -834,7 +834,7 @@ rx_desc_hlen_type_rss_to_pkt_flags(uint32_t hl_tp_rs) }; #ifdef RTE_LIBRTE_IEEE1588 - static uint32_t ip_pkt_etqf_map[8] = { + static uint64_t ip_pkt_etqf_map[8] = { 0, 0, 0, PKT_RX_IEEE1588_PTP, 0, 0, 0, 0, }; @@ -903,7 +903,7 @@ ixgbe_rx_scan_hw_ring(struct igb_rx_queue *rxq) struct igb_rx_entry *rxep; struct rte_mbuf *mb; uint16_t pkt_len; - uint16_t pkt_flags; + uint64_t pkt_flags; int s[LOOK_AHEAD], nb_dd; int i, j, nb_rx = 0; @@ -1335,7 +1335,7 @@ ixgbe_recv_scattered_pkts(void *rx_queue, struct rte_mbuf **rx_pkts, uint16_t nb_rx; uint16_t nb_hold; uint16_t data_len; - uint16_t pkt_flags; + uint64_t pkt_flags; nb_rx = 0; nb_hold = 0; @@ -1511,9 +1511,9 @@ ixgbe_recv_scattered_pkts(void *rx_queue, struct rte_mbuf **rx_pkts, first_seg->vlan_tci = rte_le_to_cpu_16(rxd.wb.upper.vlan); hlen_type_rss = rte_le_to_cpu_32(rxd.wb.lower.lo_dword.data); pkt_flags = rx_desc_hlen_type_rss_to_pkt_flags(hlen_type_rss); - pkt_flags = (uint16_t)(pkt_flags | + pkt_flags = (pkt_flags | rx_desc_status_to_pkt_flags(staterr)); - pkt_flags = (uint16_t)(pkt_flags | + pkt_flags = (pkt_flags | rx_desc_error_to_pkt_flags(staterr)); first_seg->ol_flags = pkt_flags; -- 2.1.0
[dpdk-dev] [PATCH 03/12] mbuf: move vxlan_cksum flag definition at the proper place
The tx mbuf flags are ordered from the highest value to the lowest. Move PKT_TX_VXLAN_CKSUM to the right place. Signed-off-by: Olivier Matz --- lib/librte_mbuf/rte_mbuf.h | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/lib/librte_mbuf/rte_mbuf.h b/lib/librte_mbuf/rte_mbuf.h index e8f9bfc..be15168 100644 --- a/lib/librte_mbuf/rte_mbuf.h +++ b/lib/librte_mbuf/rte_mbuf.h @@ -96,7 +96,6 @@ extern "C" { #define PKT_TX_VLAN_PKT (1ULL << 55) /**< TX packet is a 802.1q VLAN packet. */ #define PKT_TX_IP_CKSUM (1ULL << 54) /**< IP cksum of TX pkt. computed by NIC. */ -#define PKT_TX_VXLAN_CKSUM (1ULL << 50) /**< TX checksum of VXLAN computed by NIC */ #define PKT_TX_IPV4_CSUM PKT_TX_IP_CKSUM /**< Alias of PKT_TX_IP_CKSUM. */ #define PKT_TX_IPV4 PKT_RX_IPV4_HDR /**< IPv4 with no IP checksum offload. */ #define PKT_TX_IPV6 PKT_RX_IPV6_HDR /**< IPv6 packet */ @@ -114,9 +113,10 @@ extern "C" { #define PKT_TX_UDP_CKSUM (3ULL << 52) /**< UDP cksum of TX pkt. computed by NIC. */ #define PKT_TX_L4_MASK (3ULL << 52) /**< Mask for L4 cksum offload request. */ -/* Bit 51 - IEEE1588*/ #define PKT_TX_IEEE1588_TMST (1ULL << 51) /**< TX IEEE1588 packet to timestamp. */ +#define PKT_TX_VXLAN_CKSUM (1ULL << 50) /**< TX checksum of VXLAN computed by NIC */ + /* Use final bit of flags to indicate a control mbuf */ #define CTRL_MBUF_FLAG (1ULL << 63) /**< Mbuf contains control data */ -- 2.1.0
[dpdk-dev] [PATCH 04/12] mbuf: add help about TX checksum flags
Describe how to use the hardware checksum API. Signed-off-by: Olivier Matz --- lib/librte_mbuf/rte_mbuf.h | 25 + 1 file changed, 17 insertions(+), 8 deletions(-) diff --git a/lib/librte_mbuf/rte_mbuf.h b/lib/librte_mbuf/rte_mbuf.h index be15168..96e322b 100644 --- a/lib/librte_mbuf/rte_mbuf.h +++ b/lib/librte_mbuf/rte_mbuf.h @@ -95,19 +95,28 @@ extern "C" { #define PKT_RX_TUNNEL_IPV6_HDR (1ULL << 12) /**< RX tunnel packet with IPv6 header. */ #define PKT_TX_VLAN_PKT (1ULL << 55) /**< TX packet is a 802.1q VLAN packet. */ -#define PKT_TX_IP_CKSUM (1ULL << 54) /**< IP cksum of TX pkt. computed by NIC. */ + +/** + * Enable hardware computation of IP cksum. To use it: + * - fill l2_len and l3_len in mbuf + * - set the flag PKT_TX_IP_CKSUM + * - set the ip checksum to 0 in the IP header + */ +#define PKT_TX_IP_CKSUM (1ULL << 54) #define PKT_TX_IPV4_CSUM PKT_TX_IP_CKSUM /**< Alias of PKT_TX_IP_CKSUM. */ #define PKT_TX_IPV4 PKT_RX_IPV4_HDR /**< IPv4 with no IP checksum offload. */ #define PKT_TX_IPV6 PKT_RX_IPV6_HDR /**< IPv6 packet */ -/* - * Bits 52+53 used for L4 packet type with checksum enabled. - * 00: Reserved - * 01: TCP checksum - * 10: SCTP checksum - * 11: UDP checksum +/** + * Bits 52+53 used for L4 packet type with checksum enabled: 00: Reserved, + * 01: TCP checksum, 10: SCTP checksum, 11: UDP checksum. To use hardware + * L4 checksum offload, the user needs to: + * - fill l2_len and l3_len in mbuf + * - set one of the flags PKT_TX_TCP_CKSUM, PKT_TX_SCTP_CKSUM or PKT_TX_UDP_CKSUM + * - calculate the pseudo header checksum and set it in the L4 header (only + *for TCP or UDP). For SCTP, set the crc field to 0. */ -#define PKT_TX_L4_NO_CKSUM (0ULL << 52) /**< Disable L4 cksum of TX pkt. */ +#define PKT_TX_L4_NO_CKSUM (0ULL << 52) /* Disable L4 cksum of TX pkt. */ #define PKT_TX_TCP_CKSUM (1ULL << 52) /**< TCP cksum of TX pkt. computed by NIC. */ #define PKT_TX_SCTP_CKSUM (2ULL << 52) /**< SCTP cksum of TX pkt. computed by NIC. */ #define PKT_TX_UDP_CKSUM (3ULL << 52) /**< UDP cksum of TX pkt. computed by NIC. */ -- 2.1.0
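For the plain (non-TSO) case, the documented steps translate into a short sketch; again get_ipv4_psd_sum() stands in for a pseudo-header checksum helper like the one in testpmd's csumonly.c, and contiguous headers without IP options are assumed:

	#include <rte_mbuf.h>
	#include <rte_ether.h>
	#include <rte_ip.h>
	#include <rte_udp.h>

	uint16_t get_ipv4_psd_sum(struct ipv4_hdr *ip_hdr); /* assumed helper */

	static void
	request_hw_udp_cksum(struct rte_mbuf *m)
	{
		struct ipv4_hdr *ip = (struct ipv4_hdr *)
			(rte_pktmbuf_mtod(m, char *) + sizeof(struct ether_hdr));
		struct udp_hdr *udp = (struct udp_hdr *)
			((char *)ip + sizeof(struct ipv4_hdr));

		m->l2_len = sizeof(struct ether_hdr);
		m->l3_len = sizeof(struct ipv4_hdr);

		ip->hdr_checksum = 0;			 /* filled by the NIC */
		udp->dgram_cksum = get_ipv4_psd_sum(ip); /* pseudo-header seed */

		m->ol_flags |= PKT_TX_IP_CKSUM | PKT_TX_UDP_CKSUM;
	}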
[dpdk-dev] [PATCH 05/12] mbuf: remove too specific PKT_TX_OFFLOAD_MASK definition
This definition is specific to Intel PMD drivers and its definition "indicate what bits required for building TX context" shows that it should not be in the generic rte_mbuf.h but in the PMD driver. Signed-off-by: Olivier Matz --- lib/librte_mbuf/rte_mbuf.h| 5 - lib/librte_pmd_e1000/igb_rxtx.c | 3 ++- lib/librte_pmd_ixgbe/ixgbe_rxtx.c | 3 ++- 3 files changed, 4 insertions(+), 7 deletions(-) diff --git a/lib/librte_mbuf/rte_mbuf.h b/lib/librte_mbuf/rte_mbuf.h index 96e322b..ff11b84 100644 --- a/lib/librte_mbuf/rte_mbuf.h +++ b/lib/librte_mbuf/rte_mbuf.h @@ -129,11 +129,6 @@ extern "C" { /* Use final bit of flags to indicate a control mbuf */ #define CTRL_MBUF_FLAG (1ULL << 63) /**< Mbuf contains control data */ -/** - * Bit Mask to indicate what bits required for building TX context - */ -#define PKT_TX_OFFLOAD_MASK (PKT_TX_VLAN_PKT | PKT_TX_IP_CKSUM | PKT_TX_L4_MASK) - /* define a set of marker types that can be used to refer to set points in the * mbuf */ typedef void*MARKER[0]; /**< generic marker for a point in a structure */ diff --git a/lib/librte_pmd_e1000/igb_rxtx.c b/lib/librte_pmd_e1000/igb_rxtx.c index 321493e..dbf5074 100644 --- a/lib/librte_pmd_e1000/igb_rxtx.c +++ b/lib/librte_pmd_e1000/igb_rxtx.c @@ -400,7 +400,8 @@ eth_igb_xmit_pkts(void *tx_queue, struct rte_mbuf **tx_pkts, ol_flags = tx_pkt->ol_flags; vlan_macip_lens.f.vlan_tci = tx_pkt->vlan_tci; vlan_macip_lens.f.l2_l3_len = tx_pkt->l2_l3_len; - tx_ol_req = ol_flags & PKT_TX_OFFLOAD_MASK; + tx_ol_req = ol_flags & (PKT_TX_VLAN_PKT | PKT_TX_IP_CKSUM | + PKT_TX_L4_MASK); /* If a Context Descriptor need be built . */ if (tx_ol_req) { diff --git a/lib/librte_pmd_ixgbe/ixgbe_rxtx.c b/lib/librte_pmd_ixgbe/ixgbe_rxtx.c index 042ee8a..70ca254 100644 --- a/lib/librte_pmd_ixgbe/ixgbe_rxtx.c +++ b/lib/librte_pmd_ixgbe/ixgbe_rxtx.c @@ -580,7 +580,8 @@ ixgbe_xmit_pkts(void *tx_queue, struct rte_mbuf **tx_pkts, ol_flags = tx_pkt->ol_flags; /* If hardware offload required */ - tx_ol_req = ol_flags & PKT_TX_OFFLOAD_MASK; + tx_ol_req = ol_flags & (PKT_TX_VLAN_PKT | PKT_TX_IP_CKSUM | + PKT_TX_L4_MASK); if (tx_ol_req) { vlan_macip_lens.f.vlan_tci = tx_pkt->vlan_tci; vlan_macip_lens.f.l2_l3_len = tx_pkt->l2_l3_len; -- 2.1.0
[dpdk-dev] [PATCH 06/12] mbuf: add functions to get the name of an ol_flag
In test-pmd (rxonly.c), the code is able to dump the list of ol_flags. The issue is that the list of flags in the application has to be synchronized with the flags defined in rte_mbuf.h. This patch introduces 2 new functions rte_get_rx_ol_flag_name() and rte_get_tx_ol_flag_name() that return the name of a flag from its mask. It also fixes rxonly.c to use these new functions and to display the proper flags. Signed-off-by: Olivier Matz --- app/test-pmd/rxonly.c | 36 lib/librte_mbuf/rte_mbuf.h | 60 ++ 2 files changed, 70 insertions(+), 26 deletions(-) diff --git a/app/test-pmd/rxonly.c b/app/test-pmd/rxonly.c index 4410c3d..e7cd7e2 100644 --- a/app/test-pmd/rxonly.c +++ b/app/test-pmd/rxonly.c @@ -71,26 +71,6 @@ #include "testpmd.h" -#define MAX_PKT_RX_FLAGS 13 -static const char *pkt_rx_flag_names[MAX_PKT_RX_FLAGS] = { - "VLAN_PKT", - "RSS_HASH", - "PKT_RX_FDIR", - "IP_CKSUM", - "IP_CKSUM_BAD", - - "IPV4_HDR", - "IPV4_HDR_EXT", - "IPV6_HDR", - "IPV6_HDR_EXT", - - "IEEE1588_PTP", - "IEEE1588_TMST", - - "TUNNEL_IPV4_HDR", - "TUNNEL_IPV6_HDR", -}; - static inline void print_ether_addr(const char *what, struct ether_addr *eth_addr) { @@ -219,12 +199,16 @@ pkt_burst_receive(struct fwd_stream *fs) printf(" - Receive queue=0x%x", (unsigned) fs->rx_queue); printf("\n"); if (ol_flags != 0) { - int rxf; - - for (rxf = 0; rxf < MAX_PKT_RX_FLAGS; rxf++) { - if (ol_flags & (1 << rxf)) - printf(" PKT_RX_%s\n", - pkt_rx_flag_names[rxf]); + unsigned rxf; + const char *name; + + for (rxf = 0; rxf < sizeof(mb->ol_flags) * 8; rxf++) { + if ((ol_flags & (1ULL << rxf)) == 0) + continue; + name = rte_get_rx_ol_flag_name(1ULL << rxf); + if (name == NULL) + continue; + printf(" %s\n", name); } } rte_pktmbuf_free(mb); diff --git a/lib/librte_mbuf/rte_mbuf.h b/lib/librte_mbuf/rte_mbuf.h index ff11b84..bcd8996 100644 --- a/lib/librte_mbuf/rte_mbuf.h +++ b/lib/librte_mbuf/rte_mbuf.h @@ -129,6 +129,66 @@ extern "C" { /* Use final bit of flags to indicate a control mbuf */ #define CTRL_MBUF_FLAG (1ULL << 63) /**< Mbuf contains control data */ +/** + * Get the name of a RX offload flag + * + * @param mask + * The mask describing the flag. Usually only one bit must be set. + * Several bits can be given if they belong to the same mask. + * Ex: PKT_TX_L4_MASK. + * @return + * The name of this flag, or NULL if it's not a valid RX flag. 
+ */ +static inline const char *rte_get_rx_ol_flag_name(uint64_t mask) +{ + switch (mask) { + case PKT_RX_VLAN_PKT: return "PKT_RX_VLAN_PKT"; + case PKT_RX_RSS_HASH: return "PKT_RX_RSS_HASH"; + case PKT_RX_FDIR: return "PKT_RX_FDIR"; + case PKT_RX_L4_CKSUM_BAD: return "PKT_RX_L4_CKSUM_BAD"; + case PKT_RX_IP_CKSUM_BAD: return "PKT_RX_IP_CKSUM_BAD"; + /* case PKT_RX_EIP_CKSUM_BAD: return "PKT_RX_EIP_CKSUM_BAD"; */ + /* case PKT_RX_OVERSIZE: return "PKT_RX_OVERSIZE"; */ + /* case PKT_RX_HBUF_OVERFLOW: return "PKT_RX_HBUF_OVERFLOW"; */ + /* case PKT_RX_RECIP_ERR: return "PKT_RX_RECIP_ERR"; */ + /* case PKT_RX_MAC_ERR: return "PKT_RX_MAC_ERR"; */ + case PKT_RX_IPV4_HDR: return "PKT_RX_IPV4_HDR"; + case PKT_RX_IPV4_HDR_EXT: return "PKT_RX_IPV4_HDR_EXT"; + case PKT_RX_IPV6_HDR: return "PKT_RX_IPV6_HDR"; + case PKT_RX_IPV6_HDR_EXT: return "PKT_RX_IPV6_HDR_EXT"; + case PKT_RX_IEEE1588_PTP: return "PKT_RX_IEEE1588_PTP"; + case PKT_RX_IEEE1588_TMST: return "PKT_RX_IEEE1588_TMST"; + case PKT_RX_TUNNEL_IPV4_HDR: return "PKT_RX_TUNNEL_IPV4_HDR"; + case PKT_RX_TUNNEL_IPV6_HDR: return "PKT_RX_TUNNEL_IPV6_HDR"; + default: return NULL; + } +} + +/** + * Get the name of a TX offload flag + * + * @param mask + * The mask describing the flag. Usually only one bit must be set. + * Several bits can be given if they belong to the same mask. + * Ex: PKT_TX_L4_MASK. + * @return + * The name of this flag, or NULL if it's not a valid TX flag. + */ +static inline const char *rte_get_tx_ol_flag_name(uint64_t mask) +{ + switch (mask) { + case PKT_TX_VLAN_PKT: return "PKT_TX_VLAN_PKT"; + case PKT_TX_IP_CKSUM: return "PKT_TX_IP_CKSUM"; + case PKT_TX_TCP_CKSUM: return "PKT_TX_TCP_CKSUM"; + case PKT_TX_SCTP_CKSUM: return
[dpdk-dev] [PATCH 08/12] ixgbe: support TCP segmentation offload
Implement TSO (TCP segmentation offload) in ixgbe driver. The driver is now able to use PKT_TX_TCP_SEG mbuf flag and mbuf hardware offload infos (l2_len, l3_len, l4_len, tso_segsz) to configure the hardware support of TCP segmentation. In ixgbe, when doing TSO, the IP length must not be included in the TCP pseudo header checksum. A new function ixgbe_fix_tcp_phdr_cksum() is used to fix the pseudo header checksum of the packet before giving it to the hardware. In the patch, the tx_desc_cksum_flags_to_olinfo() and tx_desc_ol_flags_to_cmdtype() functions have been reworked to make them clearer. This should not impact performance as gcc (version 4.8 in my case) is smart enough to convert the tests into a code that does not contain any branch instruction. Signed-off-by: Olivier Matz --- lib/librte_pmd_ixgbe/ixgbe_ethdev.c | 3 +- lib/librte_pmd_ixgbe/ixgbe_rxtx.c | 220 +--- lib/librte_pmd_ixgbe/ixgbe_rxtx.h | 19 ++-- 3 files changed, 167 insertions(+), 75 deletions(-) diff --git a/lib/librte_pmd_ixgbe/ixgbe_ethdev.c b/lib/librte_pmd_ixgbe/ixgbe_ethdev.c index 9c73a30..1ab433a 100644 --- a/lib/librte_pmd_ixgbe/ixgbe_ethdev.c +++ b/lib/librte_pmd_ixgbe/ixgbe_ethdev.c @@ -1961,7 +1961,8 @@ ixgbe_dev_info_get(struct rte_eth_dev *dev, struct rte_eth_dev_info *dev_info) DEV_TX_OFFLOAD_IPV4_CKSUM | DEV_TX_OFFLOAD_UDP_CKSUM | DEV_TX_OFFLOAD_TCP_CKSUM | - DEV_TX_OFFLOAD_SCTP_CKSUM; + DEV_TX_OFFLOAD_SCTP_CKSUM | + DEV_TX_OFFLOAD_TCP_TSO; dev_info->default_rxconf = (struct rte_eth_rxconf) { .rx_thresh = { diff --git a/lib/librte_pmd_ixgbe/ixgbe_rxtx.c b/lib/librte_pmd_ixgbe/ixgbe_rxtx.c index 54a0fc1..79f7395 100644 --- a/lib/librte_pmd_ixgbe/ixgbe_rxtx.c +++ b/lib/librte_pmd_ixgbe/ixgbe_rxtx.c @@ -354,62 +354,132 @@ ixgbe_xmit_pkts_simple(void *tx_queue, struct rte_mbuf **tx_pkts, return nb_tx; } +/* When doing TSO, the IP length must not be included in the pseudo + * header checksum of the packet given to the hardware */ +static inline void +ixgbe_fix_tcp_phdr_cksum(struct rte_mbuf *m) +{ + char *data; + uint16_t *cksum_ptr; + uint16_t prev_cksum; + uint16_t new_cksum; + uint16_t ip_len, ip_paylen; + uint32_t tmp; + uint8_t ip_version; + + /* get phdr cksum at offset 16 of TCP header */ + data = rte_pktmbuf_mtod(m, char *); + cksum_ptr = (uint16_t *)(data + m->l2_len + m->l3_len + 16); + prev_cksum = *cksum_ptr; + + /* get ip_version */ + ip_version = (*(uint8_t *)(data + m->l2_len)) >> 4; + + /* get ip_len at offset 2 of IP header or offset 4 of IPv6 header */ + if (ip_version == 4) { + /* override ip cksum to 0 */ + data[m->l2_len + 10] = 0; + data[m->l2_len + 11] = 0; + + ip_len = *(uint16_t *)(data + m->l2_len + 2); + ip_paylen = rte_cpu_to_be_16(rte_be_to_cpu_16(ip_len) - + m->l3_len); + } else { + ip_paylen = *(uint16_t *)(data + m->l2_len + 4); + } + + /* calculate the new phdr checksum that doesn't include ip_paylen */ + tmp = prev_cksum; + if (tmp < ip_paylen) + tmp += 0x; + tmp -= ip_paylen; + new_cksum = tmp; + + /* replace it in the packet */ + *cksum_ptr = new_cksum; +} + static inline void ixgbe_set_xmit_ctx(struct igb_tx_queue* txq, volatile struct ixgbe_adv_tx_context_desc *ctx_txd, - uint64_t ol_flags, uint32_t vlan_macip_lens) + uint64_t ol_flags, union ixgbe_tx_offload tx_offload) { uint32_t type_tucmd_mlhl; - uint32_t mss_l4len_idx; + uint32_t mss_l4len_idx = 0; uint32_t ctx_idx; - uint32_t cmp_mask; + uint32_t vlan_macip_lens; + union ixgbe_tx_offload tx_offload_mask; ctx_idx = txq->ctx_curr; - cmp_mask = 0; + tx_offload_mask.data = 0; type_tucmd_mlhl = 0; + /* Specify which HW CTX 
to upload. */ + mss_l4len_idx |= (ctx_idx << IXGBE_ADVTXD_IDX_SHIFT); + if (ol_flags & PKT_TX_VLAN_PKT) { - cmp_mask |= TX_VLAN_CMP_MASK; + tx_offload_mask.vlan_tci = ~0; } - if (ol_flags & PKT_TX_IP_CKSUM) { - type_tucmd_mlhl = IXGBE_ADVTXD_TUCMD_IPV4; - cmp_mask |= TX_MACIP_LEN_CMP_MASK; - } + /* check if TCP segmentation required for this packet */ + if (ol_flags & PKT_TX_TCP_SEG) { + /* implies IP cksum and TCP cksum */ + type_tucmd_mlhl = IXGBE_ADVTXD_TUCMD_IPV4 | + IXGBE_ADVTXD_TUCMD_L4T_TCP | + IXGBE_ADVTXD_DTYP_CTXT | IXGBE_ADVTXD_DCMD_DEXT;; + + tx_offload_mask.l2_len = ~0; + tx_offload_mask.l3_len = ~0; + tx_offload_mask.l4
[dpdk-dev] [PATCH 10/12] testpmd: rework csum forward engine
The csum forward engine was becoming too complex to be used and extended (the next commits will add support for TSO): - no explanation about what the code does - code is not factorized, lots of code duplicated, especially between ipv4/ipv6 - user command line api: use of bitmasks that need to be calculated by the user - the user flags don't have the same semantic: - for legacy IP/UDP/TCP/SCTP, it selects software or hardware checksum - for other (vxlan), it selects between hardware checksum or no checksum - the code relies too much on flags set by the driver without software alternative (ex: PKT_RX_TUNNEL_IPV4_HDR). It is nice to be able to compare a software implementation with the hardware offload. This commit tries to fix these issues, and provide a simple definition of what is done by the forward engine: * Receive a burst of packets, and for supported packet types: * - modify the IPs * - reprocess the checksum in SW or HW, depending on testpmd command line *configuration * Then packets are transmitted on the output port. * * Supported packets are: * Ether / (vlan) / IP|IP6 / UDP|TCP|SCTP . * Ether / (vlan) / IP|IP6 / UDP / VxLAN / Ether / IP|IP6 / UDP|TCP|SCTP * * The network parser assumes that the packet is contiguous, which may * not be the case in real life. Signed-off-by: Olivier Matz --- app/test-pmd/cmdline.c | 151 --- app/test-pmd/config.c | 11 - app/test-pmd/csumonly.c | 668 ++-- app/test-pmd/testpmd.h | 17 +- 4 files changed, 423 insertions(+), 424 deletions(-) diff --git a/app/test-pmd/cmdline.c b/app/test-pmd/cmdline.c index 4c3fc76..0361e58 100644 --- a/app/test-pmd/cmdline.c +++ b/app/test-pmd/cmdline.c @@ -310,19 +310,14 @@ static void cmd_help_long_parsed(void *parsed_result, "Disable hardware insertion of a VLAN header in" " packets sent on a port.\n\n" - "tx_checksum set (mask) (port_id)\n" - "Enable hardware insertion of checksum offload with" - " the 8-bit mask, 0~0xff, in packets sent on a port.\n" - "bit 0 - insert ip checksum offload if set\n" - "bit 1 - insert udp checksum offload if set\n" - "bit 2 - insert tcp checksum offload if set\n" - "bit 3 - insert sctp checksum offload if set\n" - "bit 4 - insert inner ip checksum offload if set\n" - "bit 5 - insert inner udp checksum offload if set\n" - "bit 6 - insert inner tcp checksum offload if set\n" - "bit 7 - insert inner sctp checksum offload if set\n" + "tx_cksum set (ip|udp|tcp|sctp|vxlan) (hw|sw) (port_id)\n" + "Enable hardware calculation of checksum when" + " transmitting a packet using 'csum' forward engine.\n" "Please check the NIC datasheet for HW limits.\n\n" + "tx_checksum show (port_id)\n" + "Display tx checksum offload configuration\n\n" + "set fwd (%s)\n" "Set packet forwarding mode.\n\n" @@ -2738,48 +2733,131 @@ cmdline_parse_inst_t cmd_tx_vlan_reset = { /* *** ENABLE HARDWARE INSERTION OF CHECKSUM IN TX PACKETS *** */ -struct cmd_tx_cksum_set_result { +struct cmd_tx_cksum_result { cmdline_fixed_string_t tx_cksum; - cmdline_fixed_string_t set; - uint8_t cksum_mask; + cmdline_fixed_string_t mode; + cmdline_fixed_string_t proto; + cmdline_fixed_string_t hwsw; uint8_t port_id; }; static void -cmd_tx_cksum_set_parsed(void *parsed_result, +cmd_tx_cksum_parsed(void *parsed_result, __attribute__((unused)) struct cmdline *cl, __attribute__((unused)) void *data) { - struct cmd_tx_cksum_set_result *res = parsed_result; + struct cmd_tx_cksum_result *res = parsed_result; + int hw = 0; + uint16_t ol_flags, mask = 0; + struct rte_eth_dev_info dev_info; + + if (port_id_is_invalid(res->port_id)) { + 
printf("invalid port %d\n", res->port_id); + return; + } - tx_cksum_set(res->port_id, res->cksum_mask); + if (!strcmp(res->mode, "set")) { + + if (!strcmp(res->hwsw, "hw")) + hw = 1; + + if (!strcmp(res->proto, "ip")) { + mask = TESTPMD_TX_OFFLOAD_IP_CKSUM; + } else if (!strcmp(res->proto, "udp")) { + mask = TESTPMD_TX_OFFLOAD_UDP_CKSUM; + } else if (!strcmp(res->proto, "tcp")) { + mask = TESTPMD_TX_OFFLOAD_TCP_CKSUM; + }
[dpdk-dev] [PATCH 12/12] testpmd: add a verbose mode csum forward engine
If the user specifies 'set verbose 1' on the testpmd command line, the csum forward engine will dump some information about received and transmitted packets, especially which flags are set and what values are assigned to l2_len, l3_len, l4_len and tso_segsz. This can help someone implementing TSO or hardware checksum offload to understand how to configure the mbufs. Example of output for one packet: -- rx: l2_len=14 ethertype=800 l3_len=20 l4_proto=6 l4_len=20 tx: m->l2_len=14 m->l3_len=20 m->l4_len=20 tx: m->tso_segsz=800 tx: flags=PKT_TX_IP_CKSUM PKT_TX_TCP_SEG -- Signed-off-by: Olivier Matz --- app/test-pmd/csumonly.c | 51 + 1 file changed, 51 insertions(+) diff --git a/app/test-pmd/csumonly.c b/app/test-pmd/csumonly.c index 7995ff5..74521d4 100644 --- a/app/test-pmd/csumonly.c +++ b/app/test-pmd/csumonly.c @@ -575,6 +575,57 @@ pkt_burst_checksum_forward(struct fwd_stream *fs) m->tso_segsz = tso_segsz; m->ol_flags = ol_flags; + /* if verbose mode is enabled, dump debug info */ + if (verbose_level > 0) { + struct { + uint64_t flag; + uint64_t mask; + } tx_flags[] = { + { PKT_TX_IP_CKSUM, PKT_TX_IP_CKSUM }, + { PKT_TX_UDP_CKSUM, PKT_TX_L4_MASK }, + { PKT_TX_TCP_CKSUM, PKT_TX_L4_MASK }, + { PKT_TX_SCTP_CKSUM, PKT_TX_L4_MASK }, + { PKT_TX_VXLAN_CKSUM, PKT_TX_VXLAN_CKSUM }, + { PKT_TX_TCP_SEG, PKT_TX_TCP_SEG }, + }; + unsigned j; + const char *name; + + printf("-\n"); + /* dump rx parsed packet info */ + printf("rx: l2_len=%d ethertype=%x l3_len=%d " + "l4_proto=%d l4_len=%d\n", + l2_len, rte_be_to_cpu_16(ethertype), + l3_len, l4_proto, l4_len); + if (tunnel == 1) + printf("rx: outer_l2_len=%d outer_ethertype=%x " + "outer_l3_len=%d\n", outer_l2_len, + rte_be_to_cpu_16(outer_ethertype), + outer_l3_len); + /* dump tx packet info */ + if ((testpmd_ol_flags & (TESTPMD_TX_OFFLOAD_IP_CKSUM | + TESTPMD_TX_OFFLOAD_UDP_CKSUM | + TESTPMD_TX_OFFLOAD_TCP_CKSUM | + TESTPMD_TX_OFFLOAD_SCTP_CKSUM)) || + tso_segsz != 0) + printf("tx: m->l2_len=%d m->l3_len=%d " + "m->l4_len=%d\n", + m->l2_len, m->l3_len, m->l4_len); + if ((tunnel == 1) && + (testpmd_ol_flags & TESTPMD_TX_OFFLOAD_VXLAN_CKSUM)) + printf("tx: m->inner_l2_len=%d m->inner_l3_len=%d\n", + m->inner_l2_len, m->inner_l3_len); + if (tso_segsz != 0) + printf("tx: m->tso_segsz=%d\n", m->tso_segsz); + printf("tx: flags="); + for (j = 0; j < sizeof(tx_flags)/sizeof(*tx_flags); j++) { + name = rte_get_tx_ol_flag_name(tx_flags[j].flag); + if ((m->ol_flags & tx_flags[j].mask) == + tx_flags[j].flag) + printf("%s ", name); + } + printf("\n"); + } } nb_tx = rte_eth_tx_burst(fs->tx_port, fs->tx_queue, pkts_burst, nb_rx); fs->tx_packets += nb_tx; -- 2.1.0
[dpdk-dev] [PATCH 11/12] testpmd: support TSO in csum forward engine
Add two new commands in testpmd: - tso set - tso show These commands can be used to enable TSO when transmitting TCP packets in the csum forward engine. Ex: set fwd csum tx_checksum set ip hw 0 tso set 800 0 start Signed-off-by: Olivier Matz --- app/test-pmd/cmdline.c | 92 + app/test-pmd/csumonly.c | 53 +--- app/test-pmd/testpmd.h | 1 + 3 files changed, 133 insertions(+), 13 deletions(-) diff --git a/app/test-pmd/cmdline.c b/app/test-pmd/cmdline.c index 0361e58..5460415 100644 --- a/app/test-pmd/cmdline.c +++ b/app/test-pmd/cmdline.c @@ -318,6 +318,14 @@ static void cmd_help_long_parsed(void *parsed_result, "tx_checksum show (port_id)\n" "Display tx checksum offload configuration\n\n" + "tso set (segsize) (portid)\n" + "Enable TCP Segmentation Offload in csum forward" + " engine.\n" + "Please check the NIC datasheet for HW limits.\n\n" + + "tso show (portid)" + "Display the status of TCP Segmentation Offload.\n\n" + "set fwd (%s)\n" "Set packet forwarding mode.\n\n" @@ -2862,6 +2870,88 @@ cmdline_parse_inst_t cmd_tx_cksum_show = { }, }; +/* *** ENABLE HARDWARE SEGMENTATION IN TX PACKETS *** */ +struct cmd_tso_set_result { + cmdline_fixed_string_t tso; + cmdline_fixed_string_t mode; + uint16_t tso_segsz; + uint8_t port_id; +}; + +static void +cmd_tso_set_parsed(void *parsed_result, + __attribute__((unused)) struct cmdline *cl, + __attribute__((unused)) void *data) +{ + struct cmd_tso_set_result *res = parsed_result; + struct rte_eth_dev_info dev_info; + + if (port_id_is_invalid(res->port_id)) + return; + + if (!strcmp(res->mode, "set")) + ports[res->port_id].tso_segsz = res->tso_segsz; + + if (ports[res->port_id].tso_segsz == 0) + printf("TSO is disabled\n"); + else + printf("TSO segment size is %d\n", + ports[res->port_id].tso_segsz); + + /* display warnings if configuration is not supported by the NIC */ + rte_eth_dev_info_get(res->port_id, &dev_info); + if ((ports[res->port_id].tso_segsz != 0) && + (dev_info.tx_offload_capa & DEV_TX_OFFLOAD_TCP_TSO) == 0) { + printf("Warning: TSO enabled but not " + "supported by port %d\n", res->port_id); + } +} + +cmdline_parse_token_string_t cmd_tso_set_tso = + TOKEN_STRING_INITIALIZER(struct cmd_tso_set_result, + tso, "tso"); +cmdline_parse_token_string_t cmd_tso_set_mode = + TOKEN_STRING_INITIALIZER(struct cmd_tso_set_result, + mode, "set"); +cmdline_parse_token_num_t cmd_tso_set_tso_segsz = + TOKEN_NUM_INITIALIZER(struct cmd_tso_set_result, + tso_segsz, UINT16); +cmdline_parse_token_num_t cmd_tso_set_portid = + TOKEN_NUM_INITIALIZER(struct cmd_tso_set_result, + port_id, UINT8); + +cmdline_parse_inst_t cmd_tso_set = { + .f = cmd_tso_set_parsed, + .data = NULL, + .help_str = "Set TSO segment size for csum engine (0 to disable): " + "tso set ", + .tokens = { + (void *)&cmd_tso_set_tso, + (void *)&cmd_tso_set_mode, + (void *)&cmd_tso_set_tso_segsz, + (void *)&cmd_tso_set_portid, + NULL, + }, +}; + +cmdline_parse_token_string_t cmd_tso_show_mode = + TOKEN_STRING_INITIALIZER(struct cmd_tso_set_result, + mode, "show"); + + +cmdline_parse_inst_t cmd_tso_show = { + .f = cmd_tso_set_parsed, + .data = NULL, + .help_str = "Show TSO segment size for csum engine: " + "tso show ", + .tokens = { + (void *)&cmd_tso_set_tso, + (void *)&cmd_tso_show_mode, + (void *)&cmd_tso_set_portid, + NULL, + }, +}; + /* *** ENABLE/DISABLE FLUSH ON RX STREAMS *** */ struct cmd_set_flush_rx { cmdline_fixed_string_t set; @@ -7875,6 +7965,8 @@ cmdline_parse_ctx_t main_ctx[] = { (cmdline_parse_inst_t *)&cmd_tx_vlan_set_pvid, (cmdline_parse_inst_t *)&cmd_tx_cksum_set, 
(cmdline_parse_inst_t *)&cmd_tx_cksum_show, + (cmdline_parse_inst_t *)&cmd_tso_set, + (cmdline_parse_inst_t *)&cmd_tso_show, (cmdline_parse_inst_t *)&cmd_link_flow_control_set, (cmdline_parse_inst_t *)&cmd_link_flow_control_set_rx, (cmdline_parse_inst_t *)&cmd_link_flow_control_set_tx, diff --git a/app/test-pmd/csumonly.c b/app/test-pmd/csumonly.c index abc525c..7995ff5 100644 --- a/app/test-pmd/csumon
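For readers who want the equivalent of "tso set" in their own code, a minimal sketch of the capability check that cmd_tso_set_parsed() above performs; ports[] and its tso_segsz field are testpmd structures, and this is illustrative only, not part of the patch:

    #include <stdio.h>
    #include <rte_ethdev.h>
    #include "testpmd.h" /* testpmd's ports[] and struct rte_port */

    /* Sketch: enable TSO on a port the way the new testpmd command does,
     * warning when the NIC does not advertise the capability. */
    static void
    port_set_tso(uint8_t port_id, uint16_t segsz)
    {
        struct rte_eth_dev_info dev_info;

        rte_eth_dev_info_get(port_id, &dev_info);
        if (segsz != 0 &&
            (dev_info.tx_offload_capa & DEV_TX_OFFLOAD_TCP_TSO) == 0)
            printf("Warning: TSO enabled but not supported by port %d\n",
                   port_id);
        ports[port_id].tso_segsz = segsz; /* 0 disables TSO */
    }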
[dpdk-dev] [PATCH 01/12] igb/ixgbe: fix IP checksum calculation
According to the Intel® 82599 10 GbE Controller Datasheet (Table 7-38), both L2 and L3 lengths are needed to offload the IP checksum. Note that the e1000 driver does not need to be patched as it already contains the fix.

Signed-off-by: Olivier Matz
Acked-by: Konstantin Ananyev
---
 lib/librte_pmd_e1000/igb_rxtx.c   | 2 +-
 lib/librte_pmd_ixgbe/ixgbe_rxtx.c | 2 +-
 2 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/lib/librte_pmd_e1000/igb_rxtx.c b/lib/librte_pmd_e1000/igb_rxtx.c
index f09c525..321493e 100644
--- a/lib/librte_pmd_e1000/igb_rxtx.c
+++ b/lib/librte_pmd_e1000/igb_rxtx.c
@@ -262,7 +262,7 @@ igbe_set_xmit_ctx(struct igb_tx_queue* txq,
 	if (ol_flags & PKT_TX_IP_CKSUM) {
 		type_tucmd_mlhl = E1000_ADVTXD_TUCMD_IPV4;
-		cmp_mask |= TX_MAC_LEN_CMP_MASK;
+		cmp_mask |= TX_MACIP_LEN_CMP_MASK;
 	}
 	/* Specify which HW CTX to upload. */
diff --git a/lib/librte_pmd_ixgbe/ixgbe_rxtx.c b/lib/librte_pmd_ixgbe/ixgbe_rxtx.c
index 3a5a8ff..78be7e6 100644
--- a/lib/librte_pmd_ixgbe/ixgbe_rxtx.c
+++ b/lib/librte_pmd_ixgbe/ixgbe_rxtx.c
@@ -374,7 +374,7 @@ ixgbe_set_xmit_ctx(struct igb_tx_queue* txq,
 	if (ol_flags & PKT_TX_IP_CKSUM) {
 		type_tucmd_mlhl = IXGBE_ADVTXD_TUCMD_IPV4;
-		cmp_mask |= TX_MAC_LEN_CMP_MASK;
+		cmp_mask |= TX_MACIP_LEN_CMP_MASK;
 	}
 	/* Specify which HW CTX to upload. */
--
2.1.0
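The practical consequence of the fixed mask is that a transmit path requesting IP checksum offload must provide both header lengths so the driver can build the context descriptor. A hedged sketch, not from the patch:

    #include <rte_ip.h>
    #include <rte_mbuf.h>

    /* Sketch: request hardware IP checksum offload on an mbuf. Both
     * l2_len and l3_len must be valid, since the 82599 context (Table
     * 7-38) is built from the combined MAC+IP header length. */
    static void
    request_ip_cksum(struct rte_mbuf *m, uint16_t l2_len, uint16_t l3_len)
    {
        struct ipv4_hdr *ip;

        m->l2_len = l2_len;   /* e.g. sizeof(struct ether_hdr) */
        m->l3_len = l3_len;   /* e.g. sizeof(struct ipv4_hdr) */
        ip = (struct ipv4_hdr *)(rte_pktmbuf_mtod(m, char *) + l2_len);
        ip->hdr_checksum = 0; /* HW writes the checksum */
        m->ol_flags |= PKT_TX_IP_CKSUM;
    }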
[dpdk-dev] [PATCH 09/12] testpmd: fix use of offload flags in testpmd
In testpmd the rte_port->tx_ol_flags flag was used in 2 incompatible manners: - sometimes used with testpmd specific flags (0xff for checksums, and bit 11 for vlan) - sometimes assigned to m->ol_flags directly, which is wrong in case of checksum flags This commit replaces the hardcoded values by named definitions, which are not compatible with mbuf flags. The testpmd forward engines are fixed to use the flags properly. Signed-off-by: Olivier Matz --- app/test-pmd/config.c | 4 ++-- app/test-pmd/csumonly.c | 40 +++- app/test-pmd/macfwd.c | 5 - app/test-pmd/macswap.c | 5 - app/test-pmd/testpmd.h | 28 +--- app/test-pmd/txonly.c | 9 ++--- 6 files changed, 60 insertions(+), 31 deletions(-) diff --git a/app/test-pmd/config.c b/app/test-pmd/config.c index 9bc08f4..4b6fb91 100644 --- a/app/test-pmd/config.c +++ b/app/test-pmd/config.c @@ -1674,7 +1674,7 @@ tx_vlan_set(portid_t port_id, uint16_t vlan_id) return; if (vlan_id_is_invalid(vlan_id)) return; - ports[port_id].tx_ol_flags |= PKT_TX_VLAN_PKT; + ports[port_id].tx_ol_flags |= TESTPMD_TX_OFFLOAD_INSERT_VLAN; ports[port_id].tx_vlan_id = vlan_id; } @@ -1683,7 +1683,7 @@ tx_vlan_reset(portid_t port_id) { if (port_id_is_invalid(port_id)) return; - ports[port_id].tx_ol_flags &= ~PKT_TX_VLAN_PKT; + ports[port_id].tx_ol_flags &= ~TESTPMD_TX_OFFLOAD_INSERT_VLAN; } void diff --git a/app/test-pmd/csumonly.c b/app/test-pmd/csumonly.c index 8d10bfd..743094a 100644 --- a/app/test-pmd/csumonly.c +++ b/app/test-pmd/csumonly.c @@ -322,7 +322,7 @@ pkt_burst_checksum_forward(struct fwd_stream *fs) /* Do not delete, this is required by HW*/ ipv4_hdr->hdr_checksum = 0; - if (tx_ol_flags & 0x1) { + if (tx_ol_flags & TESTPMD_TX_OFFLOAD_IP_CKSUM) { /* HW checksum */ ol_flags |= PKT_TX_IP_CKSUM; } @@ -336,7 +336,7 @@ pkt_burst_checksum_forward(struct fwd_stream *fs) if (l4_proto == IPPROTO_UDP) { udp_hdr = (struct udp_hdr*) (rte_pktmbuf_mtod(mb, unsigned char *) + l2_len + l3_len); - if (tx_ol_flags & 0x2) { + if (tx_ol_flags & TESTPMD_TX_OFFLOAD_UDP_CKSUM) { /* HW Offload */ ol_flags |= PKT_TX_UDP_CKSUM; if (ipv4_tunnel) @@ -358,7 +358,7 @@ pkt_burst_checksum_forward(struct fwd_stream *fs) uint16_t len; /* Check if inner L3/L4 checkum flag is set */ - if (tx_ol_flags & 0xF0) + if (tx_ol_flags & TESTPMD_TX_OFFLOAD_INNER_CKSUM_MASK) ol_flags |= PKT_TX_VXLAN_CKSUM; inner_l2_len = sizeof(struct ether_hdr); @@ -381,7 +381,7 @@ pkt_burst_checksum_forward(struct fwd_stream *fs) unsigned char *) + len); inner_l4_proto = inner_ipv4_hdr->next_proto_id; - if (tx_ol_flags & 0x10) { + if (tx_ol_flags & TESTPMD_TX_OFFLOAD_INNER_IP_CKSUM) { /* Do not delete, this is required by HW*/ inner_ipv4_hdr->hdr_checksum = 0; @@ -394,7 +394,8 @@ pkt_burst_checksum_forward(struct fwd_stream *fs) unsigned char *) + len); inner_l4_proto = inner_ipv6_hdr->proto; } - if ((inner_l4_proto == IPPROTO_UDP) && (tx_ol_flags & 0x20)) { + if ((inner_l4_proto == IPPROTO_UDP) && + (tx_ol_flags & TESTPMD_TX_OFFLOAD_INNER_UDP_CKSUM)) { /* HW Offload */ ol_flags |= PKT_TX_UDP_CKSUM; @@ -405,7 +406,8 @@ pkt_burst_checksum_forward(struct fwd_stream *fs) else if (eth_type == ETHER_TYPE_IPv6) inner_udp_hdr->dgram_cksum = get_ipv6_psd_sum(inner_ipv6_hdr); - } else if ((inner_l4_proto == IPPROTO_TCP) &&
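The gist of the fix is a private flag namespace. Hypothetically, with bit values chosen to match the old hardcoded constants quoted above (the actual definitions live in app/test-pmd/testpmd.h):

    /* Sketch: testpmd-private TX offload request bits. They intentionally
     * do NOT alias the PKT_TX_* mbuf flags, so the two flag spaces can no
     * longer be mixed up silently. Values illustrative only. */
    #define TESTPMD_TX_OFFLOAD_IP_CKSUM        0x0001 /* was 0x1 */
    #define TESTPMD_TX_OFFLOAD_UDP_CKSUM       0x0002 /* was 0x2 */
    #define TESTPMD_TX_OFFLOAD_INNER_IP_CKSUM  0x0010 /* was 0x10 */
    #define TESTPMD_TX_OFFLOAD_INNER_UDP_CKSUM 0x0020 /* was 0x20 */
    #define TESTPMD_TX_OFFLOAD_INSERT_VLAN     0x0800 /* was bit 11 */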
[dpdk-dev] [PATCH 07/12] mbuf: generic support for TCP segmentation offload
Some of the NICs supported by DPDK can accelerate TCP traffic by using segmentation offload: the application prepares a packet with a valid TCP header of up to 64K and delegates the segmentation to the NIC.

Implement the generic part of TCP segmentation offload in rte_mbuf. It introduces 2 new fields in rte_mbuf: l4_len (length of L4 header in bytes) and tso_segsz (MSS of packets).

To delegate the TCP segmentation to the hardware, the user has to:

- set the PKT_TX_TCP_SEG flag in mbuf->ol_flags (this flag implies
  PKT_TX_TCP_CKSUM)
- set PKT_TX_IP_CKSUM if it's IPv4, and set the IP checksum to 0 in
  the packet
- fill the mbuf offload information: l2_len, l3_len, l4_len, tso_segsz
- calculate the pseudo header checksum and set it in the TCP header,
  as required when doing hardware TCP checksum offload

The API is inspired from the ixgbe hardware (the next commit adds the support for ixgbe), but it seems generic enough to be used for other hw/drivers in the future.

This commit also reworks the way l2_len and l3_len are used in the igb and ixgbe drivers, as l2_l3_len is not available anymore in the mbuf.

Signed-off-by: Mirek Walukiewicz
Signed-off-by: Olivier Matz
---
 app/test-pmd/testpmd.c            |  3 ++-
 examples/ipv4_multicast/main.c    |  3 ++-
 lib/librte_mbuf/rte_mbuf.h        | 44 +++
 lib/librte_pmd_e1000/igb_rxtx.c   | 11 +-
 lib/librte_pmd_ixgbe/ixgbe_rxtx.c | 11 +-
 5 files changed, 50 insertions(+), 22 deletions(-)

diff --git a/app/test-pmd/testpmd.c b/app/test-pmd/testpmd.c
index 12adafa..a831e31 100644
--- a/app/test-pmd/testpmd.c
+++ b/app/test-pmd/testpmd.c
@@ -408,7 +408,8 @@ testpmd_mbuf_ctor(struct rte_mempool *mp,
 	mb->ol_flags = 0;
 	mb->data_off = RTE_PKTMBUF_HEADROOM;
 	mb->nb_segs = 1;
-	mb->l2_l3_len = 0;
+	mb->l2_len = 0;
+	mb->l3_len = 0;
 	mb->vlan_tci = 0;
 	mb->hash.rss = 0;
 }
diff --git a/examples/ipv4_multicast/main.c b/examples/ipv4_multicast/main.c
index de5e6be..a31d43d 100644
--- a/examples/ipv4_multicast/main.c
+++ b/examples/ipv4_multicast/main.c
@@ -302,7 +302,8 @@ mcast_out_pkt(struct rte_mbuf *pkt, int use_clone)
 	/* copy metadata from source packet*/
 	hdr->port = pkt->port;
 	hdr->vlan_tci = pkt->vlan_tci;
-	hdr->l2_l3_len = pkt->l2_l3_len;
+	hdr->l2_len = pkt->l2_len;
+	hdr->l3_len = pkt->l3_len;
 	hdr->hash = pkt->hash;
 	hdr->ol_flags = pkt->ol_flags;
diff --git a/lib/librte_mbuf/rte_mbuf.h b/lib/librte_mbuf/rte_mbuf.h
index bcd8996..f76b768 100644
--- a/lib/librte_mbuf/rte_mbuf.h
+++ b/lib/librte_mbuf/rte_mbuf.h
@@ -126,6 +126,19 @@ extern "C" {
 #define PKT_TX_VXLAN_CKSUM (1ULL << 50) /**< TX checksum of VXLAN computed by NIC */

+/**
+ * TCP segmentation offload. To enable this offload feature for a
+ * packet to be transmitted on hardware supporting TSO:
+ *  - set the PKT_TX_TCP_SEG flag in mbuf->ol_flags (this flag implies
+ *    PKT_TX_TCP_CKSUM)
+ *  - if it's IPv4, set the PKT_TX_IP_CKSUM flag and write the IP checksum
+ *    to 0 in the packet
+ *  - fill the mbuf offload information: l2_len, l3_len, l4_len, tso_segsz
+ *  - calculate the pseudo header checksum and set it in the TCP header,
+ *    as required when doing hardware TCP checksum offload
+ */
+#define PKT_TX_TCP_SEG (1ULL << 49)
+
 /* Use final bit of flags to indicate a control mbuf */
 #define CTRL_MBUF_FLAG (1ULL << 63) /**< Mbuf contains control data */

@@ -185,6 +198,7 @@ static inline const char *rte_get_tx_ol_flag_name(uint64_t mask)
 	case PKT_TX_UDP_CKSUM: return "PKT_TX_UDP_CKSUM";
 	case PKT_TX_IEEE1588_TMST: return "PKT_TX_IEEE1588_TMST";
 	case PKT_TX_VXLAN_CKSUM: return "PKT_TX_VXLAN_CKSUM";
+	case PKT_TX_TCP_SEG: return "PKT_TX_TCP_SEG";
 	default: return NULL;
 	}
 }
@@ -264,22 +278,18 @@ struct rte_mbuf {
 	/* fields to support TX offloads */
 	union {
-		uint16_t l2_l3_len; /**< combined l2/l3 lengths as single var */
+		uint64_t tx_offload; /**< combined for easy fetch */
 		struct {
-			uint16_t l3_len:9; /**< L3 (IP) Header Length. */
-			uint16_t l2_len:7; /**< L2 (MAC) Header Length. */
-		};
-	};
+			uint64_t l2_len:7; /**< L2 (MAC) Header Length. */
+			uint64_t l3_len:9; /**< L3 (IP) Header Length. */
+			uint64_t l4_len:8; /**< L4 (TCP/UDP) Header Length. */
+			uint64_t tso_segsz:16; /**< TCP TSO segment size */

-	/* fields for TX offloading of tunnels */
-	union {
-		uint16_t inner_l2_l3_len;
-		/**< combined inner l2/l3 lengths as single var */
-		struct {
-			uint16_t inner_l3_len:9;
-			/**< inne
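Putting the four steps together, a hedged sketch of preparing an IPv4/TCP packet for TSO; get_ipv4_psd_sum() is an assumed helper (testpmd carries one like it, it is not a public DPDK API):

    #include <rte_ether.h>
    #include <rte_ip.h>
    #include <rte_tcp.h>
    #include <rte_mbuf.h>

    /* Sketch: delegate segmentation of a large TCP packet to the NIC,
     * following the four steps listed in the commit message. */
    static void
    prepare_tso(struct rte_mbuf *m, struct ipv4_hdr *ip,
                struct tcp_hdr *tcp, uint16_t mss)
    {
        m->ol_flags |= PKT_TX_TCP_SEG;  /* implies PKT_TX_TCP_CKSUM */
        m->ol_flags |= PKT_TX_IP_CKSUM; /* IPv4 case */
        ip->hdr_checksum = 0;

        m->l2_len = sizeof(struct ether_hdr);
        m->l3_len = (ip->version_ihl & 0x0f) * 4;
        m->l4_len = (tcp->data_off >> 4) * 4;
        m->tso_segsz = mss;

        /* pseudo header checksum, as for plain HW TCP checksum offload */
        tcp->cksum = get_ipv4_psd_sum(ip); /* assumed helper */
    }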
[dpdk-dev] [PATCH v8 10/10] app/testpmd: test VxLAN Tx checksum offload
Hi Jijiang,

On 11/10/2014 07:03 AM, Liu, Jijiang wrote:
>> Another thing is surprising me.
>>
>> - if PKT_TX_VXLAN_CKSUM is not set (legacy use case), then the
>> driver uses l2_len and l3_len to offload inner IP/UDP/TCP checksums.
> If the flag is not set, it implies that it is not a VXLAN packet,
> and TX checksum offload is done as for a regular packet.
>
>> - if PKT_TX_VXLAN_CKSUM is set, then the driver has to use
>> inner_l{23}_len instead of l{23}_len for the same operation.
> Your understanding is not fully correct.
> The l{23}_len is still used for TX checksum offload, please refer to
> the i40e_txd_enable_checksum() implementation.

These fields are part of the public mbuf API. You cannot point to the i40e PMD code to explain how to use it.

>> Adding PKT_TX_VXLAN_CKSUM changes the semantic of l2_len and l3_len.
>> To fix this, I suggest to remove the new fields inner_l{23}_len and add
>> outer_l{23}_len instead. Therefore, the semantic of l2_len and l3_len
>> would not change, and a driver would always use the same field for a
>> specific offload.
> Oh... Does it mean you agree?

>> For my TSO development, I will follow the current semantic.
> For TSO, you can still use l{2,3}_len.
> When I develop tunneling TSO, I will use inner_l3_len/inner_l4_len.

I've just submitted a first version, please feel free to comment on it.

Regards,
Olivier
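To make the two semantics concrete, a hypothetical extract of the PROPOSED field layout (names from the suggestion above; these fields do not exist in the mbuf under discussion, and the exact accounting of tunnel headers is not settled by this thread):

    #include <stdint.h>

    /* Sketch: under the proposal, l2_len/l3_len ALWAYS describe the
     * innermost headers, and tunnelled packets additionally fill the
     * outer lengths, so a driver reads the same field for a given
     * offload whether or not PKT_TX_VXLAN_CKSUM is set. */
    struct proposed_tx_offload {
        uint64_t l2_len:7;       /* always the innermost L2 length */
        uint64_t l3_len:9;       /* always the innermost L3 length */
        uint64_t outer_l2_len:7; /* filled only for tunnelled packets */
        uint64_t outer_l3_len:9; /* filled only for tunnelled packets */
    };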
[dpdk-dev] [PATCH v3] Add in_flight_bitmask so as to use full 32 bits of tag.
On Mon, Nov 10, 2014 at 05:52:26PM +0200, jigsaw wrote: > Hi Bruce, > > Sorry I didn't. I will run a performance test tomorrow. > > thx & > rgds, > -qinglai > On the assumption that no performance regressions show up... Acked-by: Bruce Richardson > On Mon, Nov 10, 2014 at 5:13 PM, Bruce Richardson < > bruce.richardson at intel.com> wrote: > > > On Mon, Nov 10, 2014 at 04:44:02PM +0200, Qinglai Xiao wrote: > > > With introduction of in_flight_bitmask, the whole 32 bits of tag can be > > > used. Further more, this patch fixed the integer overflow when finding > > > the matched tags. > > > The maximum number workers is now defined as 64, which is length of > > > double-word. The link between number of workers and RTE_MAX_LCORE is > > > now removed. Compile time check is added to ensure the > > > RTE_DISTRIB_MAX_WORKERS is less than or equal to size of double-word. > > > > > > Signed-off-by: Qinglai Xiao > > > > Looks good to me. > > Just before I ack this, have you checked to see if there is any > > performance impact? > > > > /Bruce > > > > > --- > > > lib/librte_distributor/rte_distributor.c | 64 > > ++ > > > lib/librte_distributor/rte_distributor.h |4 ++ > > > 2 files changed, 51 insertions(+), 17 deletions(-) > > > > > > diff --git a/lib/librte_distributor/rte_distributor.c > > b/lib/librte_distributor/rte_distributor.c > > > index 3dfec4a..2c5d61c 100644 > > > --- a/lib/librte_distributor/rte_distributor.c > > > +++ b/lib/librte_distributor/rte_distributor.c > > > @@ -62,6 +62,13 @@ > > > #define RTE_DISTRIB_RETURNS_MASK (RTE_DISTRIB_MAX_RETURNS - 1) > > > > > > /** > > > + * Maximum number of workers allowed. > > > + * Be aware of increasing the limit, becaus it is limited by how we > > track > > > + * in-flight tags. See @in_flight_bitmask and @rte_distributor_process > > > + */ > > > +#define RTE_DISTRIB_MAX_WORKERS 64 > > > + > > > +/** > > > * Buffer structure used to pass the pointer data between cores. This > > is cache > > > * line aligned, but to improve performance and prevent adjacent > > cache-line > > > * prefetches of buffers for other workers, e.g. when worker 1's buffer > > is on > > > @@ -91,11 +98,17 @@ struct rte_distributor { > > > char name[RTE_DISTRIBUTOR_NAMESIZE]; /**< Name of the ring. */ > > > unsigned num_workers; /**< Number of workers > > polling */ > > > > > > - uint32_t in_flight_tags[RTE_MAX_LCORE]; > > > - /**< Tracks the tag being processed per core, 0 == no pkt > > */ > > > - struct rte_distributor_backlog backlog[RTE_MAX_LCORE]; > > > + uint32_t in_flight_tags[RTE_DISTRIB_MAX_WORKERS]; > > > + /**< Tracks the tag being processed per core */ > > > + uint64_t in_flight_bitmask; > > > + /**< on/off bits for in-flight tags. > > > + * Note that if RTE_DISTRIB_MAX_WORKERS is larger than 64 > > then > > > + * the bitmask has to expand. 
> > > + */ > > > + > > > + struct rte_distributor_backlog backlog[RTE_DISTRIB_MAX_WORKERS]; > > > > > > - union rte_distributor_buffer bufs[RTE_MAX_LCORE]; > > > + union rte_distributor_buffer bufs[RTE_DISTRIB_MAX_WORKERS]; > > > > > > struct rte_distributor_returned_pkts returns; > > > }; > > > @@ -189,6 +202,7 @@ static inline void > > > handle_worker_shutdown(struct rte_distributor *d, unsigned wkr) > > > { > > > d->in_flight_tags[wkr] = 0; > > > + d->in_flight_bitmask &= ~(1UL << wkr); > > > d->bufs[wkr].bufptr64 = 0; > > > if (unlikely(d->backlog[wkr].count != 0)) { > > > /* On return of a packet, we need to move the > > > @@ -211,7 +225,10 @@ handle_worker_shutdown(struct rte_distributor *d, > > unsigned wkr) > > > pkts[i] = (void *)((uintptr_t)(bl->pkts[idx] >> > > > RTE_DISTRIB_FLAG_BITS)); > > > } > > > - /* recursive call */ > > > + /* recursive call. > > > + * Note that the tags were set before first level call > > > + * to rte_distributor_process. > > > + */ > > > rte_distributor_process(d, pkts, i); > > > bl->count = bl->start = 0; > > > } > > > @@ -242,6 +259,7 @@ process_returns(struct rte_distributor *d) > > > else { > > > d->bufs[wkr].bufptr64 = > > RTE_DISTRIB_GET_BUF; > > > d->in_flight_tags[wkr] = 0; > > > + d->in_flight_bitmask &= ~(1UL << wkr); > > > } > > > oldbuf = data >> RTE_DISTRIB_FLAG_BITS; > > > } else if (data & RTE_DISTRIB_RETURN_BUF) { > > > @@ -284,14 +302,18 @@ rte_distributor_process(struct rte_distributor *d, > > > next_value = (((int64_t)(uintptr_t)next_mb) > > > << RTE_DISTRIB_FLAG_B
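The bookkeeping the patch adds is small; a simplified, self-contained sketch of how a 64-bit mask tracks which workers hold a tag (names follow the patch):

    #include <stdint.h>

    #define RTE_DISTRIB_MAX_WORKERS 64 /* bounded by the 64-bit mask */

    struct flight_state {
        uint32_t in_flight_tags[RTE_DISTRIB_MAX_WORKERS];
        uint64_t in_flight_bitmask;
    };

    /* With a separate "busy" bit per worker, tag value 0 is no longer
     * reserved to mean "idle", so the full 32 bits of the tag are usable. */
    static inline void
    assign_tag(struct flight_state *s, unsigned wkr, uint32_t tag)
    {
        s->in_flight_tags[wkr] = tag;
        s->in_flight_bitmask |= 1ULL << wkr;
    }

    static inline void
    release_tag(struct flight_state *s, unsigned wkr)
    {
        s->in_flight_tags[wkr] = 0;
        s->in_flight_bitmask &= ~(1ULL << wkr);
    }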
[dpdk-dev] [PATCH 02/12] ixgbe: fix remaining pkt_flags variable size to 64 bits
On Mon, Nov 10, 2014 at 04:59:16PM +0100, Olivier Matz wrote: > Since commit 4332beee9 "mbuf: expand ol_flags field to 64-bits", the > packet flags are now 64 bits wide. Some occurences were forgotten in > the ixgbe driver. > > Signed-off-by: Olivier Matz Acked-by: Bruce Richardson > --- > lib/librte_pmd_ixgbe/ixgbe_rxtx.c | 12 ++-- > 1 file changed, 6 insertions(+), 6 deletions(-) > > diff --git a/lib/librte_pmd_ixgbe/ixgbe_rxtx.c > b/lib/librte_pmd_ixgbe/ixgbe_rxtx.c > index 78be7e6..042ee8a 100644 > --- a/lib/librte_pmd_ixgbe/ixgbe_rxtx.c > +++ b/lib/librte_pmd_ixgbe/ixgbe_rxtx.c > @@ -817,7 +817,7 @@ end_of_tx: > static inline uint64_t > rx_desc_hlen_type_rss_to_pkt_flags(uint32_t hl_tp_rs) > { > - uint16_t pkt_flags; > + uint64_t pkt_flags; > > static uint64_t ip_pkt_types_map[16] = { > 0, PKT_RX_IPV4_HDR, PKT_RX_IPV4_HDR_EXT, PKT_RX_IPV4_HDR_EXT, > @@ -834,7 +834,7 @@ rx_desc_hlen_type_rss_to_pkt_flags(uint32_t hl_tp_rs) > }; > > #ifdef RTE_LIBRTE_IEEE1588 > - static uint32_t ip_pkt_etqf_map[8] = { > + static uint64_t ip_pkt_etqf_map[8] = { > 0, 0, 0, PKT_RX_IEEE1588_PTP, > 0, 0, 0, 0, > }; > @@ -903,7 +903,7 @@ ixgbe_rx_scan_hw_ring(struct igb_rx_queue *rxq) > struct igb_rx_entry *rxep; > struct rte_mbuf *mb; > uint16_t pkt_len; > - uint16_t pkt_flags; > + uint64_t pkt_flags; > int s[LOOK_AHEAD], nb_dd; > int i, j, nb_rx = 0; > > @@ -1335,7 +1335,7 @@ ixgbe_recv_scattered_pkts(void *rx_queue, struct > rte_mbuf **rx_pkts, > uint16_t nb_rx; > uint16_t nb_hold; > uint16_t data_len; > - uint16_t pkt_flags; > + uint64_t pkt_flags; > > nb_rx = 0; > nb_hold = 0; > @@ -1511,9 +1511,9 @@ ixgbe_recv_scattered_pkts(void *rx_queue, struct > rte_mbuf **rx_pkts, > first_seg->vlan_tci = rte_le_to_cpu_16(rxd.wb.upper.vlan); > hlen_type_rss = rte_le_to_cpu_32(rxd.wb.lower.lo_dword.data); > pkt_flags = rx_desc_hlen_type_rss_to_pkt_flags(hlen_type_rss); > - pkt_flags = (uint16_t)(pkt_flags | > + pkt_flags = (pkt_flags | > rx_desc_status_to_pkt_flags(staterr)); > - pkt_flags = (uint16_t)(pkt_flags | > + pkt_flags = (pkt_flags | > rx_desc_error_to_pkt_flags(staterr)); > first_seg->ol_flags = pkt_flags; > > -- > 2.1.0 >
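The bug class fixed here is silent truncation; a minimal illustration of why the leftover uint16_t variables were wrong:

    #include <stdint.h>
    #include <stdio.h>

    int main(void)
    {
        uint64_t flags = 1ULL << 51;       /* e.g. PKT_TX_IEEE1588_TMST */
        uint16_t narrow = (uint16_t)flags; /* the old pkt_flags width */

        printf("%u\n", narrow); /* prints 0: bits >= 16 are silently lost */
        return 0;
    }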
[dpdk-dev] [PATCH 03/12] mbuf: move vxlan_cksum flag definition at the proper place
On Mon, Nov 10, 2014 at 04:59:17PM +0100, Olivier Matz wrote:
> The tx mbuf flags are ordered from the highest value to the
> lowest. Move the PKT_TX_VXLAN_CKSUM at the right place.
>
> Signed-off-by: Olivier Matz

Acked-by: Bruce Richardson

> ---
> lib/librte_mbuf/rte_mbuf.h | 4 ++--
> 1 file changed, 2 insertions(+), 2 deletions(-)
>
> diff --git a/lib/librte_mbuf/rte_mbuf.h b/lib/librte_mbuf/rte_mbuf.h
> index e8f9bfc..be15168 100644
> --- a/lib/librte_mbuf/rte_mbuf.h
> +++ b/lib/librte_mbuf/rte_mbuf.h
> @@ -96,7 +96,6 @@ extern "C" {
>
>  #define PKT_TX_VLAN_PKT (1ULL << 55) /**< TX packet is a 802.1q VLAN
> packet. */
>  #define PKT_TX_IP_CKSUM (1ULL << 54) /**< IP cksum of TX pkt. computed
> by NIC. */
> -#define PKT_TX_VXLAN_CKSUM (1ULL << 50) /**< TX checksum of VXLAN computed
> by NIC */
>  #define PKT_TX_IPV4_CSUM PKT_TX_IP_CKSUM /**< Alias of PKT_TX_IP_CKSUM.
> */
>  #define PKT_TX_IPV4 PKT_RX_IPV4_HDR /**< IPv4 with no IP checksum
> offload. */
>  #define PKT_TX_IPV6 PKT_RX_IPV6_HDR /**< IPv6 packet */
> @@ -114,9 +113,10 @@ extern "C" {
>  #define PKT_TX_UDP_CKSUM (3ULL << 52) /**< UDP cksum of TX pkt. computed
> by NIC. */
>  #define PKT_TX_L4_MASK (3ULL << 52) /**< Mask for L4 cksum offload
> request. */
>
> -/* Bit 51 - IEEE1588*/
>  #define PKT_TX_IEEE1588_TMST (1ULL << 51) /**< TX IEEE1588 packet to
> timestamp. */
>
> +#define PKT_TX_VXLAN_CKSUM (1ULL << 50) /**< TX checksum of VXLAN computed
> by NIC */
> +
>  /* Use final bit of flags to indicate a control mbuf */
>  #define CTRL_MBUF_FLAG (1ULL << 63) /**< Mbuf contains control data */
>
> --
> 2.1.0
>
[dpdk-dev] [PATCH 04/12] mbuf: add help about TX checksum flags
On Mon, Nov 10, 2014 at 04:59:18PM +0100, Olivier Matz wrote: > Describe how to use hardware checksum API. > > Signed-off-by: Olivier Matz Acked-by: Bruce Richardson > --- > lib/librte_mbuf/rte_mbuf.h | 25 + > 1 file changed, 17 insertions(+), 8 deletions(-) > > diff --git a/lib/librte_mbuf/rte_mbuf.h b/lib/librte_mbuf/rte_mbuf.h > index be15168..96e322b 100644 > --- a/lib/librte_mbuf/rte_mbuf.h > +++ b/lib/librte_mbuf/rte_mbuf.h > @@ -95,19 +95,28 @@ extern "C" { > #define PKT_RX_TUNNEL_IPV6_HDR (1ULL << 12) /**< RX tunnel packet with IPv6 > header. */ > > #define PKT_TX_VLAN_PKT (1ULL << 55) /**< TX packet is a 802.1q VLAN > packet. */ > -#define PKT_TX_IP_CKSUM (1ULL << 54) /**< IP cksum of TX pkt. computed > by NIC. */ > + > +/** > + * Enable hardware computation of IP cksum. To use it: > + * - fill l2_len and l3_len in mbuf > + * - set the flags PKT_TX_IP_CKSUM > + * - set the ip checksum to 0 in IP header > + */ > +#define PKT_TX_IP_CKSUM (1ULL << 54) > #define PKT_TX_IPV4_CSUM PKT_TX_IP_CKSUM /**< Alias of PKT_TX_IP_CKSUM. > */ > #define PKT_TX_IPV4 PKT_RX_IPV4_HDR /**< IPv4 with no IP checksum > offload. */ > #define PKT_TX_IPV6 PKT_RX_IPV6_HDR /**< IPv6 packet */ > > -/* > - * Bits 52+53 used for L4 packet type with checksum enabled. > - * 00: Reserved > - * 01: TCP checksum > - * 10: SCTP checksum > - * 11: UDP checksum > +/** > + * Bits 52+53 used for L4 packet type with checksum enabled: 00: Reserved, > + * 01: TCP checksum, 10: SCTP checksum, 11: UDP checksum. To use hardware > + * L4 checksum offload, the user needs to: > + * - fill l2_len and l3_len in mbuf > + * - set the flags PKT_TX_TCP_CKSUM, PKT_TX_SCTP_CKSUM or PKT_TX_UDP_CKSUM > + * - calculate the pseudo header checksum and set it in the L4 header (only > + *for TCP or UDP). For SCTP, set the crc field to 0. > */ > -#define PKT_TX_L4_NO_CKSUM (0ULL << 52) /**< Disable L4 cksum of TX pkt. */ > +#define PKT_TX_L4_NO_CKSUM (0ULL << 52) /* Disable L4 cksum of TX pkt. */ > #define PKT_TX_TCP_CKSUM (1ULL << 52) /**< TCP cksum of TX pkt. computed > by NIC. */ > #define PKT_TX_SCTP_CKSUM(2ULL << 52) /**< SCTP cksum of TX pkt. > computed by NIC. */ > #define PKT_TX_UDP_CKSUM (3ULL << 52) /**< UDP cksum of TX pkt. computed > by NIC. */ > -- > 2.1.0 >
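For TCP/UDP, the pseudo header sum the comment refers to can be computed by hand; a hedged sketch (testpmd carries an equivalent private helper, and no public DPDK API is assumed here):

    #include <stdint.h>
    #include <rte_ip.h>
    #include <rte_byteorder.h>

    /* Sketch: IPv4 pseudo-header sum for HW L4 checksum offload. All
     * fields stay in network byte order and are summed as 16-bit words;
     * the (non-complemented) result is stored in the TCP/UDP checksum
     * field before handing the packet to the NIC. */
    static uint16_t
    ipv4_psd_sum(const struct ipv4_hdr *ip)
    {
        union {
            struct {
                uint32_t src_addr;
                uint32_t dst_addr;
                uint8_t  zero;
                uint8_t  proto;
                uint16_t len; /* L4 length, network order */
            } __attribute__((__packed__)) f;
            uint16_t w[6];
        } psd;
        uint32_t sum = 0;
        unsigned i;

        psd.f.src_addr = ip->src_addr;
        psd.f.dst_addr = ip->dst_addr;
        psd.f.zero = 0;
        psd.f.proto = ip->next_proto_id;
        psd.f.len = rte_cpu_to_be_16((uint16_t)
            (rte_be_to_cpu_16(ip->total_length) -
             (ip->version_ihl & 0x0f) * 4));

        for (i = 0; i < 6; i++)
            sum += psd.w[i];
        sum = (sum & 0xffff) + (sum >> 16);
        sum = (sum & 0xffff) + (sum >> 16);
        return (uint16_t)sum;
    }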
[dpdk-dev] [PATCH 05/12] mbuf: remove too specific PKT_TX_OFFLOAD_MASK definition
On Mon, Nov 10, 2014 at 04:59:19PM +0100, Olivier Matz wrote: > This definition is specific to Intel PMD drivers and its definition > "indicate what bits required for building TX context" shows that it > should not be in the generic rte_mbuf.h but in the PMD driver. > > Signed-off-by: Olivier Matz > --- > lib/librte_mbuf/rte_mbuf.h| 5 - > lib/librte_pmd_e1000/igb_rxtx.c | 3 ++- > lib/librte_pmd_ixgbe/ixgbe_rxtx.c | 3 ++- > 3 files changed, 4 insertions(+), 7 deletions(-) > > diff --git a/lib/librte_mbuf/rte_mbuf.h b/lib/librte_mbuf/rte_mbuf.h > index 96e322b..ff11b84 100644 > --- a/lib/librte_mbuf/rte_mbuf.h > +++ b/lib/librte_mbuf/rte_mbuf.h > @@ -129,11 +129,6 @@ extern "C" { > /* Use final bit of flags to indicate a control mbuf */ > #define CTRL_MBUF_FLAG (1ULL << 63) /**< Mbuf contains control data */ > > -/** > - * Bit Mask to indicate what bits required for building TX context > - */ > -#define PKT_TX_OFFLOAD_MASK (PKT_TX_VLAN_PKT | PKT_TX_IP_CKSUM | > PKT_TX_L4_MASK) > - > /* define a set of marker types that can be used to refer to set points in > the > * mbuf */ > typedef void*MARKER[0]; /**< generic marker for a point in a structure > */ > diff --git a/lib/librte_pmd_e1000/igb_rxtx.c b/lib/librte_pmd_e1000/igb_rxtx.c > index 321493e..dbf5074 100644 > --- a/lib/librte_pmd_e1000/igb_rxtx.c > +++ b/lib/librte_pmd_e1000/igb_rxtx.c > @@ -400,7 +400,8 @@ eth_igb_xmit_pkts(void *tx_queue, struct rte_mbuf > **tx_pkts, > ol_flags = tx_pkt->ol_flags; > vlan_macip_lens.f.vlan_tci = tx_pkt->vlan_tci; > vlan_macip_lens.f.l2_l3_len = tx_pkt->l2_l3_len; > - tx_ol_req = ol_flags & PKT_TX_OFFLOAD_MASK; > + tx_ol_req = ol_flags & (PKT_TX_VLAN_PKT | PKT_TX_IP_CKSUM | > + PKT_TX_L4_MASK); > Rather than make the change like this, might it be clearer just to copy-paste the macro definition into this file (perhaps as IGB_TX_OFFLOAD_MASK). Similarly with ixgbe below? /Bruce > /* If a Context Descriptor need be built . */ > if (tx_ol_req) { > diff --git a/lib/librte_pmd_ixgbe/ixgbe_rxtx.c > b/lib/librte_pmd_ixgbe/ixgbe_rxtx.c > index 042ee8a..70ca254 100644 > --- a/lib/librte_pmd_ixgbe/ixgbe_rxtx.c > +++ b/lib/librte_pmd_ixgbe/ixgbe_rxtx.c > @@ -580,7 +580,8 @@ ixgbe_xmit_pkts(void *tx_queue, struct rte_mbuf **tx_pkts, > ol_flags = tx_pkt->ol_flags; > > /* If hardware offload required */ > - tx_ol_req = ol_flags & PKT_TX_OFFLOAD_MASK; > + tx_ol_req = ol_flags & (PKT_TX_VLAN_PKT | PKT_TX_IP_CKSUM | > + PKT_TX_L4_MASK); > if (tx_ol_req) { > vlan_macip_lens.f.vlan_tci = tx_pkt->vlan_tci; > vlan_macip_lens.f.l2_l3_len = tx_pkt->l2_l3_len; > -- > 2.1.0 >
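For reference, Bruce's suggestion would look roughly like this in igb_rxtx.c (macro name from his review; an IXGBE_TX_OFFLOAD_MASK twin would go in ixgbe_rxtx.c):

    /* Sketch: driver-local mask of mbuf flags that require a TX context
     * descriptor; same bits as before, but scoped to the PMD that
     * actually interprets them. */
    #define IGB_TX_OFFLOAD_MASK (PKT_TX_VLAN_PKT | PKT_TX_IP_CKSUM | \
                                 PKT_TX_L4_MASK)

        /* ... in eth_igb_xmit_pkts(): */
        tx_ol_req = ol_flags & IGB_TX_OFFLOAD_MASK;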
[dpdk-dev] [PATCH 06/12] mbuf: add functions to get the name of an ol_flag
On Mon, Nov 10, 2014 at 04:59:20PM +0100, Olivier Matz wrote: > In test-pmd (rxonly.c), the code is able to dump the list of ol_flags. > The issue is that the list of flags in the application has to be > synchronized with the flags defined in rte_mbuf.h. > > This patch introduces 2 new functions rte_get_rx_ol_flag_name() > and rte_get_tx_ol_flag_name() that returns the name of a flag from > its mask. It also fixes rxonly.c to use this new functions and to > display the proper flags. Good idea. Couple of minor comments below. /Bruce > > Signed-off-by: Olivier Matz > --- > app/test-pmd/rxonly.c | 36 > lib/librte_mbuf/rte_mbuf.h | 60 > ++ > 2 files changed, 70 insertions(+), 26 deletions(-) > > diff --git a/app/test-pmd/rxonly.c b/app/test-pmd/rxonly.c > index 4410c3d..e7cd7e2 100644 > --- a/app/test-pmd/rxonly.c > +++ b/app/test-pmd/rxonly.c > @@ -71,26 +71,6 @@ > > #include "testpmd.h" > > -#define MAX_PKT_RX_FLAGS 13 > -static const char *pkt_rx_flag_names[MAX_PKT_RX_FLAGS] = { > - "VLAN_PKT", > - "RSS_HASH", > - "PKT_RX_FDIR", > - "IP_CKSUM", > - "IP_CKSUM_BAD", > - > - "IPV4_HDR", > - "IPV4_HDR_EXT", > - "IPV6_HDR", > - "IPV6_HDR_EXT", > - > - "IEEE1588_PTP", > - "IEEE1588_TMST", > - > - "TUNNEL_IPV4_HDR", > - "TUNNEL_IPV6_HDR", > -}; > - > static inline void > print_ether_addr(const char *what, struct ether_addr *eth_addr) > { > @@ -219,12 +199,16 @@ pkt_burst_receive(struct fwd_stream *fs) > printf(" - Receive queue=0x%x", (unsigned) fs->rx_queue); > printf("\n"); > if (ol_flags != 0) { > - int rxf; > - > - for (rxf = 0; rxf < MAX_PKT_RX_FLAGS; rxf++) { > - if (ol_flags & (1 << rxf)) > - printf(" PKT_RX_%s\n", > -pkt_rx_flag_names[rxf]); > + unsigned rxf; > + const char *name; > + > + for (rxf = 0; rxf < sizeof(mb->ol_flags) * 8; rxf++) { > + if ((ol_flags & (1ULL << rxf)) == 0) > + continue; > + name = rte_get_rx_ol_flag_name(1ULL << rxf); > + if (name == NULL) > + continue; > + printf(" %s\n", name); > } > } > rte_pktmbuf_free(mb); > diff --git a/lib/librte_mbuf/rte_mbuf.h b/lib/librte_mbuf/rte_mbuf.h > index ff11b84..bcd8996 100644 > --- a/lib/librte_mbuf/rte_mbuf.h > +++ b/lib/librte_mbuf/rte_mbuf.h > @@ -129,6 +129,66 @@ extern "C" { > /* Use final bit of flags to indicate a control mbuf */ > #define CTRL_MBUF_FLAG (1ULL << 63) /**< Mbuf contains control data */ > > +/** > + * Bit Mask to indicate what bits required for building TX context I don't understand this first line - is it accidentally included? > + * Get the name of a RX offload flag > + * > + * @param mask > + * The mask describing the flag. Usually only one bit must be set. > + * Several bits can be given if they belong to the same mask. > + * Ex: PKT_TX_L4_MASK. TX mask given as an example for a function for RX flags is confusing. > + * @return > + * The name of this flag, or NULL if it's not a valid RX flag. 
> + */ > +static inline const char *rte_get_rx_ol_flag_name(uint64_t mask) > +{ > + switch (mask) { > + case PKT_RX_VLAN_PKT: return "PKT_RX_VLAN_PKT"; > + case PKT_RX_RSS_HASH: return "PKT_RX_RSS_HASH"; > + case PKT_RX_FDIR: return "PKT_RX_FDIR"; > + case PKT_RX_L4_CKSUM_BAD: return "PKT_RX_L4_CKSUM_BAD"; > + case PKT_RX_IP_CKSUM_BAD: return "PKT_RX_IP_CKSUM_BAD"; > + /* case PKT_RX_EIP_CKSUM_BAD: return "PKT_RX_EIP_CKSUM_BAD"; */ > + /* case PKT_RX_OVERSIZE: return "PKT_RX_OVERSIZE"; */ > + /* case PKT_RX_HBUF_OVERFLOW: return "PKT_RX_HBUF_OVERFLOW"; */ > + /* case PKT_RX_RECIP_ERR: return "PKT_RX_RECIP_ERR"; */ > + /* case PKT_RX_MAC_ERR: return "PKT_RX_MAC_ERR"; */ > + case PKT_RX_IPV4_HDR: return "PKT_RX_IPV4_HDR"; > + case PKT_RX_IPV4_HDR_EXT: return "PKT_RX_IPV4_HDR_EXT"; > + case PKT_RX_IPV6_HDR: return "PKT_RX_IPV6_HDR"; > + case PKT_RX_IPV6_HDR_EXT: return "PKT_RX_IPV6_HDR_EXT"; > + case PKT_RX_IEEE1588_PTP: return "PKT_RX_IEEE1588_PTP"; > + case PKT_RX_IEEE1588_TMST: return "PKT_RX_IEEE1588_TMST"; > + case PKT_RX_TUNNEL_IPV4_HDR: return "PKT_RX_TUNNEL_IPV4_HDR"; > + case PKT_RX_TUNNEL_IPV6_HDR: return "PKT_RX_TUNNEL_IPV6_HDR"; > + default: return NULL; > + } > +} > + > +/** > + * Get the name of a TX offload flag > + * > + * @param mask > + * The mask describing the flag. Usually only one bit must be set. > + * Several bits can be given if they belong to the same mask. > + * Ex: PKT_TX_L4
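Typical usage is the loop the patch adds to rxonly.c; condensed into a self-contained helper:

    #include <stdio.h>
    #include <rte_mbuf.h>

    /* Sketch: print the name of every RX offload flag set on an mbuf.
     * Bits with no name (rte_get_rx_ol_flag_name() returns NULL) are
     * skipped rather than printed as garbage. */
    static void
    dump_rx_flags(const struct rte_mbuf *m)
    {
        unsigned bit;

        for (bit = 0; bit < sizeof(m->ol_flags) * 8; bit++) {
            const char *name;

            if ((m->ol_flags & (1ULL << bit)) == 0)
                continue;
            name = rte_get_rx_ol_flag_name(1ULL << bit);
            if (name != NULL)
                printf("  %s\n", name);
        }
    }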
[dpdk-dev] [PATCH v4 00/10] VM Power Management
> From: Carew, Alan > > > Did you make any progress in Qemu/KVM community? > > We need to be sync'ed up with them to be sure we share the same goal. > > I want also to avoid using a solution which doesn't fit with their plan. > > Remember that we already had this problem with ivshmem which was > > planned to be dropped. > > . . . > > Unfortunately, I have not yet received any feedback: > http://lists.nongnu.org/archive/html/qemu-devel/2014-11/msg01103.html Just to add to what Alan said above, this capability does not exist in qemu at the moment, and based on there having been no feedback on the qemu mailing list so far, I think it's reasonable to assume that it will not be implemented in the immediate future. The VM Power Management feature has also been designed to allow easy migration to a qemu-based solution when this is supported in future. Therefore, I'd be in favour of accepting this feature into DPDK now. It's true that the implementation is a work-around, but there have been similar cases in DPDK in the past. One recent example that comes to mind is userspace vhost. The original implementation could also be considered a work-around, but it met the needs of many in the community. Now, with support for vhost-user in qemu 2.1, that implementation is being improved. I'd see VM Power Management following a similar path when this capability is supported in qemu. Tim
[dpdk-dev] [PATCH 06/12] mbuf: add functions to get the name of an ol_flag
Hi Bruce, Thank you for the review. On 11/10/2014 06:29 PM, Bruce Richardson wrote: >> +/** >> + * Bit Mask to indicate what bits required for building TX context > > I don't understand this first line - is it accidentally included? Right, it's a mistake, I'll remove this line. >> + * Get the name of a RX offload flag >> + * >> + * @param mask >> + * The mask describing the flag. Usually only one bit must be set. >> + * Several bits can be given if they belong to the same mask. >> + * Ex: PKT_TX_L4_MASK. > TX mask given as an example for a function for RX flags is confusing. I'll remove the last two lines of the description as there is no example for RX flags. Regards, Olivier
[dpdk-dev] [PATCH 05/12] mbuf: remove too specific PKT_TX_OFFLOAD_MASK definition
Hi Bruce, On 11/10/2014 06:14 PM, Bruce Richardson wrote: >> --- a/lib/librte_pmd_e1000/igb_rxtx.c >> +++ b/lib/librte_pmd_e1000/igb_rxtx.c >> @@ -400,7 +400,8 @@ eth_igb_xmit_pkts(void *tx_queue, struct rte_mbuf >> **tx_pkts, >> ol_flags = tx_pkt->ol_flags; >> vlan_macip_lens.f.vlan_tci = tx_pkt->vlan_tci; >> vlan_macip_lens.f.l2_l3_len = tx_pkt->l2_l3_len; >> -tx_ol_req = ol_flags & PKT_TX_OFFLOAD_MASK; >> +tx_ol_req = ol_flags & (PKT_TX_VLAN_PKT | PKT_TX_IP_CKSUM | >> +PKT_TX_L4_MASK); >> > > Rather than make the change like this, might it be clearer just to copy-paste > the macro definition into this file (perhaps as IGB_TX_OFFLOAD_MASK). > Similarly > with ixgbe below? As this definition was used only once per PMD, I thought it was clearer to remove the definition. But... someone did the same comment than you internally, so I'll change it in next version! Regards, Olivier
[dpdk-dev] vmware vmxnet3-usermap AND DPDK VMXNET3 PMD
Rashmin, Since I do need the jumbo, I use the vmxnet3-plugin you described, i.e. (1) sudo insmod ./vmxnet3-usermap.ko enable_shm=2,2 num_rqs=1,1 num_rxds=2048 num_txds=2048 and (2) when running the application, use in the args list: "-d", "librte_pmd_vmxnet3.so" Does the above two piece mean vmxnet3-plugin I do see my vmxnet3 device from the dump,rte_eal_pci_dump(); but the 'nb_ports' in DPDK never gets incremented rte_eth_dev_count() returns zero. so all the other api fails, if (port_id >= nb_ports) { PMD_DEBUG_TRACE("Invalid port_id=%d\n", port_id); return; } :03:00.0 - vendor:15ad device:7b0 d2404000 1000 d2403000 1000 d240 2000 d440 0001 :0b:00.0 - vendor:15ad device:7b0 d2504000 1000 d2503000 1000 d250 2000 d450 0001 DPDK: No Ethernet ports (rte_eth_dev_count() returns zero) PMD: rte_eth_dev_info_get: Invalid port_id=0 PMD: rte_eth_dev_configure: Invalid port_id=0 PMD: rte_eth_dev_info_get: Invalid port_id=0 PMD: rte_eth_dev_configure: Invalid port_id=0 So when using not using DPDK PMD for VMXNET3, what am i missing, for the the DPDK to know the nb_ports, How will the rte_eth_dev_start(portid) in DPDK library know ? rte_pmd_init_all() will not have the init the Intel DPDK PMD , RTE_LIBRTE_VMXNET3_PMD = n in config. #ifdef RTE_LIBRTE_VMXNET3_PMD if ((ret = rte_vmxnet3_pmd_init()) != 0) { RTE_LOG(ERR, PMD, "Cannot init vmxnet3 PMD\n"); return (ret); } If I make RTE_LIBRTE_VMXNET3_PMD = y, then I am using the Intel DPDK PMD and no jumbo. Thanks, aziz On Fri, Nov 7, 2014 at 8:53 AM, Patel, Rashmin N wrote: > Hi Aziz, > > Yes, you're right DPDK VMXNET3-PMD in /lib/librte_pmd_vmxnet3 does not > support mbuf chaining today. But it's a standalone bsd driver just like any > other pmd in that directory, it does not need vmxnet3-usermap.ko module. > > Now there is another vmxnet3 solution in a separate branch as a plugin, > which must have vmxnet3-usermap.ko linux module(1), and a user space > interface piece(2) to tie it to any DPDK application in the main branch. > (1) and (2) makes the solution which is known as vmxnet3-plugin. It's been > there for a long time just like virtio-plugin, I don't know who uses it, > but community can *reply* here if there is still any need of a separate > solution that way. > > I'm in favor of consolidating all those version into one elegant solution > by grabbing best features from all of them and maintain one copy. I'm sure > that developers contributing from VMware would also support that idea > because then it makes easy to maintain and debug and bug fix and most > importantly avoid such confusion in future. > > Thanks, > Rashmin > > -Original Message- > From: dev [mailto:dev-bounces at dpdk.org] On Behalf Of Aziz Hajee > Sent: Thursday, November 06, 2014 5:47 PM > To: dev at dpdk.org > Subject: [dpdk-dev] vmware vmxnet3-usermap AND DPDK VMXNET3 PMD > > I am using the dpdk1.6.0r1 > I could not find a complete clarification, sorry if missed. > VMXNET3 PMD > > I have enabled the VMXNET3 PMD in the dpdk. > # Compile burst-oriented VMXNET3 PMD driver # > > CONFIG_RTE_LIBRTE_VMXNET3_PMD=y > CONFIG_RTE_LIBRTE_VMXNET3_DEBUG_INIT=y > CONFIG_RTE_LIBRTE_VMXNET3_DEBUG_RX=n > CONFIG_RTE_LIBRTE_VMXNET3_DEBUG_TX=n > CONFIG_RTE_LIBRTE_VMXNET3_DEBUG_TX_FREE=n > The Intel DPDK VMXNET3 PMD driver does not support mbuf chaining, and I > have to set NOMULTSEGS for the vmxnet3 interface init to succeed. > tx_conf.txq_flags = ETH_TXQ_FLAGS_NOMULTSEGS Is there a later version of > DPDK that supports multiseg for the dpdk > VMXNET3 PMD. 
> > vmware vmxnet3-usermap AND DPDK VMXNET3 PMD > = > Is the vmxnet3-usermap.ko module driver also needed ? (appears that I > need, otherwise the eal initialise fails. > sudo insmod ./vmxnet3-usermap.ko enable_shm=2,2 num_rqs=1,1 num_rxds=2048 > num_txds=2048 > > I do not understand if VMXNET3 PMD is there, what is the purpose of > /vmxnet3-usermap.ko/vmxnet3-usermap.ko ? > > From some responses i saw that the following ifdef RTE_EAL_UNBIND_PORTS is > also need to be removed in lib/librte_pmd_vmxnet3/vmxnet3_ethdev.c ? > { > .name = "rte_vmxnet3_pmd", > .id_table = pci_id_vmxnet3_map, -#ifdef RTE_EAL_UNBIND_PORTS > +// #ifdef RTE_EAL_UNBIND_PORTS > .drv_flags = RTE_PCI_DRV_NEED_IGB_UIO, -#endif > +// #endif > }, > .eth_dev_init = eth_vmxnet3_dev_init, > .dev_private_size = sizeof(struct vmxnet3_adapter), > > thanks, > -aziz >
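With an externally loaded PMD, the usual pattern is that rte_eal_init() probes the devices while loading the .so passed via -d, so the port count can only be checked afterwards. A hedged sketch (exact behaviour depends on the DPDK 1.6 vmxnet3-usermap branch in use):

    #include <stdio.h>
    #include <rte_eal.h>
    #include <rte_ethdev.h>

    /* Sketch: verify that the externally loaded PMD actually claimed the
     * devices. Launch e.g. with: app -c 0xf -n 4 -d librte_pmd_vmxnet3.so */
    int
    main(int argc, char **argv)
    {
        if (rte_eal_init(argc, argv) < 0)
            return -1;
        if (rte_eth_dev_count() == 0) {
            /* usual suspects: plugin .so not found, NIC still bound to
             * the native vmxnet3 driver, vmxnet3-usermap.ko not loaded */
            printf("No Ethernet ports found\n");
            return -1;
        }
        return 0;
    }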
[dpdk-dev] building shared library
On Mon, Nov 10, 2014 at 03:22:40PM +0100, Newman Poborsky wrote: > is it possible to build a dpdk app as a shared library? Yes it will work, with a bit of performance loss from the .so symbol lookup overhead. You have to set some of the build config options to get it to work though. Matthew.
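For reference, the build config options Matthew alludes to are along these lines; the names below are from memory of the 1.x config system and should be checked against the target release:

    # Hedged sketch of config/defconfig_* settings for a shared build:
    CONFIG_RTE_BUILD_SHARED_LIB=y     # produce .so libraries instead of .a
    CONFIG_RTE_BUILD_COMBINE_LIBS=y   # optional: one combined library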