[dpdk-dev] [PATCH 07/12] pmd/ixgbe: add dev_ptype_info_get implementation
On 1/5/2016 2:12 AM, Ananyev, Konstantin wrote: > >> -Original Message- >> From: Tan, Jianfeng >> Sent: Thursday, December 31, 2015 6:53 AM >> To: dev at dpdk.org >> Cc: Zhang, Helin; Ananyev, Konstantin; Tan, Jianfeng >> Subject: [PATCH 07/12] pmd/ixgbe: add dev_ptype_info_get implementation >> >> Signed-off-by: Jianfeng Tan >> --- >> drivers/net/ixgbe/ixgbe_ethdev.c | 50 >> >> drivers/net/ixgbe/ixgbe_ethdev.h | 2 ++ >> drivers/net/ixgbe/ixgbe_rxtx.c | 5 +++- >> 3 files changed, 56 insertions(+), 1 deletion(-) >> >> diff --git a/drivers/net/ixgbe/ixgbe_ethdev.c >> b/drivers/net/ixgbe/ixgbe_ethdev.c >> index 4c4c6df..de5c3a9 100644 >> --- a/drivers/net/ixgbe/ixgbe_ethdev.c >> +++ b/drivers/net/ixgbe/ixgbe_ethdev.c >> @@ -166,6 +166,8 @@ static int ixgbe_dev_queue_stats_mapping_set(struct >> rte_eth_dev *eth_dev, >> uint8_t is_rx); >> static void ixgbe_dev_info_get(struct rte_eth_dev *dev, >> struct rte_eth_dev_info *dev_info); >> +static int ixgbe_dev_ptype_info_get(struct rte_eth_dev *dev, >> +uint32_t ptype_mask, uint32_t ptypes[]); >> static void ixgbevf_dev_info_get(struct rte_eth_dev *dev, >> struct rte_eth_dev_info *dev_info); >> static int ixgbe_dev_mtu_set(struct rte_eth_dev *dev, uint16_t mtu); >> @@ -428,6 +430,7 @@ static const struct eth_dev_ops ixgbe_eth_dev_ops = { >> .xstats_reset = ixgbe_dev_xstats_reset, >> .queue_stats_mapping_set = ixgbe_dev_queue_stats_mapping_set, >> .dev_infos_get= ixgbe_dev_info_get, >> +.dev_ptype_info_get = ixgbe_dev_ptype_info_get, >> .mtu_set = ixgbe_dev_mtu_set, >> .vlan_filter_set = ixgbe_vlan_filter_set, >> .vlan_tpid_set= ixgbe_vlan_tpid_set, >> @@ -512,6 +515,7 @@ static const struct eth_dev_ops ixgbevf_eth_dev_ops = { >> .xstats_reset = ixgbevf_dev_stats_reset, >> .dev_close= ixgbevf_dev_close, >> .dev_infos_get= ixgbevf_dev_info_get, >> +.dev_ptype_info_get = ixgbe_dev_ptype_info_get, >> .mtu_set = ixgbevf_dev_set_mtu, >> .vlan_filter_set = ixgbevf_vlan_filter_set, >> .vlan_strip_queue_set = 
ixgbevf_vlan_strip_queue_set, >> @@ -2829,6 +2833,52 @@ ixgbe_dev_info_get(struct rte_eth_dev *dev, struct >> rte_eth_dev_info *dev_info) >> dev_info->flow_type_rss_offloads = IXGBE_RSS_OFFLOAD_ALL; >> } >> >> +static int >> +ixgbe_dev_ptype_info_get(struct rte_eth_dev *dev, uint32_t ptype_mask, >> +uint32_t ptypes[]) >> +{ >> +int num = 0; >> + >> +if ((dev->rx_pkt_burst == ixgbe_recv_pkts) >> +|| (dev->rx_pkt_burst == >> ixgbe_recv_pkts_lro_single_alloc) >> +|| (dev->rx_pkt_burst == ixgbe_recv_pkts_lro_bulk_alloc) >> +|| (dev->rx_pkt_burst == ixgbe_recv_pkts_bulk_alloc) >> + ) { > > As I remember vector RX for ixgbe sets up packet_type properly too. Hi Konstantin, Yes, Helin also reminds me about that. Going to add it in next version. Thanks, Jianfeng > >> +/* refers to ixgbe_rxd_pkt_info_to_pkt_type() */ >> +if ((ptype_mask & RTE_PTYPE_L2_MASK) == RTE_PTYPE_L2_MASK) >> +ptypes[num++] = RTE_PTYPE_L2_ETHER; >> + >> +if ((ptype_mask & RTE_PTYPE_L3_MASK) == RTE_PTYPE_L3_MASK) { >> +ptypes[num++] = RTE_PTYPE_L3_IPV4; >> +ptypes[num++] = RTE_PTYPE_L3_IPV4_EXT; >> +ptypes[num++] = RTE_PTYPE_L3_IPV6; >> +ptypes[num++] = RTE_PTYPE_L3_IPV6_EXT; >> +} >> + >> +if ((ptype_mask & RTE_PTYPE_L4_MASK) == RTE_PTYPE_L4_MASK) { >> +ptypes[num++] = RTE_PTYPE_L4_SCTP; >> +ptypes[num++] = RTE_PTYPE_L4_TCP; >> +ptypes[num++] = RTE_PTYPE_L4_UDP; >> +} >> + >> +if ((ptype_mask & RTE_PTYPE_TUNNEL_MASK) == >> RTE_PTYPE_TUNNEL_MASK) >> +ptypes[num++] = RTE_PTYPE_TUNNEL_IP; >> + >> +if ((ptype_mask & RTE_PTYPE_INNER_L3_MASK) == >> RTE_PTYPE_INNER_L3_MASK) { >> +ptypes[num++] = RTE_PTYPE_INNER_L3_IPV6; >> +ptypes[num++] = RTE_PTYPE_INNER_L3_IPV6_EXT; >> +} >> + >> +if ((ptype_mask & RTE_PTYPE_INNER_L4_MASK) == >> RTE_PTYPE_INNER_L4_MASK) { >> +ptypes[num++] = RTE_PTYPE_INNER_L4_TCP; >> +ptypes[num++] = RTE_PTYPE_INNER_L4_UDP; >> +} >> +} else >> +num = -ENOTSUP; >> + >> +return num; >> +} >> + >> static void >> ixgbevf_dev_info_get(struct rte_eth_dev *dev, >> struct rte_eth_dev_info 
*dev_info) >> diff --git a/drivers/net/ixgbe/ixgbe_ethdev.h >> b/drivers/net/ixgbe/ixgbe_ethdev.h >> index d26771a..2479830 100644 >> --- a/drivers/ne
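For readers following the mask handling in the patch above: a layer's packet types are only reported when the caller sets that layer's whole mask. The sketch below is a minimal, self-contained illustration of that gating. The `DEMO_*` constants and `demo_ptype_info_get()` are hypothetical stand-ins, not the real `RTE_PTYPE_*` values or the driver callback.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins for RTE_PTYPE_* constants; values are illustrative. */
#define DEMO_PTYPE_L3_MASK 0x000000f0u
#define DEMO_PTYPE_L3_IPV4 0x00000010u
#define DEMO_PTYPE_L3_IPV6 0x00000040u
#define DEMO_PTYPE_L4_MASK 0x00000f00u
#define DEMO_PTYPE_L4_TCP  0x00000100u
#define DEMO_PTYPE_L4_UDP  0x00000200u

/*
 * Write into ptypes[] the types of every layer whose complete mask is set
 * in ptype_mask, and return the number of entries written.
 */
static int
demo_ptype_info_get(uint32_t ptype_mask, uint32_t ptypes[])
{
	int num = 0;

	assert(ptypes != NULL);

	if ((ptype_mask & DEMO_PTYPE_L3_MASK) == DEMO_PTYPE_L3_MASK) {
		ptypes[num++] = DEMO_PTYPE_L3_IPV4;
		ptypes[num++] = DEMO_PTYPE_L3_IPV6;
	}

	if ((ptype_mask & DEMO_PTYPE_L4_MASK) == DEMO_PTYPE_L4_MASK) {
		ptypes[num++] = DEMO_PTYPE_L4_TCP;
		ptypes[num++] = DEMO_PTYPE_L4_UDP;
	}

	return num;
}
```

With this gating, passing only part of a layer mask (for example a single L3 type bit) yields no entries for that layer, which is worth keeping in mind when choosing the mask in an application.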
[dpdk-dev] [PATCH v2 4/4] virtio: check if any kernel driver is manipulating the virtio device
On Mon, Jan 04, 2016 at 05:56:49PM +, Xie, Huawei wrote:
> On 1/5/2016 1:24 AM, Stephen Hemminger wrote:
> > On Mon, 4 Jan 2016 01:56:13 +0800
> > Huawei Xie wrote:
> >
> >> +	if (pci_dev->kdrv != RTE_KDRV_NONE) {
> >> +		PMD_INIT_LOG(INFO,
> >> +			"kernel driver is manipulating this device." \
> >> +			" Please unbind the kernel driver.");
> > Splitting strings in general is a bad idea since it makes it harder to find
> > log messages.
> > Also the first clause is lower case and the second is capitalized.
> Got it. This is to avoid 80 char warning. Will put it in one line to
> make it friendly for searching.

I agree with Stephen that _in general_ it's a bad idea. But for this
case, I think it's okay, as it'd be enough to locate the code by
searching "manipulating this device", or "unbind the kernel driver", or
other combinations. I mean, nobody would try searching with:

    "kernel driver is manipulating this device. Please unbind the kernel driver."

Right?

	--yliu

> The first clause is lower case because it actually follows "%s():".
> >
> > Lastly, the backslash continuation is unnecessary here and will cause
> > checkpatch warning.
> >
>
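On the backslash point raised in this thread: in C, adjacent string literals are concatenated by the compiler, so a message can be split across source lines without any continuation character. A minimal sketch follows; the variable and function names are made up for illustration and are not from the virtio PMD.

```c
#include <assert.h> /* only needed for the check mentioned in the note below */
#include <string.h>

/* Split across two source lines; the compiler joins adjacent literals. */
static const char *msg_split =
	"kernel driver is manipulating this device."
	" Please unbind the kernel driver.";

/* Kept on one source line so a search for the full text finds it. */
static const char *msg_single = "kernel driver is manipulating this device. Please unbind the kernel driver.";

/* Return 1 when both spellings produce the same string. */
static int
messages_match(void)
{
	return strcmp(msg_split, msg_single) == 0;
}
```

Both forms give the same string at run time, so `assert(messages_match())` holds. The difference is only in how easily the source can be searched and whether the line stays within 80 columns.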
[dpdk-dev] [PATCH] fix checkpatch errors
> -----Original Message-----
> From: Xie, Huawei
> Sent: Monday, January 4, 2016 9:52 AM
> To: dev at dpdk.org
> Cc: Mcnamara, John; Tan, Jianfeng; Xie, Huawei
> Subject: [PATCH] fix checkpatch errors
>
> Signed-off-by: Huawei Xie
...
> 	mbuf_poolname_build(sock_id, pool_name, sizeof(pool_name));
> -	return (rte_mempool_lookup((const char *)pool_name));
> +	return rte_mempool_lookup((const char *)pool_name);

Hi Huawei,

Assume this patch is to solve below error (reported by checkpatch):
ERROR: return is not a function, parentheses are not required

So maybe above fix is not necessary? Involve more people to discuss.

And please include the error message in the commit message.

Thanks,
Jianfeng
[dpdk-dev] [PATCH] fix checkpatch errors
On Tue, Jan 05, 2016 at 02:21:12AM +, Tan, Jianfeng wrote:
>
>
> > -----Original Message-----
> > From: Xie, Huawei
> > Sent: Monday, January 4, 2016 9:52 AM
> > To: dev at dpdk.org
> > Cc: Mcnamara, John; Tan, Jianfeng; Xie, Huawei
> > Subject: [PATCH] fix checkpatch errors
> >
> > Signed-off-by: Huawei Xie
> ...
> > 	mbuf_poolname_build(sock_id, pool_name, sizeof(pool_name));
> > -	return (rte_mempool_lookup((const char *)pool_name));
> > +	return rte_mempool_lookup((const char *)pool_name);
>
> Hi Huawei,
>
> Assume this patch is to solve below error (reported by checkpatch):
> ERROR: return is not a function, parentheses are not required
>
> So maybe above fix is not necessary? Involve more people to discuss.

This fix is good to me.

> And please include the error message in the commit message.

+1

	--yliu
[dpdk-dev] [PATCH] fix checkpatch errors
On 1/5/2016 10:21 AM, Tan, Jianfeng wrote:
>
>> -----Original Message-----
>> From: Xie, Huawei
>> Sent: Monday, January 4, 2016 9:52 AM
>> To: dev at dpdk.org
>> Cc: Mcnamara, John; Tan, Jianfeng; Xie, Huawei
>> Subject: [PATCH] fix checkpatch errors
>>
>> Signed-off-by: Huawei Xie
> ...
>> 	mbuf_poolname_build(sock_id, pool_name, sizeof(pool_name));
>> -	return (rte_mempool_lookup((const char *)pool_name));
>> +	return rte_mempool_lookup((const char *)pool_name);
> Hi Huawei,
>
> Assume this patch is to solve below error (reported by checkpatch):
> ERROR: return is not a function, parentheses are not required
>
> So maybe above fix is not necessary? Involve more people to discuss.

Yes, almost all of the 800 errors are checkpatch errors. The
parentheses around some logic expressions, like return val == 0 or
return function, are also removed. At least in this patch, they are
not needed.

>
> And please include the error message in the commit message.
>
> Thanks,
> Jianfeng
>
>
[dpdk-dev] [PATCH 12/12] examples/l3fwd: add option to parse ptype
> -Original Message- > From: Ananyev, Konstantin > Sent: Tuesday, January 5, 2016 2:32 AM > To: Tan, Jianfeng; dev at dpdk.org > Cc: Zhang, Helin > Subject: RE: [PATCH 12/12] examples/l3fwd: add option to parse ptype > > > Hi Jianfeng, > > -Original Message- > > From: Tan, Jianfeng > > Sent: Thursday, December 31, 2015 6:53 AM > > To: dev at dpdk.org > > Cc: Zhang, Helin; Ananyev, Konstantin; Tan, Jianfeng > > Subject: [PATCH 12/12] examples/l3fwd: add option to parse ptype > > > > Firstly, use rte_eth_dev_get_ptype_info() API to check if device will > > parse needed packet type. If not, specifying the newly added option, > > --parse-ptype to do it in the callback softly. > > > > Signed-off-by: Jianfeng Tan > > --- > > examples/l3fwd/main.c | 86 > +++ > > 1 file changed, 86 insertions(+) > > > > diff --git a/examples/l3fwd/main.c b/examples/l3fwd/main.c > > index 5b0c2dd..ccbdce3 100644 > > --- a/examples/l3fwd/main.c > > +++ b/examples/l3fwd/main.c > > @@ -174,6 +174,7 @@ static __m128i val_eth[RTE_MAX_ETHPORTS]; > > static uint32_t enabled_port_mask = 0; > > static int promiscuous_on = 0; /**< Ports set in promiscuous mode off by > default. */ > > static int numa_on = 1; /**< NUMA is enabled by default. */ > > +static int parse_ptype = 0; /**< parse packet type using rx callback */ > > > > #if (APP_LOOKUP_METHOD == APP_LOOKUP_EXACT_MATCH) > > static int ipv6 = 0; /**< ipv6 is false by default. 
*/ > > @@ -2022,6 +2023,7 @@ parse_eth_dest(const char *optarg) > > #define CMD_LINE_OPT_IPV6 "ipv6" > > #define CMD_LINE_OPT_ENABLE_JUMBO "enable-jumbo" > > #define CMD_LINE_OPT_HASH_ENTRY_NUM "hash-entry-num" > > +#define CMD_LINE_OPT_PARSE_PTYPE "parse-ptype" > > > > /* Parse the argument given in the command line of the application */ > > static int > > @@ -2038,6 +2040,7 @@ parse_args(int argc, char **argv) > > {CMD_LINE_OPT_IPV6, 0, 0, 0}, > > {CMD_LINE_OPT_ENABLE_JUMBO, 0, 0, 0}, > > {CMD_LINE_OPT_HASH_ENTRY_NUM, 1, 0, 0}, > > + {CMD_LINE_OPT_PARSE_PTYPE, 0, 0, 0}, > > {NULL, 0, 0, 0} > > }; > > > > @@ -2125,6 +2128,12 @@ parse_args(int argc, char **argv) > > } > > } > > #endif > > + if (!strncmp(lgopts[option_index].name, > CMD_LINE_OPT_PARSE_PTYPE, > > + sizeof(CMD_LINE_OPT_PARSE_PTYPE))) { > > + printf("soft parse-ptype is enabled \n"); > > + parse_ptype = 1; > > + } > > + > > break; > > > > default: > > @@ -2559,6 +2568,75 @@ check_all_ports_link_status(uint8_t port_num, > uint32_t port_mask) > > } > > } > > > > +static int > > +check_packet_type_ok(int portid) > > +{ > > + int i; > > + int ret; > > + uint32_t ptypes[RTE_PTYPE_L3_MAX_NUM]; > > + int ptype_l3_ipv4 = 0, ptype_l3_ipv6 = 0; > > + > > + ret = rte_eth_dev_get_ptype_info(portid, RTE_PTYPE_L3_MASK, > ptypes); > > + for (i = 0; i < ret; ++i) { > > + if (ptypes[i] & RTE_PTYPE_L3_IPV4) > > + ptype_l3_ipv4 = 1; > > + if (ptypes[i] & RTE_PTYPE_L3_IPV6) > > + ptype_l3_ipv6 = 1; > > + } > > + > > + if (ptype_l3_ipv4 == 0) > > + printf("port %d cannot parse RTE_PTYPE_L3_IPV4\n", portid); > > + > > + if (ptype_l3_ipv6 == 0) > > + printf("port %d cannot parse RTE_PTYPE_L3_IPV6\n", portid); > > + > > + if (ptype_l3_ipv4 || ptype_l3_ipv6) > > + return 1; > > + > > + return 0; > > +} > > +static inline void > > +parse_packet_type(struct rte_mbuf *m) > > +{ > > + struct ether_hdr *eth_hdr; > > + struct vlan_hdr *vlan_hdr; > > + uint32_t packet_type = 0; > > + uint16_t ethertype; > > + > > + eth_hdr = 
rte_pktmbuf_mtod(m, struct ether_hdr *); > > + ethertype = rte_be_to_cpu_16(eth_hdr->ether_type); > > + if (ethertype == ETHER_TYPE_VLAN) { > > I don't think either LPM or EM support packets with VLAN right now. > So, probably there is no need to support it here. Good to know. Will remove it. > > > + vlan_hdr = (struct vlan_hdr *)(eth_hdr + 1); > > + ethertype = rte_be_to_cpu_16(vlan_hdr->eth_proto); > > + } > > + switch (ethertype) { > > + case ETHER_TYPE_IPv4: > > + packet_type |= RTE_PTYPE_L3_IPV4_EXT_UNKNOWN; > > + break; > > + case ETHER_TYPE_IPv6: > > + packet_type |= RTE_PTYPE_L3_IPV6_EXT_UNKNOWN; > > + break; > > + default: > > + break; > > + } > > + > > + m->packet_type = packet_type; > > Probably: > m->packet_type |= packet_type; > in case HW supports some other packet types. I agree. Will fix it. > > > +} > > + > > +static uint16_t > > +cb_parse_packet_type(uint8_t port __rte_unused, > > + uint16_t queue __rte_unused, > > + struct rte_mbuf *pkts[], > > + uint16_t nb_pkts, > >
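To illustrate the two review points above (no VLAN handling, and OR-ing the software result into whatever the hardware already reported), here is a small self-contained sketch. It works on a raw byte buffer instead of an mbuf; `demo_parse_packet_type()` and the `DEMO_*` flag values are hypothetical and only stand in for the l3fwd callback and the `RTE_PTYPE_*` constants.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define DEMO_ETHER_HDR_LEN   14u
#define DEMO_ETHER_TYPE_IPV4 0x0800u
#define DEMO_ETHER_TYPE_IPV6 0x86DDu

/* Hypothetical flag values standing in for RTE_PTYPE_* constants. */
#define DEMO_PTYPE_L2_ETHER 0x00000001u
#define DEMO_PTYPE_L3_IPV4  0x00000010u
#define DEMO_PTYPE_L3_IPV6  0x00000040u

/*
 * Classify the L3 protocol from the EtherType field (bytes 12-13 of an
 * untagged Ethernet frame) and OR it into the packet type that the
 * hardware already set, so existing bits are preserved.
 */
static uint32_t
demo_parse_packet_type(const uint8_t *frame, size_t len, uint32_t hw_ptype)
{
	uint16_t ethertype;

	assert(frame != NULL);

	if (len < DEMO_ETHER_HDR_LEN)
		return hw_ptype; /* too short to carry an Ethernet header */

	/* EtherType is stored in network (big-endian) byte order. */
	ethertype = (uint16_t)((frame[12] << 8) | frame[13]);

	switch (ethertype) {
	case DEMO_ETHER_TYPE_IPV4:
		hw_ptype |= DEMO_PTYPE_L3_IPV4;
		break;
	case DEMO_ETHER_TYPE_IPV6:
		hw_ptype |= DEMO_PTYPE_L3_IPV6;
		break;
	default:
		break;
	}

	return hw_ptype;
}
```

Because the result is OR-ed in rather than assigned, a frame that the NIC already marked as L2 Ethernet keeps that bit after the software pass.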
[dpdk-dev] [PATCH 08/12] pmd/mlx4: add dev_ptype_info_get implementation
On 1/4/2016 7:11 PM, Adrien Mazarguil wrote:
> Hi Jianfeng,
>
> I'm only commenting the mlx4/mlx5 bits in this message, see below.
>
> On Thu, Dec 31, 2015 at 02:53:15PM +0800, Jianfeng Tan wrote:
>> Signed-off-by: Jianfeng Tan
>> ---
>>  drivers/net/mlx4/mlx4.c | 27 +++
>>  1 file changed, 27 insertions(+)
>>
>> diff --git a/drivers/net/mlx4/mlx4.c b/drivers/net/mlx4/mlx4.c
>> index 207bfe2..85afa32 100644
>> --- a/drivers/net/mlx4/mlx4.c
>> +++ b/drivers/net/mlx4/mlx4.c
>> @@ -2836,6 +2836,8 @@ rxq_cleanup(struct rxq *rxq)
>>   * @param flags
>>   *   RX completion flags returned by poll_length_flags().
>>   *
>> + * @note: fix mlx4_dev_ptype_info_get() if any change here.
>> + *
>>   * @return
>>   *   Packet type for struct rte_mbuf.
>>   */
>> @@ -4268,6 +4270,30 @@ mlx4_dev_infos_get(struct rte_eth_dev *dev, struct
>> rte_eth_dev_info *info)
>>  	priv_unlock(priv);
>>  }
>>
>> +static int
>> +mlx4_dev_ptype_info_get(struct rte_eth_dev *dev, uint32_t ptype_mask,
>> +		uint32_t ptypes[])
>> +{
>> +	int num = 0;
>> +
>> +	if ((dev->rx_pkt_burst == mlx4_rx_burst)
>> +			|| (dev->rx_pkt_burst == mlx4_rx_burst_sp)) {
>> +		/* refers to rxq_cq_to_pkt_type() */
>> +		if ((ptype_mask & RTE_PTYPE_L3_MASK) == RTE_PTYPE_L3_MASK) {
>> +			ptypes[num++] = RTE_PTYPE_L3_IPV4;
>> +			ptypes[num++] = RTE_PTYPE_L3_IPV6;
>> +		}
>> +
>> +		if ((ptype_mask & RTE_PTYPE_INNER_L3_MASK) ==
>> RTE_PTYPE_INNER_L3_MASK) {
>> +			ptypes[num++] = RTE_PTYPE_INNER_L3_IPV4;
>> +			ptypes[num++] = RTE_PTYPE_INNER_L3_IPV6;
>> +		}
>> +	} else
>> +		num = -ENOTSUP;
>> +
>> +	return num;
>> +}
> I think checking for mlx4_rx_burst and mlx4_rx_burst_sp is unnecessary at
> the moment, all RX burst functions do update the packet_type field, no need
> for extra complexity.
>
> Same comment for mlx5.

Hi Mazarguil,

My original thought was that rx_pkt_burst could also be set to
removed_rx_burst, but that indeed does not matter, because it is only
possible when the device is closed.

Another consideration is to keep the same style as the other devices.
Each kind of device could have several rx burst functions, so the current
implementation stays extensible when new rx burst functions are added.

What do you think?

Thanks,
Jianfeng

>
>> +
>>  /**
>>   * DPDK callback to get device statistics.
>>   *
>> @@ -4989,6 +5015,7 @@ static const struct eth_dev_ops mlx4_dev_ops = {
>>  	.stats_reset = mlx4_stats_reset,
>>  	.queue_stats_mapping_set = NULL,
>>  	.dev_infos_get = mlx4_dev_infos_get,
>> +	.dev_ptypes_info_get = mlx4_dev_ptype_info_get,
>>  	.vlan_filter_set = mlx4_vlan_filter_set,
>>  	.vlan_tpid_set = NULL,
>>  	.vlan_strip_queue_set = NULL,
>> --
>> 2.1.4
>>
[dpdk-dev] Traffic scheduling in DPDK
Thanks Jasvinder,

I am running the below command:

./build/qos_sched -c 0xe -n 1 -- --pfc "0,1,3,2" --cfg ./profile.cfg

Bound two 1G physical ports to DPDK, and started running the above
command with the default profile mentioned in profile.cfg.
I am using lcore 3 and 2 for RX and TX. It was not successful, getting the
below error.

APP: Initializing port 0... PMD: eth_igb_rx_queue_setup(): sw_ring=0x7f5b20ba2240 hw_ring=0x7f5b20ba2680 dma_addr=0xbf87a2680
PMD: eth_igb_tx_queue_setup(): To improve 1G driver performance, consider setting the TX WTHRESH value to 4, 8, or 16.
PMD: eth_igb_tx_queue_setup(): sw_ring=0x7f5b20b910c0 hw_ring=0x7f5b20b92100 dma_addr=0xbf8792100
PMD: eth_igb_start(): <<
done: Link Up - speed 1000 Mbps - full-duplex
APP: Initializing port 1... PMD: eth_igb_rx_queue_setup(): sw_ring=0x7f5b20b80a40 hw_ring=0x7f5b20b80e80 dma_addr=0xbf8780e80
PMD: eth_igb_tx_queue_setup(): To improve 1G driver performance, consider setting the TX WTHRESH value to 4, 8, or 16.
PMD: eth_igb_tx_queue_setup(): sw_ring=0x7f5b20b6f8c0 hw_ring=0x7f5b20b70900 dma_addr=0xbf8770900
PMD: eth_igb_start(): <<
done: Link Up - speed 1000 Mbps - full-duplex
SCHED: Low level config for pipe profile 0:
    Token bucket: period = 3277, credits per period = 8, size = 100
    Traffic classes: period = 500, credits per period = [12207, 12207, 12207, 12207]
    Traffic class 3 oversubscription: weight = 0
    WRR cost: [1, 1, 1, 1], [1, 1, 1, 1], [1, 1, 1, 1], [1, 1, 1, 1]
EAL: Error - exiting with code: 1
  Cause: Unable to config sched subport 0, err=-2

Please, tell me whether I am missing any other configuration.

Thanks,
Uday

-----Original Message-----
From: Singh, Jasvinder [mailto:jasvinder.si...@intel.com]
Sent: Monday, January 04, 2016 9:26 PM
To: Ravulakollu Udaya Kumar (WT01 - Product Engineering Service); dev at dpdk.org
Subject: RE: [dpdk-dev] Traffic scheduling in DPDK

Hi Uday,

> I have an issue in running qos_sched application in DPDK .Could
> someone tell me how to run the command and what each parameter does
> In the below mentioned text.
>
> Application mandatory parameters:
> --pfc "RX PORT, TX PORT, RX LCORE, WT LCORE" : Packet flow configuration
>    multiple pfc can be configured in command line

RX PORT - Specifies the packets receive port
TX PORT - Specifies the packets transmit port
RXCORE - Specifies the Core used for Packet reception and Classification stage of the QoS application.
WTCORE - Specifies the Core used for Packet enqueue/dequeue operation (QoS scheduling) and subsequently transmitting the packets out.

Multiple pfc can be specified depending upon the number of instances of qos sched required in application. For example- in order to run two instance, following can be used-

./build/qos_sched -c 0x7e -n 4 -- --pfc "0,1,2,3,4" --pfc "2,3,5,6" --cfg "profile.cfg"

First instance of qos sched receives packets from port 0 and transmits its packets through port 1, while second qos sched will receives packets from port 2 and transmit through port 3. In case of single qos sched instance, following can be used-

./build/qos_sched -c 0x1e -n 4 -- --pfc "0,1,2,3,4" --cfg "profile.cfg"

Thanks,
Jasvinder

The information contained in this electronic message and any attachments to this message are intended for the exclusive use of the addressee(s) and may contain proprietary, confidential or privileged information. If you are not the intended recipient, you should not disseminate, distribute or copy this e-mail. Please notify the sender immediately and destroy all copies of this message and any attachments. WARNING: Computer viruses can be transmitted via email. The recipient should check this email and any attachments for the presence of viruses. The company accepts no liability for any damage caused by any virus transmitted by this email. www.wipro.com
[dpdk-dev] [PATCH] virtio: fix rx ring descriptor starvation
On 12/17/2015 7:18 PM, Tom Kiely wrote:
>
>
> On 11/25/2015 05:32 PM, Xie, Huawei wrote:
>> On 11/13/2015 5:33 PM, Tom Kiely wrote:
>>> If all rx descriptors are processed while transient
>>> mbuf exhaustion is present, the rx ring ends up with
>>> no available descriptors. Thus no packets are received
>>> on that ring. Since descriptor refill is performed post
>>> rx descriptor processing, in this case no refill is
>>> ever subsequently performed resulting in permanent rx
>>> traffic drop.
>>>
>>> Signed-off-by: Tom Kiely
>>> ---
>>>  drivers/net/virtio/virtio_rxtx.c |6 --
>>>  1 file changed, 4 insertions(+), 2 deletions(-)
>>>
>>> diff --git a/drivers/net/virtio/virtio_rxtx.c
>>> b/drivers/net/virtio/virtio_rxtx.c
>>> index 5770fa2..a95e234 100644
>>> --- a/drivers/net/virtio/virtio_rxtx.c
>>> +++ b/drivers/net/virtio/virtio_rxtx.c
>>> @@ -586,7 +586,8 @@ virtio_recv_pkts(void *rx_queue, struct rte_mbuf
>>> **rx_pkts, uint16_t nb_pkts)
>>>  	if (likely(num > DESC_PER_CACHELINE))
>>>  		num = num - ((rxvq->vq_used_cons_idx + num) %
>>> DESC_PER_CACHELINE);
>>> -	if (num == 0)
>>> +	/* Refill free descriptors even if no pkts recvd */
>>> +	if (num == 0 && virtqueue_full(rxvq))
>> Should the return condition be that no used buffers and we have avail
>> descs in avail ring, i.e,
>> num == 0 && rxvq->vq_free_cnt != rxvq->vq_nentries
>>
>> rather than
>> num == 0 && rxvq->vq_free_cnt == 0
> Yes we could do that but I don't see a good reason to wait until the
> vq_free_cnt == vq_nentries
> before attempting the refill. The existing code will attempt refill
> even if only 1 packet was received
> and the free count is small. To me it seems safer to extend that to
> try refill even if no packet was received
> but the free count is non-zero.

The existing code attempts to refill only if at least 1 packet was
received.

If we want to refill even if no packet was received, then the strict
condition should be
    num == 0 && rxvq->vq_free_cnt != rxvq->vq_nentries

The safer condition, which is what you want to use, should be
    num == 0 && !virtqueue_full(...)
rather than
    num == 0 && virtqueue_full(...)

We could simplify things a bit and just remove this check, if the
receiving code that follows already takes care of the "num == 0"
condition.

I find the name virtqueue_full confusing; maybe we could change it to
some other, more meaningful name.

>
>    Tom
>
>>>  		return 0;
>>>  	num = virtqueue_dequeue_burst_rx(rxvq, rcv_pkts, len, num);
>>> @@ -683,7 +684,8 @@ virtio_recv_mergeable_pkts(void *rx_queue,
>>>  	virtio_rmb();
>>> -	if (nb_used == 0)
>>> +	/* Refill free descriptors even if no pkts recvd */
>>> +	if (nb_used == 0 && virtqueue_full(rxvq))
>>>  		return 0;
>>>  	PMD_RX_LOG(DEBUG, "used:%d\n", nb_used);
>
>
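Since three different early-return conditions are being compared in this thread, it may help to evaluate them side by side on concrete ring states. The sketch below is only a model: `struct demo_vq` is a hypothetical two-field view of a virtqueue, and it assumes, as the thread's `vq_free_cnt == 0` equivalence suggests, that "full" means no free descriptors are left and that `vq_free_cnt` counts descriptors not currently posted to the ring.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, pared-down virtqueue: only the counters under discussion. */
struct demo_vq {
	uint16_t vq_nentries; /* ring size */
	uint16_t vq_free_cnt; /* descriptors not currently posted (assumed meaning) */
};

/* "Full" as used in the thread: no free descriptors remain. */
static bool
demo_virtqueue_full(const struct demo_vq *vq)
{
	assert(vq != NULL);
	return vq->vq_free_cnt == 0;
}

/* Early-return condition as posted in the patch. */
static bool
skip_as_posted(const struct demo_vq *vq, uint16_t num)
{
	return num == 0 && demo_virtqueue_full(vq);
}

/* Early-return condition called "strict" in the review. */
static bool
skip_strict(const struct demo_vq *vq, uint16_t num)
{
	return num == 0 && vq->vq_free_cnt != vq->vq_nentries;
}

/* Early-return condition with the negated virtqueue_full() test. */
static bool
skip_negated(const struct demo_vq *vq, uint16_t num)
{
	return num == 0 && !demo_virtqueue_full(vq);
}
```

Under these assumptions, for the starved ring from the commit message (every descriptor free, nothing used), the posted and strict conditions both fall through to the refill path, while the negated form returns early. For a fully posted ring the posted and strict conditions return early and the negated form does not.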
[dpdk-dev] [PATCH] vhost: remove lockless enqueue to the virtio ring
On 1/5/2016 2:42 PM, Xie, Huawei wrote:
> This patch removes the internal lockless enqueue implementation.
> DPDK doesn't support receiving/transmitting packets from/to the same
> queue. Vhost PMD wraps vhost device as normal DPDK port. DPDK
> applications normally have their own lock implementation when enqueue
> packets to the same queue of a port.
>
> The atomic cmpset is a costly operation. This patch should help
> performance a bit.
>
> Signed-off-by: Huawei Xie

This patch modifies the API's behavior, which is also a trivial ABI
change. In my opinion, applications shouldn't rely on the previous
behavior. Anyway, I am checking how to declare the ABI change.
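For context on why the cmpset is described as costly: a lockless enqueue typically reserves ring slots with a compare-and-set retry loop, whereas a single-producer path only needs a plain add. The sketch below contrasts the two using C11 atomics. The structures and function names are hypothetical and are not the vhost library's code; they only illustrate the general technique.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical ring with a shared reservation index (multi-producer). */
struct demo_ring_mp {
	_Atomic uint16_t res_idx; /* first slot not yet reserved */
};

/* Hypothetical ring whose caller guarantees one producer at a time. */
struct demo_ring_sp {
	uint16_t res_idx;
};

/*
 * Lockless reservation: retry the compare-and-set until this caller owns
 * the slots [start, start + count). On failure, 'start' is refreshed with
 * the value another producer installed.
 */
static uint16_t
demo_reserve_lockless(struct demo_ring_mp *r, uint16_t count)
{
	uint16_t start;

	assert(r != NULL);
	start = atomic_load(&r->res_idx);
	while (!atomic_compare_exchange_weak(&r->res_idx, &start,
					     (uint16_t)(start + count)))
		;
	return start;
}

/*
 * Single-producer reservation: a plain read and write, correct only if the
 * application serialises access to the queue (for example with its own lock).
 */
static uint16_t
demo_reserve_single(struct demo_ring_sp *r, uint16_t count)
{
	uint16_t start;

	assert(r != NULL);
	start = r->res_idx;
	r->res_idx = (uint16_t)(start + count);
	return start;
}
```

The second variant avoids the atomic read-modify-write entirely, which is the saving the commit message refers to. The trade-off is that concurrent callers on the same queue are no longer safe, hence the behavior change noted above.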
[dpdk-dev] [PATCH] pmd/virtio: fix cannot start virtio dev after stop
Fix the issue that virtio device cannot be started after stopped.

The field, hw->started, should be changed by virtio_dev_start/stop instead
of virtio_dev_close.

Signed-off-by: Jianfeng Tan
---
 drivers/net/virtio/virtio_ethdev.c | 13 -
 1 file changed, 8 insertions(+), 5 deletions(-)

diff --git a/drivers/net/virtio/virtio_ethdev.c b/drivers/net/virtio/virtio_ethdev.c
index d928339..07fe271 100644
--- a/drivers/net/virtio/virtio_ethdev.c
+++ b/drivers/net/virtio/virtio_ethdev.c
@@ -490,11 +490,13 @@ virtio_dev_close(struct rte_eth_dev *dev)

 	PMD_INIT_LOG(DEBUG, "virtio_dev_close");

+	if (hw->started == 1)
+		virtio_dev_stop(eth_dev);
+
 	/* reset the NIC */
 	if (pci_dev->driver->drv_flags & RTE_PCI_DRV_INTR_LSC)
 		vtpci_irq_config(hw, VIRTIO_MSI_NO_VECTOR);
 	vtpci_reset(hw);
-	hw->started = 0;
 	virtio_dev_free_mbufs(dev);
 	virtio_free_queues(dev);
 }
@@ -1408,10 +1410,9 @@ eth_virtio_dev_uninit(struct rte_eth_dev *eth_dev)
 	if (rte_eal_process_type() == RTE_PROC_SECONDARY)
 		return -EPERM;

-	if (hw->started == 1) {
-		virtio_dev_stop(eth_dev);
-		virtio_dev_close(eth_dev);
-	}
+	/* Close it anyway since there's no way to know if closed */
+	virtio_dev_close(eth_dev);
+
 	pci_dev = eth_dev->pci_dev;

 	eth_dev->dev_ops = NULL;
@@ -1615,6 +1616,8 @@ virtio_dev_stop(struct rte_eth_dev *dev)

 	PMD_INIT_LOG(DEBUG, "stop");

+	hw->started = 0;
+
 	if (dev->data->dev_conf.intr_conf.lsc)
 		rte_intr_disable(&dev->pci_dev->intr_handle);

--
2.1.4
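The start/stop/close ordering that this patch establishes can be modelled with a tiny state machine. This is a sketch under the assumption that the start path is skipped while the flag is still set; `struct demo_hw` and the `demo_dev_*` functions are hypothetical and only mirror the flag handling, not the real driver work.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical device state: just the flag and a counter for observation. */
struct demo_hw {
	bool started;
	int start_count; /* how many times the start path really ran */
};

/* Start is skipped while the flag is set (assumed guard). */
static void
demo_dev_start(struct demo_hw *hw)
{
	assert(hw != NULL);
	if (hw->started)
		return;
	hw->started = true;
	hw->start_count++;
}

/* Stop clears the flag so that a later start is not skipped. */
static void
demo_dev_stop(struct demo_hw *hw)
{
	assert(hw != NULL);
	hw->started = false;
}

/* Close stops a running device first; reset and frees would follow. */
static void
demo_dev_close(struct demo_hw *hw)
{
	assert(hw != NULL);
	if (hw->started)
		demo_dev_stop(hw);
}
```

With the flag cleared in stop rather than in close, the sequence start, stop, start runs the start path twice, which is the behavior the commit message asks for.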
[dpdk-dev] [PATCH v2 0/5] virtio: Tx performance improvements
On 10/26/2015 10:06 PM, Xie, Huawei wrote:
> On 10/19/2015 1:16 PM, Stephen Hemminger wrote:
>> This is a tested version of the virtio Tx performance improvements
>> that I posted earlier on the list, and described at the DPDK Userspace
>> meeting in Dublin. Together they get a 25% performance improvement for
>> both small packet and large multi-segment packet case when testing
>> from DPDK guest application to Linux KVM host.
>>
>> Stephen Hemminger (5):
>>   virtio: clean up space checks on xmit
>>   virtio: don't use unlikely for normal tx stuff
>>   virtio: use indirect ring elements
>>   virtio: use any layout on transmit
>>   virtio: optimize transmit enqueue
> There is one open question: why the merge-able header is used in the tx
> path. Since the old implementation also uses the merge-able header in
> the tx path if this feature is negotiated, I choose to ack the patch and
> address this later if not now.
>
> Acked-by: Huawei Xie

Thomas:
This patch isn't in the patchwork. Does Stephen need to send a new one?
[dpdk-dev] [PATCH v2 0/1] change hugepage sorting to avoid overlapping memcpy
Hi,

I want to catch up with the patch about the overlapping memory
areas/hugepage sorting. I have incorporated the qsort patch from Jay
and made the suggested changes. So this fixes both the valgrind warning
about the overlapping memcpy and possible performance problems due to
the bubblesort.

Best Regards,

Ralf

---

Ralf Hoffmann (1):
  change hugepage sorting to avoid overlapping memcpy

 lib/librte_eal/linuxapp/eal/eal_memory.c | 60
 1 file changed, 14 insertions(+), 46 deletions(-)

--
2.5.0
[dpdk-dev] [PATCH v2 1/1] change hugepage sorting to avoid overlapping memcpy
with only one hugepage or already sorted hugepage addresses, the sort function called memcpy with same src and dst pointer. Debugging with valgrind will issue a warning about overlapping area. This patch changes the sort method to qsort to avoid this behavior, according to original patch from Jay Rolette . The separate sort function is no longer necessary. Signed-off-by: Ralf Hoffmann --- v2: * incorporate patch from http://dpdk.org/dev/patchwork/patch/2061/ to use qsort instead of bubble sort, original patch by Jay Rolette lib/librte_eal/linuxapp/eal/eal_memory.c | 60 1 file changed, 14 insertions(+), 46 deletions(-) diff --git a/lib/librte_eal/linuxapp/eal/eal_memory.c b/lib/librte_eal/linuxapp/eal/eal_memory.c index 846fd31..a96d10a 100644 --- a/lib/librte_eal/linuxapp/eal/eal_memory.c +++ b/lib/librte_eal/linuxapp/eal/eal_memory.c @@ -701,54 +701,23 @@ error: return -1; } -/* - * Sort the hugepg_tbl by physical address (lower addresses first on x86, - * higher address first on powerpc). We use a slow algorithm, but we won't - * have millions of pages, and this is only done at init time. 
- */ static int -sort_by_physaddr(struct hugepage_file *hugepg_tbl, struct hugepage_info *hpi) +cmp_physaddr(const void *a, const void *b) { - unsigned i, j; - int compare_idx; - uint64_t compare_addr; - struct hugepage_file tmp; - - for (i = 0; i < hpi->num_pages[0]; i++) { - compare_addr = 0; - compare_idx = -1; - - /* -* browse all entries starting at 'i', and find the -* entry with the smallest addr -*/ - for (j=i; j< hpi->num_pages[0]; j++) { - - if (compare_addr == 0 || -#ifdef RTE_ARCH_PPC_64 - hugepg_tbl[j].physaddr > compare_addr) { +#ifndef RTE_ARCH_PPC_64 + const struct hugepage_file *p1 = (const struct hugepage_file *)a; + const struct hugepage_file *p2 = (const struct hugepage_file *)b; #else - hugepg_tbl[j].physaddr < compare_addr) { + /* PowerPC needs memory sorted in reverse order from x86 */ + const struct hugepage_file *p1 = (const struct hugepage_file *)b; + const struct hugepage_file *p2 = (const struct hugepage_file *)a; #endif - compare_addr = hugepg_tbl[j].physaddr; - compare_idx = j; - } - } - - /* should not happen */ - if (compare_idx == -1) { - RTE_LOG(ERR, EAL, "%s(): error in physaddr sorting\n", __func__); - return -1; - } - - /* swap the 2 entries in the table */ - memcpy(&tmp, &hugepg_tbl[compare_idx], - sizeof(struct hugepage_file)); - memcpy(&hugepg_tbl[compare_idx], &hugepg_tbl[i], - sizeof(struct hugepage_file)); - memcpy(&hugepg_tbl[i], &tmp, sizeof(struct hugepage_file)); - } - return 0; + if (p1->physaddr < p2->physaddr) + return -1; + else if (p1->physaddr > p2->physaddr) + return 1; + else + return 0; } /* @@ -1195,8 +1164,7 @@ rte_eal_hugepage_init(void) goto fail; } - if (sort_by_physaddr(&tmp_hp[hp_offset], hpi) < 0) - goto fail; + qsort(&tmp_hp[hp_offset], hpi->num_pages[0], sizeof(struct hugepage_file), cmp_physaddr); #ifdef RTE_EAL_SINGLE_FILE_SEGMENTS /* remap all hugepages into single file segments */ -- 2.5.0
[dpdk-dev] [PATCH v3 0/1] eal/linux: change hugepage sorting to avoid overlapping memcpy
Hi again,

I forgot to correctly set the commit title, so this is v3.

Best Regards,

Ralf

---

Ralf Hoffmann (1):
  change hugepage sorting to avoid overlapping memcpy

 lib/librte_eal/linuxapp/eal/eal_memory.c | 60
 1 file changed, 14 insertions(+), 46 deletions(-)

--
2.5.0
[dpdk-dev] [PATCH v3 1/1] eal/linux: change hugepage sorting to avoid overlapping memcpy
with only one hugepage or already sorted hugepage addresses, the sort function called memcpy with same src and dst pointer. Debugging with valgrind will issue a warning about overlapping area. This patch changes the sort method to qsort to avoid this behavior, according to original patch from Jay Rolette . The separate sort function is no longer necessary. Signed-off-by: Ralf Hoffmann --- v3: * set commit title to eal/linux v2: * incorporate patch from http://dpdk.org/dev/patchwork/patch/2061/ to use qsort instead of bubble sort, original patch by Jay Rolette lib/librte_eal/linuxapp/eal/eal_memory.c | 60 1 file changed, 14 insertions(+), 46 deletions(-) diff --git a/lib/librte_eal/linuxapp/eal/eal_memory.c b/lib/librte_eal/linuxapp/eal/eal_memory.c index 846fd31..a96d10a 100644 --- a/lib/librte_eal/linuxapp/eal/eal_memory.c +++ b/lib/librte_eal/linuxapp/eal/eal_memory.c @@ -701,54 +701,23 @@ error: return -1; } -/* - * Sort the hugepg_tbl by physical address (lower addresses first on x86, - * higher address first on powerpc). We use a slow algorithm, but we won't - * have millions of pages, and this is only done at init time. 
- */ static int -sort_by_physaddr(struct hugepage_file *hugepg_tbl, struct hugepage_info *hpi) +cmp_physaddr(const void *a, const void *b) { - unsigned i, j; - int compare_idx; - uint64_t compare_addr; - struct hugepage_file tmp; - - for (i = 0; i < hpi->num_pages[0]; i++) { - compare_addr = 0; - compare_idx = -1; - - /* -* browse all entries starting at 'i', and find the -* entry with the smallest addr -*/ - for (j=i; j< hpi->num_pages[0]; j++) { - - if (compare_addr == 0 || -#ifdef RTE_ARCH_PPC_64 - hugepg_tbl[j].physaddr > compare_addr) { +#ifndef RTE_ARCH_PPC_64 + const struct hugepage_file *p1 = (const struct hugepage_file *)a; + const struct hugepage_file *p2 = (const struct hugepage_file *)b; #else - hugepg_tbl[j].physaddr < compare_addr) { + /* PowerPC needs memory sorted in reverse order from x86 */ + const struct hugepage_file *p1 = (const struct hugepage_file *)b; + const struct hugepage_file *p2 = (const struct hugepage_file *)a; #endif - compare_addr = hugepg_tbl[j].physaddr; - compare_idx = j; - } - } - - /* should not happen */ - if (compare_idx == -1) { - RTE_LOG(ERR, EAL, "%s(): error in physaddr sorting\n", __func__); - return -1; - } - - /* swap the 2 entries in the table */ - memcpy(&tmp, &hugepg_tbl[compare_idx], - sizeof(struct hugepage_file)); - memcpy(&hugepg_tbl[compare_idx], &hugepg_tbl[i], - sizeof(struct hugepage_file)); - memcpy(&hugepg_tbl[i], &tmp, sizeof(struct hugepage_file)); - } - return 0; + if (p1->physaddr < p2->physaddr) + return -1; + else if (p1->physaddr > p2->physaddr) + return 1; + else + return 0; } /* @@ -1195,8 +1164,7 @@ rte_eal_hugepage_init(void) goto fail; } - if (sort_by_physaddr(&tmp_hp[hp_offset], hpi) < 0) - goto fail; + qsort(&tmp_hp[hp_offset], hpi->num_pages[0], sizeof(struct hugepage_file), cmp_physaddr); #ifdef RTE_EAL_SINGLE_FILE_SEGMENTS /* remap all hugepages into single file segments */ -- 2.5.0
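As a standalone illustration of the comparator approach used in this patch, the sketch below sorts a small table by physical address with `qsort()`. `struct demo_hugepage` is a hypothetical stand-in for `struct hugepage_file`, and only the ascending (non-PowerPC) order is shown.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical stand-in for struct hugepage_file. */
struct demo_hugepage {
	uint64_t physaddr;
	int file_id;
};

/*
 * qsort() comparator: ascending physical address. Explicit comparisons are
 * used instead of a subtraction because the 64-bit difference would not
 * fit in the int return value.
 */
static int
demo_cmp_physaddr(const void *a, const void *b)
{
	const struct demo_hugepage *p1 = a;
	const struct demo_hugepage *p2 = b;

	if (p1->physaddr < p2->physaddr)
		return -1;
	if (p1->physaddr > p2->physaddr)
		return 1;
	return 0;
}

/* Sort the table in place; a table with zero or one entry is left as is. */
static void
demo_sort_hugepages(struct demo_hugepage *tbl, size_t n)
{
	assert(tbl != NULL || n == 0);
	if (n > 1)
		qsort(tbl, n, sizeof(tbl[0]), demo_cmp_physaddr);
}
```

Unlike the hand-written swap loop that was removed, this leaves the choice of how elements are moved to the C library, so the caller never issues a `memcpy()` with identical source and destination.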
[dpdk-dev] Traffic scheduling in DPDK
Hi Uday,

> Thanks Jasvinder , I am running the below command
>
> ./build/qos_sched -c 0xe -n 1 -- --pfc "0,1,3,2" --cfg ./profile.cfg
>
> Bound two 1G physical ports to DPDK , and started running the above
> command with the default profile mentioned in profile.cfg .
> I am using lcore 3 and 2 for RX and TX. It was not successful, getting the
> below error.
>
> APP: Initializing port 0... PMD: eth_igb_rx_queue_setup():
> sw_ring=0x7f5b20ba2240 hw_ring=0x7f5b20ba2680 dma_addr=0xbf87a2680
> PMD: eth_igb_tx_queue_setup(): To improve 1G driver performance,
> consider setting the TX WTHRESH value to 4, 8, or 16.
> PMD: eth_igb_tx_queue_setup(): sw_ring=0x7f5b20b910c0
> hw_ring=0x7f5b20b92100 dma_addr=0xbf8792100
> PMD: eth_igb_start(): <<
> done: Link Up - speed 1000 Mbps - full-duplex
> APP: Initializing port 1... PMD: eth_igb_rx_queue_setup():
> sw_ring=0x7f5b20b80a40 hw_ring=0x7f5b20b80e80 dma_addr=0xbf8780e80
> PMD: eth_igb_tx_queue_setup(): To improve 1G driver performance,
> consider setting the TX WTHRESH value to 4, 8, or 16.
> PMD: eth_igb_tx_queue_setup(): sw_ring=0x7f5b20b6f8c0
> hw_ring=0x7f5b20b70900 dma_addr=0xbf8770900
> PMD: eth_igb_start(): <<
> done: Link Up - speed 1000 Mbps - full-duplex
> SCHED: Low level config for pipe profile 0:
> Token bucket: period = 3277, credits per period = 8, size = 100
> Traffic classes: period = 500, credits per period = [12207, 12207,
> 12207,
> 12207]
> Traffic class 3 oversubscription: weight = 0
> WRR cost: [1, 1, 1, 1], [1, 1, 1, 1], [1, 1, 1, 1], [1, 1, 1, 1]
> EAL: Error - exiting with code: 1
> Cause: Unable to config sched subport 0, err=-2

The default profile.cfg assumes that all NIC ports run at 10 Gbps. The
above error occurs when the subport's tb_rate (10 Gbps) exceeds the NIC
port's capacity (1 Gbps). Therefore, you need to either use 10 Gbps ports
in your application or amend profile.cfg to work with 1 Gbps ports.

Please refer to the DPDK QoS framework document for more details on the
various parameters:
http://dpdk.org/doc/guides/prog_guide/qos_framework.html

> -----Original Message-----
> From: Singh, Jasvinder [mailto:jasvinder.singh at intel.com]
> Sent: Monday, January 04, 2016 9:26 PM
> To: Ravulakollu Udaya Kumar (WT01 - Product Engineering Service);
> dev at dpdk.org
> Subject: RE: [dpdk-dev] Traffic scheduling in DPDK
>
> Hi Uday,
>
> > I have an issue in running qos_sched application in DPDK .Could
> > someone tell me how to run the command and what each parameter does
> > In the below mentioned text.
> >
> > Application mandatory parameters:
> > --pfc "RX PORT, TX PORT, RX LCORE, WT LCORE" : Packet flow configuration
> > multiple pfc can be configured in command line
>
> RX PORT - Specifies the packets receive port
> TX PORT - Specifies the packets transmit port
> RXCORE - Specifies the Core used for Packet reception and
> Classification stage of the QoS application.
> WTCORE - Specifies the Core used for Packet enqueue/dequeue operation
> (QoS scheduling) and subsequently transmitting the packets out.
>
> Multiple pfc can be specified depending upon the number of instances of
> qos sched required in application. For example, in order to run two
> instances, the following can be used:
>
> ./build/qos_sched -c 0x7e -n 4 -- --pfc "0,1,2,3,4" --pfc "2,3,5,6" --cfg "profile.cfg"
>
> The first instance of qos sched receives packets from port 0 and transmits
> its packets through port 1, while the second qos sched receives packets
> from port 2 and transmits through port 3. In case of a single qos sched
> instance, the following can be used:
>
> ./build/qos_sched -c 0x1e -n 4 -- --pfc "0,1,2,3,4" --cfg "profile.cfg"
>
> Thanks,
> Jasvinder
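
The rate mismatch described in the reply above can be sketched as a simple check. This is an illustration only: check_subport_tb_rate() is a hypothetical helper, not the rte_sched code, it assumes rates are expressed in bytes per second, and the -2 return value is chosen only to mirror the err=-2 in the quoted log.

```c
#include <assert.h>
#include <stdint.h>

/* Line rates converted to bytes per second (assumed unit). */
#define RATE_1G_BYTES   125000000u   /* 1 Gbps / 8 */
#define RATE_10G_BYTES  1250000000u  /* 10 Gbps / 8 */

/*
 * Hypothetical validation: a subport token bucket rate must be non-zero
 * and must not exceed the rate of the port it belongs to.
 * Returns 0 when the rate is acceptable, -2 otherwise.
 */
static int
check_subport_tb_rate(uint32_t port_rate, uint32_t subport_tb_rate)
{
	assert(port_rate > 0);

	if (subport_tb_rate == 0 || subport_tb_rate > port_rate)
		return -2;
	return 0;
}
```

Under these assumptions a 10 Gbps subport rate on a 1 Gbps port is rejected, while lowering the subport rate to the port rate (or below) passes the check.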
[dpdk-dev] [PATCH] mk: fix examples build failure
Hi Michael:

The examples makefile seems to be broken. It is easy to reproduce on the
master branch; below is the output on Ubuntu 14.04 amd64:

~/work/dpdk$ export RTE_SDK=/home/steeven/work/dpdk
~/work/dpdk$ cd /home/steeven/work/dpdk/examples/helloworld/
~/work/dpdk/examples/helloworld$ export RTE_TARGET=x86_64-native-linuxapp-gcc
~/work/dpdk/examples/helloworld$ make
/home/steeven/work/dpdk/mk/internal/rte.extvars.mk:57: *** Cannot find .config in /home/xueming/work/dpdk. Stop.
~/work/dpdk/examples/helloworld$ cd ../cmdline/
~/work/dpdk/examples/cmdline$ make
/home/steeven/work/dpdk/mk/internal/rte.extvars.mk:57: *** Cannot find .config in /home/xueming/work/dpdk. Stop.

Thanks,
Steeven

On Mon, Dec 28, 2015 at 12:20 PM, Qiu, Michael wrote:
> On 12/24/2015 8:38 PM, steeven lee wrote:
>> 1. Fix examples build failure
>> 2. make build as default output folder name
>>
>> Signed-off-by: steeven
>> ---
>> mk/internal/rte.extvars.mk | 4 ++--
>> mk/rte.extsubdir.mk        | 2 +-
>> 2 files changed, 3 insertions(+), 3 deletions(-)
>>
>> diff --git a/mk/internal/rte.extvars.mk b/mk/internal/rte.extvars.mk
>> index 040d39f..cabef0a 100644
>> --- a/mk/internal/rte.extvars.mk
>> +++ b/mk/internal/rte.extvars.mk
>> @@ -52,9 +52,9 @@ RTE_EXTMK ?= $(RTE_SRCDIR)/Makefile
>> export RTE_EXTMK
>>
>> # RTE_SDK_BIN must point to .config, include/ and lib/.
>> -RTE_SDK_BIN := $(RTE_SDK)/$(RTE_TARGET)
>> +RTE_SDK_BIN := $(RTE_SDK)/build
>> ifeq ($(wildcard $(RTE_SDK_BIN)/.config),)
>> -$(error Cannot find .config in $(RTE_SDK))
>> +$(error Cannot find .config in $(RTE_SDK_BIN))
>> endif
>>
>> #
>> diff --git a/mk/rte.extsubdir.mk b/mk/rte.extsubdir.mk
>> index f50f006..819020a 100644
>> --- a/mk/rte.extsubdir.mk
>> +++ b/mk/rte.extsubdir.mk
>> @@ -46,7 +46,7 @@ $(DIRS-y):
>> @echo "== $@"
>> $(Q)$(MAKE) -C $(@) \
>> M=$(CURDIR)/$(@)/Makefile \
>> - O=$(BASE_OUTPUT)/$(CUR_SUBDIR)/$(@)/$(RTE_TARGET) \
>> + O=$(BASE_OUTPUT)/$(CUR_SUBDIR)/build \
>> BASE_OUTPUT=$(BASE_OUTPUT) \
>> CUR_SUBDIR=$(CUR_SUBDIR)/$(@) \
>> S=$(CURDIR)/$(@) \
>
> Could you show your compile error log? And how to reproduce it?
>
> Thanks,
> Michael
[dpdk-dev] [PATCH] fix checkpatch errors
> -----Original Message-----
> From: Tan, Jianfeng
> Sent: Tuesday, January 5, 2016 2:21 AM
> To: Xie, Huawei; dev at dpdk.org
> Cc: Mcnamara, John; Stephen Hemminger; Yuanhan Liu
> Subject: RE: [PATCH] fix checkpatch errors
>
> > -----Original Message-----
> > From: Xie, Huawei
> > Sent: Monday, January 4, 2016 9:52 AM
> > To: dev at dpdk.org
> > Cc: Mcnamara, John; Tan, Jianfeng; Xie, Huawei
> > Subject: [PATCH] fix checkpatch errors
> >
> > Signed-off-by: Huawei Xie
> ...
> > mbuf_poolname_build(sock_id, pool_name, sizeof(pool_name));
> > - return (rte_mempool_lookup((const char *)pool_name));
> > + return rte_mempool_lookup((const char *)pool_name);
>
> Hi Huawei,
>
> Assume this patch is to solve below error (reported by checkpatch):
> ERROR: return is not a function, parentheses are not required
>
> So maybe above fix is not necessary? Involve more people to discuss.
>
> And please include the error message in the commit message.

Hi Huawei,

The fix looks good and there was a similar patch applied previously for lib
(from Ferruh):

6307b909b8e0 ("lib: remove extra parenthesis after return")

However, the commit message could be better. Maybe something like the
above: "remove extra parentheses".

John
--
[dpdk-dev] [PATCH v2] mbuf: optimize rte_mbuf_refcnt_update
Hi Hanoch,

On 01/04/2016 03:43 PM, Hanoch Haim (hhaim) wrote:
> Hi Oliver,
>
> Let's take your drawing as a reference and add my question
> The use case is sending a duplicate multicast packet by many threads.
> I can split it to x threads to do the job and with atomic-ref (my multicast
> not mbuf) count it until it reaches zero.
>
> In my following example the two cores (0 and 1) sending the indirect m1/m2 do
> alloc/attach/send
>
>  core0                             | core1
>  ----------------------------------|----------------------------------
>  m_const=rte_pktmbuf_alloc(mp)     |
>                                    |
>  while true:                       | while True:
>    m1 =rte_pktmbuf_alloc(mp_64)    |   m2 =rte_pktmbuf_alloc(mp_64)
>    rte_pktmbuf_attach(m1, m_const) |   rte_pktmbuf_attach(m1, m_const)
>    tx_burst(m1)                    |   tx_burst(m2)
>
> Is this example is not valid?

For me, m_const is not expected to be used concurrently on several cores.
By "used", I mean calling a function that modifies the mbuf, which is the
case for rte_pktmbuf_attach().

> BTW this is our workaround
>
>  core0                             | core1
>  ----------------------------------|----------------------------------
>  m_const=rte_pktmbuf_alloc(mp)     |
>  rte_mbuf_refcnt_update(m_const,1) | <<-- workaround
>                                    |
>  while true:                       | while True:
>    m1 =rte_pktmbuf_alloc(mp_64)    |   m2 =rte_pktmbuf_alloc(mp_64)
>    rte_pktmbuf_attach(m1, m_const) |   rte_pktmbuf_attach(m1, m_const)
>    tx_burst(m1)                    |   tx_burst(m2)

This workaround indeed solves the issue. Another solution would be to
protect the call to attach() with a lock, or call all the
rte_pktmbuf_attach() on the same core.

I'm open to discuss this behavior for rte_pktmbuf_attach() function (should
concurrent calls be allowed or not). In any case, we may want to better
document it in the doxygen API comments.

Regards,
Olivier
[dpdk-dev] [PATCH v2 1/3] librte_ether: remove RTE_PROC_PRIMARY_OR_ERR_RET and RTE_PROC_PRIMARY_OR_RET
Macros RTE_PROC_PRIMARY_OR_ERR_RET and RTE_PROC_PRIMARY_OR_RET are blocking the secondary process from using the APIs. API access should be given to both secondary and primary. Reported-by: Sean Harte Signed-off-by: Reshma Pattan --- v2: * Removed checkpatch fixes of lib/librte_ether/rte_ethdev.h from this patch. lib/librte_ether/rte_ethdev.c | 50 + 1 files changed, 1 insertions(+), 49 deletions(-) diff --git a/lib/librte_ether/rte_ethdev.c b/lib/librte_ether/rte_ethdev.c index ed971b4..5849102 100644 --- a/lib/librte_ether/rte_ethdev.c +++ b/lib/librte_ether/rte_ethdev.c @@ -711,10 +711,6 @@ rte_eth_dev_rx_queue_start(uint8_t port_id, uint16_t rx_queue_id) { struct rte_eth_dev *dev; - /* This function is only safe when called from the primary process -* in a multi-process setup*/ - RTE_PROC_PRIMARY_OR_ERR_RET(-E_RTE_SECONDARY); - RTE_ETH_VALID_PORTID_OR_ERR_RET(port_id, -EINVAL); dev = &rte_eth_devices[port_id]; @@ -741,10 +737,6 @@ rte_eth_dev_rx_queue_stop(uint8_t port_id, uint16_t rx_queue_id) { struct rte_eth_dev *dev; - /* This function is only safe when called from the primary process -* in a multi-process setup*/ - RTE_PROC_PRIMARY_OR_ERR_RET(-E_RTE_SECONDARY); - RTE_ETH_VALID_PORTID_OR_ERR_RET(port_id, -EINVAL); dev = &rte_eth_devices[port_id]; @@ -771,10 +763,6 @@ rte_eth_dev_tx_queue_start(uint8_t port_id, uint16_t tx_queue_id) { struct rte_eth_dev *dev; - /* This function is only safe when called from the primary process -* in a multi-process setup*/ - RTE_PROC_PRIMARY_OR_ERR_RET(-E_RTE_SECONDARY); - RTE_ETH_VALID_PORTID_OR_ERR_RET(port_id, -EINVAL); dev = &rte_eth_devices[port_id]; @@ -801,10 +789,6 @@ rte_eth_dev_tx_queue_stop(uint8_t port_id, uint16_t tx_queue_id) { struct rte_eth_dev *dev; - /* This function is only safe when called from the primary process -* in a multi-process setup*/ - RTE_PROC_PRIMARY_OR_ERR_RET(-E_RTE_SECONDARY); - RTE_ETH_VALID_PORTID_OR_ERR_RET(port_id, -EINVAL); dev = &rte_eth_devices[port_id]; @@ -874,10 +858,6 @@ 
rte_eth_dev_configure(uint8_t port_id, uint16_t nb_rx_q, uint16_t nb_tx_q, struct rte_eth_dev_info dev_info; int diag; - /* This function is only safe when called from the primary process -* in a multi-process setup*/ - RTE_PROC_PRIMARY_OR_ERR_RET(-E_RTE_SECONDARY); - RTE_ETH_VALID_PORTID_OR_ERR_RET(port_id, -EINVAL); if (nb_rx_q > RTE_MAX_QUEUES_PER_PORT) { @@ -1059,10 +1039,6 @@ rte_eth_dev_start(uint8_t port_id) struct rte_eth_dev *dev; int diag; - /* This function is only safe when called from the primary process -* in a multi-process setup*/ - RTE_PROC_PRIMARY_OR_ERR_RET(-E_RTE_SECONDARY); - RTE_ETH_VALID_PORTID_OR_ERR_RET(port_id, -EINVAL); dev = &rte_eth_devices[port_id]; @@ -1096,10 +1072,6 @@ rte_eth_dev_stop(uint8_t port_id) { struct rte_eth_dev *dev; - /* This function is only safe when called from the primary process -* in a multi-process setup*/ - RTE_PROC_PRIMARY_OR_RET(); - RTE_ETH_VALID_PORTID_OR_RET(port_id); dev = &rte_eth_devices[port_id]; @@ -1121,10 +1093,6 @@ rte_eth_dev_set_link_up(uint8_t port_id) { struct rte_eth_dev *dev; - /* This function is only safe when called from the primary process -* in a multi-process setup*/ - RTE_PROC_PRIMARY_OR_ERR_RET(-E_RTE_SECONDARY); - RTE_ETH_VALID_PORTID_OR_ERR_RET(port_id, -EINVAL); dev = &rte_eth_devices[port_id]; @@ -1138,10 +1106,6 @@ rte_eth_dev_set_link_down(uint8_t port_id) { struct rte_eth_dev *dev; - /* This function is only safe when called from the primary process -* in a multi-process setup*/ - RTE_PROC_PRIMARY_OR_ERR_RET(-E_RTE_SECONDARY); - RTE_ETH_VALID_PORTID_OR_ERR_RET(port_id, -EINVAL); dev = &rte_eth_devices[port_id]; @@ -1155,10 +1119,6 @@ rte_eth_dev_close(uint8_t port_id) { struct rte_eth_dev *dev; - /* This function is only safe when called from the primary process -* in a multi-process setup*/ - RTE_PROC_PRIMARY_OR_RET(); - RTE_ETH_VALID_PORTID_OR_RET(port_id); dev = &rte_eth_devices[port_id]; @@ -1183,10 +1143,6 @@ rte_eth_rx_queue_setup(uint8_t port_id, uint16_t rx_queue_id, 
struct rte_eth_dev *dev; struct rte_eth_dev_info dev_info; - /* This function is only safe when called from the primary process -* in a multi-process setup*/ - RTE_PROC_PRIMARY_OR_ERR_RET(-E_RTE_SECONDARY); - RTE_ETH_VALID_PORTID_OR_ERR_RET(port_id, -EINVAL); dev = &rte_eth_devices[port_id]; @@ -1266,10 +1222,6 @@ rte_eth_tx_queue_setup(uint8_t port_id, uint16_t tx_queue_id, struct rte_eth_dev *dev;
[dpdk-dev] [PATCH v2] mbuf: optimize rte_mbuf_refcnt_update
Hi Oliver,

Thank you for the fast response and it would be great to open a discussion
on that. In general our project can leverage your optimization and I think
it is great (we should have thought about it). We can use it with the
workaround I described.

However, for me it seems odd that rte_pktmbuf_attach(), which does not
*change* anything in m_const except for the *atomic* ref counter, does not
work in parallel. The example I gave is a classic use case of
rte_pktmbuf_attach (multicast) and I don't see why it wouldn't work after
your optimization.

Do you have a pointer to the documentation that states that you can't call
the atomic ref counter from more than one thread?

Thanks,
Hanoh

-----Original Message-----
From: Olivier MATZ [mailto:olivier.m...@6wind.com]
Sent: Tuesday, January 05, 2016 12:58 PM
To: Hanoch Haim (hhaim); bruce.richardson at intel.com
Cc: dev at dpdk.org; Ido Barnea (ibarnea); Itay Marom (imarom)
Subject: Re: [dpdk-dev] [PATCH v2] mbuf: optimize rte_mbuf_refcnt_update

Hi Hanoch,

On 01/04/2016 03:43 PM, Hanoch Haim (hhaim) wrote:
> Hi Oliver,
>
> Let's take your drawing as a reference and add my question The use
> case is sending a duplicate multicast packet by many threads.
> I can split it to x threads to do the job and with atomic-ref (my multicast
> not mbuf) count it until it reaches zero.
>
> In my following example the two cores (0 and 1) sending the indirect
> m1/m2 do alloc/attach/send
>
>  core0                             | core1
>  ----------------------------------|----------------------------------
>  m_const=rte_pktmbuf_alloc(mp)     |
>                                    |
>  while true:                       | while True:
>    m1 =rte_pktmbuf_alloc(mp_64)    |   m2 =rte_pktmbuf_alloc(mp_64)
>    rte_pktmbuf_attach(m1, m_const) |   rte_pktmbuf_attach(m1, m_const)
>    tx_burst(m1)                    |   tx_burst(m2)
>
> Is this example is not valid?

For me, m_const is not expected to be used concurrently on several cores.
By "used", I mean calling a function that modifies the mbuf, which is the
case for rte_pktmbuf_attach().

> BTW this is our workaround
>
>  core0                             | core1
>  ----------------------------------|----------------------------------
>  m_const=rte_pktmbuf_alloc(mp)     |
>  rte_mbuf_refcnt_update(m_const,1) | <<-- workaround
>                                    |
>  while true:                       | while True:
>    m1 =rte_pktmbuf_alloc(mp_64)    |   m2 =rte_pktmbuf_alloc(mp_64)
>    rte_pktmbuf_attach(m1, m_const) |   rte_pktmbuf_attach(m1, m_const)
>    tx_burst(m1)                    |   tx_burst(m2)

This workaround indeed solves the issue. Another solution would be to
protect the call to attach() with a lock, or call all the
rte_pktmbuf_attach() on the same core.

I'm open to discuss this behavior for rte_pktmbuf_attach() function (should
concurrent calls be allowed or not). In any case, we may want to better
document it in the doxygen API comments.

Regards,
Olivier
[dpdk-dev] [PATCH v2] mbuf: optimize rte_mbuf_refcnt_update
Hi Hanoch,

On 01/05/2016 12:11 PM, Hanoch Haim (hhaim) wrote:
> Hi Oliver,
> Thank you for the fast response and it would be great to open a discussion on
> that.
> In general our project can leverage your optimization and I think it is great
> (we should have thought about it) . We can use it using the workaround I
> described.
> However, for me it seems odd that rte_pktmbuf_attach () that does not
> *change* anything in m_const, except of the *atomic* ref counter does not
> work in parallel.
> The example I gave is a classic use case of rte_pktmbuf_attach (multicast )
> and I don't see why it wouldn't work after your optimization.
>
> Do you have a pointer to the documentation that state that that you can't
> call the atomic ref counter from more than one thread?

Unfortunately it's not documented yet, but it's something we should better
describe.

Regards,
Olivier
[dpdk-dev] [RFC PATCH 1/3] fm10k: enable FTAG based forwarding
This patch enables reading sglort info into mbuf for RX and inserting an FTAG at the beginning of the packet for TX. The vlan_tci_outer field selected from rte_mbuf structure for sglort is not used in fm10k now. In FTAG based forwarding mode, the switch will forward packets according to glort info in FTAG rather than mac and vlan table. To activate this feature, user needs to turn CONFIG_RTE_LIBRTE_FM10K_FTAG_FWD to y in common_linuxapp or common_bsdapp. Currently this feature is supported only on PF. Signed-off-by: Wang Xiao W --- config/common_bsdapp | 1 + config/common_linuxapp | 1 + drivers/net/fm10k/fm10k_ethdev.c | 5 + drivers/net/fm10k/fm10k_rxtx.c | 17 + drivers/net/fm10k/fm10k_rxtx_vec.c | 9 + 5 files changed, 33 insertions(+) diff --git a/config/common_bsdapp b/config/common_bsdapp index ed7c31c..451f81a 100644 --- a/config/common_bsdapp +++ b/config/common_bsdapp @@ -208,6 +208,7 @@ CONFIG_RTE_LIBRTE_FM10K_DEBUG_TX=n CONFIG_RTE_LIBRTE_FM10K_DEBUG_TX_FREE=n CONFIG_RTE_LIBRTE_FM10K_DEBUG_DRIVER=n CONFIG_RTE_LIBRTE_FM10K_RX_OLFLAGS_ENABLE=y +CONFIG_RTE_LIBRTE_FM10K_FTAG_FWD=n # # Compile burst-oriented Mellanox ConnectX-3 (MLX4) PMD diff --git a/config/common_linuxapp b/config/common_linuxapp index 74bc515..c928bce 100644 --- a/config/common_linuxapp +++ b/config/common_linuxapp @@ -207,6 +207,7 @@ CONFIG_RTE_LIBRTE_FM10K_DEBUG_TX_FREE=n CONFIG_RTE_LIBRTE_FM10K_DEBUG_DRIVER=n CONFIG_RTE_LIBRTE_FM10K_RX_OLFLAGS_ENABLE=y CONFIG_RTE_LIBRTE_FM10K_INC_VECTOR=y +CONFIG_RTE_LIBRTE_FM10K_FTAG_FWD=n # # Compile burst-oriented Mellanox ConnectX-3 (MLX4) PMD diff --git a/drivers/net/fm10k/fm10k_ethdev.c b/drivers/net/fm10k/fm10k_ethdev.c index e4aed94..d5c376a 100644 --- a/drivers/net/fm10k/fm10k_ethdev.c +++ b/drivers/net/fm10k/fm10k_ethdev.c @@ -668,6 +668,11 @@ fm10k_dev_tx_init(struct rte_eth_dev *dev) PMD_INIT_LOG(ERR, "failed to disable queue %d", i); return -1; } +#ifdef RTE_LIBRTE_FM10K_FTAG_FWD + /* enable use of FTAG bit in Tx descriptor, register is RO for 
VF */ + if (hw->mac.type == fm10k_mac_pf) + FM10K_WRITE_REG(hw, FM10K_PFVTCTL(i), FM10K_PFVTCTL_FTAG_DESC_ENABLE); +#endif /* set location and size for descriptor ring */ FM10K_WRITE_REG(hw, FM10K_TDBAL(i), diff --git a/drivers/net/fm10k/fm10k_rxtx.c b/drivers/net/fm10k/fm10k_rxtx.c index e958865..f87987d 100644 --- a/drivers/net/fm10k/fm10k_rxtx.c +++ b/drivers/net/fm10k/fm10k_rxtx.c @@ -152,6 +152,13 @@ fm10k_recv_pkts(void *rx_queue, struct rte_mbuf **rx_pkts, */ mbuf->ol_flags |= PKT_RX_VLAN_PKT; mbuf->vlan_tci = desc.w.vlan; +#ifdef RTE_LIBRTE_FM10K_FTAG_FWD + /** +* mbuf->vlan_tci_outer is an idle field in fm10k driver, +* so it can be selected to store sglort value. +*/ + mbuf->vlan_tci_outer = rte_le_to_cpu_16(desc.w.sglort); +#endif rx_pkts[count] = mbuf; if (++next_dd == q->nb_desc) { @@ -307,6 +314,13 @@ fm10k_recv_scattered_pkts(void *rx_queue, struct rte_mbuf **rx_pkts, */ mbuf->ol_flags |= PKT_RX_VLAN_PKT; first_seg->vlan_tci = desc.w.vlan; +#ifdef RTE_LIBRTE_FM10K_FTAG_FWD + /** +* mbuf->vlan_tci_outer is an idle field in fm10k driver, +* so it can be selected to store sglort value. +*/ + first_seg->vlan_tci_outer = rte_le_to_cpu_16(desc.w.sglort); +#endif /* Prefetch data of first segment, if configured to do so. */ rte_packet_prefetch((char *)first_seg->buf_addr + @@ -432,6 +446,9 @@ static inline void tx_xmit_pkt(struct fm10k_tx_queue *q, struct rte_mbuf *mb) q->nb_free -= mb->nb_segs; q->hw_ring[q->next_free].flags = 0; +#ifdef RTE_LIBRTE_FM10K_FTAG_FWD + q->hw_ring[q->next_free].flags |= FM10K_TXD_FLAG_FTAG; +#endif /* set checksum flags on first descriptor of packet. SCTP checksum * offload is not supported, but we do not explicitly check for this * case in favor of greatly simplified processing. 
*/ diff --git a/drivers/net/fm10k/fm10k_rxtx_vec.c b/drivers/net/fm10k/fm10k_rxtx_vec.c index 2a57eef..0b0f2e3 100644 --- a/drivers/net/fm10k/fm10k_rxtx_vec.c +++ b/drivers/net/fm10k/fm10k_rxtx_vec.c @@ -198,7 +198,12 @@ fm10k_rx_vec_condition_check(struct rte_eth_dev *dev) rxmode->header_split == 1) return -1; +#ifdef RTE_LIBRTE_FM10K_FTAG_FWD + return -1; +#else return 0; +#endif + #else RTE_SET_USED(dev); return -1; @@ -648,7 +653,11 @@ fm10k_tx_vec_condition_check(struct fm10k_tx_queue *txq) if ((txq->txq_flags & FM10K_SIMPLE_TX_FLAG) != FM10K_SIMPLE_TX_FLAG)
[dpdk-dev] [RFC PATCH 2/3] fm10k: add a unit test for FTAG based forwarding
This patch adds a unit test case for FTAG functional test. Before running the test, set PORT0_GLORT and PORT1_GLORT environment variables, and ensure two fm10k ports are used for dpdk, glort info for each port can be shown in TestPoint. In the unit test, a packet will be forwarded to the target port by the switch without changing the destination mac address. Signed-off-by: Wang Xiao W --- app/test/Makefile | 1 + app/test/test_fm10k_ftag.c | 253 + 2 files changed, 254 insertions(+) create mode 100644 app/test/test_fm10k_ftag.c diff --git a/app/test/Makefile b/app/test/Makefile index ec33e1a..d72be8d 100644 --- a/app/test/Makefile +++ b/app/test/Makefile @@ -57,6 +57,7 @@ SRCS-y += test_memzone.c SRCS-y += test_ring.c SRCS-y += test_ring_perf.c SRCS-y += test_pmd_perf.c +SRCS-$(CONFIG_RTE_LIBRTE_FM10K_FTAG_FWD) += test_fm10k_ftag.c ifeq ($(CONFIG_RTE_LIBRTE_TABLE),y) SRCS-y += test_table.c diff --git a/app/test/test_fm10k_ftag.c b/app/test/test_fm10k_ftag.c new file mode 100644 index 000..325a652 --- /dev/null +++ b/app/test/test_fm10k_ftag.c @@ -0,0 +1,253 @@ +/*- + * BSD LICENSE + * + * Copyright(c) 2010-2014 Intel Corporation. All rights reserved. + * All rights reserved. + * + * Redistribution and use in source and binary forms, with or without + * modification, are permitted provided that the following conditions + * are met: + * + * * Redistributions of source code must retain the above copyright + * notice, this list of conditions and the following disclaimer. + * * Redistributions in binary form must reproduce the above copyright + * notice, this list of conditions and the following disclaimer in + * the documentation and/or other materials provided with the + * distribution. + * * Neither the name of Intel Corporation nor the names of its + * contributors may be used to endorse or promote products derived + * from this software without specific prior written permission. 
+ * + * THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS + * "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT + * LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR + * A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT + * OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, + * SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT + * LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, + * DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY + * THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT + * (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE + * OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. + */ + +#include +#include +#include +#include +#include +#include +#include +#include "test.h" + +#define RX_RING_SIZE 128 +#define TX_RING_SIZE 512 + +#define NUM_MBUFS 8191 +#define MBUF_CACHE_SIZE 250 +#define BURST_SIZE 32 + +struct fm10k_ftag { + uint16_t swpri_type_user; + uint16_t vlan; + uint16_t sglort; + uint16_t dglort; +}; + +static const struct rte_eth_conf port_conf_default = { + .rxmode = { .max_rx_pkt_len = ETHER_MAX_LEN } +}; + +/* + * Initializes a given port using global settings and with the RX buffers + * coming from the mbuf_pool passed as a parameter. + */ +static inline int +port_init(uint8_t port, struct rte_mempool *mbuf_pool) +{ + struct rte_eth_conf port_conf = port_conf_default; + const uint16_t rx_rings = 1, tx_rings = 1; + int retval; + uint16_t q; + + if (port >= rte_eth_dev_count()) + return -1; + + /* Configure the Ethernet device. */ + retval = rte_eth_dev_configure(port, rx_rings, tx_rings, &port_conf); + if (retval != 0) + return retval; + + /* Allocate and set up 1 RX queue per Ethernet port. 
*/ + for (q = 0; q < rx_rings; q++) { + retval = rte_eth_rx_queue_setup(port, q, RX_RING_SIZE, + rte_eth_dev_socket_id(port), NULL, mbuf_pool); + if (retval < 0) + return retval; + } + + /* Allocate and set up 1 TX queue per Ethernet port. */ + for (q = 0; q < tx_rings; q++) { + retval = rte_eth_tx_queue_setup(port, q, TX_RING_SIZE, + rte_eth_dev_socket_id(port), NULL); + if (retval < 0) + return retval; + } + + /* Start the Ethernet port. */ + retval = rte_eth_dev_start(port); + if (retval < 0) + return retval; + + /* Display the port MAC address. */ + struct ether_addr addr; + rte_eth_macaddr_get(port, &addr); + printf("Port %u MAC: %02" PRIx8 " %02" PRIx8 " %02" PRIx8 + " %02" PRIx8
[dpdk-dev] [RFC PATCH 0/3] fm10k: enable FTAG based forwarding
This is an RFC patch set for the FTAG based forwarding feature of RRC.

Wang Xiao W (3):
  fm10k: enable FTAG based forwarding
  fm10k: add a unit test for FTAG based forwarding
  doc: add introduction for fm10k FTAG based forwarding

 app/test/Makefile                  |   1 +
 app/test/test_fm10k_ftag.c         | 253 +
 config/common_bsdapp               |   1 +
 config/common_linuxapp             |   1 +
 doc/guides/nics/fm10k.rst          |  13 ++
 drivers/net/fm10k/fm10k_ethdev.c   |   5 +
 drivers/net/fm10k/fm10k_rxtx.c     |  17 +++
 drivers/net/fm10k/fm10k_rxtx_vec.c |   9 ++
 8 files changed, 300 insertions(+)
 create mode 100644 app/test/test_fm10k_ftag.c

--
1.9.3
[dpdk-dev] [RFC PATCH 3/3] doc: add introduction for fm10k FTAG based forwarding
Add a brief introduction to FTAG that describes what FTAG is and how it
works in forwarding. Instructions on how to run fm10k with FTAG are also
included.

Signed-off-by: Wang Xiao W
---
 doc/guides/nics/fm10k.rst | 13 +
 1 file changed, 13 insertions(+)

diff --git a/doc/guides/nics/fm10k.rst b/doc/guides/nics/fm10k.rst
index 4206b7f..d82bf41 100644
--- a/doc/guides/nics/fm10k.rst
+++ b/doc/guides/nics/fm10k.rst
@@ -34,6 +34,19 @@ FM10K Poll Mode Driver
 The FM10K poll mode driver library provides support for the Intel FM10000
 (FM10K) family of 40GbE/100GbE adapters.
 
+FTAG Based Forwarding of FM10K
+------------------------------
+FTAG Based Forwarding is a unique feature of FM10K. The FM10K family of NICs
+support the addition of a Fabric Tag (FTAG) to carry special information.
+The FTAG is placed at the beginning of the frame, it contains information such
+as where the packet comes from and goes, the vlan tag. In FTAG based forwarding
+mode, the switch logic forwards packets according to glort (global resource tag)
+information, other than the mac and vlan table. Now this feature works only on
+PF.
+
+To enable this feature, turn CONFIG_RTE_LIBRTE_FM10K_FTAG_FWD to y in the
+configuration file. A unit test case fm10k_ftag_autotest is for reference, it shows
+how to read sglort info on RX and how to make an FTAG on TX.
 
 Limitations
 -----------
--
1.9.3
[dpdk-dev] [PATCH 1/8] bond: use existing enslaved device queues
On 04/12/15 17:14, Stephen Hemminger wrote: > From: Eric Kinzie > > This solves issues when an active device is added to a bond. > > If a device to be enslaved already has transmit and/or receive queues > allocated, use those and then create any additional queues that are > necessary. > > Signed-off-by: Eric Kinzie > Signed-off-by: Stephen Hemminger > --- ... > Acked-by: Declan Doherty
[dpdk-dev] [PATCH 2/8] bond mode 4: copy entire config structure
On 04/12/15 17:14, Stephen Hemminger wrote:
> From: Eric Kinzie
>
> Copy all needed fields from the mode8023ad_private structure in
> bond_mode_8023ad_conf_get(). This helps ensure that a subsequent call
> to rte_eth_bond_8023ad_setup() is not passed uninitialized data that
> would result in either incorrect behavior or a failed sanity check.
>
> Fixes: 46fb43683679 ("bond: add mode 4")
>
> Signed-off-by: Eric Kinzie
> Signed-off-by: Stephen Hemminger
> ---
...
>

Acked-by: Declan Doherty
[dpdk-dev] [PATCH 3/8] bond mode 4: do not ignore multicast
On 04/12/15 17:14, Stephen Hemminger wrote: > From: Eric Kinzie > > The bonding PMD in mode 4 puts all enslaved interfaces into promiscuous > mode in order to receive LACPDUs and must filter unwanted packets > after the traffic has been "collected". Allow broadcast and multicast > through so that ARP and IPv6 neighbor discovery continue to work. > > Fixes: 46fb43683679 ("bond: add mode 4") > > Signed-off-by: Eric Kinzie > Signed-off-by: Stephen Hemminger > --- ... > Acked-by: Declan Doherty
[dpdk-dev] [PATCH 4/8] bond mode 4: allow external state machine
On 04/12/15 17:14, Stephen Hemminger wrote:
> From: Eric Kinzie
>
> Provide functions to allow an external 802.3ad state machine to transmit
> and receive LACPDUs and to set the collection/distribution flags on
> slave interfaces.
>
> Signed-off-by: Eric Kinzie
> Signed-off-by: Stephen Hemminger
> ---
...
>

Acked-by: Declan Doherty
[dpdk-dev] [PATCH 5/8] bond: active slaves with no primary
On 04/12/15 17:14, Stephen Hemminger wrote: > From: Eric Kinzie > > If the link state of a slave is "up" when added, it is added to the list > of active slaves but, even if it is the only slave, is not selected as > the primary interface. Generally, handling of link state interrupts > selects an interface to be primary, but only if the active count is zero. > This change avoids the situation where there are active slaves but > no primary. > > Signed-off-by: Eric Kinzie > Signed-off-by: Stephen Hemminger > --- ... > Acked-by: Declan Doherty
[dpdk-dev] [PATCH 6/8] bond: handle slaves with fewer queues than bonding device
On 04/12/15 19:18, Eric Kinzie wrote:
> On Fri Dec 04 19:36:09 +0100 2015, Andriy Berestovskyy wrote:
>> Hi guys,
>> I'm not quite sure if we can support less TX queues on a slave that easy:
>>
>>> queue_id = bond_slave_txqid(internals, i, bd_tx_q->queue_id);
>>> num_tx_slave = rte_eth_tx_burst(slaves[i], queue_id,
>>> slave_bufs[i], slave_nb_pkts[i]);
>>
>> It seems that two different lcores might end up writing to the same
>> slave queue at the same time, isn't it?
>>
>> Regards,
>> Andriy
>
> Andriy, I think you're probably right about this. Perhaps it should
> instead refuse to add or refuse to activate a slave with too few
> tx queues. Could probably fix this with another layer of buffering
> so that an lcore with a valid tx queue could pick up the mbufs later,
> but this doesn't seem very appealing.
>
> Eric
>
>
>> On Fri, Dec 4, 2015 at 6:14 PM, Stephen Hemminger
>> wrote:
>>> From: Eric Kinzie
>>>
>>> In the event that the bonding device has a greater number of tx and/or rx
>>> queues than the slave being added, track the queue limits of the slave.
>>> On receive, ignore queue identifiers beyond what the slave interface
>>> can support. During transmit, pick a different queue id to use if the
>>> intended queue is not available on the slave.
>>>
>>> Signed-off-by: Eric Kinzie
>>> Signed-off-by: Stephen Hemminger
>>> ---
...

I don't think there is any straightforward way of supporting slaves with different numbers of queues. The initial library was written with the assumption that the number of tx/rx queues would always be the same on each slave. This is why, when a slave is added to a bonded device, we reconfigure the queues. For features like RSS we have to have the same number of rx queues, otherwise the flow distribution to an application could change in the case of a failover event.

Also, by supporting different numbers of queues between slaves we would no longer be supporting the standard behavior of ethdevs in DPDK, where we expect that by using different queues we don't require locking to be thread safe.
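The concern raised in this thread can be sketched as follows. The body of bond_slave_txqid() is not shown in the thread, so the modulo mapping below is only an assumption about how a bonding queue id might be folded onto a slave with fewer queues; slave_txqid() and queues_collide() are hypothetical names.

```c
#include <stdint.h>

/*
 * Hypothetical remap: fold a bonding-device tx queue id onto the
 * smaller queue range of a slave. The caller must guarantee that
 * slave_nb_tx_queues is nonzero.
 */
static uint16_t
slave_txqid(uint16_t bond_queue_id, uint16_t slave_nb_tx_queues)
{
	return bond_queue_id % slave_nb_tx_queues;
}

/*
 * Return 1 if two distinct bonding queues, each possibly owned by a
 * different lcore, end up on the same slave queue. Transmitting on
 * both at once would then need a lock, which ethdev queues normally
 * avoid by being used from one lcore each.
 */
static int
queues_collide(uint16_t q_a, uint16_t q_b, uint16_t slave_nb_tx_queues)
{
	return q_a != q_b &&
	       slave_txqid(q_a, slave_nb_tx_queues) ==
	       slave_txqid(q_b, slave_nb_tx_queues);
}
```

With a two-queue slave, bonding queues 0 and 2 map to the same slave queue under this assumed mapping, which is the situation Andriy describes.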
[dpdk-dev] [PATCH 8/8] bond: do not activate slave twice
On 04/12/15 17:14, Stephen Hemminger wrote:
> From: Eric Kinzie
>
> The current code for detecting link during slave addition can cause a
> slave interface to be activated twice -- once during slave_configure()
> and again at the end of __eth_bond_slave_add_lock_free(). This will
> either cause the active slave count to be incorrect or will cause the
> 802.3ad activation function to panic. Ensure that the interface is not
> activated more than once.
>
> Signed-off-by: Eric Kinzie
> Signed-off-by: Stephen Hemminger
> ---
...
>

Acked-by: Declan Doherty
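One way to read "not activated more than once" is an idempotent activation step. The sketch below is an assumption about the shape of such a guard, not the patch itself; struct active_list and activate_slave_once() are hypothetical names.

```c
#include <stdint.h>

#define MAX_SLAVES 8

/* Hypothetical, simplified active-slave list. */
struct active_list {
	uint8_t ports[MAX_SLAVES];
	uint8_t count;
};

/*
 * Activate a slave only if it is not already active.
 * Returns 1 if newly activated, 0 if it was already active,
 * -1 if the list is full. A second call for the same port leaves
 * the active count unchanged.
 */
static int
activate_slave_once(struct active_list *l, uint8_t port)
{
	uint8_t i;

	for (i = 0; i < l->count; i++)
		if (l->ports[i] == port)
			return 0;
	if (l->count >= MAX_SLAVES)
		return -1;
	l->ports[l->count++] = port;
	return 1;
}
```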
[dpdk-dev] [PATCH] af_packet: make the device detachable
Fix memory leak when detaching virtual device. Set dev_flags to
RTE_ETH_DEV_DETACHABLE and implement pmd_af_packet_drv.uninit method.
Copy device name to ethdev->data to make it compatible with
rte_eth_dev_allocated().

Signed-off-by: Wojciech Zmuda
---
 drivers/net/af_packet/rte_eth_af_packet.c | 29 -
 1 file changed, 28 insertions(+), 1 deletion(-)

diff --git a/drivers/net/af_packet/rte_eth_af_packet.c b/drivers/net/af_packet/rte_eth_af_packet.c
index 767f36b..7ef65ff 100644
--- a/drivers/net/af_packet/rte_eth_af_packet.c
+++ b/drivers/net/af_packet/rte_eth_af_packet.c
@@ -667,11 +667,13 @@ rte_pmd_init_internals(const char *name,
 	data->nb_tx_queues = (uint16_t)nb_queues;
 	data->dev_link = pmd_link;
 	data->mac_addrs = &(*internals)->eth_addr;
+	strncpy(data->name,
+		(*eth_dev)->data->name, strlen((*eth_dev)->data->name));
 
 	(*eth_dev)->data = data;
 	(*eth_dev)->dev_ops = &ops;
 	(*eth_dev)->driver = NULL;
-	(*eth_dev)->data->dev_flags = 0;
+	(*eth_dev)->data->dev_flags = RTE_ETH_DEV_DETACHABLE;
 	(*eth_dev)->data->drv_name = drivername;
 	(*eth_dev)->data->kdrv = RTE_KDRV_NONE;
 	(*eth_dev)->data->numa_node = numa_node;
@@ -836,10 +838,35 @@ exit:
 	return ret;
 }
 
+static int
+rte_pmd_af_packet_devuninit(const char *name)
+{
+	struct rte_eth_dev *eth_dev = NULL;
+
+	RTE_LOG(INFO, PMD, "Closing AF_PACKET ethdev on numa socket %u\n",
+			rte_socket_id());
+
+	if (name == NULL)
+		return -1;
+
+	/* reserve an ethdev entry */
+	eth_dev = rte_eth_dev_allocated(name);
+	if (eth_dev == NULL)
+		return -1;
+
+	rte_free(eth_dev->data->dev_private);
+	rte_free(eth_dev->data);
+
+	rte_eth_dev_release_port(eth_dev);
+
+	return 0;
+}
+
 static struct rte_driver pmd_af_packet_drv = {
 	.name = "eth_af_packet",
 	.type = PMD_VDEV,
 	.init = rte_pmd_af_packet_devinit,
+	.uninit = rte_pmd_af_packet_devuninit,
 };
 
 PMD_REGISTER_DRIVER(pmd_af_packet_drv);
--
1.9.1
[dpdk-dev] [PATCH 4/4] virtio: check if any kernel driver is manipulating the device
On 01/04/2016 11:02 AM, Xie, Huawei wrote:
> On 12/25/2015 6:33 PM, Xie, Huawei wrote:
>> virtio PMD could use IO port to configure the virtio device without
>> using uio driver.
>>
>> There are two issues with previous implementation:
>> 1) virtio PMD will take over each virtio device blindly even if some
>> are not intended for DPDK.
>> 2) driver conflict between virtio PMD and virtio-net kernel driver.
>>
>> This patch checks if there is any kernel driver manipulating the virtio
>> device before virtio PMD uses IO port to configure the device.
>>
>> Fixes: da978dfdc43b ("virtio: use port IO to get PCI resource")
>>
>> Signed-off-by: Huawei Xie
>> ---
>> drivers/net/virtio/virtio_ethdev.c | 7 +++
>> 1 file changed, 7 insertions(+)
>>
>> diff --git a/drivers/net/virtio/virtio_ethdev.c b/drivers/net/virtio/virtio_ethdev.c
>> index 00015ef..504346a 100644
>> --- a/drivers/net/virtio/virtio_ethdev.c
>> +++ b/drivers/net/virtio/virtio_ethdev.c
>> @@ -1138,6 +1138,13 @@ static int virtio_resource_init_by_ioports(struct rte_pci_device *pci_dev)
>> int found = 0;
>> size_t linesz;
>>
>> +if (pci_dev->kdrv != RTE_KDRV_NONE) {
>> +PMD_INIT_LOG(ERR,
> Better change ERR to INFO and revise the message followed, since user
> might not want to use this device for DPDK.

Indeed. The whole point of this exercise is to have a clear way of telling DPDK which virtio devices it should (and should not) use, so it should just act accordingly and shut up.

>> +"%s(): kernel driver is manipulating this device." \
>> +" Please unbind the kernel driver.", __func__);

I'd suggest just dropping the whole message, DPDK doesn't log such messages for any other devices either. That, or make it a generic debug-level log in pci_scan_one().

- Panu -
[dpdk-dev] [PATCH 6/8] bond: handle slaves with fewer queues than bonding device
A common usage scenario is to bond a vnic like virtio which typically has only a single rx queue with a VF device that has multiple receive queues. This is done to do live migration On Jan 5, 2016 05:47, "Declan Doherty" wrote: > On 04/12/15 19:18, Eric Kinzie wrote: > >> On Fri Dec 04 19:36:09 +0100 2015, Andriy Berestovskyy wrote: >> >>> Hi guys, >>> I'm not quite sure if we can support less TX queues on a slave that easy: >>> >>> queue_id = bond_slave_txqid(internals, i, bd_tx_q->queue_id); num_tx_slave = rte_eth_tx_burst(slaves[i], queue_id, slave_bufs[i], slave_nb_pkts[i]); >>> >>> It seems that two different lcores might end up writing to the same >>> slave queue at the same time, isn't it? >>> >>> Regards, >>> Andriy >>> >> >> Andriy, I think you're probably right about this. Perhaps it should >> instead refuse to add or refuse to activate a slave with too few >> tx queues. Could probably fix this with another layer of buffering >> so that an lcore with a valid tx queue could pick up the mbufs later, >> but this doesn't seem very appealing. >> >> Eric >> >> >> On Fri, Dec 4, 2015 at 6:14 PM, Stephen Hemminger >>> wrote: >>> From: Eric Kinzie In the event that the bonding device has a greater number of tx and/or rx queues than the slave being added, track the queue limits of the slave. On receive, ignore queue identifiers beyond what the slave interface can support. During transmit, pick a different queue id to use if the intended queue is not available on the slave. Signed-off-by: Eric Kinzie Signed-off-by: Stephen Hemminger --- >>> ... > > > I don't there is any straight forward way of supporting slaves with > different numbers of queues, the initial library was written with the > assumption that the number of tx/rx queues would always be the same on each > slave. This is why,when a slave is added to a bonded device we reconfigure > the queues. 
For features like RSS we have to have the same number of rx > queues otherwise the flow distribution to an application could change in > the case of a fail over event. Also by supporting different numbers of > queues between slaves we would be no longer be supporting the standard > behavior of ethdevs in DPDK were we expect that by using different queues > we don't require locking to be thread safe. > > >
[dpdk-dev] [PATCH 01/12] ethdev: add API to query what/if packet type is set
On Mon, Jan 04, 2016 at 02:36:14PM +, Ananyev, Konstantin wrote: > > > > -Original Message- > > From: dev [mailto:dev-bounces at dpdk.org] On Behalf Of Adrien Mazarguil > > Sent: Monday, January 04, 2016 11:38 AM > > To: Tan, Jianfeng > > Cc: dev at dpdk.org > > Subject: Re: [dpdk-dev] [PATCH 01/12] ethdev: add API to query what/if > > packet type is set > > > > I'm not sure about the usefulness of this new callback, but one issue I see > > with rte_eth_dev_get_ptype_info() is that determining the proper size for > > ptypes[] according to a mask is awkward. For instance suppose > > RTE_PTYPE_L4_MASK is redefined to a different size at some point, the caller > > must dynamically adjust its ptypes[] array size to avoid a possible > > overflow, just in case. > > > > I suggest one of these solutions: > > > > - A callback to query for a single type at once instead (easiest method in > > my opinion). > > > > - An additional argument with the number of entries in ptypes[], in which > > case rte_eth_dev_get_ptype_info() should return the number of entries that > > would have been filled regardless, a bit like snprintf(). > > +1 for the second option. > Also not sure you really need: RTE_PTYPE_*_MAX_NUM macros. > Konstantin +1 for the second option. But see below. > > > > On Thu, Dec 31, 2015 at 02:53:08PM +0800, Jianfeng Tan wrote: > > > Add a new API rte_eth_dev_get_ptype_info to query what/if packet type will > > > be set by current rx burst function. 
> > > > > > Signed-off-by: Jianfeng Tan > > > --- > > > lib/librte_ether/rte_ethdev.c | 12 > > > lib/librte_ether/rte_ethdev.h | 22 ++ > > > lib/librte_mbuf/rte_mbuf.h| 13 + > > > 3 files changed, 47 insertions(+) > > > > > > diff --git a/lib/librte_ether/rte_ethdev.c b/lib/librte_ether/rte_ethdev.c > > > index ed971b4..1885374 100644 > > > --- a/lib/librte_ether/rte_ethdev.c > > > +++ b/lib/librte_ether/rte_ethdev.c > > > @@ -1614,6 +1614,18 @@ rte_eth_dev_info_get(uint8_t port_id, struct > > > rte_eth_dev_info *dev_info) > > > dev_info->driver_name = dev->data->drv_name; > > > } > > > > > > +int > > > +rte_eth_dev_get_ptype_info(uint8_t port_id, uint32_t ptype_mask, > > > + uint32_t ptypes[]) > > > +{ > > > + struct rte_eth_dev *dev; > > > + > > > + RTE_ETH_VALID_PORTID_OR_ERR_RET(port_id, -ENODEV); > > > + dev = &rte_eth_devices[port_id]; > > > + RTE_FUNC_PTR_OR_ERR_RET(*dev->dev_ops->dev_ptype_info_get, -ENOTSUP); > > > + return (*dev->dev_ops->dev_ptype_info_get)(dev, ptype_mask, ptypes); > > > +} > > > + > > > void > > > rte_eth_macaddr_get(uint8_t port_id, struct ether_addr *mac_addr) > > > { > > > diff --git a/lib/librte_ether/rte_ethdev.h b/lib/librte_ether/rte_ethdev.h > > > index bada8ad..e97b632 100644 > > > --- a/lib/librte_ether/rte_ethdev.h > > > +++ b/lib/librte_ether/rte_ethdev.h > > > @@ -1021,6 +1021,10 @@ typedef void (*eth_dev_infos_get_t)(struct > > > rte_eth_dev *dev, > > > struct rte_eth_dev_info *dev_info); > > > /**< @internal Get specific informations of an Ethernet device. */ > > > > > > +typedef int (*eth_dev_ptype_info_get_t)(struct rte_eth_dev *dev, > > > + uint32_t ptype_mask, uint32_t ptypes[]); > > > +/**< @internal Get ptype info of eth_rx_burst_t. */ > > > + > > > typedef int (*eth_queue_start_t)(struct rte_eth_dev *dev, > > > uint16_t queue_id); > > > /**< @internal Start rx and tx of a queue of an Ethernet device. 
*/ > > > @@ -1347,6 +1351,7 @@ struct eth_dev_ops { > > > eth_queue_stats_mapping_set_t queue_stats_mapping_set; > > > /**< Configure per queue stat counter mapping. */ > > > eth_dev_infos_get_tdev_infos_get; /**< Get device info. */ > > > + eth_dev_ptype_info_get_t dev_ptype_info_get; /** Get ptype info */ > > > mtu_set_t mtu_set; /**< Set MTU. */ > > > vlan_filter_set_t vlan_filter_set; /**< Filter VLAN Setup. */ > > > vlan_tpid_set_tvlan_tpid_set; /**< Outer VLAN TPID > > > Setup. */ > > > @@ -2273,6 +2278,23 @@ extern void rte_eth_dev_info_get(uint8_t port_id, > > >struct rte_eth_dev_info *dev_info); > > > > > > /** > > > + * Retrieve the contextual information of an Ethernet device. > > > + * > > > + * @param port_id > > > + * The port identifier of the Ethernet device. > > > + * @param ptype_mask > > > + * A hint of what kind of packet type which the caller is interested in > > > + * @param ptypes > > > + * An array of packet types to be filled with > > > + * @return > > > + * - (>=0) if successful. Indicate number of valid values in ptypes > > > array. > > > + * - (-ENOTSUP) if hardware-assisted VLAN stripping not configured. > > > + * - (-ENODEV) if *port_id* invalid. > > > + */ > > > +extern int rte_eth_dev_get_ptype_info(uint8_t port_id, > > > + uint32_t ptype_mask, uint32_t ptypes[]); > > > + > > > +/** > >
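The second option discussed above (an extra argument with the array size, and a return value that counts every match, a bit like snprintf()) could look roughly like the sketch below. This is not the ethdev API: the ptype values, the supported_ptypes table and get_ptype_info() are hypothetical and exist only to show the calling convention.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical ptype values and masks, for illustration only. */
#define PTYPE_L3_IPV4 0x010u
#define PTYPE_L3_IPV6 0x040u
#define PTYPE_L3_MASK 0x0F0u
#define PTYPE_L4_TCP  0x100u
#define PTYPE_L4_MASK 0xF00u

/* Hypothetical list of ptypes a driver's rx burst function can set. */
static const uint32_t supported_ptypes[] = {
	PTYPE_L3_IPV4, PTYPE_L3_IPV6, PTYPE_L4_TCP,
};

/*
 * Store at most num matching ptypes in ptypes[] and return the total
 * number of matches, even if that is larger than num. The caller can
 * pass ptypes == NULL and num == 0 to learn the required array size,
 * so no per-layer MAX_NUM macro is needed and ptypes[] cannot overflow.
 */
static int
get_ptype_info(uint32_t ptype_mask, uint32_t *ptypes, int num)
{
	size_t i;
	int total = 0;

	for (i = 0; i < sizeof(supported_ptypes) / sizeof(supported_ptypes[0]); i++) {
		if ((supported_ptypes[i] & ptype_mask) == 0)
			continue;
		if (ptypes != NULL && total < num)
			ptypes[total] = supported_ptypes[i];
		total++;
	}
	return total;
}
```

A caller would typically call once with a NULL array to get the count, allocate that many entries, then call again to fill them.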
[dpdk-dev] [PATCH 08/12] pmd/mlx4: add dev_ptype_info_get implementation
On Tue, Jan 05, 2016 at 11:08:04AM +0800, Tan, Jianfeng wrote: > > > On 1/4/2016 7:11 PM, Adrien Mazarguil wrote: > >Hi Jianfeng, > > > >I'm only commenting the mlx4/mlx5 bits in this message, see below. > > > >On Thu, Dec 31, 2015 at 02:53:15PM +0800, Jianfeng Tan wrote: > >>Signed-off-by: Jianfeng Tan > >>--- > >> drivers/net/mlx4/mlx4.c | 27 +++ > >> 1 file changed, 27 insertions(+) > >> > >>diff --git a/drivers/net/mlx4/mlx4.c b/drivers/net/mlx4/mlx4.c > >>index 207bfe2..85afa32 100644 > >>--- a/drivers/net/mlx4/mlx4.c > >>+++ b/drivers/net/mlx4/mlx4.c > >>@@ -2836,6 +2836,8 @@ rxq_cleanup(struct rxq *rxq) > >> * @param flags > >> * RX completion flags returned by poll_length_flags(). > >> * > >>+ * @note: fix mlx4_dev_ptype_info_get() if any change here. > >>+ * > >> * @return > >> * Packet type for struct rte_mbuf. > >> */ > >>@@ -4268,6 +4270,30 @@ mlx4_dev_infos_get(struct rte_eth_dev *dev, struct > >>rte_eth_dev_info *info) > >>priv_unlock(priv); > >> } > >>+static int > >>+mlx4_dev_ptype_info_get(struct rte_eth_dev *dev, uint32_t ptype_mask, > >>+ uint32_t ptypes[]) Note this line is not properly indented (uint32_t should be aligned like the rest of the file). > >>+{ > >>+ int num = 0; > >>+ > >>+ if ((dev->rx_pkt_burst == mlx4_rx_burst) > >>+ || (dev->rx_pkt_burst == mlx4_rx_burst_sp)) { I prefer operators/separators at the end of the previous line, indentation should be fixed as well. 
> >>+ /* refers to rxq_cq_to_pkt_type() */ > >>+ if ((ptype_mask & RTE_PTYPE_L3_MASK) == RTE_PTYPE_L3_MASK) { > >>+ ptypes[num++] = RTE_PTYPE_L3_IPV4; > >>+ ptypes[num++] = RTE_PTYPE_L3_IPV6; > >>+ } > >>+ > >>+ if ((ptype_mask & RTE_PTYPE_INNER_L3_MASK) == > >>RTE_PTYPE_INNER_L3_MASK) { > >>+ ptypes[num++] = RTE_PTYPE_INNER_L3_IPV4; > >>+ ptypes[num++] = RTE_PTYPE_INNER_L3_IPV6; > >>+ } > >>+ } else > >>+ num = -ENOTSUP; > >>+ > >>+ return num; > >>+} > >I think checking for mlx4_rx_burst and mlx4_rx_burst_sp is unnecessary at > >the moment, all RX burst functions do update the packet_type field, no need > >for extra complexity. > > > >Same comment for mlx5. > > Hi Mazarguil, > > My original thought is that rx_pkt_burst could be also set as > removed_rx_burst, which does not make sense indeed > because it's only possible when the device is closed. Yes, indeed. > Another consideration is to keep same style with other devices. Each > kind of device could have several rx burst functions. > So current implementation can keep extensibility to add new rx burst > functions. How do you think of it? OK, that makes sense. Please check my above comments about coding style/indents (I know I'm annoying). > >>+ > >> /** > >> * DPDK callback to get device statistics. > >> * > >>@@ -4989,6 +5015,7 @@ static const struct eth_dev_ops mlx4_dev_ops = { > >>.stats_reset = mlx4_stats_reset, > >>.queue_stats_mapping_set = NULL, > >>.dev_infos_get = mlx4_dev_infos_get, > >>+ .dev_ptypes_info_get = mlx4_dev_ptype_info_get, > >>.vlan_filter_set = mlx4_vlan_filter_set, > >>.vlan_tpid_set = NULL, > >>.vlan_strip_queue_set = NULL, > >>-- > >>2.1.4 > >> > -- Adrien Mazarguil 6WIND
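The extensibility argument in this exchange (report ptypes only for rx burst functions known to fill packet_type, and return -ENOTSUP for anything else) can be sketched as below. All names are hypothetical stand-ins, not mlx4 code, and the stub bodies differ only so that the three functions stay distinct.

```c
#include <errno.h>
#include <stdint.h>

/* Hypothetical rx burst signature, simplified for the sketch. */
typedef uint16_t (*rx_burst_t)(void *rxq, void **pkts, uint16_t n);

/* Stub standing in for a burst function that sets packet_type. */
static uint16_t
rx_burst_plain(void *rxq, void **pkts, uint16_t n)
{
	(void)rxq;
	(void)pkts;
	return n;
}

/* Stub standing in for a scattered-packet variant that also sets it. */
static uint16_t
rx_burst_sp(void *rxq, void **pkts, uint16_t n)
{
	(void)rxq;
	(void)pkts;
	return (uint16_t)(n / 2);
}

/* Stub standing in for the burst function of a closed device. */
static uint16_t
rx_burst_removed(void *rxq, void **pkts, uint16_t n)
{
	(void)rxq;
	(void)pkts;
	(void)n;
	return 0;
}

/*
 * Return how many L3 ptypes the active burst function reports (two in
 * this sketch: IPv4 and IPv6), or -ENOTSUP if the function is not one
 * known to fill packet_type. Adding a new burst function later only
 * requires extending the comparison.
 */
static int
ptype_count_for_burst(rx_burst_t fn)
{
	if (fn == rx_burst_plain || fn == rx_burst_sp)
		return 2;
	return -ENOTSUP;
}
```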
[dpdk-dev] [PATCH] fix checkpatch errors
On 1/5/2016 6:22 PM, Mcnamara, John wrote:
>> -----Original Message-----
>> From: Tan, Jianfeng
>> Sent: Tuesday, January 5, 2016 2:21 AM
>> To: Xie, Huawei; dev at dpdk.org
>> Cc: Mcnamara, John; Stephen Hemminger; Yuanhan Liu
>> Subject: RE: [PATCH] fix checkpatch errors
>>
>>
>>
>>> -----Original Message-----
>>> From: Xie, Huawei
>>> Sent: Monday, January 4, 2016 9:52 AM
>>> To: dev at dpdk.org
>>> Cc: Mcnamara, John; Tan, Jianfeng; Xie, Huawei
>>> Subject: [PATCH] fix checkpatch errors
>>>
>>> Signed-off-by: Huawei Xie
>> ...
>>> mbuf_poolname_build(sock_id, pool_name, sizeof(pool_name));
>>> - return (rte_mempool_lookup((const char *)pool_name));
>>> + return rte_mempool_lookup((const char *)pool_name);
>> Hi Huawei,
>>
>> Assume this patch is to solve below error (reported by checkpatch):
>> ERROR: return is not a function, parentheses are not required
>>
>> So maybe above fix is not necessary? Involve more people to discuss.
>>
>> And please include the error message in the commit message.
> Hi Huawei,
>
> The fix looks good and there was a similar patch applied previously for lib
> (from Ferruh):
>
> 6307b909b8e0 ("lib: remove extra parenthesis after return")

Oh yes, but no idea why Ferruh Yigit missed so many. I have grepped for the pattern, so this patch should fix almost all of them.

>
> However, the commit message could be better. Maybe something like the above:
> "remove extra parentheses".

OK. Weird that my commit message got lost again. Will send a new one.

>
> John
[dpdk-dev] [PATCH] bnx2x: remove unused mbuf_alloc_size
The mbuf_alloc_size is leftover from BSD or some other code base.
It is set but never used in DPDK driver.

After that the related defines can also be eliminated.

Signed-off-by: Stephen Hemminger
---
 drivers/net/bnx2x/bnx2x.c | 9 -
 drivers/net/bnx2x/bnx2x.h | 18 --
 2 files changed, 27 deletions(-)

diff --git a/drivers/net/bnx2x/bnx2x.c b/drivers/net/bnx2x/bnx2x.c
index 67af5da..6ba6f44 100644
--- a/drivers/net/bnx2x/bnx2x.c
+++ b/drivers/net/bnx2x/bnx2x.c
@@ -2331,15 +2331,6 @@ static void bnx2x_set_fp_rx_buf_size(struct bnx2x_softc *sc)
 		/* get the Rx buffer size for RX frames */
 		sc->fp[i].rx_buf_size =
 		    (IP_HEADER_ALIGNMENT_PADDING + ETH_OVERHEAD + sc->mtu);
-
-		/* get the mbuf allocation size for RX frames */
-		if (sc->fp[i].rx_buf_size <= MCLBYTES) {
-			sc->fp[i].mbuf_alloc_size = MCLBYTES;
-		} else if (sc->fp[i].rx_buf_size <= BNX2X_PAGE_SIZE) {
-			sc->fp[i].mbuf_alloc_size = PAGE_SIZE;
-		} else {
-			sc->fp[i].mbuf_alloc_size = MJUM9BYTES;
-		}
 	}
 }
 
diff --git a/drivers/net/bnx2x/bnx2x.h b/drivers/net/bnx2x/bnx2x.h
index 2abab0c..9682b8d 100644
--- a/drivers/net/bnx2x/bnx2x.h
+++ b/drivers/net/bnx2x/bnx2x.h
@@ -151,23 +151,6 @@ struct bnx2x_device_type {
 #define FW_PREFETCH_CNT 16U
 #define DROPLESS_FC_HEADROOM 100
 
-#ifndef MCLSHIFT
-#define MCLSHIFT 11
-#endif
-#define MCLBYTES (1 << MCLSHIFT)
-
-#if !defined(MJUMPAGESIZE)
-#if BNX2X_PAGE_SIZE < 2048
-#define MJUMPAGESIZE MCLBYTES
-#elif BNX2X_PAGE_SIZE <= 8192
-#define MJUMPAGESIZE BNX2X_PAGE_SIZE
-#else
-#define MJUMPAGESIZE (8 * 1024)
-#endif
-#endif
-#define MJUM9BYTES (9 * 1024)
-#define MJUM16BYTES (16 * 1024)
-
 /*
  * Transmit Buffer Descriptor (tx_bd) definitions*
  */
@@ -402,7 +385,6 @@ struct bnx2x_fastpath {
 	uint8_t fw_sb_id; /* status block number in FW */
 
 	uint32_t rx_buf_size;
-	int mbuf_alloc_size;
 
 	int state;
 #define BNX2X_FP_STATE_CLOSED 0x01
--
2.1.4
[dpdk-dev] [PATCH v3 1/3] librte_ether: remove RTE_PROC_PRIMARY_OR_ERR_RET and RTE_PROC_PRIMARY_OR_RET
Macros RTE_PROC_PRIMARY_OR_ERR_RET and RTE_PROC_PRIMARY_OR_RET are blocking the secondary process from using the APIs. API access should be given to both secondary and primary. Reported-by: Sean Harte Signed-off-by: Reshma Pattan --- v3: * Removed checkpatch fixes of lib/librte_ether/rte_ethdev.h from this patch lib/librte_ether/rte_ethdev.c | 50 + 1 files changed, 1 insertions(+), 49 deletions(-) diff --git a/lib/librte_ether/rte_ethdev.c b/lib/librte_ether/rte_ethdev.c index ed971b4..5849102 100644 --- a/lib/librte_ether/rte_ethdev.c +++ b/lib/librte_ether/rte_ethdev.c @@ -711,10 +711,6 @@ rte_eth_dev_rx_queue_start(uint8_t port_id, uint16_t rx_queue_id) { struct rte_eth_dev *dev; - /* This function is only safe when called from the primary process -* in a multi-process setup*/ - RTE_PROC_PRIMARY_OR_ERR_RET(-E_RTE_SECONDARY); - RTE_ETH_VALID_PORTID_OR_ERR_RET(port_id, -EINVAL); dev = &rte_eth_devices[port_id]; @@ -741,10 +737,6 @@ rte_eth_dev_rx_queue_stop(uint8_t port_id, uint16_t rx_queue_id) { struct rte_eth_dev *dev; - /* This function is only safe when called from the primary process -* in a multi-process setup*/ - RTE_PROC_PRIMARY_OR_ERR_RET(-E_RTE_SECONDARY); - RTE_ETH_VALID_PORTID_OR_ERR_RET(port_id, -EINVAL); dev = &rte_eth_devices[port_id]; @@ -771,10 +763,6 @@ rte_eth_dev_tx_queue_start(uint8_t port_id, uint16_t tx_queue_id) { struct rte_eth_dev *dev; - /* This function is only safe when called from the primary process -* in a multi-process setup*/ - RTE_PROC_PRIMARY_OR_ERR_RET(-E_RTE_SECONDARY); - RTE_ETH_VALID_PORTID_OR_ERR_RET(port_id, -EINVAL); dev = &rte_eth_devices[port_id]; @@ -801,10 +789,6 @@ rte_eth_dev_tx_queue_stop(uint8_t port_id, uint16_t tx_queue_id) { struct rte_eth_dev *dev; - /* This function is only safe when called from the primary process -* in a multi-process setup*/ - RTE_PROC_PRIMARY_OR_ERR_RET(-E_RTE_SECONDARY); - RTE_ETH_VALID_PORTID_OR_ERR_RET(port_id, -EINVAL); dev = &rte_eth_devices[port_id]; @@ -874,10 +858,6 @@ 
rte_eth_dev_configure(uint8_t port_id, uint16_t nb_rx_q, uint16_t nb_tx_q, struct rte_eth_dev_info dev_info; int diag; - /* This function is only safe when called from the primary process -* in a multi-process setup*/ - RTE_PROC_PRIMARY_OR_ERR_RET(-E_RTE_SECONDARY); - RTE_ETH_VALID_PORTID_OR_ERR_RET(port_id, -EINVAL); if (nb_rx_q > RTE_MAX_QUEUES_PER_PORT) { @@ -1059,10 +1039,6 @@ rte_eth_dev_start(uint8_t port_id) struct rte_eth_dev *dev; int diag; - /* This function is only safe when called from the primary process -* in a multi-process setup*/ - RTE_PROC_PRIMARY_OR_ERR_RET(-E_RTE_SECONDARY); - RTE_ETH_VALID_PORTID_OR_ERR_RET(port_id, -EINVAL); dev = &rte_eth_devices[port_id]; @@ -1096,10 +1072,6 @@ rte_eth_dev_stop(uint8_t port_id) { struct rte_eth_dev *dev; - /* This function is only safe when called from the primary process -* in a multi-process setup*/ - RTE_PROC_PRIMARY_OR_RET(); - RTE_ETH_VALID_PORTID_OR_RET(port_id); dev = &rte_eth_devices[port_id]; @@ -1121,10 +1093,6 @@ rte_eth_dev_set_link_up(uint8_t port_id) { struct rte_eth_dev *dev; - /* This function is only safe when called from the primary process -* in a multi-process setup*/ - RTE_PROC_PRIMARY_OR_ERR_RET(-E_RTE_SECONDARY); - RTE_ETH_VALID_PORTID_OR_ERR_RET(port_id, -EINVAL); dev = &rte_eth_devices[port_id]; @@ -1138,10 +1106,6 @@ rte_eth_dev_set_link_down(uint8_t port_id) { struct rte_eth_dev *dev; - /* This function is only safe when called from the primary process -* in a multi-process setup*/ - RTE_PROC_PRIMARY_OR_ERR_RET(-E_RTE_SECONDARY); - RTE_ETH_VALID_PORTID_OR_ERR_RET(port_id, -EINVAL); dev = &rte_eth_devices[port_id]; @@ -1155,10 +1119,6 @@ rte_eth_dev_close(uint8_t port_id) { struct rte_eth_dev *dev; - /* This function is only safe when called from the primary process -* in a multi-process setup*/ - RTE_PROC_PRIMARY_OR_RET(); - RTE_ETH_VALID_PORTID_OR_RET(port_id); dev = &rte_eth_devices[port_id]; @@ -1183,10 +1143,6 @@ rte_eth_rx_queue_setup(uint8_t port_id, uint16_t rx_queue_id, 
struct rte_eth_dev *dev; struct rte_eth_dev_info dev_info; - /* This function is only safe when called from the primary process -* in a multi-process setup*/ - RTE_PROC_PRIMARY_OR_ERR_RET(-E_RTE_SECONDARY); - RTE_ETH_VALID_PORTID_OR_ERR_RET(port_id, -EINVAL); dev = &rte_eth_devices[port_id]; @@ -1266,10 +1222,6 @@ rte_eth_tx_queue_setup(uint8_t port_id, uint16_t tx_queue_id, struct rte_eth_dev *dev; s
[dpdk-dev] [PATCH v3 3/3] librte_ether: fix rte_eth_dev_configure
User should be able to configure ethdev with zero rx/tx queues, but both should not be zero. After above change, rte_eth_dev_tx_queue_config, rte_eth_dev_rx_queue_config should allocate memory for rx/tx queues only when number of rx/tx queues are nonzero. Signed-off-by: Reshma Pattan --- lib/librte_ether/rte_ethdev.c | 36 1 files changed, 24 insertions(+), 12 deletions(-) diff --git a/lib/librte_ether/rte_ethdev.c b/lib/librte_ether/rte_ethdev.c index 5849102..a7647b6 100644 --- a/lib/librte_ether/rte_ethdev.c +++ b/lib/librte_ether/rte_ethdev.c @@ -673,7 +673,7 @@ rte_eth_dev_rx_queue_config(struct rte_eth_dev *dev, uint16_t nb_queues) void **rxq; unsigned i; - if (dev->data->rx_queues == NULL) { /* first time configuration */ + if (dev->data->rx_queues == NULL && nb_queues != 0) { /* first time configuration */ dev->data->rx_queues = rte_zmalloc("ethdev->rx_queues", sizeof(dev->data->rx_queues[0]) * nb_queues, RTE_CACHE_LINE_SIZE); @@ -681,7 +681,7 @@ rte_eth_dev_rx_queue_config(struct rte_eth_dev *dev, uint16_t nb_queues) dev->data->nb_rx_queues = 0; return -(ENOMEM); } - } else { /* re-configure */ + } else if (dev->data->rx_queues != NULL && nb_queues != 0) { /* re-configure */ RTE_FUNC_PTR_OR_ERR_RET(*dev->dev_ops->rx_queue_release, -ENOTSUP); rxq = dev->data->rx_queues; @@ -701,6 +701,13 @@ rte_eth_dev_rx_queue_config(struct rte_eth_dev *dev, uint16_t nb_queues) dev->data->rx_queues = rxq; + } else if (dev->data->rx_queues != NULL && nb_queues == 0) { + RTE_FUNC_PTR_OR_ERR_RET(*dev->dev_ops->rx_queue_release, -ENOTSUP); + + rxq = dev->data->rx_queues; + + for (i = nb_queues; i < old_nb_queues; i++) + (*dev->dev_ops->rx_queue_release)(rxq[i]); } dev->data->nb_rx_queues = nb_queues; return 0; @@ -817,7 +824,7 @@ rte_eth_dev_tx_queue_config(struct rte_eth_dev *dev, uint16_t nb_queues) void **txq; unsigned i; - if (dev->data->tx_queues == NULL) { /* first time configuration */ + if (dev->data->tx_queues == NULL && nb_queues != 0) { /* first time configuration */ 
dev->data->tx_queues = rte_zmalloc("ethdev->tx_queues", sizeof(dev->data->tx_queues[0]) * nb_queues, RTE_CACHE_LINE_SIZE); @@ -825,7 +832,7 @@ rte_eth_dev_tx_queue_config(struct rte_eth_dev *dev, uint16_t nb_queues) dev->data->nb_tx_queues = 0; return -(ENOMEM); } - } else { /* re-configure */ + } else if (dev->data->tx_queues != NULL && nb_queues != 0) { /* re-configure */ RTE_FUNC_PTR_OR_ERR_RET(*dev->dev_ops->tx_queue_release, -ENOTSUP); txq = dev->data->tx_queues; @@ -845,6 +852,13 @@ rte_eth_dev_tx_queue_config(struct rte_eth_dev *dev, uint16_t nb_queues) dev->data->tx_queues = txq; + } else if (dev->data->tx_queues != NULL && nb_queues == 0) { + RTE_FUNC_PTR_OR_ERR_RET(*dev->dev_ops->tx_queue_release, -ENOTSUP); + + txq = dev->data->tx_queues; + + for (i = nb_queues; i < old_nb_queues; i++) + (*dev->dev_ops->tx_queue_release)(txq[i]); } dev->data->nb_tx_queues = nb_queues; return 0; @@ -891,25 +905,23 @@ rte_eth_dev_configure(uint8_t port_id, uint16_t nb_rx_q, uint16_t nb_tx_q, * configured device. */ (*dev->dev_ops->dev_infos_get)(dev, &dev_info); + + if (nb_rx_q == 0 && nb_tx_q == 0) { + RTE_PMD_DEBUG_TRACE("ethdev port_id=%d both rx and tx queue cannot be 0\n", port_id); + return -EINVAL; + } + if (nb_rx_q > dev_info.max_rx_queues) { RTE_PMD_DEBUG_TRACE("ethdev port_id=%d nb_rx_queues=%d > %d\n", port_id, nb_rx_q, dev_info.max_rx_queues); return -EINVAL; } - if (nb_rx_q == 0) { - RTE_PMD_DEBUG_TRACE("ethdev port_id=%d nb_rx_q == 0\n", port_id); - return -EINVAL; - } if (nb_tx_q > dev_info.max_tx_queues) { RTE_PMD_DEBUG_TRACE("ethdev port_id=%d nb_tx_queues=%d > %d\n", port_id, nb_tx_q, dev_info.max_tx_queues); return -EINVAL; } - if (nb_tx_q == 0) { - RTE_PMD_DEBUG_TRACE("ethdev port_id=%d nb_tx_q == 0\n", port_id); - return -EINVAL; - } /* Copy the dev_conf parameter into the dev structure */ memcpy(&dev->data->dev_conf, dev_conf, sizeof(dev->data->dev_conf)); -- 1.7.4
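The reordered checks in rte_eth_dev_configure() from this patch boil down to the validation sketched below. check_queue_counts() is a hypothetical helper written for this note, not a function in librte_ether; it only mirrors the conditions visible in the diff.

```c
#include <errno.h>
#include <stdint.h>

/*
 * Mirror of the queue-count checks after the patch: either count may
 * be zero on its own, but not both, and neither may exceed the
 * device maximum. Returns 0 when the combination is acceptable.
 */
static int
check_queue_counts(uint16_t nb_rx_q, uint16_t nb_tx_q,
		   uint16_t max_rx_q, uint16_t max_tx_q)
{
	if (nb_rx_q == 0 && nb_tx_q == 0)
		return -EINVAL;
	if (nb_rx_q > max_rx_q)
		return -EINVAL;
	if (nb_tx_q > max_tx_q)
		return -EINVAL;
	return 0;
}
```

A tx-only or rx-only port (for example nb_rx_q == 0, nb_tx_q == 1) passes this check, which is why the queue config functions must then skip allocation when their count is zero.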
[dpdk-dev] [PATCH v3 0/3] fix RTE_PROC_PRIMARY_OR_ERR_RET RTE_PROC_PRIMARY_OR_RET
From: reshmapa

Patches 1 and 2 remove RTE_PROC_PRIMARY_OR_ERR_RET and RTE_PROC_PRIMARY_OR_RET macro usage from the rte_ether and rte_cryptodev libraries to allow API access from secondary processes.

Patch 3 allows users to configure ethdev with zero rx or tx queues, but not with both set to zero. It fixes rte_eth_dev_tx_queue_config and rte_eth_dev_rx_queue_config to allocate memory for rx/tx queues only when the number of rx/tx queues is nonzero.

v3:
* Removed checkpatch fixes of lib/librte_ether/rte_ethdev.h from patch number 1.

Reshma Pattan (3):
  librte_ether: remove RTE_PROC_PRIMARY_OR_ERR_RET and RTE_PROC_PRIMARY_OR_RET
  librte_cryptodev: remove RTE_PROC_PRIMARY_OR_RET
  librte_ether: fix rte_eth_dev_configure

 lib/librte_cryptodev/rte_cryptodev.c | 42
 lib/librte_ether/rte_ethdev.c | 86 ++
 2 files changed, 25 insertions(+), 103 deletions(-)

--
1.7.4.1
[dpdk-dev] [PATCH v3 2/3] librte_cryptodev: remove RTE_PROC_PRIMARY_OR_RET
Macro RTE_PROC_PRIMARY_OR_ERR_RET blocking the secondary process from API usage. API access should be given to both secondary and primary. Signed-off-by: Reshma Pattan --- lib/librte_cryptodev/rte_cryptodev.c | 42 -- 1 files changed, 0 insertions(+), 42 deletions(-) diff --git a/lib/librte_cryptodev/rte_cryptodev.c b/lib/librte_cryptodev/rte_cryptodev.c index f09f67e..207e92c 100644 --- a/lib/librte_cryptodev/rte_cryptodev.c +++ b/lib/librte_cryptodev/rte_cryptodev.c @@ -532,12 +532,6 @@ rte_cryptodev_queue_pair_start(uint8_t dev_id, uint16_t queue_pair_id) { struct rte_cryptodev *dev; - /* -* This function is only safe when called from the primary process -* in a multi-process setup -*/ - RTE_PROC_PRIMARY_OR_ERR_RET(-E_RTE_SECONDARY); - if (!rte_cryptodev_pmd_is_valid_dev(dev_id)) { CDEV_LOG_ERR("Invalid dev_id=%" PRIu8, dev_id); return -EINVAL; @@ -560,12 +554,6 @@ rte_cryptodev_queue_pair_stop(uint8_t dev_id, uint16_t queue_pair_id) { struct rte_cryptodev *dev; - /* -* This function is only safe when called from the primary process -* in a multi-process setup -*/ - RTE_PROC_PRIMARY_OR_ERR_RET(-E_RTE_SECONDARY); - if (!rte_cryptodev_pmd_is_valid_dev(dev_id)) { CDEV_LOG_ERR("Invalid dev_id=%" PRIu8, dev_id); return -EINVAL; @@ -593,12 +581,6 @@ rte_cryptodev_configure(uint8_t dev_id, struct rte_cryptodev_config *config) struct rte_cryptodev *dev; int diag; - /* -* This function is only safe when called from the primary process -* in a multi-process setup -*/ - RTE_PROC_PRIMARY_OR_ERR_RET(-E_RTE_SECONDARY); - if (!rte_cryptodev_pmd_is_valid_dev(dev_id)) { CDEV_LOG_ERR("Invalid dev_id=%" PRIu8, dev_id); return (-EINVAL); @@ -635,12 +617,6 @@ rte_cryptodev_start(uint8_t dev_id) CDEV_LOG_DEBUG("Start dev_id=%" PRIu8, dev_id); - /* -* This function is only safe when called from the primary process -* in a multi-process setup -*/ - RTE_PROC_PRIMARY_OR_ERR_RET(-E_RTE_SECONDARY); - if (!rte_cryptodev_pmd_is_valid_dev(dev_id)) { CDEV_LOG_ERR("Invalid dev_id=%" PRIu8, 
dev_id); return (-EINVAL); @@ -670,12 +646,6 @@ rte_cryptodev_stop(uint8_t dev_id) { struct rte_cryptodev *dev; - /* -* This function is only safe when called from the primary process -* in a multi-process setup -*/ - RTE_PROC_PRIMARY_OR_RET(); - if (!rte_cryptodev_pmd_is_valid_dev(dev_id)) { CDEV_LOG_ERR("Invalid dev_id=%" PRIu8, dev_id); return; @@ -701,12 +671,6 @@ rte_cryptodev_close(uint8_t dev_id) struct rte_cryptodev *dev; int retval; - /* -* This function is only safe when called from the primary process -* in a multi-process setup -*/ - RTE_PROC_PRIMARY_OR_ERR_RET(-EINVAL); - if (!rte_cryptodev_pmd_is_valid_dev(dev_id)) { CDEV_LOG_ERR("Invalid dev_id=%" PRIu8, dev_id); return -1; @@ -747,12 +711,6 @@ rte_cryptodev_queue_pair_setup(uint8_t dev_id, uint16_t queue_pair_id, { struct rte_cryptodev *dev; - /* -* This function is only safe when called from the primary process -* in a multi-process setup -*/ - RTE_PROC_PRIMARY_OR_ERR_RET(-E_RTE_SECONDARY); - if (!rte_cryptodev_pmd_is_valid_dev(dev_id)) { CDEV_LOG_ERR("Invalid dev_id=%" PRIu8, dev_id); return (-EINVAL); -- 1.7.4.1
[dpdk-dev] [PATCH 12/12] examples/l3fwd: add option to parse ptype
Hi Jianfeng,

> > > +static int
> > > +check_packet_type_ok(int portid)
> > > +{
> > > + int i;
> > > + int ret;
> > > + uint32_t ptypes[RTE_PTYPE_L3_MAX_NUM];
> > > + int ptype_l3_ipv4 = 0, ptype_l3_ipv6 = 0;
> > > +
> > > + ret = rte_eth_dev_get_ptype_info(portid, RTE_PTYPE_L3_MASK, ptypes);
> > > + for (i = 0; i < ret; ++i) {
> > > + if (ptypes[i] & RTE_PTYPE_L3_IPV4)
> > > + ptype_l3_ipv4 = 1;
> > > + if (ptypes[i] & RTE_PTYPE_L3_IPV6)
> > > + ptype_l3_ipv6 = 1;
> > > + }
> > > +
> > > + if (ptype_l3_ipv4 == 0)
> > > + printf("port %d cannot parse RTE_PTYPE_L3_IPV4\n", portid);
> > > +
> > > + if (ptype_l3_ipv6 == 0)
> > > + printf("port %d cannot parse RTE_PTYPE_L3_IPV6\n", portid);
> > > +
> > > + if (ptype_l3_ipv4 || ptype_l3_ipv6)
> > > + return 1;

Forgot one thing:
I think it should be:

if (ptype_l3_ipv4 && ptype_l3_ipv6)
	return 1;
return 0;

or just:

return ptype_l3_ipv4 && ptype_l3_ipv6;

Konstantin
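To spell out the difference between the two conditions, here is a small self-contained sketch of the check with the suggested && form. The ptype values and l3_ptypes_ok() are hypothetical, and the sketch compares with equality rather than the bit test used in the quoted patch, purely to keep the example unambiguous.

```c
#include <stdint.h>

/* Hypothetical ptype values, for illustration only. */
#define PTYPE_L3_IPV4 0x10u
#define PTYPE_L3_IPV6 0x40u

/*
 * Return 1 only if the reported list contains both IPv4 and IPv6.
 * With || instead of &&, a port that can classify only one of the
 * two would wrongly be accepted, and the application would then
 * skip software parsing for traffic the hardware cannot classify.
 */
static int
l3_ptypes_ok(const uint32_t *ptypes, int n)
{
	int ipv4 = 0, ipv6 = 0;
	int i;

	for (i = 0; i < n; i++) {
		if (ptypes[i] == PTYPE_L3_IPV4)
			ipv4 = 1;
		if (ptypes[i] == PTYPE_L3_IPV6)
			ipv6 = 1;
	}
	return ipv4 && ipv6;
}
```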
[dpdk-dev] [PATCH 01/12] ethdev: add API to query what/if packet type is set
Hi Neilo, > -Original Message- > From: N?lio Laranjeiro [mailto:nelio.laranjeiro at 6wind.com] > Sent: Tuesday, January 05, 2016 4:14 PM > To: Tan, Jianfeng > Cc: Adrien Mazarguil; Ananyev, Konstantin; dev at dpdk.org > Subject: Re: [dpdk-dev] [PATCH 01/12] ethdev: add API to query what/if packet > type is set > > On Mon, Jan 04, 2016 at 02:36:14PM +, Ananyev, Konstantin wrote: > > > > > > > -Original Message- > > > From: dev [mailto:dev-bounces at dpdk.org] On Behalf Of Adrien Mazarguil > > > Sent: Monday, January 04, 2016 11:38 AM > > > To: Tan, Jianfeng > > > Cc: dev at dpdk.org > > > Subject: Re: [dpdk-dev] [PATCH 01/12] ethdev: add API to query what/if > > > packet type is set > > > > > > I'm not sure about the usefulness of this new callback, but one issue I > > > see > > > with rte_eth_dev_get_ptype_info() is that determining the proper size for > > > ptypes[] according to a mask is awkward. For instance suppose > > > RTE_PTYPE_L4_MASK is redefined to a different size at some point, the > > > caller > > > must dynamically adjust its ptypes[] array size to avoid a possible > > > overflow, just in case. > > > > > > I suggest one of these solutions: > > > > > > - A callback to query for a single type at once instead (easiest method in > > > my opinion). > > > > > > - An additional argument with the number of entries in ptypes[], in which > > > case rte_eth_dev_get_ptype_info() should return the number of entries > > > that > > > would have been filled regardless, a bit like snprintf(). > > > > +1 for the second option. > > Also not sure you really need: RTE_PTYPE_*_MAX_NUM macros. > > Konstantin > > +1 for the second option. But see below. > > > > > > > On Thu, Dec 31, 2015 at 02:53:08PM +0800, Jianfeng Tan wrote: > > > > Add a new API rte_eth_dev_get_ptype_info to query what/if packet type > > > > will > > > > be set by current rx burst function. 
> > > > > > > > Signed-off-by: Jianfeng Tan > > > > --- > > > > lib/librte_ether/rte_ethdev.c | 12 > > > > lib/librte_ether/rte_ethdev.h | 22 ++ > > > > lib/librte_mbuf/rte_mbuf.h| 13 + > > > > 3 files changed, 47 insertions(+) > > > > > > > > diff --git a/lib/librte_ether/rte_ethdev.c > > > > b/lib/librte_ether/rte_ethdev.c > > > > index ed971b4..1885374 100644 > > > > --- a/lib/librte_ether/rte_ethdev.c > > > > +++ b/lib/librte_ether/rte_ethdev.c > > > > @@ -1614,6 +1614,18 @@ rte_eth_dev_info_get(uint8_t port_id, struct > > > > rte_eth_dev_info *dev_info) > > > > dev_info->driver_name = dev->data->drv_name; > > > > } > > > > > > > > +int > > > > +rte_eth_dev_get_ptype_info(uint8_t port_id, uint32_t ptype_mask, > > > > + uint32_t ptypes[]) > > > > +{ > > > > + struct rte_eth_dev *dev; > > > > + > > > > + RTE_ETH_VALID_PORTID_OR_ERR_RET(port_id, -ENODEV); > > > > + dev = &rte_eth_devices[port_id]; > > > > + RTE_FUNC_PTR_OR_ERR_RET(*dev->dev_ops->dev_ptype_info_get, > > > > -ENOTSUP); > > > > + return (*dev->dev_ops->dev_ptype_info_get)(dev, ptype_mask, > > > > ptypes); > > > > +} > > > > + > > > > void > > > > rte_eth_macaddr_get(uint8_t port_id, struct ether_addr *mac_addr) > > > > { > > > > diff --git a/lib/librte_ether/rte_ethdev.h > > > > b/lib/librte_ether/rte_ethdev.h > > > > index bada8ad..e97b632 100644 > > > > --- a/lib/librte_ether/rte_ethdev.h > > > > +++ b/lib/librte_ether/rte_ethdev.h > > > > @@ -1021,6 +1021,10 @@ typedef void (*eth_dev_infos_get_t)(struct > > > > rte_eth_dev *dev, > > > > struct rte_eth_dev_info *dev_info); > > > > /**< @internal Get specific informations of an Ethernet device. */ > > > > > > > > +typedef int (*eth_dev_ptype_info_get_t)(struct rte_eth_dev *dev, > > > > + uint32_t ptype_mask, uint32_t ptypes[]); > > > > +/**< @internal Get ptype info of eth_rx_burst_t. 
*/ > > > > + > > > > typedef int (*eth_queue_start_t)(struct rte_eth_dev *dev, > > > > uint16_t queue_id); > > > > /**< @internal Start rx and tx of a queue of an Ethernet device. */ > > > > @@ -1347,6 +1351,7 @@ struct eth_dev_ops { > > > > eth_queue_stats_mapping_set_t queue_stats_mapping_set; > > > > /**< Configure per queue stat counter mapping. */ > > > > eth_dev_infos_get_tdev_infos_get; /**< Get device info. > > > > */ > > > > + eth_dev_ptype_info_get_t dev_ptype_info_get; /** Get ptype > > > > info */ > > > > mtu_set_t mtu_set; /**< Set MTU. */ > > > > vlan_filter_set_t vlan_filter_set; /**< Filter VLAN > > > > Setup. */ > > > > vlan_tpid_set_tvlan_tpid_set; /**< Outer VLAN > > > > TPID Setup. */ > > > > @@ -2273,6 +2278,23 @@ extern void rte_eth_dev_info_get(uint8_t port_id, > > > > struct rte_eth_dev_info *dev_info); > > > > > > > > /** > > > > + * Retrieve the contextual
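On the second option discussed in this thread (an extra argument giving the number of entries in ptypes[], with an snprintf()-like return value), a minimal sketch might look like the following. This is not the API from the patch: demo_get_ptype_info() and its 'supported' table are hypothetical, and stand in for whatever a driver would report.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Copy up to 'num' matching entries into ptypes[] and return how many
 * entries match in total, whether or not they all fitted. The caller can
 * pass num == 0 (and ptypes == NULL) to learn the required array size.
 */
static int
demo_get_ptype_info(const uint32_t *supported, int nb_supported,
		    uint32_t ptype_mask, uint32_t *ptypes, int num)
{
	int i, found = 0;

	for (i = 0; i < nb_supported; ++i) {
		if ((supported[i] & ptype_mask) == 0)
			continue;
		if (found < num)
			ptypes[found] = supported[i];
		found++;
	}

	return found;
}
```

A caller would first query with num == 0, size its array from the return value, then call again. That removes the need for fixed *_MAX_NUM sizes on the caller side.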
[dpdk-dev] [PATCH] bnx2x: remove unused mbuf_alloc_size
>
>The mbuf_alloc_size is leftover from BSD or some other code base.
>It is set but never used in DPDK driver. After that the related defines
>can also be eliminated.
>
>Signed-off-by: Stephen Hemminger
>---
> drivers/net/bnx2x/bnx2x.c |  9 ---------
> drivers/net/bnx2x/bnx2x.h | 18 ------------------
> 2 files changed, 27 deletions(-)
>
>diff --git a/drivers/net/bnx2x/bnx2x.c b/drivers/net/bnx2x/bnx2x.c
>index 67af5da..6ba6f44 100644
>--- a/drivers/net/bnx2x/bnx2x.c
>+++ b/drivers/net/bnx2x/bnx2x.c
>@@ -2331,15 +2331,6 @@ static void bnx2x_set_fp_rx_buf_size(struct bnx2x_softc *sc)
> 		/* get the Rx buffer size for RX frames */
> 		sc->fp[i].rx_buf_size =
> 		    (IP_HEADER_ALIGNMENT_PADDING + ETH_OVERHEAD + sc->mtu);
>-
>-		/* get the mbuf allocation size for RX frames */
>-		if (sc->fp[i].rx_buf_size <= MCLBYTES) {
>-			sc->fp[i].mbuf_alloc_size = MCLBYTES;
>-		} else if (sc->fp[i].rx_buf_size <= BNX2X_PAGE_SIZE) {
>-			sc->fp[i].mbuf_alloc_size = PAGE_SIZE;
>-		} else {
>-			sc->fp[i].mbuf_alloc_size = MJUM9BYTES;
>-		}
> 	}
> }
>
>diff --git a/drivers/net/bnx2x/bnx2x.h b/drivers/net/bnx2x/bnx2x.h
>index 2abab0c..9682b8d 100644
>--- a/drivers/net/bnx2x/bnx2x.h
>+++ b/drivers/net/bnx2x/bnx2x.h
>@@ -151,23 +151,6 @@ struct bnx2x_device_type {
> #define FW_PREFETCH_CNT 16U
> #define DROPLESS_FC_HEADROOM 100
>
>-#ifndef MCLSHIFT
>-#define MCLSHIFT 11
>-#endif
>-#define MCLBYTES (1 << MCLSHIFT)
>-
>-#if !defined(MJUMPAGESIZE)
>-#if BNX2X_PAGE_SIZE < 2048
>-#define MJUMPAGESIZE MCLBYTES
>-#elif BNX2X_PAGE_SIZE <= 8192
>-#define MJUMPAGESIZE BNX2X_PAGE_SIZE
>-#else
>-#define MJUMPAGESIZE (8 * 1024)
>-#endif
>-#endif
>-#define MJUM9BYTES (9 * 1024)
>-#define MJUM16BYTES (16 * 1024)
>-
> /*
>  * Transmit Buffer Descriptor (tx_bd) definitions*
>  */
>@@ -402,7 +385,6 @@ struct bnx2x_fastpath {
> 	uint8_t fw_sb_id; /* status block number in FW */
>
> 	uint32_t rx_buf_size;
>-	int mbuf_alloc_size;
>
> 	int state;
> #define BNX2X_FP_STATE_CLOSED 0x01
>--
>2.1.4
>
>

Acked-by: Harish Patil

Thanks,
Harish
[dpdk-dev] [PATCH 1/2] mlx4: add callback to set primary mac address
Signed-off-by: David Marchand
---
 drivers/net/mlx4/mlx4.c | 17 +++++++++++++++++
 1 file changed, 17 insertions(+)

diff --git a/drivers/net/mlx4/mlx4.c b/drivers/net/mlx4/mlx4.c
index 207bfe2..acc76d7 100644
--- a/drivers/net/mlx4/mlx4.c
+++ b/drivers/net/mlx4/mlx4.c
@@ -4432,6 +4432,22 @@ end:
 }

 /**
+ * DPDK callback to set the primary MAC address.
+ *
+ * @param dev
+ *   Pointer to Ethernet device structure.
+ * @param mac_addr
+ *   MAC address to register.
+ */
+static void
+mlx4_mac_addr_set(struct rte_eth_dev *dev, struct ether_addr *mac_addr)
+{
+	DEBUG("%p: setting primary MAC address", (void *)dev);
+	mlx4_mac_addr_remove(dev, 0);
+	mlx4_mac_addr_add(dev, mac_addr, 0, 0);
+}
+
+/**
  * DPDK callback to enable promiscuous mode.
  *
  * @param dev
@@ -5004,6 +5020,7 @@ static const struct eth_dev_ops mlx4_dev_ops = {
 	.priority_flow_ctrl_set = NULL,
 	.mac_addr_remove = mlx4_mac_addr_remove,
 	.mac_addr_add = mlx4_mac_addr_add,
+	.mac_addr_set = mlx4_mac_addr_set,
 	.mtu_set = mlx4_dev_set_mtu,
 	.udp_tunnel_add = NULL,
 	.udp_tunnel_del = NULL,
--
1.7.10.4
[dpdk-dev] [PATCH 2/2] mlx5: add callback to set primary mac address
Signed-off-by: David Marchand
---
 drivers/net/mlx5/mlx5.c     |  1 +
 drivers/net/mlx5/mlx5.h     |  1 +
 drivers/net/mlx5/mlx5_mac.c | 16 ++++++++++++++++
 3 files changed, 18 insertions(+)

diff --git a/drivers/net/mlx5/mlx5.c b/drivers/net/mlx5/mlx5.c
index 821ee0f..30d88b5 100644
--- a/drivers/net/mlx5/mlx5.c
+++ b/drivers/net/mlx5/mlx5.c
@@ -162,6 +162,7 @@ static const struct eth_dev_ops mlx5_dev_ops = {
 	.flow_ctrl_set = mlx5_dev_set_flow_ctrl,
 	.mac_addr_remove = mlx5_mac_addr_remove,
 	.mac_addr_add = mlx5_mac_addr_add,
+	.mac_addr_set = mlx5_mac_addr_set,
 	.mtu_set = mlx5_dev_set_mtu,
 	.reta_update = mlx5_dev_rss_reta_update,
 	.reta_query = mlx5_dev_rss_reta_query,
diff --git a/drivers/net/mlx5/mlx5.h b/drivers/net/mlx5/mlx5.h
index b84d31d..2f9a594 100644
--- a/drivers/net/mlx5/mlx5.h
+++ b/drivers/net/mlx5/mlx5.h
@@ -179,6 +179,7 @@ int priv_mac_addr_add(struct priv *, unsigned int,
 int priv_mac_addrs_enable(struct priv *);
 void mlx5_mac_addr_add(struct rte_eth_dev *, struct ether_addr *, uint32_t,
 		       uint32_t);
+void mlx5_mac_addr_set(struct rte_eth_dev *, struct ether_addr *);

 /* mlx5_rss.c */

diff --git a/drivers/net/mlx5/mlx5_mac.c b/drivers/net/mlx5/mlx5_mac.c
index e37ce06..b1f34d9 100644
--- a/drivers/net/mlx5/mlx5_mac.c
+++ b/drivers/net/mlx5/mlx5_mac.c
@@ -488,3 +488,19 @@ mlx5_mac_addr_add(struct rte_eth_dev *dev, struct ether_addr *mac_addr,
 end:
 	priv_unlock(priv);
 }
+
+/**
+ * DPDK callback to set primary MAC address.
+ *
+ * @param dev
+ *   Pointer to Ethernet device structure.
+ * @param mac_addr
+ *   MAC address to register.
+ */
+void
+mlx5_mac_addr_set(struct rte_eth_dev *dev, struct ether_addr *mac_addr)
+{
+	DEBUG("%p: setting primary MAC address", (void *)dev);
+	mlx5_mac_addr_remove(dev, 0);
+	mlx5_mac_addr_add(dev, mac_addr, 0, 0);
+}
--
1.7.10.4
[dpdk-dev] time to kill rte_pci_dev_ids.h
Has anyone looked at getting rid of rte_pci_dev_ids.h? The current method with #ifdef's and putting all devices in one file really doesn't scale well. Something more like other OS's where the data is only in each device driver would be better.
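To illustrate the "data only in each device driver" idea, here is a rough sketch of a per-driver ID table with a sentinel entry, similar in spirit to what other OSes do. Everything here is hypothetical: struct demo_pci_id, demo_drv_id_map, demo_pci_match() and the vendor/device IDs are made up for the example and are not existing DPDK definitions.

```c
#include <stdint.h>

/* Hypothetical ID entry; a real one would likely carry subsystem IDs too. */
struct demo_pci_id {
	uint16_t vendor_id;
	uint16_t device_id;
};

/* Table owned by one (imaginary) driver; the IDs below are made up. */
static const struct demo_pci_id demo_drv_id_map[] = {
	{ 0x1234, 0x0001 },
	{ 0x1234, 0x0002 },
	{ 0, 0 }, /* sentinel: vendor_id == 0 ends the table */
};

/* Return 1 if (vendor, device) is listed in the driver's own table. */
static int
demo_pci_match(const struct demo_pci_id *table, uint16_t vendor,
	       uint16_t device)
{
	for (; table->vendor_id != 0; table++) {
		if (table->vendor_id == vendor && table->device_id == device)
			return 1;
	}
	return 0;
}
```

With a layout like this, adding a device only touches the driver that supports it, and no shared header full of #ifdef blocks is needed.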
[dpdk-dev] [PATCH] vhost: fix leak of fds and mmaps
The common vhost code only supported a single mmap per device. vhost-user worked around this by saving the address/length/fd of each mmap after the end of the rte_virtio_memory struct. This only works if the vhost-user code frees dev->mem, since the common code is unaware of the extra info. The VHOST_USER_RESET_OWNER message is one situation where the common code frees dev->mem and leaks the fds and mappings. This happens every time I shut down a VM. The new code does not keep the fds around since they aren't required for munmap. It saves the address/length in a new structure which is read by the common code. The vhost-cuse changes are only compile tested. Signed-off-by: Rich Lane --- lib/librte_vhost/rte_virtio_net.h | 14 +++-- lib/librte_vhost/vhost_cuse/virtio-net-cdev.c | 24 --- lib/librte_vhost/vhost_user/virtio-net-user.c | 90 --- lib/librte_vhost/virtio-net.c | 24 ++- lib/librte_vhost/virtio-net.h | 3 + 5 files changed, 75 insertions(+), 80 deletions(-) diff --git a/lib/librte_vhost/rte_virtio_net.h b/lib/librte_vhost/rte_virtio_net.h index 10dcb90..5233879 100644 --- a/lib/librte_vhost/rte_virtio_net.h +++ b/lib/librte_vhost/rte_virtio_net.h @@ -144,16 +144,22 @@ struct virtio_memory_regions { uint64_taddress_offset; /**< Offset of region for address translation. */ }; +/** + * Record a memory mapping so that it can be munmap'd later. + */ +struct virtio_memory_mapping { + void *addr; + size_t length; +}; /** * Memory structure includes region and mapping information. */ struct virtio_memory { - uint64_tbase_address; /**< Base QEMU userspace address of the memory file. */ - uint64_tmapped_address; /**< Mapped address of memory file base in our applications memory space. */ - uint64_tmapped_size;/**< Total size of memory file. */ uint32_tnregions; /**< Number of memory regions. */ - struct virtio_memory_regions regions[0]; /**< Memory region information. 
*/ + uint32_tnmappings; /**< Number of memory mappings */ + struct virtio_memory_regionsregions[VHOST_MEMORY_MAX_NREGIONS]; /**< Memory region information. */ + struct virtio_memory_mappingmappings[VHOST_MEMORY_MAX_NREGIONS]; /**< Memory mappings */ }; /** diff --git a/lib/librte_vhost/vhost_cuse/virtio-net-cdev.c b/lib/librte_vhost/vhost_cuse/virtio-net-cdev.c index ae2c3fa..1cd0c52 100644 --- a/lib/librte_vhost/vhost_cuse/virtio-net-cdev.c +++ b/lib/librte_vhost/vhost_cuse/virtio-net-cdev.c @@ -278,15 +278,20 @@ cuse_set_mem_table(struct vhost_device_ctx ctx, if (dev == NULL) return -1; - if (dev->mem && dev->mem->mapped_address) { - munmap((void *)(uintptr_t)dev->mem->mapped_address, - (size_t)dev->mem->mapped_size); - free(dev->mem); + if (nregions > VHOST_MEMORY_MAX_NREGIONS) { + RTE_LOG(ERR, VHOST_CONFIG, + "(%"PRIu64") Too many memory regions (%u, max %u)\n", + dev->device_fh, nregions, + VHOST_MEMORY_MAX_NREGIONS); + return -1; + } + + if (dev->mem) { + rte_vhost_free_mem(dev->mem); dev->mem = NULL; } - dev->mem = calloc(1, sizeof(struct virtio_memory) + - sizeof(struct virtio_memory_regions) * nregions); + dev->mem = calloc(1, sizeof(*dev->mem)); if (dev->mem == NULL) { RTE_LOG(ERR, VHOST_CONFIG, "(%"PRIu64") Failed to allocate memory for dev->mem\n", @@ -325,9 +330,10 @@ cuse_set_mem_table(struct vhost_device_ctx ctx, dev->mem = NULL; return -1; } - dev->mem->mapped_address = mapped_address; - dev->mem->base_address = base_address; - dev->mem->mapped_size = mapped_size; + + rte_vhost_add_mapping(dev->mem, + (void *)(uintptr_t)mapped_address, + mapped_size); } } diff --git a/lib/librte_vhost/vhost_user/virtio-net-user.c b/lib/librte_vhost/vhost_user/virtio-net-user.c index 2934d1c..492927a 100644 --- a/lib/librte_vhost/vhost_user/virtio-net-user.c +++ b/lib/librte_vhost/vhost_user/virtio-net-user.c @@ -48,18 +48,6 @@ #include "vhost-net-user.h" #include "vhost-net.h" -struct orig_region_map { - int fd; - uint64_t mapped_address; - uint64_t mapped_size; - 
uint64_t blksz; -}; - -#define orig_region(ptr, nregions) \ - ((struct orig_region_map *)RTE_PTR_ADD((ptr), \ - sizeof(struct virtio_memory) + \ - sizeof(struct virtio_memory_regions) * (nregions))) - static uint6
[dpdk-dev] [PATCH v2 1/4] vmxnet3: restore tx data ring support
On 1/4/16, 9:16 PM, "Stephen Hemminger" wrote:

>On Mon, 4 Jan 2016 18:28:16 -0800
>Yong Wang wrote:
>
>> Tx data ring support was removed in a previous change
>> to add multi-seg transmit. This change adds it back.
>>
>> Fixes: 7ba5de417e3c ("vmxnet3: support multi-segment transmit")
>>
>> Signed-off-by: Yong Wang
>
>Do you have any numbers to confirm this?
[dpdk-dev] [PATCH v2 3/4] vmxnet3: add TSO support
On 1/4/16, 9:14 PM, "Stephen Hemminger" wrote:

>On Mon, 4 Jan 2016 18:28:18 -0800
>Yong Wang wrote:
>
>> +	mbuf = txq->cmd_ring.buf_info[eop_idx].m;
>> +	if (unlikely(mbuf == NULL))
>> +		rte_panic("EOP desc does not point to a valid mbuf");
>> +	else
>
>The unlikely is really not needed with rte_panic since it is declared
>with cold attribute which has same effect.
>
>Else is unnecessary because rte_panic never returns.

Done.
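A small standalone sketch of the point made in the review, assuming a GCC-compatible compiler. demo_panic() is a hypothetical stand-in for rte_panic(), declared with the cold and noreturn attributes; with those in place the caller needs neither unlikely() nor an else branch.

```c
#include <stdio.h>
#include <stdlib.h>

/*
 * Hypothetical stand-in for rte_panic(): 'cold' tells the compiler the
 * call path is rarely taken, 'noreturn' that control never comes back.
 */
static void demo_panic(const char *msg) __attribute__((noreturn, cold));

static void
demo_panic(const char *msg)
{
	fprintf(stderr, "PANIC: %s\n", msg);
	abort();
}

struct demo_buf {
	int len;
};

static int
demo_buf_len(const struct demo_buf *b)
{
	if (b == NULL)
		demo_panic("NULL buffer");
	/* No else needed: demo_panic() never returns. */
	return b->len;
}
```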
[dpdk-dev] [PATCH v2 3/4] vmxnet3: add TSO support
On 1/4/16, 9:15 PM, "Stephen Hemminger" wrote:

>On Mon, 4 Jan 2016 18:28:18 -0800
>Yong Wang wrote:
>
>> +/* The number of descriptors that are needed for a packet. */
>> +static unsigned
>> +txd_estimate(const struct rte_mbuf *m)
>> +{
>> +	return m->nb_segs;
>> +}
>> +
>
>A wrapper function only really clarifies if it is hiding some information.
>Why not just code this in place?

Sure and removed.
[dpdk-dev] [PATCH v3 4/4] vmxnet3: announce device offload capability
Signed-off-by: Yong Wang
---
 drivers/net/vmxnet3/vmxnet3_ethdev.c | 16 ++++++++++++++--
 1 file changed, 14 insertions(+), 2 deletions(-)

diff --git a/drivers/net/vmxnet3/vmxnet3_ethdev.c b/drivers/net/vmxnet3/vmxnet3_ethdev.c
index c363bf6..8a40127 100644
--- a/drivers/net/vmxnet3/vmxnet3_ethdev.c
+++ b/drivers/net/vmxnet3/vmxnet3_ethdev.c
@@ -693,7 +693,8 @@ vmxnet3_dev_stats_get(struct rte_eth_dev *dev, struct rte_eth_stats *stats)
 }

 static void
-vmxnet3_dev_info_get(__attribute__((unused))struct rte_eth_dev *dev, struct rte_eth_dev_info *dev_info)
+vmxnet3_dev_info_get(__attribute__((unused))struct rte_eth_dev *dev,
+		     struct rte_eth_dev_info *dev_info)
 {
 	dev_info->max_rx_queues = VMXNET3_MAX_RX_QUEUES;
 	dev_info->max_tx_queues = VMXNET3_MAX_TX_QUEUES;
@@ -716,6 +717,17 @@ vmxnet3_dev_info_get(__attribute__((unused))struct rte_eth_dev *dev, struct rte_
 		.nb_min = VMXNET3_DEF_TX_RING_SIZE,
 		.nb_align = 1,
 	};
+
+	dev_info->rx_offload_capa =
+		DEV_RX_OFFLOAD_VLAN_STRIP |
+		DEV_RX_OFFLOAD_UDP_CKSUM |
+		DEV_RX_OFFLOAD_TCP_CKSUM;
+
+	dev_info->tx_offload_capa =
+		DEV_TX_OFFLOAD_VLAN_INSERT |
+		DEV_TX_OFFLOAD_TCP_CKSUM |
+		DEV_TX_OFFLOAD_UDP_CKSUM |
+		DEV_TX_OFFLOAD_TCP_TSO;
 }

 /* return 0 means link status changed, -1 means not changed */
@@ -819,7 +831,7 @@ vmxnet3_dev_vlan_filter_set(struct rte_eth_dev *dev, uint16_t vid, int on)
 	else
 		VMXNET3_CLEAR_VFTABLE_ENTRY(hw->shadow_vfta, vid);

-	/* don't change active filter if in promiscious mode */
+	/* don't change active filter if in promiscuous mode */
 	if (rxConf->rxMode & VMXNET3_RXM_PROMISC)
 		return 0;

--
1.9.1
[dpdk-dev] [PATCH v3 2/4] vmxnet3: add tx l4 cksum offload
Support TCP/UDP checksum offload. Signed-off-by: Yong Wang --- doc/guides/rel_notes/release_2_3.rst | 3 +++ drivers/net/vmxnet3/vmxnet3_rxtx.c | 39 +++- 2 files changed, 33 insertions(+), 9 deletions(-) diff --git a/doc/guides/rel_notes/release_2_3.rst b/doc/guides/rel_notes/release_2_3.rst index a23c8ac..58205fe 100644 --- a/doc/guides/rel_notes/release_2_3.rst +++ b/doc/guides/rel_notes/release_2_3.rst @@ -20,6 +20,9 @@ Drivers Tx data ring has been shown to improve small pkt forwarding performance on vSphere environment. +* **vmxnet3: add tx l4 cksum offload.** + + Support TCP/UDP checksum offload. Libraries ~ diff --git a/drivers/net/vmxnet3/vmxnet3_rxtx.c b/drivers/net/vmxnet3/vmxnet3_rxtx.c index 2202d31..08e6115 100644 --- a/drivers/net/vmxnet3/vmxnet3_rxtx.c +++ b/drivers/net/vmxnet3/vmxnet3_rxtx.c @@ -332,6 +332,8 @@ vmxnet3_xmit_pkts(void *tx_queue, struct rte_mbuf **tx_pkts, uint16_t nb_tx; vmxnet3_tx_queue_t *txq = tx_queue; struct vmxnet3_hw *hw = txq->hw; + Vmxnet3_TxQueueCtrl *txq_ctrl = &txq->shared->ctrl; + uint32_t deferred = rte_le_to_cpu_32(txq_ctrl->txNumDeferred); if (unlikely(txq->stopped)) { PMD_TX_LOG(DEBUG, "Tx queue is stopped."); @@ -413,21 +415,40 @@ vmxnet3_xmit_pkts(void *tx_queue, struct rte_mbuf **tx_pkts, gdesc->txd.tci = txm->vlan_tci; } - /* TODO: Add transmit checksum offload here */ + if (txm->ol_flags & PKT_TX_L4_MASK) { + gdesc->txd.om = VMXNET3_OM_CSUM; + gdesc->txd.hlen = txm->l2_len + txm->l3_len; + + switch (txm->ol_flags & PKT_TX_L4_MASK) { + case PKT_TX_TCP_CKSUM: + gdesc->txd.msscof = gdesc->txd.hlen + offsetof(struct tcp_hdr, cksum); + break; + case PKT_TX_UDP_CKSUM: + gdesc->txd.msscof = gdesc->txd.hlen + offsetof(struct udp_hdr, dgram_cksum); + break; + default: + PMD_TX_LOG(WARNING, "requested cksum offload not supported %#llx", + txm->ol_flags & PKT_TX_L4_MASK); + abort(); + } + } else { + gdesc->txd.hlen = 0; + gdesc->txd.om = VMXNET3_OM_NONE; + gdesc->txd.msscof = 0; + } + + txq_ctrl->txNumDeferred = 
rte_cpu_to_le_32(++deferred); /* flip the GEN bit on the SOP */ rte_compiler_barrier(); gdesc->dword[2] ^= VMXNET3_TXD_GEN; - - txq->shared->ctrl.txNumDeferred++; nb_tx++; } - PMD_TX_LOG(DEBUG, "vmxnet3 txThreshold: %u", txq->shared->ctrl.txThreshold); - - if (txq->shared->ctrl.txNumDeferred >= txq->shared->ctrl.txThreshold) { + PMD_TX_LOG(DEBUG, "vmxnet3 txThreshold: %u", rte_le_to_cpu_32(txq_ctrl->txThreshold)); - txq->shared->ctrl.txNumDeferred = 0; + if (deferred >= rte_le_to_cpu_32(txq_ctrl->txThreshold)) { + txq_ctrl->txNumDeferred = 0; /* Notify vSwitch that packets are available. */ VMXNET3_WRITE_BAR0_REG(hw, (VMXNET3_REG_TXPROD + txq->queue_id * VMXNET3_REG_ALIGN), txq->cmd_ring.next2fill); @@ -728,8 +749,8 @@ vmxnet3_dev_tx_queue_setup(struct rte_eth_dev *dev, PMD_INIT_FUNC_TRACE(); if ((tx_conf->txq_flags & ETH_TXQ_FLAGS_NOXSUMS) != - ETH_TXQ_FLAGS_NOXSUMS) { - PMD_INIT_LOG(ERR, "TX no support for checksum offload yet"); + ETH_TXQ_FLAGS_NOXSUMSCTP) { + PMD_INIT_LOG(ERR, "SCTP checksum offload not supported"); return -EINVAL; } -- 1.9.1
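As a worked illustration of the msscof computation in the patch above (header length plus the offset of the checksum field inside the L4 header), here is a standalone sketch. The demo_* structs are hypothetical copies of the standard TCP and UDP header layouts, not the DPDK tcp_hdr/udp_hdr definitions, and demo_cksum_offset() is an assumed helper name.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical mirrors of the on-wire TCP/UDP header layouts. */
struct demo_tcp_hdr {
	uint16_t src_port;
	uint16_t dst_port;
	uint32_t sent_seq;
	uint32_t recv_ack;
	uint8_t  data_off;
	uint8_t  tcp_flags;
	uint16_t rx_win;
	uint16_t cksum;		/* byte offset 16 in the TCP header */
	uint16_t tcp_urp;
};

struct demo_udp_hdr {
	uint16_t src_port;
	uint16_t dst_port;
	uint16_t dgram_len;
	uint16_t dgram_cksum;	/* byte offset 6 in the UDP header */
};

enum demo_l4 { DEMO_L4_NONE, DEMO_L4_TCP, DEMO_L4_UDP };

/*
 * Byte offset of the L4 checksum field from the start of the frame,
 * or -1 when no supported L4 checksum offload is requested.
 */
static int
demo_cksum_offset(enum demo_l4 l4, uint16_t l2_len, uint16_t l3_len)
{
	int hlen = l2_len + l3_len;

	switch (l4) {
	case DEMO_L4_TCP:
		return hlen + (int)offsetof(struct demo_tcp_hdr, cksum);
	case DEMO_L4_UDP:
		return hlen + (int)offsetof(struct demo_udp_hdr, dgram_cksum);
	default:
		return -1;
	}
}
```

For an untagged Ethernet frame (14 bytes) with a 20-byte IPv4 header, this gives 14 + 20 + 16 for TCP and 14 + 20 + 6 for UDP.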
[dpdk-dev] [PATCH v3 0/4] vmxnet3 TSO and tx cksum offload
v3:
* fixed comments from Stephen
* added performance number for tx data ring

v2:
* fixed some logging issues when debug option turned on
* updated the txq_flags check in vmxnet3_dev_tx_queue_setup()

This patchset adds TCP/UDP checksum offload and TSO to vmxnet3 PMD.
One of the use cases for these features is to support STT. It also
restores the tx data ring feature that was removed from a previous
patch.

Yong Wang (4):
  vmxnet3: restore tx data ring support
  vmxnet3: add tx l4 cksum offload
  vmxnet3: add TSO support
  vmxnet3: announce device offload capability

 doc/guides/rel_notes/release_2_3.rst |  11 +++
 drivers/net/vmxnet3/vmxnet3_ethdev.c |  16 +++-
 drivers/net/vmxnet3/vmxnet3_ring.h   |  13 ---
 drivers/net/vmxnet3/vmxnet3_rxtx.c   | 162 +++
 4 files changed, 151 insertions(+), 51 deletions(-)

--
1.9.1
[dpdk-dev] [PATCH v3 3/4] vmxnet3: add TSO support
This commit adds vmxnet3 TSO support. Verified with test-pmd (set fwd csum) that both tso and non-tso pkts can be successfully transmitted and all segmentes for a tso pkt are correct on the receiver side. Signed-off-by: Yong Wang --- doc/guides/rel_notes/release_2_3.rst | 3 + drivers/net/vmxnet3/vmxnet3_ring.h | 13 - drivers/net/vmxnet3/vmxnet3_rxtx.c | 110 ++- 3 files changed, 85 insertions(+), 41 deletions(-) diff --git a/doc/guides/rel_notes/release_2_3.rst b/doc/guides/rel_notes/release_2_3.rst index 58205fe..ae487bb 100644 --- a/doc/guides/rel_notes/release_2_3.rst +++ b/doc/guides/rel_notes/release_2_3.rst @@ -24,6 +24,9 @@ Drivers Support TCP/UDP checksum offload. +* **vmxnet3: add TSO support.** + + Libraries ~ diff --git a/drivers/net/vmxnet3/vmxnet3_ring.h b/drivers/net/vmxnet3/vmxnet3_ring.h index 612487e..15b19e1 100644 --- a/drivers/net/vmxnet3/vmxnet3_ring.h +++ b/drivers/net/vmxnet3/vmxnet3_ring.h @@ -130,18 +130,6 @@ struct vmxnet3_txq_stats { uint64_ttx_ring_full; }; -typedef struct vmxnet3_tx_ctx { - int ip_type; - bool is_vlan; - bool is_cso; - - uint16_t evl_tag; /* only valid when is_vlan == TRUE */ - uint32_t eth_hdr_size; /* only valid for pkts requesting tso or csum -* offloading */ - uint32_t ip_hdr_size; - uint32_t l4_hdr_size; -} vmxnet3_tx_ctx_t; - typedef struct vmxnet3_tx_queue { struct vmxnet3_hw*hw; struct vmxnet3_cmd_ring cmd_ring; @@ -155,7 +143,6 @@ typedef struct vmxnet3_tx_queue { uint8_t port_id; /**< Device port identifier. 
*/ } vmxnet3_tx_queue_t; - struct vmxnet3_rxq_stats { uint64_t drop_total; uint64_t drop_err; diff --git a/drivers/net/vmxnet3/vmxnet3_rxtx.c b/drivers/net/vmxnet3/vmxnet3_rxtx.c index 08e6115..fc879ee 100644 --- a/drivers/net/vmxnet3/vmxnet3_rxtx.c +++ b/drivers/net/vmxnet3/vmxnet3_rxtx.c @@ -295,27 +295,45 @@ vmxnet3_dev_clear_queues(struct rte_eth_dev *dev) } } +static int +vmxnet3_unmap_pkt(uint16_t eop_idx, vmxnet3_tx_queue_t *txq) +{ + int completed = 0; + struct rte_mbuf *mbuf; + + /* Release cmd_ring descriptor and free mbuf */ + VMXNET3_ASSERT(txq->cmd_ring.base[eop_idx].txd.eop == 1); + + mbuf = txq->cmd_ring.buf_info[eop_idx].m; + if (mbuf == NULL) + rte_panic("EOP desc does not point to a valid mbuf"); + rte_pktmbuf_free(mbuf); + + txq->cmd_ring.buf_info[eop_idx].m = NULL; + + while (txq->cmd_ring.next2comp != eop_idx) { + /* no out-of-order completion */ + VMXNET3_ASSERT(txq->cmd_ring.base[txq->cmd_ring.next2comp].txd.cq == 0); + vmxnet3_cmd_ring_adv_next2comp(&txq->cmd_ring); + completed++; + } + + /* Mark the txd for which tcd was generated as completed */ + vmxnet3_cmd_ring_adv_next2comp(&txq->cmd_ring); + + return completed + 1; +} + static void vmxnet3_tq_tx_complete(vmxnet3_tx_queue_t *txq) { int completed = 0; - struct rte_mbuf *mbuf; vmxnet3_comp_ring_t *comp_ring = &txq->comp_ring; struct Vmxnet3_TxCompDesc *tcd = (struct Vmxnet3_TxCompDesc *) (comp_ring->base + comp_ring->next2proc); while (tcd->gen == comp_ring->gen) { - /* Release cmd_ring descriptor and free mbuf */ - VMXNET3_ASSERT(txq->cmd_ring.base[tcd->txdIdx].txd.eop == 1); - while (txq->cmd_ring.next2comp != tcd->txdIdx) { - mbuf = txq->cmd_ring.buf_info[txq->cmd_ring.next2comp].m; - txq->cmd_ring.buf_info[txq->cmd_ring.next2comp].m = NULL; - rte_pktmbuf_free_seg(mbuf); - - /* Mark the txd for which tcd was generated as completed */ - vmxnet3_cmd_ring_adv_next2comp(&txq->cmd_ring); - completed++; - } + completed += vmxnet3_unmap_pkt(tcd->txdIdx, txq); 
vmxnet3_comp_ring_adv_next2proc(comp_ring); tcd = (struct Vmxnet3_TxCompDesc *)(comp_ring->base + @@ -351,21 +369,43 @@ vmxnet3_xmit_pkts(void *tx_queue, struct rte_mbuf **tx_pkts, struct rte_mbuf *txm = tx_pkts[nb_tx]; struct rte_mbuf *m_seg = txm; int copy_size = 0; + bool tso = (txm->ol_flags & PKT_TX_TCP_SEG) != 0; + /* # of descriptors needed for a packet. */ + unsigned count = txm->nb_segs; - /* Is this packet execessively fragmented, then drop */ - if (unlikely(txm->nb_segs > VMXNET3_MAX_TXD_PER_PKT)) { - ++txq->stats.drop_too_many_segs; - ++txq->stats.drop_total; +
[dpdk-dev] [PATCH v3 1/4] vmxnet3: restore tx data ring support
Tx data ring support was removed in a previous change to add multi-seg transmit. This change adds it back. According to the original commit (2e849373), 64B pkt rate with l2fwd improved by ~20% on an Ivy Bridge server at which point we start to hit some bottleneck on the rx side. I also re-did the same test on a different setup (Haswell processor, ~2.3GHz clock rate) on top of the master and still observed ~17% performance gains. Fixes: 7ba5de417e3c ("vmxnet3: support multi-segment transmit") Signed-off-by: Yong Wang --- doc/guides/rel_notes/release_2_3.rst | 5 + drivers/net/vmxnet3/vmxnet3_rxtx.c | 17 - 2 files changed, 21 insertions(+), 1 deletion(-) diff --git a/doc/guides/rel_notes/release_2_3.rst b/doc/guides/rel_notes/release_2_3.rst index 99de186..a23c8ac 100644 --- a/doc/guides/rel_notes/release_2_3.rst +++ b/doc/guides/rel_notes/release_2_3.rst @@ -15,6 +15,11 @@ EAL Drivers ~~~ +* **vmxnet3: restore tx data ring.** + + Tx data ring has been shown to improve small pkt forwarding performance + on vSphere environment. 
+ Libraries ~ diff --git a/drivers/net/vmxnet3/vmxnet3_rxtx.c b/drivers/net/vmxnet3/vmxnet3_rxtx.c index 4de5d89..2202d31 100644 --- a/drivers/net/vmxnet3/vmxnet3_rxtx.c +++ b/drivers/net/vmxnet3/vmxnet3_rxtx.c @@ -348,6 +348,7 @@ vmxnet3_xmit_pkts(void *tx_queue, struct rte_mbuf **tx_pkts, uint32_t first2fill, avail, dw2; struct rte_mbuf *txm = tx_pkts[nb_tx]; struct rte_mbuf *m_seg = txm; + int copy_size = 0; /* Is this packet execessively fragmented, then drop */ if (unlikely(txm->nb_segs > VMXNET3_MAX_TXD_PER_PKT)) { @@ -365,6 +366,14 @@ vmxnet3_xmit_pkts(void *tx_queue, struct rte_mbuf **tx_pkts, break; } + if (rte_pktmbuf_pkt_len(txm) <= VMXNET3_HDR_COPY_SIZE) { + struct Vmxnet3_TxDataDesc *tdd; + + tdd = txq->data_ring.base + txq->cmd_ring.next2fill; + copy_size = rte_pktmbuf_pkt_len(txm); + rte_memcpy(tdd->data, rte_pktmbuf_mtod(txm, char *), copy_size); + } + /* use the previous gen bit for the SOP desc */ dw2 = (txq->cmd_ring.gen ^ 0x1) << VMXNET3_TXD_GEN_SHIFT; first2fill = txq->cmd_ring.next2fill; @@ -377,7 +386,13 @@ vmxnet3_xmit_pkts(void *tx_queue, struct rte_mbuf **tx_pkts, transmit buffer size (16K) is greater than maximum sizeof mbuf segment size. */ gdesc = txq->cmd_ring.base + txq->cmd_ring.next2fill; - gdesc->txd.addr = RTE_MBUF_DATA_DMA_ADDR(m_seg); + if (copy_size) + gdesc->txd.addr = rte_cpu_to_le_64(txq->data_ring.basePA + + txq->cmd_ring.next2fill * + sizeof(struct Vmxnet3_TxDataDesc)); + else + gdesc->txd.addr = RTE_MBUF_DATA_DMA_ADDR(m_seg); + gdesc->dword[2] = dw2 | m_seg->data_len; gdesc->dword[3] = 0; -- 1.9.1
[dpdk-dev] [PATCH v3 1/4] vmxnet3: restore tx data ring support
On Tue, 5 Jan 2016 16:12:55 -0800
Yong Wang wrote:

> @@ -365,6 +366,14 @@ vmxnet3_xmit_pkts(void *tx_queue, struct rte_mbuf **tx_pkts,
> 			break;
> 		}
>
> +		if (rte_pktmbuf_pkt_len(txm) <= VMXNET3_HDR_COPY_SIZE) {
> +			struct Vmxnet3_TxDataDesc *tdd;
> +
> +			tdd = txq->data_ring.base + txq->cmd_ring.next2fill;
> +			copy_size = rte_pktmbuf_pkt_len(txm);
> +			rte_memcpy(tdd->data, rte_pktmbuf_mtod(txm, char *), copy_size);
> +		}

Good idea to use a local region which optimizes the copy in the host,
but this implementation needs to be more general.

As written it is broken for multi-segment packets. A multi-segment
packet will have a pktlen >= datalen as in:
  m -> nb_segs=3, pktlen=1200, datalen=200
    -> datalen=900
    -> datalen=100

There are two ways to fix this. You could test for nb_segs == 1
or, better yet, optimize each segment: it might be that the first
segment (or tail segment) would fit in the available data area.
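The first fix mentioned above (testing for nb_segs == 1) can be sketched in isolation as follows. struct demo_mbuf is a hypothetical cut-down mbuf with only the fields discussed in this thread, and DEMO_HDR_COPY_SIZE is an assumed value standing in for VMXNET3_HDR_COPY_SIZE.

```c
#include <stdint.h>

/* Assumed size of one tx data ring slot; stand-in for VMXNET3_HDR_COPY_SIZE. */
#define DEMO_HDR_COPY_SIZE 128

/* Hypothetical cut-down mbuf: only the fields discussed above. */
struct demo_mbuf {
	struct demo_mbuf *next;
	uint32_t pkt_len;	/* sum of data_len over all segments */
	uint16_t data_len;	/* bytes in this segment only */
	uint16_t nb_segs;
};

/*
 * Return how many bytes may be copied into the data ring slot, or 0 if
 * the packet must go through the normal descriptor path. Requiring
 * nb_segs == 1 guarantees pkt_len == data_len, so the copy never reads
 * past the end of the first segment.
 */
static uint16_t
demo_copy_size(const struct demo_mbuf *m)
{
	if (m->nb_segs == 1 && m->pkt_len <= DEMO_HDR_COPY_SIZE)
		return (uint16_t)m->pkt_len;
	return 0;
}
```

The per-segment variant suggested as the better option would instead compare each segment's data_len against the slot size; that is not shown here.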
[dpdk-dev] [PATCH v3 2/4] vmxnet3: add tx l4 cksum offload
On Tue, 5 Jan 2016 16:12:56 -0800
Yong Wang wrote:

> -	if (txq->shared->ctrl.txNumDeferred >= txq->shared->ctrl.txThreshold) {
> +	PMD_TX_LOG(DEBUG, "vmxnet3 txThreshold: %u", rte_le_to_cpu_32(txq_ctrl->txThreshold));

For bisection, it would be good to split the byte-order fixes from the
offload changes; in other words make them different commits.
[dpdk-dev] [PATCH v3 4/4] vmxnet3: announce device offload capability
On Tue, 5 Jan 2016 16:12:58 -0800
Yong Wang wrote:

>
> /* return 0 means link status changed, -1 means not changed */
> @@ -819,7 +831,7 @@ vmxnet3_dev_vlan_filter_set(struct rte_eth_dev *dev, uint16_t vid, int on)
> 	else
> 		VMXNET3_CLEAR_VFTABLE_ENTRY(hw->shadow_vfta, vid);
>
> -	/* don't change active filter if in promiscious mode */
> +	/* don't change active filter if in promiscuous mode */

Maybe send a first patch in series with these message and comment cleanups?
Makes the review easier, and aids bisection.