[dpdk-dev] [PATCH 0/4] virtio support for container
Hello, On 1/12/2016 11:11 PM, Amit Tomer wrote: > Hello, > >> In vhost-switch, it judges if a virtio device is ready for processing after >> receiving >> a pkt from virtio device. So you'd better construct a pkt, and send it out >> firstly >> in l2fwd. > I tried to ping the socket interface from host for the same purpose > but it didn't work. > > Could you please suggest some other approach for achieving same(how > pkt can be sent out to l2fwd)? > > Also, before trying this, I have verified that vhost-switch is working > ok with testpmd . > > Thanks, > Amit. You can use below patch for l2fwd to send out an arp packet when it gets started. diff --git a/examples/l2fwd/main.c b/examples/l2fwd/main.c index 720fd5a..572b1ac 100644 --- a/examples/l2fwd/main.c +++ b/examples/l2fwd/main.c @@ -69,6 +69,8 @@ #include #include +#define SEND_ARP + #define RTE_LOGTYPE_L2FWD RTE_LOGTYPE_USER1 #define NB_MBUF 8192 @@ -185,6 +187,53 @@ print_stats(void) printf("\n\n"); } +#ifdef SEND_ARP +static void +dpdk_send_arp(int portid, struct rte_mempool *mp) +{ +/* + * len = 14 + 46 + * ARP, Request who-has 10.0.0.1 tell 10.0.0.2, length 46 + */ +static const uint8_t arp_request[] = { +/*0x:*/ 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xec, 0xa8, 0x6b, 0xfd, 0x02, 0x29, 0x08, 0x06, 0x00, 0x01, +/*0x0010:*/ 0x08, 0x00, 0x06, 0x04, 0x00, 0x01, 0xec, 0xa8, 0x6b, 0xfd, 0x02, 0x29, 0x0a, 0x00, 0x00, 0x01, +/*0x0020:*/ 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x0a, 0x00, 0x00, 0x02, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, +/*0x0030:*/ 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00 +}; +int ret; +struct rte_mbuf *m; +struct ether_addr mac_addr; +int pkt_len = sizeof(arp_request) - 1; + +m = rte_pktmbuf_alloc(mp); + +memcpy((void *)((uint64_t)m->buf_addr + m->data_off), arp_request, pkt_len); +rte_pktmbuf_pkt_len(m) = pkt_len; +rte_pktmbuf_data_len(m) = pkt_len; + +rte_eth_macaddr_get(portid, &mac_addr); +memcpy((void *)((uint64_t)m->buf_addr + m->data_off + 6), &mac_addr, 6); + +ret = rte_eth_tx_burst(portid, 0, &m, 1); +if (ret == 1) { +printf("arp sent: ok\n"); +printf("%02x:%02x:%02x:%02x:%02x:%02x\n", +mac_addr.addr_bytes[0], +mac_addr.addr_bytes[1], +mac_addr.addr_bytes[2], +mac_addr.addr_bytes[3], +mac_addr.addr_bytes[4], +mac_addr.addr_bytes[5]); +} else { +printf("arp sent: fail\n"); +} + +rte_pktmbuf_free(m); +} +#endif + + /* Send the burst of packets on an output interface */ static int l2fwd_send_burst(struct lcore_queue_conf *qconf, unsigned n, uint8_t port) @@ -281,6 +330,9 @@ l2fwd_main_loop(void) portid = qconf->rx_port_list[i]; RTE_LOG(INFO, L2FWD, " -- lcoreid=%u portid=%u\n", lcore_id, portid); +#ifdef SEND_ARP +dpdk_send_arp(portid, l2fwd_pktmbuf_pool); +#endif } while (1) {
[dpdk-dev] [RFC v2 1/2] ethdev: add packet filter flow and new behavior switch to fdir
> > diff --git a/lib/librte_ether/rte_eth_ctrl.h b/lib/librte_ether/rte_eth_ctrl.h > index ce224ad..5cc22a0 100644 > --- a/lib/librte_ether/rte_eth_ctrl.h > +++ b/lib/librte_ether/rte_eth_ctrl.h > @@ -74,7 +74,11 @@ extern "C" { > #define RTE_ETH_FLOW_IPV6_EX15 > #define RTE_ETH_FLOW_IPV6_TCP_EX16 > #define RTE_ETH_FLOW_IPV6_UDP_EX17 > -#define RTE_ETH_FLOW_MAX18 > +#define RTE_ETH_FLOW_PKT_FILTER_IPV4_TCP 18 #define > +RTE_ETH_FLOW_PKT_FILTER_IPV4_UDP 19 #define > +RTE_ETH_FLOW_PKT_FILTER_IPV6_TCP 20 #define > +RTE_ETH_FLOW_PKT_FILTER_IPV6_UDP 21 > +#define RTE_ETH_FLOW_MAX22 > How to distinguish RTE_ETH_FLOW_PKT_FILTER_IPV4_XX with RTE_ETH_FLOW_NONFRAG_IPV4_XX, what is the difference? > /** > * Feature filter types > @@ -407,6 +411,9 @@ struct rte_eth_l2_flow { struct rte_eth_ipv4_flow { > uint32_t src_ip; /**< IPv4 source address to match. */ > uint32_t dst_ip; /**< IPv4 destination address to match. */ > + uint8_t tos; /**< IPV4 type of service to match. */ > + uint8_t proto;/**< IPV4 proto to match. */ > + uint8_t ttl; /**< IPV4 time to live to match. */ > }; > > /** > @@ -443,6 +450,10 @@ struct rte_eth_sctpv4_flow { struct > rte_eth_ipv6_flow { > uint32_t src_ip[4]; /**< IPv6 source address to match. */ > uint32_t dst_ip[4]; /**< IPv6 destination address to match. */ > + uint8_t tc; /**< IPv6 traffic class to match. */ > + uint32_t flow_label; /**< IPv6 flow label to match. */ > + uint8_t next_header;/**< IPv6 next header to match. */ > + uint8_t hop_limit; /**< IPv6 hop limits to match. */ > }; > There is also a patch http://dpdk.org/dev/patchwork/patch/9661/ which added these fields. Maybe we can merge them together. > +struct rte_eth_pkt_filter_flow { > + enum rte_eth_pkt_filter_type type; /**< Type of filter */ > + enum rte_eth_pkt_filter_type prio; > + /**< Prioritize the filter type when a packet matches several types */ > + struct rte_eth_pkt_filter pkt; /**< Packet fields to match. */ > + struct rte_eth_pkt_filter mask; /**< Mask for matched fields. */ > +}; > + > +/** > * An union contains the inputs for all types of flow > */ > union rte_eth_fdir_flow { > @@ -514,6 +570,7 @@ union rte_eth_fdir_flow { > struct rte_eth_ipv6_flow ipv6_flow; > struct rte_eth_mac_vlan_flow mac_vlan_flow; > struct rte_eth_tunnel_flow tunnel_flow; > + struct rte_eth_pkt_filter_flow pkt_filter_flow; > }; Why not use rte_eth_XX_flow directly but add a new one? Is it because of the mask? If so, how about to add a field in rte_eth_fdir_input like: struct rte_eth_fdir_input { uint16_t flow_type; union rte_eth_fdir_flow flow; /**< Flow fields to match, dependent on flow_type */ union rte_eth_fdir_flow flow_mask; struct rte_eth_fdir_flow_ext flow_ext; /**< Additional fields to match */ }; Thanks Jingjing
[dpdk-dev] [PATCH] vhost_user: Make sure that memory map is set before attempting address translation
On Tue, Jan 12, 2016 at 05:35:06PM +0300, Pavel Fedin wrote: > Malfunctioning virtio clients may not send VHOST_USER_SET_MEM_TABLE for > some reason. This causes NULL dereference in qva_to_vva(). > > Change-Id: Ibc8f6637fb5fb9885b02c316adf18afd45e0d49a What's this? An internal track id? If so, you should not include it here: it's just meaningless to us. Otherwise, this patch looks good to me. --yliu
[dpdk-dev] [PATCH 00/12] Add API to get packet type info
On 12/31/2015 9:53 PM, Jianfeng Tan wrote: > HAPPRY NEW YEAR! > > A new ether API rte_eth_dev_get_ptype_info() is added to query what > packet type information will be provided by current pmd driver of the > specifed port. > > To achieve this, a new function pointer, dev_ptype_info_get, is added > into struct eth_dev_ops. For those devices who do not implement it, it > means it will not provide any ptype info. I haven't go through all the patches, but I have a question, what's the usercase of this API? Thanks, Michael > Jianfeng Tan (12): > ethdev: add API to query what/if packet type is set > pmd/cxgbe: add dev_ptype_info_get implementation > pmd/e1000: add dev_ptype_info_get implementation > pmd/enic: add dev_ptype_info_get implementation > pmd/fm10k: add dev_ptype_info_get implementation > pmd/i40e: add dev_ptype_info_get implementation > pmd/ixgbe: add dev_ptype_info_get implementation > pmd/mlx4: add dev_ptype_info_get implementation > pmd/mlx5: add dev_ptype_info_get implementation > pmd/nfp: add dev_ptype_info_get implementation > pmd/vmxnet3: add dev_ptype_info_get implementation > examples/l3fwd: add option to parse ptype > > drivers/net/cxgbe/cxgbe_ethdev.c | 17 +++ > drivers/net/e1000/igb_ethdev.c | 48 > drivers/net/enic/enic_ethdev.c | 20 + > drivers/net/fm10k/fm10k_ethdev.c | 60 + > drivers/net/fm10k/fm10k_rxtx.c | 5 +++ > drivers/net/fm10k/fm10k_rxtx_vec.c | 5 +++ > drivers/net/i40e/i40e_ethdev.c | 1 + > drivers/net/i40e/i40e_ethdev_vf.c| 1 + > drivers/net/i40e/i40e_rxtx.c | 69 - > drivers/net/i40e/i40e_rxtx.h | 2 + > drivers/net/ixgbe/ixgbe_ethdev.c | 50 + > drivers/net/ixgbe/ixgbe_ethdev.h | 2 + > drivers/net/ixgbe/ixgbe_rxtx.c | 5 ++- > drivers/net/mlx4/mlx4.c | 27 +++ > drivers/net/mlx5/mlx5.c | 1 + > drivers/net/mlx5/mlx5.h | 2 + > drivers/net/mlx5/mlx5_ethdev.c | 25 +++ > drivers/net/mlx5/mlx5_rxtx.c | 2 + > drivers/net/nfp/nfp_net.c| 18 > drivers/net/vmxnet3/vmxnet3_ethdev.c | 20 + > examples/l3fwd/main.c| 86 > > lib/librte_ether/rte_ethdev.c| 12 + > lib/librte_ether/rte_ethdev.h| 22 + > lib/librte_mbuf/rte_mbuf.h | 13 ++ > 24 files changed, 511 insertions(+), 2 deletions(-) >
[dpdk-dev] [PATCH v3 1/4] vmxnet3: restore tx data ring support
On 1/5/16, 4:48 PM, "Stephen Hemminger" wrote: >On Tue, 5 Jan 2016 16:12:55 -0800 >Yong Wang wrote: > >> @@ -365,6 +366,14 @@ vmxnet3_xmit_pkts(void *tx_queue, struct rte_mbuf >> **tx_pkts, >> break; >> } >> >> +if (rte_pktmbuf_pkt_len(txm) <= VMXNET3_HDR_COPY_SIZE) { >> +struct Vmxnet3_TxDataDesc *tdd; >> + >> +tdd = txq->data_ring.base + txq->cmd_ring.next2fill; >> +copy_size = rte_pktmbuf_pkt_len(txm); >> +rte_memcpy(tdd->data, rte_pktmbuf_mtod(txm, char *), >> copy_size); >> +} > >Good idea to use a local region which optmizes the copy in the host, >but this implementation needs to be more general. > >As written it is broken for multi-segment packets. A multi-segment >packet will have a pktlen >= datalen as in: > m -> mb_segs=3, pktlen=1200, datalen=200 >-> datalen=900 >-> datalen=100 > >There are two ways to fix this. You could test for nb_segs == 1 >or better yet. Optimize each segment it might be that the first >segment (or tail segment) would fit in the available data area. Currently the vmxnet3 backend has a limitation of 128B data area so it should work even for the multi-segmented pkt shown above. But I agree it does not work for all multi-segmented packets. The following packet will be such an example. m -> nb_segs=3, pktlen=128, datalen=64 -> datalen=32 -> datalen=32 It?s unclear if/how we might get into such a multi-segmented pkt but I agree we should handle this case. Patch updated taking the simple approach (checking for nb_segs == 1). I?ll leave the optimization as a future patch.
[dpdk-dev] [PATCH v3 2/4] vmxnet3: add tx l4 cksum offload
On 1/5/16, 4:51 PM, "Stephen Hemminger" wrote: >On Tue, 5 Jan 2016 16:12:56 -0800 >Yong Wang wrote: > >> -if (txq->shared->ctrl.txNumDeferred >= txq->shared->ctrl.txThreshold) { >> +PMD_TX_LOG(DEBUG, "vmxnet3 txThreshold: %u", >> rte_le_to_cpu_32(txq_ctrl->txThreshold)); > >For bisection, it would be good to split the byte-order fixes from the >offload changes; in other words make them different commits. Sure and patched updated.
[dpdk-dev] [PATCH v3 4/4] vmxnet3: announce device offload capability
On 1/5/16, 4:52 PM, "Stephen Hemminger" wrote: >On Tue, 5 Jan 2016 16:12:58 -0800 >Yong Wang wrote: > >> >> /* return 0 means link status changed, -1 means not changed */ >> @@ -819,7 +831,7 @@ vmxnet3_dev_vlan_filter_set(struct rte_eth_dev *dev, >> uint16_t vid, int on) >> else >> VMXNET3_CLEAR_VFTABLE_ENTRY(hw->shadow_vfta, vid); >> >> -/* don't change active filter if in promiscious mode */ >> +/* don't change active filter if in promiscuous mode */ > >Maybe send a first patch in series with these message and comment cleanups? > >Makes the review easier, and aides bisection. Sure and patch updated.
[dpdk-dev] [PATCH 1/4] ixgbe: support UDP tunnel add/del
Hi Thomas, > -Original Message- > From: Thomas Monjalon [mailto:thomas.monjalon at 6wind.com] > Sent: Tuesday, January 12, 2016 8:37 PM > To: Lu, Wenzhuo > Cc: dev at dpdk.org; Vincent JARDIN > Subject: Re: [dpdk-dev] [PATCH 1/4] ixgbe: support UDP tunnel add/del > > Hi, > > 2016-01-11 08:28, Lu, Wenzhuo: > > [Wenzhuo] The udp_tunnel_add and udp_tunnel_del have already existed. > I just use them. Honestly I agree with you they are not accurate name. Better > change them to udp_tunnel_port_add and udp_tunnel_port_del. But it > should be a ABI change if I?m not wrong. I think we can announce it this > release and change them in the next release. Would you agree? Thanks. > > You can introduce the new name and keep the old one for backward compat > while announcing its deprecation. Good suggestion, I'll send a new patch set for this change. Thanks. > Thanks
[dpdk-dev] [PATCH 4/4] doc: update release note for VxLAN & NVGRE checksum off-load support
Hi John, > -Original Message- > From: Mcnamara, John > Sent: Tuesday, January 12, 2016 9:45 PM > To: Lu, Wenzhuo ; dev at dpdk.org > Subject: RE: [dpdk-dev] [PATCH 4/4] doc: update release note for VxLAN & > NVGRE checksum off-load support > > > -Original Message- > > From: dev [mailto:dev-bounces at dpdk.org] On Behalf Of Wenzhuo Lu > > Sent: Monday, January 11, 2016 7:07 AM > > To: dev at dpdk.org > > Subject: [dpdk-dev] [PATCH 4/4] doc: update release note for VxLAN & > > NVGRE checksum off-load support > > > > +* **Support VxLAN & NVGRE checksum off-load on X550** > > + > > + * VxLAN & NVGRE RX/TX checksum off-load is supported on X550. > > +Provide RX/TX checksum off-load on both inner and outer IP > > +header and TCP header. > > + * Support VxLAN port configuration. Although the default VxLAN > > +port number is 4789, it can be changed. We should make it > > +configable to meet the change. > > Hi Wenzhou, > > The release note text should be in the past tense. Something like this would > be better: > > * **Added support for VxLAN and NVGRE checksum off-load on X550.** > > * Added support for VxLAN and NVGRE RX/TX checksum off-load on > X550. RX/TX checksum off-load is provided on both inner and > outer IP header and TCP header. > > * Added functions to support for VxLAN port configuration. The > default VxLAN port number is 4789 but this can be updated > programmatically. Thanks for the comments. Let me send a V2.
[dpdk-dev] [RFC PATCH 3/3] doc: add introduction for fm10k FTAG based forwarding
Hi, John > -Original Message- > From: Mcnamara, John > Sent: Tuesday, January 12, 2016 10:16 PM > To: Wang, Xiao W ; Chen, Jing D > ; Richardson, Bruce > Cc: dev at dpdk.org > Subject: RE: [dpdk-dev] [RFC PATCH 3/3] doc: add introduction for fm10k > FTAG based forwarding > > > -Original Message- > > From: dev [mailto:dev-bounces at dpdk.org] On Behalf Of Wang Xiao W > > Sent: Tuesday, January 5, 2016 12:32 PM > > To: Chen, Jing D; Richardson, Bruce > > Cc: dev at dpdk.org > > Subject: [dpdk-dev] [RFC PATCH 3/3] doc: add introduction for fm10k > > FTAG based forwarding > > > > Add a brief introduction on FTAG, describes what's FTAG and how it > > works in forwarding, introduction on how to run fm10k with FTAG is > > also included. > > > > Signed-off-by: Wang Xiao W > > --- > > doc/guides/nics/fm10k.rst | 13 + > > 1 file changed, 13 insertions(+) > > > > diff --git a/doc/guides/nics/fm10k.rst b/doc/guides/nics/fm10k.rst > > index > > 4206b7f..d82bf41 100644 > > --- a/doc/guides/nics/fm10k.rst > > +++ b/doc/guides/nics/fm10k.rst > > @@ -34,6 +34,19 @@ FM10K Poll Mode Driver The FM10K poll mode driver > > library provides support for the Intel FM1 > > (FM10K) family of 40GbE/100GbE adapters. > > > > Hi, > > Some very minor comments. > > > > +FTAG Based Forwarding of FM10K > > +-- > > The Documentation Guidelines say to put a newline after section headers. > > > +FTAG Based Forwarding is a unique feature of FM10K. The FM10K family > > +of NICs support the addition of a Fabric Tag (FTAG) to carry special > > information. > > +The FTAG is placed at the beginning of the frame, it contains > > +information such as where the packet comes from and goes, the vlan tag. > > s/the vlan tag/and the vlan tag > > > > +In FTAG based forwarding mode, the switch logic forwards packets > > +according to glort (global resource tag) information, other than the > > s/other/rather > > > > +mac and vlan table. Now this feature works only on PF. > > s/Now/Currently > > > > + > > +To enable this feature, turn CONFIG_RTE_LIBRTE_FM10K_FTAG_FWD to y > in > > In general variable and config names should be in fixed width quotes: > > ``CONFIG_RTE_LIBRTE_FM10K_FTAG_FWD`` > > > > +the configuration file. A unit test case fm10k_ftag_autotest is for > > s/for/provided for > > John. > -- > Many thanks for your comments, I'll change accordingly.
[dpdk-dev] [PATCH v2 7/7] virtio: add 1.0 support
On 2016/01/12 15:59, Yuanhan Liu wrote: > +static int > +virtio_read_caps(struct rte_pci_device *dev, struct virtio_hw *hw) > +{ > + uint8_t pos; > + struct virtio_pci_cap cap; > + int ret; > + > + if (rte_eal_pci_map_device(dev) < 0) { > + PMD_INIT_LOG(DEBUG, "failed to map pci device!"); > + return -1; > + } > + Do you need to call rte_eal_pci_unmap_device() in somewhere in this file? Anyway, I've reviewed and tested your all patches. And it seems except for it, I guess your patches are good. Thanks, Tetsuya
[dpdk-dev] [PATCH] ixgbe: fix whitespace
-Original Message- From: Stephen Hemminger [mailto:step...@networkplumber.org] Sent: Wednesday, January 13, 2016 12:54 PM To: Zhang, Helin ; Ananyev, Konstantin Cc: dev at dpdk.org; Stephen Hemminger Subject: [PATCH] ixgbe: fix whitespace Normally, I ignore random minor whitespace issues, but this driver seems to have some authors that don't understand the conventions of putting whitespace after keywords. Signed-off-by: Stephen Hemminger Acked-by: Helin Zhang
[dpdk-dev] [PATCH] vhost_user: Make sure that memory map is set before attempting address translation
Hello! > > Change-Id: Ibc8f6637fb5fb9885b02c316adf18afd45e0d49a > > What's this? An internal track id? Yes, it's from our gerrit. I've just done git format-patch. > If so, you should not include it > here: it's just meaningless to us. > > Otherwise, this patch looks good to me. Should i repost, or can you just drop this tag by yourself when applying? Kind regards, Pavel Fedin Senior Engineer Samsung Electronics Research center Russia
[dpdk-dev] [PATCH] vhost_user: Make sure that memory map is set before attempting address translation
On Wed, Jan 13, 2016 at 10:18:43AM +0300, Pavel Fedin wrote: > Hello! > > > > Change-Id: Ibc8f6637fb5fb9885b02c316adf18afd45e0d49a > > > > What's this? An internal track id? > > Yes, it's from our gerrit. I've just done git format-patch. > > > If so, you should not include it > > here: it's just meaningless to us. > > > > Otherwise, this patch looks good to me. > > Should i repost, or can you just drop this tag by yourself when applying? I'd suggest so. Since it's Thomas (but not me) to apply your patch. And if you repost, you can add my ACK: Acked-by: Yuanhan Liu --yliu
[dpdk-dev] [RESEND PATCH] vhost_user: Make sure that memory map is set before attempting address translation
Malfunctioning virtio clients may not send VHOST_USER_SET_MEM_TABLE for some reason. This causes NULL dereference in qva_to_vva(). Signed-off-by: Pavel Fedin Acked-by: Yuanhan Liu --- lib/librte_vhost/virtio-net.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/lib/librte_vhost/virtio-net.c b/lib/librte_vhost/virtio-net.c index 0ba5045..3e7cec0 100644 --- a/lib/librte_vhost/virtio-net.c +++ b/lib/librte_vhost/virtio-net.c @@ -630,7 +630,7 @@ set_vring_addr(struct vhost_device_ctx ctx, struct vhost_vring_addr *addr) struct vhost_virtqueue *vq; dev = get_device(ctx); - if (dev == NULL) + if ((dev == NULL) || (dev->mem == NULL)) return -1; /* addr->index refers to the queue index. The txq 1, rxq is 0. */ -- 2.1.1
[dpdk-dev] [PATCH] doc: coding style: use linux kernel style for indentation
Using two tabs for "if" (or "while") statements is a bit weird to me. Also, using one tab unconditionaly for function definitions and prototypes doesn't look great. Here I'd suggest to use the indentation style the Linux kernel project prefers: to align with the open brace with tabs and additonal spaces (when necessary). Example: static int rte_eal_foo_bar(int a_long_argument_1, int another_long_argument_2, struct foo *yet_another_long_argument_3) ret = rte_eal_foo_bar(a_long_argument_1, another_long_argument_2, yet_another_long_argument_3); if (really_long_variable_name_1 == really_long_variable_name_2 && var3 == var4) { x = y + z; ; } Cc: Thomas Monjalon Cc: Siobhan Butler Cc: John McNamara Signed-off-by: Yuanhan Liu --- doc/guides/contributing/coding_style.rst | 38 ++-- 1 file changed, 26 insertions(+), 12 deletions(-) diff --git a/doc/guides/contributing/coding_style.rst b/doc/guides/contributing/coding_style.rst index ad1392d..23cd060 100644 --- a/doc/guides/contributing/coding_style.rst +++ b/doc/guides/contributing/coding_style.rst @@ -339,9 +339,11 @@ General * Do not put any spaces before a tab for indentation. * If you have to wrap a long statement, put the operator at the end of the line, and indent again. -* For control statements (if, while, etc.), continuation it is recommended that the next line be indented by two tabs, rather than one, - to prevent confusion as to whether the second line of the control statement forms part of the statement body or not. - Alternatively, the line continuation may use additional spaces to line up to an appropriately point on the preceding line, for example, to align to an opening brace. +* For control statements (if, while, etc.), function definitions and + function prototypes continuation lines, it is recommended that the + next line be indented by tab and additional spaces when necessary + to align to the opening brace. For others, it's also okay to be + indented by tab only. .. note:: @@ -350,17 +352,29 @@ General .. code-block:: c while (really_long_variable_name_1 == really_long_variable_name_2 && - var3 == var4){ /* confusing to read as */ - x = y + z; /* control stmt body lines up with second line of */ - a = b + c; /* control statement itself if single indent used */ + var3 == var4) { /* confusing to read as */ + x = y + z; /* control stmt body lines up with second line of */ + a = b + c; /* control statement itself if single indent used */ } if (really_long_variable_name_1 == really_long_variable_name_2 && - var3 == var4){ /* two tabs used */ - x = y + z; /* statement body no longer lines up */ - a = b + c; + var3 == var4) {/* align with above opening if statement */ + x = y + z; /* statement body no longer lines up */ + a = b + c; } + /* The continuation line is indented with two tab + 1 space */ + static int + rte_eal_foo_bar(int a_long_argument_1, int another_long_argument_2, + struct foo *yet_another_long_argument_3) + { +... + } + + /* The continuation line is indented with two tab + 7 spaces */ + ret = rte_eal_foo_bar(a_long_argument_1, another_long_argument_2, + yet_another_long_argument_3); + z = a + really + long + statement + that + needs + two + lines + gets + indented + on + the + second + and + subsequent + lines; @@ -510,9 +524,9 @@ Prototypes .. code-block:: c static char *function1(int _arg, const char *_arg2, -struct foo *_arg3, -struct bar *_arg4, -struct baz *_arg5); +struct foo *_arg3, +struct bar *_arg4, +struct baz *_arg5); static void usage(void); .. note:: -- 1.9.0
[dpdk-dev] [RFC v2 1/2] ethdev: add packet filter flow and new behavior switch to fdir
Hi Jingjing, On Tuesday, January 01/12/16, 2016 at 17:12:47 -0800, Wu, Jingjing wrote: > > > > diff --git a/lib/librte_ether/rte_eth_ctrl.h > > b/lib/librte_ether/rte_eth_ctrl.h > > index ce224ad..5cc22a0 100644 > > --- a/lib/librte_ether/rte_eth_ctrl.h > > +++ b/lib/librte_ether/rte_eth_ctrl.h > > @@ -74,7 +74,11 @@ extern "C" { > > #define RTE_ETH_FLOW_IPV6_EX15 > > #define RTE_ETH_FLOW_IPV6_TCP_EX16 > > #define RTE_ETH_FLOW_IPV6_UDP_EX17 > > -#define RTE_ETH_FLOW_MAX18 > > +#define RTE_ETH_FLOW_PKT_FILTER_IPV4_TCP 18 #define > > +RTE_ETH_FLOW_PKT_FILTER_IPV4_UDP 19 #define > > +RTE_ETH_FLOW_PKT_FILTER_IPV6_TCP 20 #define > > +RTE_ETH_FLOW_PKT_FILTER_IPV6_UDP 21 > > +#define RTE_ETH_FLOW_MAX22 > > > How to distinguish RTE_ETH_FLOW_PKT_FILTER_IPV4_XX with > RTE_ETH_FLOW_NONFRAG_IPV4_XX, what is the difference? The packet filter flow is basically a superset containing Ethernet, vlan, ipv4/ipv6 and tcp/udp flows whose fields can all be matched at the same time, unlike in case of the current flow director which seems to match only one of the flows at any given time. Additionally, it also allows specifying masks. I separated the two to make this meaning explicit. If this is not necessary, then I will merge them. > > /** > > * Feature filter types > > @@ -407,6 +411,9 @@ struct rte_eth_l2_flow { struct rte_eth_ipv4_flow { > > uint32_t src_ip; /**< IPv4 source address to match. */ > > uint32_t dst_ip; /**< IPv4 destination address to match. */ > > + uint8_t tos; /**< IPV4 type of service to match. */ > > + uint8_t proto;/**< IPV4 proto to match. */ > > + uint8_t ttl; /**< IPV4 time to live to match. */ > > }; > > > > /** > > @@ -443,6 +450,10 @@ struct rte_eth_sctpv4_flow { struct > > rte_eth_ipv6_flow { > > uint32_t src_ip[4]; /**< IPv6 source address to match. */ > > uint32_t dst_ip[4]; /**< IPv6 destination address to match. */ > > + uint8_t tc; /**< IPv6 traffic class to match. */ > > + uint32_t flow_label; /**< IPv6 flow label to match. */ > > + uint8_t next_header;/**< IPv6 next header to match. */ > > + uint8_t hop_limit; /**< IPv6 hop limits to match. */ > > }; > > > There is also a patch http://dpdk.org/dev/patchwork/patch/9661/ which added > these fields. Maybe we can merge them together. Yes, we can merge them. Would you like me to merge your patch here? > > +struct rte_eth_pkt_filter_flow { > > + enum rte_eth_pkt_filter_type type; /**< Type of filter */ > > + enum rte_eth_pkt_filter_type prio; > > + /**< Prioritize the filter type when a packet matches several types */ > > + struct rte_eth_pkt_filter pkt; /**< Packet fields to match. */ > > + struct rte_eth_pkt_filter mask; /**< Mask for matched fields. */ > > +}; > > + > > +/** > > * An union contains the inputs for all types of flow > > */ > > union rte_eth_fdir_flow { > > @@ -514,6 +570,7 @@ union rte_eth_fdir_flow { > > struct rte_eth_ipv6_flow ipv6_flow; > > struct rte_eth_mac_vlan_flow mac_vlan_flow; > > struct rte_eth_tunnel_flow tunnel_flow; > > + struct rte_eth_pkt_filter_flow pkt_filter_flow; > > }; > Why not use rte_eth_XX_flow directly but add a new one? Is it because of the > mask? If so, how about to add a field in rte_eth_fdir_input like: > struct rte_eth_fdir_input { > uint16_t flow_type; > union rte_eth_fdir_flow flow; > /**< Flow fields to match, dependent on flow_type */ > union rte_eth_fdir_flow flow_mask; > struct rte_eth_fdir_flow_ext flow_ext; > /**< Additional fields to match */ > }; > > Thanks > Jingjing The current rte_eth_XX_flow only allow matching _one_ of the flows because of the union. In contrast, rte_eth_pkt_filter_flow can match several flows at the same time; i.e. it can match ethernet, vlan, ip, and tcp/udp all at the same time. rte_eth_pkt_filter_flow is basically a superset containing several flows that can be matched at the same time. In our Chelsio T5 hardware, it's possible to have several flows that can be matched in a single rule. This is why I've created a superset that can match several flows in the same rule. Thanks, Rahul
[dpdk-dev] [RESEND PATCH] vhost_user: Make sure that memory map is set before attempting address translation
On 1/13/2016 3:33 PM, Pavel Fedin wrote: > Malfunctioning virtio clients may not send VHOST_USER_SET_MEM_TABLE for > some reason. This causes NULL dereference in qva_to_vva(). Do you have examples for the malfunctioning clients? If so, could you list them in the commit message? > > Signed-off-by: Pavel Fedin > Acked-by: Yuanhan Liu > --- > lib/librte_vhost/virtio-net.c | 2 +- > 1 file changed, 1 insertion(+), 1 deletion(-) > > diff --git a/lib/librte_vhost/virtio-net.c b/lib/librte_vhost/virtio-net.c > index 0ba5045..3e7cec0 100644 > --- a/lib/librte_vhost/virtio-net.c > +++ b/lib/librte_vhost/virtio-net.c > @@ -630,7 +630,7 @@ set_vring_addr(struct vhost_device_ctx ctx, struct > vhost_vring_addr *addr) > struct vhost_virtqueue *vq; > > dev = get_device(ctx); > - if (dev == NULL) > + if ((dev == NULL) || (dev->mem == NULL)) > return -1; > > /* addr->index refers to the queue index. The txq 1, rxq is 0. */
[dpdk-dev] [PATCH v2 7/7] virtio: add 1.0 support
On Wed, Jan 13, 2016 at 12:31:43PM +0900, Tetsuya Mukawa wrote: > On 2016/01/12 15:59, Yuanhan Liu wrote: > > +static int > > +virtio_read_caps(struct rte_pci_device *dev, struct virtio_hw *hw) > > +{ > > + uint8_t pos; > > + struct virtio_pci_cap cap; > > + int ret; > > + > > + if (rte_eal_pci_map_device(dev) < 0) { > > + PMD_INIT_LOG(DEBUG, "failed to map pci device!"); > > + return -1; > > + } > > + > > Do you need to call rte_eal_pci_unmap_device() in somewhere in this file? Yes, we should. I will see where I can find a proper place for it in next version; eth_virtio_dev_uninit sounds like a good option. > Anyway, I've reviewed and tested your all patches. > And it seems except for it, I guess your patches are good. Thank you for reviewing and testing it! --yliu
[dpdk-dev] [RESEND PATCH] vhost_user: Make sure that memory map is set before attempting address translation
Hello! > Do you have examples for the malfunctioning clients? If so, could you > list them in the commit message? The only malfunctioning client was DPDK itself, with virtio for container RFC applied. The client-side problem has been fixed afterwards by http://dpdk.org/ml/archives/dev/2016-January/031169.html. See the RFC discussion thread. Kind regards, Pavel Fedin Senior Engineer Samsung Electronics Research center Russia
[dpdk-dev] Getting error while running DPDK test app on X-Gene1
Hi, We are trying to run dpdk on our arm64 based SOC having Intel 10G ixgbe PCIe card plugged. While running any test app, we are getting following error. EAL: PCI device :01:00.0 on NUMA socket 0 EAL: probe driver: 8086:10fb rte_ixgbe_pmd EAL: Cannot open /sys/bus/pci/devices/:01:00.0/resource0: No such file or directory EAL: Error - exiting with code: 1 Cause: Requested device :01:00.0 cannot be used Below are the details on modules, hugepages and device binding. root at arm64:~# lsmod Module Size Used by rte_kni 292795 0 igb_uio 4338 0 ixgbe 184456 0 root at arm64:~/dpdk# cat /sys/kernel/mm/hugepages/hugepages-2048kB/nr_hugepages 2048 root at arm64:~/dpdk# ./tools/dpdk_nic_bind.py --status Network devices using DPDK-compatible driver :01:00.0 '82599ES 10-Gigabit SFI/SFP+ Network Connection' drv=igb_uio unused= :01:00.1 '82599ES 10-Gigabit SFI/SFP+ Network Connection' drv=igb_uio unused= Network devices using kernel driver === Other network devices = root at arm64:~/dpdk# Thanks, Ankit
[dpdk-dev] patchwork 9704-9710 compilation error
Hello Yongjie, I know about this error. It does not work on some GCC versions... However, this patch is not intended to be merged upstream. It just shows that the infra works in the RFC series. Anyway, thank you for your notification. Regards Jan On Wed, 13 Jan 2016 10:17:37 + "Gu, YongjieX" wrote: > Hi,Viktorin: > I tested your patch > 9704-9710(dpdk-dev-RFC-1-7-eal-common-define-rte_soc_-related-common-interface.patch > etc.) on patchwork,and met an error, > Error info as below,can you help to check. > > > OS: fedora20 > > Nic: Niantic > > Gcc: 4.8.3 > Kernel: 3.11.10-301.fc20.x86_64 > Target: x86_64-native-linuxapp-gcc > Baseline commit: 3b60ce8cbb959d7a6839f94ad995a3594c07801e > > Error info: > > cc1: error: -Werror=incompatible-pointer-types: no option > -Wincompatible-pointer-types CC test_pmd_perf.o > > make[5]: *** [test_soc.o] Error 1 > > Thanks > Yongjie -- Jan ViktorinE-mail: Viktorin at RehiveTech.com System ArchitectWeb:www.RehiveTech.com RehiveTech Brno, Czech Republic
[dpdk-dev] rte_prefetch0() is effective?
On Thu, Dec 24, 2015 at 03:35:14PM +0900, Moon-Sang Lee wrote: > I see codes as below in example directory, and I wonder it is effective. > Coherent IO is adopted to modern architectures, > so I think that DMA initiation by rte_eth_rx_burst() might already fulfills > cache lines of RX buffers. > Do I really need to call rte_prefetchX()? > > nb_rx = rte_eth_rx_burst(portid, queueid, pkts_burst, > MAX_PKT_BURST); > ... > /* Prefetch and forward already prefetched packets */ > for (j = 0; j < (nb_rx - PREFETCH_OFFSET); j++) { > rte_prefetch0(rte_pktmbuf_mtod(pkts_burst[ > j + PREFETCH_OFFSET], void *)); > l3fwd_simple_forward(pkts_burst[j], portid, > qconf); > } > Good question. When the first example apps using this style of prefetch were originally written, yes, there was a noticable performance increase achieved by using the prefetch. Thereafter, I'm not sure that anyone has checked with each generation of platforms whether the prefetches are still necessary and how much they help, but I suspect that they still help a bit, and don't hurt performance. It would be an interesting exercise to check whether the prefetch offsets used in code like above can be adjusted to give better performance on our latest supported platforms. /Bruce
[dpdk-dev] [PATCH v2] mbuf: optimize rte_mbuf_refcnt_update
On Tue, Jan 05, 2016 at 11:11:24AM +, Hanoch Haim (hhaim) wrote: > Hi Oliver, > Thank you for the fast response and it would be great to open a discussion on > that. > In general our project can leverage your optimization and I think it is great > (we should have thought about it) . We can use it using the workaround I > described. > However, for me it seems odd that rte_pktmbuf_attach () that does not > *change* anything in m_const, except of the *atomic* ref counter does not > work in parallel. > The example I gave is a classic use case of rte_pktmbuf_attach (multicast ) > and I don't see why it wouldn't work after your optimization. > > Do you have a pointer to the documentation that state that that you can't > call the atomic ref counter from more than one thread? > Hi, actually, the issue is not that you can't work with the reference counter field from multiple threads, or that you can't use an mbuf from multiple threads, it's that if you are working with the same mbuf in multiple threads you have multiple references to the mbuf and your application must increase the reference counter appropriately. For example, if thread A is going to pass an mbuf to thread B and keep using it itself, you must increment the reference counter in thread A before enqueuing it to B. Regards, /Bruce
[dpdk-dev] [PATCH] pci: Add the class_id support in pci probe
On Thu, Dec 31, 2015 at 09:12:14AM -0800, Stephen Hemminger wrote: > On Tue, 29 Dec 2015 10:53:26 +0800 > Ziye Yang wrote: > > > This patch is used to add the class_id support > > for pci_probe since some devices need the class_info > > (class_code, subclass_code, programming_interface) > > > > Signed-off-by: Ziye Yang > > Since rte_pci is exposed to application this breaks the ABI. But applications are not going to be defining rte_pci_ids values internally, are they? That is for drivers to use. Is this really an ABI breakage for applications that we need to be concerned about? /Bruce
[dpdk-dev] [PATCH v2 0/4] fix the issue that DPDK takes over virtio device blindly
On Tue, Jan 12, 2016 at 10:46 AM, Xie, Huawei wrote: > On 1/12/2016 12:24 PM, Santosh Shukla wrote: >> On Tue, Jan 12, 2016 at 8:32 AM, Xie, Huawei wrote: >>> On 1/5/2016 1:25 AM, Stephen Hemminger wrote: On Mon, 4 Jan 2016 01:56:09 +0800 Huawei Xie wrote: > v2 changes: > Remove unnecessary assignment of NULL to dev->data->mac_addrs > Ajust one comment's position > change LOG level from ERR to INFO > > virtio PMD doesn't set RTE_PCI_DRV_NEED_MAPPING in drv_flags of its > eth_driver. It will try igb_uio and PORT IO in turn to configure > virtio device. Even user in guest VM doesn't want to use virtio for > DPDK, virtio PMD will take over the device blindly. > > The more serious problem is kernel driver is still manipulating the > device, which causes driver conflict. > > This patch checks if there is any kernel driver manipulating the > virtio device before virtio PMD uses port IO to configure the device. > > Huawei Xie (4): > eal: make the comment more accurate > eal: set kdrv to RTE_KDRV_NONE if kernel driver isn't manipulating the > device. > virtio: return 1 to tell the kernel we don't take over this device > virtio: check if any kernel driver is manipulating the virtio device > > drivers/net/virtio/virtio_ethdev.c | 16 ++-- > lib/librte_eal/common/eal_common_pci.c | 8 > lib/librte_eal/linuxapp/eal/eal_pci.c | 2 +- > 3 files changed, 19 insertions(+), 7 deletions(-) > Overall looks good, thanks for addressing this. It would be good to note that VFIO no-IOMMU mode should work for this as well. >>> It isn't implemented yet in virtio PMD. I could add a note in the commit >>> message. Do you plan to implement this? >>> >> I can send vfio-noiommu patches for this one, as I am looking at >> vfio-noiommu for virtio for my arm v4 patch series. Stephen, let me >> know if you already started working on this? >> >> Also for some reason I can't find [3/4] patch, could you point me to >> patch link? Thanks. > Thanks. Here is the patch: http://www.dpdk.org/dev/patchwork/patch/9720/ > Thanks, I am rebasing my v4 series on top of this patch, although I couldn't see current series creating issue for vfio-noiommu case. We'll post feedback in v4 cover-letter.
[dpdk-dev] [PATCH v3 09/12] virtio: vfio: Enable RTE_PCI_DRV_NEED_MAPPING flag in driver
On Tue, Jan 12, 2016 at 12:44 PM, Yuanhan Liu wrote: > On Sat, Jan 09, 2016 at 06:08:46PM +0530, Santosh Shukla wrote: >> On Thu, Jan 7, 2016 at 11:50 PM, Stephen Hemminger >> wrote: >> > On Thu, 7 Jan 2016 22:03:06 +0530 >> > Santosh Shukla wrote: >> > >> >> +#ifdef RTE_EAL_VFIO >> >> + .drv_flags = RTE_PCI_DRV_NEED_MAPPING | >> >> RTE_PCI_DRV_DETACHABLE, >> >> +#else >> >> .drv_flags = RTE_PCI_DRV_DETACHABLE, >> >> +#endif >> > >> > Since VFIO is determined at runtime not compile time, the flags should >> > be updated at runtime not compile time. >> > >> > >> In general, Yes, Its a wrong approach i..e. Wrapping __need_mapping >> flag only for vfio case. I am thinking to add vfio parser routine >> something similar to virtio_xxx_xx_uio_xx() / virtio_xx_xx_ioport() >> currently exist. This will remove RTE_EAL_VFIO ifdef clutter for this >> patch and [08/12] patch and also virtio pmd driver can then initialize >> device for vfio mode.. >> >> _but_ I still need _MAPPING flag enabled for in virtio driver as >> because for vfio case - I want vfio_xx_mmap() routine to create vfio >> container/group_id and then create vfio_dev_fd for each virtio-net-pci >> interface. > > I'm thinking my following patch will help: > > http://dpdk.org/dev/patchwork/patch/9814/ > Yes, It works, so wont need NEED_MAPPING flag, Sending v4 patch series rebased on this patch.. > --yliu > >> Let me know my approach aligned to your suggestion.
[dpdk-dev] [PATCH] pci: Add the class_id support in pci probe
On 01/13/2016 01:55 PM, Bruce Richardson wrote: > On Thu, Dec 31, 2015 at 09:12:14AM -0800, Stephen Hemminger wrote: >> On Tue, 29 Dec 2015 10:53:26 +0800 >> Ziye Yang wrote: >> >>> This patch is used to add the class_id support >>> for pci_probe since some devices need the class_info >>> (class_code, subclass_code, programming_interface) >>> >>> Signed-off-by: Ziye Yang >> >> Since rte_pci is exposed to application this breaks the ABI. > > But applications are not going to be defining rte_pci_ids values internally, > are > they? That is for drivers to use. Is this really an ABI breakage for > applications that we > need to be concerned about? There might not be applications using it but drivers are ABI consumers too - think of 3rd party drivers and such. - Panu -
[dpdk-dev] [PATCH 0/2] i40evf: support interrupt based pf reset request
If DPDK is used on VF while the host is using Linux Kernel driver as PF driver on FVL NIC, some setting on PF will trigger VF reset. DPDK VF need to know the event. This patch set makes the interrupt based request of PF reset from PF supported by enabling the adminq event process in VF driver. Users can register a callback for this interrupt event to get informed, when a PF reset request detected like: rte_eth_dev_callback_register(portid, RTE_ETH_EVENT_INTR_RESET, reset_event_callback, arg); Jingjing Wu (2): i40evf: allocate virtchnl cmd buffer for each vf i40evf: support interrupt based pf reset request drivers/net/i40e/i40e_ethdev.h| 2 + drivers/net/i40e/i40e_ethdev_vf.c | 425 +++--- lib/librte_ether/rte_ethdev.h | 1 + 3 files changed, 304 insertions(+), 124 deletions(-) -- 2.4.0
[dpdk-dev] [PATCH 1/2] i40evf: allocate virtchnl cmd buffer for each vf
Currently, i40evf PMD uses a global static buffer to send virtchnl command to host driver. It is shared by multi VFs. This patch changed to allocate virtchnl cmd buffer for each VF. Signed-off-by: Jingjing Wu --- drivers/net/i40e/i40e_ethdev.h| 2 + drivers/net/i40e/i40e_ethdev_vf.c | 183 +++--- 2 files changed, 75 insertions(+), 110 deletions(-) diff --git a/drivers/net/i40e/i40e_ethdev.h b/drivers/net/i40e/i40e_ethdev.h index 1f9792b..93122ad 100644 --- a/drivers/net/i40e/i40e_ethdev.h +++ b/drivers/net/i40e/i40e_ethdev.h @@ -494,7 +494,9 @@ struct i40e_vf { bool link_up; bool vf_reset; volatile uint32_t pend_cmd; /* pending command not finished yet */ + uint32_t cmd_retval; /* return value of the cmd response from PF */ u16 pend_msg; /* flags indicates events from pf not handled yet */ + uint8_t *aq_resp; /* buffer to store the adminq response from PF */ /* VSI info */ struct i40e_virtchnl_vf_resource *vf_res; /* All VSIs */ diff --git a/drivers/net/i40e/i40e_ethdev_vf.c b/drivers/net/i40e/i40e_ethdev_vf.c index 14d2a50..ebcd881 100644 --- a/drivers/net/i40e/i40e_ethdev_vf.c +++ b/drivers/net/i40e/i40e_ethdev_vf.c @@ -103,9 +103,6 @@ enum i40evf_aq_result { I40EVF_MSG_CMD, /* Read async command result */ }; -/* A share buffer to store the command result from PF driver */ -static uint8_t cmd_result_buffer[I40E_AQ_BUF_SZ]; - static int i40evf_dev_configure(struct rte_eth_dev *dev); static int i40evf_dev_start(struct rte_eth_dev *dev); static void i40evf_dev_stop(struct rte_eth_dev *dev); @@ -237,31 +234,39 @@ i40evf_set_mac_type(struct i40e_hw *hw) } /* - * Parse admin queue message. - * - * return value: - * < 0: meet error - * 0: read sys msg - * > 0: read cmd result + * Read data in admin queue to get msg from pf driver */ static enum i40evf_aq_result -i40evf_parse_pfmsg(struct i40e_vf *vf, - struct i40e_arq_event_info *event, - struct i40evf_arq_msg_info *data) +i40evf_read_pfmsg(struct rte_eth_dev *dev, struct i40evf_arq_msg_info *data) { - enum i40e_virtchnl_ops opcode = (enum i40e_virtchnl_ops)\ - rte_le_to_cpu_32(event->desc.cookie_high); - enum i40e_status_code retval = (enum i40e_status_code)\ - rte_le_to_cpu_32(event->desc.cookie_low); - enum i40evf_aq_result ret = I40EVF_MSG_CMD; + struct i40e_hw *hw = I40E_DEV_PRIVATE_TO_HW(dev->data->dev_private); + struct i40e_vf *vf = I40EVF_DEV_PRIVATE_TO_VF(dev->data->dev_private); + struct i40e_arq_event_info event; + enum i40e_virtchnl_ops opcode; + enum i40e_status_code retval; + int ret; + enum i40evf_aq_result result = I40EVF_MSG_NON; + event.buf_len = data->buf_len; + event.msg_buf = data->msg; + ret = i40e_clean_arq_element(hw, &event, NULL); + /* Can't read any msg from adminQ */ + if (ret) { + if (ret == I40E_ERR_ADMIN_QUEUE_NO_WORK) + result = I40EVF_MSG_NON; + else + result = I40EVF_MSG_ERR; + return result; + } + + opcode = (enum i40e_virtchnl_ops)rte_le_to_cpu_32(event.desc.cookie_high); + retval = (enum i40e_status_code)rte_le_to_cpu_32(event.desc.cookie_low); /* pf sys event */ if (opcode == I40E_VIRTCHNL_OP_EVENT) { struct i40e_virtchnl_pf_event *vpe = - (struct i40e_virtchnl_pf_event *)event->msg_buf; + (struct i40e_virtchnl_pf_event *)event.msg_buf; - /* Initialize ret to sys event */ - ret = I40EVF_MSG_SYS; + result = I40EVF_MSG_SYS; switch (vpe->event) { case I40E_VIRTCHNL_EVENT_LINK_CHANGE: vf->link_up = @@ -286,74 +291,17 @@ i40evf_parse_pfmsg(struct i40e_vf *vf, } } else { /* async reply msg on command issued by vf previously */ - ret = I40EVF_MSG_CMD; + result = I40EVF_MSG_CMD; /* Actual data length read from PF */ - data->msg_len = event->msg_len; + data->msg_len = event.msg_len; } - /* fill the ops and result to notify VF */ + data->result = retval; data->ops = opcode; - return ret; -} - -/* - * Read data in admin queue to get msg from pf driver - */ -static enum i40evf_aq_result -i40evf_read_pfmsg(struct rte_eth_dev *dev, struct i40evf_arq_msg_info *data) -{ - struct i40e_hw *hw = I40E_DEV_PRIVATE_TO_HW(dev->data->dev_private); - struct i40e_vf *vf = I40EVF_DEV_PRIVATE_TO_VF(dev->data->dev_private); - struct i40e_arq_event_info event; - int ret; - enum i40evf_aq_result result = I40EVF_MSG_NON; - - event.buf_len = data->buf_len; - event.msg_buf = data->msg; - ret =
[dpdk-dev] [PATCH 2/2] i40evf: support interrupt based pf reset request
Interrupt based request of PF reset from PF is supported by enabling the adminq event process in VF driver. Users can register a callback for this interrupt event to get informed, when a PF reset request detected like: rte_eth_dev_callback_register(portid, RTE_ETH_EVENT_INTR_RESET, reset_event_callback, arg); Signed-off-by: Jingjing Wu --- drivers/net/i40e/i40e_ethdev_vf.c | 274 +- lib/librte_ether/rte_ethdev.h | 1 + 2 files changed, 245 insertions(+), 30 deletions(-) diff --git a/drivers/net/i40e/i40e_ethdev_vf.c b/drivers/net/i40e/i40e_ethdev_vf.c index ebcd881..cfeee77 100644 --- a/drivers/net/i40e/i40e_ethdev_vf.c +++ b/drivers/net/i40e/i40e_ethdev_vf.c @@ -74,8 +74,6 @@ #define I40EVF_BUSY_WAIT_DELAY 10 #define I40EVF_BUSY_WAIT_COUNT 50 #define MAX_RESET_WAIT_CNT 20 -/*ITR index for NOITR*/ -#define I40E_QINT_RQCTL_MSIX_INDX_NOITR 3 struct i40evf_arq_msg_info { enum i40e_virtchnl_ops ops; @@ -151,6 +149,9 @@ static int i40evf_dev_rx_queue_intr_enable(struct rte_eth_dev *dev, uint16_t queue_id); static int i40evf_dev_rx_queue_intr_disable(struct rte_eth_dev *dev, uint16_t queue_id); +static void i40evf_handle_pf_event(__rte_unused struct rte_eth_dev *dev, + uint8_t *msg, + uint16_t msglen); /* Default hash key buffer for RSS */ static uint32_t rss_key_default[I40E_VFQF_HKEY_MAX_INDEX + 1]; @@ -357,20 +358,42 @@ i40evf_execute_vf_cmd(struct rte_eth_dev *dev, struct vf_cmd_info *args) return err; } - do { - /* Delay some time first */ - rte_delay_ms(ASQ_DELAY_MS); - ret = i40evf_read_pfmsg(dev, &info); - if (ret == I40EVF_MSG_CMD) { - err = 0; - break; - } else if (ret == I40EVF_MSG_ERR) { - err = -1; - break; - } - /* If don't read msg or read sys event, continue */ - } while (i++ < MAX_TRY_TIMES); - _clear_cmd(vf); + switch (args->ops) { + case I40E_VIRTCHNL_OP_RESET_VF: + /*no need to process in this function */ + break; + case I40E_VIRTCHNL_OP_VERSION: + case I40E_VIRTCHNL_OP_GET_VF_RESOURCES: + /* for init adminq commands, need to poll the response */ + do { + /* Delay some time first */ + rte_delay_ms(ASQ_DELAY_MS); + ret = i40evf_read_pfmsg(dev, &info); + if (ret == I40EVF_MSG_CMD) { + err = 0; + break; + } else if (ret == I40EVF_MSG_ERR) { + err = -1; + break; + } + /* If don't read msg or read sys event, continue */ + } while (i++ < MAX_TRY_TIMES); + _clear_cmd(vf); + break; + + default: + /* for other adminq in running time, waiting the cmd done flag */ + do { + /* Delay some time first */ + rte_delay_ms(ASQ_DELAY_MS); + if (vf->pend_cmd == I40E_VIRTCHNL_OP_UNKNOWN) { + err = 0; + break; + } + /* If don't read msg or read sys event, continue */ + } while (i++ < MAX_TRY_TIMES); + break; + } return (err | vf->cmd_retval); } @@ -719,7 +742,7 @@ i40evf_config_irq_map(struct rte_eth_dev *dev) map_info = (struct i40e_virtchnl_irq_map_info *)cmd_buffer; map_info->num_vectors = 1; - map_info->vecmap[0].rxitr_idx = I40E_QINT_RQCTL_MSIX_INDX_NOITR; + map_info->vecmap[0].rxitr_idx = I40E_ITR_INDEX_DEFAULT; map_info->vecmap[0].vsi_id = vf->vsi_res->vsi_id; /* Alway use default dynamic MSIX interrupt */ map_info->vecmap[0].vector_id = vector_id; @@ -1093,6 +1116,38 @@ i40evf_dev_atomic_write_link_status(struct rte_eth_dev *dev, return 0; } +/* Disable IRQ0 */ +static inline void +i40evf_disable_irq0(struct i40e_hw *hw) +{ + /* Disable all interrupt types */ + I40E_WRITE_REG(hw, I40E_VFINT_ICR0_ENA1, 0); + I40E_WRITE_REG(hw, I40E_VFINT_DYN_CTL01, + I40E_VFINT_DYN_CTL01_ITR_INDX_MASK); + I40EVF_WRITE_FLUSH(hw); +} + +/* Enable IRQ0 */ +static inline void +i40evf_enable_irq0(struct i40e_hw *hw) +{ + /* Enable admin queue interrupt trigger */ + uint32_t val; + + i40evf_disable_irq0(hw); + val = I40E_READ_REG(hw, I40E_VFINT_ICR0_ENA1); + val |= I40E_VFINT_ICR0_ENA1_ADMINQ_MASK | + I40E_VFINT_ICR0_ENA1_LINK_STAT_CHANGE_MASK; +
[dpdk-dev] [PATCH v2] vfio: Support for no-IOMMU mode
This commit is adding a generic mechanism to support multiple IOMMU types. For now, it's only type 1 (x86 IOMMU) and no-IOMMU (a special VFIO mode that doesn't use IOMMU at all), but it's easily extended by adding necessary definitions into eal_pci_init.h and a DMA mapping function to eal_pci_vfio_dma.c. Since type 1 IOMMU module is no longer necessary to have VFIO, we fix the module check to check for vfio-pci instead. It's not ideal and triggers VFIO checks more often (and thus produces more error output, which was the reason behind the module check in the first place), so we compensate for that by providing more verbose logging, indicating whether VFIO initialization has succeeded or failed. Signed-off-by: Anatoly Burakov Signed-off-by: Santosh Shukla Tested-by: Santosh Shukla --- v2 changes: Compile fix (hat-tip to Santosh Shukla) Tested-by is provisional, since only superficial testing was done --- lib/librte_eal/linuxapp/eal/Makefile | 1 + lib/librte_eal/linuxapp/eal/eal_pci_init.h | 22 lib/librte_eal/linuxapp/eal/eal_pci_vfio.c | 143 - lib/librte_eal/linuxapp/eal/eal_pci_vfio_dma.c | 84 +++ lib/librte_eal/linuxapp/eal/eal_vfio.h | 5 + 5 files changed, 202 insertions(+), 53 deletions(-) create mode 100644 lib/librte_eal/linuxapp/eal/eal_pci_vfio_dma.c diff --git a/lib/librte_eal/linuxapp/eal/Makefile b/lib/librte_eal/linuxapp/eal/Makefile index 26eced5..5c9e9d9 100644 --- a/lib/librte_eal/linuxapp/eal/Makefile +++ b/lib/librte_eal/linuxapp/eal/Makefile @@ -59,6 +59,7 @@ SRCS-$(CONFIG_RTE_LIBRTE_EAL_LINUXAPP) += eal_log.c SRCS-$(CONFIG_RTE_LIBRTE_EAL_LINUXAPP) += eal_pci.c SRCS-$(CONFIG_RTE_LIBRTE_EAL_LINUXAPP) += eal_pci_uio.c SRCS-$(CONFIG_RTE_LIBRTE_EAL_LINUXAPP) += eal_pci_vfio.c +SRCS-$(CONFIG_RTE_LIBRTE_EAL_LINUXAPP) += eal_pci_vfio_dma.c SRCS-$(CONFIG_RTE_LIBRTE_EAL_LINUXAPP) += eal_pci_vfio_mp_sync.c SRCS-$(CONFIG_RTE_LIBRTE_EAL_LINUXAPP) += eal_debug.c SRCS-$(CONFIG_RTE_LIBRTE_EAL_LINUXAPP) += eal_lcore.c diff --git a/lib/librte_eal/linuxapp/eal/eal_pci_init.h b/lib/librte_eal/linuxapp/eal/eal_pci_init.h index a17c708..da1c431 100644 --- a/lib/librte_eal/linuxapp/eal/eal_pci_init.h +++ b/lib/librte_eal/linuxapp/eal/eal_pci_init.h @@ -106,6 +106,28 @@ struct vfio_config { struct vfio_group vfio_groups[VFIO_MAX_GROUPS]; }; +/* function pointer typedef for DMA mapping functions */ +typedef int (*vfio_dma_func_t)(int); + +/* Structure to hold supported IOMMU types */ +struct vfio_iommu_type { + int type_id; + const char *name; + vfio_dma_func_t dma_map_func; +}; + +/* function prototypes for different IOMMU types */ +int vfio_iommu_type1_dma_map(int container_fd); +int vfio_iommu_noiommu_dma_map(int container_fd); + +/* IOMMU types we support */ +static const struct vfio_iommu_type iommu_types[] = { + /* x86 IOMMU, otherwise known as type 1 */ + { VFIO_TYPE1_IOMMU, "Type 1", &vfio_iommu_type1_dma_map}, + /* IOMMU-less mode */ + { VFIO_NOIOMMU_IOMMU, "No-IOMMU", &vfio_iommu_noiommu_dma_map}, +}; + #endif #endif /* EAL_PCI_INIT_H_ */ diff --git a/lib/librte_eal/linuxapp/eal/eal_pci_vfio.c b/lib/librte_eal/linuxapp/eal/eal_pci_vfio.c index 74f91ba..5eb6cd0 100644 --- a/lib/librte_eal/linuxapp/eal/eal_pci_vfio.c +++ b/lib/librte_eal/linuxapp/eal/eal_pci_vfio.c @@ -72,6 +72,7 @@ EAL_REGISTER_TAILQ(rte_vfio_tailq) #define VFIO_DIR "/dev/vfio" #define VFIO_CONTAINER_PATH "/dev/vfio/vfio" #define VFIO_GROUP_FMT "/dev/vfio/%u" +#define VFIO_NOIOMMU_GROUP_FMT "/dev/vfio/noiommu-%u" #define VFIO_GET_REGION_ADDR(x) ((uint64_t) x << 40ULL) /* per-process VFIO config */ @@ -208,42 +209,58 @@ pci_vfio_set_bus_master(int dev_fd) return 0; } -/* set up DMA mappings */ -static int -pci_vfio_setup_dma_maps(int vfio_container_fd) -{ - const struct rte_memseg *ms = rte_eal_get_physmem_layout(); - int i, ret; - - ret = ioctl(vfio_container_fd, VFIO_SET_IOMMU, - VFIO_TYPE1_IOMMU); - if (ret) { - RTE_LOG(ERR, EAL, " cannot set IOMMU type, " - "error %i (%s)\n", errno, strerror(errno)); - return -1; +/* pick IOMMU type. returns a pointer to vfio_iommu_type or NULL for error */ +static const struct vfio_iommu_type * +pci_vfio_set_iommu_type(int vfio_container_fd) { + unsigned idx; + for (idx = 0; idx < RTE_DIM(iommu_types); idx++) { + const struct vfio_iommu_type *t = &iommu_types[idx]; + + int ret = ioctl(vfio_container_fd, VFIO_SET_IOMMU, + t->type_id); + if (!ret) { + RTE_LOG(NOTICE, EAL, " using IOMMU type %d (%s)\n", + t->type_id, t->name); + return t; + } + /* not an error, there may be more supported IOMMU types */ +
[dpdk-dev] [RESEND PATCH] vhost_user: Make sure that memory map is set before attempting address translation
On 1/13/2016 5:40 PM, Pavel Fedin wrote: > Hello! > >> Do you have examples for the malfunctioning clients? If so, could you >> list them in the commit message? > The only malfunctioning client was DPDK itself, with virtio for container > RFC applied. The client-side problem has been fixed > afterwards by http://dpdk.org/ml/archives/dev/2016-January/031169.html. See > the RFC discussion thread. If this is the case, i am wondering whether we should include "malfunctioning clients" in commit message. It triggers me to think if there are existing buggy implementations. Anyway, check is OK. > Kind regards, > Pavel Fedin > Senior Engineer > Samsung Electronics Research center Russia > > >
[dpdk-dev] [RFC v2 1/2] ethdev: add packet filter flow and new behavior switch to fdir
Hi, Rahul > -Original Message- > From: Rahul Lakkireddy [mailto:rahul.lakkireddy at chelsio.com] > Sent: Wednesday, January 13, 2016 4:49 PM > To: Wu, Jingjing > Cc: dev at dpdk.org; Felix Marti; Kumar A S; Nirranjan Kirubaharan > Subject: Re: [dpdk-dev] [RFC v2 1/2] ethdev: add packet filter flow and new > behavior > switch to fdir > > Hi Jingjing, > > On Tuesday, January 01/12/16, 2016 at 17:12:47 -0800, Wu, Jingjing wrote: > > > > > > diff --git a/lib/librte_ether/rte_eth_ctrl.h > > > b/lib/librte_ether/rte_eth_ctrl.h > > > index ce224ad..5cc22a0 100644 > > > --- a/lib/librte_ether/rte_eth_ctrl.h > > > +++ b/lib/librte_ether/rte_eth_ctrl.h > > > @@ -74,7 +74,11 @@ extern "C" { > > > #define RTE_ETH_FLOW_IPV6_EX15 > > > #define RTE_ETH_FLOW_IPV6_TCP_EX16 > > > #define RTE_ETH_FLOW_IPV6_UDP_EX17 > > > -#define RTE_ETH_FLOW_MAX18 > > > +#define RTE_ETH_FLOW_PKT_FILTER_IPV4_TCP 18 #define > > > +RTE_ETH_FLOW_PKT_FILTER_IPV4_UDP 19 #define > > > +RTE_ETH_FLOW_PKT_FILTER_IPV6_TCP 20 #define > > > +RTE_ETH_FLOW_PKT_FILTER_IPV6_UDP 21 > > > +#define RTE_ETH_FLOW_MAX22 > > > > > How to distinguish RTE_ETH_FLOW_PKT_FILTER_IPV4_XX with > RTE_ETH_FLOW_NONFRAG_IPV4_XX, what is the difference? > > The packet filter flow is basically a superset containing Ethernet, > vlan, ipv4/ipv6 and tcp/udp flows whose fields can all be matched at > the same time, unlike in case of the current flow director which seems > to match only one of the flows at any given time. Additionally, it also > allows specifying masks. I separated the two to make this meaning > explicit. If this is not necessary, then I will merge them. Thanks for clarification, now I understand. How about just define one to indicate using pkt_filter? And move the IPV4_XX info to the structure rte_eth_pkt_filter? > > There is also a patch http://dpdk.org/dev/patchwork/patch/9661/ which added > > these > fields. Maybe we can merge them together. > > Yes, we can merge them. Would you like me to merge your patch here? The i40e driver implementation is done based on the change. If you'd like to merge, maybe other patches in the patch set also need to be merged. Anyway, I think maintainer can deal with it. > > The current rte_eth_XX_flow only allow matching _one_ of the flows > because of the union. In contrast, rte_eth_pkt_filter_flow can match > several flows at the same time; i.e. it can match ethernet, vlan, ip, > and tcp/udp all at the same time. rte_eth_pkt_filter_flow is basically > a superset containing several flows that can be matched at the same > time. > > In our Chelsio T5 hardware, it's possible to have several flows that can > be matched in a single rule. This is why I've created a superset that > can match several flows in the same rule. > > Thanks, > Rahul Thanks for clarification, it's clear. And it's great to have this feature. Even it is a superset containing several flows, we still can use the existing structs like struct rte_eth_ipv4_flow { struct rte_eth_l2_flow ether; struct rte_eth_vlan_flow vlan; uint32_t src_ip; uint32_t dst_ip; }; What do you think? Thanks Jingjing
[dpdk-dev] [PATCH 03/14] eal/common: introduce union rte_device and related
On Mon, Jan 04, 2016 at 09:08:15PM +0100, Jan Viktorin wrote: > The union rte_device can be used in situations where we want to work with all > devices without distinguishing among bus-specific features (PCI, ...). > The target device type can be detected by reading the magic. > > Also, the macros RTE_DEVICE_DECL and RTE_DEVICE_PTR_DECL are introduced to > provide a generic way to declare a device or a pointer to a device. The macros > aim to preserve API backwards-compatibility. Eg. > > struct old_super_struct { => struct old_super_struct { > struct rte_pci_device *pci_dev; =>RTE_DEVICE_PTR_DECL(pci_dev); > ... => ... > };=> }; > > struct old_super_struct inst; > > The new code should reference inst.dev.pci, the old code can still use the > inst.pci_dev. The previously introduced magic is included so one can ask the > instance about its type: > > if (inst.dev.magic == RTE_PCI_DEVICE_MAGIC) { > ... > } Rather than magic numbers i.e. #defines, an enum might be better. /Bruce
[dpdk-dev] [PATCH 00/14] Step towards PCI independency
Hello Jan, On Mon, Jan 11, 2016 at 6:29 PM, Jan Viktorin wrote: > Hello David, > > did you find time to see the patchset? I am working on a PMD on top of > these so I'd be glad to base on the code close to the (potentially) > upstreamed one. I took a quick look but since we still have an abi breakage, I am trying to put my ideas on paper and will share to you asap. Regards, -- David Marchand
[dpdk-dev] [PATCH 03/14] eal/common: introduce union rte_device and related
On Wed, 13 Jan 2016 14:01:19 + Bruce Richardson wrote: > On Mon, Jan 04, 2016 at 09:08:15PM +0100, Jan Viktorin wrote: > > The union rte_device can be used in situations where we want to work with > > all > > devices without distinguishing among bus-specific features (PCI, ...). > > The target device type can be detected by reading the magic. > > > > Also, the macros RTE_DEVICE_DECL and RTE_DEVICE_PTR_DECL are introduced to > > provide a generic way to declare a device or a pointer to a device. The > > macros > > aim to preserve API backwards-compatibility. Eg. > > > > struct old_super_struct { => struct old_super_struct { > > struct rte_pci_device *pci_dev; => > > RTE_DEVICE_PTR_DECL(pci_dev); > > ... => ... > > };=> }; > > > > struct old_super_struct inst; > > > > The new code should reference inst.dev.pci, the old code can still use the > > inst.pci_dev. The previously introduced magic is included so one can ask the > > instance about its type: > > > > if (inst.dev.magic == RTE_PCI_DEVICE_MAGIC) { > > ... > > } > > Rather than magic numbers i.e. #defines, an enum might be better. True. However, would it be helpful to put really some _magic_ numbers there for debugging purposes (to clearly recognize the data type)? Or, is it sufficient to just say 1 for PCI, 2 for SoC, 3 for xxx...? > > /Bruce > -- Jan Viktorin E-mail: Viktorin at RehiveTech.com System Architect Web:www.RehiveTech.com RehiveTech Brno, Czech Republic
[dpdk-dev] [PATCH 03/14] eal/common: introduce union rte_device and related
On Wed, Jan 13, 2016 at 03:12:27PM +0100, Jan Viktorin wrote: > On Wed, 13 Jan 2016 14:01:19 + > Bruce Richardson wrote: > > > On Mon, Jan 04, 2016 at 09:08:15PM +0100, Jan Viktorin wrote: > > > The union rte_device can be used in situations where we want to work with > > > all > > > devices without distinguishing among bus-specific features (PCI, ...). > > > The target device type can be detected by reading the magic. > > > > > > Also, the macros RTE_DEVICE_DECL and RTE_DEVICE_PTR_DECL are introduced to > > > provide a generic way to declare a device or a pointer to a device. The > > > macros > > > aim to preserve API backwards-compatibility. Eg. > > > > > > struct old_super_struct { => struct old_super_struct { > > > struct rte_pci_device *pci_dev; => > > > RTE_DEVICE_PTR_DECL(pci_dev); > > > ... => ... > > > };=> }; > > > > > > struct old_super_struct inst; > > > > > > The new code should reference inst.dev.pci, the old code can still use the > > > inst.pci_dev. The previously introduced magic is included so one can ask > > > the > > > instance about its type: > > > > > > if (inst.dev.magic == RTE_PCI_DEVICE_MAGIC) { > > > ... > > > } > > > > Rather than magic numbers i.e. #defines, an enum might be better. > > True. However, would it be helpful to put really some _magic_ numbers > there for debugging purposes (to clearly recognize the data type)? Or, > is it sufficient to just say 1 for PCI, 2 for SoC, 3 for xxx...? > I'd find it hard to see the need for actual magic numbers. I think the magic field should be renamed to "type" and the values taken from a device_type enum. Should make the code more readable e.g. if (inst.dev.type == RTE_DEVTYPE_PCI) { ... } /Bruce
[dpdk-dev] [PATCH v2 6/7] eal: pci: export pci_map_device
On Tue, Jan 12, 2016 at 2:35 PM, Yuanhan Liu wrote: > On Tue, Jan 12, 2016 at 04:40:43PM +0800, Yuanhan Liu wrote: >> On Tue, Jan 12, 2016 at 09:31:05AM +0100, David Marchand wrote: >> > On Tue, Jan 12, 2016 at 7:59 AM, Yuanhan Liu > > linux.intel.com> >> > wrote: >> > >> > Normally we could set RTE_PCI_DRV_NEED_MAPPING flag so that eal will >> > invoke pci_map_device internally for us. From that point view, there >> > is no need to export pci_map_device. >> > >> > However, for virtio pmd driver, which is designed to work without >> > binding UIO (or something similar first), pci_map_device() will fail, >> > which ends up with virtio pmd driver being skipped. Therefore, we can >> > not set RTE_PCI_DRV_NEED_MAPPING blindly at virtio pmd driver. >> > >> > Therefore, this patch exports pci_map_device, and let virtio pmd >> > call it when necessary. >> > >> > >> > Well, if you introduce map function, I suppose, for hotplug, you would need >> > unmap. >> >> Good remind. Thanks. I will export pci_unmap_device as well. > > And here you go. > > --yliu > > -- >8 -- > From aa3d9d0fa827781d1563fd4c06ba04a8fafdc41c Mon Sep 17 00:00:00 2001 > From: Yuanhan Liu > Date: Mon, 11 Jan 2016 16:51:35 +0800 > Subject: [PATCH] eal: pci: export pci_[un]map_device > > Normally we could set RTE_PCI_DRV_NEED_MAPPING flag so that eal will > invoke pci_map_device internally for us. From that point view, there > is no need to export pci_map_device. > > However, for virtio pmd driver, which is designed to work without > binding UIO (or something similar first), pci_map_device() will fail, > which ends up with virtio pmd driver being skipped. Therefore, we can > not set RTE_PCI_DRV_NEED_MAPPING blindly at virtio pmd driver. > > Therefore, this patch exports pci_map_device, and let virtio pmd > call it when necessary. > > Signed-off-by: Yuanhan Liu > --- > v2: - export pci_unmap_device as well > > - Add few more comments about rte_eal_pci_map_device(). This patch tested for vfio-noIOMMU mode for arm64 platform. Tested-By: Santosh Shukla
[dpdk-dev] [PATCH v3 06/12] eal: pci: vfio: add rd/wr func for pci bar space
On Thu, Jan 07, 2016 at 10:19:25AM -0800, Stephen Hemminger wrote: > On Thu, 7 Jan 2016 22:03:03 +0530 > Santosh Shukla wrote: > > > > > +int rte_eal_pci_read_bar(const struct rte_pci_device *device __rte_unused, > > +void *buf __rte_unused, > > +size_t len __rte_unused, > > +off_t offset __rte_unused, > > +int bar_idx __rte_unused) > > +{ > > +#ifdef VFIO_PRESENT > > + const struct rte_intr_handle *intr_handle = &device->intr_handle; > > + return pci_vfio_read_bar(intr_handle, buf, len, offset, bar_idx); > > +#else > > + return 0; /* UIO's not applicable */ > > +#endif > > +} > > It seems wrong to declare all the parameters as unused but then use them. > Maybe there is a way to have a macro for USED(x) in the #else case There is RTE_SET_USED() in rte_common.h. I'd prefer that macro used than the option of duplicating the entire function which really pads out the code. /Bruce
[dpdk-dev] [PATCH 0/4] virtio support for container
Hello, > > You can use below patch for l2fwd to send out an arp packet when it gets > started. I tried to send out arp packet using this patch but buffer allocation for arp packets itself gets failed: m = rte_pktmbuf_alloc(mp); Return a NULL Value. Thanks, Amit.
[dpdk-dev] [PATCH] doc: coding style: use linux kernel style for indentation
On Wed, Jan 13, 2016 at 03:58:49PM +0800, Yuanhan Liu wrote: > Using two tabs for "if" (or "while") statements is a bit weird to me. > Also, using one tab unconditionaly for function definitions and > prototypes doesn't look great. > > Here I'd suggest to use the indentation style the Linux kernel > project prefers: to align with the open brace with tabs and > additonal spaces (when necessary). > > Example: > > static int > rte_eal_foo_bar(int a_long_argument_1, int another_long_argument_2, > struct foo *yet_another_long_argument_3) > > ret = rte_eal_foo_bar(a_long_argument_1, another_long_argument_2, > yet_another_long_argument_3); > > if (really_long_variable_name_1 == really_long_variable_name_2 && > var3 == var4) { > x = y + z; > ; > } > > Cc: Thomas Monjalon > Cc: Siobhan Butler > Cc: John McNamara > Signed-off-by: Yuanhan Liu > --- While it's not a big deal - yet is something likely to trigger massive discussion my objections to this style of indentation is two-fold: 1. It means that we are using a mix of tabs and spaces for indentation at the start of a line. I think it's more consistent that all whitespace at the start of a line should be either tabs or spaces, but not a mixture. 2. It makes how the code look much more dependent upon the tab-size being used in the viewer - being able to adjust how much whitespace is seem at the start of each line is a major advantage of using tabs rather than spaces for indentation. For anyone using a 4-character tabstop - probably the most popular tabstop value after an 8-char one - aligning an if-statement continuation to the opening brace will cause it to align with the body of the block. So, while the two-tab indent may look "a bit weird" it does solve the two issues above. I believe practical benefits should override initial impressions. [It took me a while to get used to also, but now I very much like it as a style.] Regards, /Bruce
[dpdk-dev] rte_prefetch0() is effective?
Prefetchs make a big difference because a powerful CPU like IA is always trying to find items to prefetch and the priority of these is not always easy to determine. This is especially a problem across subroutine calls since the compiler cannot determine what is of priority in the other subroutines and the runtime CPU logic cannot always have the future well predicted far enough in the future for all possible paths, especially if you have a cache miss, which takes eons of clock cycles to do a memory access probably resulting in a CPU stall. Until we get to the point of the computers full understanding the logic of the program and writing optimum code (putting programmers out of business) , the understanding of what is important as the program progresses gives the programmer knowledge of what is desirable to prefetch. It is difficult to determine if the CPU is going to have the same priority of the prefetch, so having a prefetch may or may not show up as a measureable performance improvement under some conditions, but having the prefetch decision in place can make prefetch priority decision correct in these other cases, which make a performance improvement. Removing a prefetch without thinking through and fully understanding the logic of why it is there, or what he added cost (in the case of calculating an address for the prefetch that affects other current operations) if any, is just plain amateur work. It is not to say people do not make bad judgments on what needs to be prefetched and put poor prefetch placement and should only be removed if not logically proper for expected runtime operation. Only more primitive CPUs with no prefetch capabilities don't benefit from properly placed prefetches. Mike -Original Message- From: dev [mailto:dev-boun...@dpdk.org] On Behalf Of Bruce Richardson Sent: Wednesday, January 13, 2016 3:35 AM To: Moon-Sang Lee Cc: dev at dpdk.org Subject: Re: [dpdk-dev] rte_prefetch0() is effective? On Thu, Dec 24, 2015 at 03:35:14PM +0900, Moon-Sang Lee wrote: > I see codes as below in example directory, and I wonder it is effective. > Coherent IO is adopted to modern architectures, so I think that DMA > initiation by rte_eth_rx_burst() might already fulfills cache lines of > RX buffers. > Do I really need to call rte_prefetchX()? > > nb_rx = rte_eth_rx_burst(portid, queueid, pkts_burst, > MAX_PKT_BURST); > ... > /* Prefetch and forward already prefetched packets */ > for (j = 0; j < (nb_rx - PREFETCH_OFFSET); j++) { > rte_prefetch0(rte_pktmbuf_mtod(pkts_burst[ > j + PREFETCH_OFFSET], void *)); > l3fwd_simple_forward(pkts_burst[j], portid, > qconf); > } > Good question. When the first example apps using this style of prefetch were originally written, yes, there was a noticable performance increase achieved by using the prefetch. Thereafter, I'm not sure that anyone has checked with each generation of platforms whether the prefetches are still necessary and how much they help, but I suspect that they still help a bit, and don't hurt performance. It would be an interesting exercise to check whether the prefetch offsets used in code like above can be adjusted to give better performance on our latest supported platforms. /Bruce
[dpdk-dev] [PATCH v2] mbuf: optimize rte_mbuf_refcnt_update
Hi Bruce. This is exactly what was possible before the patch and does *not* work after this patch. I think it is more related to the semantic of the API. My understanding of the API was that I could allocate a share mbuf (ref=1) and then attach it from many concurrent threads (as each thread attach opt inc in 1 and driver reduces, atomic by 1) After the patch there is a need to make the ref of the share mbuf to be 2 (why?) and free it twice at the end( ?), this makes it cumbersome to use it and for sure there is a need to describe this semantic change. However, this change is good for the common case, and I think it is better to keep it. The options I see are: 1) Document the current behavior 2) Create two sets of API with two types of semantic thanks, Hanoh -Original Message- From: Bruce Richardson [mailto:bruce.richard...@intel.com] Sent: Wednesday, January 13, 2016 1:48 PM To: Hanoch Haim (hhaim) Cc: Olivier MATZ; dev at dpdk.org; Ido Barnea (ibarnea); Itay Marom (imarom) Subject: Re: [dpdk-dev] [PATCH v2] mbuf: optimize rte_mbuf_refcnt_update On Tue, Jan 05, 2016 at 11:11:24AM +, Hanoch Haim (hhaim) wrote: > Hi Oliver, > Thank you for the fast response and it would be great to open a discussion on > that. > In general our project can leverage your optimization and I think it is great > (we should have thought about it) . We can use it using the workaround I > described. > However, for me it seems odd that rte_pktmbuf_attach () that does not > *change* anything in m_const, except of the *atomic* ref counter does not > work in parallel. > The example I gave is a classic use case of rte_pktmbuf_attach (multicast ) > and I don't see why it wouldn't work after your optimization. > > Do you have a pointer to the documentation that state that that you can't > call the atomic ref counter from more than one thread? > Hi, actually, the issue is not that you can't work with the reference counter field from multiple threads, or that you can't use an mbuf from multiple threads, it's that if you are working with the same mbuf in multiple threads you have multiple references to the mbuf and your application must increase the reference counter appropriately. For example, if thread A is going to pass an mbuf to thread B and keep using it itself, you must increment the reference counter in thread A before enqueuing it to B. Regards, /Bruce
[dpdk-dev] [PATCH v2] mbuf: optimize rte_mbuf_refcnt_update
On Wed, Jan 13, 2016 at 04:28:34PM +, Hanoch Haim (hhaim) wrote: > Hi Bruce. > This is exactly what was possible before the patch and does *not* work after > this patch. > I think it is more related to the semantic of the API. My understanding of > the API was that I could allocate a share mbuf (ref=1) and then attach it > from many concurrent threads (as each thread attach opt inc in 1 and driver > reduces, atomic by 1) > After the patch there is a need to make the ref of the share mbuf to be 2 > (why?) and free it twice at the end( ?), this makes it cumbersome to use it > and for sure there is a need to describe this semantic change. > There is no such thing as allocating a shared mbuf. Mbufs are always allocated with a reference count of one. To share an mbuf among threads, then the ref count must be increased. NOTE: this is a different operation to that of "attaching" another mbuf to it - it just happens that when attaching one mbuf to another you have to increase the reference count as the attaching mbuf acts like another thread holding a pointer to the original mbuf. Once finished with the mbuf, each thread (or pointing mbuf) needs to decrement the reference counter, and the last one holding the reference is then responsible for freeing that buffer. The easiest way to do this is to use the free call and let it handle the reference counts, but if it makes more logical sense in an app to do the decrement and checking itself (so that inc's and dec's match in the app code), then that can work too, obviously. > However, this change is good for the common case, and I think it is better to > keep it. > The options I see are: > 1) Document the current behavior > 2) Create two sets of API with two types of semantic Given the confusion I think documenting the way things are "meant to work" is a good idea. /Bruce > > thanks, > Hanoh > > -Original Message- > From: Bruce Richardson [mailto:bruce.richardson at intel.com] > Sent: Wednesday, January 13, 2016 1:48 PM > To: Hanoch Haim (hhaim) > Cc: Olivier MATZ; dev at dpdk.org; Ido Barnea (ibarnea); Itay Marom (imarom) > Subject: Re: [dpdk-dev] [PATCH v2] mbuf: optimize rte_mbuf_refcnt_update > > On Tue, Jan 05, 2016 at 11:11:24AM +, Hanoch Haim (hhaim) wrote: > > Hi Oliver, > > Thank you for the fast response and it would be great to open a discussion > > on that. > > In general our project can leverage your optimization and I think it is > > great (we should have thought about it) . We can use it using the > > workaround I described. > > However, for me it seems odd that rte_pktmbuf_attach () that does not > > *change* anything in m_const, except of the *atomic* ref counter does not > > work in parallel. > > The example I gave is a classic use case of rte_pktmbuf_attach (multicast > > ) and I don't see why it wouldn't work after your optimization. > > > > Do you have a pointer to the documentation that state that that you can't > > call the atomic ref counter from more than one thread? > > > Hi, > > actually, the issue is not that you can't work with the reference counter > field from multiple threads, or that you can't use an mbuf from multiple > threads, it's that if you are working with the same mbuf in multiple threads > you have multiple references to the mbuf and your application must increase > the reference counter appropriately. For example, if thread A is going to > pass an mbuf to thread B and keep using it itself, you must increment the > reference counter in thread A before enqueuing it to B. > > Regards, > /Bruce >
[dpdk-dev] [PATCH v2] vfio: Support for no-IOMMU mode
On Wed, 13 Jan 2016 12:36:09 + Anatoly Burakov wrote: > +/* IOMMU types we support */ > +static const struct vfio_iommu_type iommu_types[] = { > + /* x86 IOMMU, otherwise known as type 1 */ > + { VFIO_TYPE1_IOMMU, "Type 1", &vfio_iommu_type1_dma_map}, > + /* IOMMU-less mode */ > + { VFIO_NOIOMMU_IOMMU, "No-IOMMU", &vfio_iommu_noiommu_dma_map}, > +}; > + Nit.. Why full-tab indent here?
[dpdk-dev] [PATCH] doc: coding style: use linux kernel style for indentation
On Wed, 13 Jan 2016 15:07:08 + Bruce Richardson wrote: > So, while the two-tab indent may look "a bit weird" it does solve the two > issues > above. I believe practical benefits should override initial impressions. [It > took > me a while to get used to also, but now I very much like it as a style.] I don't think that deviating from kernel style for this case is justified.
[dpdk-dev] UX Bug in Sphinx HTML Layout for Programming Guide (and maybe other guides?)
When you go to this link: http://dpdk.org/doc/guides/prog_guide/perf_opt_guidelines.html There is a bug in the Sphinx layout, where the subchapters of a chapter are invisible even after the chapter is clicked. It is a pain when you are trying to figure out the different sections in a widely variable chapter like the performance chapter, to know which section to read without having to click one-by-one to find the right part. I am not sure about what it would take to show the subchapters in the outline view to make this easier for someone to figure out. Or maybe a TOC for a chapter at the beginning of the chapter? This would really help a lot. Sincerely, Matthew.
[dpdk-dev] rte_prefetch0() is effective?
On Wed, Jan 13, 2016 at 11:34:33AM +, Bruce Richardson wrote: > When the first example apps using this style of prefetch were originally > written, yes, there was a noticable performance increase achieved by using > the prefetch. Thereafter, I'm not sure that anyone has checked with each > generation of platforms whether the prefetches are still necessary and how > much they help, but I suspect that they still help a bit, and don't hurt > performance. FYI, for me as a community member this paragraph describes one of my top irritations about DPDK. The Intel accelerations, such as adding prefetches, or support for new features like the librte_power, are treated as one-off projects not as ongoing technical efforts which need periodic retesting and maintenance. Thus it turned out that after waiting over a month for a reply, I eventually discovered librte_power probably never worked right at all since at least Sandy Bridge, which is a very old chip by now for servers. The accelerations are also treated like black magic. Meaning no comments are put in the code about how and why they work, so an outsider trying his best to measure things in VTune to help provide the ongoing testing and maintenance, can not tell why something was done or how it might be adjusted to work right in their environment if their hardware is older or newer than whatever undocumented hardware was used in developing the example. There's nowhere I know of that says the reference platform and core generation used for developing an example either so I could get some idea if it's current or old code. When I ask a high level question, such as "Which Intel accelerations should one make sure are enabled to get best performance?" it normally doesn't get any reply. This makes life difficult because there are many dozen accelerations listed in the data sheet of a typical modern Intel core and no guidance is provided on the priority of the different accelerations for DPDK. So I don't have a good idea about where to focus my time to get the best acceleration out of all the technology it must have cost Intel millions or billions to create. To me that's very sad. I am hoping maybe there are some resources we could make available to help understand the principles behind the accelerations so it is easier for the community to take part in maintaining them and maybe even helping create new ones. Note: I read through all the subchapters here: http://dpdk.org/doc/guides/prog_guide/perf_opt_guidelines.html None of them mention any CPU acceleration details whatsoever. They don't explain any specifics on prefetch or branch prediction. Only that they exist and do things. Sincerely, Matthew.
[dpdk-dev] [PATCH] app/testpmd Fix max_socket detection
Previously, max_socket was set to the highest numbered socket with an enabled lcore. The intent is to set it to the highest socket regardless of it being enabled. Change-Id: I6306af0f90aa3c1fc5ffed75d1eed8297d29e132 Signed-off-by: Stephen Hurd --- app/test-pmd/testpmd.c | 10 +- 1 file changed, 5 insertions(+), 5 deletions(-) diff --git a/app/test-pmd/testpmd.c b/app/test-pmd/testpmd.c index 6129c26..26a2cce 100644 --- a/app/test-pmd/testpmd.c +++ b/app/test-pmd/testpmd.c @@ -359,17 +359,17 @@ set_default_fwd_lcores_config(void) nb_lc = 0; for (i = 0; i < RTE_MAX_LCORE; i++) { + if (sock_num > max_socket) { + if (sock_num > RTE_MAX_NUMA_NODES) + rte_exit(EXIT_FAILURE, "Total sockets greater than %u\n", RTE_MAX_NUMA_NODES); + max_socket = sock_num; + } if (! rte_lcore_is_enabled(i)) continue; if (i == rte_get_master_lcore()) continue; fwd_lcores_cpuids[nb_lc++] = i; sock_num = rte_lcore_to_socket_id(i) + 1; - if (sock_num > max_socket) { - if (sock_num > RTE_MAX_NUMA_NODES) - rte_exit(EXIT_FAILURE, "Total sockets greater than %u\n", RTE_MAX_NUMA_NODES); - max_socket = sock_num; - } } nb_lcores = (lcoreid_t) nb_lc; nb_cfg_lcores = nb_lcores; -- 1.9.1
[dpdk-dev] [PATCH v2] app/testpmd Fix max_socket detection
Previously, max_socket was set to the highest numbered socket with an enabled lcore. The intent is to set it to the highest socket regardless of it being enabled. Change-Id: I6306af0f90aa3c1fc5ffed75d1eed8297d29e132 Signed-off-by: Stephen Hurd v2: Forgot to commit before sending email... sorry for the nouse. --- app/test-pmd/testpmd.c | 10 +- 1 file changed, 5 insertions(+), 5 deletions(-) diff --git a/app/test-pmd/testpmd.c b/app/test-pmd/testpmd.c index 6129c26..a4088f9 100644 --- a/app/test-pmd/testpmd.c +++ b/app/test-pmd/testpmd.c @@ -359,17 +359,17 @@ set_default_fwd_lcores_config(void) nb_lc = 0; for (i = 0; i < RTE_MAX_LCORE; i++) { - if (! rte_lcore_is_enabled(i)) - continue; - if (i == rte_get_master_lcore()) - continue; - fwd_lcores_cpuids[nb_lc++] = i; sock_num = rte_lcore_to_socket_id(i) + 1; if (sock_num > max_socket) { if (sock_num > RTE_MAX_NUMA_NODES) rte_exit(EXIT_FAILURE, "Total sockets greater than %u\n", RTE_MAX_NUMA_NODES); max_socket = sock_num; } + if (! rte_lcore_is_enabled(i)) + continue; + if (i == rte_get_master_lcore()) + continue; + fwd_lcores_cpuids[nb_lc++] = i; } nb_lcores = (lcoreid_t) nb_lc; nb_cfg_lcores = nb_lcores; -- 1.9.1
[dpdk-dev] VFIO no-iommu
On Thu, 2016-01-14 at 14:03 +0800, Jike Song wrote: > On Wed, Dec 16, 2015 at 12:38 PM, Alex Williamson > wrote: > > > > So it works.??Is it acceptable???Useful???Sufficiently complete???Does > > it imply deprecating the uio interface???I believe the feature that > > started this discussion was support for MSI/X interrupts so that VFs > > can support some kind of interrupt (uio only supports INTx since it > > doesn't allow DMA).??Implementing that would be the ultimate test of > > whether this provides dpdk with not only a more consistent interface, > > but the feature dpdk wants that's missing in uio. Thanks, > > > Hi Alex, > > Sorry for jumping in.??Just being curious, how does VFIO No-IOMMU mode > support DMA from userspace drivers???If I understand correctly, due to > the absence of IOMMU, pcidev has to use physaddr to start a DMA > transaction, but how it is supposed to get physaddr from userspace > drivers, /proc//pagemap or something else? Hi Jike, vfio no-iommu does nothing to facilitate DMA mappings, UIO didn't either and DPDK managed to work with that. ?vfio no-iommu is meant to be an enabler and provide a consistent vfio device model (with MSI/X interrupts), but fundamentally the idea of a non-iommu protected, user owned device capable of DMA is unsupportable. ?This is why vfio no- iommu taints the kernel. ?With that in mind, one of the design goals is to introduce as little code as possible for vfio no-iommu. ?A new vfio iommu backend with pinning and virt-to-bus translation goes against that design goal. ?I don't know the details of how DPDK did this with UIO, but the same lack of DMA mapping facilities is present with vfio no-iommu. ?It really just brings the vfio device model, nothing more. Thanks, Alex