Thanks for taking the time to review this Daniele, I will submit a new V3 this evening with the required changes.
Thanks Ian From: Daniele Di Proietto [mailto:diproiet...@ovn.org] Sent: Tuesday, May 24, 2016 1:42 AM To: Stokes, Ian Cc: dev@openvswitch.org Subject: Re: [ovs-dev] [PATCH v2 2/2] netdev-dpdk.c: Add ingress-policing functionality. Thanks for the patch, it's simpler than I expected (all the infrastructure was already there). Minor comments inline, otherwise this looks good to me. If you agree with the comments, would you mind sending another version? Daniele 2016-05-10 2:19 GMT-07:00 Ian Stokes <ian.sto...@intel.com<mailto:ian.sto...@intel.com>>: This patch provides the modifications required in netdev-dpdk.c and vswitch.xml to enable ingress policing for DPDK interfaces. This patch implements the necessary netdev functions to netdev-dpdk.c as well as various helper functions required for ingress policing. The vswitch.xml has been modified to explain the expected parameters and behaviour when using ingress policing. The INSTALL.DPDK.md<http://INSTALL.DPDK.md> guide has been modified to provide an example configuration of ingress policing. Signed-off-by: Ian Stokes <ian.sto...@intel.com<mailto:ian.sto...@intel.com>> --- v2: - Rebase to latest master. *netdev-dpdk.c - Modied 'netdev_dpdk_set_policing()' to set default value of of policer_burst to 8000 kbits. - Modified 'netdev_dpdk_policer_construct__()' to use '1000/8' for clarity instead of '125' when converting kbits to bytes for rate and burst. v1: *INSTALL.DPDK.md<http://INSTALL.DPDK.md> - Add ingress_policing usage example. *NEWS - Add support for ingress_policing to DPDK section. *netdev-dpdk.c - Add ingress_policer struct. - Modify struct netdev-dpdk to include a ovsrcu pointer to an ingress policer. - Modify netdev_dpdk_init() to initialise ingress policer to null. - Add function 'ingress_policer_run()'to process packets with a given netdev, rte_mbuf and packet count. - Modified 'netdev_dpdk_vhost_update_rx_counters()' to accept new parameter int dropped representing number of packets dropped. - Modified 'netdev_dpdk_vhost_update_rx_counters()' to update number of dropped packets in stats->rx_dropped. - Modified 'netdev_dpdk_vhost_rxq_recv()' to check and call ingress_policing functionality with 'ingress_policer_run()'. - Modified 'netdev_dpdk_rxq_recv()' to check and call ingress_policing functionality with 'ingress_policer_run()'. - Modified 'netdev_dpdk_rxq_recv()' to update rx dropped stats. - Modified 'netdev_dpdk_vhost_get_stats()' to add support for rx_dropped. - Add function 'netdev_dpdk_policer_construct__()' to create an ingress_policer. - Add function 'netdev_dpdk_policer_destruct__()' to destroy an ingress policer. - Add function 'netdev_dpdk_set_policing()' responsible for calling helper functions to create and destroy an ingress policer for the device. - Add function 'netdev_dpdk_get_ingress_policer()' to return a pointer to the ingress_policer using the ovsrcu_get function. *vswitch.xml - Modify ingress policing section to include support in OVS with DPDK. --- INSTALL.DPDK.md<http://INSTALL.DPDK.md> | 19 ++++++ NEWS | 1 + lib/netdev-dpdk.c | 173 ++++++++++++++++++++++++++++++++++++++++++++++++- vswitchd/vswitch.xml | 4 +- 4 files changed, 191 insertions(+), 6 deletions(-) diff --git a/INSTALL.DPDK.md<http://INSTALL.DPDK.md> b/INSTALL.DPDK.md<http://INSTALL.DPDK.md> index 93f92e4..68735cc 100644 --- a/INSTALL.DPDK.md<http://INSTALL.DPDK.md> +++ b/INSTALL.DPDK.md<http://INSTALL.DPDK.md> @@ -267,6 +267,25 @@ Using the DPDK with ovs-vswitchd: For more details regarding egress-policer parameters please refer to the vswitch.xml. +9. Ingress Policing Example + + Assuming you have a vhost-user port receiving traffic consisting of + packets of size 64 bytes, the following command would limit the reception + rate of the port to ~1,000,000 packets per second: + + `ovs-vsctl set interface vhost-user0 ingress_policing_rate=368000 + ingress_policing_burst=1000` + + To examine the ingress policer configuration of the port: + + `ovs-vsctl list interface vhost-user0` + + To clear the ingress policer configuration from the port use the following: + + `ovs-vsctl set interface vhost-user0 ingress_policing_rate=0` + + For more details regarding ingress-policer see the vswitch.xml. + Performance Tuning: ------------------- diff --git a/NEWS b/NEWS index 4e81cad..ba201cf 100644 --- a/NEWS +++ b/NEWS @@ -32,6 +32,7 @@ Post-v2.5.0 * DB entries have been added for many of the DPDK EAL command line arguments. Additional arguments can be passed via the dpdk-extra entry. + * Add ingress policing functionality. - ovs-benchmark: This utility has been removed due to lack of use and bitrot. - ovs-appctl: diff --git a/lib/netdev-dpdk.c b/lib/netdev-dpdk.c index 0ed909c..dd4fa0a 100644 --- a/lib/netdev-dpdk.c +++ b/lib/netdev-dpdk.c @@ -328,6 +328,12 @@ struct dpdk_ring { struct ovs_list list_node OVS_GUARDED_BY(dpdk_mutex); }; +struct ingress_policer { + struct rte_meter_srtcm_params app_srtcm_params; + struct rte_meter_srtcm in_policer; + rte_spinlock_t policer_lock; +}; + struct netdev_dpdk { struct netdev up; int port_id; @@ -373,6 +379,10 @@ struct netdev_dpdk { struct qos_conf *qos_conf; rte_spinlock_t qos_lock; + /* Ingress Policer */ + OVSRCU_TYPE(struct ingress_policer *) ingress_policer; + uint32_t policer_rate; + uint32_t policer_burst; }; struct netdev_rxq_dpdk { @@ -386,6 +396,9 @@ static int netdev_dpdk_construct(struct netdev *); struct virtio_net * netdev_dpdk_get_virtio(const struct netdev_dpdk *dev); +struct ingress_policer * +netdev_dpdk_get_ingress_policer(const struct netdev_dpdk *dev); + static bool is_dpdk_class(const struct netdev_class *class) { @@ -759,6 +772,11 @@ netdev_dpdk_init(struct netdev *netdev, unsigned int port_no, dev->qos_conf = NULL; rte_spinlock_init(&dev->qos_lock); + /* Initialise rcu pointer for ingress policer to NULL */ + ovsrcu_init(&dev->ingress_policer, NULL); + dev->policer_rate = 0; + dev->policer_burst = 0; + netdev->n_txq = NR_QUEUE; netdev->n_rxq = NR_QUEUE; netdev->requested_n_rxq = NR_QUEUE; @@ -1198,6 +1216,22 @@ netdev_dpdk_policer_run__(struct rte_meter_srtcm *meter, return cnt; } +static int +ingress_policer_run(struct netdev_dpdk *dev, struct rte_mbuf **pkts, + int pkt_cnt) +{ + int cnt = 0; + struct ingress_policer *policer = netdev_dpdk_get_ingress_policer(dev); + + if (policer) { + rte_spinlock_lock(&policer->policer_lock); + cnt = netdev_dpdk_policer_run__(&policer->in_policer, pkts, pkt_cnt); + rte_spinlock_unlock(&policer->policer_lock); + } + + return cnt; +} + The callers to the above function already have to check for 'policer'. Maybe it'd be clearer to pass 'policer' instead of 'dev' and drop the if? static bool is_vhost_running(struct virtio_net *virtio_dev) { @@ -1232,13 +1266,15 @@ netdev_dpdk_vhost_update_rx_size_counters(struct netdev_stats *stats, static inline void netdev_dpdk_vhost_update_rx_counters(struct netdev_stats *stats, - struct dp_packet **packets, int count) + struct dp_packet **packets, int count, + int dropped) { int i; unsigned int packet_size; struct dp_packet *packet; stats->rx_packets += count; + stats->rx_dropped += dropped; for (i = 0; i < count; i++) { packet = packets[i]; packet_size = dp_packet_size(packet); @@ -1273,7 +1309,9 @@ netdev_dpdk_vhost_rxq_recv(struct netdev_rxq *rxq, struct netdev_dpdk *dev = netdev_dpdk_cast(rxq->netdev); struct virtio_net *virtio_dev = netdev_dpdk_get_virtio(dev); int qid = rxq->queue_id; + struct ingress_policer *policer = netdev_dpdk_get_ingress_policer(dev); uint16_t nb_rx = 0; + uint16_t dropped = 0; if (OVS_UNLIKELY(!is_vhost_running(virtio_dev))) { return EAGAIN; @@ -1291,8 +1329,14 @@ netdev_dpdk_vhost_rxq_recv(struct netdev_rxq *rxq, return EAGAIN; } + if (policer != NULL) { Minor: I would drop the "!= NULL" + dropped = nb_rx; + nb_rx = ingress_policer_run(dev, (struct rte_mbuf **)packets, nb_rx); + dropped -= nb_rx; + } + rte_spinlock_lock(&dev->stats_lock); - netdev_dpdk_vhost_update_rx_counters(&dev->stats, packets, nb_rx); + netdev_dpdk_vhost_update_rx_counters(&dev->stats, packets, nb_rx, dropped); rte_spinlock_unlock(&dev->stats_lock); *c = (int) nb_rx; @@ -1305,7 +1349,9 @@ netdev_dpdk_rxq_recv(struct netdev_rxq *rxq, struct dp_packet **packets, { struct netdev_rxq_dpdk *rx = netdev_rxq_dpdk_cast(rxq); struct netdev_dpdk *dev = netdev_dpdk_cast(rxq->netdev); + struct ingress_policer *policer = netdev_dpdk_get_ingress_policer(dev); int nb_rx; + int dropped = 0; /* There is only one tx queue for this core. Do not flush other * queues. @@ -1323,6 +1369,19 @@ netdev_dpdk_rxq_recv(struct netdev_rxq *rxq, struct dp_packet **packets, return EAGAIN; } + if (policer != NULL) { same here + dropped = nb_rx; + nb_rx = ingress_policer_run(dev, (struct rte_mbuf **) packets, nb_rx); + dropped -= nb_rx; + } + + /* Update stats to reflect dropped packets */ + if (OVS_UNLIKELY(dropped)) { + rte_spinlock_lock(&dev->stats_lock); + dev->stats.rx_dropped += dropped; + rte_spinlock_unlock(&dev->stats_lock); + } + *c = nb_rx; return 0; @@ -1756,6 +1815,7 @@ netdev_dpdk_vhost_get_stats(const struct netdev *netdev, /* Supported Stats */ stats->rx_packets += dev->stats.rx_packets; stats->tx_packets += dev->stats.tx_packets; + stats->rx_dropped = dev->stats.rx_dropped; stats->tx_dropped += dev->stats.tx_dropped; stats->multicast = dev->stats.multicast; stats->rx_bytes = dev->stats.rx_bytes; @@ -1882,11 +1942,12 @@ netdev_dpdk_get_stats(const struct netdev *netdev, struct netdev_stats *stats) rte_spinlock_lock(&dev->stats_lock); stats->tx_dropped = dev->stats.tx_dropped; + stats->rx_dropped = dev->stats.rx_dropped; rte_spinlock_unlock(&dev->stats_lock); /* These are the available DPDK counters for packets not received due to * local resource constraints in DPDK and NIC respectively. */ - stats->rx_dropped = rte_stats.rx_nombuf + rte_stats.imissed; + stats->rx_dropped += rte_stats.rx_nombuf + rte_stats.imissed; stats->rx_missed_errors = rte_stats.imissed; ovs_mutex_unlock(&dev->mutex); @@ -1941,6 +2002,104 @@ netdev_dpdk_get_features(const struct netdev *netdev, } static int +netdev_dpdk_policer_construct__(struct netdev *netdev_, uint32_t rate, + uint32_t burst) I would drop the "__" part from the name. +{ + struct netdev_dpdk *netdev = netdev_dpdk_cast(netdev_); + struct ingress_policer *policer; + uint64_t rate_bytes; + uint64_t burst_bytes; + int err = 0; + + policer = xmalloc(sizeof *policer); + rte_spinlock_init(&policer->policer_lock); + + /* rte_meter requires bytes so convert kbits rate and burst to bytes. */ + rate_bytes = rate * 1000/8; + burst_bytes = burst * 1000/8; + + policer->app_srtcm_params.cir = rate_bytes; + policer->app_srtcm_params.cbs = burst_bytes; + policer->app_srtcm_params.ebs = 0; + err = rte_meter_srtcm_config(&policer->in_policer, + &policer->app_srtcm_params); + ovsrcu_set(&netdev->ingress_policer, policer); + + return err; +} IMHO it will be clearer if this function returns 'policer', instead of setting it in 'netdev_' + +static int netdev_dpdk_policer_destruct__(struct ingress_policer * policer) +{ + if (policer == NULL) { + return 0; + } + else { + free(policer); + } + return 0; +} There's no need for this function. free(NULL) is perfectly valid (plus, the caller already checks). + +static int +netdev_dpdk_set_policing(struct netdev* netdev_, uint32_t policer_rate, + uint32_t policer_burst) +{ + struct netdev_dpdk *netdev = netdev_dpdk_cast(netdev_); Not a big deal, but in the rest of the functions we use 'netdev' for 'struct netdev *', and 'dev' for 'struct netdev_dpdk *' (since commit d46285a2206f) + struct ingress_policer *policer; + int err = 0; There's no way for this function to fail, so this variable is unnecessary. + + /* Force to 0 if no rate specified, + * default to 8000 kbits if burst is 0, + * else stick with user-specified value. + */ + policer_burst = (!policer_rate ? 0 + : !policer_burst ? 8000 + : policer_burst); + + ovs_mutex_lock(&netdev->mutex); + + policer = ovsrcu_get_protected(struct ingress_policer *, + &netdev->ingress_policer); + + if (netdev->policer_rate == policer_rate && + netdev->policer_burst == policer_burst) { + /* Assume that settings haven't changed since we last set them. */ + ovs_mutex_unlock(&netdev->mutex); + return err; + } + + /* Destroy any existing ingress policer for the device if one exists + * and the policer_rate and policer_burst are being set to 0 + */ + if ((policer != NULL) && (policer_rate == 0) && (policer_burst == 0)) { + /* User has set rate and burst to 0, destroy the policer */ + ovsrcu_postpone(netdev_dpdk_policer_destruct__, policer); + ovsrcu_set(&netdev->ingress_policer, NULL); + netdev->policer_rate = 0; + netdev->policer_burst = 0; + ovs_mutex_unlock(&netdev->mutex); + return err; + } + + /* Destroy existing policer before adding new policer */ + if (policer != NULL) { + ovsrcu_postpone(netdev_dpdk_policer_destruct__, policer); + ovsrcu_set(&netdev->ingress_policer, NULL); + netdev->policer_rate = 0; + netdev->policer_burst = 0; + } + + /* Add an ingress policer */ + err = netdev_dpdk_policer_construct__(netdev_, policer_rate, + policer_burst); + + /* Update policer_rate and policer_burst for the netdev */ + netdev->policer_rate = policer_rate; + netdev->policer_burst = policer_burst; + ovs_mutex_unlock(&netdev->mutex); The above block seems more complicated than necessary. How about something like this? if (policer) { ovsrcu_postpone(free, policer); } if (policer_rate != 0) { new_policer = netdev_dpdk_policer_construct__(rate, burst); } else { new_policer = NULL; } ovsrcu_set(&netdev->ingress_policer, new_policer); netdev->policer_rate = policer_rate; netdev->policer_burst = policer_burst; ovs_mutex_unlock(&netdev->mutex); + return err; +} + +static int netdev_dpdk_get_ifindex(const struct netdev *netdev) { struct netdev_dpdk *dev = netdev_dpdk_cast(netdev); @@ -2387,6 +2546,12 @@ netdev_dpdk_get_virtio(const struct netdev_dpdk *dev) return ovsrcu_get(struct virtio_net *, &dev->virtio_dev); } +struct ingress_policer * +netdev_dpdk_get_ingress_policer(const struct netdev_dpdk *dev) +{ + return ovsrcu_get(struct ingress_policer *, &dev->ingress_policer); +} + /* * These callbacks allow virtio-net devices to be added to vhost ports when * configuration has been fully complete. @@ -2827,7 +2992,7 @@ static const struct dpdk_qos_ops egress_policer_ops = { GET_FEATURES, \ NULL, /* set_advertisements */ \ \ - NULL, /* set_policing */ \ + netdev_dpdk_set_policing, /* set_policing */ \ Minor: since the member is not NULL anymore we can remove the comment netdev_dpdk_get_qos_types, \ NULL, /* get_qos_capabilities */ \ netdev_dpdk_get_qos, \ diff --git a/vswitchd/vswitch.xml b/vswitchd/vswitch.xml index 944d8ec..0c9e60c 100644 --- a/vswitchd/vswitch.xml +++ b/vswitchd/vswitch.xml @@ -2502,8 +2502,8 @@ table="Queue"/> tables). </p> <p> - Policing is currently implemented only on Linux. The Linux - implementation uses a simple ``token bucket'' approach: + Policing is currently implemented on Linux and OVS with DPDK. Both + implementations use a simple ``token bucket'' approach: </p> <ul> <li> -- 1.7.4.1 _______________________________________________ dev mailing list dev@openvswitch.org<mailto:dev@openvswitch.org> http://openvswitch.org/mailman/listinfo/dev _______________________________________________ dev mailing list dev@openvswitch.org http://openvswitch.org/mailman/listinfo/dev