Thanks for the patch, it's simpler than I expected (all the infrastructure was already there).
Minor comments inline, otherwise this looks good to me. If you agree with the comments, would you mind sending another version? Daniele 2016-05-10 2:19 GMT-07:00 Ian Stokes <ian.sto...@intel.com>: > This patch provides the modifications required in netdev-dpdk.c and > vswitch.xml to enable ingress policing for DPDK interfaces. > > This patch implements the necessary netdev functions to netdev-dpdk.c as > well as various helper functions required for ingress policing. > > The vswitch.xml has been modified to explain the expected parameters and > behaviour when using ingress policing. > > The INSTALL.DPDK.md guide has been modified to provide an example > configuration of ingress policing. > > Signed-off-by: Ian Stokes <ian.sto...@intel.com> > --- > > v2: > - Rebase to latest master. > > *netdev-dpdk.c > - Modied 'netdev_dpdk_set_policing()' to set default value of of > policer_burst to 8000 kbits. > - Modified 'netdev_dpdk_policer_construct__()' to use '1000/8' for clarity > instead of '125' when converting kbits to bytes for rate and burst. > > v1: > > *INSTALL.DPDK.md > - Add ingress_policing usage example. > > *NEWS > - Add support for ingress_policing to DPDK section. > > *netdev-dpdk.c > - Add ingress_policer struct. > - Modify struct netdev-dpdk to include a ovsrcu pointer to an ingress > policer. > - Modify netdev_dpdk_init() to initialise ingress policer to null. > - Add function 'ingress_policer_run()'to process packets with a given > netdev, rte_mbuf and packet count. > - Modified 'netdev_dpdk_vhost_update_rx_counters()' to accept new > parameter int dropped representing number of packets dropped. > - Modified 'netdev_dpdk_vhost_update_rx_counters()' to update number of > dropped packets in stats->rx_dropped. > - Modified 'netdev_dpdk_vhost_rxq_recv()' to check and call > ingress_policing functionality with 'ingress_policer_run()'. > - Modified 'netdev_dpdk_rxq_recv()' to check and call > ingress_policing functionality with 'ingress_policer_run()'. > - Modified 'netdev_dpdk_rxq_recv()' to update rx dropped stats. > - Modified 'netdev_dpdk_vhost_get_stats()' to add support for rx_dropped. > - Add function 'netdev_dpdk_policer_construct__()' to create an > ingress_policer. > - Add function 'netdev_dpdk_policer_destruct__()' to destroy an ingress > policer. > - Add function 'netdev_dpdk_set_policing()' responsible for calling > helper functions to create and destroy an ingress policer for the > device. > - Add function 'netdev_dpdk_get_ingress_policer()' to return a pointer > to the ingress_policer using the ovsrcu_get function. > > *vswitch.xml > - Modify ingress policing section to include support in OVS with DPDK. > --- > INSTALL.DPDK.md | 19 ++++++ > NEWS | 1 + > lib/netdev-dpdk.c | 173 > ++++++++++++++++++++++++++++++++++++++++++++++++- > vswitchd/vswitch.xml | 4 +- > 4 files changed, 191 insertions(+), 6 deletions(-) > > diff --git a/INSTALL.DPDK.md b/INSTALL.DPDK.md > index 93f92e4..68735cc 100644 > --- a/INSTALL.DPDK.md > +++ b/INSTALL.DPDK.md > @@ -267,6 +267,25 @@ Using the DPDK with ovs-vswitchd: > For more details regarding egress-policer parameters please refer to > the > vswitch.xml. > > +9. Ingress Policing Example > + > + Assuming you have a vhost-user port receiving traffic consisting of > + packets of size 64 bytes, the following command would limit the > reception > + rate of the port to ~1,000,000 packets per second: > + > + `ovs-vsctl set interface vhost-user0 ingress_policing_rate=368000 > + ingress_policing_burst=1000` > + > + To examine the ingress policer configuration of the port: > + > + `ovs-vsctl list interface vhost-user0` > + > + To clear the ingress policer configuration from the port use the > following: > + > + `ovs-vsctl set interface vhost-user0 ingress_policing_rate=0` > + > + For more details regarding ingress-policer see the vswitch.xml. > + > Performance Tuning: > ------------------- > > diff --git a/NEWS b/NEWS > index 4e81cad..ba201cf 100644 > --- a/NEWS > +++ b/NEWS > @@ -32,6 +32,7 @@ Post-v2.5.0 > * DB entries have been added for many of the DPDK EAL command line > arguments. Additional arguments can be passed via the dpdk-extra > entry. > + * Add ingress policing functionality. > - ovs-benchmark: This utility has been removed due to lack of use and > bitrot. > - ovs-appctl: > diff --git a/lib/netdev-dpdk.c b/lib/netdev-dpdk.c > index 0ed909c..dd4fa0a 100644 > --- a/lib/netdev-dpdk.c > +++ b/lib/netdev-dpdk.c > @@ -328,6 +328,12 @@ struct dpdk_ring { > struct ovs_list list_node OVS_GUARDED_BY(dpdk_mutex); > }; > > +struct ingress_policer { > + struct rte_meter_srtcm_params app_srtcm_params; > + struct rte_meter_srtcm in_policer; > + rte_spinlock_t policer_lock; > +}; > + > struct netdev_dpdk { > struct netdev up; > int port_id; > @@ -373,6 +379,10 @@ struct netdev_dpdk { > struct qos_conf *qos_conf; > rte_spinlock_t qos_lock; > > + /* Ingress Policer */ > + OVSRCU_TYPE(struct ingress_policer *) ingress_policer; > + uint32_t policer_rate; > + uint32_t policer_burst; > }; > > struct netdev_rxq_dpdk { > @@ -386,6 +396,9 @@ static int netdev_dpdk_construct(struct netdev *); > > struct virtio_net * netdev_dpdk_get_virtio(const struct netdev_dpdk *dev); > > +struct ingress_policer * > +netdev_dpdk_get_ingress_policer(const struct netdev_dpdk *dev); > + > static bool > is_dpdk_class(const struct netdev_class *class) > { > @@ -759,6 +772,11 @@ netdev_dpdk_init(struct netdev *netdev, unsigned int > port_no, > dev->qos_conf = NULL; > rte_spinlock_init(&dev->qos_lock); > > + /* Initialise rcu pointer for ingress policer to NULL */ > + ovsrcu_init(&dev->ingress_policer, NULL); > + dev->policer_rate = 0; > + dev->policer_burst = 0; > + > netdev->n_txq = NR_QUEUE; > netdev->n_rxq = NR_QUEUE; > netdev->requested_n_rxq = NR_QUEUE; > @@ -1198,6 +1216,22 @@ netdev_dpdk_policer_run__(struct rte_meter_srtcm > *meter, > return cnt; > } > > +static int > +ingress_policer_run(struct netdev_dpdk *dev, struct rte_mbuf **pkts, > + int pkt_cnt) > +{ > + int cnt = 0; > + struct ingress_policer *policer = > netdev_dpdk_get_ingress_policer(dev); > + > + if (policer) { > + rte_spinlock_lock(&policer->policer_lock); > + cnt = netdev_dpdk_policer_run__(&policer->in_policer, pkts, > pkt_cnt); > + rte_spinlock_unlock(&policer->policer_lock); > + } > + > + return cnt; > +} > + > The callers to the above function already have to check for 'policer'. Maybe it'd be clearer to pass 'policer' instead of 'dev' and drop the if? > static bool > is_vhost_running(struct virtio_net *virtio_dev) > { > @@ -1232,13 +1266,15 @@ netdev_dpdk_vhost_update_rx_size_counters(struct > netdev_stats *stats, > > static inline void > netdev_dpdk_vhost_update_rx_counters(struct netdev_stats *stats, > - struct dp_packet **packets, int > count) > + struct dp_packet **packets, int > count, > + int dropped) > { > int i; > unsigned int packet_size; > struct dp_packet *packet; > > stats->rx_packets += count; > + stats->rx_dropped += dropped; > for (i = 0; i < count; i++) { > packet = packets[i]; > packet_size = dp_packet_size(packet); > @@ -1273,7 +1309,9 @@ netdev_dpdk_vhost_rxq_recv(struct netdev_rxq *rxq, > struct netdev_dpdk *dev = netdev_dpdk_cast(rxq->netdev); > struct virtio_net *virtio_dev = netdev_dpdk_get_virtio(dev); > int qid = rxq->queue_id; > + struct ingress_policer *policer = > netdev_dpdk_get_ingress_policer(dev); > uint16_t nb_rx = 0; > + uint16_t dropped = 0; > > if (OVS_UNLIKELY(!is_vhost_running(virtio_dev))) { > return EAGAIN; > @@ -1291,8 +1329,14 @@ netdev_dpdk_vhost_rxq_recv(struct netdev_rxq *rxq, > return EAGAIN; > } > > + if (policer != NULL) { > Minor: I would drop the "!= NULL" > + dropped = nb_rx; > + nb_rx = ingress_policer_run(dev, (struct rte_mbuf **)packets, > nb_rx); > + dropped -= nb_rx; > + } > + > rte_spinlock_lock(&dev->stats_lock); > - netdev_dpdk_vhost_update_rx_counters(&dev->stats, packets, nb_rx); > + netdev_dpdk_vhost_update_rx_counters(&dev->stats, packets, nb_rx, > dropped); > rte_spinlock_unlock(&dev->stats_lock); > > *c = (int) nb_rx; > @@ -1305,7 +1349,9 @@ netdev_dpdk_rxq_recv(struct netdev_rxq *rxq, struct > dp_packet **packets, > { > struct netdev_rxq_dpdk *rx = netdev_rxq_dpdk_cast(rxq); > struct netdev_dpdk *dev = netdev_dpdk_cast(rxq->netdev); > + struct ingress_policer *policer = > netdev_dpdk_get_ingress_policer(dev); > int nb_rx; > + int dropped = 0; > > /* There is only one tx queue for this core. Do not flush other > * queues. > @@ -1323,6 +1369,19 @@ netdev_dpdk_rxq_recv(struct netdev_rxq *rxq, struct > dp_packet **packets, > return EAGAIN; > } > > + if (policer != NULL) { > same here > + dropped = nb_rx; > + nb_rx = ingress_policer_run(dev, (struct rte_mbuf **) packets, > nb_rx); > + dropped -= nb_rx; > + } > + > + /* Update stats to reflect dropped packets */ > + if (OVS_UNLIKELY(dropped)) { > + rte_spinlock_lock(&dev->stats_lock); > + dev->stats.rx_dropped += dropped; > + rte_spinlock_unlock(&dev->stats_lock); > + } > + > *c = nb_rx; > > return 0; > @@ -1756,6 +1815,7 @@ netdev_dpdk_vhost_get_stats(const struct netdev > *netdev, > /* Supported Stats */ > stats->rx_packets += dev->stats.rx_packets; > stats->tx_packets += dev->stats.tx_packets; > + stats->rx_dropped = dev->stats.rx_dropped; > stats->tx_dropped += dev->stats.tx_dropped; > stats->multicast = dev->stats.multicast; > stats->rx_bytes = dev->stats.rx_bytes; > @@ -1882,11 +1942,12 @@ netdev_dpdk_get_stats(const struct netdev *netdev, > struct netdev_stats *stats) > > rte_spinlock_lock(&dev->stats_lock); > stats->tx_dropped = dev->stats.tx_dropped; > + stats->rx_dropped = dev->stats.rx_dropped; > rte_spinlock_unlock(&dev->stats_lock); > > /* These are the available DPDK counters for packets not received due > to > * local resource constraints in DPDK and NIC respectively. */ > - stats->rx_dropped = rte_stats.rx_nombuf + rte_stats.imissed; > + stats->rx_dropped += rte_stats.rx_nombuf + rte_stats.imissed; > stats->rx_missed_errors = rte_stats.imissed; > > ovs_mutex_unlock(&dev->mutex); > @@ -1941,6 +2002,104 @@ netdev_dpdk_get_features(const struct netdev > *netdev, > } > > static int > +netdev_dpdk_policer_construct__(struct netdev *netdev_, uint32_t rate, > + uint32_t burst) > I would drop the "__" part from the name. > +{ > + struct netdev_dpdk *netdev = netdev_dpdk_cast(netdev_); > + struct ingress_policer *policer; > + uint64_t rate_bytes; > + uint64_t burst_bytes; > + int err = 0; > + > + policer = xmalloc(sizeof *policer); > + rte_spinlock_init(&policer->policer_lock); > + > + /* rte_meter requires bytes so convert kbits rate and burst to bytes. > */ > + rate_bytes = rate * 1000/8; > + burst_bytes = burst * 1000/8; > + > + policer->app_srtcm_params.cir = rate_bytes; > + policer->app_srtcm_params.cbs = burst_bytes; > + policer->app_srtcm_params.ebs = 0; > + err = rte_meter_srtcm_config(&policer->in_policer, > + &policer->app_srtcm_params); > + ovsrcu_set(&netdev->ingress_policer, policer); > + > + return err; > +} > IMHO it will be clearer if this function returns 'policer', instead of setting it in 'netdev_' > + > +static int netdev_dpdk_policer_destruct__(struct ingress_policer * > policer) > +{ > + if (policer == NULL) { > + return 0; > + } > + else { > + free(policer); > + } > + return 0; > +} > There's no need for this function. free(NULL) is perfectly valid (plus, the caller already checks). > + > +static int > +netdev_dpdk_set_policing(struct netdev* netdev_, uint32_t policer_rate, > + uint32_t policer_burst) > +{ > + struct netdev_dpdk *netdev = netdev_dpdk_cast(netdev_); > Not a big deal, but in the rest of the functions we use 'netdev' for 'struct netdev *', and 'dev' for 'struct netdev_dpdk *' (since commit d46285a2206f) > + struct ingress_policer *policer; > + int err = 0; > There's no way for this function to fail, so this variable is unnecessary. > + > + /* Force to 0 if no rate specified, > + * default to 8000 kbits if burst is 0, > + * else stick with user-specified value. > + */ > + policer_burst = (!policer_rate ? 0 > + : !policer_burst ? 8000 > + : policer_burst); > + > + ovs_mutex_lock(&netdev->mutex); > + > + policer = ovsrcu_get_protected(struct ingress_policer *, > + &netdev->ingress_policer); > + > + if (netdev->policer_rate == policer_rate && > + netdev->policer_burst == policer_burst) { > + /* Assume that settings haven't changed since we last set them. */ > + ovs_mutex_unlock(&netdev->mutex); > + return err; > + } > + > > + /* Destroy any existing ingress policer for the device if one exists > + * and the policer_rate and policer_burst are being set to 0 > + */ > + if ((policer != NULL) && (policer_rate == 0) && (policer_burst == 0)) > { > + /* User has set rate and burst to 0, destroy the policer */ > + ovsrcu_postpone(netdev_dpdk_policer_destruct__, policer); > + ovsrcu_set(&netdev->ingress_policer, NULL); > + netdev->policer_rate = 0; > + netdev->policer_burst = 0; > + ovs_mutex_unlock(&netdev->mutex); > + return err; > + } > + > + /* Destroy existing policer before adding new policer */ > + if (policer != NULL) { > + ovsrcu_postpone(netdev_dpdk_policer_destruct__, policer); > + ovsrcu_set(&netdev->ingress_policer, NULL); > + netdev->policer_rate = 0; > + netdev->policer_burst = 0; > + } > + > + /* Add an ingress policer */ > + err = netdev_dpdk_policer_construct__(netdev_, policer_rate, > + policer_burst); > + > + /* Update policer_rate and policer_burst for the netdev */ > + netdev->policer_rate = policer_rate; > + netdev->policer_burst = policer_burst; > + ovs_mutex_unlock(&netdev->mutex); > The above block seems more complicated than necessary. How about something like this? if (policer) { ovsrcu_postpone(free, policer); } if (policer_rate != 0) { new_policer = netdev_dpdk_policer_construct__(rate, burst); } else { new_policer = NULL; } ovsrcu_set(&netdev->ingress_policer, new_policer); netdev->policer_rate = policer_rate; netdev->policer_burst = policer_burst; ovs_mutex_unlock(&netdev->mutex); > + return err; > +} > + > +static int > netdev_dpdk_get_ifindex(const struct netdev *netdev) > { > struct netdev_dpdk *dev = netdev_dpdk_cast(netdev); > @@ -2387,6 +2546,12 @@ netdev_dpdk_get_virtio(const struct netdev_dpdk > *dev) > return ovsrcu_get(struct virtio_net *, &dev->virtio_dev); > } > > +struct ingress_policer * > +netdev_dpdk_get_ingress_policer(const struct netdev_dpdk *dev) > +{ > + return ovsrcu_get(struct ingress_policer *, &dev->ingress_policer); > +} > + > /* > * These callbacks allow virtio-net devices to be added to vhost ports > when > * configuration has been fully complete. > @@ -2827,7 +2992,7 @@ static const struct dpdk_qos_ops egress_policer_ops > = { > GET_FEATURES, \ > NULL, /* set_advertisements */ \ > \ > - NULL, /* set_policing */ \ > + netdev_dpdk_set_policing, /* set_policing */ \ > Minor: since the member is not NULL anymore we can remove the comment > netdev_dpdk_get_qos_types, \ > NULL, /* get_qos_capabilities */ \ > netdev_dpdk_get_qos, \ > diff --git a/vswitchd/vswitch.xml b/vswitchd/vswitch.xml > index 944d8ec..0c9e60c 100644 > --- a/vswitchd/vswitch.xml > +++ b/vswitchd/vswitch.xml > @@ -2502,8 +2502,8 @@ > table="Queue"/> tables). > </p> > <p> > - Policing is currently implemented only on Linux. The Linux > - implementation uses a simple ``token bucket'' approach: > + Policing is currently implemented on Linux and OVS with DPDK. Both > + implementations use a simple ``token bucket'' approach: > </p> > <ul> > <li> > -- > 1.7.4.1 > > _______________________________________________ > dev mailing list > dev@openvswitch.org > http://openvswitch.org/mailman/listinfo/dev > _______________________________________________ dev mailing list dev@openvswitch.org http://openvswitch.org/mailman/listinfo/dev