This patch provides the modifications required in netdev-dpdk.c and vswitch.xml to enable ingress policing for DPDK interfaces.
This patch implements the necessary netdev functions to netdev-dpdk.c as well as various helper functions required for ingress policing. The vswitch.xml has been modified to explain the expected parameters and behaviour when using ingress policing. The INSTALL.DPDK.md guide has been modified to provide an example configuration of ingress policing. Signed-off-by: Ian Stokes <ian.sto...@intel.com> --- v3: - Rebase to master. *netdev-dpdk.c - Modified calls to function ingress_policer_run() to use pointer to ingress_policer struct rather than pointer to netdev-dpdk struct. - Clean up policer check by removing '!= NULL' in if conditions in netdev_dpdk_rxq_recv(). - Clean up policer check by removing '!= NULL' in if conditions in netdev_dpdk_vhost_rxq_recv(). - Remove '__' suffix from 'netdev_dpdk_policer_construct()'. - Modified 'netdev_dpdk_policer_construct()' to return pointer to ingress_policer. - Modified 'netdev_dpdk_policer_construct()' to flag VLOG_ERR if creation of rte_meter fails. - Removed function netdev_dpdk_policer_destruct__() as it is not required any longer. - Use correct variable names for netdev-dpdk *dev and netdev *netdev though out all new functions. - Remove variable err from netdev_dpdk_set_policing() as function cannot fail. - Simplify netdev_dpdk_set_policing() function by using single call to free the existing policer before creating new policer, set the rate and burst once before function exits. - Remove comment for set policing in NETDEV_DPDK DEFINE as function is no longer null. v2: - Rebase to latest master. *netdev-dpdk.c - Modified 'netdev_dpdk_set_policing()' to set default value of policer_burst to 8000 kbits. - Modified 'netdev_dpdk_policer_construct__()' to use '1000/8' for clarity instead of '125' when converting kbits to bytes for rate and burst. v1: *INSTALL.DPDK.md - Add ingress_policing usage example. *NEWS - Add support for ingress_policing to DPDK section. *netdev-dpdk.c - Add ingress_policer struct. - Modify struct netdev-dpdk to include a ovsrcu pointer to an ingress policer. - Modify netdev_dpdk_init() to initialise ingress policer to null. - Add function 'ingress_policer_run()'to process packets with a given netdev, rte_mbuf and packet count. - Modified 'netdev_dpdk_vhost_update_rx_counters()' to accept new parameter int dropped representing number of packets dropped. - Modified 'netdev_dpdk_vhost_update_rx_counters()' to update number of dropped packets in stats->rx_dropped. - Modified 'netdev_dpdk_vhost_rxq_recv()' to check and call ingress_policing functionality with 'ingress_policer_run()'. - Modified 'netdev_dpdk_rxq_recv()' to check and call ingress_policing functionality with 'ingress_policer_run()'. - Modified 'netdev_dpdk_rxq_recv()' to update rx dropped stats. - Modified 'netdev_dpdk_vhost_get_stats()' to add support for rx_dropped. - Add function 'netdev_dpdk_policer_construct__()' to create an ingress_policer. - Add function 'netdev_dpdk_policer_destruct__()' to destroy an ingress policer. - Add function 'netdev_dpdk_set_policing()' responsible for calling helper functions to create and destroy an ingress policer for the device. - Add function 'netdev_dpdk_get_ingress_policer()' to return a pointer to the ingress_policer using the ovsrcu_get function. *vswitch.xml - Modify ingress policing section to include support in OVS with DPDK. --- INSTALL.DPDK.md | 19 +++++++ NEWS | 1 + lib/netdev-dpdk.c | 148 ++++++++++++++++++++++++++++++++++++++++++++++++-- vswitchd/vswitch.xml | 4 +- 4 files changed, 166 insertions(+), 6 deletions(-) diff --git a/INSTALL.DPDK.md b/INSTALL.DPDK.md index 93f92e4..68735cc 100644 --- a/INSTALL.DPDK.md +++ b/INSTALL.DPDK.md @@ -267,6 +267,25 @@ Using the DPDK with ovs-vswitchd: For more details regarding egress-policer parameters please refer to the vswitch.xml. +9. Ingress Policing Example + + Assuming you have a vhost-user port receiving traffic consisting of + packets of size 64 bytes, the following command would limit the reception + rate of the port to ~1,000,000 packets per second: + + `ovs-vsctl set interface vhost-user0 ingress_policing_rate=368000 + ingress_policing_burst=1000` + + To examine the ingress policer configuration of the port: + + `ovs-vsctl list interface vhost-user0` + + To clear the ingress policer configuration from the port use the following: + + `ovs-vsctl set interface vhost-user0 ingress_policing_rate=0` + + For more details regarding ingress-policer see the vswitch.xml. + Performance Tuning: ------------------- diff --git a/NEWS b/NEWS index 4e81cad..ba201cf 100644 --- a/NEWS +++ b/NEWS @@ -32,6 +32,7 @@ Post-v2.5.0 * DB entries have been added for many of the DPDK EAL command line arguments. Additional arguments can be passed via the dpdk-extra entry. + * Add ingress policing functionality. - ovs-benchmark: This utility has been removed due to lack of use and bitrot. - ovs-appctl: diff --git a/lib/netdev-dpdk.c b/lib/netdev-dpdk.c index c78f892..0389bb7 100644 --- a/lib/netdev-dpdk.c +++ b/lib/netdev-dpdk.c @@ -328,6 +328,12 @@ struct dpdk_ring { struct ovs_list list_node OVS_GUARDED_BY(dpdk_mutex); }; +struct ingress_policer { + struct rte_meter_srtcm_params app_srtcm_params; + struct rte_meter_srtcm in_policer; + rte_spinlock_t policer_lock; +}; + struct netdev_dpdk { struct netdev up; int port_id; @@ -378,6 +384,11 @@ struct netdev_dpdk { * netdev_dpdk*_reconfigure() is called */ int requested_n_txq; int requested_n_rxq; + + /* Ingress Policer */ + OVSRCU_TYPE(struct ingress_policer *) ingress_policer; + uint32_t policer_rate; + uint32_t policer_burst; }; struct netdev_rxq_dpdk { @@ -391,6 +402,9 @@ static int netdev_dpdk_construct(struct netdev *); struct virtio_net * netdev_dpdk_get_virtio(const struct netdev_dpdk *dev); +struct ingress_policer * +netdev_dpdk_get_ingress_policer(const struct netdev_dpdk *dev); + static bool is_dpdk_class(const struct netdev_class *class) { @@ -764,6 +778,11 @@ netdev_dpdk_init(struct netdev *netdev, unsigned int port_no, dev->qos_conf = NULL; rte_spinlock_init(&dev->qos_lock); + /* Initialise rcu pointer for ingress policer to NULL */ + ovsrcu_init(&dev->ingress_policer, NULL); + dev->policer_rate = 0; + dev->policer_burst = 0; + netdev->n_txq = NR_QUEUE; netdev->n_rxq = NR_QUEUE; dev->requested_n_rxq = NR_QUEUE; @@ -1136,6 +1155,19 @@ netdev_dpdk_policer_run(struct rte_meter_srtcm *meter, return cnt; } +static int +ingress_policer_run(struct ingress_policer *policer, struct rte_mbuf **pkts, + int pkt_cnt) +{ + int cnt = 0; + + rte_spinlock_lock(&policer->policer_lock); + cnt = netdev_dpdk_policer_run(&policer->in_policer, pkts, pkt_cnt); + rte_spinlock_unlock(&policer->policer_lock); + + return cnt; +} + static bool is_vhost_running(struct virtio_net *virtio_dev) { @@ -1170,13 +1202,15 @@ netdev_dpdk_vhost_update_rx_size_counters(struct netdev_stats *stats, static inline void netdev_dpdk_vhost_update_rx_counters(struct netdev_stats *stats, - struct dp_packet **packets, int count) + struct dp_packet **packets, int count, + int dropped) { int i; unsigned int packet_size; struct dp_packet *packet; stats->rx_packets += count; + stats->rx_dropped += dropped; for (i = 0; i < count; i++) { packet = packets[i]; packet_size = dp_packet_size(packet); @@ -1211,7 +1245,9 @@ netdev_dpdk_vhost_rxq_recv(struct netdev_rxq *rxq, struct netdev_dpdk *dev = netdev_dpdk_cast(rxq->netdev); struct virtio_net *virtio_dev = netdev_dpdk_get_virtio(dev); int qid = rxq->queue_id; + struct ingress_policer *policer = netdev_dpdk_get_ingress_policer(dev); uint16_t nb_rx = 0; + uint16_t dropped = 0; if (OVS_UNLIKELY(!is_vhost_running(virtio_dev))) { return EAGAIN; @@ -1229,8 +1265,14 @@ netdev_dpdk_vhost_rxq_recv(struct netdev_rxq *rxq, return EAGAIN; } + if (policer) { + dropped = nb_rx; + nb_rx = ingress_policer_run(policer, (struct rte_mbuf **)packets, nb_rx); + dropped -= nb_rx; + } + rte_spinlock_lock(&dev->stats_lock); - netdev_dpdk_vhost_update_rx_counters(&dev->stats, packets, nb_rx); + netdev_dpdk_vhost_update_rx_counters(&dev->stats, packets, nb_rx, dropped); rte_spinlock_unlock(&dev->stats_lock); *c = (int) nb_rx; @@ -1243,7 +1285,9 @@ netdev_dpdk_rxq_recv(struct netdev_rxq *rxq, struct dp_packet **packets, { struct netdev_rxq_dpdk *rx = netdev_rxq_dpdk_cast(rxq); struct netdev_dpdk *dev = netdev_dpdk_cast(rxq->netdev); + struct ingress_policer *policer = netdev_dpdk_get_ingress_policer(dev); int nb_rx; + int dropped = 0; /* There is only one tx queue for this core. Do not flush other * queues. @@ -1261,6 +1305,19 @@ netdev_dpdk_rxq_recv(struct netdev_rxq *rxq, struct dp_packet **packets, return EAGAIN; } + if (policer) { + dropped = nb_rx; + nb_rx = ingress_policer_run(policer, (struct rte_mbuf **) packets, nb_rx); + dropped -= nb_rx; + } + + /* Update stats to reflect dropped packets */ + if (OVS_UNLIKELY(dropped)) { + rte_spinlock_lock(&dev->stats_lock); + dev->stats.rx_dropped += dropped; + rte_spinlock_unlock(&dev->stats_lock); + } + *c = nb_rx; return 0; @@ -1694,6 +1751,7 @@ netdev_dpdk_vhost_get_stats(const struct netdev *netdev, /* Supported Stats */ stats->rx_packets += dev->stats.rx_packets; stats->tx_packets += dev->stats.tx_packets; + stats->rx_dropped = dev->stats.rx_dropped; stats->tx_dropped += dev->stats.tx_dropped; stats->multicast = dev->stats.multicast; stats->rx_bytes = dev->stats.rx_bytes; @@ -1821,11 +1879,12 @@ netdev_dpdk_get_stats(const struct netdev *netdev, struct netdev_stats *stats) rte_spinlock_lock(&dev->stats_lock); stats->tx_dropped = dev->stats.tx_dropped; + stats->rx_dropped = dev->stats.rx_dropped; rte_spinlock_unlock(&dev->stats_lock); /* These are the available DPDK counters for packets not received due to * local resource constraints in DPDK and NIC respectively. */ - stats->rx_dropped = rte_stats.rx_nombuf + rte_stats.imissed; + stats->rx_dropped += rte_stats.rx_nombuf + rte_stats.imissed; stats->rx_missed_errors = rte_stats.imissed; ovs_mutex_unlock(&dev->mutex); @@ -1879,6 +1938,81 @@ netdev_dpdk_get_features(const struct netdev *netdev, return 0; } +static struct ingress_policer * +netdev_dpdk_policer_construct(uint32_t rate, uint32_t burst) +{ + struct ingress_policer *policer = NULL; + uint64_t rate_bytes; + uint64_t burst_bytes; + int err = 0; + + policer = xmalloc(sizeof *policer); + rte_spinlock_init(&policer->policer_lock); + + /* rte_meter requires bytes so convert kbits rate and burst to bytes. */ + rate_bytes = rate * 1000/8; + burst_bytes = burst * 1000/8; + + policer->app_srtcm_params.cir = rate_bytes; + policer->app_srtcm_params.cbs = burst_bytes; + policer->app_srtcm_params.ebs = 0; + err = rte_meter_srtcm_config(&policer->in_policer, + &policer->app_srtcm_params); + if(err) { + VLOG_ERR("Could not create rte meter for ingress policer"); + return NULL; + } + + return policer; +} + +static int +netdev_dpdk_set_policing(struct netdev* netdev, uint32_t policer_rate, + uint32_t policer_burst) +{ + struct netdev_dpdk *dev = netdev_dpdk_cast(netdev); + struct ingress_policer *policer; + + /* Force to 0 if no rate specified, + * default to 8000 kbits if burst is 0, + * else stick with user-specified value. + */ + policer_burst = (!policer_rate ? 0 + : !policer_burst ? 8000 + : policer_burst); + + ovs_mutex_lock(&dev->mutex); + + policer = ovsrcu_get_protected(struct ingress_policer *, + &dev->ingress_policer); + + if (dev->policer_rate == policer_rate && + dev->policer_burst == policer_burst) { + /* Assume that settings haven't changed since we last set them. */ + ovs_mutex_unlock(&dev->mutex); + return 0; + } + + /* Destroy any existing ingress policer for the device if one exists + * and the policer_rate and policer_burst are being set to 0 + */ + if (policer) { + ovsrcu_postpone(free, policer); + } + + if (policer_rate != 0) { + policer = netdev_dpdk_policer_construct(policer_rate, policer_burst); + } else { + policer = NULL; + } + ovsrcu_set(&dev->ingress_policer, policer); + dev->policer_rate = policer_rate; + dev->policer_burst = policer_burst; + ovs_mutex_unlock(&dev->mutex); + + return 0; +} + static int netdev_dpdk_get_ifindex(const struct netdev *netdev) { @@ -2326,6 +2460,12 @@ netdev_dpdk_get_virtio(const struct netdev_dpdk *dev) return ovsrcu_get(struct virtio_net *, &dev->virtio_dev); } +struct ingress_policer * +netdev_dpdk_get_ingress_policer(const struct netdev_dpdk *dev) +{ + return ovsrcu_get(struct ingress_policer *, &dev->ingress_policer); +} + /* * These callbacks allow virtio-net devices to be added to vhost ports when * configuration has been fully complete. @@ -2839,7 +2979,7 @@ netdev_dpdk_vhost_cuse_reconfigure(struct netdev *netdev) GET_FEATURES, \ NULL, /* set_advertisements */ \ \ - NULL, /* set_policing */ \ + netdev_dpdk_set_policing, \ netdev_dpdk_get_qos_types, \ NULL, /* get_qos_capabilities */ \ netdev_dpdk_get_qos, \ diff --git a/vswitchd/vswitch.xml b/vswitchd/vswitch.xml index 944d8ec..0c9e60c 100644 --- a/vswitchd/vswitch.xml +++ b/vswitchd/vswitch.xml @@ -2502,8 +2502,8 @@ table="Queue"/> tables). </p> <p> - Policing is currently implemented only on Linux. The Linux - implementation uses a simple ``token bucket'' approach: + Policing is currently implemented on Linux and OVS with DPDK. Both + implementations use a simple ``token bucket'' approach: </p> <ul> <li> -- 1.7.4.1 _______________________________________________ dev mailing list dev@openvswitch.org http://openvswitch.org/mailman/listinfo/dev