Thanks for the review, Daniele; comments inline. Will re-spin another version.

> Thanks for the patch and the review.
> 
> I agree with everything Flavio pointed out, a few more comments below
> 
> On 10/02/2016 11:54, "Flavio Leitner" <f...@sysclose.org> wrote:
> 
> >On Mon,  1 Feb 2016 20:47:25 +0000
> >Ian Stokes <ian.sto...@intel.com> wrote:
> >
> >> This patch provides the modifications required in netdev-dpdk.c and
> >> vswitch.xml to allow for a DPDK user space QoS algorithm.
> >>
> >> This patch adds a QoS configuration structure for netdev-dpdk and
> >> expected QoS operations 'dpdk_qos_ops'. Various helper functions are
> >> also supplied.
> >>
> >> Also included are the modifications required for vswitch.xml to allow
> >>a  new QoS implementation for netdev-dpdk devices. This includes a new
> >>QoS type  `egress-policer` as well as its expected QoS table entries.
> >>
> >> The QoS functionality implemented for DPDK devices is `egress-
> policer`.
> >> This can be used to drop egress packets at a configurable rate.
> >>
> >> The INSTALL.DPDK.md guide has also been modified to provide an
> >> example configuration of `egress-policer` QoS.
> >>
> >> Signed-off-by: Ian Stokes <ian.sto...@intel.com>
> >> ---
> >>  INSTALL.DPDK.md      |   20 +++
> >>  lib/netdev-dpdk.c    |  439
> >>++++++++++++++++++++++++++++++++++++++++++++++++--
> >>  vswitchd/vswitch.xml |   52 ++++++
> >>  3 files changed, 498 insertions(+), 13 deletions(-)
> >>
> >> diff --git a/INSTALL.DPDK.md b/INSTALL.DPDK.md index e8ef4b5..dac70cd
> >> 100644
> >> --- a/INSTALL.DPDK.md
> >> +++ b/INSTALL.DPDK.md
> >> @@ -207,6 +207,26 @@ Using the DPDK with ovs-vswitchd:
> >>     ./ovs-ofctl add-flow br0 in_port=2,action=output:1
> >>     ```
> >>
> >> +8. QoS usage example
> >> +
> >> +   Assuming you have a vhost-user port transmitting traffic
> >> + consisting
> >>of
> >> +   packets of size 64 bytes, the following command would limit the
> >>egress
> >> +   transmission rate of the port to ~1,000,000 packets per second:
> >> +
> >> +   `ovs-vsctl set port vhost-user0 qos=@newqos -- --id=@newqos
> >> + create
> >>qos
> >> +   type=egress-policer other-config:cir=46000000
> >> + other-config:cbs=2048`
> >> +
> >> +   To examine the QoS configuration of the port:
> >> +
> >> +   `ovs-appctl -t ovs-vswitchd qos/show vhost-user0`
> >> +
> >> +   To clear the QoS configuration from the port and ovsdb use the
> >>following:
> >> +
> >> +   `ovs-vsctl -- destroy QoS vhost-user0 -- clear Port vhost-user0
> >> + qos`
> 
> Minor nit: the first "--" is not necessary

Well spotted, will fix this.
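With the leading "--" dropped, the example would read:
`ovs-vsctl destroy QoS vhost-user0 -- clear Port vhost-user0 qos`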

> 
> >> +
> >> +   For more details regarding egress-policer parameters please refer
> >>to the
> >> +   vswitch.xml.
> >> +
> >>  Performance Tuning:
> >>  -------------------
> >>
> >> diff --git a/lib/netdev-dpdk.c b/lib/netdev-dpdk.c index
> >> 09ccc2c..716da79 100644
> >> --- a/lib/netdev-dpdk.c
> >> +++ b/lib/netdev-dpdk.c
> >> @@ -44,6 +44,7 @@
> >>  #include "ovs-rcu.h"
> >>  #include "packets.h"
> >>  #include "shash.h"
> >> +#include "smap.h"
> >>  #include "sset.h"
> >>  #include "unaligned.h"
> >>  #include "timeval.h"
> >> @@ -52,12 +53,14 @@
> >>
> >>  #include "rte_config.h"
> >>  #include "rte_mbuf.h"
> >> +#include "rte_meter.h"
> >>  #include "rte_virtio_net.h"
> >>
> >>  VLOG_DEFINE_THIS_MODULE(dpdk);
> >>  static struct vlog_rate_limit rl = VLOG_RATE_LIMIT_INIT(5, 20);
> >>
> >>  #define DPDK_PORT_WATCHDOG_INTERVAL 5
> >> +#define DPDK_MAX_QOS_NAME_SIZE 10
> >
> >10 is too short to give an useful description.
> >The proposed one already goes over it.
> >
> >>>> a = 'egress-policer'
> >>>> len(a)
> >14
> >
> >
> >
> >>  #define OVS_CACHE_LINE_SIZE CACHE_LINE_SIZE  #define OVS_VPORT_DPDK
> >> "ovs_dpdk"
> >> @@ -142,6 +145,107 @@ static int rte_eal_init_ret = ENODEV;
> >>
> >>  static struct ovs_mutex dpdk_mutex = OVS_MUTEX_INITIALIZER;
> >>
> >> +/* Quality of Service */
> >> +
> >> +/* An instance of a QoS configuration.  Always associated with a
> >>particular
> >> + * network device.
> >> + *
> >> + * Each QoS implementation subclasses this with whatever additional
> >>data it
> >> + * needs.
> >> + */
> >> +struct qos_conf {
> >> +    const struct dpdk_qos_ops *ops;
> >> +};
> >> +
> >> +/* A particular implementation of dpdk QoS operations.
> >> + *
> >> + * The functions below return 0 if successful or a positive errno
> >>value on
> >> + * failure, except where otherwise noted. All of them must be
> >>provided, except
> >> + * where otherwise noted.
> >> + */
> >> +struct dpdk_qos_ops {
> >> +
> >> +    /* Name of the QoS type */
> >
> >Please mention DPDK_MAX_QOS_NAME_SIZE
> >
> >
> >> +    const char *qos_name;
> >> +
> >> +    /* Called to construct the QoS implementation on 'netdev'. The
> >> +     * implementation should make the appropriate calls to configure
> >>QoS
> >> +     * according to 'details'. The implementation may assume that
> >> + any
> >>current
> >> +     * QoS configuration already installed should be destroyed
> before
> >> +     * constructing the new configuration.
> >> +     *
> >> +     * The contents of 'details' should be documented as valid for
> >>'ovs_name'
> >> +     * in the "other_config" column in the "QoS" table in
> >>vswitchd/vswitch.xml
> >> +     * (which is built as ovs-vswitchd.conf.db(8)).
> >> +     *
> >> +     * This function must return 0 if and only if it sets
> >>'netdev->qos_conf'
> >> +     * to an initialized 'struct qos_conf'.
> >> +     *
> >> +     * For all QoS implementations it should always be non-null.
> >> +     */
> >> +    int (*qos_construct)(struct netdev *netdev, const struct smap
> >>*details);
> >> +
> >> +    /* Destroys the data structures allocated by the implementation
> >> + as
> >>part of
> >> +     * 'qos_conf.
> >> +     *
> >> +     * For all QoS implementations it should always be non-null.
> >> +     */
> >> +    void (*qos_destruct)(struct netdev *netdev, struct qos_conf
> >> + *conf);
> >> +
> >> +    /* Retrieves details of 'netdev->qos_conf' configuration into
> >>'details'.
> >> +     *
> >> +     * The contents of 'details' should be documented as valid for
> >>'ovs_name'
> >> +     * in the "other_config" column in the "QoS" table in
> >>vswitchd/vswitch.xml
> >> +     * (which is built as ovs-vswitchd.conf.db(8)).
> >> +     */
> >> +    int (*qos_get)(const struct netdev *netdev, struct smap
> >> + *details);
> >> +
> >> +    /* Reconfigures 'netdev->qos_conf' according to 'details',
> >>performing any
> >> +     * required calls to complete the reconfiguration.
> >> +     *
> >> +     * The contents of 'details' should be documented as valid for
> >>'ovs_name'
> >> +     * in the "other_config" column in the "QoS" table in
> >>vswitchd/vswitch.xml
> >> +     * (which is built as ovs-vswitchd.conf.db(8)).
> >> +     *
> >> +     * This function may be null if 'qos_conf' is not configurable.
> >> +     */
> >> +    int (*qos_set)(struct netdev *netdev, const struct smap
> >> + *details);
> >> +
> >> +    /* Modify an array of rte_mbufs. The modification is specific to
> >> +     * each qos implementation.
> >> +     *
> >> +     * The function should take and array of mbufs and an int
> >>representing
> >> +     * the current number of mbufs present in the array.
> >> +     *
> >> +     * After the function has performed a qos modification to the
> >>array of
> >> +     * mbufs it returns an int representing the number of mbufs now
> >>present in
> >> +     * the array. This value is can then be passed to the port send
> >>function
> >> +     * along with the modified array for transmission.
> >> +     *
> >> +     * For all QoS implementations it should always be non-null.
> >> +     */
> >> +    int (*qos_alg_process)(struct netdev *netdev, struct rte_mbuf
> >>**pkts,
> >> +                           int pkt_cnt);
> >
> >Maybe qos_run() ?
> >
> >
> >> +};
> >> +
> >> +/* dpdk_qos_ops for each type of user space QoS implementation */
> >> +static const struct dpdk_qos_ops egress_policer_ops;
> >> +
> >> +/*
> >> + * Array of dpdk_qos_ops, contains pointer to all supported QoS
> >> + * operations.
> >> + */
> >> +static const struct dpdk_qos_ops *const qos_confs[] = {
> >> +    &egress_policer_ops,
> >> +    NULL
> >> +};
> >> +
> >> +/* Action that can be set for a packet for rte_meter */ enum
> >> +egress_policer_action {
> >> +        GREEN = e_RTE_METER_GREEN,
> >> +        DROP = 1,
> >> +};
> >
> >I don't think it's a good idea to mix external and internal
> >declarations because if the external one changes, it will break here.
> >Since GREEN and DROP have meanings only inside of OVS, we should define
> >our own values.
> >See my comment where this is used.
> >
> >
> >> +
> >>  /* Contains all 'struct dpdk_dev's. */  static struct ovs_list
> >> dpdk_list OVS_GUARDED_BY(dpdk_mutex)
> >>      = OVS_LIST_INITIALIZER(&dpdk_list); @@ -232,6 +336,11 @@ struct
> >> netdev_dpdk {
> >>
> >>      /* In dpdk_list. */
> >>      struct ovs_list list_node OVS_GUARDED_BY(dpdk_mutex);
> >> +
> >> +    /* QoS configuration and lock for the device */
> >> +    struct qos_conf *qos_conf;
> >> +    rte_spinlock_t qos_lock;
> >> +
> >>  };
> >>
> >>  struct netdev_rxq_dpdk {
> >> @@ -611,6 +720,10 @@ netdev_dpdk_init(struct netdev *netdev_,
> >>unsigned int port_no,
> >>          goto unlock;
> >>      }
> >>
> >> +    /* Initialise QoS configuration to NULL and qos lock to unlocked
> */
> >> +    netdev->qos_conf = NULL;
> >> +    rte_spinlock_init(&netdev->qos_lock);
> >> +
> >>      netdev_->n_txq = NR_QUEUE;
> >>      netdev_->n_rxq = NR_QUEUE;
> >>      netdev->real_n_txq = NR_QUEUE;
> >> @@ -1066,6 +1179,23 @@ netdev_dpdk_rxq_recv(struct netdev_rxq *rxq_,
> >>struct dp_packet **packets,
> >>      return 0;
> >>  }
> >>
> >> +static inline int
> >> +netdev_dpdk_qos_process__(struct netdev_dpdk *dev, struct rte_mbuf
> >>**pkts,
> >> +                        int cnt)
> >> +{
> >> +    struct netdev *netdev = &dev->up;
> >> +
> >> +    if (dev->qos_conf != NULL) {
> >> +        rte_spinlock_lock(&dev->qos_lock);
> >> +        if (dev->qos_conf != NULL) {
> >> +            cnt = dev->qos_conf->ops->qos_alg_process(netdev, pkts,
> >>cnt);
> >> +        }
> >> +        rte_spinlock_unlock(&dev->qos_lock);
> >> +    }
> >> +
> >> +    return cnt;
> >> +}
> >> +
> >>  static inline void
> >>  netdev_dpdk_vhost_update_tx_counters(struct netdev_stats *stats,
> >>                                       struct dp_packet **packets,  @@
> >>-1092,6 +1222,7 @@ __netdev_dpdk_vhost_send(struct netdev *netdev, int
> >>qid,
> >>      struct virtio_net *virtio_dev =
> netdev_dpdk_get_virtio(vhost_dev);
> >>      struct rte_mbuf **cur_pkts = (struct rte_mbuf **) pkts;
> >>      unsigned int total_pkts = cnt;
> >> +    unsigned int qos_pkts = cnt;
> >>      uint64_t start = 0;
> >>
> >>      if (OVS_UNLIKELY(!is_vhost_running(virtio_dev))) {  @@ -1106,6
> >>+1237,10 @@ __netdev_dpdk_vhost_send(struct netdev *netdev, int qid,
> >>          rte_spinlock_lock(&vhost_dev->tx_q[qid].tx_lock);
> >>      }
> >>
> >> +    /* Check has QoS has been configured for the netdev */
> >> +    cnt = netdev_dpdk_qos_process__(vhost_dev, cur_pkts, cnt);
> >> +    qos_pkts -= cnt;
> >
> >I don't see how this is working because egress_policer_alg_process()
> >will loop in the array one by one freeing or not the packet.
> >If it does nothing, then the array is good but otherwise it creates a
> >problem because the array still references freed entries.  So in loop
> >below we are sending a mix of good and freed buffers up to 'cnt'.
> >
> 
> This is definitely a problem. Sorry I didn't catch it on the review of
> V1.

This should not be an issue because the token bucket is updated once before the
batch of packets is processed, rather than after each individual packet.
Otherwise I would expect a segfault when the send function tried to access data
from a freed packet after QoS has run. I'll double check to be sure.
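
For reference, here is a rough, untested sketch of how
egress_policer_alg_process() could compact the surviving mbufs to the front of
the array, so the caller never sees freed entries regardless of where the drops
land (names taken from the patch; only illustrative):

    static int
    egress_policer_alg_process(struct netdev *netdev_, struct rte_mbuf **pkts,
                               int pkt_cnt)
    {
        struct egress_policer *policer = egress_policer_get__(netdev_);
        uint64_t current_time = rte_rdtsc();
        int i, cnt = 0;

        for (i = 0; i < pkt_cnt; i++) {
            struct rte_mbuf *pkt = pkts[i];

            /* Non-zero return means the meter said drop. */
            if (egress_policer_pkt_handle__(&policer->egress_meter, pkt,
                                            current_time)) {
                rte_pktmbuf_free(pkt);
            } else {
                /* Keep accepted packets contiguous at the start of 'pkts'. */
                pkts[cnt++] = pkt;
            }
        }

        return cnt;
    }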

> 
> >
> >> +
> >>      do {
> >>          int vhost_qid = qid * VIRTIO_QNUM + VIRTIO_RXQ;
> >>          unsigned int tx_pkts;
> >> @@ -1147,6 +1282,7 @@ __netdev_dpdk_vhost_send(struct netdev *netdev,
> >>int qid,
> >>      }
> >>
> >>      rte_spinlock_lock(&vhost_dev->stats_lock);
> >> +    cnt += qos_pkts;
> >>      netdev_dpdk_vhost_update_tx_counters(&vhost_dev->stats, pkts,
> >>total_pkts,
> >>                                           cnt);
> >>      rte_spinlock_unlock(&vhost_dev->stats_lock);
> >> @@ -1242,19 +1378,25 @@ dpdk_do_tx_copy(struct netdev *netdev, int
> >>qid, struct dp_packet **pkts,
> >>          newcnt++;
> >>      }
> >>
> >> -    if (OVS_UNLIKELY(dropped)) {
> >> -        rte_spinlock_lock(&dev->stats_lock);
> >> -        dev->stats.tx_dropped += dropped;
> >> -        rte_spinlock_unlock(&dev->stats_lock);
> >> -    }
> >> -
> >>      if (dev->type == DPDK_DEV_VHOST) {
> >>          __netdev_dpdk_vhost_send(netdev, qid, (struct dp_packet **)
> >>mbufs, newcnt, true);
> >>      } else {
> >> +        unsigned int qos_pkts = newcnt;
> >> +
> >> +        /* Check if QoS has been configured for this netdev. */
> >> +        newcnt = netdev_dpdk_qos_process__(dev, mbufs, newcnt);
> >> +
> >> +        dropped += qos_pkts - newcnt;
> >>          dpdk_queue_pkts(dev, qid, mbufs, newcnt);
> >>          dpdk_queue_flush(dev, qid);
> >>      }
> >>
> >> +    if (OVS_UNLIKELY(dropped)) {
> >> +        rte_spinlock_lock(&dev->stats_lock);
> >> +        dev->stats.tx_dropped += dropped;
> >> +        rte_spinlock_unlock(&dev->stats_lock);
> >> +    }
> >> +
> >>      if (!dpdk_thread_is_pmd()) {
> >>          ovs_mutex_unlock(&nonpmd_mempool_mutex);
> >>      }
> >> @@ -1304,15 +1446,24 @@ netdev_dpdk_send__(struct netdev_dpdk *dev,
> >>int qid,
> >>      } else {
> >>          int next_tx_idx = 0;
> >>          int dropped = 0;
> >> +        unsigned int qos_pkts = 0;
> >>
> >>          for (i = 0; i < cnt; i++) {
> >>              int size = dp_packet_size(pkts[i]);
> >>
> >>              if (OVS_UNLIKELY(size > dev->max_packet_len)) {
> >>                  if (next_tx_idx != i) {
> >> +                    cnt = i - next_tx_idx;
> >> +                    qos_pkts = cnt;
> >> +
> >> +                    cnt = netdev_dpdk_qos_process__(dev,
> >> +                                                    (struct
> >>rte_mbuf**)pkts,
> >> +                                                    cnt);
> >> +                    dropped += qos_pkts - cnt;
> >>                      dpdk_queue_pkts(dev, qid,
> >>                                      (struct rte_mbuf
> >>**)&pkts[next_tx_idx],
> >> -                                    i-next_tx_idx);
> >> +                                    cnt);
> >> +
> >>                  }
> >>
> >>                  VLOG_WARN_RL(&rl, "Too big size %d max_packet_len
> >>%d",  @@ -1324,9 +1475,13 @@ netdev_dpdk_send__(struct netdev_dpdk
> >>*dev, int qid,
> >>              }
> >>          }
> >>          if (next_tx_idx != cnt) {
> >> -           dpdk_queue_pkts(dev, qid,
> >> -                            (struct rte_mbuf **)&pkts[next_tx_idx],
> >> -                            cnt-next_tx_idx);
> >> +            cnt -= next_tx_idx;
> >> +            qos_pkts = cnt;
> >> +
> >> +            cnt = netdev_dpdk_qos_process__(dev, (struct
> >>rte_mbuf**)pkts, cnt);
> >> +            dropped += qos_pkts - cnt;
> >> +            dpdk_queue_pkts(dev, qid, (struct rte_mbuf
> >>**)&pkts[next_tx_idx],
> >> +                            cnt);
> >>          }
> >>
> >>          if (OVS_UNLIKELY(dropped)) { @@ -2107,6 +2262,264 @@
> >> unlock_dpdk:
> >>      return err;
> >>  }
> >>
> >> +/* QoS Functions */
> >> +
> >> +/*
> >> + * Initialize QoS configuration operations.
> >> + */
> >> +static void
> >> +qos_conf_init(struct qos_conf *conf, const struct dpdk_qos_ops *ops)
> >> +{
> >> +    conf->ops = ops;
> >> +}
> >> +
> >> +/*
> >> + * Search existing QoS operations in qos_ops and compare each set of
> >> + * operations qos_name to name. Return a dpdk_qos_ops pointer to a
> >>match,
> >> + * else return NULL
> >> + */
> >> +static const struct dpdk_qos_ops *
> >> +qos_lookup_name(const char *name)
> >> +{
> >> +    const struct dpdk_qos_ops *const *opsp;
> >> +
> >> +    for (opsp = qos_confs; *opsp != NULL; opsp++) {
> >> +        const struct dpdk_qos_ops *ops = *opsp;
> >> +        if (!strncmp(name, ops->qos_name,DPDK_MAX_QOS_NAME_SIZE)) {
> >                                            ^ missing space
> >
> >Do we really need strncmp? Since it comes from DB, I suspect it will
> >always be NULL terminated.
> 
> I agree. We should drop DPDK_MAX_QOS_NAME_SIZE entirely
> 
OK, I wasn't aware of the DB behavior, good to know. I'll drop 
DPDK_MAX_QOS_NAME_SIZE and the strncmp().
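The lookup check would then simply be something like
`if (!strcmp(name, ops->qos_name))`.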
> >
> >
> >> +            return ops;
> >> +        }
> >> +    }
> >> +    return NULL;
> >> +}
> >> +
> >> +/*
> >> + * Call qos_destruct to clean up items associated with the netdevs
> >> + * qos_conf. Set netdevs qos_conf to NULL.
> >> + */
> >> +static void
> >> +qos_delete_conf(struct netdev *netdev_) {
> >> +    struct netdev_dpdk *netdev = netdev_dpdk_cast(netdev_);
> >> +
> >> +    rte_spinlock_lock(&netdev->qos_lock);
> >> +    if (netdev->qos_conf) {
> >> +        if (netdev->qos_conf->ops->qos_destruct) {
> >> +            netdev->qos_conf->ops->qos_destruct(netdev_,
> >>netdev->qos_conf);
> >> +        }
> >> +        netdev->qos_conf = NULL;
> >> +    }
> >> +    rte_spinlock_unlock(&netdev->qos_lock);
> >> +}
> >> +
> >> +static int
> >> +netdev_dpdk_get_qos_types(const struct netdev *netdev OVS_UNUSED,
> >> +                           struct sset *types) {
> >> +    const struct dpdk_qos_ops *const *opsp;
> >> +
> >> +    for (opsp = qos_confs; *opsp != NULL; opsp++) {
> >> +        const struct dpdk_qos_ops *ops = *opsp;
> >> +        if (ops->qos_construct && ops->qos_name[0] != '\0') {
> >> +            sset_add(types, ops->qos_name);
> >> +        }
> >> +    }
> >> +    return 0;
> >> +}
> >> +
> >> +static int
> >> +netdev_dpdk_get_qos(const struct netdev *netdev_,
> >> +                    const char **typep, struct smap *details) {
> >> +    struct netdev_dpdk *netdev = netdev_dpdk_cast(netdev_);
> >> +    int error = 0;
> >> +
> >> +    ovs_mutex_lock(&netdev->mutex);
> >> +    if(netdev->qos_conf) {
> >> +        *typep = netdev->qos_conf->ops->qos_name;
> >> +        error = (netdev->qos_conf->ops->qos_get
> >> +                 ? netdev->qos_conf->ops->qos_get(netdev_, details):
> >>0);
> >> +    }
> >> +    ovs_mutex_unlock(&netdev->mutex);
> >> +
> >> +    return error;
> >> +}
> >> +
> >> +static int
> >> +netdev_dpdk_set_qos(struct netdev *netdev_,
> >> +                    const char *type, const struct smap *details) {
> >> +    struct netdev_dpdk *netdev = netdev_dpdk_cast(netdev_);
> >> +    const struct dpdk_qos_ops *new_ops = NULL;
> >> +    int error = 0;
> >> +
> >> +    /* If type is empty the current QoS configuration for the
> >>dpdk-netdev can
> >> +     * be destroyed */
> >> +    if(type[0] == '\0') {
> >> +        qos_delete_conf(netdev_);
> >
> >Are we missing ovs_mutex_lock(&netdev->mutex) here?
> >
> >> +    }
> >> +
> >> +    new_ops = qos_lookup_name(type);
> >> +    if (!new_ops || !new_ops->qos_construct) {
> >> +        return EOPNOTSUPP;
> >> +    }
> >
> >I would suggest to do:
> >
> >netdev_dpdk_set_qos(struct netdev *netdev_,
> >                    const char *type, const struct smap *details) {
> >   struct netdev_dpdk *netdev = netdev_dpdk_cast(netdev_);
> >   const struct dpdk_qos_ops *new_ops = NULL;
> >   int error = 0;
> >
> >   new_ops = qos_lookup_name(type);
> >   if (!new_ops || !new_ops->qos_construct) {
> >       return EOPNOTSUPP;
> >   }
> >
> >   ovs_mutex_lock(&netdev->mutex);
> >   if(type[0] == '\0') {
> >       qos_delete_conf(netdev_);
> >   }
> 
> I would treat the case of 'type' not found or 'type'=="" equally and
> delete the qos configuration.
Understood. I can't remember whether I had it that way originally; let me 
investigate.
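
For what it's worth, a rough sketch combining Flavio's restructure with
treating an empty or unknown 'type' as a delete might look like this (untested;
whether an unknown type should still return EOPNOTSUPP after clearing the
configuration is open for discussion):

    static int
    netdev_dpdk_set_qos(struct netdev *netdev_,
                        const char *type, const struct smap *details)
    {
        struct netdev_dpdk *netdev = netdev_dpdk_cast(netdev_);
        const struct dpdk_qos_ops *new_ops = qos_lookup_name(type);
        int error = 0;

        ovs_mutex_lock(&netdev->mutex);

        if (type[0] == '\0' || !new_ops || !new_ops->qos_construct) {
            /* Empty or unrecognised type: drop any existing configuration. */
            qos_delete_conf(netdev_);
            error = type[0] == '\0' ? 0 : EOPNOTSUPP;
        } else if (netdev->qos_conf && new_ops == netdev->qos_conf->ops) {
            /* Same implementation: just reconfigure it. */
            error = new_ops->qos_set ? new_ops->qos_set(netdev_, details) : 0;
        } else {
            /* Different (or no) existing implementation: replace it. */
            qos_delete_conf(netdev_);
            error = new_ops->qos_construct(netdev_, details);
            ovs_assert((error == 0) == (netdev->qos_conf != NULL));
        }

        ovs_mutex_unlock(&netdev->mutex);
        return error;
    }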
> 
> >
> >[ ...continue below as you posted ]
> >> +    if (netdev->qos_conf) {
> >> +        if (new_ops == netdev->qos_conf->ops) {
> >> +            error = new_ops->qos_set ? new_ops->qos_set(netdev_,
> >>details) : 0;
> >> +        }
> >> +        else {
> >> +            /* Delete existing QoS configuration. */
> >> +            qos_delete_conf(netdev_);
> >> +            ovs_assert(netdev->qos_conf == NULL);
> >> +
> >> +            /* Install new QoS configuration. */
> >> +            error = new_ops->qos_construct(netdev_, details);
> >> +            ovs_assert((error == 0) == (netdev->qos_conf != NULL));
> >> +        }
> >> +    } else {
> >> +        error = new_ops->qos_construct(netdev_, details);
> >> +        ovs_assert((error == 0) == (netdev->qos_conf != NULL));
> >> +    }
> >> +
> >> +    ovs_mutex_unlock(&netdev->mutex);
> >> +    return error;
> >> +}
> >> +
> >> +/* egress-policer details */
> >> +
> >> +struct egress_policer {
> >> +    struct qos_conf qos_conf;
> >> +    struct rte_meter_srtcm_params app_srtcm_params;
> >> +    struct rte_meter_srtcm egress_meter; };
> >> +
> >> +static struct egress_policer *
> >> +egress_policer_get__(const struct netdev *netdev_) {
> >> +    struct netdev_dpdk *netdev = netdev_dpdk_cast(netdev_);
> >> +    return CONTAINER_OF(netdev->qos_conf, struct egress_policer,
> >>qos_conf);
> >> +}
> >> +
> >> +static int
> >> +egress_policer_qos_construct(struct netdev *netdev_,
> >> +                            const struct smap *details) {
> >> +    struct netdev_dpdk *netdev = netdev_dpdk_cast(netdev_);
> >> +    struct egress_policer *policer;
> >> +    const char *cir_s;
> >> +    const char *cbs_s;
> >> +    int err = 0;
> >> +
> >> +    rte_spinlock_lock(&netdev->qos_lock);
> >> +    policer = xmalloc(sizeof *policer);
> >> +    qos_conf_init(&policer->qos_conf, &egress_policer_ops);
> >> +    netdev->qos_conf = &policer->qos_conf;
> >> +    cir_s = smap_get(details, "cir");
> >> +    cbs_s = smap_get(details, "cbs");
> >> +    policer->app_srtcm_params.cir = cir_s ? strtoull(cir_s, NULL,
> >> + 10)
> >>: 0;
> >> +    policer->app_srtcm_params.cbs = cbs_s ? strtoull(cbs_s, NULL,
> >> + 10)
> >>: 0;
> >> +    policer->app_srtcm_params.ebs = 0;
> >> +    err = rte_meter_srtcm_config(&policer->egress_meter,
> >> +                                    &policer->app_srtcm_params);
> >> +    rte_spinlock_unlock(&netdev->qos_lock);
> >> +
> >> +    return err;
> >> +}
> >> +
> >> +static void
> >> +egress_policer_qos_destruct(struct netdev *netdev_ OVS_UNUSED,
> >> +                        struct qos_conf *conf) {
> >> +    struct egress_policer *policer = CONTAINER_OF(conf, struct
> >>egress_policer,
> >> +                                                qos_conf);
> >> +    free(policer);
> >> +}
> >> +
> >> +static int
> >> +egress_policer_qos_get(const struct netdev *netdev, struct smap
> >>*details)
> >> +{
> >> +    struct egress_policer *policer = egress_policer_get__(netdev);
> >> +    smap_add_format(details, "cir", "%llu",
> >> +                    1ULL * policer->app_srtcm_params.cir);
> >> +    smap_add_format(details, "cbs", "%llu",
> >> +                    1ULL * policer->app_srtcm_params.cbs);
> >> +    return 0;
> >> +}
> >> +
> >> +static int
> >> +egress_policer_qos_set(struct netdev *netdev_, const struct smap
> >>*details)
> >> +{
> >> +    struct egress_policer *policer;
> >> +    const char *cir_s;
> >> +    const char *cbs_s;
> >> +    int err = 0;
> >> +
> >> +    policer = egress_policer_get__(netdev_);
> >> +    cir_s = smap_get(details, "cir");
> >> +    cbs_s = smap_get(details, "cbs");
> >> +    policer->app_srtcm_params.cir = cir_s ? strtoull(cir_s, NULL,
> >> + 10)
> >>: 0;
> >> +    policer->app_srtcm_params.cbs = cbs_s ? strtoull(cbs_s, NULL,
> >> + 10)
> >>: 0;
> >> +    policer->app_srtcm_params.ebs = 0;
> >> +    err = rte_meter_srtcm_config(&policer->egress_meter,
> >> +                                    &policer->app_srtcm_params);
> >> +
> >> +    return err;
> >> +}
> >> +
> >> +static inline int
> >> +egress_policer_pkt_handle__(struct rte_meter_srtcm *meter,
> >> +                            struct rte_mbuf *pkt, uint64_t time) {
> >> +    uint8_t output_color;
> >> +    uint32_t pkt_len = rte_pktmbuf_pkt_len(pkt) - sizeof(struct
> >>ether_hdr);
> >> +    enum egress_policer_action action = GREEN;
> >> +
> >> +    /* color input is not used for blind modes */
> >> +    output_color = (uint8_t)
> >> + rte_meter_srtcm_color_blind_check(meter,
> >>time,
> >> +
> >>pkt_len);
> >> +
> >> +    /* Check output color, 0 corresponds to GREEN i.e. packet can be
> >> +     * transmitted. If greater than 0 then action is set to DROP. */
> >> +    if (output_color) {
> >> +        action = DROP;
> >> +    }
> >
> >
> >This could be simpler:
> >
> >static inline int
> >egress_policer_pkt_handle__(struct rte_meter_srtcm *meter,
> >                            struct rte_mbuf *pkt, uint64_t time) {
> >     uint32_t pkt_len = rte_pktmbuf_pkt_len(pkt) - sizeof(struct
> >ether_hdr);
> >
> >     /* If GREEN, accept it otherwise drop it */
> >     if (rte_meter_srtcm_color_blind_check(meter, time, pkt_len) !=
> >e_RTE_METER_GREEN) {
> >         return 1;
> >     }
> >
> >     return 0;
> >}
> >
> >
> >
> >> +    return action;
> >> +}
> >> +
> >> +static int
> >> +egress_policer_alg_process(struct netdev *netdev_, struct rte_mbuf
> >>**pkts,
> >> +                        int pkt_cnt) {
> >> +    int i = 0;
> >> +    int cnt = pkt_cnt;
> >> +    struct egress_policer *policer = egress_policer_get__(netdev_);
> >> +    struct rte_mbuf *pkt = NULL;
> >> +    uint64_t current_time = rte_rdtsc();
> >> +
> >> +    for(i = 0; i < pkt_cnt; i++) {
> >> +        pkt = pkts[i];
> >> +        /* Handle current packet */
> >> +        if (egress_policer_pkt_handle__(&policer->egress_meter, pkt,
> >> +                                        current_time) == DROP) {
> >> +            rte_pktmbuf_free(pkt);
> >> +            cnt = cnt - 1;
> >> +        }
> >> +    }
> >> +
> >> +    return cnt;
> >> +}
> >> +
> >> +static const struct dpdk_qos_ops egress_policer_ops = {
> >> +    "egress-policer",    /* qos_name */
> >> +    egress_policer_qos_construct,
> >> +    egress_policer_qos_destruct,
> >> +    egress_policer_qos_get,
> >> +    egress_policer_qos_set,
> >> +    egress_policer_alg_process
> >> +};
> >> +
> >>  #define NETDEV_DPDK_CLASS(NAME, INIT, CONSTRUCT, DESTRUCT, MULTIQ,
> >>SEND, \
> >>      GET_CARRIER, GET_STATS, GET_FEATURES, GET_STATUS, RXQ_RECV)
> >>  \
> >>  {                                                             \
> >> @@ -2144,10 +2557,10 @@ unlock_dpdk:
> >>      NULL,                       /* set_advertisements */      \
> >>                                                                \
> >>      NULL,                       /* set_policing */            \
> >> -    NULL,                       /* get_qos_types */           \
> >> +    netdev_dpdk_get_qos_types,                                \
> >>      NULL,                       /* get_qos_capabilities */    \
> >> -    NULL,                       /* get_qos */                 \
> >> -    NULL,                       /* set_qos */                 \
> >> +    netdev_dpdk_get_qos,                                      \
> >> +    netdev_dpdk_set_qos,                                      \
> >>      NULL,                       /* get_queue */               \
> >>      NULL,                       /* set_queue */               \
> >>      NULL,                       /* delete_queue */            \
> >> diff --git a/vswitchd/vswitch.xml b/vswitchd/vswitch.xml index
> >> ce0dbc1..545d47f 100644
> >> --- a/vswitchd/vswitch.xml
> >> +++ b/vswitchd/vswitch.xml
> >> @@ -3298,6 +3298,26 @@
> >>            for information on how this classifier works.
> >>          </dd>
> >>        </dl>
> >> +      <dl>
> >> +        <dt><code>egress-policer</code></dt>
> >> +        <dd>
> >> +          An egress policer algorithm. This implementation uses the
> >>DPDK
> >> +          rte_meter library. The rte_meter library provides an
> >>implementation
> >> +          of srTCM (RFC2697) which allows the metering and policing
> >> + of
> >>traffic.
> >> +          The implementation in OVS creates a color blind srTCM
> >> + meter
> >>with a
> >> +          single token bucket used to police traffic. It should be
> >>noted that
> >> +          when the rte_meter is configured as part of QoS there will
> >>be a
> >> +          performance overhead as the rte_mter itself will consume
> >> + CPU
> >>cycles
> >> +          in order to police traffic. These CPU cycles ordinarily
> >> + are
> >>used for
> >> +          packet proccessing. As such the drop in performance will
> >> + be
> >>noticed
> >> +          in terms of overall aggregate traffic throughput.
> >> +
> >> +          For more information regarding the usage of the DPDK
> >>rte_meter see
> >> +
> >><code>http://dpdk.org/doc/guides/sample_app_ug/qos_metering.html</code
> >>>
> >> +          For more information regarding srTCM see
> >> +          <code>https://tools.ietf.org/html/rfc2697</code>
> >> +        </dd>
> >> +      </dl>
> 
> I'm still not sure whether we should have references to rfc2697 or the
> DPDK qos sample app.  From a user perspective this is a token bucket.
> If there are only two possible actions (drop and forward), what's the
> value of considering the colors?
> 

True, I'll remove these references.

> >>      </column>
> >>
> >>      <column name="queues">
> >> @@ -3334,6 +3354,38 @@
> >>        </column>
> >>      </group>
> >>
> >> +    <group title="Configuration for egress-policer QoS">
> >> +      <p>
> >> +        <ref table="QoS"/> <ref table="QoS" column="type"/>
> >> +        <code>egress-policer</code> provides egress policing for
> >>userspace
> >> +        port types with DPDK.
> >> +
> >> +        It has the following key-value pairs defined.
> >> +      </p>
> >> +
> >> +      <column name="other_config" key="cir" type='{"type":
> "integer"}'>
> >> +        The Committed Information Rate (CIR) is measured in bytes of
> IP
> >> +        packets per second, i.e. it includes the IP header, but not
> >>link
> >> +        specific (e.g. Ethernet) headers. This represents the bytes
> >>per second
> >> +        rate at which the token bucket will be updated. The cir
> >> + value
> >>is
> >> +        calculated by (pps x packet data size).  For example
> >> + assuming
> >>a user
> >> +        wishes to limit a stream consisting of 64 byte packets to 1
> >>million
> >> +        packets per second the CIR would be set to to to 46000000.
> >>This value
> >> +        can be broken into '1,000,000 x 46'. Where 1,000,000 is the
> >>policing
> >> +        rate for the number of packets per second and 46 represents
> >>the size
> >> +        of the packet data for a 64 byte ip packet.
> >> +      </column>
> >> +      <column name="other_config" key="cbs" type='{"type":
> "integer"}'>
> >> +        The Committed Burst Size (CBS) is measured in bytes and
> >>represents a
> >> +        token bucket. At a minimum this value should be be set to
> >> + the
> >>expected
> >> +        largest size packet in the traffic stream. In practice
> >> + larger
> >>values
> >> +        may be used to increase the size of the token bucket. If a
> >>packet can
> >> +        be transmitted then the cbs will be decremented by the
> >> + number
> >>of
> >> +        bytes/tokens of the packet. If there are not enough tokens
> >> + in
> >>the cbs
> >> +        bucket the packet will be dropped.
> >> +      </column>
> >> +    </group>
> >> +
> >>      <group title="Common Columns">
> >>        The overall purpose of these columns is described under
> >><code>Common
> >>        Columns</code> at the beginning of this document.
> >
> >On a higher level, it would be nice to move QoS from
> >__netdev_dpdk_vhost_send() to its caller netdev_dpdk_vhost_send() and
> >then run QoS before do the array copy at dpdk_do_tx_copy().
> >But I guess we need the packet as a rte_mbuf.
> >
> >Thanks,
> >--
> >fbl
> >

_______________________________________________
dev mailing list
dev@openvswitch.org
http://openvswitch.org/mailman/listinfo/dev
