Thanks for the patch and the review.

I agree with everything Flavio pointed out; a few more comments below.

On 10/02/2016 11:54, "Flavio Leitner" <f...@sysclose.org> wrote:

>On Mon,  1 Feb 2016 20:47:25 +0000
>Ian Stokes <ian.sto...@intel.com> wrote:
>
>> This patch provides the modifications required in netdev-dpdk.c and
>> vswitch.xml to allow for a DPDK user space QoS algorithm.
>> 
>> This patch adds a QoS configuration structure for netdev-dpdk and
>> expected QoS operations 'dpdk_qos_ops'. Various helper functions
>> are also supplied.
>> 
>> Also included are the modifications required for vswitch.xml to allow a
>> new QoS implementation for netdev-dpdk devices. This includes a new QoS
>>type
>> `egress-policer` as well as its expected QoS table entries.
>> 
>> The QoS functionality implemented for DPDK devices is `egress-policer`.
>> This can be used to drop egress packets at a configurable rate.
>> 
>> The INSTALL.DPDK.md guide has also been modified to provide an example
>> configuration of `egress-policer` QoS.
>> 
>> Signed-off-by: Ian Stokes <ian.sto...@intel.com>
>> ---
>>  INSTALL.DPDK.md      |   20 +++
>>  lib/netdev-dpdk.c    |  439
>>++++++++++++++++++++++++++++++++++++++++++++++++--
>>  vswitchd/vswitch.xml |   52 ++++++
>>  3 files changed, 498 insertions(+), 13 deletions(-)
>> 
>> diff --git a/INSTALL.DPDK.md b/INSTALL.DPDK.md
>> index e8ef4b5..dac70cd 100644
>> --- a/INSTALL.DPDK.md
>> +++ b/INSTALL.DPDK.md
>> @@ -207,6 +207,26 @@ Using the DPDK with ovs-vswitchd:
>>     ./ovs-ofctl add-flow br0 in_port=2,action=output:1
>>     ```
>>  
>> +8. QoS usage example
>> +
>> +   Assuming you have a vhost-user port transmitting traffic consisting
>>of
>> +   packets of size 64 bytes, the following command would limit the
>>egress
>> +   transmission rate of the port to ~1,000,000 packets per second:
>> +
>> +   `ovs-vsctl set port vhost-user0 qos=@newqos -- --id=@newqos create
>>qos
>> +   type=egress-policer other-config:cir=46000000 other-config:cbs=2048`
>> +
>> +   To examine the QoS configuration of the port:
>> +
>> +   `ovs-appctl -t ovs-vswitchd qos/show vhost-user0`
>> +
>> +   To clear the QoS configuration from the port and ovsdb use the
>>following:
>> +
>> +   `ovs-vsctl -- destroy QoS vhost-user0 -- clear Port vhost-user0 qos`

Minor nit: the first "--" is not necessary; `ovs-vsctl destroy QoS vhost-user0 -- clear Port vhost-user0 qos` is enough.

>> +
>> +   For more details regarding egress-policer parameters please refer
>>to the
>> +   vswitch.xml.
>> +
>>  Performance Tuning:
>>  -------------------
>>  
>> diff --git a/lib/netdev-dpdk.c b/lib/netdev-dpdk.c
>> index 09ccc2c..716da79 100644
>> --- a/lib/netdev-dpdk.c
>> +++ b/lib/netdev-dpdk.c
>> @@ -44,6 +44,7 @@
>>  #include "ovs-rcu.h"
>>  #include "packets.h"
>>  #include "shash.h"
>> +#include "smap.h"
>>  #include "sset.h"
>>  #include "unaligned.h"
>>  #include "timeval.h"
>> @@ -52,12 +53,14 @@
>>  
>>  #include "rte_config.h"
>>  #include "rte_mbuf.h"
>> +#include "rte_meter.h"
>>  #include "rte_virtio_net.h"
>>  
>>  VLOG_DEFINE_THIS_MODULE(dpdk);
>>  static struct vlog_rate_limit rl = VLOG_RATE_LIMIT_INIT(5, 20);
>>  
>>  #define DPDK_PORT_WATCHDOG_INTERVAL 5
>> +#define DPDK_MAX_QOS_NAME_SIZE 10
>
>10 is too short to give an useful description.
>The proposed one already goes over it.
>
>>>> a = 'egress-policer'
>>>> len(a)
>14
>
>
>  
>>  #define OVS_CACHE_LINE_SIZE CACHE_LINE_SIZE
>>  #define OVS_VPORT_DPDK "ovs_dpdk"
>> @@ -142,6 +145,107 @@ static int rte_eal_init_ret = ENODEV;
>>  
>>  static struct ovs_mutex dpdk_mutex = OVS_MUTEX_INITIALIZER;
>>  
>> +/* Quality of Service */
>> +
>> +/* An instance of a QoS configuration.  Always associated with a
>>particular
>> + * network device.
>> + *
>> + * Each QoS implementation subclasses this with whatever additional
>>data it
>> + * needs.
>> + */
>> +struct qos_conf {
>> +    const struct dpdk_qos_ops *ops;
>> +};
>> +
>> +/* A particular implementation of dpdk QoS operations.
>> + *
>> + * The functions below return 0 if successful or a positive errno
>>value on
>> + * failure, except where otherwise noted. All of them must be
>>provided, except
>> + * where otherwise noted.
>> + */
>> +struct dpdk_qos_ops {
>> +
>> +    /* Name of the QoS type */
>
>Please mention DPDK_MAX_QOS_NAME_SIZE
>
>
>> +    const char *qos_name;
>> +
>> +    /* Called to construct the QoS implementation on 'netdev'. The
>> +     * implementation should make the appropriate calls to configure
>>QoS
>> +     * according to 'details'. The implementation may assume that any
>>current
>> +     * QoS configuration already installed should be destroyed before
>> +     * constructing the new configuration.
>> +     *
>> +     * The contents of 'details' should be documented as valid for
>>'ovs_name'
>> +     * in the "other_config" column in the "QoS" table in
>>vswitchd/vswitch.xml
>> +     * (which is built as ovs-vswitchd.conf.db(8)).
>> +     *
>> +     * This function must return 0 if and only if it sets
>>'netdev->qos_conf'
>> +     * to an initialized 'struct qos_conf'.
>> +     *
>> +     * For all QoS implementations it should always be non-null.
>> +     */
>> +    int (*qos_construct)(struct netdev *netdev, const struct smap
>>*details);
>> +
>> +    /* Destroys the data structures allocated by the implementation as
>>part of
>> +     * 'qos_conf.
>> +     *
>> +     * For all QoS implementations it should always be non-null.
>> +     */
>> +    void (*qos_destruct)(struct netdev *netdev, struct qos_conf *conf);
>> +
>> +    /* Retrieves details of 'netdev->qos_conf' configuration into
>>'details'.
>> +     *
>> +     * The contents of 'details' should be documented as valid for
>>'ovs_name'
>> +     * in the "other_config" column in the "QoS" table in
>>vswitchd/vswitch.xml
>> +     * (which is built as ovs-vswitchd.conf.db(8)).
>> +     */
>> +    int (*qos_get)(const struct netdev *netdev, struct smap *details);
>> +
>> +    /* Reconfigures 'netdev->qos_conf' according to 'details',
>>performing any
>> +     * required calls to complete the reconfiguration.
>> +     *
>> +     * The contents of 'details' should be documented as valid for
>>'ovs_name'
>> +     * in the "other_config" column in the "QoS" table in
>>vswitchd/vswitch.xml
>> +     * (which is built as ovs-vswitchd.conf.db(8)).
>> +     *
>> +     * This function may be null if 'qos_conf' is not configurable.
>> +     */
>> +    int (*qos_set)(struct netdev *netdev, const struct smap *details);
>> +
>> +    /* Modify an array of rte_mbufs. The modification is specific to
>> +     * each qos implementation.
>> +     *
>> +     * The function should take and array of mbufs and an int
>>representing
>> +     * the current number of mbufs present in the array.
>> +     *
>> +     * After the function has performed a qos modification to the
>>array of
>> +     * mbufs it returns an int representing the number of mbufs now
>>present in
>> +     * the array. This value is can then be passed to the port send
>>function
>> +     * along with the modified array for transmission.
>> +     *
>> +     * For all QoS implementations it should always be non-null.
>> +     */
>> +    int (*qos_alg_process)(struct netdev *netdev, struct rte_mbuf
>>**pkts,
>> +                           int pkt_cnt);
>
>Maybe qos_run() ?
>
>
>> +};
>> +
>> +/* dpdk_qos_ops for each type of user space QoS implementation */
>> +static const struct dpdk_qos_ops egress_policer_ops;
>> +
>> +/*
>> + * Array of dpdk_qos_ops, contains pointer to all supported QoS
>> + * operations.
>> + */
>> +static const struct dpdk_qos_ops *const qos_confs[] = {
>> +    &egress_policer_ops,
>> +    NULL
>> +};
>> +
>> +/* Action that can be set for a packet for rte_meter */
>> +enum egress_policer_action {
>> +        GREEN = e_RTE_METER_GREEN,
>> +        DROP = 1,
>> +};
>
>I don't think it's a good idea to mix external and internal declarations
>because if the external one changes, it will break here. Since GREEN and
>DROP
>have meanings only inside of OVS, we should define our own values.
>See my comment where this is used.
>
>
>> +
>>  /* Contains all 'struct dpdk_dev's. */
>>  static struct ovs_list dpdk_list OVS_GUARDED_BY(dpdk_mutex)
>>      = OVS_LIST_INITIALIZER(&dpdk_list);
>> @@ -232,6 +336,11 @@ struct netdev_dpdk {
>>  
>>      /* In dpdk_list. */
>>      struct ovs_list list_node OVS_GUARDED_BY(dpdk_mutex);
>> +
>> +    /* QoS configuration and lock for the device */
>> +    struct qos_conf *qos_conf;
>> +    rte_spinlock_t qos_lock;
>> +
>>  };
>>  
>>  struct netdev_rxq_dpdk {
>> @@ -611,6 +720,10 @@ netdev_dpdk_init(struct netdev *netdev_, unsigned
>>int port_no,
>>          goto unlock;
>>      }
>>  
>> +    /* Initialise QoS configuration to NULL and qos lock to unlocked */
>> +    netdev->qos_conf = NULL;
>> +    rte_spinlock_init(&netdev->qos_lock);
>> +
>>      netdev_->n_txq = NR_QUEUE;
>>      netdev_->n_rxq = NR_QUEUE;
>>      netdev->real_n_txq = NR_QUEUE;
>> @@ -1066,6 +1179,23 @@ netdev_dpdk_rxq_recv(struct netdev_rxq *rxq_,
>>struct dp_packet **packets,
>>      return 0;
>>  }
>>  
>> +static inline int
>> +netdev_dpdk_qos_process__(struct netdev_dpdk *dev, struct rte_mbuf
>>**pkts,
>> +                        int cnt)
>> +{
>> +    struct netdev *netdev = &dev->up;
>> +
>> +    if (dev->qos_conf != NULL) {
>> +        rte_spinlock_lock(&dev->qos_lock);
>> +        if (dev->qos_conf != NULL) {
>> +            cnt = dev->qos_conf->ops->qos_alg_process(netdev, pkts,
>>cnt);
>> +        }
>> +        rte_spinlock_unlock(&dev->qos_lock);
>> +    }
>> +
>> +    return cnt;
>> +}
>> +
>>  static inline void
>>  netdev_dpdk_vhost_update_tx_counters(struct netdev_stats *stats,
>>                                       struct dp_packet **packets,
>> @@ -1092,6 +1222,7 @@ __netdev_dpdk_vhost_send(struct netdev *netdev,
>>int qid,
>>      struct virtio_net *virtio_dev = netdev_dpdk_get_virtio(vhost_dev);
>>      struct rte_mbuf **cur_pkts = (struct rte_mbuf **) pkts;
>>      unsigned int total_pkts = cnt;
>> +    unsigned int qos_pkts = cnt;
>>      uint64_t start = 0;
>>  
>>      if (OVS_UNLIKELY(!is_vhost_running(virtio_dev))) {
>> @@ -1106,6 +1237,10 @@ __netdev_dpdk_vhost_send(struct netdev *netdev,
>>int qid,
>>          rte_spinlock_lock(&vhost_dev->tx_q[qid].tx_lock);
>>      }
>>  
>> +    /* Check has QoS has been configured for the netdev */
>> +    cnt = netdev_dpdk_qos_process__(vhost_dev, cur_pkts, cnt);
>> +    qos_pkts -= cnt;
>
>I don't see how this is working because egress_policer_alg_process()
>will loop in the array one by one freeing or not the packet.
>If it does nothing, then the array is good but otherwise it creates
>a problem because the array still references freed entries.  So in
>loop below we are sending a mix of good and freed buffers up to 'cnt'.
>

This is definitely a problem. Sorry I didn't catch it in my review of v1.
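To make the point concrete, the policer needs to compact the survivors to the front of the array so the caller never sees a freed entry. Here is a minimal, self-contained sketch of that pattern; `struct mbuf`, `should_drop()` and `police_and_compact()` are stand-ins I made up for `struct rte_mbuf`, the srTCM verdict and `egress_policer_alg_process()`:

```c
#include <assert.h>
#include <stdlib.h>

/* Stand-in for struct rte_mbuf; the real code operates on rte_mbuf **. */
struct mbuf { int len; };

/* Stub policer verdict (hypothetical): nonzero means drop. */
static int should_drop(const struct mbuf *pkt) { return pkt->len > 100; }

/* Free dropped buffers and compact survivors to the front of the array,
 * so the caller can safely transmit pkts[0..returned-1]. */
static int police_and_compact(struct mbuf **pkts, int cnt)
{
    int kept = 0;
    for (int i = 0; i < cnt; i++) {
        if (should_drop(pkts[i])) {
            free(pkts[i]);          /* rte_pktmbuf_free() in the real code */
        } else {
            pkts[kept++] = pkts[i]; /* keep the array dense */
        }
    }
    return kept;
}
```

The current v2 loop only decrements the count, so the send loop below can still dereference freed entries; the `pkts[kept++] = pkts[i]` move is the missing piece.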

>
>> +
>>      do {
>>          int vhost_qid = qid * VIRTIO_QNUM + VIRTIO_RXQ;
>>          unsigned int tx_pkts;
>> @@ -1147,6 +1282,7 @@ __netdev_dpdk_vhost_send(struct netdev *netdev,
>>int qid,
>>      }
>>  
>>      rte_spinlock_lock(&vhost_dev->stats_lock);
>> +    cnt += qos_pkts;
>>      netdev_dpdk_vhost_update_tx_counters(&vhost_dev->stats, pkts,
>>total_pkts,
>>                                           cnt);
>>      rte_spinlock_unlock(&vhost_dev->stats_lock);
>> @@ -1242,19 +1378,25 @@ dpdk_do_tx_copy(struct netdev *netdev, int qid,
>>struct dp_packet **pkts,
>>          newcnt++;
>>      }
>>  
>> -    if (OVS_UNLIKELY(dropped)) {
>> -        rte_spinlock_lock(&dev->stats_lock);
>> -        dev->stats.tx_dropped += dropped;
>> -        rte_spinlock_unlock(&dev->stats_lock);
>> -    }
>> -
>>      if (dev->type == DPDK_DEV_VHOST) {
>>          __netdev_dpdk_vhost_send(netdev, qid, (struct dp_packet **)
>>mbufs, newcnt, true);
>>      } else {
>> +        unsigned int qos_pkts = newcnt;
>> +
>> +        /* Check if QoS has been configured for this netdev. */
>> +        newcnt = netdev_dpdk_qos_process__(dev, mbufs, newcnt);
>> +
>> +        dropped += qos_pkts - newcnt;
>>          dpdk_queue_pkts(dev, qid, mbufs, newcnt);
>>          dpdk_queue_flush(dev, qid);
>>      }
>>  
>> +    if (OVS_UNLIKELY(dropped)) {
>> +        rte_spinlock_lock(&dev->stats_lock);
>> +        dev->stats.tx_dropped += dropped;
>> +        rte_spinlock_unlock(&dev->stats_lock);
>> +    }
>> +
>>      if (!dpdk_thread_is_pmd()) {
>>          ovs_mutex_unlock(&nonpmd_mempool_mutex);
>>      }
>> @@ -1304,15 +1446,24 @@ netdev_dpdk_send__(struct netdev_dpdk *dev, int
>>qid,
>>      } else {
>>          int next_tx_idx = 0;
>>          int dropped = 0;
>> +        unsigned int qos_pkts = 0;
>>  
>>          for (i = 0; i < cnt; i++) {
>>              int size = dp_packet_size(pkts[i]);
>>  
>>              if (OVS_UNLIKELY(size > dev->max_packet_len)) {
>>                  if (next_tx_idx != i) {
>> +                    cnt = i - next_tx_idx;
>> +                    qos_pkts = cnt;
>> +
>> +                    cnt = netdev_dpdk_qos_process__(dev,
>> +                                                    (struct
>>rte_mbuf**)pkts,
>> +                                                    cnt);
>> +                    dropped += qos_pkts - cnt;
>>                      dpdk_queue_pkts(dev, qid,
>>                                      (struct rte_mbuf
>>**)&pkts[next_tx_idx],
>> -                                    i-next_tx_idx);
>> +                                    cnt);
>> +
>>                  }
>>  
>>                  VLOG_WARN_RL(&rl, "Too big size %d max_packet_len %d",
>> @@ -1324,9 +1475,13 @@ netdev_dpdk_send__(struct netdev_dpdk *dev, int
>>qid,
>>              }
>>          }
>>          if (next_tx_idx != cnt) {
>> -           dpdk_queue_pkts(dev, qid,
>> -                            (struct rte_mbuf **)&pkts[next_tx_idx],
>> -                            cnt-next_tx_idx);
>> +            cnt -= next_tx_idx;
>> +            qos_pkts = cnt;
>> +
>> +            cnt = netdev_dpdk_qos_process__(dev, (struct
>>rte_mbuf**)pkts, cnt);
>> +            dropped += qos_pkts - cnt;
>> +            dpdk_queue_pkts(dev, qid, (struct rte_mbuf
>>**)&pkts[next_tx_idx],
>> +                            cnt);
>>          }
>>  
>>          if (OVS_UNLIKELY(dropped)) {
>> @@ -2107,6 +2262,264 @@ unlock_dpdk:
>>      return err;
>>  }
>>  
>> +/* QoS Functions */
>> +
>> +/*
>> + * Initialize QoS configuration operations.
>> + */
>> +static void
>> +qos_conf_init(struct qos_conf *conf, const struct dpdk_qos_ops *ops)
>> +{
>> +    conf->ops = ops;
>> +}
>> +
>> +/*
>> + * Search existing QoS operations in qos_ops and compare each set of
>> + * operations qos_name to name. Return a dpdk_qos_ops pointer to a
>>match,
>> + * else return NULL
>> + */
>> +static const struct dpdk_qos_ops *
>> +qos_lookup_name(const char *name)
>> +{
>> +    const struct dpdk_qos_ops *const *opsp;
>> +
>> +    for (opsp = qos_confs; *opsp != NULL; opsp++) {
>> +        const struct dpdk_qos_ops *ops = *opsp;
>> +        if (!strncmp(name, ops->qos_name,DPDK_MAX_QOS_NAME_SIZE)) {
>                                            ^ missing space
>
>Do we really need strncmp? Since it comes from DB, I suspect it will
>always
>be NULL terminated.

I agree. We should drop DPDK_MAX_QOS_NAME_SIZE entirely.
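With NUL-terminated names coming from the DB, plain strcmp() is enough. A self-contained sketch of what I have in mind (structures simplified to just the name field):

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Simplified stand-ins for the real structures. */
struct dpdk_qos_ops { const char *qos_name; };

static const struct dpdk_qos_ops egress_policer_ops = { "egress-policer" };
static const struct dpdk_qos_ops *const qos_confs[] = {
    &egress_policer_ops,
    NULL
};

/* Exact-match lookup; no length cap, so DPDK_MAX_QOS_NAME_SIZE goes away. */
static const struct dpdk_qos_ops *
qos_lookup_name(const char *name)
{
    for (const struct dpdk_qos_ops *const *opsp = qos_confs; *opsp; opsp++) {
        if (!strcmp(name, (*opsp)->qos_name)) {
            return *opsp;
        }
    }
    return NULL;
}
```

This also avoids the bug the 10-byte cap would introduce, where "egress-policer" and a hypothetical "egress-policer2" would compare equal in the first 10 bytes.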

>
>
>> +            return ops;
>> +        }
>> +    }
>> +    return NULL;
>> +}
>> +
>> +/*
>> + * Call qos_destruct to clean up items associated with the netdevs
>> + * qos_conf. Set netdevs qos_conf to NULL.
>> + */
>> +static void
>> +qos_delete_conf(struct netdev *netdev_)
>> +{
>> +    struct netdev_dpdk *netdev = netdev_dpdk_cast(netdev_);
>> +
>> +    rte_spinlock_lock(&netdev->qos_lock);
>> +    if (netdev->qos_conf) {
>> +        if (netdev->qos_conf->ops->qos_destruct) {
>> +            netdev->qos_conf->ops->qos_destruct(netdev_,
>>netdev->qos_conf);
>> +        }
>> +        netdev->qos_conf = NULL;
>> +    }
>> +    rte_spinlock_unlock(&netdev->qos_lock);
>> +}
>> +
>> +static int
>> +netdev_dpdk_get_qos_types(const struct netdev *netdev OVS_UNUSED,
>> +                           struct sset *types)
>> +{
>> +    const struct dpdk_qos_ops *const *opsp;
>> +
>> +    for (opsp = qos_confs; *opsp != NULL; opsp++) {
>> +        const struct dpdk_qos_ops *ops = *opsp;
>> +        if (ops->qos_construct && ops->qos_name[0] != '\0') {
>> +            sset_add(types, ops->qos_name);
>> +        }
>> +    }
>> +    return 0;
>> +}
>> +
>> +static int
>> +netdev_dpdk_get_qos(const struct netdev *netdev_,
>> +                    const char **typep, struct smap *details)
>> +{
>> +    struct netdev_dpdk *netdev = netdev_dpdk_cast(netdev_);
>> +    int error = 0;
>> +
>> +    ovs_mutex_lock(&netdev->mutex);
>> +    if(netdev->qos_conf) {
>> +        *typep = netdev->qos_conf->ops->qos_name;
>> +        error = (netdev->qos_conf->ops->qos_get
>> +                 ? netdev->qos_conf->ops->qos_get(netdev_, details):
>>0);
>> +    }
>> +    ovs_mutex_unlock(&netdev->mutex);
>> +
>> +    return error;
>> +}
>> +
>> +static int
>> +netdev_dpdk_set_qos(struct netdev *netdev_,
>> +                    const char *type, const struct smap *details)
>> +{
>> +    struct netdev_dpdk *netdev = netdev_dpdk_cast(netdev_);
>> +    const struct dpdk_qos_ops *new_ops = NULL;
>> +    int error = 0;
>> +
>> +    /* If type is empty the current QoS configuration for the
>>dpdk-netdev can
>> +     * be destroyed */
>> +    if(type[0] == '\0') {
>> +        qos_delete_conf(netdev_);
>
>Are we missing ovs_mutex_lock(&netdev->mutex) here?
>
>> +    }
>> +
>> +    new_ops = qos_lookup_name(type);
>> +    if (!new_ops || !new_ops->qos_construct) {
>> +        return EOPNOTSUPP;
>> +    }
>
>I would suggest to do:
>
>netdev_dpdk_set_qos(struct netdev *netdev_,
>                    const char *type, const struct smap *details)
>{
>   struct netdev_dpdk *netdev = netdev_dpdk_cast(netdev_);
>   const struct dpdk_qos_ops *new_ops = NULL;
>   int error = 0;
>
>   new_ops = qos_lookup_name(type);
>   if (!new_ops || !new_ops->qos_construct) {
>       return EOPNOTSUPP;
>   }
>
>   ovs_mutex_lock(&netdev->mutex);
>   if(type[0] == '\0') {
>       qos_delete_conf(netdev_);
>   }

I would treat the cases of 'type' not found and 'type' == "" the same
and delete the QoS configuration.
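Something like this simplified model captures what I mean (names hypothetical, locking and the real netdev types omitted; `qos_conf` stands in for netdev->qos_conf):

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <string.h>

static const char *qos_conf;   /* NULL means no QoS installed */

/* Toy lookup: one known type. */
static const char *lookup(const char *type)
{
    return !strcmp(type, "egress-policer") ? "egress-policer" : NULL;
}

/* Empty and unknown 'type' both clear the current configuration; only a
 * non-empty unknown type is additionally reported as unsupported. */
static int set_qos(const char *type)
{
    const char *ops = lookup(type);

    if (!ops) {
        qos_conf = NULL;                     /* qos_delete_conf() */
        return type[0] == '\0' ? 0 : EOPNOTSUPP;
    }
    qos_conf = ops;                          /* construct/reconfigure */
    return 0;
}
```

That way a stale configuration never survives a bogus 'type', which the v2 flow allows since it returns EOPNOTSUPP before deleting anything.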

>
>[ ...continue below as you posted ]
>> +    if (netdev->qos_conf) {
>> +        if (new_ops == netdev->qos_conf->ops) {
>> +            error = new_ops->qos_set ? new_ops->qos_set(netdev_,
>>details) : 0;
>> +        }
>> +        else {
>> +            /* Delete existing QoS configuration. */
>> +            qos_delete_conf(netdev_);
>> +            ovs_assert(netdev->qos_conf == NULL);
>> +
>> +            /* Install new QoS configuration. */
>> +            error = new_ops->qos_construct(netdev_, details);
>> +            ovs_assert((error == 0) == (netdev->qos_conf != NULL));
>> +        }
>> +    } else {
>> +        error = new_ops->qos_construct(netdev_, details);
>> +        ovs_assert((error == 0) == (netdev->qos_conf != NULL));
>> +    }
>> +
>> +    ovs_mutex_unlock(&netdev->mutex);
>> +    return error;
>> +}
>> +
>> +/* egress-policer details */
>> +
>> +struct egress_policer {
>> +    struct qos_conf qos_conf;
>> +    struct rte_meter_srtcm_params app_srtcm_params;
>> +    struct rte_meter_srtcm egress_meter;
>> +};
>> +
>> +static struct egress_policer *
>> +egress_policer_get__(const struct netdev *netdev_)
>> +{
>> +    struct netdev_dpdk *netdev = netdev_dpdk_cast(netdev_);
>> +    return CONTAINER_OF(netdev->qos_conf, struct egress_policer,
>>qos_conf);
>> +}
>> +
>> +static int
>> +egress_policer_qos_construct(struct netdev *netdev_,
>> +                            const struct smap *details)
>> +{
>> +    struct netdev_dpdk *netdev = netdev_dpdk_cast(netdev_);
>> +    struct egress_policer *policer;
>> +    const char *cir_s;
>> +    const char *cbs_s;
>> +    int err = 0;
>> +
>> +    rte_spinlock_lock(&netdev->qos_lock);
>> +    policer = xmalloc(sizeof *policer);
>> +    qos_conf_init(&policer->qos_conf, &egress_policer_ops);
>> +    netdev->qos_conf = &policer->qos_conf;
>> +    cir_s = smap_get(details, "cir");
>> +    cbs_s = smap_get(details, "cbs");
>> +    policer->app_srtcm_params.cir = cir_s ? strtoull(cir_s, NULL, 10)
>>: 0;
>> +    policer->app_srtcm_params.cbs = cbs_s ? strtoull(cbs_s, NULL, 10)
>>: 0;
>> +    policer->app_srtcm_params.ebs = 0;
>> +    err = rte_meter_srtcm_config(&policer->egress_meter,
>> +                                    &policer->app_srtcm_params);
>> +    rte_spinlock_unlock(&netdev->qos_lock);
>> +
>> +    return err;
>> +}
>> +
>> +static void
>> +egress_policer_qos_destruct(struct netdev *netdev_ OVS_UNUSED,
>> +                        struct qos_conf *conf)
>> +{
>> +    struct egress_policer *policer = CONTAINER_OF(conf, struct
>>egress_policer,
>> +                                                qos_conf);
>> +    free(policer);
>> +}
>> +
>> +static int
>> +egress_policer_qos_get(const struct netdev *netdev, struct smap
>>*details)
>> +{
>> +    struct egress_policer *policer = egress_policer_get__(netdev);
>> +    smap_add_format(details, "cir", "%llu",
>> +                    1ULL * policer->app_srtcm_params.cir);
>> +    smap_add_format(details, "cbs", "%llu",
>> +                    1ULL * policer->app_srtcm_params.cbs);
>> +    return 0;
>> +}
>> +
>> +static int
>> +egress_policer_qos_set(struct netdev *netdev_, const struct smap
>>*details)
>> +{
>> +    struct egress_policer *policer;
>> +    const char *cir_s;
>> +    const char *cbs_s;
>> +    int err = 0;
>> +
>> +    policer = egress_policer_get__(netdev_);
>> +    cir_s = smap_get(details, "cir");
>> +    cbs_s = smap_get(details, "cbs");
>> +    policer->app_srtcm_params.cir = cir_s ? strtoull(cir_s, NULL, 10)
>>: 0;
>> +    policer->app_srtcm_params.cbs = cbs_s ? strtoull(cbs_s, NULL, 10)
>>: 0;
>> +    policer->app_srtcm_params.ebs = 0;
>> +    err = rte_meter_srtcm_config(&policer->egress_meter,
>> +                                    &policer->app_srtcm_params);
>> +
>> +    return err;
>> +}
>> +
>> +static inline int
>> +egress_policer_pkt_handle__(struct rte_meter_srtcm *meter,
>> +                            struct rte_mbuf *pkt, uint64_t time)
>> +{
>> +    uint8_t output_color;
>> +    uint32_t pkt_len = rte_pktmbuf_pkt_len(pkt) - sizeof(struct
>>ether_hdr);
>> +    enum egress_policer_action action = GREEN;
>> +
>> +    /* color input is not used for blind modes */
>> +    output_color = (uint8_t) rte_meter_srtcm_color_blind_check(meter,
>>time,
>> +               
>>pkt_len);
>> +
>> +    /* Check output color, 0 corresponds to GREEN i.e. packet can be
>> +     * transmitted. If greater than 0 then action is set to DROP. */
>> +    if (output_color) {
>> +        action = DROP;
>> +    }
> 
>
>This could be simpler:
>
>static inline int
>egress_policer_pkt_handle__(struct rte_meter_srtcm *meter,
>                            struct rte_mbuf *pkt, uint64_t time)
>{
>     uint32_t pkt_len = rte_pktmbuf_pkt_len(pkt) - sizeof(struct
>ether_hdr);
>
>     /* If GREEN, accept it otherwise drop it */
>     if (rte_meter_srtcm_color_blind_check(meter, time, pkt_len) !=
>e_RTE_METER_GREEN) {
>         return 1;
>     }
>
>     return 0;
>}
>
>
>
>> +    return action;
>> +}
>> +
>> +static int
>> +egress_policer_alg_process(struct netdev *netdev_, struct rte_mbuf
>>**pkts,
>> +                        int pkt_cnt)
>> +{
>> +    int i = 0;
>> +    int cnt = pkt_cnt;
>> +    struct egress_policer *policer = egress_policer_get__(netdev_);
>> +    struct rte_mbuf *pkt = NULL;
>> +    uint64_t current_time = rte_rdtsc();
>> +
>> +    for(i = 0; i < pkt_cnt; i++) {
>> +        pkt = pkts[i];
>> +        /* Handle current packet */
>> +        if (egress_policer_pkt_handle__(&policer->egress_meter, pkt,
>> +                                        current_time) == DROP) {
>> +            rte_pktmbuf_free(pkt);
>> +            cnt = cnt - 1;
>> +        }
>> +    }
>> +
>> +    return cnt;
>> +}
>> +
>> +static const struct dpdk_qos_ops egress_policer_ops = {
>> +    "egress-policer",    /* qos_name */
>> +    egress_policer_qos_construct,
>> +    egress_policer_qos_destruct,
>> +    egress_policer_qos_get,
>> +    egress_policer_qos_set,
>> +    egress_policer_alg_process
>> +};
>> +
>>  #define NETDEV_DPDK_CLASS(NAME, INIT, CONSTRUCT, DESTRUCT, MULTIQ,
>>SEND, \
>>      GET_CARRIER, GET_STATS, GET_FEATURES, GET_STATUS, RXQ_RECV)
>>  \
>>  {                                                             \
>> @@ -2144,10 +2557,10 @@ unlock_dpdk:
>>      NULL,                       /* set_advertisements */      \
>>                                                                \
>>      NULL,                       /* set_policing */            \
>> -    NULL,                       /* get_qos_types */           \
>> +    netdev_dpdk_get_qos_types,                                \
>>      NULL,                       /* get_qos_capabilities */    \
>> -    NULL,                       /* get_qos */                 \
>> -    NULL,                       /* set_qos */                 \
>> +    netdev_dpdk_get_qos,                                      \
>> +    netdev_dpdk_set_qos,                                      \
>>      NULL,                       /* get_queue */               \
>>      NULL,                       /* set_queue */               \
>>      NULL,                       /* delete_queue */            \
>> diff --git a/vswitchd/vswitch.xml b/vswitchd/vswitch.xml
>> index ce0dbc1..545d47f 100644
>> --- a/vswitchd/vswitch.xml
>> +++ b/vswitchd/vswitch.xml
>> @@ -3298,6 +3298,26 @@
>>            for information on how this classifier works.
>>          </dd>
>>        </dl>
>> +      <dl>
>> +        <dt><code>egress-policer</code></dt>
>> +        <dd>
>> +          An egress policer algorithm. This implementation uses the
>>DPDK
>> +          rte_meter library. The rte_meter library provides an
>>implementation
>> +          of srTCM (RFC2697) which allows the metering and policing of
>>traffic.
>> +          The implementation in OVS creates a color blind srTCM meter
>>with a
>> +          single token bucket used to police traffic. It should be
>>noted that
>> +          when the rte_meter is configured as part of QoS there will
>>be a
>> +          performance overhead as the rte_mter itself will consume CPU
>>cycles
>> +          in order to police traffic. These CPU cycles ordinarily are
>>used for
>> +          packet proccessing. As such the drop in performance will be
>>noticed
>> +          in terms of overall aggregate traffic throughput.
>> +
>> +          For more information regarding the usage of the DPDK
>>rte_meter see
>> +          
>><code>http://dpdk.org/doc/guides/sample_app_ug/qos_metering.html</code>
>> +          For more information regarding srTCM see
>> +          <code>https://tools.ietf.org/html/rfc2697</code>
>> +        </dd>
>> +      </dl>

I'm still not sure whether we should have references to rfc2697 or the
DPDK qos sample app.  From a user perspective this is a token bucket.
If there are only two possible actions (drop and forward), what's the
value of considering the colors?

>>      </column>
>>  
>>      <column name="queues">
>> @@ -3334,6 +3354,38 @@
>>        </column>
>>      </group>
>>  
>> +    <group title="Configuration for egress-policer QoS">
>> +      <p>
>> +        <ref table="QoS"/> <ref table="QoS" column="type"/>
>> +        <code>egress-policer</code> provides egress policing for
>>userspace
>> +        port types with DPDK.
>> +
>> +        It has the following key-value pairs defined.
>> +      </p>
>> +
>> +      <column name="other_config" key="cir" type='{"type": "integer"}'>
>> +        The Committed Information Rate (CIR) is measured in bytes of IP
>> +        packets per second, i.e. it includes the IP header, but not
>>link
>> +        specific (e.g. Ethernet) headers. This represents the bytes
>>per second
>> +        rate at which the token bucket will be updated. The cir value
>>is
>> +        calculated by (pps x packet data size).  For example assuming
>>a user
>> +        wishes to limit a stream consisting of 64 byte packets to 1
>>million
>> +        packets per second the CIR would be set to to to 46000000.
>>This value
>> +        can be broken into '1,000,000 x 46'. Where 1,000,000 is the
>>policing
>> +        rate for the number of packets per second and 46 represents
>>the size
>> +        of the packet data for a 64 byte ip packet.
>> +      </column>
>> +      <column name="other_config" key="cbs" type='{"type": "integer"}'>
>> +        The Committed Burst Size (CBS) is measured in bytes and
>>represents a
>> +        token bucket. At a minimum this value should be be set to the
>>expected
>> +        largest size packet in the traffic stream. In practice larger
>>values
>> +        may be used to increase the size of the token bucket. If a
>>packet can
>> +        be transmitted then the cbs will be decremented by the number
>>of
>> +        bytes/tokens of the packet. If there are not enough tokens in
>>the cbs
>> +        bucket the packet will be dropped.
>> +      </column>
>> +    </group>
>> +
>>      <group title="Common Columns">
>>        The overall purpose of these columns is described under
>><code>Common
>>        Columns</code> at the beginning of this document.
>
>On a higher level, it would be nice to move QoS from
>__netdev_dpdk_vhost_send() to its caller netdev_dpdk_vhost_send()
>and then run QoS before do the array copy at dpdk_do_tx_copy().
>But I guess we need the packet as a rte_mbuf.
>
>Thanks,
>-- 
>fbl
>
>_______________________________________________
>dev mailing list
>dev@openvswitch.org
>http://openvswitch.org/mailman/listinfo/dev
