Thanks for the review Daniele, comments inline. Will re-spin another version.
> Thanks for the patch and the review. > > I agree with everything Flavio pointed out, a few more comments below > > On 10/02/2016 11:54, "Flavio Leitner" <f...@sysclose.org> wrote: > > >On Mon, 1 Feb 2016 20:47:25 +0000 > >Ian Stokes <ian.sto...@intel.com> wrote: > > > >> This patch provides the modifications required in netdev-dpdk.c and > >> vswitch.xml to allow for a DPDK user space QoS algorithm. > >> > >> This patch adds a QoS configuration structure for netdev-dpdk and > >> expected QoS operations 'dpdk_qos_ops'. Various helper functions are > >> also supplied. > >> > >> Also included are the modifications required for vswitch.xml to allow > >>a new QoS implementation for netdev-dpdk devices. This includes a new > >>QoS type `egress-policer` as well as its expected QoS table entries. > >> > >> The QoS functionality implemented for DPDK devices is `egress- > policer`. > >> This can be used to drop egress packets at a configurable rate. > >> > >> The INSTALL.DPDK.md guide has also been modified to provide an > >> example configuration of `egress-policer` QoS. > >> > >> Signed-off-by: Ian Stokes <ian.sto...@intel.com> > >> --- > >> INSTALL.DPDK.md | 20 +++ > >> lib/netdev-dpdk.c | 439 > >>++++++++++++++++++++++++++++++++++++++++++++++++-- > >> vswitchd/vswitch.xml | 52 ++++++ > >> 3 files changed, 498 insertions(+), 13 deletions(-) > >> > >> diff --git a/INSTALL.DPDK.md b/INSTALL.DPDK.md index e8ef4b5..dac70cd > >> 100644 > >> --- a/INSTALL.DPDK.md > >> +++ b/INSTALL.DPDK.md > >> @@ -207,6 +207,26 @@ Using the DPDK with ovs-vswitchd: > >> ./ovs-ofctl add-flow br0 in_port=2,action=output:1 > >> ``` > >> > >> +8. QoS usage example > >> + > >> + Assuming you have a vhost-user port transmitting traffic > >> + consisting > >>of > >> + packets of size 64 bytes, the following command would limit the > >>egress > >> + transmission rate of the port to ~1,000,000 packets per second: > >> + > >> + `ovs-vsctl set port vhost-user0 qos=@newqos -- --id=@newqos > >> + create > >>qos > >> + type=egress-policer other-config:cir=46000000 > >> + other-config:cbs=2048` > >> + > >> + To examine the QoS configuration of the port: > >> + > >> + `ovs-appctl -t ovs-vswitchd qos/show vhost-user0` > >> + > >> + To clear the QoS configuration from the port and ovsdb use the > >>following: > >> + > >> + `ovs-vsctl -- destroy QoS vhost-user0 -- clear Port vhost-user0 > >> + qos` > > Minor nit: the first "--" is not necessary Well spotted, will fix this. > > >> + > >> + For more details regarding egress-policer parameters please refer > >>to the > >> + vswitch.xml. > >> + > >> Performance Tuning: > >> ------------------- > >> > >> diff --git a/lib/netdev-dpdk.c b/lib/netdev-dpdk.c index > >> 09ccc2c..716da79 100644 > >> --- a/lib/netdev-dpdk.c > >> +++ b/lib/netdev-dpdk.c > >> @@ -44,6 +44,7 @@ > >> #include "ovs-rcu.h" > >> #include "packets.h" > >> #include "shash.h" > >> +#include "smap.h" > >> #include "sset.h" > >> #include "unaligned.h" > >> #include "timeval.h" > >> @@ -52,12 +53,14 @@ > >> > >> #include "rte_config.h" > >> #include "rte_mbuf.h" > >> +#include "rte_meter.h" > >> #include "rte_virtio_net.h" > >> > >> VLOG_DEFINE_THIS_MODULE(dpdk); > >> static struct vlog_rate_limit rl = VLOG_RATE_LIMIT_INIT(5, 20); > >> > >> #define DPDK_PORT_WATCHDOG_INTERVAL 5 > >> +#define DPDK_MAX_QOS_NAME_SIZE 10 > > > >10 is too short to give an useful description. > >The proposed one already goes over it. > > > >>>> a = 'egress-policer' > >>>> len(a) > >14 > > > > > > > >> #define OVS_CACHE_LINE_SIZE CACHE_LINE_SIZE #define OVS_VPORT_DPDK > >> "ovs_dpdk" > >> @@ -142,6 +145,107 @@ static int rte_eal_init_ret = ENODEV; > >> > >> static struct ovs_mutex dpdk_mutex = OVS_MUTEX_INITIALIZER; > >> > >> +/* Quality of Service */ > >> + > >> +/* An instance of a QoS configuration. Always associated with a > >>particular > >> + * network device. > >> + * > >> + * Each QoS implementation subclasses this with whatever additional > >>data it > >> + * needs. > >> + */ > >> +struct qos_conf { > >> + const struct dpdk_qos_ops *ops; > >> +}; > >> + > >> +/* A particular implementation of dpdk QoS operations. > >> + * > >> + * The functions below return 0 if successful or a positive errno > >>value on > >> + * failure, except where otherwise noted. All of them must be > >>provided, except > >> + * where otherwise noted. > >> + */ > >> +struct dpdk_qos_ops { > >> + > >> + /* Name of the QoS type */ > > > >Please mention DPDK_MAX_QOS_NAME_SIZE > > > > > >> + const char *qos_name; > >> + > >> + /* Called to construct the QoS implementation on 'netdev'. The > >> + * implementation should make the appropriate calls to configure > >>QoS > >> + * according to 'details'. The implementation may assume that > >> + any > >>current > >> + * QoS configuration already installed should be destroyed > before > >> + * constructing the new configuration. > >> + * > >> + * The contents of 'details' should be documented as valid for > >>'ovs_name' > >> + * in the "other_config" column in the "QoS" table in > >>vswitchd/vswitch.xml > >> + * (which is built as ovs-vswitchd.conf.db(8)). > >> + * > >> + * This function must return 0 if and only if it sets > >>'netdev->qos_conf' > >> + * to an initialized 'struct qos_conf'. > >> + * > >> + * For all QoS implementations it should always be non-null. > >> + */ > >> + int (*qos_construct)(struct netdev *netdev, const struct smap > >>*details); > >> + > >> + /* Destroys the data structures allocated by the implementation > >> + as > >>part of > >> + * 'qos_conf. > >> + * > >> + * For all QoS implementations it should always be non-null. > >> + */ > >> + void (*qos_destruct)(struct netdev *netdev, struct qos_conf > >> + *conf); > >> + > >> + /* Retrieves details of 'netdev->qos_conf' configuration into > >>'details'. > >> + * > >> + * The contents of 'details' should be documented as valid for > >>'ovs_name' > >> + * in the "other_config" column in the "QoS" table in > >>vswitchd/vswitch.xml > >> + * (which is built as ovs-vswitchd.conf.db(8)). > >> + */ > >> + int (*qos_get)(const struct netdev *netdev, struct smap > >> + *details); > >> + > >> + /* Reconfigures 'netdev->qos_conf' according to 'details', > >>performing any > >> + * required calls to complete the reconfiguration. > >> + * > >> + * The contents of 'details' should be documented as valid for > >>'ovs_name' > >> + * in the "other_config" column in the "QoS" table in > >>vswitchd/vswitch.xml > >> + * (which is built as ovs-vswitchd.conf.db(8)). > >> + * > >> + * This function may be null if 'qos_conf' is not configurable. > >> + */ > >> + int (*qos_set)(struct netdev *netdev, const struct smap > >> + *details); > >> + > >> + /* Modify an array of rte_mbufs. The modification is specific to > >> + * each qos implementation. > >> + * > >> + * The function should take and array of mbufs and an int > >>representing > >> + * the current number of mbufs present in the array. > >> + * > >> + * After the function has performed a qos modification to the > >>array of > >> + * mbufs it returns an int representing the number of mbufs now > >>present in > >> + * the array. This value is can then be passed to the port send > >>function > >> + * along with the modified array for transmission. > >> + * > >> + * For all QoS implementations it should always be non-null. > >> + */ > >> + int (*qos_alg_process)(struct netdev *netdev, struct rte_mbuf > >>**pkts, > >> + int pkt_cnt); > > > >Maybe qos_run() ? > > > > > >> +}; > >> + > >> +/* dpdk_qos_ops for each type of user space QoS implementation */ > >> +static const struct dpdk_qos_ops egress_policer_ops; > >> + > >> +/* > >> + * Array of dpdk_qos_ops, contains pointer to all supported QoS > >> + * operations. > >> + */ > >> +static const struct dpdk_qos_ops *const qos_confs[] = { > >> + &egress_policer_ops, > >> + NULL > >> +}; > >> + > >> +/* Action that can be set for a packet for rte_meter */ enum > >> +egress_policer_action { > >> + GREEN = e_RTE_METER_GREEN, > >> + DROP = 1, > >> +}; > > > >I don't think it's a good idea to mix external and internal > >declarations because if the external one changes, it will break here. > >Since GREEN and DROP have meanings only inside of OVS, we should define > >our own values. > >See my comment where this is used. > > > > > >> + > >> /* Contains all 'struct dpdk_dev's. */ static struct ovs_list > >> dpdk_list OVS_GUARDED_BY(dpdk_mutex) > >> = OVS_LIST_INITIALIZER(&dpdk_list); @@ -232,6 +336,11 @@ struct > >> netdev_dpdk { > >> > >> /* In dpdk_list. */ > >> struct ovs_list list_node OVS_GUARDED_BY(dpdk_mutex); > >> + > >> + /* QoS configuration and lock for the device */ > >> + struct qos_conf *qos_conf; > >> + rte_spinlock_t qos_lock; > >> + > >> }; > >> > >> struct netdev_rxq_dpdk { > >> @@ -611,6 +720,10 @@ netdev_dpdk_init(struct netdev *netdev_, > >>unsigned int port_no, > >> goto unlock; > >> } > >> > >> + /* Initialise QoS configuration to NULL and qos lock to unlocked > */ > >> + netdev->qos_conf = NULL; > >> + rte_spinlock_init(&netdev->qos_lock); > >> + > >> netdev_->n_txq = NR_QUEUE; > >> netdev_->n_rxq = NR_QUEUE; > >> netdev->real_n_txq = NR_QUEUE; > >> @@ -1066,6 +1179,23 @@ netdev_dpdk_rxq_recv(struct netdev_rxq *rxq_, > >>struct dp_packet **packets, > >> return 0; > >> } > >> > >> +static inline int > >> +netdev_dpdk_qos_process__(struct netdev_dpdk *dev, struct rte_mbuf > >>**pkts, > >> + int cnt) > >> +{ > >> + struct netdev *netdev = &dev->up; > >> + > >> + if (dev->qos_conf != NULL) { > >> + rte_spinlock_lock(&dev->qos_lock); > >> + if (dev->qos_conf != NULL) { > >> + cnt = dev->qos_conf->ops->qos_alg_process(netdev, pkts, > >>cnt); > >> + } > >> + rte_spinlock_unlock(&dev->qos_lock); > >> + } > >> + > >> + return cnt; > >> +} > >> + > >> static inline void > >> netdev_dpdk_vhost_update_tx_counters(struct netdev_stats *stats, > >> struct dp_packet **packets, @@ > >>-1092,6 +1222,7 @@ __netdev_dpdk_vhost_send(struct netdev *netdev, int > >>qid, > >> struct virtio_net *virtio_dev = > netdev_dpdk_get_virtio(vhost_dev); > >> struct rte_mbuf **cur_pkts = (struct rte_mbuf **) pkts; > >> unsigned int total_pkts = cnt; > >> + unsigned int qos_pkts = cnt; > >> uint64_t start = 0; > >> > >> if (OVS_UNLIKELY(!is_vhost_running(virtio_dev))) { @@ -1106,6 > >>+1237,10 @@ __netdev_dpdk_vhost_send(struct netdev *netdev, int qid, > >> rte_spinlock_lock(&vhost_dev->tx_q[qid].tx_lock); > >> } > >> > >> + /* Check has QoS has been configured for the netdev */ > >> + cnt = netdev_dpdk_qos_process__(vhost_dev, cur_pkts, cnt); > >> + qos_pkts -= cnt; > > > >I don't see how this is working because egress_policer_alg_process() > >will loop in the array one by one freeing or not the packet. > >If it does nothing, then the array is good but otherwise it creates a > >problem because the array still references freed entries. So in loop > >below we are sending a mix of good and freed buffers up to 'cnt'. > > > > This is definitely a problem. Sorry I didn't catch it on the review of > V1. This should not be an issue due to how the token bucket updates before processing a batch of packets rather than updating after processing each packet individual packet. I would expect to see a segfault when data from a freed packet is attempted to be accessed in the send function after QoS has run otherwise. I'll double check to be sure. > > > > >> + > >> do { > >> int vhost_qid = qid * VIRTIO_QNUM + VIRTIO_RXQ; > >> unsigned int tx_pkts; > >> @@ -1147,6 +1282,7 @@ __netdev_dpdk_vhost_send(struct netdev *netdev, > >>int qid, > >> } > >> > >> rte_spinlock_lock(&vhost_dev->stats_lock); > >> + cnt += qos_pkts; > >> netdev_dpdk_vhost_update_tx_counters(&vhost_dev->stats, pkts, > >>total_pkts, > >> cnt); > >> rte_spinlock_unlock(&vhost_dev->stats_lock); > >> @@ -1242,19 +1378,25 @@ dpdk_do_tx_copy(struct netdev *netdev, int > >>qid, struct dp_packet **pkts, > >> newcnt++; > >> } > >> > >> - if (OVS_UNLIKELY(dropped)) { > >> - rte_spinlock_lock(&dev->stats_lock); > >> - dev->stats.tx_dropped += dropped; > >> - rte_spinlock_unlock(&dev->stats_lock); > >> - } > >> - > >> if (dev->type == DPDK_DEV_VHOST) { > >> __netdev_dpdk_vhost_send(netdev, qid, (struct dp_packet **) > >>mbufs, newcnt, true); > >> } else { > >> + unsigned int qos_pkts = newcnt; > >> + > >> + /* Check if QoS has been configured for this netdev. */ > >> + newcnt = netdev_dpdk_qos_process__(dev, mbufs, newcnt); > >> + > >> + dropped += qos_pkts - newcnt; > >> dpdk_queue_pkts(dev, qid, mbufs, newcnt); > >> dpdk_queue_flush(dev, qid); > >> } > >> > >> + if (OVS_UNLIKELY(dropped)) { > >> + rte_spinlock_lock(&dev->stats_lock); > >> + dev->stats.tx_dropped += dropped; > >> + rte_spinlock_unlock(&dev->stats_lock); > >> + } > >> + > >> if (!dpdk_thread_is_pmd()) { > >> ovs_mutex_unlock(&nonpmd_mempool_mutex); > >> } > >> @@ -1304,15 +1446,24 @@ netdev_dpdk_send__(struct netdev_dpdk *dev, > >>int qid, > >> } else { > >> int next_tx_idx = 0; > >> int dropped = 0; > >> + unsigned int qos_pkts = 0; > >> > >> for (i = 0; i < cnt; i++) { > >> int size = dp_packet_size(pkts[i]); > >> > >> if (OVS_UNLIKELY(size > dev->max_packet_len)) { > >> if (next_tx_idx != i) { > >> + cnt = i - next_tx_idx; > >> + qos_pkts = cnt; > >> + > >> + cnt = netdev_dpdk_qos_process__(dev, > >> + (struct > >>rte_mbuf**)pkts, > >> + cnt); > >> + dropped += qos_pkts - cnt; > >> dpdk_queue_pkts(dev, qid, > >> (struct rte_mbuf > >>**)&pkts[next_tx_idx], > >> - i-next_tx_idx); > >> + cnt); > >> + > >> } > >> > >> VLOG_WARN_RL(&rl, "Too big size %d max_packet_len > >>%d", @@ -1324,9 +1475,13 @@ netdev_dpdk_send__(struct netdev_dpdk > >>*dev, int qid, > >> } > >> } > >> if (next_tx_idx != cnt) { > >> - dpdk_queue_pkts(dev, qid, > >> - (struct rte_mbuf **)&pkts[next_tx_idx], > >> - cnt-next_tx_idx); > >> + cnt -= next_tx_idx; > >> + qos_pkts = cnt; > >> + > >> + cnt = netdev_dpdk_qos_process__(dev, (struct > >>rte_mbuf**)pkts, cnt); > >> + dropped += qos_pkts - cnt; > >> + dpdk_queue_pkts(dev, qid, (struct rte_mbuf > >>**)&pkts[next_tx_idx], > >> + cnt); > >> } > >> > >> if (OVS_UNLIKELY(dropped)) { @@ -2107,6 +2262,264 @@ > >> unlock_dpdk: > >> return err; > >> } > >> > >> +/* QoS Functions */ > >> + > >> +/* > >> + * Initialize QoS configuration operations. > >> + */ > >> +static void > >> +qos_conf_init(struct qos_conf *conf, const struct dpdk_qos_ops *ops) > >> +{ > >> + conf->ops = ops; > >> +} > >> + > >> +/* > >> + * Search existing QoS operations in qos_ops and compare each set of > >> + * operations qos_name to name. Return a dpdk_qos_ops pointer to a > >>match, > >> + * else return NULL > >> + */ > >> +static const struct dpdk_qos_ops * > >> +qos_lookup_name(const char *name) > >> +{ > >> + const struct dpdk_qos_ops *const *opsp; > >> + > >> + for (opsp = qos_confs; *opsp != NULL; opsp++) { > >> + const struct dpdk_qos_ops *ops = *opsp; > >> + if (!strncmp(name, ops->qos_name,DPDK_MAX_QOS_NAME_SIZE)) { > > ^ missing space > > > >Do we really need strncmp? Since it comes from DB, I suspect it will > >always be NULL terminated. > > I agree. We should drop DPDK_MAX_QOS_NAME_SIZE entirely > OK, I wasn't aware of the DB behavior. Good to know. Will remove it. > > > > > >> + return ops; > >> + } > >> + } > >> + return NULL; > >> +} > >> + > >> +/* > >> + * Call qos_destruct to clean up items associated with the netdevs > >> + * qos_conf. Set netdevs qos_conf to NULL. > >> + */ > >> +static void > >> +qos_delete_conf(struct netdev *netdev_) { > >> + struct netdev_dpdk *netdev = netdev_dpdk_cast(netdev_); > >> + > >> + rte_spinlock_lock(&netdev->qos_lock); > >> + if (netdev->qos_conf) { > >> + if (netdev->qos_conf->ops->qos_destruct) { > >> + netdev->qos_conf->ops->qos_destruct(netdev_, > >>netdev->qos_conf); > >> + } > >> + netdev->qos_conf = NULL; > >> + } > >> + rte_spinlock_unlock(&netdev->qos_lock); > >> +} > >> + > >> +static int > >> +netdev_dpdk_get_qos_types(const struct netdev *netdev OVS_UNUSED, > >> + struct sset *types) { > >> + const struct dpdk_qos_ops *const *opsp; > >> + > >> + for (opsp = qos_confs; *opsp != NULL; opsp++) { > >> + const struct dpdk_qos_ops *ops = *opsp; > >> + if (ops->qos_construct && ops->qos_name[0] != '\0') { > >> + sset_add(types, ops->qos_name); > >> + } > >> + } > >> + return 0; > >> +} > >> + > >> +static int > >> +netdev_dpdk_get_qos(const struct netdev *netdev_, > >> + const char **typep, struct smap *details) { > >> + struct netdev_dpdk *netdev = netdev_dpdk_cast(netdev_); > >> + int error = 0; > >> + > >> + ovs_mutex_lock(&netdev->mutex); > >> + if(netdev->qos_conf) { > >> + *typep = netdev->qos_conf->ops->qos_name; > >> + error = (netdev->qos_conf->ops->qos_get > >> + ? netdev->qos_conf->ops->qos_get(netdev_, details): > >>0); > >> + } > >> + ovs_mutex_unlock(&netdev->mutex); > >> + > >> + return error; > >> +} > >> + > >> +static int > >> +netdev_dpdk_set_qos(struct netdev *netdev_, > >> + const char *type, const struct smap *details) { > >> + struct netdev_dpdk *netdev = netdev_dpdk_cast(netdev_); > >> + const struct dpdk_qos_ops *new_ops = NULL; > >> + int error = 0; > >> + > >> + /* If type is empty the current QoS configuration for the > >>dpdk-netdev can > >> + * be destroyed */ > >> + if(type[0] == '\0') { > >> + qos_delete_conf(netdev_); > > > >Are we missing ovs_mutex_lock(&netdev->mutex) here? > > > >> + } > >> + > >> + new_ops = qos_lookup_name(type); > >> + if (!new_ops || !new_ops->qos_construct) { > >> + return EOPNOTSUPP; > >> + } > > > >I would suggest to do: > > > >netdev_dpdk_set_qos(struct netdev *netdev_, > > const char *type, const struct smap *details) { > > struct netdev_dpdk *netdev = netdev_dpdk_cast(netdev_); > > const struct dpdk_qos_ops *new_ops = NULL; > > int error = 0; > > > > new_ops = qos_lookup_name(type); > > if (!new_ops || !new_ops->qos_construct) { > > return EOPNOTSUPP; > > } > > > > ovs_mutex_lock(&netdev->mutex); > > if(type[0] == '\0') { > > qos_delete_conf(netdev_); > > } > > I would treat the case of 'type' not found or 'type'=="" equally and > delete the qos configuration. Understood, I can't remember if I had it that way originally, let me investigate. > > > > >[ ...continue below as you posted ] > >> + if (netdev->qos_conf) { > >> + if (new_ops == netdev->qos_conf->ops) { > >> + error = new_ops->qos_set ? new_ops->qos_set(netdev_, > >>details) : 0; > >> + } > >> + else { > >> + /* Delete existing QoS configuration. */ > >> + qos_delete_conf(netdev_); > >> + ovs_assert(netdev->qos_conf == NULL); > >> + > >> + /* Install new QoS configuration. */ > >> + error = new_ops->qos_construct(netdev_, details); > >> + ovs_assert((error == 0) == (netdev->qos_conf != NULL)); > >> + } > >> + } else { > >> + error = new_ops->qos_construct(netdev_, details); > >> + ovs_assert((error == 0) == (netdev->qos_conf != NULL)); > >> + } > >> + > >> + ovs_mutex_unlock(&netdev->mutex); > >> + return error; > >> +} > >> + > >> +/* egress-policer details */ > >> + > >> +struct egress_policer { > >> + struct qos_conf qos_conf; > >> + struct rte_meter_srtcm_params app_srtcm_params; > >> + struct rte_meter_srtcm egress_meter; }; > >> + > >> +static struct egress_policer * > >> +egress_policer_get__(const struct netdev *netdev_) { > >> + struct netdev_dpdk *netdev = netdev_dpdk_cast(netdev_); > >> + return CONTAINER_OF(netdev->qos_conf, struct egress_policer, > >>qos_conf); > >> +} > >> + > >> +static int > >> +egress_policer_qos_construct(struct netdev *netdev_, > >> + const struct smap *details) { > >> + struct netdev_dpdk *netdev = netdev_dpdk_cast(netdev_); > >> + struct egress_policer *policer; > >> + const char *cir_s; > >> + const char *cbs_s; > >> + int err = 0; > >> + > >> + rte_spinlock_lock(&netdev->qos_lock); > >> + policer = xmalloc(sizeof *policer); > >> + qos_conf_init(&policer->qos_conf, &egress_policer_ops); > >> + netdev->qos_conf = &policer->qos_conf; > >> + cir_s = smap_get(details, "cir"); > >> + cbs_s = smap_get(details, "cbs"); > >> + policer->app_srtcm_params.cir = cir_s ? strtoull(cir_s, NULL, > >> + 10) > >>: 0; > >> + policer->app_srtcm_params.cbs = cbs_s ? strtoull(cbs_s, NULL, > >> + 10) > >>: 0; > >> + policer->app_srtcm_params.ebs = 0; > >> + err = rte_meter_srtcm_config(&policer->egress_meter, > >> + &policer->app_srtcm_params); > >> + rte_spinlock_unlock(&netdev->qos_lock); > >> + > >> + return err; > >> +} > >> + > >> +static void > >> +egress_policer_qos_destruct(struct netdev *netdev_ OVS_UNUSED, > >> + struct qos_conf *conf) { > >> + struct egress_policer *policer = CONTAINER_OF(conf, struct > >>egress_policer, > >> + qos_conf); > >> + free(policer); > >> +} > >> + > >> +static int > >> +egress_policer_qos_get(const struct netdev *netdev, struct smap > >>*details) > >> +{ > >> + struct egress_policer *policer = egress_policer_get__(netdev); > >> + smap_add_format(details, "cir", "%llu", > >> + 1ULL * policer->app_srtcm_params.cir); > >> + smap_add_format(details, "cbs", "%llu", > >> + 1ULL * policer->app_srtcm_params.cbs); > >> + return 0; > >> +} > >> + > >> +static int > >> +egress_policer_qos_set(struct netdev *netdev_, const struct smap > >>*details) > >> +{ > >> + struct egress_policer *policer; > >> + const char *cir_s; > >> + const char *cbs_s; > >> + int err = 0; > >> + > >> + policer = egress_policer_get__(netdev_); > >> + cir_s = smap_get(details, "cir"); > >> + cbs_s = smap_get(details, "cbs"); > >> + policer->app_srtcm_params.cir = cir_s ? strtoull(cir_s, NULL, > >> + 10) > >>: 0; > >> + policer->app_srtcm_params.cbs = cbs_s ? strtoull(cbs_s, NULL, > >> + 10) > >>: 0; > >> + policer->app_srtcm_params.ebs = 0; > >> + err = rte_meter_srtcm_config(&policer->egress_meter, > >> + &policer->app_srtcm_params); > >> + > >> + return err; > >> +} > >> + > >> +static inline int > >> +egress_policer_pkt_handle__(struct rte_meter_srtcm *meter, > >> + struct rte_mbuf *pkt, uint64_t time) { > >> + uint8_t output_color; > >> + uint32_t pkt_len = rte_pktmbuf_pkt_len(pkt) - sizeof(struct > >>ether_hdr); > >> + enum egress_policer_action action = GREEN; > >> + > >> + /* color input is not used for blind modes */ > >> + output_color = (uint8_t) > >> + rte_meter_srtcm_color_blind_check(meter, > >>time, > >> + > >>pkt_len); > >> + > >> + /* Check output color, 0 corresponds to GREEN i.e. packet can be > >> + * transmitted. If greater than 0 then action is set to DROP. */ > >> + if (output_color) { > >> + action = DROP; > >> + } > > > > > >This could be simpler: > > > >static inline int > >egress_policer_pkt_handle__(struct rte_meter_srtcm *meter, > > struct rte_mbuf *pkt, uint64_t time) { > > uint32_t pkt_len = rte_pktmbuf_pkt_len(pkt) - sizeof(struct > >ether_hdr); > > > > /* If GREEN, accept it otherwise drop it */ > > if (rte_meter_srtcm_color_blind_check(meter, time, pkt_len) != > >e_RTE_METER_GREEN) { > > return 1; > > } > > > > return 0; > >} > > > > > > > >> + return action; > >> +} > >> + > >> +static int > >> +egress_policer_alg_process(struct netdev *netdev_, struct rte_mbuf > >>**pkts, > >> + int pkt_cnt) { > >> + int i = 0; > >> + int cnt = pkt_cnt; > >> + struct egress_policer *policer = egress_policer_get__(netdev_); > >> + struct rte_mbuf *pkt = NULL; > >> + uint64_t current_time = rte_rdtsc(); > >> + > >> + for(i = 0; i < pkt_cnt; i++) { > >> + pkt = pkts[i]; > >> + /* Handle current packet */ > >> + if (egress_policer_pkt_handle__(&policer->egress_meter, pkt, > >> + current_time) == DROP) { > >> + rte_pktmbuf_free(pkt); > >> + cnt = cnt - 1; > >> + } > >> + } > >> + > >> + return cnt; > >> +} > >> + > >> +static const struct dpdk_qos_ops egress_policer_ops = { > >> + "egress-policer", /* qos_name */ > >> + egress_policer_qos_construct, > >> + egress_policer_qos_destruct, > >> + egress_policer_qos_get, > >> + egress_policer_qos_set, > >> + egress_policer_alg_process > >> +}; > >> + > >> #define NETDEV_DPDK_CLASS(NAME, INIT, CONSTRUCT, DESTRUCT, MULTIQ, > >>SEND, \ > >> GET_CARRIER, GET_STATS, GET_FEATURES, GET_STATUS, RXQ_RECV) > >> \ > >> { \ > >> @@ -2144,10 +2557,10 @@ unlock_dpdk: > >> NULL, /* set_advertisements */ \ > >> \ > >> NULL, /* set_policing */ \ > >> - NULL, /* get_qos_types */ \ > >> + netdev_dpdk_get_qos_types, \ > >> NULL, /* get_qos_capabilities */ \ > >> - NULL, /* get_qos */ \ > >> - NULL, /* set_qos */ \ > >> + netdev_dpdk_get_qos, \ > >> + netdev_dpdk_set_qos, \ > >> NULL, /* get_queue */ \ > >> NULL, /* set_queue */ \ > >> NULL, /* delete_queue */ \ > >> diff --git a/vswitchd/vswitch.xml b/vswitchd/vswitch.xml index > >> ce0dbc1..545d47f 100644 > >> --- a/vswitchd/vswitch.xml > >> +++ b/vswitchd/vswitch.xml > >> @@ -3298,6 +3298,26 @@ > >> for information on how this classifier works. > >> </dd> > >> </dl> > >> + <dl> > >> + <dt><code>egress-policer</code></dt> > >> + <dd> > >> + An egress policer algorithm. This implementation uses the > >>DPDK > >> + rte_meter library. The rte_meter library provides an > >>implementation > >> + of srTCM (RFC2697) which allows the metering and policing > >> + of > >>traffic. > >> + The implementation in OVS creates a color blind srTCM > >> + meter > >>with a > >> + single token bucket used to police traffic. It should be > >>noted that > >> + when the rte_meter is configured as part of QoS there will > >>be a > >> + performance overhead as the rte_mter itself will consume > >> + CPU > >>cycles > >> + in order to police traffic. These CPU cycles ordinarily > >> + are > >>used for > >> + packet proccessing. As such the drop in performance will > >> + be > >>noticed > >> + in terms of overall aggregate traffic throughput. > >> + > >> + For more information regarding the usage of the DPDK > >>rte_meter see > >> + > >><code>http://dpdk.org/doc/guides/sample_app_ug/qos_metering.html</code > >>> > >> + For more information regarding srTCM see > >> + <code>https://tools.ietf.org/html/rfc2697</code> > >> + </dd> > >> + </dl> > > I'm still not sure whether we should have references to rfc2697 or the > DPDK qos sample app. From a user perspective this is a token bucket. > If there are only two possible actions (drop and forward), what's the > value of considering the colors? > True, I'll remove these references. > >> </column> > >> > >> <column name="queues"> > >> @@ -3334,6 +3354,38 @@ > >> </column> > >> </group> > >> > >> + <group title="Configuration for egress-policer QoS"> > >> + <p> > >> + <ref table="QoS"/> <ref table="QoS" column="type"/> > >> + <code>egress-policer</code> provides egress policing for > >>userspace > >> + port types with DPDK. > >> + > >> + It has the following key-value pairs defined. > >> + </p> > >> + > >> + <column name="other_config" key="cir" type='{"type": > "integer"}'> > >> + The Committed Information Rate (CIR) is measured in bytes of > IP > >> + packets per second, i.e. it includes the IP header, but not > >>link > >> + specific (e.g. Ethernet) headers. This represents the bytes > >>per second > >> + rate at which the token bucket will be updated. The cir > >> + value > >>is > >> + calculated by (pps x packet data size). For example > >> + assuming > >>a user > >> + wishes to limit a stream consisting of 64 byte packets to 1 > >>million > >> + packets per second the CIR would be set to to to 46000000. > >>This value > >> + can be broken into '1,000,000 x 46'. Where 1,000,000 is the > >>policing > >> + rate for the number of packets per second and 46 represents > >>the size > >> + of the packet data for a 64 byte ip packet. > >> + </column> > >> + <column name="other_config" key="cbs" type='{"type": > "integer"}'> > >> + The Committed Burst Size (CBS) is measured in bytes and > >>represents a > >> + token bucket. At a minimum this value should be be set to > >> + the > >>expected > >> + largest size packet in the traffic stream. In practice > >> + larger > >>values > >> + may be used to increase the size of the token bucket. If a > >>packet can > >> + be transmitted then the cbs will be decremented by the > >> + number > >>of > >> + bytes/tokens of the packet. If there are not enough tokens > >> + in > >>the cbs > >> + bucket the packet will be dropped. > >> + </column> > >> + </group> > >> + > >> <group title="Common Columns"> > >> The overall purpose of these columns is described under > >><code>Common > >> Columns</code> at the beginning of this document. > > > >On a higher level, it would be nice to move QoS from > >__netdev_dpdk_vhost_send() to its caller netdev_dpdk_vhost_send() and > >then run QoS before do the array copy at dpdk_do_tx_copy(). > >But I guess we need the packet as a rte_mbuf. > > > >Thanks, > >-- > >fbl > > > >_______________________________________________ > >dev mailing list > >dev@openvswitch.org > >http://openvswitch.org/mailman/listinfo/dev _______________________________________________ dev mailing list dev@openvswitch.org http://openvswitch.org/mailman/listinfo/dev