Thanks for the patch and the review. I agree with everything Flavio pointed out; a few more comments below.
On 10/02/2016 11:54, "Flavio Leitner" <f...@sysclose.org> wrote: >On Mon, 1 Feb 2016 20:47:25 +0000 >Ian Stokes <ian.sto...@intel.com> wrote: > >> This patch provides the modifications required in netdev-dpdk.c and >> vswitch.xml to allow for a DPDK user space QoS algorithm. >> >> This patch adds a QoS configuration structure for netdev-dpdk and >> expected QoS operations 'dpdk_qos_ops'. Various helper functions >> are also supplied. >> >> Also included are the modifications required for vswitch.xml to allow a >> new QoS implementation for netdev-dpdk devices. This includes a new QoS >>type >> `egress-policer` as well as its expected QoS table entries. >> >> The QoS functionality implemented for DPDK devices is `egress-policer`. >> This can be used to drop egress packets at a configurable rate. >> >> The INSTALL.DPDK.md guide has also been modified to provide an example >> configuration of `egress-policer` QoS. >> >> Signed-off-by: Ian Stokes <ian.sto...@intel.com> >> --- >> INSTALL.DPDK.md | 20 +++ >> lib/netdev-dpdk.c | 439 >>++++++++++++++++++++++++++++++++++++++++++++++++-- >> vswitchd/vswitch.xml | 52 ++++++ >> 3 files changed, 498 insertions(+), 13 deletions(-) >> >> diff --git a/INSTALL.DPDK.md b/INSTALL.DPDK.md >> index e8ef4b5..dac70cd 100644 >> --- a/INSTALL.DPDK.md >> +++ b/INSTALL.DPDK.md >> @@ -207,6 +207,26 @@ Using the DPDK with ovs-vswitchd: >> ./ovs-ofctl add-flow br0 in_port=2,action=output:1 >> ``` >> >> +8. QoS usage example >> + >> + Assuming you have a vhost-user port transmitting traffic consisting >>of >> + packets of size 64 bytes, the following command would limit the >>egress >> + transmission rate of the port to ~1,000,000 packets per second: >> + >> + `ovs-vsctl set port vhost-user0 qos=@newqos -- --id=@newqos create >>qos >> + type=egress-policer other-config:cir=46000000 other-config:cbs=2048` >> + >> + To examine the QoS configuration of the port: >> + >> + `ovs-appctl -t ovs-vswitchd qos/show vhost-user0` >> + >> + To clear the QoS configuration from the port and ovsdb use the >>following: >> + >> + `ovs-vsctl -- destroy QoS vhost-user0 -- clear Port vhost-user0 qos` Minor nit: the first "--" is not necessary >> + >> + For more details regarding egress-policer parameters please refer >>to the >> + vswitch.xml. >> + >> Performance Tuning: >> ------------------- >> >> diff --git a/lib/netdev-dpdk.c b/lib/netdev-dpdk.c >> index 09ccc2c..716da79 100644 >> --- a/lib/netdev-dpdk.c >> +++ b/lib/netdev-dpdk.c >> @@ -44,6 +44,7 @@ >> #include "ovs-rcu.h" >> #include "packets.h" >> #include "shash.h" >> +#include "smap.h" >> #include "sset.h" >> #include "unaligned.h" >> #include "timeval.h" >> @@ -52,12 +53,14 @@ >> >> #include "rte_config.h" >> #include "rte_mbuf.h" >> +#include "rte_meter.h" >> #include "rte_virtio_net.h" >> >> VLOG_DEFINE_THIS_MODULE(dpdk); >> static struct vlog_rate_limit rl = VLOG_RATE_LIMIT_INIT(5, 20); >> >> #define DPDK_PORT_WATCHDOG_INTERVAL 5 >> +#define DPDK_MAX_QOS_NAME_SIZE 10 > >10 is too short to give an useful description. >The proposed one already goes over it. > >>>> a = 'egress-policer' >>>> len(a) >14 > > > >> #define OVS_CACHE_LINE_SIZE CACHE_LINE_SIZE >> #define OVS_VPORT_DPDK "ovs_dpdk" >> @@ -142,6 +145,107 @@ static int rte_eal_init_ret = ENODEV; >> >> static struct ovs_mutex dpdk_mutex = OVS_MUTEX_INITIALIZER; >> >> +/* Quality of Service */ >> + >> +/* An instance of a QoS configuration. Always associated with a >>particular >> + * network device. 
>> + * >> + * Each QoS implementation subclasses this with whatever additional >>data it >> + * needs. >> + */ >> +struct qos_conf { >> + const struct dpdk_qos_ops *ops; >> +}; >> + >> +/* A particular implementation of dpdk QoS operations. >> + * >> + * The functions below return 0 if successful or a positive errno >>value on >> + * failure, except where otherwise noted. All of them must be >>provided, except >> + * where otherwise noted. >> + */ >> +struct dpdk_qos_ops { >> + >> + /* Name of the QoS type */ > >Please mention DPDK_MAX_QOS_NAME_SIZE > > >> + const char *qos_name; >> + >> + /* Called to construct the QoS implementation on 'netdev'. The >> + * implementation should make the appropriate calls to configure >>QoS >> + * according to 'details'. The implementation may assume that any >>current >> + * QoS configuration already installed should be destroyed before >> + * constructing the new configuration. >> + * >> + * The contents of 'details' should be documented as valid for >>'ovs_name' >> + * in the "other_config" column in the "QoS" table in >>vswitchd/vswitch.xml >> + * (which is built as ovs-vswitchd.conf.db(8)). >> + * >> + * This function must return 0 if and only if it sets >>'netdev->qos_conf' >> + * to an initialized 'struct qos_conf'. >> + * >> + * For all QoS implementations it should always be non-null. >> + */ >> + int (*qos_construct)(struct netdev *netdev, const struct smap >>*details); >> + >> + /* Destroys the data structures allocated by the implementation as >>part of >> + * 'qos_conf. >> + * >> + * For all QoS implementations it should always be non-null. >> + */ >> + void (*qos_destruct)(struct netdev *netdev, struct qos_conf *conf); >> + >> + /* Retrieves details of 'netdev->qos_conf' configuration into >>'details'. >> + * >> + * The contents of 'details' should be documented as valid for >>'ovs_name' >> + * in the "other_config" column in the "QoS" table in >>vswitchd/vswitch.xml >> + * (which is built as ovs-vswitchd.conf.db(8)). >> + */ >> + int (*qos_get)(const struct netdev *netdev, struct smap *details); >> + >> + /* Reconfigures 'netdev->qos_conf' according to 'details', >>performing any >> + * required calls to complete the reconfiguration. >> + * >> + * The contents of 'details' should be documented as valid for >>'ovs_name' >> + * in the "other_config" column in the "QoS" table in >>vswitchd/vswitch.xml >> + * (which is built as ovs-vswitchd.conf.db(8)). >> + * >> + * This function may be null if 'qos_conf' is not configurable. >> + */ >> + int (*qos_set)(struct netdev *netdev, const struct smap *details); >> + >> + /* Modify an array of rte_mbufs. The modification is specific to >> + * each qos implementation. >> + * >> + * The function should take and array of mbufs and an int >>representing >> + * the current number of mbufs present in the array. >> + * >> + * After the function has performed a qos modification to the >>array of >> + * mbufs it returns an int representing the number of mbufs now >>present in >> + * the array. This value is can then be passed to the port send >>function >> + * along with the modified array for transmission. >> + * >> + * For all QoS implementations it should always be non-null. >> + */ >> + int (*qos_alg_process)(struct netdev *netdev, struct rte_mbuf >>**pkts, >> + int pkt_cnt); > >Maybe qos_run() ? 
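qos_run() sounds better to me too. While renaming it, it would be worth making the contract explicit in the comment: the implementation may free packets, and whatever survives has to be left at the front of the array (this ties in with the freed-mbuf issue pointed out further down). Something along these lines (just a sketch):

    /* Runs the QoS algorithm on the 'pkt_cnt' packets in 'pkts'.  The
     * implementation may free packets that exceed the configured rate and
     * must leave the surviving mbufs contiguous at the front of 'pkts', so
     * that the caller can pass the array straight to the send function.
     *
     * Returns the number of packets remaining in 'pkts'.
     *
     * For all QoS implementations it should always be non-null. */
    int (*qos_run)(struct netdev *netdev, struct rte_mbuf **pkts,
                   int pkt_cnt);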
> > >> +}; >> + >> +/* dpdk_qos_ops for each type of user space QoS implementation */ >> +static const struct dpdk_qos_ops egress_policer_ops; >> + >> +/* >> + * Array of dpdk_qos_ops, contains pointer to all supported QoS >> + * operations. >> + */ >> +static const struct dpdk_qos_ops *const qos_confs[] = { >> + &egress_policer_ops, >> + NULL >> +}; >> + >> +/* Action that can be set for a packet for rte_meter */ >> +enum egress_policer_action { >> + GREEN = e_RTE_METER_GREEN, >> + DROP = 1, >> +}; > >I don't think it's a good idea to mix external and internal declarations >because if the external one changes, it will break here. Since GREEN and >DROP >have meanings only inside of OVS, we should define our own values. >See my comment where this is used. > > >> + >> /* Contains all 'struct dpdk_dev's. */ >> static struct ovs_list dpdk_list OVS_GUARDED_BY(dpdk_mutex) >> = OVS_LIST_INITIALIZER(&dpdk_list); >> @@ -232,6 +336,11 @@ struct netdev_dpdk { >> >> /* In dpdk_list. */ >> struct ovs_list list_node OVS_GUARDED_BY(dpdk_mutex); >> + >> + /* QoS configuration and lock for the device */ >> + struct qos_conf *qos_conf; >> + rte_spinlock_t qos_lock; >> + >> }; >> >> struct netdev_rxq_dpdk { >> @@ -611,6 +720,10 @@ netdev_dpdk_init(struct netdev *netdev_, unsigned >>int port_no, >> goto unlock; >> } >> >> + /* Initialise QoS configuration to NULL and qos lock to unlocked */ >> + netdev->qos_conf = NULL; >> + rte_spinlock_init(&netdev->qos_lock); >> + >> netdev_->n_txq = NR_QUEUE; >> netdev_->n_rxq = NR_QUEUE; >> netdev->real_n_txq = NR_QUEUE; >> @@ -1066,6 +1179,23 @@ netdev_dpdk_rxq_recv(struct netdev_rxq *rxq_, >>struct dp_packet **packets, >> return 0; >> } >> >> +static inline int >> +netdev_dpdk_qos_process__(struct netdev_dpdk *dev, struct rte_mbuf >>**pkts, >> + int cnt) >> +{ >> + struct netdev *netdev = &dev->up; >> + >> + if (dev->qos_conf != NULL) { >> + rte_spinlock_lock(&dev->qos_lock); >> + if (dev->qos_conf != NULL) { >> + cnt = dev->qos_conf->ops->qos_alg_process(netdev, pkts, >>cnt); >> + } >> + rte_spinlock_unlock(&dev->qos_lock); >> + } >> + >> + return cnt; >> +} >> + >> static inline void >> netdev_dpdk_vhost_update_tx_counters(struct netdev_stats *stats, >> struct dp_packet **packets, >> @@ -1092,6 +1222,7 @@ __netdev_dpdk_vhost_send(struct netdev *netdev, >>int qid, >> struct virtio_net *virtio_dev = netdev_dpdk_get_virtio(vhost_dev); >> struct rte_mbuf **cur_pkts = (struct rte_mbuf **) pkts; >> unsigned int total_pkts = cnt; >> + unsigned int qos_pkts = cnt; >> uint64_t start = 0; >> >> if (OVS_UNLIKELY(!is_vhost_running(virtio_dev))) { >> @@ -1106,6 +1237,10 @@ __netdev_dpdk_vhost_send(struct netdev *netdev, >>int qid, >> rte_spinlock_lock(&vhost_dev->tx_q[qid].tx_lock); >> } >> >> + /* Check has QoS has been configured for the netdev */ >> + cnt = netdev_dpdk_qos_process__(vhost_dev, cur_pkts, cnt); >> + qos_pkts -= cnt; > >I don't see how this is working because egress_policer_alg_process() >will loop in the array one by one freeing or not the packet. >If it does nothing, then the array is good but otherwise it creates >a problem because the array still references freed entries. So in >loop below we are sending a mix of good and freed buffers up to 'cnt'. > This is definitely a problem. Sorry I didn't catch it on the review of V1. 
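One way to fix this would be to make the QoS handler compact the array as it goes, so the surviving mbufs stay contiguous at the front and the returned count always matches the array contents. Rough sketch (untested), reusing the helpers already introduced by the patch and treating a non-zero return from egress_policer_pkt_handle__() as "drop", which works for both the current version and Flavio's simplified one:

static int
egress_policer_alg_process(struct netdev *netdev_, struct rte_mbuf **pkts,
                           int pkt_cnt)
{
    struct egress_policer *policer = egress_policer_get__(netdev_);
    uint64_t current_time = rte_rdtsc();
    int cnt = 0;
    int i;

    for (i = 0; i < pkt_cnt; i++) {
        struct rte_mbuf *pkt = pkts[i];

        if (egress_policer_pkt_handle__(&policer->egress_meter, pkt,
                                        current_time)) {
            /* Over the configured rate: drop the packet. */
            rte_pktmbuf_free(pkt);
        } else {
            /* Keep the survivors contiguous at the front of the array. */
            pkts[cnt++] = pkt;
        }
    }

    return cnt;
}

With that in place __netdev_dpdk_vhost_send() and dpdk_do_tx_copy() can keep handing the same array plus the returned count to the enqueue/send functions without ever touching a freed entry.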
> >> + >> do { >> int vhost_qid = qid * VIRTIO_QNUM + VIRTIO_RXQ; >> unsigned int tx_pkts; >> @@ -1147,6 +1282,7 @@ __netdev_dpdk_vhost_send(struct netdev *netdev, >>int qid, >> } >> >> rte_spinlock_lock(&vhost_dev->stats_lock); >> + cnt += qos_pkts; >> netdev_dpdk_vhost_update_tx_counters(&vhost_dev->stats, pkts, >>total_pkts, >> cnt); >> rte_spinlock_unlock(&vhost_dev->stats_lock); >> @@ -1242,19 +1378,25 @@ dpdk_do_tx_copy(struct netdev *netdev, int qid, >>struct dp_packet **pkts, >> newcnt++; >> } >> >> - if (OVS_UNLIKELY(dropped)) { >> - rte_spinlock_lock(&dev->stats_lock); >> - dev->stats.tx_dropped += dropped; >> - rte_spinlock_unlock(&dev->stats_lock); >> - } >> - >> if (dev->type == DPDK_DEV_VHOST) { >> __netdev_dpdk_vhost_send(netdev, qid, (struct dp_packet **) >>mbufs, newcnt, true); >> } else { >> + unsigned int qos_pkts = newcnt; >> + >> + /* Check if QoS has been configured for this netdev. */ >> + newcnt = netdev_dpdk_qos_process__(dev, mbufs, newcnt); >> + >> + dropped += qos_pkts - newcnt; >> dpdk_queue_pkts(dev, qid, mbufs, newcnt); >> dpdk_queue_flush(dev, qid); >> } >> >> + if (OVS_UNLIKELY(dropped)) { >> + rte_spinlock_lock(&dev->stats_lock); >> + dev->stats.tx_dropped += dropped; >> + rte_spinlock_unlock(&dev->stats_lock); >> + } >> + >> if (!dpdk_thread_is_pmd()) { >> ovs_mutex_unlock(&nonpmd_mempool_mutex); >> } >> @@ -1304,15 +1446,24 @@ netdev_dpdk_send__(struct netdev_dpdk *dev, int >>qid, >> } else { >> int next_tx_idx = 0; >> int dropped = 0; >> + unsigned int qos_pkts = 0; >> >> for (i = 0; i < cnt; i++) { >> int size = dp_packet_size(pkts[i]); >> >> if (OVS_UNLIKELY(size > dev->max_packet_len)) { >> if (next_tx_idx != i) { >> + cnt = i - next_tx_idx; >> + qos_pkts = cnt; >> + >> + cnt = netdev_dpdk_qos_process__(dev, >> + (struct >>rte_mbuf**)pkts, >> + cnt); >> + dropped += qos_pkts - cnt; >> dpdk_queue_pkts(dev, qid, >> (struct rte_mbuf >>**)&pkts[next_tx_idx], >> - i-next_tx_idx); >> + cnt); >> + >> } >> >> VLOG_WARN_RL(&rl, "Too big size %d max_packet_len %d", >> @@ -1324,9 +1475,13 @@ netdev_dpdk_send__(struct netdev_dpdk *dev, int >>qid, >> } >> } >> if (next_tx_idx != cnt) { >> - dpdk_queue_pkts(dev, qid, >> - (struct rte_mbuf **)&pkts[next_tx_idx], >> - cnt-next_tx_idx); >> + cnt -= next_tx_idx; >> + qos_pkts = cnt; >> + >> + cnt = netdev_dpdk_qos_process__(dev, (struct >>rte_mbuf**)pkts, cnt); >> + dropped += qos_pkts - cnt; >> + dpdk_queue_pkts(dev, qid, (struct rte_mbuf >>**)&pkts[next_tx_idx], >> + cnt); >> } >> >> if (OVS_UNLIKELY(dropped)) { >> @@ -2107,6 +2262,264 @@ unlock_dpdk: >> return err; >> } >> >> +/* QoS Functions */ >> + >> +/* >> + * Initialize QoS configuration operations. >> + */ >> +static void >> +qos_conf_init(struct qos_conf *conf, const struct dpdk_qos_ops *ops) >> +{ >> + conf->ops = ops; >> +} >> + >> +/* >> + * Search existing QoS operations in qos_ops and compare each set of >> + * operations qos_name to name. Return a dpdk_qos_ops pointer to a >>match, >> + * else return NULL >> + */ >> +static const struct dpdk_qos_ops * >> +qos_lookup_name(const char *name) >> +{ >> + const struct dpdk_qos_ops *const *opsp; >> + >> + for (opsp = qos_confs; *opsp != NULL; opsp++) { >> + const struct dpdk_qos_ops *ops = *opsp; >> + if (!strncmp(name, ops->qos_name,DPDK_MAX_QOS_NAME_SIZE)) { > ^ missing space > >Do we really need strncmp? Since it comes from DB, I suspect it will >always >be NULL terminated. I agree. 
We should drop DPDK_MAX_QOS_NAME_SIZE entirely > > >> + return ops; >> + } >> + } >> + return NULL; >> +} >> + >> +/* >> + * Call qos_destruct to clean up items associated with the netdevs >> + * qos_conf. Set netdevs qos_conf to NULL. >> + */ >> +static void >> +qos_delete_conf(struct netdev *netdev_) >> +{ >> + struct netdev_dpdk *netdev = netdev_dpdk_cast(netdev_); >> + >> + rte_spinlock_lock(&netdev->qos_lock); >> + if (netdev->qos_conf) { >> + if (netdev->qos_conf->ops->qos_destruct) { >> + netdev->qos_conf->ops->qos_destruct(netdev_, >>netdev->qos_conf); >> + } >> + netdev->qos_conf = NULL; >> + } >> + rte_spinlock_unlock(&netdev->qos_lock); >> +} >> + >> +static int >> +netdev_dpdk_get_qos_types(const struct netdev *netdev OVS_UNUSED, >> + struct sset *types) >> +{ >> + const struct dpdk_qos_ops *const *opsp; >> + >> + for (opsp = qos_confs; *opsp != NULL; opsp++) { >> + const struct dpdk_qos_ops *ops = *opsp; >> + if (ops->qos_construct && ops->qos_name[0] != '\0') { >> + sset_add(types, ops->qos_name); >> + } >> + } >> + return 0; >> +} >> + >> +static int >> +netdev_dpdk_get_qos(const struct netdev *netdev_, >> + const char **typep, struct smap *details) >> +{ >> + struct netdev_dpdk *netdev = netdev_dpdk_cast(netdev_); >> + int error = 0; >> + >> + ovs_mutex_lock(&netdev->mutex); >> + if(netdev->qos_conf) { >> + *typep = netdev->qos_conf->ops->qos_name; >> + error = (netdev->qos_conf->ops->qos_get >> + ? netdev->qos_conf->ops->qos_get(netdev_, details): >>0); >> + } >> + ovs_mutex_unlock(&netdev->mutex); >> + >> + return error; >> +} >> + >> +static int >> +netdev_dpdk_set_qos(struct netdev *netdev_, >> + const char *type, const struct smap *details) >> +{ >> + struct netdev_dpdk *netdev = netdev_dpdk_cast(netdev_); >> + const struct dpdk_qos_ops *new_ops = NULL; >> + int error = 0; >> + >> + /* If type is empty the current QoS configuration for the >>dpdk-netdev can >> + * be destroyed */ >> + if(type[0] == '\0') { >> + qos_delete_conf(netdev_); > >Are we missing ovs_mutex_lock(&netdev->mutex) here? > >> + } >> + >> + new_ops = qos_lookup_name(type); >> + if (!new_ops || !new_ops->qos_construct) { >> + return EOPNOTSUPP; >> + } > >I would suggest to do: > >netdev_dpdk_set_qos(struct netdev *netdev_, > const char *type, const struct smap *details) >{ > struct netdev_dpdk *netdev = netdev_dpdk_cast(netdev_); > const struct dpdk_qos_ops *new_ops = NULL; > int error = 0; > > new_ops = qos_lookup_name(type); > if (!new_ops || !new_ops->qos_construct) { > return EOPNOTSUPP; > } > > ovs_mutex_lock(&netdev->mutex); > if(type[0] == '\0') { > qos_delete_conf(netdev_); > } I would treat the case of 'type' not found or 'type'=="" equally and delete the qos configuration. > >[ ...continue below as you posted ] >> + if (netdev->qos_conf) { >> + if (new_ops == netdev->qos_conf->ops) { >> + error = new_ops->qos_set ? new_ops->qos_set(netdev_, >>details) : 0; >> + } >> + else { >> + /* Delete existing QoS configuration. */ >> + qos_delete_conf(netdev_); >> + ovs_assert(netdev->qos_conf == NULL); >> + >> + /* Install new QoS configuration. 
*/ >> + error = new_ops->qos_construct(netdev_, details); >> + ovs_assert((error == 0) == (netdev->qos_conf != NULL)); >> + } >> + } else { >> + error = new_ops->qos_construct(netdev_, details); >> + ovs_assert((error == 0) == (netdev->qos_conf != NULL)); >> + } >> + >> + ovs_mutex_unlock(&netdev->mutex); >> + return error; >> +} >> + >> +/* egress-policer details */ >> + >> +struct egress_policer { >> + struct qos_conf qos_conf; >> + struct rte_meter_srtcm_params app_srtcm_params; >> + struct rte_meter_srtcm egress_meter; >> +}; >> + >> +static struct egress_policer * >> +egress_policer_get__(const struct netdev *netdev_) >> +{ >> + struct netdev_dpdk *netdev = netdev_dpdk_cast(netdev_); >> + return CONTAINER_OF(netdev->qos_conf, struct egress_policer, >>qos_conf); >> +} >> + >> +static int >> +egress_policer_qos_construct(struct netdev *netdev_, >> + const struct smap *details) >> +{ >> + struct netdev_dpdk *netdev = netdev_dpdk_cast(netdev_); >> + struct egress_policer *policer; >> + const char *cir_s; >> + const char *cbs_s; >> + int err = 0; >> + >> + rte_spinlock_lock(&netdev->qos_lock); >> + policer = xmalloc(sizeof *policer); >> + qos_conf_init(&policer->qos_conf, &egress_policer_ops); >> + netdev->qos_conf = &policer->qos_conf; >> + cir_s = smap_get(details, "cir"); >> + cbs_s = smap_get(details, "cbs"); >> + policer->app_srtcm_params.cir = cir_s ? strtoull(cir_s, NULL, 10) >>: 0; >> + policer->app_srtcm_params.cbs = cbs_s ? strtoull(cbs_s, NULL, 10) >>: 0; >> + policer->app_srtcm_params.ebs = 0; >> + err = rte_meter_srtcm_config(&policer->egress_meter, >> + &policer->app_srtcm_params); >> + rte_spinlock_unlock(&netdev->qos_lock); >> + >> + return err; >> +} >> + >> +static void >> +egress_policer_qos_destruct(struct netdev *netdev_ OVS_UNUSED, >> + struct qos_conf *conf) >> +{ >> + struct egress_policer *policer = CONTAINER_OF(conf, struct >>egress_policer, >> + qos_conf); >> + free(policer); >> +} >> + >> +static int >> +egress_policer_qos_get(const struct netdev *netdev, struct smap >>*details) >> +{ >> + struct egress_policer *policer = egress_policer_get__(netdev); >> + smap_add_format(details, "cir", "%llu", >> + 1ULL * policer->app_srtcm_params.cir); >> + smap_add_format(details, "cbs", "%llu", >> + 1ULL * policer->app_srtcm_params.cbs); >> + return 0; >> +} >> + >> +static int >> +egress_policer_qos_set(struct netdev *netdev_, const struct smap >>*details) >> +{ >> + struct egress_policer *policer; >> + const char *cir_s; >> + const char *cbs_s; >> + int err = 0; >> + >> + policer = egress_policer_get__(netdev_); >> + cir_s = smap_get(details, "cir"); >> + cbs_s = smap_get(details, "cbs"); >> + policer->app_srtcm_params.cir = cir_s ? strtoull(cir_s, NULL, 10) >>: 0; >> + policer->app_srtcm_params.cbs = cbs_s ? strtoull(cbs_s, NULL, 10) >>: 0; >> + policer->app_srtcm_params.ebs = 0; >> + err = rte_meter_srtcm_config(&policer->egress_meter, >> + &policer->app_srtcm_params); >> + >> + return err; >> +} >> + >> +static inline int >> +egress_policer_pkt_handle__(struct rte_meter_srtcm *meter, >> + struct rte_mbuf *pkt, uint64_t time) >> +{ >> + uint8_t output_color; >> + uint32_t pkt_len = rte_pktmbuf_pkt_len(pkt) - sizeof(struct >>ether_hdr); >> + enum egress_policer_action action = GREEN; >> + >> + /* color input is not used for blind modes */ >> + output_color = (uint8_t) rte_meter_srtcm_color_blind_check(meter, >>time, >> + >>pkt_len); >> + >> + /* Check output color, 0 corresponds to GREEN i.e. packet can be >> + * transmitted. If greater than 0 then action is set to DROP. 
*/ >> + if (output_color) { >> + action = DROP; >> + } > > >This could be simpler: > >static inline int >egress_policer_pkt_handle__(struct rte_meter_srtcm *meter, > struct rte_mbuf *pkt, uint64_t time) >{ > uint32_t pkt_len = rte_pktmbuf_pkt_len(pkt) - sizeof(struct >ether_hdr); > > /* If GREEN, accept it otherwise drop it */ > if (rte_meter_srtcm_color_blind_check(meter, time, pkt_len) != >e_RTE_METER_GREEN) { > return 1; > } > > return 0; >} > > > >> + return action; >> +} >> + >> +static int >> +egress_policer_alg_process(struct netdev *netdev_, struct rte_mbuf >>**pkts, >> + int pkt_cnt) >> +{ >> + int i = 0; >> + int cnt = pkt_cnt; >> + struct egress_policer *policer = egress_policer_get__(netdev_); >> + struct rte_mbuf *pkt = NULL; >> + uint64_t current_time = rte_rdtsc(); >> + >> + for(i = 0; i < pkt_cnt; i++) { >> + pkt = pkts[i]; >> + /* Handle current packet */ >> + if (egress_policer_pkt_handle__(&policer->egress_meter, pkt, >> + current_time) == DROP) { >> + rte_pktmbuf_free(pkt); >> + cnt = cnt - 1; >> + } >> + } >> + >> + return cnt; >> +} >> + >> +static const struct dpdk_qos_ops egress_policer_ops = { >> + "egress-policer", /* qos_name */ >> + egress_policer_qos_construct, >> + egress_policer_qos_destruct, >> + egress_policer_qos_get, >> + egress_policer_qos_set, >> + egress_policer_alg_process >> +}; >> + >> #define NETDEV_DPDK_CLASS(NAME, INIT, CONSTRUCT, DESTRUCT, MULTIQ, >>SEND, \ >> GET_CARRIER, GET_STATS, GET_FEATURES, GET_STATUS, RXQ_RECV) >> \ >> { \ >> @@ -2144,10 +2557,10 @@ unlock_dpdk: >> NULL, /* set_advertisements */ \ >> \ >> NULL, /* set_policing */ \ >> - NULL, /* get_qos_types */ \ >> + netdev_dpdk_get_qos_types, \ >> NULL, /* get_qos_capabilities */ \ >> - NULL, /* get_qos */ \ >> - NULL, /* set_qos */ \ >> + netdev_dpdk_get_qos, \ >> + netdev_dpdk_set_qos, \ >> NULL, /* get_queue */ \ >> NULL, /* set_queue */ \ >> NULL, /* delete_queue */ \ >> diff --git a/vswitchd/vswitch.xml b/vswitchd/vswitch.xml >> index ce0dbc1..545d47f 100644 >> --- a/vswitchd/vswitch.xml >> +++ b/vswitchd/vswitch.xml >> @@ -3298,6 +3298,26 @@ >> for information on how this classifier works. >> </dd> >> </dl> >> + <dl> >> + <dt><code>egress-policer</code></dt> >> + <dd> >> + An egress policer algorithm. This implementation uses the >>DPDK >> + rte_meter library. The rte_meter library provides an >>implementation >> + of srTCM (RFC2697) which allows the metering and policing of >>traffic. >> + The implementation in OVS creates a color blind srTCM meter >>with a >> + single token bucket used to police traffic. It should be >>noted that >> + when the rte_meter is configured as part of QoS there will >>be a >> + performance overhead as the rte_mter itself will consume CPU >>cycles >> + in order to police traffic. These CPU cycles ordinarily are >>used for >> + packet proccessing. As such the drop in performance will be >>noticed >> + in terms of overall aggregate traffic throughput. >> + >> + For more information regarding the usage of the DPDK >>rte_meter see >> + >><code>http://dpdk.org/doc/guides/sample_app_ug/qos_metering.html</code> >> + For more information regarding srTCM see >> + <code>https://tools.ietf.org/html/rfc2697</code> >> + </dd> >> + </dl> I'm still not sure whether we should have references to rfc2697 or the DPDK qos sample app. From a user perspective this is a token bucket. If there are only two possible actions (drop and forward), what's the value of considering the colors? 
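On the colors: since the patch always configures ebs=0 and uses the color-blind check, the "yellow" case can never happen, so the srTCM effectively degenerates into a plain single-rate token bucket. Roughly, in illustrative pseudo-C (names are mine, not the rte_meter internals):

/* Illustrative model only: the bucket fills at 'cir' bytes/sec up to 'cbs'
 * bytes, and a packet is forwarded only if the bucket holds at least its
 * IP length in tokens. */
static bool
token_bucket_accept(uint64_t *tokens, uint64_t cir, uint64_t cbs,
                    uint64_t elapsed_us, uint32_t pkt_len)
{
    *tokens += cir * elapsed_us / 1000000;  /* Refill since the last packet. */
    if (*tokens > cbs) {
        *tokens = cbs;
    }
    if (*tokens >= pkt_len) {
        *tokens -= pkt_len;
        return true;    /* "green": transmit. */
    }
    return false;       /* Drop. */
}

So from the user's point of view the colors never show up in anything observable; describing it simply as a token bucket (as the cir/cbs descriptions below already do) seems enough.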
>> </column> >> >> <column name="queues"> >> @@ -3334,6 +3354,38 @@ >> </column> >> </group> >> >> + <group title="Configuration for egress-policer QoS"> >> + <p> >> + <ref table="QoS"/> <ref table="QoS" column="type"/> >> + <code>egress-policer</code> provides egress policing for >>userspace >> + port types with DPDK. >> + >> + It has the following key-value pairs defined. >> + </p> >> + >> + <column name="other_config" key="cir" type='{"type": "integer"}'> >> + The Committed Information Rate (CIR) is measured in bytes of IP >> + packets per second, i.e. it includes the IP header, but not >>link >> + specific (e.g. Ethernet) headers. This represents the bytes >>per second >> + rate at which the token bucket will be updated. The cir value >>is >> + calculated by (pps x packet data size). For example assuming >>a user >> + wishes to limit a stream consisting of 64 byte packets to 1 >>million >> + packets per second the CIR would be set to to to 46000000. >>This value >> + can be broken into '1,000,000 x 46'. Where 1,000,000 is the >>policing >> + rate for the number of packets per second and 46 represents >>the size >> + of the packet data for a 64 byte ip packet. >> + </column> >> + <column name="other_config" key="cbs" type='{"type": "integer"}'> >> + The Committed Burst Size (CBS) is measured in bytes and >>represents a >> + token bucket. At a minimum this value should be be set to the >>expected >> + largest size packet in the traffic stream. In practice larger >>values >> + may be used to increase the size of the token bucket. If a >>packet can >> + be transmitted then the cbs will be decremented by the number >>of >> + bytes/tokens of the packet. If there are not enough tokens in >>the cbs >> + bucket the packet will be dropped. >> + </column> >> + </group> >> + >> <group title="Common Columns"> >> The overall purpose of these columns is described under >><code>Common >> Columns</code> at the beginning of this document. > >On a higher level, it would be nice to move QoS from >__netdev_dpdk_vhost_send() to its caller netdev_dpdk_vhost_send() >and then run QoS before do the array copy at dpdk_do_tx_copy(). >But I guess we need the packet as a rte_mbuf. > >Thanks, >-- >fbl