> On Thu, 11 Feb 2016 12:32:56 +0000 > "Stokes, Ian" <ian.sto...@intel.com> wrote: > > > Thanks For the review Flavio, much appreciated, comments inline. I'll > re-spin a new version also. > > Thank you for the patch! > comments inline. > > > > > > > -----Original Message----- > > > From: Flavio Leitner [mailto:f...@sysclose.org] > > > Sent: Wednesday, February 10, 2016 7:55 PM > > > To: Stokes, Ian > > > Cc: dev@openvswitch.org > > > Subject: Re: [ovs-dev] [PATCH v3] netdev_dpdk.c: Add QoS > functionality. > > > > > > On Mon, 1 Feb 2016 20:47:25 +0000 > > > Ian Stokes <ian.sto...@intel.com> wrote: > > > > > > > This patch provides the modifications required in netdev-dpdk.c > > > > and vswitch.xml to allow for a DPDK user space QoS algorithm. > > > > > > > > This patch adds a QoS configuration structure for netdev-dpdk and > > > > expected QoS operations 'dpdk_qos_ops'. Various helper functions > > > > are also supplied. > > > > > > > > Also included are the modifications required for vswitch.xml to > > > > allow a new QoS implementation for netdev-dpdk devices. This > > > > includes a new QoS type `egress-policer` as well as its expected > QoS table entries. > > > > > > > > The QoS functionality implemented for DPDK devices is `egress- > > > policer`. > > > > This can be used to drop egress packets at a configurable rate. > > > > > > > > The INSTALL.DPDK.md guide has also been modified to provide an > > > > example configuration of `egress-policer` QoS. > > > > > > > > Signed-off-by: Ian Stokes <ian.sto...@intel.com> > > > > --- > > > > INSTALL.DPDK.md | 20 +++ > > > > lib/netdev-dpdk.c | 439 > > > ++++++++++++++++++++++++++++++++++++++++++++++++-- > > > > vswitchd/vswitch.xml | 52 ++++++ > > > > 3 files changed, 498 insertions(+), 13 deletions(-) > > > > > > > > diff --git a/INSTALL.DPDK.md b/INSTALL.DPDK.md index > > > > e8ef4b5..dac70cd > > > > 100644 > > > > --- a/INSTALL.DPDK.md > > > > +++ b/INSTALL.DPDK.md > > > > @@ -207,6 +207,26 @@ Using the DPDK with ovs-vswitchd: > > > > ./ovs-ofctl add-flow br0 in_port=2,action=output:1 > > > > ``` > > > > > > > > +8. QoS usage example > > > > + > > > > + Assuming you have a vhost-user port transmitting traffic > > > consisting of > > > > + packets of size 64 bytes, the following command would limit > > > > + the > > > egress > > > > + transmission rate of the port to ~1,000,000 packets per > second: > > > > + > > > > + `ovs-vsctl set port vhost-user0 qos=@newqos -- --id=@newqos > > > > + create > > > qos > > > > + type=egress-policer other-config:cir=46000000 > > > > + other-config:cbs=2048` > > > > + > > > > + To examine the QoS configuration of the port: > > > > + > > > > + `ovs-appctl -t ovs-vswitchd qos/show vhost-user0` > > > > + > > > > + To clear the QoS configuration from the port and ovsdb use the > > > following: > > > > + > > > > + `ovs-vsctl -- destroy QoS vhost-user0 -- clear Port > > > > + vhost-user0 qos` > > > > + > > > > + For more details regarding egress-policer parameters please > > > > + refer > > > to the > > > > + vswitch.xml. > > > > + > > > > Performance Tuning: > > > > ------------------- > > > > > > > > diff --git a/lib/netdev-dpdk.c b/lib/netdev-dpdk.c index > > > > 09ccc2c..716da79 100644 > > > > --- a/lib/netdev-dpdk.c > > > > +++ b/lib/netdev-dpdk.c > > > > @@ -44,6 +44,7 @@ > > > > #include "ovs-rcu.h" > > > > #include "packets.h" > > > > #include "shash.h" > > > > +#include "smap.h" > > > > #include "sset.h" > > > > #include "unaligned.h" > > > > #include "timeval.h" > > > > @@ -52,12 +53,14 @@ > > > > > > > > #include "rte_config.h" > > > > #include "rte_mbuf.h" > > > > +#include "rte_meter.h" > > > > #include "rte_virtio_net.h" > > > > > > > > VLOG_DEFINE_THIS_MODULE(dpdk); > > > > static struct vlog_rate_limit rl = VLOG_RATE_LIMIT_INIT(5, 20); > > > > > > > > #define DPDK_PORT_WATCHDOG_INTERVAL 5 > > > > +#define DPDK_MAX_QOS_NAME_SIZE 10 > > Noted, will expand to 20 for the moment unless you think it should be > higher? > > I'd rather take that out completely. >
Done, Daniele had a similar comment and it's not needed. > > > > + > > > > +/* Action that can be set for a packet for rte_meter */ enum > > > > +egress_policer_action { > > > > + GREEN = e_RTE_METER_GREEN, > > > > + DROP = 1, > > > > +}; > > > > > > I don't think it's a good idea to mix external and internal > > > declarations because if the external one changes, it will break > > > here. Since GREEN and DROP have meanings only inside of OVS, we > should define our own values. > > > See my comment where this is used. > > > > > Just to clarify, is this in relation to the use of e_RTE_METER_GREEN? > > Is it prefereable for GREEN to be equal to our own declaration? > > We can use e_RTE_METER_GREEN and we should do it to be consistent with > the rte function. My concern is having two declarations: one coming > from outside and another that you defined to a fixed number. So, what > happens if e_RTE_METER_GREEN gets shifted and now its value is 1? > > But with the simplification I proposed for egress_policer_pkt_handle__() > that enum isn't needed and we can take it out completely. > Agreed, I've tested your suggestion and it works fine + is cleaner. > > > > > static inline void > > > > netdev_dpdk_vhost_update_tx_counters(struct netdev_stats *stats, > > > > struct dp_packet **packets, > > > > @@ > > > > -1092,6 +1222,7 @@ __netdev_dpdk_vhost_send(struct netdev *netdev, > > > > int > > > qid, > > > > struct virtio_net *virtio_dev = > > > netdev_dpdk_get_virtio(vhost_dev); > > > > struct rte_mbuf **cur_pkts = (struct rte_mbuf **) pkts; > > > > unsigned int total_pkts = cnt; > > > > + unsigned int qos_pkts = cnt; > > > > uint64_t start = 0; > > > > > > > > if (OVS_UNLIKELY(!is_vhost_running(virtio_dev))) { @@ -1106,6 > > > > +1237,10 @@ __netdev_dpdk_vhost_send(struct netdev *netdev, int > > > > +qid, > > > > rte_spinlock_lock(&vhost_dev->tx_q[qid].tx_lock); > > > > } > > > > > > > > + /* Check has QoS has been configured for the netdev */ > > > > + cnt = netdev_dpdk_qos_process__(vhost_dev, cur_pkts, cnt); > > > > + qos_pkts -= cnt; > > > > > > I don't see how this is working because egress_policer_alg_process() > > > will loop in the array one by one freeing or not the packet. > > > If it does nothing, then the array is good but otherwise it creates > > > a problem because the array still references freed entries. So in > > > loop below we are sending a mix of good and freed buffers up to > 'cnt'. > > > > > So we should never have freed packets included in cnt as it will cause > > a segfault. Consider when an array of size cnt is passed to the > > netdev_dpdk_qos_process(). The timestamp used to update the token > > bucket is taken before the for loop is entered. This way the token > > bucket is only updated once for each batch of packets of size cnt. > > The loop cycles through the array, for any packet that is green the > > token bucket decremented. If a packet is dropped then the bucket has > > been exhausted. The token bucket will not be updated again within the > > run of the loop. Any packets remaining in the array will be dropped > > (i.e. freed) as the bucket is exhausted. The value of cnt will only be > > the number of pkts that were not dropped (i.e. not freed). This value > > will then be passed to the send function ensuring no freed packets are > > present in the array that is to be processed. If the token bucket > > updated between each individual packet being processed then you could > > have the situation you describe, by processing them as a batch with a > > single update beforehand you avoid it. > > I haven't looked into the rte implementation yet so I can't tell if the > decision to drop or not the packet is based solely on the timestamp > which doesn't change during the batch processing. > > Assuming that is the case I agree with you that the cnt would have only > the first valid entries to be sent out. > > My concern is when we are processing a batch with two packets. > One big packet and another small one. While processing the big packet, > there is no enough tokens and the packet is marked as RED/DROPPED. Then > the loop gets the second small packet, but now some enough tokens were > added/still available in the bucket which passes the small packet. > Apologies, I was looking at this the wrong way. I understand your point now. Let me think about this. And I'll post a rough solution here before submitting. > > > > > + > > > > +static inline int > > > > +egress_policer_pkt_handle__(struct rte_meter_srtcm *meter, > > > > + struct rte_mbuf *pkt, uint64_t time) > > > > +{ > > > > + uint8_t output_color; > > > > + uint32_t pkt_len = rte_pktmbuf_pkt_len(pkt) - sizeof(struct > > > ether_hdr); > > > > + enum egress_policer_action action = GREEN; > > > > + > > > > + /* color input is not used for blind modes */ > > > > + output_color = (uint8_t) > > > > rte_meter_srtcm_color_blind_check(meter, > > > time, > > > > + > > > pkt_len); > > > > + > > > > + /* Check output color, 0 corresponds to GREEN i.e. packet > > > > can be > > > > + * transmitted. If greater than 0 then action is set to > > > > DROP. */ > > > > + if (output_color) { > > > > + action = DROP; > > > > + } > > > > > > > > > This could be simpler: > > > > > > static inline int > > > egress_policer_pkt_handle__(struct rte_meter_srtcm *meter, > > > struct rte_mbuf *pkt, uint64_t time) { > > > uint32_t pkt_len = rte_pktmbuf_pkt_len(pkt) - sizeof(struct > > > ether_hdr); > > > > > > /* If GREEN, accept it otherwise drop it */ > > > if (rte_meter_srtcm_color_blind_check(meter, time, pkt_len) != > > > e_RTE_METER_GREEN) { > > > return 1; > > > } > > > > > > return 0; > > > } > > > > > > > > Agreed this is simpler/better. In relation to the external declaration > > comment above, Is it preferable not to use e_RTE_METER_GREEN here and > > instead have a value defined in OVS? > > We should use it to be consistent with the rte implementation. Done. > > Thanks, > -- > fbl _______________________________________________ dev mailing list dev@openvswitch.org http://openvswitch.org/mailman/listinfo/dev