> -----Original Message----- > From: dev [mailto:dev-boun...@dpdk.org] On Behalf Of Remy Horton > Sent: Monday, January 16, 2017 4:20 PM > To: dev@dpdk.org > Cc: Pattan, Reshma <reshma.pat...@intel.com>; Thomas Monjalon > <thomas.monja...@6wind.com> > Subject: [dpdk-dev] [PATCH v7 5/6] lib: added new library for latency stats > > From: Reshma Pattan <reshma.pat...@intel.com> > > Add a library designed to calculate latency statistics and report them > to the application when queried. The library measures minimum, average and > maximum latencies, and jitter in nano seconds. The current implementation > supports global latency stats, i.e. per application stats. > > Signed-off-by: Reshma Pattan <reshma.pat...@intel.com> > Signed-off-by: Remy Horton <remy.hor...@intel.com> > ---
This patch addresses a few changes - and should probably be split out a bit more than the v6 -> v7 changes done. In particular, the mbuf change should be its own commit to make it visible in the git log. Detailed comments inline. > diff --git a/MAINTAINERS b/MAINTAINERS > index 6cd9896..0a41fe5 100644 > --- a/MAINTAINERS > +++ b/MAINTAINERS > @@ -714,3 +714,7 @@ F: examples/tep_termination/ > F: examples/vmdq/ > F: examples/vmdq_dcb/ > F: doc/guides/sample_app_ug/vmdq_dcb_forwarding.rst > + > +Latency Stats > +M: Reshma Pattan <reshma.pat...@intel.com> > +F: lib/librte_latencystats/ Wrong section? This should be added to the "other libraries" I think. > +++ b/lib/librte_latencystats/Makefile > @@ -0,0 +1,57 @@ > +# BSD LICENSE > +# > +# Copyright(c) 2016 Intel Corporation. All rights reserved. > +# All rights reserved. -2017 > +++ b/lib/librte_latencystats/rte_latencystats.c > @@ -0,0 +1,389 @@ > +/*- > + * BSD LICENSE > + * > + * Copyright(c) 2016 Intel Corporation. All rights reserved. > + * All rights reserved. -2017 > + > +/** Nano seconds per second */ > +#define NS_PER_SEC 1E9 > + > +/** Clock cycles per nano second */ > +#define CYCLES_PER_NS (rte_get_timer_hz() / NS_PER_SEC) Concidering this calls a function, it should probably be a static inline function instead of a #define, so it is obvious this cannot be used for static initialization. > + > +/* Macros for printing using RTE_LOG */ > +#define RTE_LOGTYPE_LATENCY_STATS RTE_LOGTYPE_USER1 > + > +static pthread_t latency_stats_thread; Concerned about the pthread - see below. > +static const char *MZ_RTE_LATENCY_STATS = "rte_latencystats"; > +static int latency_stats_index; > +static uint64_t samp_intvl; > +static uint64_t timer_tsc; > +static uint64_t prev_tsc; > + > +static struct rte_latency_stats { > + float min_latency; /**< Minimum latency in nano seconds */ > + float avg_latency; /**< Average latency in nano seconds */ > + float max_latency; /**< Maximum latency in nano seconds */ > + float jitter; /** Latency variation */ > +} *glob_stats; Style guide advises against this type of initialization IIRC; http://dpdk.org/doc/guides/contributing/coding_style.html#structure-declarations Its not very clear, but I do think a separate line makes the variable declaration on its own line more obvious: static struct rte_latency_stats *glob_stats; > + > +static struct rxtx_cbs { > + struct rte_eth_rxtx_callback *cb; > +} rx_cbs[RTE_MAX_ETHPORTS][RTE_MAX_QUEUES_PER_PORT], > + tx_cbs[RTE_MAX_ETHPORTS][RTE_MAX_QUEUES_PER_PORT]; Same as previous comment - this struct looks very confusing to me :) Please declare explicitly on own lines: static struct rxtx_cbs { struct rte_eth_rxtx_callback *cb; }; static struct rxtx_cbs rx_cbs[RTE_MAX_ETHPORTS][RTE_MAX_QUEUES_PER_PORT]; static struct rxtx_cbs tx_cbs[RTE_MAX_ETHPORTS][RTE_MAX_QUEUES_PER_PORT]; > + > +struct latency_stats_nameoff { > + char name[RTE_ETH_XSTATS_NAME_SIZE]; > + unsigned int offset; > +}; > + > +static const struct latency_stats_nameoff lat_stats_strings[] = { > + {"min_latency_ns", offsetof(struct rte_latency_stats, min_latency)}, > + {"avg_latency_ns", offsetof(struct rte_latency_stats, avg_latency)}, > + {"max_latency_ns", offsetof(struct rte_latency_stats, max_latency)}, > + {"jitter_ns", offsetof(struct rte_latency_stats, jitter)}, > +}; > + > +#define NUM_LATENCY_STATS (sizeof(lat_stats_strings) / \ > + sizeof(lat_stats_strings[0])) > + > +static __attribute__((noreturn)) void * > +report_latency_stats(__rte_unused void *arg) > +{ > + for (;;) { > + unsigned int i; > + float *stats_ptr = NULL; > + uint64_t values[NUM_LATENCY_STATS] = {0}; > + int ret; > + > + for (i = 0; i < NUM_LATENCY_STATS; i++) { > + stats_ptr = RTE_PTR_ADD(glob_stats, > + lat_stats_strings[i].offset); > + values[i] = (uint64_t)floor((*stats_ptr)/ > + CYCLES_PER_NS); > + } > + > + ret = rte_metrics_update_metrics(RTE_METRICS_GLOBAL, > + latency_stats_index, > + values, NUM_LATENCY_STATS); > + if (ret < 0) > + RTE_LOG(INFO, LATENCY_STATS, > + "Failed to push the stats\n"); > + } > +} This function is strange - the library creates a pthread and then calls this function internally? This does not look like a good design from my point-of-view. The Timer library may be of use for inspiration, it requires the application polls a function to handle expired timers: void rte_timer_manage(). > +static void > +rte_latencystats_fill_values(struct rte_metric_value *values) > +{ > + unsigned int i; > + float *stats_ptr = NULL; > + > + for (i = 0; i < NUM_LATENCY_STATS; i++) { > + stats_ptr = RTE_PTR_ADD(glob_stats, > + lat_stats_strings[i].offset); > + values[i].key = i; > + values[i].value = (uint64_t)floor((*stats_ptr)/ > + CYCLES_PER_NS); > + } > +} > + > +static uint16_t > +add_time_stamps(uint8_t pid __rte_unused, > + uint16_t qid __rte_unused, > + struct rte_mbuf **pkts, > + uint16_t nb_pkts, > + uint16_t max_pkts __rte_unused, > + void *user_cb __rte_unused) > +{ > + unsigned int i; > + uint64_t diff_tsc, now; > + > + /* > + * For every sample interval, > + * time stamp is marked on one received packet. > + */ > + now = rte_rdtsc(); > + for (i = 0; i < nb_pkts; i++) { > + diff_tsc = now - prev_tsc; > + timer_tsc += diff_tsc; > + if (timer_tsc >= samp_intvl) { > + /* > + * TBD: Mark the timestamp only > + * if not already marked by the > + * hardware or the PMD. > + */ > + pkts[i]->timestamp = now; > + timer_tsc = 0; > + } > + prev_tsc = now; > + now = rte_rdtsc(); > + } > + > + return nb_pkts; > +} > + > +static uint16_t > +calc_latency(uint8_t pid __rte_unused, > + uint16_t qid __rte_unused, > + struct rte_mbuf **pkts, > + uint16_t nb_pkts, > + void *_ __rte_unused) > +{ > + unsigned int i, cnt = 0; > + uint64_t now; > + float latency[nb_pkts]; > + static float prev_latency; > + /* > + * Alpha represents degree of weighting decrease in EWMA, > + * a constant smoothing factor between 0 and 1. The value > + * is used below for measuring average latency. > + */ > + const float alpha = 0.2; > + > + now = rte_rdtsc(); > + for (i = 0; i < nb_pkts; i++) { > + if (pkts[i]->timestamp) > + latency[cnt++] = now - pkts[i]->timestamp; > + } > + > + for (i = 0; i < cnt; i++) { > + /* > + * The jitter is calculated as statistical mean of interpacket > + * delay variation. The "jitter estimate" is computed by taking > + * the absolute values of the ipdv sequence and applying an > + * exponential filter with parameter 1/16 to generate the > + * estimate. i.e J=J+(|D(i-1,i)|-J)/16. Where J is jitter, > + * D(i-1,i) is difference in latency of two consecutive packets > + * i-1 and i. > + * Reference: Calculated as per RFC 5481, sec 4.1, > + * RFC 3393 sec 4.5, RFC 1889 sec. > + */ > + glob_stats->jitter += (abs(prev_latency - latency[i]) > + - glob_stats->jitter)/16; > + if (glob_stats->min_latency == 0) > + glob_stats->min_latency = latency[i]; > + else if (latency[i] < glob_stats->min_latency) > + glob_stats->min_latency = latency[i]; > + else if (latency[i] > glob_stats->max_latency) > + glob_stats->max_latency = latency[i]; > + /* > + * The average latency is measured using exponential moving > + * average, i.e. using EWMA > + * https://en.wikipedia.org/wiki/Moving_average > + */ > + glob_stats->avg_latency += > + alpha * (latency[i] - glob_stats->avg_latency); > + prev_latency = latency[i]; > + } > + > + return nb_pkts; > +} > + > +int > +rte_latencystats_init(uint64_t samp_intvl, > + rte_latency_stats_flow_type_fn user_cb) > +{ > + unsigned int i; > + uint8_t pid; > + uint16_t qid; > + struct rxtx_cbs *cbs = NULL; > + const uint8_t nb_ports = rte_eth_dev_count(); > + const char *ptr_strings[NUM_LATENCY_STATS] = {0}; > + const struct rte_memzone *mz = NULL; > + const unsigned int flags = 0; > + > + /** Allocate stats in shared memory fo muliti process support */ > + mz = rte_memzone_reserve(MZ_RTE_LATENCY_STATS, sizeof(*glob_stats), > + rte_socket_id(), flags); > + if (mz == NULL) { > + RTE_LOG(ERR, LATENCY_STATS, "Cannot reserve memory: %s:%d\n", > + __func__, __LINE__); > + return -ENOMEM; > + } There is no check for double-initialization of the library - although it is an application error to initialize twice, the library should handle it. > + > + glob_stats = mz->addr; > + samp_intvl *= CYCLES_PER_NS; > + > + /** Register latency stats with stats library */ > + for (i = 0; i < NUM_LATENCY_STATS; i++) > + ptr_strings[i] = lat_stats_strings[i].name; > + > + latency_stats_index = rte_metrics_reg_metrics(ptr_strings, > + NUM_LATENCY_STATS); > + if (latency_stats_index < 0) { > + RTE_LOG(DEBUG, LATENCY_STATS, > + "Failed to register latency stats names\n"); > + return -1; > + } > + > + /** Register Rx/Tx callbacks */ > + for (pid = 0; pid < nb_ports; pid++) { > + struct rte_eth_dev_info dev_info; > + rte_eth_dev_info_get(pid, &dev_info); > + for (qid = 0; qid < dev_info.nb_rx_queues; qid++) { > + cbs = &rx_cbs[pid][qid]; > + cbs->cb = rte_eth_add_first_rx_callback(pid, qid, > + add_time_stamps, user_cb); > + if (!cbs->cb) > + RTE_LOG(INFO, LATENCY_STATS, "Failed to " > + "register Rx callback for pid=%d, " > + "qid=%d\n", pid, qid); > + } > + for (qid = 0; qid < dev_info.nb_tx_queues; qid++) { > + cbs = &tx_cbs[pid][qid]; > + cbs->cb = rte_eth_add_tx_callback(pid, qid, > + calc_latency, user_cb); > + if (!cbs->cb) > + RTE_LOG(INFO, LATENCY_STATS, "Failed to " > + "register Tx callback for pid=%d, " > + "qid=%d\n", pid, qid); > + } > + } > + > + int ret = 0; > + char thread_name[RTE_MAX_THREAD_NAME_LEN]; > + > + /** Create the host thread to update latency stats to stats library */ > + ret = pthread_create(&latency_stats_thread, NULL, report_latency_stats, > + NULL); > + if (ret != 0) { > + RTE_LOG(ERR, LATENCY_STATS, > + "Failed to create the latency stats thread:%s, %s:%d\n", > + strerror(errno), __func__, __LINE__); > + return -1; > + } This pthread_create() raises some flags, spawning a thread "behind the applications back" isn't a good plan. We should require a thread poll the latency stats, not create a thread ourselves. See note above on using an application thread. > + /** Set thread_name for aid in debugging */ > + snprintf(thread_name, RTE_MAX_THREAD_NAME_LEN, "latency-stats-thread"); > + ret = rte_thread_setname(latency_stats_thread, thread_name); > + if (ret != 0) > + RTE_LOG(DEBUG, LATENCY_STATS, > + "Failed to set thread name for latency stats > handling\n"); > + > + return 0; > +} > + > +int > +rte_latencystats_uninit(void) > +{ > + uint8_t pid; > + uint16_t qid; > + int ret = 0; > + struct rxtx_cbs *cbs = NULL; > + const uint8_t nb_ports = rte_eth_dev_count(); > + > + /** De register Rx/Tx callbacks */ > + for (pid = 0; pid < nb_ports; pid++) { > + struct rte_eth_dev_info dev_info; > + rte_eth_dev_info_get(pid, &dev_info); > + for (qid = 0; qid < dev_info.nb_rx_queues; qid++) { > + cbs = &rx_cbs[pid][qid]; > + ret = rte_eth_remove_rx_callback(pid, qid, cbs->cb); > + if (ret) > + RTE_LOG(INFO, LATENCY_STATS, "failed to " > + "remove Rx callback for pid=%d, " > + "qid=%d\n", pid, qid); > + } > + for (qid = 0; qid < dev_info.nb_tx_queues; qid++) { > + cbs = &tx_cbs[pid][qid]; > + ret = rte_eth_remove_tx_callback(pid, qid, cbs->cb); > + if (ret) > + RTE_LOG(INFO, LATENCY_STATS, "failed to " > + "remove Tx callback for pid=%d, " > + "qid=%d\n", pid, qid); > + } > + } > + > + /** Cancel the thread */ > + ret = pthread_cancel(latency_stats_thread); > + if (ret != 0) { > + RTE_LOG(ERR, LATENCY_STATS, > + "Failed to cancel latency stats update thread:" > + "%s,%s:%d\n", > + strerror(errno), __func__, __LINE__); > + return -1; > + } > + > + return 0; > +} > + > +int > +rte_latencystats_get_names(struct rte_metric_name *names, uint16_t size) > +{ > + unsigned int i; > + > + if (names == NULL || size < NUM_LATENCY_STATS) > + return NUM_LATENCY_STATS; > + > + for (i = 0; i < NUM_LATENCY_STATS; i++) > + snprintf(names[i].name, sizeof(names[i].name), > + "%s", lat_stats_strings[i].name); > + > + return NUM_LATENCY_STATS; > +} > + > +int > +rte_latencystats_get(struct rte_metric_value *values, uint16_t size) > +{ > + if (size < NUM_LATENCY_STATS || values == NULL) > + return NUM_LATENCY_STATS; > + > + if (rte_eal_process_type() == RTE_PROC_SECONDARY) { > + const struct rte_memzone *mz; > + mz = rte_memzone_lookup(MZ_RTE_LATENCY_STATS); > + if (mz == NULL) { > + RTE_LOG(ERR, LATENCY_STATS, > + "Latency stats memzone not found\n"); > + return -ENOMEM; > + } > + glob_stats = mz->addr; > + } > + > + /* Retrieve latency stats */ > + rte_latencystats_fill_values(values); > + > + return NUM_LATENCY_STATS; > +} > diff --git a/lib/librte_latencystats/rte_latencystats.h > b/lib/librte_latencystats/rte_latencystats.h > new file mode 100644 > index 0000000..405b878 > --- /dev/null > +++ b/lib/librte_latencystats/rte_latencystats.h > @@ -0,0 +1,146 @@ > +/*- > + * BSD LICENSE > + * > + * Copyright(c) 2016 Intel Corporation. All rights reserved. > + * All rights reserved. > + * > + * Redistribution and use in source and binary forms, with or without > + * modification, are permitted provided that the following conditions > + * are met: > + * > + * * Redistributions of source code must retain the above copyright > + * notice, this list of conditions and the following disclaimer. > + * * Redistributions in binary form must reproduce the above copyright > + * notice, this list of conditions and the following disclaimer in > + * the documentation and/or other materials provided with the > + * distribution. > + * * Neither the name of Intel Corporation nor the names of its > + * contributors may be used to endorse or promote products derived > + * from this software without specific prior written permission. > + * > + * THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS > + * "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT > + * LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR > + * A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT > + * OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, > + * SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT > + * LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, > + * DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY > + * THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT > + * (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE > + * OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. > + */ > + > +#ifndef _RTE_LATENCYSTATS_H_ > +#define _RTE_LATENCYSTATS_H_ > + > +/** > + * @file > + * RTE latency stats > + * > + * library to provide application and flow based latency stats. > + */ > + > +#include <rte_metrics.h> > + > +#ifdef __cplusplus > +extern "C" { > +#endif > + > +/** > + * Note: This function pointer is for future flow based latency stats > + * implementation. > + * > + * Function type used for identifting flow types of a Rx packet. > + * > + * The callback function is called on Rx for each packet. > + * This function is used for flow based latency calculations. > + * > + * @param pkt > + * Packet that has to be identified with its flow types. > + * @param user_param > + * The arbitrary user parameter passed in by the application when > + * the callback was originally configured. > + * @return > + * The flow_mask, representing the multiple flow types of a packet. > + */ > +typedef uint16_t (*rte_latency_stats_flow_type_fn)(struct rte_mbuf *pkt, > + void *user_param); > + > +/** > + * Registers Rx/Tx callbacks for each active port, queue. > + * > + * @param samp_intvl > + * Sampling time period in nano seconds, at which packet > + * should be marked with time stamp. > + * @param user_cb > + * Note: This param is for future flow based latency stats > + * implementation. > + * User callback to be called to get flow types of a packet. > + * Used for flow based latency calculation. > + * If the value is NULL, global stats will be calculated, > + * else flow based latency stats will be calculated. > + * For now just pass on the NULL value to this param. > + * @return > + * -1 : On error > + * -ENOMEM: On error > + * 0 : On success > + */ > +int rte_latencystats_init(uint64_t samp_intvl, > + rte_latency_stats_flow_type_fn user_cb); > + > +/** > + * Removes registered Rx/Tx callbacks for each active port, queue. > + * > + * @return > + * -1: On error > + * 0: On success > + */ > +int rte_latencystats_uninit(void); > + > +/** > + * Retrieve names of latency statistics > + * > + * @param names > + * Block of memory to insert names into. Must be at least size in capacity. > + * If set to NULL, function returns required capacity. > + * @param size > + * Capacity of latency stats names (number of names). > + * @return > + * - positive value lower or equal to size: success. The return value > + * is the number of entries filled in the stats table. > + * - positive value higher than size: error, the given statistics table > + * is too small. The return value corresponds to the size that should > + * be given to succeed. The entries in the table are not valid and > + * shall not be used by the caller. > + */ > +int rte_latencystats_get_names(struct rte_metric_name *names, > + uint16_t size); > + > +/** > + * Retrieve latency statistics. > + * > + * @param values > + * A pointer to a table of structure of type *rte_metric_value* > + * to be filled with latency statistics ids and values. > + * This parameter can be set to NULL if size is 0. > + * @param size > + * The size of the stats table, which should be large enough to store > + * all the latency stats. > + * @return > + * - positive value lower or equal to size: success. The return value > + * is the number of entries filled in the stats table. > + * - positive value higher than size: error, the given statistics table > + * is too small. The return value corresponds to the size that should > + * be given to succeed. The entries in the table are not valid and > + * shall not be used by the caller. > + * -ENOMEM: On failure. > + */ > +int rte_latencystats_get(struct rte_metric_value *values, > + uint16_t size); > + > +#ifdef __cplusplus > +} > +#endif > + > +#endif /* _RTE_LATENCYSTATS_H_ */ > diff --git a/lib/librte_latencystats/rte_latencystats_version.map > b/lib/librte_latencystats/rte_latencystats_version.map > new file mode 100644 > index 0000000..502018e > --- /dev/null > +++ b/lib/librte_latencystats/rte_latencystats_version.map > @@ -0,0 +1,10 @@ > +DPDK_17.02 { > + global: > + > + rte_latencystats_get; > + rte_latencystats_get_names; > + rte_latencystats_init; > + rte_latencystats_uninit; > + > + local: *; > +}; > diff --git a/lib/librte_mbuf/rte_mbuf.h b/lib/librte_mbuf/rte_mbuf.h > index bfce9f4..c35ba0a 100644 > --- a/lib/librte_mbuf/rte_mbuf.h > +++ b/lib/librte_mbuf/rte_mbuf.h > @@ -512,6 +512,9 @@ struct rte_mbuf { > > /** Timesync flags for use with IEEE1588. */ > uint16_t timesync; > + > + /** Timestamp for measuring latency. */ > + uint64_t timestamp; > } __rte_cache_aligned; > This change needs its own patch, so it can be reviewed independently and the position in the struct for timestamp can be analysed easily. > /** > diff --git a/mk/rte.app.mk b/mk/rte.app.mk > index 6aac5ac..1d36fad 100644 > --- a/mk/rte.app.mk > +++ b/mk/rte.app.mk > @@ -100,7 +100,7 @@ _LDLIBS-$(CONFIG_RTE_LIBRTE_CMDLINE) += > -lrte_cmdline > _LDLIBS-$(CONFIG_RTE_LIBRTE_CFGFILE) += -lrte_cfgfile > _LDLIBS-$(CONFIG_RTE_LIBRTE_METRICS) += -lrte_metrics > _LDLIBS-$(CONFIG_RTE_LIBRTE_BITRATE) += -lrte_bitratestats > - > +_LDLIBS-$(CONFIG_RTE_LIBRTE_LATENCY_STATS) += -lrte_latencystats > > _LDLIBS-$(CONFIG_RTE_LIBRTE_PMD_BOND) += -lrte_pmd_bond > _LDLIBS-$(CONFIG_RTE_LIBRTE_PMD_XENVIRT) += -lrte_pmd_xenvirt -lxenstore > -- > 2.5.5