> -----Original Message----- > From: dev [mailto:dev-bounces at dpdk.org] On Behalf Of Liang, Cunming > Sent: Sunday, October 12, 2014 12:11 PM > To: Neil Horman > Cc: dev at dpdk.org > Subject: Re: [dpdk-dev] [PATCH v2 1/4] app/test: unit test for rx and tx > cycles/packet > > Hi Neil, > > Very appreciate your comments. > I add inline reply, will send v3 asap when we get alignment. > > BRs, > Liang Cunming > > > -----Original Message----- > > From: Neil Horman [mailto:nhorman at tuxdriver.com] > > Sent: Saturday, October 11, 2014 1:52 AM > > To: Liang, Cunming > > Cc: dev at dpdk.org > > Subject: Re: [dpdk-dev] [PATCH v2 1/4] app/test: unit test for rx and tx > cycles/packet > > > > On Fri, Oct 10, 2014 at 08:29:58PM +0800, Cunming Liang wrote: > > > It provides unit test to measure cycles/packet in NIC loopback mode. > > > It simply gives the average cycles of IO used per packet without test > equipment. > > > When doing the test, make sure the link is UP. > > > > > > Usage Example: > > > 1. Run unit test app in interactive mode > > > app/test -c f -n 4 -- -i > > > 2. Run and wait for the result > > > pmd_perf_autotest > > > > > > There's option to choose rx/tx pair, default is vector. > > > set_rxtx_mode [vector|scalar|full|hybrid] > > > Note: To get acurate scalar fast, please choose 'vector' or 'hybrid' > > > without > > INC_VEC=y in config > > > > > > Signed-off-by: Cunming Liang <cunming.liang at intel.com> > > > Acked-by: Bruce Richardson <bruce.richardson at intel.com> > > > > Notes inline > > > > > --- > > > app/test/Makefile | 1 + > > > app/test/commands.c | 38 +++ > > > app/test/packet_burst_generator.c | 4 +- > > > app/test/test.h | 4 + > > > app/test/test_pmd_perf.c | 626 > > +++++++++++++++++++++++++++++++++++ > > > lib/librte_pmd_ixgbe/ixgbe_ethdev.c | 6 + > > > 6 files changed, 677 insertions(+), 2 deletions(-) > > > create mode 100644 app/test/test_pmd_perf.c > > > > > > diff --git a/app/test/Makefile b/app/test/Makefile > > > index 6af6d76..ebfa0ba 100644 > > > --- a/app/test/Makefile > > > +++ b/app/test/Makefile > > > @@ -56,6 +56,7 @@ SRCS-y += test_memzone.c > > > > > > SRCS-y += test_ring.c > > > SRCS-y += test_ring_perf.c > > > +SRCS-y += test_pmd_perf.c > > > > > > ifeq ($(CONFIG_RTE_LIBRTE_TABLE),y) > > > SRCS-y += test_table.c > > > diff --git a/app/test/commands.c b/app/test/commands.c > > > index a9e36b1..f1e746e 100644 > > > --- a/app/test/commands.c > > > +++ b/app/test/commands.c > > > @@ -310,12 +310,50 @@ cmdline_parse_inst_t cmd_quit = { > > > > > > +#define NB_ETHPORTS_USED (1) > > > +#define NB_SOCKETS (2) > > > +#define MEMPOOL_CACHE_SIZE 250 > > > +#define MBUF_SIZE (2048 + sizeof(struct rte_mbuf) + > > RTE_PKTMBUF_HEADROOM) > > Don't you want to size this in accordance with the amount of data your > sending > > (64 Bytes as noted above)? > [Liang, Cunming] The case is designed to measure small packet IO cost with > normal mbuf size. > Even if decreasing the size, it won't gain significant cycles. > > > > > +static void > > > +print_ethaddr(const char *name, const struct ether_addr *eth_addr) > > > +{ > > > + printf("%s%02X:%02X:%02X:%02X:%02X:%02X", name, > > > + eth_addr->addr_bytes[0], > > > + eth_addr->addr_bytes[1], > > > + eth_addr->addr_bytes[2], > > > + eth_addr->addr_bytes[3], > > > + eth_addr->addr_bytes[4], > > > + eth_addr->addr_bytes[5]); > > > +} > > > + > > This was copieed from print_ethaddr. Seems like a good candidate for a > common > > function in rte_ether.h > [Liang, Cunming] Agree with you, some of samples now use it with the same > copy. > I'll rework it. Adding 'ether_format_addr' in rte_ether.h only for format the > 48bits address output. > And leaving other prints for application customization. > > > > > > > +} > > > + > > > +static void > > > +signal_handler(int signum) > > > +{ > > > + /* When we receive a USR1 signal, print stats */ > > I think you mean SIGUSR2, below, SIGUSR1 tears the test down and exits > the > > program > [Liang, Cunming] Thanks, it's a typo. > > > > > + if (signum == SIGUSR1) { > > SIGINT instead. Thats the common practice. > [Liang, Cunming] I understood your opinion. > The considerations I'm not using SIGINT instead are: > 1. We unset ISIG in c_lflag of term. CRTL+C won't trigger SIGINT in command > interactive. > It always has to explicitly send signal. No matter SIGUSR1 or SIGINT. > 2. By SIGINT semantic, expect to terminate the process. > Here I expect to force stop this case, but still alive in command line. > After it stopped, it can run again or start to run other test cases. > So I keep SIGINT, SIGUSR1 in different behavior. > 3. It should be rarely used. > Only when exception timeout, I leave this backdoor for automation test > control. > For manual test, we can easily force kill the process. > > > > > > + printf("Force Stop!\n"); > > > + stop = 1; > > > + } > > > + if (signum == SIGUSR2) > > > + stats_display(0); > > > +} > > > +/* main processing loop */ > > > +static int > > > +main_loop(__rte_unused void *args) > > > +{ > > > +#define PACKET_SIZE 64 > > > +#define FRAME_GAP 12 > > > +#define MAC_PREAMBLE 8 > > > + struct rte_mbuf *pkts_burst[MAX_PKT_BURST]; > > > + unsigned lcore_id; > > > + unsigned i, portid, nb_rx = 0, nb_tx = 0; > > > + struct lcore_conf *conf; > > > + uint64_t prev_tsc, cur_tsc; > > > + int pkt_per_port; > > > + uint64_t packets_per_second, total_packets; > > > + > > > + lcore_id = rte_lcore_id(); > > > + conf = &lcore_conf[lcore_id]; > > > + if (conf->status != LCORE_USED) > > > + return 0; > > > + > > > + pkt_per_port = MAX_TRAFIC_BURST / conf->nb_ports; > > > + > > > + int idx = 0; > > > + for (i = 0; i < conf->nb_ports; i++) { > > > + int num = pkt_per_port; > > > + portid = conf->portlist[i]; > > > + printf("inject %d packet to port %d\n", num, portid); > > > + while (num) { > > > + nb_tx = RTE_MIN(MAX_PKT_BURST, num); > > > + nb_tx = rte_eth_tx_burst(portid, 0, > > > + &tx_burst[idx], nb_tx); > > > + num -= nb_tx; > > > + idx += nb_tx; > > > + } > > > + } > > > + printf("Total packets inject to prime ports = %u\n", idx); > > > + > > > + packets_per_second = (link_mbps * 1000 * 1000) / > > > + +((PACKET_SIZE + FRAME_GAP + MAC_PREAMBLE) * > CHAR_BIT); > > > + printf("Each port will do %"PRIu64" packets per second\n", > > > + +packets_per_second); > > > + > > > + total_packets = RTE_TEST_DURATION * conf->nb_ports * > > packets_per_second; > > > + printf("Test will stop after at least %"PRIu64" packets received\n", > > > + + total_packets); > > > + > > > + prev_tsc = rte_rdtsc(); > > > + > > > + while (likely(!stop)) { > > > + for (i = 0; i < conf->nb_ports; i++) { > > > + portid = conf->portlist[i]; > > > + nb_rx = rte_eth_rx_burst((uint8_t) portid, 0, > > > + pkts_burst, > MAX_PKT_BURST); > > > + if (unlikely(nb_rx == 0)) { > > > + idle++; > > > + continue; > > > + } > > > + > > > + count += nb_rx; > > Doesn't take into consideration error conditions. rte_eth_rx_burst can > return > > -ENOTSUP > [Liang, Cunming] It returns -ENOTSUP when turning on ETHDEV_DEBUG > CONFIG. > The error is used to avoid no function call register. > When ETHDEV_DEBUG turn off, the NULL function call cause segfault > directly. > So I think it's a library internal error. > In such library exceptional case, I prefer not expecting sample/application to > condition check library functional error. > > > > > + nb_tx = rte_eth_tx_burst(portid, 0, pkts_burst, > nb_rx); > > Ditto with -ENOTSUP > > > > > + if (unlikely(nb_tx < nb_rx)) { > > What makes this unlikely? Seems like a perfectly reasonable condition to > happen > > to me. If the network is busy, its completely likely that you will receive > more > > frames than you send, if you elect to receive all frames. > [Liang, Cunming] For this case, NIC works in MAC loopback mode. > It firstly injects numbers of packets to NIC. Then NIC will loopback all > packets > to ingress side. > The code here will receive the packets, and send them back to NIC > Packets looping inside all come from initial injection. > As the total number of injected packets is much less than in-chip queue size, > the tx egress queue shouldn't block desc. ring update. > So when receiving packets in line rate and nb_tx < nb_rx, it means tx cannot > archive line rate. > When it happens, the cycles/packets result make no sense, as the bottle > neck is NIC. > The drop counter can record it. > > > > > + drop += (nb_rx - nb_tx); > > > + do { > > > + > rte_pktmbuf_free(pkts_burst[nb_tx]); > > Defer this, it skews your timing > [Liang, Cunming] Agree with you, I ever thought about it. > This test cases is designed to measure pure IO RX/TX routine. > When doing tx burst with result nb_tx < nb_rx, we either repeat re-send or > drop it. > Each way introduces noise(adding additional control code), resending much > times even cost more than free it. > The cycles/packets is useful when there's no packet drop, otherwise it gives > the hint where the problem comes from (by idle or drop). > > > > > + } while (++nb_tx < nb_rx); > > > + } > > > + } > > > + if (unlikely(count >= total_packets)) > > > + break; > > Whats the reasoning here? Do you only ever expect to receive frames that > you > > send? If so, seems like this should call for a big warning to get printed. > [Liang, Cunming] The loop exits when the pre-calculated total_packets are > received. > As the nb_rx is unpredictable, the packet counter may large equal than > total_packets the last time. > The reason unlikely used here is because the condition becomes true only > the last time. > > > > > + } > > > + > > > + cur_tsc = rte_rdtsc(); > > > + > > > + for (i = 0; i < conf->nb_ports; i++) { > > > + portid = conf->portlist[i]; > > > + int nb_free = pkt_per_port; > > > + do { /* dry out */ > > > + nb_rx = rte_eth_rx_burst((uint8_t) portid, 0, > > > + pkts_burst, > MAX_PKT_BURST); > > > + nb_tx = 0; > > > + while (nb_tx < nb_rx) > > > + rte_pktmbuf_free(pkts_burst[nb_tx++]); > > > + nb_free -= nb_rx; > > > + } while (nb_free != 0); > > > + printf("free %d mbuf left in port %u\n", pkt_per_port, > portid); > > > + } > > > + > > Whats the purpose of this? Are you trying to flush the device? Wouldn't it > be > > enough just to stop the interface? > [Liang, Cunming] If we only run the cases once and exit, it doesn't matter. > But it's designed to run multi-times without exit, for the purpose of warming > up or for getting average number. > So stopping device is not enough, we have to release the flying packets. > > > > > > + if (count == 0) > > > + return -1; > > > + > > > + printf("%lu packet, %lu drop, %lu idle\n", count, drop, idle); > > > + printf("Result: %ld cycles per packet\n", (cur_tsc - prev_tsc) / count); > > > + > > Bad math here. Theres no guarantee that the tsc hasn't wrapped > (potentially > > more than once) depending on your test length. you need to check the tsc > before > > and after each burst and record an average of deltas instead, accounting in > each > > instance for the possibility of wrap. > [Liang, Cunming] I'm not sure catch your point correctly. > I think both cur_tsc and prev_tsc are 64 bits width. > For 3GHz, I think it won't wrapped so quick. > As it's uint64_t, so even get wrapped, the delta should still be correct.
You need to change those %lu to %PRIu64, or you will get a compilation error when using 32-bit targets, since those variables are uint64_t. There are other parts of the code with same problem, as well. > > > > > + return 0; > > > +} > > > + > > > +static int > > > +test_pmd_perf(void) > > > +{ > > > + uint16_t nb_ports, num, nb_lcores, slave_id = (uint16_t)-1; > > > + uint16_t nb_rxd = RTE_TEST_RX_DESC_DEFAULT; > > > + uint16_t nb_txd = RTE_TEST_TX_DESC_DEFAULT; > > > + uint16_t portid; > > > + uint16_t nb_rx_queue = 1, nb_tx_queue = 1; > > > + int socketid = -1; > > > + int ret; > > > + > > > + printf("Start PMD RXTX cycles cost test.\n"); > > > + > > > + signal(SIGUSR1, signal_handler); > > Again SIGINT here. > > > > > + signal(SIGUSR2, signal_handler); > > > + > > > + nb_ports = rte_eth_dev_count(); > > > + if (nb_ports < NB_ETHPORTS_USED) { > > > + printf("At least %u port(s) used for perf. test\n", > > > + NB_ETHPORTS_USED); > > > + return -1; > > > + } > > > + > > > + if (nb_ports > RTE_MAX_ETHPORTS) > > > + nb_ports = RTE_MAX_ETHPORTS; > > > + > > > + nb_lcores = rte_lcore_count(); > > > + > > > + memset(lcore_conf, 0, sizeof(lcore_conf)); > > > + init_lcores(); > > > + > > > + init_mbufpool(NB_MBUF); > > > + > > > + reset_count(); > > > + num = 0; > > > + for (portid = 0; portid < nb_ports; portid++) { > > > + if (socketid == -1) { > > > + socketid = rte_eth_dev_socket_id(portid); > > > + slave_id = alloc_lcore(socketid); > > > + if (slave_id == (uint16_t)-1) { > > > + printf("No avail lcore to run test\n"); > > > + return -1; > > > + } > > > + printf("Performance test runs on lcore %u socket > %u\n", > > > + slave_id, socketid); > > > + } > > > + > > > + if (socketid != rte_eth_dev_socket_id(portid)) { > > > + printf("Skip port %d\n", portid); > > > + continue; > > > + } > > > + > > > + /* port configure */ > > > + ret = rte_eth_dev_configure(portid, nb_rx_queue, > > > + nb_tx_queue, &port_conf); > > > + if (ret < 0) > > > + rte_exit(EXIT_FAILURE, > > > + "Cannot configure device: err=%d, > port=%d\n", > > > + ret, portid); > > > + > > > + rte_eth_macaddr_get(portid, &ports_eth_addr[portid]); > > > + printf("Port %u ", portid); > > > + print_ethaddr("Address:", &ports_eth_addr[portid]); > > > + printf("\n"); > > > + > > > + /* tx queue setup */ > > > + ret = rte_eth_tx_queue_setup(portid, 0, nb_txd, > > > + socketid, &tx_conf); > > > + if (ret < 0) > > > + rte_exit(EXIT_FAILURE, > > > + "rte_eth_tx_queue_setup: err=%d, " > > > + "port=%d\n", ret, portid); > > > + > > > + /* rx queue steup */ > > > + ret = rte_eth_rx_queue_setup(portid, 0, nb_rxd, > > > + socketid, &rx_conf, > > > + mbufpool[socketid]); > > > + if (ret < 0) > > > + rte_exit(EXIT_FAILURE, "rte_eth_rx_queue_setup: > err=%d," > > > + "port=%d\n", ret, portid); > > > + > > > + /* Start device */ > > > + stop = 0; > > > + ret = rte_eth_dev_start(portid); > > > + if (ret < 0) > > > + rte_exit(EXIT_FAILURE, > > > + "rte_eth_dev_start: err=%d, port=%d\n", > > > + ret, portid); > > > + > > > + /* always eanble promiscuous */ > > > + rte_eth_promiscuous_enable(portid); > > > + > > > + lcore_conf[slave_id].portlist[num++] = portid; > > > + lcore_conf[slave_id].nb_ports++; > > > + } > > > + check_all_ports_link_status(nb_ports, RTE_PORT_ALL); > > > + > > > + init_traffic(mbufpool[socketid], tx_burst, MAX_TRAFIC_BURST); > > > + > > > + rte_eal_remote_launch(main_loop, NULL, slave_id); > > > + if (rte_eal_wait_lcore(slave_id) < 0) > > > + return -1; > > > + > > > + /* port tear down */ > > > + for (portid = 0; portid < nb_ports; portid++) { > > > + if (socketid != rte_eth_dev_socket_id(portid)) > > > + continue; > > > + > > > + rte_eth_dev_stop(portid); > > > + } > > > + > > Clean up your allocated memory/lcores/etc? > [Liang, Cunming] It do cleanup on the beginning of case. > "Init_lcores","init_mbufpool","reset_count" guarantee the data clean before > each testing. > And mbufpool only allocated once even if we run multiple times. > > > > Neil