On Mon, Sep 09, 2019 at 10:29:38AM +0200, Marcin Baran wrote:
> From: Pawel Modrak <pawelx.mod...@intel.com>
>
> A new sample app demonstrating use of driver for CBDMA.
> The app receives packets, performs software or hardware copy, changes
> packets' MAC addresses (if enabled) and forwards them. The patch
> includes sample application as well as it's guide.
>
> Signed-off-by: Pawel Modrak <pawelx.mod...@intel.com>
> Signed-off-by: Marcin Baran <marcinx.ba...@intel.com>
> ---

Thanks, Pawel and Marcin. Some comments on doc and code inline below.

>  doc/guides/sample_app_ug/index.rst |    1 +
>  doc/guides/sample_app_ug/intro.rst |    4 +
>  doc/guides/sample_app_ug/ioat.rst  |  691 +++++++++++++++++++
>  examples/Makefile                  |    3 +
>  examples/ioat/Makefile             |   54 ++
>  examples/ioat/ioatfwd.c            | 1010 ++++++++++++++++++++++++++++
>  examples/ioat/meson.build          |   13 +
>  examples/meson.build               |    1 +
>  8 files changed, 1777 insertions(+)
>  create mode 100644 doc/guides/sample_app_ug/ioat.rst
>  create mode 100644 examples/ioat/Makefile
>  create mode 100644 examples/ioat/ioatfwd.c
>  create mode 100644 examples/ioat/meson.build
>
> diff --git a/doc/guides/sample_app_ug/index.rst b/doc/guides/sample_app_ug/index.rst
> index f23f8f59e..a6a1d9e7a 100644
> --- a/doc/guides/sample_app_ug/index.rst
> +++ b/doc/guides/sample_app_ug/index.rst
> @@ -23,6 +23,7 @@ Sample Applications User Guides
>      ip_reassembly
>      kernel_nic_interface
>      keep_alive
> +    ioat
>      l2_forward_crypto
>      l2_forward_job_stats
>      l2_forward_real_virtual
> diff --git a/doc/guides/sample_app_ug/intro.rst b/doc/guides/sample_app_ug/intro.rst
> index 90704194a..74462312f 100644
> --- a/doc/guides/sample_app_ug/intro.rst
> +++ b/doc/guides/sample_app_ug/intro.rst
> @@ -91,6 +91,10 @@ examples are highlighted below.
>    forwarding, or ``l3fwd`` application does forwarding based on Internet
>    Protocol, IPv4 or IPv6 like a simple router.
>
> +* :doc:`Hardware packet copying<ioat>`: The Hardware packet copying,
> +  or ``ioatfwd`` application demonstrates how to use IOAT rawdev driver for
> +  copying packets between two threads.
> +
>  * :doc:`Packet Distributor<dist_app>`: The Packet Distributor
>    demonstrates how to distribute packets arriving on an Rx port to different
>    cores for processing and transmission.
> diff --git a/doc/guides/sample_app_ug/ioat.rst b/doc/guides/sample_app_ug/ioat.rst
> new file mode 100644
> index 000000000..378d70b81
> --- /dev/null
> +++ b/doc/guides/sample_app_ug/ioat.rst
> @@ -0,0 +1,691 @@
> +..  SPDX-License-Identifier: BSD-3-Clause
> +    Copyright(c) 2019 Intel Corporation.
> +
> +Sample Application of packet copying using Intel\|reg| QuickData Technology
> +============================================================================

You need a space before the |reg| bit, otherwise the reg doesn't get the
symbol replaced. It should be "Intel\ |reg|".

[Marcin] Will fix this in v2

> +
> +Overview
> +--------
> +
> +This sample is intended as a demonstration of the basic components of
> +a DPDK forwarding application and example of how to use IOAT driver
> +API to make packets copies.
> +
> +Also while forwarding, the MAC addresses are affected as follows:
> +
> +* The source MAC address is replaced by the TX port MAC address
> +
> +* The destination MAC address is replaced by 02:00:00:00:00:TX_PORT_ID
> +
> +This application can be used to compare performance of using software
> +packet copy with copy done using a DMA device for different sizes of packets.
> +The example will print out statistics each second. The stats shows
> +received/send packets and packets dropped or failed to copy.
> +
> +Compiling the Application
> +-------------------------
> +
> +To compile the sample application see :doc:`compiling`.
> +
> +The application is located in the ``ioat`` sub-directory.
> +
> +
> +Running the Application
> +-----------------------
> +
> +In order to run the hardware copy application, the copying device
> +needs to be bound to user-space IO driver.
> +
> +Refer to the *IOAT Rawdev Driver for Intel\ |reg| QuickData
> +Technology* guide for information on using the driver.
> +
> +The application requires a number of command line options:
> +
> +.. code-block:: console
> +
> +    ./build/ioatfwd [EAL options] -- -p MASK [-C CT] [--[no-]mac-updating]

I think the app uses lower case "c" rather than upper case, as called out
below. Since the "CT" value can only be one of two possibilities, I think
you should explicitly include them, e.g. "[-c <sw|rawdev>]". "rawdev" is
also a rather long name for this parameter, why not just call them sw and
hw?

[Marcin] Yes, it should be lower case. It is my mistake, the doc was not
updated before sending patch. Proper guide is already prepared for v2. As
for the "rawdev" parameter name, I will correct it.

> +
> +where,
> +
> +* p MASK: A hexadecimal bitmask of the ports to configure
> +
> +* c CT: Performed packet copy type: software (sw) or hardware using
> +  DMA (rawdev)
> +
> +* s RS: size of IOAT rawdev ring for hardware copy mode or rte_ring for
> +  software copy mode
> +

This parameter is missing from the summary above.

[Marcin] Didn't update doc along with code. It is fixed for v2.

> +* --[no-]mac-updating: Whether MAC address of packets should be changed
> +  or not
> +
> +The application can be launched in 2 different configurations:
> +
> +* Performing software packet copying
> +
> +* Performing hardware packet copying

Two thoughts here:
a) is this not obvious from the parameter list?
b) is it not more than two configurations, given that you can have:
   * sw copy with mac updating
   * sw copy without mac updating
   * etc.
   not including the possible port-mask, ring size and single-core vs
   two-core configurations.

[Marcin] Good point. I will rephrase that.

> +
> +Each port needs 2 lcores: one of them receives incoming traffic and
> +makes a copy of each packet. The second lcore then updates MAC
> +address and sends the copy. For each configuration an additional
> +lcore is needed since master lcore in use which is responsible for
> +configuration, statistics printing and safe deinitialization of all ports
> +and devices.
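
Coming back to the copy-type option discussed above: since only two values are valid, the parsing can stay very small. The following is only a sketch of that idea, assuming the shorter "sw"/"hw" spelling suggested in this review; `enum copy_type` and `parse_copy_type()` are names made up for illustration and are not taken from the patch.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical names, used only for this illustration. */
enum copy_type {
	COPY_TYPE_SW,
	COPY_TYPE_HW,
	COPY_TYPE_INVALID
};

/* Map the argument of "-c" to a copy type; anything else is invalid. */
static enum copy_type
parse_copy_type(const char *arg)
{
	/* getopt() supplies a non-NULL argument for options that require one */
	assert(arg != NULL);

	if (strcmp(arg, "sw") == 0)
		return COPY_TYPE_SW;
	if (strcmp(arg, "hw") == 0)
		return COPY_TYPE_HW;
	return COPY_TYPE_INVALID;
}
```

A caller would presumably print the usage text and exit when `COPY_TYPE_INVALID` comes back, which also keeps the documented "[-c <sw|hw>]" summary and the code in step.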
> +

I believe the app also supports running with 1 or 2 cores total, right?

[Marcin] Yes, that's right. Didn't update doc along with code. It is fixed
for v2.

> +The application can use a maximum of 8 ports.

Why this limitation?

[Marcin] Seemed reasonable since on testing machine one IOAT device had 8
channels.

> +
> +To run the application in a Linux environment with 3 lcores (one of
> +them is master lcore), 1 port (port 0), software copying and MAC
> +updating issue the command:
> +
> +.. code-block:: console
> +
> +    $ ./build/ioatfwd -l 0-2 -n 2 -- -p 0x1 --mac-updating -c sw
> +
> +To run the application in a Linux environment with 5 lcores (one of
> +them is master lcore), 2 ports (ports 0 and 1), hardware copying and
> +no MAC updating issue the command:
> +
> +.. code-block:: console
> +
> +    $ ./build/ioatfwd -l 0-4 -n 1 -- -p 0x3 --no-mac-updating -c rawdev
> +
> +Refer to the *DPDK Getting Started Guide* for general information on
> +running applications and the Environment Abstraction Layer (EAL) options.
> +

<snip>

> diff --git a/examples/ioat/ioatfwd.c b/examples/ioat/ioatfwd.c
> new file mode 100644
> index 000000000..8463d82f3
> --- /dev/null
> +++ b/examples/ioat/ioatfwd.c
> @@ -0,0 +1,1010 @@
> +/* SPDX-License-Identifier: BSD-3-Clause
> + * Copyright(c) 2019 Intel Corporation */
> +
> +#include <stdint.h>
> +#include <getopt.h>
> +#include <signal.h>
> +#include <stdbool.h>
> +#include <unistd.h>
> +
> +#include <rte_malloc.h>
> +#include <rte_ethdev.h>
> +#include <rte_rawdev.h>
> +#include <rte_ioat_rawdev.h>
> +
> +/* size of ring used for software copying between rx and tx. */
> +#define RTE_LOGTYPE_IOAT RTE_LOGTYPE_USER1
> +#define MAX_PKT_BURST 32

Seems a low max, assume this is actually the default burst size?

> +#define MEMPOOL_CACHE_SIZE 512
> +#define MIN_POOL_SIZE 65536U
> +#define CMD_LINE_OPT_MAC_UPDATING "mac-updating"
> +#define CMD_LINE_OPT_NO_MAC_UPDATING "no-mac-updating"
> +#define CMD_LINE_OPT_PORTMASK "portmask"
> +#define CMD_LINE_OPT_NB_QUEUE "nb-queue"
> +#define CMD_LINE_OPT_COPY_TYPE "copy-type"
> +#define CMD_LINE_OPT_RING_SIZE "ring-size"
> +
> +/* configurable number of RX/TX ring descriptors */
> +#define RX_DEFAULT_RINGSIZE 1024
> +#define TX_DEFAULT_RINGSIZE 1024
> +
> +/* max number of RX queues per port */
> +#define MAX_RX_QUEUES_COUNT 8
> +
> +struct rxtx_port_config {
> +	/* common config */
> +	uint16_t rxtx_port;
> +	uint16_t nb_queues;
> +	/* for software copy mode */
> +	struct rte_ring *rx_to_tx_ring;
> +	/* for IOAT rawdev copy mode */
> +	uint16_t ioat_ids[MAX_RX_QUEUES_COUNT];
> +};
> +
> +struct rxtx_transmission_config {
> +	struct rxtx_port_config ports[RTE_MAX_ETHPORTS];
> +	uint16_t nb_ports;
> +	uint16_t nb_lcores;
> +};
> +
> +/* per-port statistics struct */
> +struct ioat_port_statistics {
> +	uint64_t rx[RTE_MAX_ETHPORTS];
> +	uint64_t tx[RTE_MAX_ETHPORTS];
> +	uint64_t tx_dropped[RTE_MAX_ETHPORTS];
> +	uint64_t copy_dropped[RTE_MAX_ETHPORTS];
> +};
> +struct ioat_port_statistics port_statistics;
> +
> +struct total_statistics {
> +	uint64_t total_packets_dropped;
> +	uint64_t total_packets_tx;
> +	uint64_t total_packets_rx;
> +	uint64_t total_successful_enqueues;
> +	uint64_t total_failed_enqueues;
> +};
> +
> +typedef enum copy_mode_t {
> +#define COPY_MODE_SW "sw"
> +	COPY_MODE_SW_NUM,
> +#define COPY_MODE_IOAT "rawdev"
> +	COPY_MODE_IOAT_NUM,
> +	COPY_MODE_INVALID_NUM,
> +	COPY_MODE_SIZE_NUM = COPY_MODE_INVALID_NUM
> +} copy_mode_t;
> +
> +/* mask of enabled ports */
> +static uint32_t ioat_enabled_port_mask;
> +
> +/* number of RX queues per port */
> +static uint16_t nb_queues = 1;
> +
> +/* MAC updating enabled by default. */
> +static int mac_updating = 1;
> +
> +/* hardare copy mode enabled by default. */
> +static copy_mode_t copy_mode = COPY_MODE_IOAT_NUM;
> +
> +/* size of IOAT rawdev ring for hardware copy mode or
> + * rte_ring for software copy mode
> + */
> +static unsigned short ring_size = 2048;
> +
> +/* global transmission config */
> +struct rxtx_transmission_config cfg;
> +
> +/* configurable number of RX/TX ring descriptors */
> +static uint16_t nb_rxd = RX_DEFAULT_RINGSIZE;
> +static uint16_t nb_txd = TX_DEFAULT_RINGSIZE;
> +
> +static volatile bool force_quit;
> +
> +/* ethernet addresses of ports */
> +static struct rte_ether_addr ioat_ports_eth_addr[RTE_MAX_ETHPORTS];
> +
> +static struct rte_eth_dev_tx_buffer *tx_buffer[RTE_MAX_ETHPORTS];
> +struct rte_mempool *ioat_pktmbuf_pool;
> +
> +/* Print out statistics for one port. */
> +static void
> +print_port_stats(uint16_t port_id)
> +{
> +	printf("\nStatistics for port %u ------------------------------"
> +		"\nPackets sent: %34"PRIu64
> +		"\nPackets received: %30"PRIu64
> +		"\nPackets dropped on tx: %25"PRIu64
> +		"\nPackets dropped on copy: %23"PRIu64,
> +		port_id,
> +		port_statistics.tx[port_id],
> +		port_statistics.rx[port_id],
> +		port_statistics.tx_dropped[port_id],
> +		port_statistics.copy_dropped[port_id]);
> +}
> +
> +/* Print out statistics for one IOAT rawdev device. */
> +static void
> +print_rawdev_stats(uint32_t dev_id, uint64_t *xstats,
> +	uint16_t nb_xstats, struct rte_rawdev_xstats_name *names_xstats)
> +{
> +	uint16_t i;
> +
> +	printf("\nIOAT channel %u", dev_id);
> +	for (i = 0; i < nb_xstats; i++)
> +		if (strstr(names_xstats[i].name, "enqueues"))
> +			printf("\n\t %s: %*"PRIu64,
> +				names_xstats[i].name,
> +				(int)(37 - strlen(names_xstats[i].name)),
> +				xstats[i]);
> +}
> +
> +static void
> +print_total_stats(struct total_statistics *ts)
> +{
> +	printf("\nAggregate statistics ==============================="
> +		"\nTotal packets sent: %28"PRIu64
> +		"\nTotal packets received: %24"PRIu64
> +		"\nTotal packets dropped: %25"PRIu64,
> +		ts->total_packets_tx,
> +		ts->total_packets_rx,
> +		ts->total_packets_dropped);
> +
> +	if (copy_mode == COPY_MODE_IOAT_NUM) {
> +		printf("\nTotal IOAT successful enqueues: %16"PRIu64
> +			"\nTotal IOAT failed enqueues: %20"PRIu64,
> +			ts->total_successful_enqueues,
> +			ts->total_failed_enqueues);
> +	}
> +
> +	printf("\n====================================================\n");
> +}
> +

For these stats, it would be nice to have deltas i.e. pps, rather than (or
as well as) the raw packet count numbers. Since your main stats loop below
has a "sleep(1)" at the start, just computing the deltas should give a good
enough PPS value.

[Marcin] Ok, will be prepared for v2 as PPS value.

> +/* Print out statistics on packets dropped. */
> +static void
> +print_stats(char *prgname)
> +{
> +	struct total_statistics ts;
> +	uint32_t i, port_id, dev_id;
> +	struct rte_rawdev_xstats_name *names_xstats;
> +	uint64_t *xstats;
> +	unsigned int *ids_xstats;
> +	unsigned int nb_xstats, id_fail_enq, id_succ_enq;
> +	char status_string[120]; /* to print at the top of the output */
> +	int status_strlen;
> +
> +
> +	const char clr[] = { 27, '[', '2', 'J', '\0' };
> +	const char topLeft[] = { 27, '[', '1', ';', '1', 'H', '\0' };
> +
> +	status_strlen = snprintf(status_string, sizeof(status_string),
> +		"%s, ", prgname);
> +	status_strlen += snprintf(status_string + status_strlen,
> +		sizeof(status_string) - status_strlen,
> +		"Worker Threads = %d, ",
> +		rte_lcore_count() > 2 ? 2 : 1);
> +	status_strlen += snprintf(status_string + status_strlen,
> +		sizeof(status_string) - status_strlen,
> +		"Copy Mode = %s,\n", copy_mode == COPY_MODE_SW_NUM ?
> +		COPY_MODE_SW : COPY_MODE_IOAT);
> +	status_strlen += snprintf(status_string + status_strlen,
> +		sizeof(status_string) - status_strlen,
> +		"Updating MAC = %s, ", mac_updating ?
> +		"enabled" : "disabled");
> +	status_strlen += snprintf(status_string + status_strlen,
> +		sizeof(status_string) - status_strlen,
> +		"Rx Queues = %d, ", nb_queues);
> +	status_strlen += snprintf(status_string + status_strlen,
> +		sizeof(status_string) - status_strlen,
> +		"Ring Size = %d\n", ring_size);
> +
> +	/* Allocate memory for xstats names and values */
> +	nb_xstats = rte_rawdev_xstats_names_get(
> +		cfg.ports[0].ioat_ids[0], NULL, 0);
> +
> +	names_xstats = malloc(sizeof(*names_xstats) * nb_xstats);
> +	if (names_xstats == NULL) {
> +		rte_exit(EXIT_FAILURE,
> +			"Error allocating xstat names memory\n");
> +	}
> +	rte_rawdev_xstats_names_get(cfg.ports[0].ioat_ids[0],
> +		names_xstats, nb_xstats);
> +
> +	ids_xstats = malloc(sizeof(*ids_xstats) * nb_xstats);
> +	if (ids_xstats == NULL) {
> +		rte_exit(EXIT_FAILURE,
> +			"Error allocating xstat ids_xstats memory\n");
> +	}
> +
> +	for (i = 0; i < nb_xstats; i++)
> +		ids_xstats[i] = i;
> +
> +	xstats = malloc(sizeof(*xstats) * nb_xstats);
> +	if (xstats == NULL) {
> +		rte_exit(EXIT_FAILURE,
> +			"Error allocating xstat memory\n");
> +	}
> +
> +	/* Get failed/successful enqueues stats index */
> +	id_fail_enq = id_succ_enq = nb_xstats;
> +	for (i = 0; i < nb_xstats; i++) {
> +		if (!strcmp(names_xstats[i].name, "failed_enqueues"))
> +			id_fail_enq = i;
> +		else if (!strcmp(names_xstats[i].name, "successful_enqueues"))
> +			id_succ_enq = i;
> +		if (id_fail_enq < nb_xstats && id_succ_enq < nb_xstats)
> +			break;
> +	}
> +	if (id_fail_enq == nb_xstats || id_succ_enq == nb_xstats) {
> +		rte_exit(EXIT_FAILURE,
> +			"Error getting failed/successful enqueues stats index\n");
> +	}
> +
> +	while (!force_quit) {
> +		/* Sleep for 1 second each round - init sleep allows reading
> +		 * messages from app startup.
> +		 */
> +		sleep(1);
> +
> +		/* Clear screen and move to top left */
> +		printf("%s%s", clr, topLeft);
> +
> +		memset(&ts, 0, sizeof(struct total_statistics));
> +
> +		printf("%s", status_string);
> +
> +		for (i = 0; i < cfg.nb_ports; i++) {
> +			port_id = cfg.ports[i].rxtx_port;
> +			print_port_stats(port_id);
> +
> +			ts.total_packets_dropped +=
> +				port_statistics.tx_dropped[port_id]
> +				+ port_statistics.copy_dropped[port_id];
> +			ts.total_packets_tx += port_statistics.tx[port_id];
> +			ts.total_packets_rx += port_statistics.rx[port_id];
> +
> +			if (copy_mode == COPY_MODE_IOAT_NUM) {
> +				uint32_t j;
> +
> +				for (j = 0; j < cfg.ports[i].nb_queues; j++) {
> +					dev_id = cfg.ports[i].ioat_ids[j];
> +					rte_rawdev_xstats_get(dev_id,
> +						ids_xstats, xstats, nb_xstats);
> +
> +					print_rawdev_stats(dev_id, xstats,
> +						nb_xstats, names_xstats);
> +
> +					ts.total_successful_enqueues +=
> +						xstats[id_succ_enq];
> +					ts.total_failed_enqueues +=
> +						xstats[id_fail_enq];
> +				}
> +			}
> +		}
> +		printf("\n");
> +
> +		print_total_stats(&ts);
> +	}
> +
> +	free(names_xstats);
> +	free(xstats);
> +	free(ids_xstats);
> +}

<snip>
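
To make the pps suggestion above a little more concrete, here is a rough sketch of the delta computation. It is not code from the patch: `struct prev_totals`, `counter_delta()` and `print_port_pps()` are hypothetical names, and the sketch assumes the stats loop keeps waking once per second as it does now, so each delta is approximately a per-second rate.

```c
#include <assert.h>
#include <inttypes.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical holder for the totals seen on the previous pass, per port. */
struct prev_totals {
	uint64_t rx;
	uint64_t tx;
};

/*
 * Return how much a cumulative counter grew since the last call and
 * remember the new total. Unsigned subtraction stays correct even if
 * the counter wraps around.
 */
static uint64_t
counter_delta(uint64_t *prev, uint64_t now)
{
	uint64_t delta;

	assert(prev != NULL);
	delta = now - *prev;
	*prev = now;
	return delta;
}

/*
 * Print per-interval rates for one port. With a one-second sleep between
 * calls, the deltas approximate packets per second.
 */
static void
print_port_pps(uint16_t port_id, struct prev_totals *prev,
		uint64_t rx_total, uint64_t tx_total)
{
	uint64_t rx_pps = counter_delta(&prev->rx, rx_total);
	uint64_t tx_pps = counter_delta(&prev->tx, tx_total);

	printf("\nPort %u rx pps: %" PRIu64 ", tx pps: %" PRIu64,
		(unsigned int)port_id, rx_pps, tx_pps);
}
```

In the loop this could be called right after print_port_stats(), passing the existing per-port rx/tx totals and a zero-initialised `struct prev_totals` kept per port across iterations. If the sleep interval ever changes, the deltas would need dividing by the elapsed time.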