On Sat, Mar 05, 2011 at 12:35:58PM -0300, Christiano F. Haesbaert wrote: > Rodolfo Gouveia kindly did some tests on the tcp performance after the > diff with gigabits ifs. Apparently there was a significant drop in > performance (from 950mbps to 750mbps) after the change from poll to > libevent, I'm not sure what caused the drop but it's on the receiver > side, so my first suspect is kevent, than the libevent overhead. > > Would be great if people could test tcp mode before and after the > patch, also try running the patched version with: > export EVENT_NOKQUEUE=yes > > This should make libevent fall back to poll(2) as noted by nicm@. > > If you can please, run the tests and send over, unfortunately I don't > have gigabit interfaces.
I fixed the drop on performance, I was calling arc4random_buf() in the sender loop, should be called just once for the whole buffer, Rodolfo Gouveia did the same tests after the change and we got similar results to base tcpbench. The UDP mode also got more stable, here is the sender output: elendil:tcpbench: ./tcpbench -u gandalf Elapsed: 1005 Mbps: 107.050 Peak Mbps: 107.050 Tx PPS: 9090.000 Elapsed: 2009 Mbps: 95.758 Peak Mbps: 107.050 Tx PPS: 8131.000 Elapsed: 3013 Mbps: 95.756 Peak Mbps: 107.050 Tx PPS: 8131.000 Elapsed: 4018 Mbps: 95.745 Peak Mbps: 107.050 Tx PPS: 8130.000 Elapsed: 5020 Mbps: 95.795 Peak Mbps: 107.050 Tx PPS: 8134.000 Elapsed: 6022 Mbps: 95.784 Peak Mbps: 107.050 Tx PPS: 8133.000 Elapsed: 7025 Mbps: 95.759 Peak Mbps: 107.050 Tx PPS: 8131.000 Elapsed: 8028 Mbps: 95.758 Peak Mbps: 107.050 Tx PPS: 8131.000 Elapsed: 9033 Mbps: 95.768 Peak Mbps: 107.050 Tx PPS: 8132.000 Almost no change in PPS. Here is the final diff, with the arc4random_buf() fix and the manpage changes suggested by sthen@. My many thanks to Rodolfo which made the gigabit tests. Index: Makefile =================================================================== RCS file: /cvs/src/usr.bin/tcpbench/Makefile,v retrieving revision 1.3 diff -d -u -p -w -r1.3 Makefile --- Makefile 26 Jun 2008 07:05:56 -0000 1.3 +++ Makefile 6 Mar 2011 19:51:00 -0000 @@ -15,7 +15,7 @@ CDIAGFLAGS+= -Wshadow PROG=tcpbench -LDADD=-lkvm +LDADD=-lkvm -levent #BINGRP= kmem #BINMODE=2555 Index: tcpbench.1 =================================================================== RCS file: /cvs/src/usr.bin/tcpbench/tcpbench.1,v retrieving revision 1.10 diff -d -u -p -w -r1.10 tcpbench.1 --- tcpbench.1 26 Oct 2010 20:26:37 -0000 1.10 +++ tcpbench.1 6 Mar 2011 19:51:00 -0000 @@ -19,17 +19,18 @@ .Os .Sh NAME .Nm tcpbench -.Nd TCP benchmarking and measurement tool +.Nd TCP and UDP benchmarking and measurement tool .Sh SYNOPSIS .Nm .Fl l .Nm .Op Fl v +.Op Fl u .Op Fl B Ar buf .Op Fl k Ar kvars .Op Fl n Ar connections .Op Fl p Ar port -.Op Fl r Ar rate +.Op Fl r Ar interval .Op Fl S Ar space .Op Fl V Ar rtable .Ar hostname @@ -37,10 +38,11 @@ .Bk -words .Fl s .Op Fl v +.Op Fl u .Op Fl B Ar buf .Op Fl k Ar kvars .Op Fl p Ar port -.Op Fl r Ar rate +.Op Fl r Ar interval .Op Fl S Ar space .Op Fl V Ar rtable .Ek @@ -58,13 +60,13 @@ The client must be invoked with the .Ar hostname of a listening server to connect to. .Pp -Once connected, the client will send TCP traffic as fast as possible to +Once connected, the client will send TCP or UDP traffic as fast as possible to the server. Both the client and server will periodically display throughput statistics along with any kernel variables the user has selected to sample (using the .Fl k -option). +option, which is only available in TCP mode). A list of available kernel variables may be obtained using the .Fl l option. @@ -74,30 +76,43 @@ The options are as follows: .It Fl B Ar buf Specify the size of the internal read/write buffer used by .Nm . -The default is 262144 bytes. +The default is 262144 bytes for TCP client/server and UDP server. + +In UDP client mode, this may be used to specify the packet size on the test +stream. .It Fl k Ar kvars Specify one or more kernel variables to monitor; multiple variables must be -separated with commas. +separated with commas. This option is only valid in TCP mode. The default is not to monitor any variables. Using this option requires read access to .Pa /dev/kmem . .It Fl l List the name of kernel variables available for monitoring and exit. .It Fl n Ar connections -Use the given number of TCP connections (default: 1). +Use the given number of TCP connections (default: 1). UDP is connectionless so +this option isn't valid. .It Fl p Ar port -Specify the port used for the TCP test stream (default: 12345). -.It Fl r Ar rate -Specify the statistics reporting rate in milliseconds (default: 1000). +Specify the port used for the TCP or UDP test stream (default: 12345). +.It Fl r Ar interval +Specify the statistics interval reporting rate in milliseconds (default: 1000). .It Fl S Ar space -Set the size of the socket buffer used for the TCP test stream. +Set the size of the socket buffer used for the TCP or UDP test stream. On the client this option will resize the send buffer; on the server it will resize the receive buffer. .It Fl s Place .Nm in server mode, where it will listen on all interfaces for incoming -connections. +connections. Defaults to TCP if +.Fl u +not specified. +.It Fl u +Use UDP instead of TCP, this must be specified on both client and +server. Transmitted packets per second (TX PPS) will be accounted on the client +side, while received packets per second (RX PPS) whill be accounted on the +server side. UDP has no Protocol Control Block (PCB) so the +.Fl k +flags don't apply. .It Fl V Ar rtable Set the routing table to be used. The default is 0. @@ -118,3 +133,6 @@ The .Nm program was written by .An Damien Miller Aq d...@openbsd.org . + +UDP mode and libevent port by +.An Christiano F. Haesbaert Aq haesba...@haesbaert.org . Index: tcpbench.c =================================================================== RCS file: /cvs/src/usr.bin/tcpbench/tcpbench.c,v retrieving revision 1.19 diff -d -u -p -w -r1.19 tcpbench.c --- tcpbench.c 19 Oct 2010 10:03:23 -0000 1.19 +++ tcpbench.c 6 Mar 2011 19:51:00 -0000 @@ -1,5 +1,6 @@ /* * Copyright (c) 2008 Damien Miller <d...@mindrot.org> + * Copyright (c) 2011 Christiano F. Haesbaert <haesba...@haesbaert.org> * * Permission to use, copy, modify, and distribute this software for any * purpose with or without fee is hereby granted, provided that the above @@ -19,6 +20,7 @@ #include <sys/socket.h> #include <sys/socketvar.h> #include <sys/resource.h> +#include <sys/queue.h> #include <net/route.h> @@ -39,6 +41,7 @@ #include <stdio.h> #include <string.h> #include <errno.h> +#include <event.h> #include <netdb.h> #include <signal.h> #include <err.h> @@ -50,29 +53,67 @@ #define DEFAULT_PORT "12345" #define DEFAULT_STATS_INTERVAL 1000 /* ms */ -#define DEFAULT_BUF 256 * 1024 +#define DEFAULT_BUF (256 * 1024) +#define DEFAULT_UDP_PKT (1500 - 28) /* TODO don't hardcode this */ +#define TCP_MODE !ptb->uflag +#define UDP_MODE ptb->uflag #define MAX_FD 1024 -sig_atomic_t done = 0; -sig_atomic_t proc_slice = 0; - -static u_int rtableid; -static char **kflag; -static size_t Bflag; -static int Sflag; -static int rflag; -static int sflag; -static int vflag; +/* Our tcpbench globals */ +struct { + u_int Vflag; /* rtableid */ + int Sflag; /* Socket buffer size (tcp mode) */ + u_int rflag; /* Report rate (ms) */ + int sflag; /* True if server */ + int vflag; /* Verbose */ + int uflag; /* UDP mode */ + kvm_t *kvmh; /* Kvm handler */ + char **kvars; /* Kvm enabled vars */ + u_long ktcbtab; /* Ktcb */ + char *dummybuf; /* IO buffer */ + size_t dummybuf_len; /* IO buffer len */ +} tcpbench, *ptb; -/* stats for a single connection */ +/* stats for a single tcp connection, udp uses only one */ struct statctx { + TAILQ_ENTRY(statctx) entry; struct timeval t_start, t_last; unsigned long long bytes; - u_long tcbaddr; - char **kvars; - kvm_t *kh; + int fd; + char *buf; + size_t buflen; + struct event ev; + /* TCP only */ + u_long tcp_tcbaddr; + /* UDP only */ + u_long udp_slice_pkts; }; +static void signal_handler(int, short, void *); +static void saddr_ntop(const struct sockaddr *, socklen_t, char *, size_t); +static void drop_gid(void); +static void set_slice_timer(int); +static void print_tcp_header(void); +static void kget(u_long, void *, size_t); +static u_long kfind_tcb(int); +static void kupdate_stats(u_long, struct inpcb *, struct tcpcb *, + struct socket *); +static void list_kvars(void); +static void check_kvar(const char *); +static char ** check_prepare_kvars(char *); +static void stats_prepare(struct statctx *); +static void tcp_stats_display(unsigned long long, long double, float, + struct statctx *, struct inpcb *, struct tcpcb *, struct socket *); +static void tcp_process_slice(int, short, void *); +static void tcp_server_handle_sc(int, short, void *); +static void tcp_server_accept(int, short, void *); +static nfds_t server_init(struct addrinfo *, struct statctx *); +static void client_handle_sc(int, short, void *); +static void client_init(struct addrinfo *, int, struct statctx *); +static int clock_gettime_tv(clockid_t, struct timeval *); +static void udp_server_handle_sc(int, short, void *); +static void udp_process_slice(int, short, void *); + /* * We account the mainstats here, that is the stats * for all connections, all variables starting with slice @@ -82,12 +123,12 @@ struct statctx { */ static struct { unsigned long long slice_bytes; /* bytes for last slice */ - struct timeval t_start; /* when we started counting */ long double peak_mbps; /* peak mbps so far */ int nconns; /* connected clients */ + struct event timer; /* process timer */ } mainstats; -/* When adding variables, also add to stats_display() */ +/* When adding variables, also add to tcp_stats_display() */ static const char *allowed_kvars[] = { "inpcb.inp_flags", "sockb.so_rcv.sb_cc", @@ -124,32 +165,40 @@ static const char *allowed_kvars[] = { NULL }; -static void -exitsighand(int signo) -{ - done = signo; -} - -static void -alarmhandler(int signo) -{ - proc_slice = 1; - signal(signo, alarmhandler); -} +TAILQ_HEAD(, statctx) sc_queue; static void __dead usage(void) { fprintf(stderr, "usage: tcpbench -l\n" - " tcpbench [-v] [-B buf] [-k kvars] [-n connections] [-p port]\n" - " [-r rate] [-S space] [-V rtable] hostname\n" - " tcpbench -s [-v] [-B buf] [-k kvars] [-p port]\n" - " [-r rate] [-S space] [-V rtable]\n"); + " tcpbench [-v] [-u] [-B buf] [-k kvars] [-n connections]\n" + " [-p port] [-q] [-r rate] [-S space] [-V rtable] hostname\n" + " tcpbench -s [-v] [-u] [-B buf] [-k kvars] [-p port]\n" + " [-q] [-r rate] [-S space] [-V rtable]\n"); exit(1); } static void +signal_handler(int sig, short event, void *bula) +{ + /* + * signal handler rules don't apply, libevent decouples for us + */ + switch (sig) { + case SIGINT: + case SIGTERM: + case SIGHUP: + warnx("Terminated by signal %d", sig); + exit(0); + break; /* NOTREACHED */ + default: + errx(1, "unexpected signal %d", sig); + break; /* NOTREACHED */ + } +} + +static void saddr_ntop(const struct sockaddr *addr, socklen_t alen, char *buf, size_t len) { char hbuf[NI_MAXHOST], pbuf[NI_MAXSERV]; @@ -166,47 +215,72 @@ saddr_ntop(const struct sockaddr *addr, } static void -set_timer(int toggle) +drop_gid(void) { - struct itimerval itv; + gid_t gid; - if (rflag <= 0) + gid = getgid(); + if (setresgid(gid, gid, gid) == -1) + err(1, "setresgid"); +} + +static void +set_slice_timer(int on) +{ + struct timeval tv; + + if (ptb->rflag == 0) return; - if (toggle) { - itv.it_interval.tv_sec = rflag / 1000; - itv.it_interval.tv_usec = (rflag % 1000) * 1000; - itv.it_value = itv.it_interval; + if (on) { + if (evtimer_pending(&mainstats.timer, NULL)) + return; + timerclear(&tv); + /* XXX Is there a better way to do this ? */ + tv.tv_sec = ptb->rflag / 1000; + tv.tv_usec = (ptb->rflag % 1000) * 1000; + + evtimer_add(&mainstats.timer, &tv); + } else { + if (evtimer_pending(&mainstats.timer, NULL)) + evtimer_del(&mainstats.timer); + } } - else - bzero(&itv, sizeof(itv)); - setitimer(ITIMER_REAL, &itv, NULL); +static int +clock_gettime_tv(clockid_t clock_id, struct timeval *tv) +{ + struct timespec ts; + + if (clock_gettime(clock_id, &ts) == -1) + return (-1); + + TIMESPEC_TO_TIMEVAL(tv, &ts); + + return (0); } static void -print_header(void) +print_tcp_header(void) { char **kv; - printf("%12s %14s %12s %8s ", "elapsed_ms", "bytes", "mbps", "bwidth"); - - for (kv = kflag; kflag != NULL && *kv != NULL; kv++) - printf("%s%s", kv != kflag ? "," : "", *kv); + for (kv = ptb->kvars; ptb->kvars != NULL && *kv != NULL; kv++) + printf("%s%s", kv != ptb->kvars ? "," : "", *kv); printf("\n"); } static void -kget(kvm_t *kh, u_long addr, void *buf, size_t size) +kget(u_long addr, void *buf, size_t size) { - if (kvm_read(kh, addr, buf, size) != (ssize_t)size) - errx(1, "kvm_read: %s", kvm_geterr(kh)); + if (kvm_read(ptb->kvmh, addr, buf, size) != (ssize_t)size) + errx(1, "kvm_read: %s", kvm_geterr(ptb->kvmh)); } static u_long -kfind_tcb(kvm_t *kh, u_long ktcbtab, int sock) +kfind_tcb(int sock) { struct inpcbtable tcbtab; struct inpcb *head, *next, *prev; @@ -230,27 +304,27 @@ kfind_tcb(kvm_t *kh, u_long ktcbtab, int errx(1, "%s: me.ss_family != them.ss_family", __func__); if (me.ss_family != AF_INET && me.ss_family != AF_INET6) errx(1, "%s: unknown socket family", __func__); - if (vflag >= 2) { + if (ptb->vflag >= 2) { saddr_ntop((struct sockaddr *)&me, me.ss_len, tmp1, sizeof(tmp1)); saddr_ntop((struct sockaddr *)&them, them.ss_len, tmp2, sizeof(tmp2)); fprintf(stderr, "Our socket local %s remote %s\n", tmp1, tmp2); } - if (vflag >= 2) - fprintf(stderr, "Using PCB table at %lu\n", ktcbtab); + if (ptb->vflag >= 2) + fprintf(stderr, "Using PCB table at %lu\n", ptb->ktcbtab); retry: - kget(kh, ktcbtab, &tcbtab, sizeof(tcbtab)); + kget(ptb->ktcbtab, &tcbtab, sizeof(tcbtab)); prev = head = (struct inpcb *)&CIRCLEQ_FIRST( - &((struct inpcbtable *)ktcbtab)->inpt_queue); + &((struct inpcbtable *)ptb->ktcbtab)->inpt_queue); next = CIRCLEQ_FIRST(&tcbtab.inpt_queue); - if (vflag >= 2) + if (ptb->vflag >= 2) fprintf(stderr, "PCB head at %p\n", head); while (next != head) { - if (vflag >= 2) + if (ptb->vflag >= 2) fprintf(stderr, "Checking PCB %p\n", next); - kget(kh, (u_long)next, &inpcb, sizeof(inpcb)); + kget((u_long)next, &inpcb, sizeof(inpcb)); if (CIRCLEQ_PREV(&inpcb, inp_queue) != prev) { if (nretry--) { warnx("pcb prev pointer insane"); @@ -265,11 +339,11 @@ retry: if (me.ss_family == AF_INET) { if ((inpcb.inp_flags & INP_IPV6) != 0) { - if (vflag >= 2) + if (ptb->vflag >= 2) fprintf(stderr, "Skip: INP_IPV6"); continue; } - if (vflag >= 2) { + if (ptb->vflag >= 2) { inet_ntop(AF_INET, &inpcb.inp_laddr, tmp1, sizeof(tmp1)); inet_ntop(AF_INET, &inpcb.inp_faddr, @@ -292,7 +366,7 @@ retry: } else { if ((inpcb.inp_flags & INP_IPV6) == 0) continue; - if (vflag >= 2) { + if (ptb->vflag >= 2) { inet_ntop(AF_INET6, &inpcb.inp_laddr6, tmp1, sizeof(tmp1)); inet_ntop(AF_INET6, &inpcb.inp_faddr6, @@ -313,27 +387,27 @@ retry: in6->sin6_port != inpcb.inp_fport) continue; } - kget(kh, (u_long)inpcb.inp_ppcb, &tcpcb, sizeof(tcpcb)); + kget((u_long)inpcb.inp_ppcb, &tcpcb, sizeof(tcpcb)); if (tcpcb.t_state != TCPS_ESTABLISHED) { - if (vflag >= 2) + if (ptb->vflag >= 2) fprintf(stderr, "Not established\n"); continue; } - if (vflag >= 2) + if (ptb->vflag >= 2) fprintf(stderr, "Found PCB at %p\n", prev); - return (u_long)prev; + return ((u_long)prev); } errx(1, "No matching PCB found"); } static void -kupdate_stats(kvm_t *kh, u_long tcbaddr, - struct inpcb *inpcb, struct tcpcb *tcpcb, struct socket *sockb) +kupdate_stats(u_long tcbaddr, struct inpcb *inpcb, + struct tcpcb *tcpcb, struct socket *sockb) { - kget(kh, tcbaddr, inpcb, sizeof(*inpcb)); - kget(kh, (u_long)inpcb->inp_ppcb, tcpcb, sizeof(*tcpcb)); - kget(kh, (u_long)inpcb->inp_socket, sockb, sizeof(*sockb)); + kget(tcbaddr, inpcb, sizeof(*inpcb)); + kget((u_long)inpcb->inp_ppcb, tcpcb, sizeof(*tcpcb)); + kget((u_long)inpcb->inp_socket, sockb, sizeof(*sockb)); } static void @@ -371,39 +445,25 @@ check_prepare_kvars(char *list) errx(1, "strdup"); ret[n] = NULL; } - return ret; + return (ret); } static void -stats_prepare(struct statctx *sc, int fd, kvm_t *kh, u_long ktcbtab) +stats_prepare(struct statctx *sc) { - if (rflag <= 0) - return; - sc->kh = kh; - sc->kvars = kflag; - if (kflag) - sc->tcbaddr = kfind_tcb(kh, ktcbtab, fd); - if (gettimeofday(&sc->t_start, NULL) == -1) - err(1, "gettimeofday"); - sc->t_last = sc->t_start; - sc->bytes = 0; -} + sc->buf = ptb->dummybuf; + sc->buflen = ptb->dummybuf_len; -static void -stats_update(struct statctx *sc, ssize_t n) -{ - sc->bytes += n; - mainstats.slice_bytes += n; -} + if (ptb->kvars) + sc->tcp_tcbaddr = kfind_tcb(sc->fd); + if (clock_gettime_tv(CLOCK_MONOTONIC, &sc->t_start) == -1) + err(1, "clock_gettime_tv"); + sc->t_last = sc->t_start; -static void -stats_cleanslice(void) -{ - mainstats.slice_bytes = 0; } static void -stats_display(unsigned long long total_elapsed, long double mbps, +tcp_stats_display(unsigned long long total_elapsed, long double mbps, float bwperc, struct statctx *sc, struct inpcb *inpcb, struct tcpcb *tcpcb, struct socket *sockb) { @@ -412,14 +472,14 @@ stats_display(unsigned long long total_e printf("%12llu %14llu %12.3Lf %7.2f%% ", total_elapsed, sc->bytes, mbps, bwperc); - if (sc->kvars != NULL) { - kupdate_stats(sc->kh, sc->tcbaddr, inpcb, tcpcb, + if (ptb->kvars != NULL) { + kupdate_stats(sc->tcp_tcbaddr, inpcb, tcpcb, sockb); - for (j = 0; sc->kvars[j] != NULL; j++) { + for (j = 0; ptb->kvars[j] != NULL; j++) { #define S(a) #a #define P(b, v, f) \ - if (strcmp(sc->kvars[j], S(b.v)) == 0) { \ + if (strcmp(ptb->kvars[j], S(b.v)) == 0) { \ printf("%s"f, j > 0 ? "," : "", b->v); \ continue; \ } @@ -463,30 +523,24 @@ stats_display(unsigned long long total_e } static void -mainstats_display(long double slice_mbps, long double avg_mbps) -{ - printf("Conn: %3d Mbps: %12.3Lf Peak Mbps: %12.3Lf Avg Mbps: %12.3Lf\n", - mainstats.nconns, slice_mbps, mainstats.peak_mbps, avg_mbps); -} - -static void -process_slice(struct statctx *sc, size_t nsc) +tcp_process_slice(int fd, short event, void *bula) { unsigned long long total_elapsed, since_last; long double mbps, slice_mbps = 0; float bwperc; - nfds_t i; + struct statctx *sc; struct timeval t_cur, t_diff; struct inpcb inpcb; struct tcpcb tcpcb; struct socket sockb; - for (i = 0; i < nsc; i++, sc++) { - if (gettimeofday(&t_cur, NULL) == -1) - err(1, "gettimeofday"); - if (sc->kvars != NULL) /* process kernel stats */ - kupdate_stats(sc->kh, sc->tcbaddr, &inpcb, &tcpcb, + TAILQ_FOREACH(sc, &sc_queue, entry) { + if (clock_gettime_tv(CLOCK_MONOTONIC, &t_cur) == -1) + err(1, "clock_gettime_tv"); + if (ptb->kvars != NULL) /* process kernel stats */ + kupdate_stats(sc->tcp_tcbaddr, &inpcb, &tcpcb, &sockb); + timersub(&t_cur, &sc->t_start, &t_diff); total_elapsed = t_diff.tv_sec * 1000 + t_diff.tv_usec / 1000; timersub(&t_cur, &sc->t_last, &t_diff); @@ -495,80 +549,181 @@ process_slice(struct statctx *sc, size_t mbps = (sc->bytes * 8) / (since_last * 1000.0); slice_mbps += mbps; - stats_display(total_elapsed, mbps, bwperc, sc, + tcp_stats_display(total_elapsed, mbps, bwperc, sc, &inpcb, &tcpcb, &sockb); sc->t_last = t_cur; sc->bytes = 0; - } /* process stats for this slice */ if (slice_mbps > mainstats.peak_mbps) mainstats.peak_mbps = slice_mbps; - mainstats_display(slice_mbps, slice_mbps / mainstats.nconns); + printf("Conn: %3d Mbps: %12.3Lf Peak Mbps: %12.3Lf Avg Mbps: %12.3Lf\n", + mainstats.nconns, slice_mbps, mainstats.peak_mbps, + slice_mbps / mainstats.nconns); + mainstats.slice_bytes = 0; + + set_slice_timer(mainstats.nconns > 0); } -static int -handle_connection(struct statctx *sc, int fd, char *buf, size_t buflen) +static void +udp_process_slice(int fd, short event, void *v_sc) +{ + struct statctx *sc = v_sc; + unsigned long long total_elapsed, since_last; + long double slice_mbps, pps; + struct timeval t_cur, t_diff; + + if (clock_gettime_tv(CLOCK_MONOTONIC, &t_cur) == -1) + err(1, "clock_gettime_tv"); + /* Calculate pps */ + timersub(&t_cur, &sc->t_start, &t_diff); + total_elapsed = t_diff.tv_sec * 1000 + t_diff.tv_usec / 1000; + timersub(&t_cur, &sc->t_last, &t_diff); + since_last = t_diff.tv_sec * 1000 + t_diff.tv_usec / 1000; + slice_mbps = (sc->bytes * 8) / (since_last * 1000.0); + pps = (sc->udp_slice_pkts * 1000) / since_last; + if (slice_mbps > mainstats.peak_mbps) + mainstats.peak_mbps = slice_mbps; + printf("Elapsed: %11llu Mbps: %11.3Lf Peak Mbps: %11.3Lf %s PPS: %10.3Lf\n", + total_elapsed, slice_mbps, mainstats.peak_mbps, ptb->sflag ? "Rx" : "Tx", + pps); + + /* Clean up this slice time */ + sc->t_last = t_cur; + sc->bytes = 0; + sc->udp_slice_pkts = 0; + set_slice_timer(1); +} + +static void +udp_server_handle_sc(int fd, short event, void *v_sc) { ssize_t n; + struct statctx *sc = v_sc; again: - n = read(fd, buf, buflen); - if (n == -1) { + n = read(fd, ptb->dummybuf, ptb->dummybuf_len); + if (n == 0) + return; + else if (n == -1) { if (errno == EINTR) goto again; else if (errno == EWOULDBLOCK) - return 0; + return; warn("fd %d read error", fd); + return; + } - return -1; + if (ptb->vflag >= 3) + fprintf(stderr, "read: %zd bytes\n", n); + /* If this was our first packet, start slice timer */ + if (mainstats.peak_mbps == 0) + set_slice_timer(1); + /* Account packet */ + sc->udp_slice_pkts++; + sc->bytes += n; } - else if (n == 0) { - if (vflag) - fprintf(stderr, "%8d closed by remote end\n", fd); - close(fd); - return -1; + +static void +tcp_server_handle_sc(int fd, short event, void *v_sc) +{ + struct statctx *sc = v_sc; + ssize_t n; + +again: + n = read(sc->fd, sc->buf, sc->buflen); + if (n == -1) { + if (errno == EINTR) + goto again; + else if (errno == EWOULDBLOCK) + return; + warn("fd %d read error", sc->fd); + return; + } else if (n == 0) { + if (ptb->vflag) + fprintf(stderr, "%8d closed by remote end\n", sc->fd); + close(sc->fd); + TAILQ_REMOVE(&sc_queue, sc, entry); + free(sc); + mainstats.nconns--; + set_slice_timer(mainstats.nconns > 0); + return; } - if (vflag >= 3) + if (ptb->vflag >= 3) fprintf(stderr, "read: %zd bytes\n", n); + sc->bytes += n; + mainstats.slice_bytes += n; +} - stats_update(sc, n); - return 0; +static void +tcp_server_accept(int fd, short event, void *bula) +{ + int sock, r; + struct statctx *sc; + struct sockaddr_storage ss; + socklen_t sslen; + char tmp[128]; + + sslen = sizeof(ss); +again: + if ((sock = accept(fd, (struct sockaddr *)&ss, + &sslen)) == -1) { + if (errno == EINTR) + goto again; + warn("accept"); + return; + } + saddr_ntop((struct sockaddr *)&ss, sslen, + tmp, sizeof(tmp)); + if ((r = fcntl(sock, F_GETFL, 0)) == -1) + err(1, "fcntl(F_GETFL)"); + r |= O_NONBLOCK; + if (fcntl(sock, F_SETFL, r) == -1) + err(1, "fcntl(F_SETFL, O_NONBLOCK)"); + /* Alloc client structure and register reading callback */ + if ((sc = calloc(1, sizeof(*sc))) == NULL) + err(1, "calloc"); + sc->fd = sock; + stats_prepare(sc); + event_set(&sc->ev, sc->fd, EV_READ | EV_PERSIST, + tcp_server_handle_sc, sc); + event_add(&sc->ev, NULL); + TAILQ_INSERT_TAIL(&sc_queue, sc, entry); + mainstats.nconns++; + set_slice_timer(mainstats.nconns > 0); + if (ptb->vflag) + warnx("Accepted connection from %s, fd = %d\n", tmp, sc->fd); } static nfds_t -serverbind(struct pollfd *pfd, nfds_t max_nfds, struct addrinfo *aitop) +server_init(struct addrinfo *aitop, struct statctx *udp_sc) { char tmp[128]; int sock, on = 1; struct addrinfo *ai; + struct event *ev; nfds_t lnfds; lnfds = 0; for (ai = aitop; ai != NULL; ai = ai->ai_next) { - if (lnfds == max_nfds) { - fprintf(stderr, - "maximum number of listening fds reached\n"); - break; - } saddr_ntop(ai->ai_addr, ai->ai_addrlen, tmp, sizeof(tmp)); - if (vflag) - fprintf(stderr, "Try to listen on %s\n", tmp); + if (ptb->vflag) + fprintf(stderr, "Try to bind to %s\n", tmp); if ((sock = socket(ai->ai_family, ai->ai_socktype, ai->ai_protocol)) == -1) { if (ai->ai_next == NULL) err(1, "socket"); - if (vflag) + if (ptb->vflag) warn("socket"); continue; } - if (rtableid && ai->ai_family == AF_INET) { + if (ptb->Vflag && ai->ai_family == AF_INET) { if (setsockopt(sock, IPPROTO_IP, SO_RTABLE, - &rtableid, sizeof(rtableid)) == -1) + &ptb->Vflag, sizeof(ptb->Vflag)) == -1) err(1, "setsockopt SO_RTABLE"); - } else if (rtableid) + } else if (ptb->Vflag) warnx("rtable only supported on AF_INET"); if (setsockopt(sock, SOL_SOCKET, SO_REUSEADDR, &on, sizeof(on)) == -1) @@ -576,214 +731,108 @@ serverbind(struct pollfd *pfd, nfds_t ma if (bind(sock, ai->ai_addr, ai->ai_addrlen) != 0) { if (ai->ai_next == NULL) err(1, "bind"); - if (vflag) + if (ptb->vflag) warn("bind"); close(sock); continue; } - if (Sflag) { + if (ptb->Sflag) { if (setsockopt(sock, SOL_SOCKET, SO_RCVBUF, - &Sflag, sizeof(Sflag)) == -1) - warn("set TCP receive buffer size"); + &ptb->Sflag, sizeof(ptb->Sflag)) == -1) + warn("set receive buffer size"); } + if (TCP_MODE) if (listen(sock, 64) == -1) { if (ai->ai_next == NULL) err(1, "listen"); - if (vflag) + if (ptb->vflag) warn("listen"); close(sock); continue; } - if (vflag >= 3) - fprintf(stderr, "listening on fd %d\n", sock); + if ((ev = calloc(1, sizeof(*ev))) == NULL) + err(1, "calloc"); + if (UDP_MODE) + event_set(ev, sock, EV_READ | EV_PERSIST, + udp_server_handle_sc, udp_sc); + else + event_set(ev, sock, EV_READ | EV_PERSIST, + tcp_server_accept, NULL); + event_add(ev, NULL); + if (ptb->vflag >= 3) + fprintf(stderr, "bound to fd %d\n", sock); lnfds++; - pfd[lnfds - 1].fd = sock; - pfd[lnfds - 1].events = POLLIN; - } freeaddrinfo(aitop); if (lnfds == 0) errx(1, "No working listen addresses found"); - return lnfds; + return (lnfds); } static void -set_listening(struct pollfd *pfd, nfds_t lfds, int toggle) { - int i; - - for (i = 0; i < (int)lfds; i++) { - if (toggle) - pfd[i].events = POLLIN; - else - pfd[i].events = 0; - } - -} -static void __dead -serverloop(kvm_t *kvmh, u_long ktcbtab, struct addrinfo *aitop) +client_handle_sc(int fd, short event, void *v_sc) { - socklen_t sslen; - struct pollfd *pfd; - char tmp[128], *buf; - struct statctx *psc; - struct sockaddr_storage ss; - nfds_t i, nfds, lfds; - size_t nalloc; - int r, sock, client_id; - - sslen = sizeof(ss); - nalloc = 128; - if ((pfd = calloc(sizeof(*pfd), nalloc)) == NULL) - err(1, "calloc"); - if ((psc = calloc(sizeof(*psc), nalloc)) == NULL) - err(1, "calloc"); - if ((buf = malloc(Bflag)) == NULL) - err(1, "malloc"); - lfds = nfds = serverbind(pfd, nalloc - 1, aitop); - if (vflag >= 3) - fprintf(stderr, "listening on %d fds\n", lfds); - if (setpgid(0, 0) == -1) - err(1, "setpgid"); - - print_header(); - - client_id = 0; - while (!done) { - if (proc_slice) { - process_slice(psc + lfds, nfds - lfds); - stats_cleanslice(); - proc_slice = 0; - } - if (vflag >= 3) - fprintf(stderr, "mainstats.nconns = %u\n", - mainstats.nconns); - if ((r = poll(pfd, nfds, INFTIM)) == -1) { - if (errno == EINTR) - continue; - warn("poll"); - break; - } + struct statctx *sc = v_sc; + ssize_t n; - if (vflag >= 3) - fprintf(stderr, "poll: %d\n", r); - for (i = 0 ; r > 0 && i < nfds; i++) { - if ((pfd[i].revents & POLLIN) == 0) - continue; - if (pfd[i].fd == -1) - errx(1, "pfd insane"); - r--; - if (vflag >= 3) - fprintf(stderr, "fd %d active i = %d\n", - pfd[i].fd, i); - /* new connection */ - if (i < lfds) { - if ((sock = accept(pfd[i].fd, - (struct sockaddr *)&ss, - &sslen)) == -1) { - if (errno == EINTR) - continue; - else if (errno == EMFILE || - errno == ENFILE) - set_listening(pfd, lfds, 0); - warn("accept"); - continue; - } - if ((r = fcntl(sock, F_GETFL, 0)) == -1) - err(1, "fcntl(F_GETFL)"); - r |= O_NONBLOCK; - if (fcntl(sock, F_SETFL, r) == -1) - err(1, "fcntl(F_SETFL, O_NONBLOCK)"); - saddr_ntop((struct sockaddr *)&ss, sslen, - tmp, sizeof(tmp)); - if (vflag) - fprintf(stderr, - "Accepted connection %d from " - "%s, fd = %d\n", client_id++, tmp, - sock); - /* alloc more space if we're full */ - if (nfds == nalloc) { - nalloc *= 2; - if ((pfd = realloc(pfd, - sizeof(*pfd) * nalloc)) == NULL) - err(1, "realloc"); - if ((psc = realloc(psc, - sizeof(*psc) * nalloc)) == NULL) - err(1, "realloc"); - } - pfd[nfds].fd = sock; - pfd[nfds].events = POLLIN; - stats_prepare(&psc[nfds], sock, kvmh, ktcbtab); - nfds++; - if (!mainstats.nconns++) - set_timer(1); - continue; +again: + if ((n = write(sc->fd, sc->buf, sc->buflen)) == -1) { + if (errno == EINTR || errno == EAGAIN || (UDP_MODE && ENOBUFS)) + goto again; + err(1, "write"); } - /* event in fd */ - if (vflag >= 3) - fprintf(stderr, - "fd %d active", pfd[i].fd); - while (handle_connection(&psc[i], pfd[i].fd, - buf, Bflag) == -1) { - pfd[i] = pfd[nfds - 1]; - pfd[nfds - 1].fd = -1; - psc[i] = psc[nfds - 1]; - mainstats.nconns--; - nfds--; - /* stop display if no clients */ - if (!mainstats.nconns) { - proc_slice = 1; - set_timer(0); + if (TCP_MODE && n == 0) { + warnx("Remote end closed connection"); + exit(1); } - /* if we were full */ - set_listening(pfd, lfds, 1); + if (ptb->vflag >= 3) + warnx("write: %zd bytes\n", n); + sc->bytes += n; + mainstats.slice_bytes += n; + if (UDP_MODE) + sc->udp_slice_pkts++; - /* is there an event pending on the last fd? */ - if (pfd[i].fd == -1 || - (pfd[i].revents & POLLIN) == 0) - break; - } - } - } - exit(1); } -void -clientconnect(struct addrinfo *aitop, struct pollfd *pfd, int nconn) +static void +client_init(struct addrinfo *aitop, int nconn, struct statctx *udp_sc) { - char tmp[128]; + struct statctx *sc; struct addrinfo *ai; + char tmp[128]; int i, r, sock; + sc = udp_sc; for (i = 0; i < nconn; i++) { for (sock = -1, ai = aitop; ai != NULL; ai = ai->ai_next) { saddr_ntop(ai->ai_addr, ai->ai_addrlen, tmp, sizeof(tmp)); - if (vflag && i == 0) + if (ptb->vflag && i == 0) fprintf(stderr, "Trying %s\n", tmp); if ((sock = socket(ai->ai_family, ai->ai_socktype, ai->ai_protocol)) == -1) { if (ai->ai_next == NULL) err(1, "socket"); - if (vflag) + if (ptb->vflag) warn("socket"); continue; } - if (rtableid && ai->ai_family == AF_INET) { + if (ptb->Vflag && ai->ai_family == AF_INET) { if (setsockopt(sock, IPPROTO_IP, SO_RTABLE, - &rtableid, sizeof(rtableid)) == -1) + &ptb->Vflag, sizeof(ptb->Vflag)) == -1) err(1, "setsockopt SO_RTABLE"); - } else if (rtableid) + } else if (ptb->Vflag) warnx("rtable only supported on AF_INET"); - if (Sflag) { + if (ptb->Sflag) { if (setsockopt(sock, SOL_SOCKET, SO_SNDBUF, - &Sflag, sizeof(Sflag)) == -1) + &ptb->Sflag, sizeof(ptb->Sflag)) == -1) warn("set TCP send buffer size"); } if (connect(sock, ai->ai_addr, ai->ai_addrlen) != 0) { if (ai->ai_next == NULL) err(1, "connect"); - if (vflag) + if (ptb->vflag) warn("connect"); close(sock); sock = -1; @@ -793,96 +842,31 @@ clientconnect(struct addrinfo *aitop, st } if (sock == -1) errx(1, "No host found"); - if ((r = fcntl(sock, F_GETFL, 0)) == -1) err(1, "fcntl(F_GETFL)"); r |= O_NONBLOCK; if (fcntl(sock, F_SETFL, r) == -1) err(1, "fcntl(F_SETFL, O_NONBLOCK)"); - - pfd[i].fd = sock; - pfd[i].events = POLLOUT; - } - freeaddrinfo(aitop); - - if (vflag && nconn > 1) - fprintf(stderr, "%u connections established\n", nconn); + /* Alloc and prepare stats */ + if (TCP_MODE) { + if ((sc = calloc(1, sizeof(*sc))) == NULL) + err(1, "calloc"); } - -static void __dead -clientloop(kvm_t *kvmh, u_long ktcbtab, struct addrinfo *aitop, int nconn) -{ - struct statctx *psc; - struct pollfd *pfd; - char *buf; - int i; - ssize_t n; - - if ((pfd = calloc(nconn, sizeof(*pfd))) == NULL) - err(1, "clientloop pfd calloc"); - if ((psc = calloc(nconn, sizeof(*psc))) == NULL) - err(1, "clientloop psc calloc"); - - clientconnect(aitop, pfd, nconn); - - for (i = 0; i < nconn; i++) { - stats_prepare(psc + i, pfd[i].fd, kvmh, ktcbtab); + sc->fd = sock; + stats_prepare(sc); + event_set(&sc->ev, sc->fd, EV_WRITE | EV_PERSIST, + client_handle_sc, sc); + event_add(&sc->ev, NULL); + TAILQ_INSERT_TAIL(&sc_queue, sc, entry); mainstats.nconns++; - } - - if ((buf = malloc(Bflag)) == NULL) - err(1, "malloc"); - arc4random_buf(buf, Bflag); - - print_header(); - set_timer(1); - - while (!done) { - if (proc_slice) { - process_slice(psc, nconn); - stats_cleanslice(); - proc_slice = 0; - } - if (poll(pfd, nconn, INFTIM) == -1) { - if (errno == EINTR) - continue; - err(1, "poll"); - } - for (i = 0; i < nconn; i++) { - if (pfd[i].revents & POLLOUT) { - if ((n = write(pfd[i].fd, buf, Bflag)) == -1) { - if (errno == EINTR || errno == EAGAIN) - continue; - err(1, "write"); - } - if (n == 0) { - warnx("Remote end closed connection"); - done = -1; + set_slice_timer(mainstats.nconns > 0); + if (UDP_MODE) break; } - if (vflag >= 3) - fprintf(stderr, "write: %zd bytes\n", - n); - stats_update(psc + i, n); - } - } - } - - if (done > 0) - warnx("Terminated by signal %d", done); - - free(buf); - exit(0); -} - -static void -drop_gid(void) -{ - gid_t gid; + freeaddrinfo(aitop); - gid = getgid(); - if (setresgid(gid, gid, gid) == -1) - err(1, "setresgid"); + if (ptb->vflag && nconn > 1) + fprintf(stderr, "%u connections established\n", nconn); } int @@ -890,23 +874,26 @@ main(int argc, char **argv) { extern int optind; extern char *optarg; - char kerr[_POSIX2_LINE_MAX], *tmp; struct addrinfo *aitop, hints; const char *errstr; - kvm_t *kvmh = NULL; struct rlimit rl; - int ch, herr; + int ch, herr, nconn; struct nlist nl[] = { { "_tcbtable" }, { "" } }; const char *host = NULL, *port = DEFAULT_PORT; - int nconn = 1; + struct event ev_sigint, ev_sigterm, ev_sighup; + struct statctx *udp_sc = NULL; - Bflag = DEFAULT_BUF; - Sflag = sflag = vflag = rtableid = 0; - kflag = NULL; - rflag = DEFAULT_STATS_INTERVAL; + /* Init world */ + ptb = &tcpbench; + ptb->dummybuf_len = 0; + ptb->Sflag = ptb->sflag = ptb->vflag = ptb->Vflag = 0; + ptb->kvmh = NULL; + ptb->kvars = NULL; + ptb->rflag = DEFAULT_STATS_INTERVAL; + nconn = 1; - while ((ch = getopt(argc, argv, "B:hlk:n:p:r:sS:vV:")) != -1) { + while ((ch = getopt(argc, argv, "B:hlk:n:p:r:sS:uvV:")) != -1) { switch (ch) { case 'l': list_kvars(); @@ -914,11 +901,11 @@ main(int argc, char **argv) case 'k': if ((tmp = strdup(optarg)) == NULL) errx(1, "strdup"); - kflag = check_prepare_kvars(tmp); + ptb->kvars = check_prepare_kvars(tmp); free(tmp); break; case 'r': - rflag = strtonum(optarg, 0, 60 * 60 * 24 * 1000, + ptb->rflag = strtonum(optarg, 0, 60 * 60 * 24 * 1000, &errstr); if (errstr != NULL) errx(1, "statistics interval is %s: %s", @@ -928,27 +915,27 @@ main(int argc, char **argv) port = optarg; break; case 's': - sflag = 1; + ptb->sflag = 1; break; case 'S': - Sflag = strtonum(optarg, 0, 1024*1024*1024, + ptb->Sflag = strtonum(optarg, 0, 1024*1024*1024, &errstr); if (errstr != NULL) errx(1, "receive space interval is %s: %s", errstr, optarg); break; case 'B': - Bflag = strtonum(optarg, 0, 1024*1024*1024, + ptb->dummybuf_len = strtonum(optarg, 0, 1024*1024*1024, &errstr); if (errstr != NULL) errx(1, "read/write buffer size is %s: %s", errstr, optarg); break; case 'v': - vflag++; + ptb->vflag++; break; case 'V': - rtableid = (unsigned int)strtonum(optarg, 0, + ptb->Vflag = (unsigned int)strtonum(optarg, 0, RT_TABLEID_MAX, &errstr); if (errstr) errx(1, "rtable value is %s: %s", @@ -960,6 +947,9 @@ main(int argc, char **argv) errx(1, "number of connections is %s: %s", errstr, optarg); break; + case 'u': + ptb->uflag = 1; + break; case 'h': default: usage(); @@ -968,15 +958,31 @@ main(int argc, char **argv) argv += optind; argc -= optind; - if (argc != (sflag ? 0 : 1)) + if ((argc != (ptb->sflag ? 0 : 1)) || + (UDP_MODE && (ptb->kvars || nconn != 1))) usage(); - if (!sflag) + if (!ptb->sflag) host = argv[0]; + /* + * Rationale, + * If TCP, use a big buffer with big reads/writes. + * If UDP, use a big buffer in server and a buffer the size of a + * ethernet packet. + */ + if (!ptb->dummybuf_len) { + if (ptb->sflag || TCP_MODE) + ptb->dummybuf_len = DEFAULT_BUF; + else + ptb->dummybuf_len = DEFAULT_UDP_PKT; + } bzero(&hints, sizeof(hints)); + if (UDP_MODE) + hints.ai_socktype = SOCK_DGRAM; + else hints.ai_socktype = SOCK_STREAM; - if (sflag) + if (ptb->sflag) hints.ai_flags = AI_PASSIVE; if ((herr = getaddrinfo(host, port, &hints, &aitop)) != 0) { if (herr == EAI_SYSTEM) @@ -984,23 +990,17 @@ main(int argc, char **argv) else errx(1, "getaddrinfo: %s", gai_strerror(herr)); } - - if (kflag) { - if ((kvmh = kvm_openfiles(NULL, NULL, NULL, + if (ptb->kvars) { + if ((ptb->kvmh = kvm_openfiles(NULL, NULL, NULL, O_RDONLY, kerr)) == NULL) errx(1, "kvm_open: %s", kerr); drop_gid(); - if (kvm_nlist(kvmh, nl) < 0 || nl[0].n_type == 0) + if (kvm_nlist(ptb->kvmh, nl) < 0 || nl[0].n_type == 0) errx(1, "kvm: no namelist"); + ptb->ktcbtab = nl[0].n_value; } else drop_gid(); - signal(SIGINT, exitsighand); - signal(SIGTERM, exitsighand); - signal(SIGHUP, exitsighand); - signal(SIGPIPE, SIG_IGN); - signal(SIGALRM, alarmhandler); - if (getrlimit(RLIMIT_NOFILE, &rl) == -1) err(1, "getrlimit"); if (rl.rlim_cur < MAX_FD) @@ -1010,10 +1010,46 @@ main(int argc, char **argv) if (getrlimit(RLIMIT_NOFILE, &rl) == -1) err(1, "getrlimit"); - if (sflag) - serverloop(kvmh, nl[0].n_value, aitop); + /* Init world */ + TAILQ_INIT(&sc_queue); + if ((ptb->dummybuf = malloc(ptb->dummybuf_len)) == NULL) + err(1, "malloc"); + arc4random_buf(ptb->dummybuf, ptb->dummybuf_len); + + if (UDP_MODE) { + if ((udp_sc = calloc(1, sizeof(*udp_sc))) == NULL) + err(1, "calloc"); + udp_sc->fd = -1; + stats_prepare(udp_sc); + } + + /* Setup libevent and signals */ + event_init(); + signal_set(&ev_sigterm, SIGTERM, signal_handler, NULL); + signal_set(&ev_sighup, SIGHUP, signal_handler, NULL); + signal_set(&ev_sigint, SIGINT, signal_handler, NULL); + signal_add(&ev_sigint, NULL); + signal_add(&ev_sigterm, NULL); + signal_add(&ev_sighup, NULL); + signal(SIGPIPE, SIG_IGN); + + if (TCP_MODE) + print_tcp_header(); + + if (UDP_MODE) + evtimer_set(&mainstats.timer, udp_process_slice, udp_sc); else - clientloop(kvmh, nl[0].n_value, aitop, nconn); + evtimer_set(&mainstats.timer, tcp_process_slice, NULL); - return 0; + if (ptb->sflag) { + (void)server_init(aitop, udp_sc); + if (setpgid(0, 0) == -1) + err(1, "setpgid"); + } else + client_init(aitop, nconn, udp_sc); + + /* libevent main loop*/ + event_dispatch(); + + return (0); }