So here is a better diff, plus the changes to the manpage and usage() which were lacking. When I get a ENOBUFS in UDP I simply return to event_dispatch(). I've also placed the global vars on a tcpbench structure just to group them together.
Been using this for a while now and it seems fine. Tested on i386 and sparc64. I had a happy surprise upgrading from a November snapshot today, my ultra 5 with vr(4) could do around 4000PPS@1472, now it makes around 8000PPS@1472, searching the cvs I found this: == revision 1.107 date: 2011/01/13 11:28:14; author: kettenis; state: Exp; lines: +6 -10 Get rid of "forever" loop in the interrupt handler such that we drop out of the interrupt handler if the "no rx buffer available" bit is set and no new mbufs are available to populate descriptors. While it doesn't make livelock mitigation work for everybody, it does resolve some lockup issues. ok sthen@ ---------------------------- revision 1.106 date: 2010/09/03 18:14:54; author: kettenis; state: Exp; lines: +5 -1 bus_dmamap_sync() freshly initialized Rx descriptors before flipping the bit that hands them over to the hardware. This prevents the hardware from seeing stale contents if the compiler decides to re-order stores or if the hardware does store-reordering. There are sme doubts whether the i386/amd64 bus_dmamap_sync() implementation will be able to convince future compilers that do even more insanely stupid optimizations from re-ordering stores. That will be addressed in a seperate patch. == I'll do some regression tests to check if those are the commits responsible for the improvement. I'm no English native as you might have noticed so my manpage writings should be double checked. Index: Makefile =================================================================== RCS file: /cvs/src/usr.bin/tcpbench/Makefile,v retrieving revision 1.3 diff -d -u -p -w -r1.3 Makefile --- Makefile 26 Jun 2008 07:05:56 -0000 1.3 +++ Makefile 17 Feb 2011 21:39:26 -0000 @@ -15,7 +15,7 @@ CDIAGFLAGS+= -Wshadow PROG=tcpbench -LDADD=-lkvm +LDADD=-lkvm -levent #BINGRP= kmem #BINMODE=2555 Index: tcpbench.1 =================================================================== RCS file: /cvs/src/usr.bin/tcpbench/tcpbench.1,v retrieving revision 1.10 diff -d -u -p -w -r1.10 tcpbench.1 --- tcpbench.1 26 Oct 2010 20:26:37 -0000 1.10 +++ tcpbench.1 17 Feb 2011 21:39:26 -0000 @@ -19,12 +19,13 @@ .Os .Sh NAME .Nm tcpbench -.Nd TCP benchmarking and measurement tool +.Nd TCP and UDP benchmarking and measurement tool .Sh SYNOPSIS .Nm .Fl l .Nm .Op Fl v +.Op Fl u .Op Fl B Ar buf .Op Fl k Ar kvars .Op Fl n Ar connections @@ -37,6 +38,7 @@ .Bk -words .Fl s .Op Fl v +.Op Fl u .Op Fl B Ar buf .Op Fl k Ar kvars .Op Fl p Ar port @@ -58,13 +60,13 @@ The client must be invoked with the .Ar hostname of a listening server to connect to. .Pp -Once connected, the client will send TCP traffic as fast as possible to +Once connected, the client will send TCP or UDP traffic as fast as possible to the server. Both the client and server will periodically display throughput statistics along with any kernel variables the user has selected to sample (using the .Fl k -option). +option, which is only available in TCP mode). A list of available kernel variables may be obtained using the .Fl l option. @@ -74,30 +76,41 @@ The options are as follows: .It Fl B Ar buf Specify the size of the internal read/write buffer used by .Nm . -The default is 262144 bytes. +The default is 262144 bytes for TCP client/server and UDP server. In UDP client, +this may be used to specify the packet size on the test stream. .It Fl k Ar kvars Specify one or more kernel variables to monitor; multiple variables must be -separated with commas. +separated with commas. This option is only valid in TCP mode. The default is not to monitor any variables. Using this option requires read access to .Pa /dev/kmem . .It Fl l List the name of kernel variables available for monitoring and exit. .It Fl n Ar connections -Use the given number of TCP connections (default: 1). +Use the given number of TCP connections (default: 1). UDP is connectionless so +this option isn't valid. .It Fl p Ar port -Specify the port used for the TCP test stream (default: 12345). +Specify the port used for the TCP or UDP test stream (default: 12345). .It Fl r Ar rate Specify the statistics reporting rate in milliseconds (default: 1000). .It Fl S Ar space -Set the size of the socket buffer used for the TCP test stream. +Set the size of the socket buffer used for the TCP or UDP test stream. On the client this option will resize the send buffer; on the server it will resize the receive buffer. .It Fl s Place .Nm in server mode, where it will listen on all interfaces for incoming -connections. +connections. Defaults to TCP if +.Fl u +not specified. +.It Fl u +Use UDP instead of TCP, this must be specified on both client and +server. Transmitted packets per second (TX PPS) will be accounted on the client +side, while received packets per second (RX PPS) whill be accounted on the +server side. UDP has no Protocol Control Block (PCB) so the +.Fl k +flags don't apply. .It Fl V Ar rtable Set the routing table to be used. The default is 0. @@ -118,3 +131,6 @@ The .Nm program was written by .An Damien Miller Aq d...@openbsd.org . + +UDP mode and libevent port by +.An Christiano F. Haesbaert Aq haesba...@haesbaert.org . Index: tcpbench.c =================================================================== RCS file: /cvs/src/usr.bin/tcpbench/tcpbench.c,v retrieving revision 1.19 diff -d -u -p -w -r1.19 tcpbench.c --- tcpbench.c 19 Oct 2010 10:03:23 -0000 1.19 +++ tcpbench.c 17 Feb 2011 21:39:26 -0000 @@ -1,5 +1,6 @@ /* * Copyright (c) 2008 Damien Miller <d...@mindrot.org> + * Copyright (c) 2010 Christiano F. Haesbaert <haesba...@haesbaert.org> * * Permission to use, copy, modify, and distribute this software for any * purpose with or without fee is hereby granted, provided that the above @@ -19,6 +20,7 @@ #include <sys/socket.h> #include <sys/socketvar.h> #include <sys/resource.h> +#include <sys/queue.h> #include <net/route.h> @@ -39,6 +41,7 @@ #include <stdio.h> #include <string.h> #include <errno.h> +#include <event.h> #include <netdb.h> #include <signal.h> #include <err.h> @@ -50,29 +53,67 @@ #define DEFAULT_PORT "12345" #define DEFAULT_STATS_INTERVAL 1000 /* ms */ -#define DEFAULT_BUF 256 * 1024 +#define DEFAULT_BUF (256 * 1024) +#define DEFAULT_UDP_PKT (1500 - 28) /* TODO don't hardcode this */ +#define TCP_MODE !ptb->uflag +#define UDP_MODE ptb->uflag #define MAX_FD 1024 -sig_atomic_t done = 0; -sig_atomic_t proc_slice = 0; - -static u_int rtableid; -static char **kflag; -static size_t Bflag; -static int Sflag; -static int rflag; -static int sflag; -static int vflag; +/* Our tcpbench globals */ +struct { + u_int Vflag; /* rtableid */ + int Sflag; /* Socket buffer size (tcp mode) */ + u_int rflag; /* Report rate (ms) */ + int sflag; /* True if server */ + int vflag; /* Verbose */ + int uflag; /* UDP mode */ + kvm_t *kvmh; /* Kvm handler */ + char **kvars; /* Kvm enabled vars */ + u_long ktcbtab; /* Ktcb */ + char *dummybuf; /* IO buffer */ + size_t dummybuf_len; /* IO buffer len */ +} tcpbench, *ptb; -/* stats for a single connection */ +/* stats for a single tcp connection, udp uses only one */ struct statctx { + TAILQ_ENTRY(statctx) entry; struct timeval t_start, t_last; unsigned long long bytes; - u_long tcbaddr; - char **kvars; - kvm_t *kh; + int fd; + char *buf; + size_t buflen; + struct event ev; + /* TCP only */ + u_long tcp_tcbaddr; + /* UDP only */ + u_long udp_slice_pkts; }; +static void signal_handler(int, short, void *); +static void saddr_ntop(const struct sockaddr *, socklen_t, char *, size_t); +static void drop_gid(void); +static void set_slice_timer(int); +static void print_tcp_header(void); +static void kget(u_long, void *, size_t); +static u_long kfind_tcb(int); +static void kupdate_stats(u_long, struct inpcb *, struct tcpcb *, + struct socket *); +static void list_kvars(void); +static void check_kvar(const char *); +static char ** check_prepare_kvars(char *); +static void stats_prepare(struct statctx *); +static void tcp_stats_display(unsigned long long, long double, float, + struct statctx *, struct inpcb *, struct tcpcb *, struct socket *); +static void tcp_process_slice(int, short, void *); +static void tcp_server_handle_sc(int, short, void *); +static void tcp_server_accept(int, short, void *); +static nfds_t server_init(struct addrinfo *, struct statctx *); +static void client_handle_sc(int, short, void *); +static void client_init(struct addrinfo *, int, struct statctx *); +static int clock_gettime_tv(clockid_t, struct timeval *); +static void udp_server_handle_sc(int, short, void *); +static void udp_process_slice(int, short, void *); + /* * We account the mainstats here, that is the stats * for all connections, all variables starting with slice @@ -82,12 +123,12 @@ struct statctx { */ static struct { unsigned long long slice_bytes; /* bytes for last slice */ - struct timeval t_start; /* when we started counting */ long double peak_mbps; /* peak mbps so far */ int nconns; /* connected clients */ + struct event timer; /* process timer */ } mainstats; -/* When adding variables, also add to stats_display() */ +/* When adding variables, also add to tcp_stats_display() */ static const char *allowed_kvars[] = { "inpcb.inp_flags", "sockb.so_rcv.sb_cc", @@ -124,32 +165,40 @@ static const char *allowed_kvars[] = { NULL }; -static void -exitsighand(int signo) -{ - done = signo; -} - -static void -alarmhandler(int signo) -{ - proc_slice = 1; - signal(signo, alarmhandler); -} +TAILQ_HEAD(, statctx) sc_queue; static void __dead usage(void) { fprintf(stderr, "usage: tcpbench -l\n" - " tcpbench [-v] [-B buf] [-k kvars] [-n connections] [-p port]\n" - " [-r rate] [-S space] [-V rtable] hostname\n" - " tcpbench -s [-v] [-B buf] [-k kvars] [-p port]\n" - " [-r rate] [-S space] [-V rtable]\n"); + " tcpbench [-v] [-u] [-B buf] [-k kvars] [-n connections]\n" + " [-p port] [-q] [-r rate] [-S space] [-V rtable] hostname\n" + " tcpbench -s [-v] [-u] [-B buf] [-k kvars] [-p port]\n" + " [-q] [-r rate] [-S space] [-V rtable]\n"); exit(1); } static void +signal_handler(int sig, short event, void *bula) +{ + /* + * signal handler rules don't apply, libevent decouples for us + */ + switch (sig) { + case SIGINT: + case SIGTERM: + case SIGHUP: + warnx("Terminated by signal %d", sig); + exit(0); + break; /* NOTREACHED */ + default: + errx(1, "unexpected signal %d", sig); + break; /* NOTREACHED */ + } +} + +static void saddr_ntop(const struct sockaddr *addr, socklen_t alen, char *buf, size_t len) { char hbuf[NI_MAXHOST], pbuf[NI_MAXSERV]; @@ -166,47 +215,72 @@ saddr_ntop(const struct sockaddr *addr, } static void -set_timer(int toggle) +drop_gid(void) { - struct itimerval itv; + gid_t gid; - if (rflag <= 0) + gid = getgid(); + if (setresgid(gid, gid, gid) == -1) + err(1, "setresgid"); +} + +static void +set_slice_timer(int on) +{ + struct timeval tv; + + if (ptb->rflag == 0) return; - if (toggle) { - itv.it_interval.tv_sec = rflag / 1000; - itv.it_interval.tv_usec = (rflag % 1000) * 1000; - itv.it_value = itv.it_interval; + if (on) { + if (evtimer_pending(&mainstats.timer, NULL)) + return; + timerclear(&tv); + /* XXX Is there a better way to do this ? */ + tv.tv_sec = ptb->rflag / 1000; + tv.tv_usec = (ptb->rflag % 1000) * 1000; + + evtimer_add(&mainstats.timer, &tv); + } else { + if (evtimer_pending(&mainstats.timer, NULL)) + evtimer_del(&mainstats.timer); + } } - else - bzero(&itv, sizeof(itv)); - setitimer(ITIMER_REAL, &itv, NULL); +static int +clock_gettime_tv(clockid_t clock_id, struct timeval *tv) +{ + struct timespec ts; + + if (clock_gettime(clock_id, &ts) == -1) + return (-1); + + TIMESPEC_TO_TIMEVAL(tv, &ts); + + return (0); } static void -print_header(void) +print_tcp_header(void) { char **kv; - printf("%12s %14s %12s %8s ", "elapsed_ms", "bytes", "mbps", "bwidth"); - - for (kv = kflag; kflag != NULL && *kv != NULL; kv++) - printf("%s%s", kv != kflag ? "," : "", *kv); + for (kv = ptb->kvars; ptb->kvars != NULL && *kv != NULL; kv++) + printf("%s%s", kv != ptb->kvars ? "," : "", *kv); printf("\n"); } static void -kget(kvm_t *kh, u_long addr, void *buf, size_t size) +kget(u_long addr, void *buf, size_t size) { - if (kvm_read(kh, addr, buf, size) != (ssize_t)size) - errx(1, "kvm_read: %s", kvm_geterr(kh)); + if (kvm_read(ptb->kvmh, addr, buf, size) != (ssize_t)size) + errx(1, "kvm_read: %s", kvm_geterr(ptb->kvmh)); } static u_long -kfind_tcb(kvm_t *kh, u_long ktcbtab, int sock) +kfind_tcb(int sock) { struct inpcbtable tcbtab; struct inpcb *head, *next, *prev; @@ -230,27 +304,27 @@ kfind_tcb(kvm_t *kh, u_long ktcbtab, int errx(1, "%s: me.ss_family != them.ss_family", __func__); if (me.ss_family != AF_INET && me.ss_family != AF_INET6) errx(1, "%s: unknown socket family", __func__); - if (vflag >= 2) { + if (ptb->vflag >= 2) { saddr_ntop((struct sockaddr *)&me, me.ss_len, tmp1, sizeof(tmp1)); saddr_ntop((struct sockaddr *)&them, them.ss_len, tmp2, sizeof(tmp2)); fprintf(stderr, "Our socket local %s remote %s\n", tmp1, tmp2); } - if (vflag >= 2) - fprintf(stderr, "Using PCB table at %lu\n", ktcbtab); + if (ptb->vflag >= 2) + fprintf(stderr, "Using PCB table at %lu\n", ptb->ktcbtab); retry: - kget(kh, ktcbtab, &tcbtab, sizeof(tcbtab)); + kget(ptb->ktcbtab, &tcbtab, sizeof(tcbtab)); prev = head = (struct inpcb *)&CIRCLEQ_FIRST( - &((struct inpcbtable *)ktcbtab)->inpt_queue); + &((struct inpcbtable *)ptb->ktcbtab)->inpt_queue); next = CIRCLEQ_FIRST(&tcbtab.inpt_queue); - if (vflag >= 2) + if (ptb->vflag >= 2) fprintf(stderr, "PCB head at %p\n", head); while (next != head) { - if (vflag >= 2) + if (ptb->vflag >= 2) fprintf(stderr, "Checking PCB %p\n", next); - kget(kh, (u_long)next, &inpcb, sizeof(inpcb)); + kget((u_long)next, &inpcb, sizeof(inpcb)); if (CIRCLEQ_PREV(&inpcb, inp_queue) != prev) { if (nretry--) { warnx("pcb prev pointer insane"); @@ -265,11 +339,11 @@ retry: if (me.ss_family == AF_INET) { if ((inpcb.inp_flags & INP_IPV6) != 0) { - if (vflag >= 2) + if (ptb->vflag >= 2) fprintf(stderr, "Skip: INP_IPV6"); continue; } - if (vflag >= 2) { + if (ptb->vflag >= 2) { inet_ntop(AF_INET, &inpcb.inp_laddr, tmp1, sizeof(tmp1)); inet_ntop(AF_INET, &inpcb.inp_faddr, @@ -292,7 +366,7 @@ retry: } else { if ((inpcb.inp_flags & INP_IPV6) == 0) continue; - if (vflag >= 2) { + if (ptb->vflag >= 2) { inet_ntop(AF_INET6, &inpcb.inp_laddr6, tmp1, sizeof(tmp1)); inet_ntop(AF_INET6, &inpcb.inp_faddr6, @@ -313,27 +387,27 @@ retry: in6->sin6_port != inpcb.inp_fport) continue; } - kget(kh, (u_long)inpcb.inp_ppcb, &tcpcb, sizeof(tcpcb)); + kget((u_long)inpcb.inp_ppcb, &tcpcb, sizeof(tcpcb)); if (tcpcb.t_state != TCPS_ESTABLISHED) { - if (vflag >= 2) + if (ptb->vflag >= 2) fprintf(stderr, "Not established\n"); continue; } - if (vflag >= 2) + if (ptb->vflag >= 2) fprintf(stderr, "Found PCB at %p\n", prev); - return (u_long)prev; + return ((u_long)prev); } errx(1, "No matching PCB found"); } static void -kupdate_stats(kvm_t *kh, u_long tcbaddr, - struct inpcb *inpcb, struct tcpcb *tcpcb, struct socket *sockb) +kupdate_stats(u_long tcbaddr, struct inpcb *inpcb, + struct tcpcb *tcpcb, struct socket *sockb) { - kget(kh, tcbaddr, inpcb, sizeof(*inpcb)); - kget(kh, (u_long)inpcb->inp_ppcb, tcpcb, sizeof(*tcpcb)); - kget(kh, (u_long)inpcb->inp_socket, sockb, sizeof(*sockb)); + kget(tcbaddr, inpcb, sizeof(*inpcb)); + kget((u_long)inpcb->inp_ppcb, tcpcb, sizeof(*tcpcb)); + kget((u_long)inpcb->inp_socket, sockb, sizeof(*sockb)); } static void @@ -371,39 +445,24 @@ check_prepare_kvars(char *list) errx(1, "strdup"); ret[n] = NULL; } - return ret; + return (ret); } static void -stats_prepare(struct statctx *sc, int fd, kvm_t *kh, u_long ktcbtab) +stats_prepare(struct statctx *sc) { - if (rflag <= 0) - return; - sc->kh = kh; - sc->kvars = kflag; - if (kflag) - sc->tcbaddr = kfind_tcb(kh, ktcbtab, fd); - if (gettimeofday(&sc->t_start, NULL) == -1) - err(1, "gettimeofday"); + sc->buf = ptb->dummybuf; + sc->buflen = ptb->dummybuf_len; + if (ptb->kvars) + sc->tcp_tcbaddr = kfind_tcb(sc->fd); + if (clock_gettime_tv(CLOCK_MONOTONIC, &sc->t_start) == -1) + err(1, "clock_gettime_tv"); sc->t_last = sc->t_start; - sc->bytes = 0; -} - -static void -stats_update(struct statctx *sc, ssize_t n) -{ - sc->bytes += n; - mainstats.slice_bytes += n; -} -static void -stats_cleanslice(void) -{ - mainstats.slice_bytes = 0; } static void -stats_display(unsigned long long total_elapsed, long double mbps, +tcp_stats_display(unsigned long long total_elapsed, long double mbps, float bwperc, struct statctx *sc, struct inpcb *inpcb, struct tcpcb *tcpcb, struct socket *sockb) { @@ -412,14 +471,14 @@ stats_display(unsigned long long total_e printf("%12llu %14llu %12.3Lf %7.2f%% ", total_elapsed, sc->bytes, mbps, bwperc); - if (sc->kvars != NULL) { - kupdate_stats(sc->kh, sc->tcbaddr, inpcb, tcpcb, + if (ptb->kvars != NULL) { + kupdate_stats(sc->tcp_tcbaddr, inpcb, tcpcb, sockb); - for (j = 0; sc->kvars[j] != NULL; j++) { + for (j = 0; ptb->kvars[j] != NULL; j++) { #define S(a) #a #define P(b, v, f) \ - if (strcmp(sc->kvars[j], S(b.v)) == 0) { \ + if (strcmp(ptb->kvars[j], S(b.v)) == 0) { \ printf("%s"f, j > 0 ? "," : "", b->v); \ continue; \ } @@ -463,30 +522,24 @@ stats_display(unsigned long long total_e } static void -mainstats_display(long double slice_mbps, long double avg_mbps) -{ - printf("Conn: %3d Mbps: %12.3Lf Peak Mbps: %12.3Lf Avg Mbps: %12.3Lf\n", - mainstats.nconns, slice_mbps, mainstats.peak_mbps, avg_mbps); -} - -static void -process_slice(struct statctx *sc, size_t nsc) +tcp_process_slice(int fd, short event, void *bula) { unsigned long long total_elapsed, since_last; long double mbps, slice_mbps = 0; float bwperc; - nfds_t i; + struct statctx *sc; struct timeval t_cur, t_diff; struct inpcb inpcb; struct tcpcb tcpcb; struct socket sockb; - for (i = 0; i < nsc; i++, sc++) { - if (gettimeofday(&t_cur, NULL) == -1) - err(1, "gettimeofday"); - if (sc->kvars != NULL) /* process kernel stats */ - kupdate_stats(sc->kh, sc->tcbaddr, &inpcb, &tcpcb, + TAILQ_FOREACH(sc, &sc_queue, entry) { + if (clock_gettime_tv(CLOCK_MONOTONIC, &t_cur) == -1) + err(1, "clock_gettime_tv"); + if (ptb->kvars != NULL) /* process kernel stats */ + kupdate_stats(sc->tcp_tcbaddr, &inpcb, &tcpcb, &sockb); + timersub(&t_cur, &sc->t_start, &t_diff); total_elapsed = t_diff.tv_sec * 1000 + t_diff.tv_usec / 1000; timersub(&t_cur, &sc->t_last, &t_diff); @@ -495,80 +548,181 @@ process_slice(struct statctx *sc, size_t mbps = (sc->bytes * 8) / (since_last * 1000.0); slice_mbps += mbps; - stats_display(total_elapsed, mbps, bwperc, sc, + tcp_stats_display(total_elapsed, mbps, bwperc, sc, &inpcb, &tcpcb, &sockb); sc->t_last = t_cur; sc->bytes = 0; - } /* process stats for this slice */ if (slice_mbps > mainstats.peak_mbps) mainstats.peak_mbps = slice_mbps; - mainstats_display(slice_mbps, slice_mbps / mainstats.nconns); + printf("Conn: %3d Mbps: %12.3Lf Peak Mbps: %12.3Lf Avg Mbps: %12.3Lf\n", + mainstats.nconns, slice_mbps, mainstats.peak_mbps, + slice_mbps / mainstats.nconns); + mainstats.slice_bytes = 0; + + set_slice_timer(mainstats.nconns > 0); } -static int -handle_connection(struct statctx *sc, int fd, char *buf, size_t buflen) +static void +udp_process_slice(int fd, short event, void *v_sc) +{ + struct statctx *sc = v_sc; + unsigned long long total_elapsed, since_last; + long double slice_mbps, pps; + struct timeval t_cur, t_diff; + + if (clock_gettime_tv(CLOCK_MONOTONIC, &t_cur) == -1) + err(1, "clock_gettime_tv"); + /* Calculate pps */ + timersub(&t_cur, &sc->t_start, &t_diff); + total_elapsed = t_diff.tv_sec * 1000 + t_diff.tv_usec / 1000; + timersub(&t_cur, &sc->t_last, &t_diff); + since_last = t_diff.tv_sec * 1000 + t_diff.tv_usec / 1000; + slice_mbps = (sc->bytes * 8) / (since_last * 1000.0); + pps = (sc->udp_slice_pkts * 1000) / since_last; + if (slice_mbps > mainstats.peak_mbps) + mainstats.peak_mbps = slice_mbps; + printf("Elapsed: %11llu Mbps: %11.3Lf Peak Mbps: %11.3Lf %s PPS: %10.3Lf\n", + total_elapsed, slice_mbps, mainstats.peak_mbps, ptb->sflag ? "Rx" : "Tx", + pps); + + /* Clean up this slice time */ + sc->t_last = t_cur; + sc->bytes = 0; + sc->udp_slice_pkts = 0; + set_slice_timer(1); +} + +static void +udp_server_handle_sc(int fd, short event, void *v_sc) { ssize_t n; + struct statctx *sc = v_sc; again: - n = read(fd, buf, buflen); - if (n == -1) { + n = read(fd, ptb->dummybuf, ptb->dummybuf_len); + if (n == 0) + return; + else if (n == -1) { if (errno == EINTR) goto again; else if (errno == EWOULDBLOCK) - return 0; + return; warn("fd %d read error", fd); + return; + } - return -1; + if (ptb->vflag >= 3) + fprintf(stderr, "read: %zd bytes\n", n); + /* If this was our first packet, start slice timer */ + if (mainstats.peak_mbps == 0) + set_slice_timer(1); + /* Account packet */ + sc->udp_slice_pkts++; + sc->bytes += n; } - else if (n == 0) { - if (vflag) - fprintf(stderr, "%8d closed by remote end\n", fd); - close(fd); - return -1; + +static void +tcp_server_handle_sc(int fd, short event, void *v_sc) +{ + struct statctx *sc = v_sc; + ssize_t n; + +again: + n = read(sc->fd, sc->buf, sc->buflen); + if (n == -1) { + if (errno == EINTR) + goto again; + else if (errno == EWOULDBLOCK) + return; + warn("fd %d read error", sc->fd); + return; + } else if (n == 0) { + if (ptb->vflag) + fprintf(stderr, "%8d closed by remote end\n", sc->fd); + close(sc->fd); + TAILQ_REMOVE(&sc_queue, sc, entry); + free(sc); + mainstats.nconns--; + set_slice_timer(mainstats.nconns > 0); + return; } - if (vflag >= 3) + if (ptb->vflag >= 3) fprintf(stderr, "read: %zd bytes\n", n); + sc->bytes += n; + mainstats.slice_bytes += n; +} - stats_update(sc, n); - return 0; +static void +tcp_server_accept(int fd, short event, void *bula) +{ + int sock, r; + struct statctx *sc; + struct sockaddr_storage ss; + socklen_t sslen; + char tmp[128]; + + sslen = sizeof(ss); +again: + if ((sock = accept(fd, (struct sockaddr *)&ss, + &sslen)) == -1) { + if (errno == EINTR) + goto again; + warn("accept"); + return; + } + saddr_ntop((struct sockaddr *)&ss, sslen, + tmp, sizeof(tmp)); + if ((r = fcntl(sock, F_GETFL, 0)) == -1) + err(1, "fcntl(F_GETFL)"); + r |= O_NONBLOCK; + if (fcntl(sock, F_SETFL, r) == -1) + err(1, "fcntl(F_SETFL, O_NONBLOCK)"); + /* Alloc client structure and register reading callback */ + if ((sc = calloc(1, sizeof(*sc))) == NULL) + err(1, "calloc"); + sc->fd = sock; + stats_prepare(sc); + event_set(&sc->ev, sc->fd, EV_READ | EV_PERSIST, + tcp_server_handle_sc, sc); + event_add(&sc->ev, NULL); + TAILQ_INSERT_TAIL(&sc_queue, sc, entry); + mainstats.nconns++; + set_slice_timer(mainstats.nconns > 0); + if (ptb->vflag) + warnx("Accepted connection from %s, fd = %d\n", tmp, sc->fd); } static nfds_t -serverbind(struct pollfd *pfd, nfds_t max_nfds, struct addrinfo *aitop) +server_init(struct addrinfo *aitop, struct statctx *udp_sc) { char tmp[128]; int sock, on = 1; struct addrinfo *ai; + struct event *ev; nfds_t lnfds; lnfds = 0; for (ai = aitop; ai != NULL; ai = ai->ai_next) { - if (lnfds == max_nfds) { - fprintf(stderr, - "maximum number of listening fds reached\n"); - break; - } saddr_ntop(ai->ai_addr, ai->ai_addrlen, tmp, sizeof(tmp)); - if (vflag) + if (ptb->vflag && TCP_MODE) fprintf(stderr, "Try to listen on %s\n", tmp); if ((sock = socket(ai->ai_family, ai->ai_socktype, ai->ai_protocol)) == -1) { if (ai->ai_next == NULL) err(1, "socket"); - if (vflag) + if (ptb->vflag) warn("socket"); continue; } - if (rtableid && ai->ai_family == AF_INET) { + if (ptb->Vflag && ai->ai_family == AF_INET) { if (setsockopt(sock, IPPROTO_IP, SO_RTABLE, - &rtableid, sizeof(rtableid)) == -1) + &ptb->Vflag, sizeof(ptb->Vflag)) == -1) err(1, "setsockopt SO_RTABLE"); - } else if (rtableid) + } else if (ptb->Vflag) warnx("rtable only supported on AF_INET"); if (setsockopt(sock, SOL_SOCKET, SO_REUSEADDR, &on, sizeof(on)) == -1) @@ -576,214 +730,110 @@ serverbind(struct pollfd *pfd, nfds_t ma if (bind(sock, ai->ai_addr, ai->ai_addrlen) != 0) { if (ai->ai_next == NULL) err(1, "bind"); - if (vflag) + if (ptb->vflag) warn("bind"); close(sock); continue; } - if (Sflag) { + if (ptb->Sflag) { if (setsockopt(sock, SOL_SOCKET, SO_RCVBUF, - &Sflag, sizeof(Sflag)) == -1) - warn("set TCP receive buffer size"); + &ptb->Sflag, sizeof(ptb->Sflag)) == -1) + warn("set receive buffer size"); } + if (TCP_MODE) if (listen(sock, 64) == -1) { if (ai->ai_next == NULL) err(1, "listen"); - if (vflag) + if (ptb->vflag) warn("listen"); close(sock); continue; } - if (vflag >= 3) - fprintf(stderr, "listening on fd %d\n", sock); + if ((ev = calloc(1, sizeof(*ev))) == NULL) + err(1, "calloc"); + if (UDP_MODE) + event_set(ev, sock, EV_READ | EV_PERSIST, + udp_server_handle_sc, udp_sc); + else + event_set(ev, sock, EV_READ | EV_PERSIST, + tcp_server_accept, NULL); + event_add(ev, NULL); + if (ptb->vflag >= 3) + fprintf(stderr, "bound to fd %d\n", sock); lnfds++; - pfd[lnfds - 1].fd = sock; - pfd[lnfds - 1].events = POLLIN; - } freeaddrinfo(aitop); if (lnfds == 0) errx(1, "No working listen addresses found"); - return lnfds; + return (lnfds); } static void -set_listening(struct pollfd *pfd, nfds_t lfds, int toggle) { - int i; - - for (i = 0; i < (int)lfds; i++) { - if (toggle) - pfd[i].events = POLLIN; - else - pfd[i].events = 0; - } - -} -static void __dead -serverloop(kvm_t *kvmh, u_long ktcbtab, struct addrinfo *aitop) +client_handle_sc(int fd, short event, void *v_sc) { - socklen_t sslen; - struct pollfd *pfd; - char tmp[128], *buf; - struct statctx *psc; - struct sockaddr_storage ss; - nfds_t i, nfds, lfds; - size_t nalloc; - int r, sock, client_id; - - sslen = sizeof(ss); - nalloc = 128; - if ((pfd = calloc(sizeof(*pfd), nalloc)) == NULL) - err(1, "calloc"); - if ((psc = calloc(sizeof(*psc), nalloc)) == NULL) - err(1, "calloc"); - if ((buf = malloc(Bflag)) == NULL) - err(1, "malloc"); - lfds = nfds = serverbind(pfd, nalloc - 1, aitop); - if (vflag >= 3) - fprintf(stderr, "listening on %d fds\n", lfds); - if (setpgid(0, 0) == -1) - err(1, "setpgid"); - - print_header(); - - client_id = 0; - while (!done) { - if (proc_slice) { - process_slice(psc + lfds, nfds - lfds); - stats_cleanslice(); - proc_slice = 0; - } - if (vflag >= 3) - fprintf(stderr, "mainstats.nconns = %u\n", - mainstats.nconns); - if ((r = poll(pfd, nfds, INFTIM)) == -1) { - if (errno == EINTR) - continue; - warn("poll"); - break; - } + struct statctx *sc = v_sc; + ssize_t n; - if (vflag >= 3) - fprintf(stderr, "poll: %d\n", r); - for (i = 0 ; r > 0 && i < nfds; i++) { - if ((pfd[i].revents & POLLIN) == 0) - continue; - if (pfd[i].fd == -1) - errx(1, "pfd insane"); - r--; - if (vflag >= 3) - fprintf(stderr, "fd %d active i = %d\n", - pfd[i].fd, i); - /* new connection */ - if (i < lfds) { - if ((sock = accept(pfd[i].fd, - (struct sockaddr *)&ss, - &sslen)) == -1) { - if (errno == EINTR) - continue; - else if (errno == EMFILE || - errno == ENFILE) - set_listening(pfd, lfds, 0); - warn("accept"); - continue; - } - if ((r = fcntl(sock, F_GETFL, 0)) == -1) - err(1, "fcntl(F_GETFL)"); - r |= O_NONBLOCK; - if (fcntl(sock, F_SETFL, r) == -1) - err(1, "fcntl(F_SETFL, O_NONBLOCK)"); - saddr_ntop((struct sockaddr *)&ss, sslen, - tmp, sizeof(tmp)); - if (vflag) - fprintf(stderr, - "Accepted connection %d from " - "%s, fd = %d\n", client_id++, tmp, - sock); - /* alloc more space if we're full */ - if (nfds == nalloc) { - nalloc *= 2; - if ((pfd = realloc(pfd, - sizeof(*pfd) * nalloc)) == NULL) - err(1, "realloc"); - if ((psc = realloc(psc, - sizeof(*psc) * nalloc)) == NULL) - err(1, "realloc"); - } - pfd[nfds].fd = sock; - pfd[nfds].events = POLLIN; - stats_prepare(&psc[nfds], sock, kvmh, ktcbtab); - nfds++; - if (!mainstats.nconns++) - set_timer(1); - continue; + /* XXX: Can this block if we lack entropy ? */ + arc4random_buf(ptb->dummybuf, ptb->dummybuf_len); +again: + if ((n = write(sc->fd, sc->buf, sc->buflen)) == -1) { + if (errno == EINTR || errno == EAGAIN || (UDP_MODE && ENOBUFS)) + goto again; + err(1, "write"); } - /* event in fd */ - if (vflag >= 3) - fprintf(stderr, - "fd %d active", pfd[i].fd); - while (handle_connection(&psc[i], pfd[i].fd, - buf, Bflag) == -1) { - pfd[i] = pfd[nfds - 1]; - pfd[nfds - 1].fd = -1; - psc[i] = psc[nfds - 1]; - mainstats.nconns--; - nfds--; - /* stop display if no clients */ - if (!mainstats.nconns) { - proc_slice = 1; - set_timer(0); + if (TCP_MODE && n == 0) { + warnx("Remote end closed connection"); + exit(1); } - /* if we were full */ - set_listening(pfd, lfds, 1); + if (ptb->vflag >= 3) + warnx("write: %zd bytes\n", n); + sc->bytes += n; + mainstats.slice_bytes += n; + if (UDP_MODE) + sc->udp_slice_pkts++; - /* is there an event pending on the last fd? */ - if (pfd[i].fd == -1 || - (pfd[i].revents & POLLIN) == 0) - break; - } - } - } - exit(1); } -void -clientconnect(struct addrinfo *aitop, struct pollfd *pfd, int nconn) +static void +client_init(struct addrinfo *aitop, int nconn, struct statctx *udp_sc) { - char tmp[128]; + struct statctx *sc; struct addrinfo *ai; + char tmp[128]; int i, r, sock; + sc = udp_sc; for (i = 0; i < nconn; i++) { for (sock = -1, ai = aitop; ai != NULL; ai = ai->ai_next) { saddr_ntop(ai->ai_addr, ai->ai_addrlen, tmp, sizeof(tmp)); - if (vflag && i == 0) + if (ptb->vflag && i == 0) fprintf(stderr, "Trying %s\n", tmp); if ((sock = socket(ai->ai_family, ai->ai_socktype, ai->ai_protocol)) == -1) { if (ai->ai_next == NULL) err(1, "socket"); - if (vflag) + if (ptb->vflag) warn("socket"); continue; } - if (rtableid && ai->ai_family == AF_INET) { + if (ptb->Vflag && ai->ai_family == AF_INET) { if (setsockopt(sock, IPPROTO_IP, SO_RTABLE, - &rtableid, sizeof(rtableid)) == -1) + &ptb->Vflag, sizeof(ptb->Vflag)) == -1) err(1, "setsockopt SO_RTABLE"); - } else if (rtableid) + } else if (ptb->Vflag) warnx("rtable only supported on AF_INET"); - if (Sflag) { + if (ptb->Sflag) { if (setsockopt(sock, SOL_SOCKET, SO_SNDBUF, - &Sflag, sizeof(Sflag)) == -1) + &ptb->Sflag, sizeof(ptb->Sflag)) == -1) warn("set TCP send buffer size"); } if (connect(sock, ai->ai_addr, ai->ai_addrlen) != 0) { if (ai->ai_next == NULL) err(1, "connect"); - if (vflag) + if (ptb->vflag) warn("connect"); close(sock); sock = -1; @@ -793,96 +843,31 @@ clientconnect(struct addrinfo *aitop, st } if (sock == -1) errx(1, "No host found"); - if ((r = fcntl(sock, F_GETFL, 0)) == -1) err(1, "fcntl(F_GETFL)"); r |= O_NONBLOCK; if (fcntl(sock, F_SETFL, r) == -1) err(1, "fcntl(F_SETFL, O_NONBLOCK)"); - - pfd[i].fd = sock; - pfd[i].events = POLLOUT; - } - freeaddrinfo(aitop); - - if (vflag && nconn > 1) - fprintf(stderr, "%u connections established\n", nconn); + /* Alloc and prepare stats */ + if (TCP_MODE) { + if ((sc = calloc(1, sizeof(*sc))) == NULL) + err(1, "calloc"); } - -static void __dead -clientloop(kvm_t *kvmh, u_long ktcbtab, struct addrinfo *aitop, int nconn) -{ - struct statctx *psc; - struct pollfd *pfd; - char *buf; - int i; - ssize_t n; - - if ((pfd = calloc(nconn, sizeof(*pfd))) == NULL) - err(1, "clientloop pfd calloc"); - if ((psc = calloc(nconn, sizeof(*psc))) == NULL) - err(1, "clientloop psc calloc"); - - clientconnect(aitop, pfd, nconn); - - for (i = 0; i < nconn; i++) { - stats_prepare(psc + i, pfd[i].fd, kvmh, ktcbtab); + sc->fd = sock; + stats_prepare(sc); + event_set(&sc->ev, sc->fd, EV_WRITE | EV_PERSIST, + client_handle_sc, sc); + event_add(&sc->ev, NULL); + TAILQ_INSERT_TAIL(&sc_queue, sc, entry); mainstats.nconns++; - } - - if ((buf = malloc(Bflag)) == NULL) - err(1, "malloc"); - arc4random_buf(buf, Bflag); - - print_header(); - set_timer(1); - - while (!done) { - if (proc_slice) { - process_slice(psc, nconn); - stats_cleanslice(); - proc_slice = 0; - } - if (poll(pfd, nconn, INFTIM) == -1) { - if (errno == EINTR) - continue; - err(1, "poll"); - } - for (i = 0; i < nconn; i++) { - if (pfd[i].revents & POLLOUT) { - if ((n = write(pfd[i].fd, buf, Bflag)) == -1) { - if (errno == EINTR || errno == EAGAIN) - continue; - err(1, "write"); - } - if (n == 0) { - warnx("Remote end closed connection"); - done = -1; + set_slice_timer(mainstats.nconns > 0); + if (UDP_MODE) break; } - if (vflag >= 3) - fprintf(stderr, "write: %zd bytes\n", - n); - stats_update(psc + i, n); - } - } - } - - if (done > 0) - warnx("Terminated by signal %d", done); - - free(buf); - exit(0); -} - -static void -drop_gid(void) -{ - gid_t gid; + freeaddrinfo(aitop); - gid = getgid(); - if (setresgid(gid, gid, gid) == -1) - err(1, "setresgid"); + if (ptb->vflag && nconn > 1) + fprintf(stderr, "%u connections established\n", nconn); } int @@ -890,23 +875,26 @@ main(int argc, char **argv) { extern int optind; extern char *optarg; - char kerr[_POSIX2_LINE_MAX], *tmp; struct addrinfo *aitop, hints; const char *errstr; - kvm_t *kvmh = NULL; struct rlimit rl; - int ch, herr; + int ch, herr, nconn; struct nlist nl[] = { { "_tcbtable" }, { "" } }; const char *host = NULL, *port = DEFAULT_PORT; - int nconn = 1; + struct event ev_sigint, ev_sigterm, ev_sighup; + struct statctx *udp_sc = NULL; - Bflag = DEFAULT_BUF; - Sflag = sflag = vflag = rtableid = 0; - kflag = NULL; - rflag = DEFAULT_STATS_INTERVAL; + /* Init world */ + ptb = &tcpbench; + ptb->dummybuf_len = 0; + ptb->Sflag = ptb->sflag = ptb->vflag = ptb->Vflag = 0; + ptb->kvmh = NULL; + ptb->kvars = NULL; + ptb->rflag = DEFAULT_STATS_INTERVAL; + nconn = 1; - while ((ch = getopt(argc, argv, "B:hlk:n:p:r:sS:vV:")) != -1) { + while ((ch = getopt(argc, argv, "B:hlk:n:p:r:sS:uvV:")) != -1) { switch (ch) { case 'l': list_kvars(); @@ -914,11 +902,11 @@ main(int argc, char **argv) case 'k': if ((tmp = strdup(optarg)) == NULL) errx(1, "strdup"); - kflag = check_prepare_kvars(tmp); + ptb->kvars = check_prepare_kvars(tmp); free(tmp); break; case 'r': - rflag = strtonum(optarg, 0, 60 * 60 * 24 * 1000, + ptb->rflag = strtonum(optarg, 0, 60 * 60 * 24 * 1000, &errstr); if (errstr != NULL) errx(1, "statistics interval is %s: %s", @@ -928,27 +916,27 @@ main(int argc, char **argv) port = optarg; break; case 's': - sflag = 1; + ptb->sflag = 1; break; case 'S': - Sflag = strtonum(optarg, 0, 1024*1024*1024, + ptb->Sflag = strtonum(optarg, 0, 1024*1024*1024, &errstr); if (errstr != NULL) errx(1, "receive space interval is %s: %s", errstr, optarg); break; case 'B': - Bflag = strtonum(optarg, 0, 1024*1024*1024, + ptb->dummybuf_len = strtonum(optarg, 0, 1024*1024*1024, &errstr); if (errstr != NULL) errx(1, "read/write buffer size is %s: %s", errstr, optarg); break; case 'v': - vflag++; + ptb->vflag++; break; case 'V': - rtableid = (unsigned int)strtonum(optarg, 0, + ptb->Vflag = (unsigned int)strtonum(optarg, 0, RT_TABLEID_MAX, &errstr); if (errstr) errx(1, "rtable value is %s: %s", @@ -960,6 +948,9 @@ main(int argc, char **argv) errx(1, "number of connections is %s: %s", errstr, optarg); break; + case 'u': + ptb->uflag = 1; + break; case 'h': default: usage(); @@ -968,15 +959,30 @@ main(int argc, char **argv) argv += optind; argc -= optind; - if (argc != (sflag ? 0 : 1)) + if ((argc != (ptb->sflag ? 0 : 1)) || + (UDP_MODE && (ptb->kvars || nconn != 1))) usage(); - if (!sflag) + if (!ptb->sflag) host = argv[0]; - + /* + * Rationale, + * If TCP, use a big buffer with big reads/writes. + * If UDP, use a big buffer in server and a buffer the size of a + * ethernet packet. + */ + if (!ptb->dummybuf_len) { + if (ptb->sflag || TCP_MODE) + ptb->dummybuf_len = DEFAULT_BUF; + else + ptb->dummybuf_len = DEFAULT_UDP_PKT; + } bzero(&hints, sizeof(hints)); + if (UDP_MODE) + hints.ai_socktype = SOCK_DGRAM; + else hints.ai_socktype = SOCK_STREAM; - if (sflag) + if (ptb->sflag) hints.ai_flags = AI_PASSIVE; if ((herr = getaddrinfo(host, port, &hints, &aitop)) != 0) { if (herr == EAI_SYSTEM) @@ -984,23 +990,17 @@ main(int argc, char **argv) else errx(1, "getaddrinfo: %s", gai_strerror(herr)); } - - if (kflag) { - if ((kvmh = kvm_openfiles(NULL, NULL, NULL, + if (ptb->kvars) { + if ((ptb->kvmh = kvm_openfiles(NULL, NULL, NULL, O_RDONLY, kerr)) == NULL) errx(1, "kvm_open: %s", kerr); drop_gid(); - if (kvm_nlist(kvmh, nl) < 0 || nl[0].n_type == 0) + if (kvm_nlist(ptb->kvmh, nl) < 0 || nl[0].n_type == 0) errx(1, "kvm: no namelist"); + ptb->ktcbtab = nl[0].n_value; } else drop_gid(); - signal(SIGINT, exitsighand); - signal(SIGTERM, exitsighand); - signal(SIGHUP, exitsighand); - signal(SIGPIPE, SIG_IGN); - signal(SIGALRM, alarmhandler); - if (getrlimit(RLIMIT_NOFILE, &rl) == -1) err(1, "getrlimit"); if (rl.rlim_cur < MAX_FD) @@ -1010,10 +1010,44 @@ main(int argc, char **argv) if (getrlimit(RLIMIT_NOFILE, &rl) == -1) err(1, "getrlimit"); - if (sflag) - serverloop(kvmh, nl[0].n_value, aitop); + /* Init world */ + TAILQ_INIT(&sc_queue); + if ((ptb->dummybuf = malloc(ptb->dummybuf_len)) == NULL) + err(1, "malloc"); + if (UDP_MODE) { + if ((udp_sc = calloc(1, sizeof(*udp_sc))) == NULL) + err(1, "calloc"); + udp_sc->fd = -1; + stats_prepare(udp_sc); + } + + /* Setup libevent and signals */ + event_init(); + signal_set(&ev_sigterm, SIGTERM, signal_handler, NULL); + signal_set(&ev_sighup, SIGHUP, signal_handler, NULL); + signal_set(&ev_sigint, SIGINT, signal_handler, NULL); + signal_add(&ev_sigint, NULL); + signal_add(&ev_sigterm, NULL); + signal_add(&ev_sighup, NULL); + signal(SIGPIPE, SIG_IGN); + + if (TCP_MODE) + print_tcp_header(); + + if (UDP_MODE) + evtimer_set(&mainstats.timer, udp_process_slice, udp_sc); else - clientloop(kvmh, nl[0].n_value, aitop, nconn); + evtimer_set(&mainstats.timer, tcp_process_slice, NULL); - return 0; + if (ptb->sflag) { + (void)server_init(aitop, udp_sc); + if (setpgid(0, 0) == -1) + err(1, "setpgid"); + } else + client_init(aitop, nconn, udp_sc); + + /* libevent main loop*/ + event_dispatch(); + + return (0); }