Re: [net] protecting interfaces from races between control and data ?
----- Original Message -----
> i am slightly unclear of what mechanisms we use to prevent races
> between interface being reconfigured (up/down/multicast setting, etc,
> all causing reinitialization of the rx and tx rings) and
>
> i) packets from the host stack being sent out;
> ii) interrupts from the network card being processed.
>
> I think in the old times IFF_DRV_RUNNING was used for this purpose,
> but now it is not enough.
> Acquiring the "core lock" in the NIC does not seem enough, either,
> because newer drivers, especially multiqueue ones, have per-queue
> rx and tx locks.
>

What I've done in my drivers is:
  * Lock the core mutex
  * Clear IFF_DRV_RUNNING
  * Lock/unlock each queue's lock

The various Rx/Tx queue functions check for IFF_DRV_RUNNING after
(re)acquiring their queue lock. See vtnet_stop_rendezvous() at [1]
for an example.

> Does anyone know if there is a generic mechanism, or each driver
> reimplements its own way ?
>

We desperately need a saner ifnet/driver interface. I think andre@
had some previous work in this area (and additional plans as well?).
IMO, there's a lot to like in what DragonFly BSD has done in this area.

[1] - http://svnweb.freebsd.org/base/user/bryanv/vtnetmq/sys/dev/virtio/network/if_vtnet.c?revision=252451&view=markup

> thanks
> luigi
> ___
> freebsd-curr...@freebsd.org mailing list
> http://lists.freebsd.org/mailman/listinfo/freebsd-current
> To unsubscribe, send any mail to "freebsd-current-unsubscr...@freebsd.org"
>
___
freebsd-net@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-net
To unsubscribe, send any mail to "freebsd-net-unsubscr...@freebsd.org"
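The three steps listed in this message (lock the core mutex, clear IFF_DRV_RUNNING, lock/unlock each queue's lock) can be tried out in userland. What follows is only a sketch under stated assumptions: pthread mutexes stand in for kernel mutexes, an atomic flag stands in for IFF_DRV_RUNNING, and every name (struct softc, NQUEUES, queue_handle_packet, softc_stop) is made up for illustration. It is not the vtnet(4) code referenced at [1].

```c
/*
 * Userland sketch of a "stop rendezvous". All names are hypothetical;
 * this is not the vtnet(4) implementation.
 */
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stdbool.h>

#define NQUEUES 4

struct queue {
	pthread_mutex_t lock;	/* per-queue Rx/Tx lock */
	unsigned long handled;	/* packets handled while running */
};

struct softc {
	pthread_mutex_t core_lock;	/* serializes control operations */
	atomic_bool running;		/* stand-in for IFF_DRV_RUNNING */
	struct queue queues[NQUEUES];
};

static void
softc_init(struct softc *sc)
{
	pthread_mutex_init(&sc->core_lock, NULL);
	atomic_init(&sc->running, true);
	for (int i = 0; i < NQUEUES; i++) {
		pthread_mutex_init(&sc->queues[i].lock, NULL);
		sc->queues[i].handled = 0;
	}
}

/*
 * Data path: check the running flag only after taking the queue lock.
 * Returns true if the packet was handled, false if the interface is down.
 */
static bool
queue_handle_packet(struct softc *sc, int qidx)
{
	struct queue *q = &sc->queues[qidx];
	bool handled = false;

	pthread_mutex_lock(&q->lock);
	if (atomic_load(&sc->running)) {
		q->handled++;
		handled = true;
	}
	pthread_mutex_unlock(&q->lock);
	return (handled);
}

/*
 * Control path: clear the flag, then pass through every queue lock once.
 * A queue function that saw the flag set has left its critical section by
 * the time we get its lock; any later caller sees the flag cleared.
 */
static void
softc_stop(struct softc *sc)
{
	pthread_mutex_lock(&sc->core_lock);
	atomic_store(&sc->running, false);
	for (int i = 0; i < NQUEUES; i++) {
		pthread_mutex_lock(&sc->queues[i].lock);
		pthread_mutex_unlock(&sc->queues[i].lock);
	}
	/* The rings could be torn down or reinitialized here. */
	pthread_mutex_unlock(&sc->core_lock);
}
```

The lock/unlock pass does no work of its own; it only waits for current lock holders to drain, which is why the data path has to recheck the flag every time it (re)acquires its queue lock.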
Re: [net] protecting interfaces from races between control and data ?
----- Original Message -----
> On Mon, Aug 5, 2013 at 8:19 PM, Adrian Chadd wrote:
>
> > No, brian said two things:
> >
> > * the flag, protected by the core lock
> > * per-queue flags
> >
>
> i see no mentions on per-queue flags on his email.
> This is the relevant part
>

Right, I just use the IFF_DRV_RUNNING flag. I think Adrian meant
'per-queue locks' here?

>
>
> What I've done in my drivers is:
> * Lock the core mutex
> * Clear IFF_DRV_RUNNING
> * Lock/unlock each queue's lock
>
> The various Rx/Tx queue functions check for IFF_DRV_RUNNING after
> (re)acquiring their queue lock. See at vtnet_stop_rendezvous() at
> [1] for an example.
>
> [1]
> http://svnweb.freebsd.org/base/user/bryanv/vtnetmq/sys/dev/virtio/network/if_vtnet.c?revision=252451&view=markup
>
> -
>
> >
> > -adrian
>
> --
> -+---
> Prof. Luigi RIZZO, ri...@iet.unipi.it . Dip. di Ing. dell'Informazione
> http://www.iet.unipi.it/~luigi/.        Universita` di Pisa
> TEL +39-050-2211611 .                   via Diotisalvi 2
> Mobile +39-338-6809875 .                56122 PISA (Italy)
> -+---
Re: Network stack changes
----- Original Message -----
> On 28.08.2013 20:30, Alexander V. Chernikov wrote:
> > Hello list!
>
> Hello Alexander,
>
> you sent quite a few things in the same email. I'll try to respond
> as much as I can right now. Later you should split it up to have
> more in-depth discussions on the individual parts.
>
> > We already have some capabilities like VLANHWFILTER/VLANHWTAG, we can add
> > some more. We even have
> > per-driver hooks to program HW filtering.
>
> We could. Though for vlan it looks like it would be easier to remove the
> hardware vlan tag stripping and insertion. It only adds complexity in all
> drivers for no gain.
>

In the shorter term, can we remove the requirement for the parent
interface to support IFCAP_VLAN_HWTAGGING in order to do checksum
offloading on the VLAN interface (see vlan_capabilities())?
Re: dhclient sucks cpu usage...
Hi,

----- Original Message -----
> So, after finding out that nc has a stupidly small buffer size (2k
> even though there is space for 16k), I was still not getting as good
> as performance using nc between machines, so I decided to generate some
> flame graphs to try to identify issues... (Thanks to who included a
> full set of modules, including dtraceall on memstick!)
>
> So, the first one is:
> https://www.funkthat.com/~jmg/em.stack.svg
>
> As I was browsing around, the em_handle_que was consuming quite a bit
> of cpu usage for only doing ~50MB/sec over gige.. Running top -SH shows
> me that the taskqueue for em was consuming about 50% cpu... Also pretty
> high for only 50MB/sec... Looking closer, you'll see that bpf_mtap is
> consuming ~3.18% (under ether_nh_input).. I know I'm not running tcpdump
> or anything, but I think dhclient uses bpf to be able to inject packets
> and listen in on them, so I kill off dhclient, and instantly, the taskqueue
> thread for em drops down to 40% CPU... (transfer rate only marginally
> improves, if it does)
>
> I decide to run another flame graph w/o dhclient running:
> https://www.funkthat.com/~jmg/em.stack.nodhclient.svg
>
> and now _rxeof drops from 17.22% to 11.94%, pretty significant...
>
> So, if you care about performance, don't run dhclient...
>

Yes, I've noticed the same issue. It can absolutely kill performance
in a VM guest. It is much more pronounced on only some of my systems,
and I hadn't tracked it down yet. I wonder if this is fallout from
the callout work, or if there was some bpf change.

I've been using the kludgey workaround patch below.

diff --git a/sys/net/bpf.c b/sys/net/bpf.c
index cb3ed27..9751986 100644
--- a/sys/net/bpf.c
+++ b/sys/net/bpf.c
@@ -2013,9 +2013,11 @@ bpf_gettime(struct bintime *bt, int tstype, struct mbuf *m)
 			return (BPF_TSTAMP_EXTERN);
 		}
 	}
+#if 0
 	if (quality == BPF_TSTAMP_NORMAL)
 		binuptime(bt);
 	else
+#endif
 		getbinuptime(bt);
 
 	return (quality);

> --
>   John-Mark Gurney				Voice: +1 415 225 5579
>
>      "All that I will do, has been done, All that I have, has not."
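With the workaround patch applied, bpf_gettime() always calls getbinuptime(), which returns the uptime value cached at the last timecounter update, instead of binuptime(), which reads the timecounter hardware on every call. A rough userland analogy, offered as a sketch and not as kernel code, is clock_gettime() with a fast or coarse clock ID versus the precise one. The FAST_CLOCK macro and the helper names are made up for illustration; CLOCK_MONOTONIC_FAST is the FreeBSD clock ID and CLOCK_MONOTONIC_COARSE the Linux one, and the code falls back to CLOCK_MONOTONIC where neither exists.

```c
/* On Linux, ask for the POSIX clock_gettime() declarations explicitly. */
#ifdef __linux__
#define _POSIX_C_SOURCE 200809L
#endif

#include <assert.h>
#include <stdint.h>
#include <time.h>

/* Pick a cheap, lower-resolution clock if the platform offers one. */
#if defined(CLOCK_MONOTONIC_FAST)
#define FAST_CLOCK CLOCK_MONOTONIC_FAST		/* FreeBSD */
#elif defined(CLOCK_MONOTONIC_COARSE)
#define FAST_CLOCK CLOCK_MONOTONIC_COARSE	/* Linux */
#else
#define FAST_CLOCK CLOCK_MONOTONIC		/* no cheap variant known */
#endif

/* Read the given clock as nanoseconds, or -1 on failure. */
static int64_t
read_clock_ns(clockid_t id)
{
	struct timespec ts;

	if (clock_gettime(id, &ts) != 0)
		return (-1);
	return ((int64_t)ts.tv_sec * 1000000000LL + ts.tv_nsec);
}

/* Cheap timestamp: may be stale by up to one clock tick. */
static int64_t
timestamp_fast_ns(void)
{
	return (read_clock_ns(FAST_CLOCK));
}

/* Precise timestamp: may read hardware, which can be slow in a VM guest. */
static int64_t
timestamp_precise_ns(void)
{
	return (read_clock_ns(CLOCK_MONOTONIC));
}
```

The trade-off is the same one the patch makes: packet timestamps lose some resolution in exchange for not touching the timecounter once per captured packet.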
Re: dhclient sucks cpu usage...
----- Original Message -----
> On 10.06.2014 07:03, Bryan Venteicher wrote:
> > Hi,
> >
> > ----- Original Message -----
> >> So, after finding out that nc has a stupidly small buffer size (2k
> >> even though there is space for 16k), I was still not getting as good
> >> as performance using nc between machines, so I decided to generate some
> >> flame graphs to try to identify issues... (Thanks to who included a
> >> full set of modules, including dtraceall on memstick!)
> >>
> >> So, the first one is:
> >> https://www.funkthat.com/~jmg/em.stack.svg
> >>
> >> As I was browsing around, the em_handle_que was consuming quite a bit
> >> of cpu usage for only doing ~50MB/sec over gige.. Running top -SH shows
> >> me that the taskqueue for em was consuming about 50% cpu... Also pretty
> >> high for only 50MB/sec... Looking closer, you'll see that bpf_mtap is
> >> consuming ~3.18% (under ether_nh_input).. I know I'm not running tcpdump
> >> or anything, but I think dhclient uses bpf to be able to inject packets
> >> and listen in on them, so I kill off dhclient, and instantly, the taskqueue
> >> thread for em drops down to 40% CPU... (transfer rate only marginally
> >> improves, if it does)
> >>
> >> I decide to run another flame graph w/o dhclient running:
> >> https://www.funkthat.com/~jmg/em.stack.nodhclient.svg
> >>
> >> and now _rxeof drops from 17.22% to 11.94%, pretty significant...
> >>
> >> So, if you care about performance, don't run dhclient...
> >>
> > Yes, I've noticed the same issue. It can absolutely kill performance
> > in a VM guest. It is much more pronounced on only some of my systems,
> > and I hadn't tracked it down yet. I wonder if this is fallout from
> > the callout work, or if there was some bpf change.
> >
> > I've been using the kludgey workaround patch below.
> Hm, pretty interesting.
> dhclient should setup proper filter (and it looks like it does so:
> 13:10 [0] m@ptichko s netstat -B
>   Pid  Netif   Flags      Recv      Drop     Match Sblen Hblen Command
>  1224    em0 -ifs--l  41225922         0        11     0     0 dhclient
> )
> see "match" count.
> And BPF itself adds the cost of read rwlock (+ bpf_filter() calls for
> each consumer on interface).
> It should not introduce significant performance penalties.
>

It will be a bit before I'm able to capture that. Here's a Flamegraph
from earlier in the year showing an absurd amount of time spent in
bpf_mtap():

http://people.freebsd.org/~bryanv/vtnet/vtnet-bpf-10.svg

> >
> > diff --git a/sys/net/bpf.c b/sys/net/bpf.c
> > index cb3ed27..9751986 100644
> > --- a/sys/net/bpf.c
> > +++ b/sys/net/bpf.c
> > @@ -2013,9 +2013,11 @@ bpf_gettime(struct bintime *bt, int tstype, struct mbuf *m)
> >  			return (BPF_TSTAMP_EXTERN);
> >  		}
> >  	}
> > +#if 0
> >  	if (quality == BPF_TSTAMP_NORMAL)
> >  		binuptime(bt);
> >  	else
> > +#endif
> bpf_gettime() is called IFF packet filter matches some traffic.
> Can you show your "netstat -B" output ?
> >  		getbinuptime(bt);
> >
> >  	return (quality);
> >
> >
> >> --
> >>   John-Mark Gurney				Voice: +1 415 225 5579
> >>
> >>      "All that I will do, has been done, All that I have, has not."
Add netbw option to systat
Awhile back, DragonFly BSD added a netbw option to systat that I've
ported to FreeBSD and found handy at various times:

     netbw         Display aggregate and per-connection TCP receive and
                   transmit rates.  Only active TCP connections are shown.

Leading to output such as:

    tcp accepts        connects             rcv 1.192G snd 15.77K rexmit

    192.168.10.80:22   192.168.10.20:23103  rcv        snd  415.7 [ NTSX ]
    192.168.10.80:22   192.168.10.20:46560  rcv 19.80M snd 14.47K [ NTSX ]
    192.168.10.80:22   192.168.10.20:60699  rcv        snd  886.3 [ NTSX ]
    192.168.10.81:5201 192.168.10.51:60844  rcv 293.2M snd        [R TSX ]
    192.168.10.81:5201 192.168.10.51:60845  rcv 293.5M snd        [R TSX ]
    192.168.10.81:5201 192.168.10.51:60846  rcv 293.2M snd        [R TSX ]
    192.168.10.81:5201 192.168.10.51:60847  rcv 292.9M snd        [R TSX ]

It uses the sequence numbers from the 'struct tcpcb' to derive the rates,
which is usually good but certainly not perfect (i.e., don't set the
interval too long).

I'd like to commit this if anybody else thinks they'd find it useful.

http://people.freebsd.org/~bryanv/patches/systat-netbw.patch

From d0a4f282f3e36eb53cb0a50a64aa4597e52b7d42 Mon Sep 17 00:00:00 2001
From: Bryan Venteicher
Date: Tue, 1 Jul 2014 00:51:29 -0500
Subject: [PATCH] Add 'netbw' display to systat

---
 usr.bin/systat/Makefile |   2 +-
 usr.bin/systat/cmdtab.c |   3 +
 usr.bin/systat/extern.h |   7 +
 usr.bin/systat/netbw.c  | 476
 usr.bin/systat/systat.1 |   4 +
 5 files changed, 491 insertions(+), 1 deletion(-)
 create mode 100644 usr.bin/systat/netbw.c

diff --git a/usr.bin/systat/Makefile b/usr.bin/systat/Makefile
index 1bb2da0..a17e4dd 100644
--- a/usr.bin/systat/Makefile
+++ b/usr.bin/systat/Makefile
@@ -7,7 +7,7 @@ PROG= systat
 SRCS= cmds.c cmdtab.c devs.c fetch.c iostat.c keyboard.c main.c \
 	netcmds.c netstat.c pigs.c swap.c icmp.c \
 	mode.c ip.c tcp.c \
-	vmstat.c convtbl.c ifcmds.c ifstat.c
+	vmstat.c convtbl.c ifcmds.c ifstat.c netbw.c
 
 .if ${MK_INET6_SUPPORT} != "no"
 SRCS+= icmp6.c ip6.c
diff --git a/usr.bin/systat/cmdtab.c b/usr.bin/systat/cmdtab.c
index c9c9e7d..0e225ec 100644
--- a/usr.bin/systat/cmdtab.c
+++ b/usr.bin/systat/cmdtab.c
@@ -55,6 +55,9 @@ struct cmdtab cmdtab[] = {
 	{ "netstat", shownetstat, fetchnetstat, labelnetstat,
 	  initnetstat, opennetstat, closenetstat, cmdnetstat,
 	  0, CF_LOADAV },
+	{ "netbw", shownetbw, fetchnetbw, labelnetbw,
+	  initnetbw, opennetbw, closenetbw, NULL,
+	  0, 0 },
 	{ "icmp", showicmp, fetchicmp, labelicmp,
 	  initicmp, openicmp, closeicmp, cmdmode,
 	  reseticmp, CF_LOADAV },
diff --git a/usr.bin/systat/extern.h b/usr.bin/systat/extern.h
index 17fffc1..38d4084 100644
--- a/usr.bin/systat/extern.h
+++ b/usr.bin/systat/extern.h
@@ -76,6 +76,7 @@ void closeiostat(WINDOW *);
 void closeip(WINDOW *);
 void closeip6(WINDOW *);
 void closekre(WINDOW *);
+void closenetbw(WINDOW *);
 void closenetstat(WINDOW *);
 void closepigs(WINDOW *);
 void closeswap(WINDOW *);
@@ -83,6 +84,7 @@ void closetcp(WINDOW *);
 int cmdifstat(const char *, const char *);
 int cmdiostat(const char *, const char *);
 int cmdkre(const char *, const char *);
+int cmdnetbw(const char *, const char *);
 int cmdnetstat(const char *, const char *);
 struct cmdtab *lookup(const char *);
 void command(const char *);
@@ -98,6 +100,7 @@ void fetchip(void);
 void fetchip6(void);
 void fetchiostat(void);
 void fetchkre(void);
+void fetchnetbw(void);
 void fetchnetstat(void);
 void fetchpigs(void);
 void fetchswap(void);
@@ -111,6 +114,7 @@ int initip(void);
 int initip6(void);
 int initiostat(void);
 int initkre(void);
+int initnetbw(void);
 int initnetstat(void);
 int initpigs(void);
 int initswap(void);
@@ -124,6 +128,7 @@ void labelip(void);
 void labelip6(void);
 void labeliostat(void);
 void labelkre(void);
+void labelnetbw(void);
 void labelnetstat(void);
 void labelpigs(void);
 void labels(void);
@@ -139,6 +144,7 @@ WINDOW *openip(void);
 WINDOW *openip6(void);
 WINDOW *openiostat(void);
 WINDOW *openkre(void);
+WINDOW *opennetbw(void);
 WINDOW *opennetstat(void);
 WINDOW *openpigs(void);
 WINDOW *openswap(void);
@@ -156,6 +162,7 @@ void showip(void);
 void showip6(void);
 void showiostat(void);
 void showkre(void);
+void shownetbw(void);
 void shownetstat(void);
 void showpigs(void);
 void showswap(void);
diff --git a/usr.bin/systat/netbw.c b/usr.bin/systat/netbw.c
new file mode 100644
index 000..785af5f
--- /dev/null
+++ b/usr.bin/systat/netbw.c
@@ -0,0 +1,476 @@
+/*
+ * Copyright (c) 2013 The DragonFly Project. All rights reserved.
+ *
+ * This code is derived from software contributed to The DragonFly Project
+ * by Matthew Dillon
+ *
+ * Redistribution and use in source and binary forms, with or without
+ * modificat
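Deriving a rate from the sequence numbers in 'struct tcpcb', as this message describes, comes down to dividing the change in a 32-bit sequence number by the sampling interval. The following is a minimal sketch with a hypothetical helper (seq_rate and its parameter names are not taken from the patch). Unsigned 32-bit subtraction stays correct across one wraparound, but once 4 GiB or more passes between two samples the delta becomes ambiguous, which is presumably the reason for the advice not to set the interval too long.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Bytes per second implied by two samples of a 32-bit TCP sequence number
 * taken interval_sec seconds apart. Hypothetical helper for illustration,
 * not code from the netbw patch.
 */
static double
seq_rate(uint32_t prev_seq, uint32_t curr_seq, double interval_sec)
{
	uint32_t delta;

	if (interval_sec <= 0.0)
		return (0.0);
	/* Modulo-2^32 subtraction stays correct across one wraparound. */
	delta = curr_seq - prev_seq;
	return ((double)delta / interval_sec);
}
```

For example, a sample pair of 0xFFFFFF00 and 0x00000100 taken one second apart yields 512 bytes/sec even though the counter wrapped in between. At 10 Gbit/s the 32-bit space wraps in under four seconds, so longer intervals can silently under-report on fast links.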
Re: Add netbw option to systat
On Wed, Jul 2, 2014 at 7:54 PM, hiren panchasara wrote:

> On Wed, Jul 2, 2014 at 4:50 PM, Bryan Venteicher wrote:
> > Awhile back, DragonlyFlyBSD added a netbw option to systat that I've ported
> > to FreeBSD and found handy at various times:
> >
> >      netbw         Display aggregate and per-connection TCP receive and
> >                    transmit rates.  Only active TCP connections are shown.
> >
> > Leading to output such as:
> >
> >     tcp accepts        connects             rcv 1.192G snd 15.77K rexmit
> >
> >     192.168.10.80:22   192.168.10.20:23103  rcv        snd  415.7 [ NTSX ]
> >     192.168.10.80:22   192.168.10.20:46560  rcv 19.80M snd 14.47K [ NTSX ]
> >     192.168.10.80:22   192.168.10.20:60699  rcv        snd  886.3 [ NTSX ]
> >     192.168.10.81:5201 192.168.10.51:60844  rcv 293.2M snd        [R TSX ]
> >     192.168.10.81:5201 192.168.10.51:60845  rcv 293.5M snd        [R TSX ]
> >     192.168.10.81:5201 192.168.10.51:60846  rcv 293.2M snd        [R TSX ]
> >     192.168.10.81:5201 192.168.10.51:60847  rcv 292.9M snd        [R TSX ]
> >
> > It uses the sequences number from the 'struct tcpcb' to derive the rates,
> > which is usually good but certainly not perfect (i.e., don't set the
> > interval too long).
> >
> > I'd like to commit this if anybody else thinks they'd find it useful.
> >
> > http://people.freebsd.org/~bryanv/patches/systat-netbw.patch
>
> I like the idea.
>
> A few things about the patch:
> 1) You may want to remove the code hidden behind "#if 0" at 2 places.
>

That's inherited as is from the DragonFly BSD code, and I was trying to
keep the diff relatively small with upstream. I'll remove the hidden code.

> 2) I am not entirely clear on why/if we need the last column with
> flags but if we keep it (for compatibility or any other reason), it
> would be nice to have those flags explained in the manpage:
>
> +	mvwprintw(wnd, LINES-2, 0,
> +	    "Rate/sec, "
> +	    "R=rxpend T=txpend N=nodelay T=tstmp "
> +	    "S=sack X=winscale F=fastrec");
>

Yes, I'll document them.

> 3) I feel that the header line for o/p (specially 'tcp accepts and
> connects' terminology) can be improved but I do not have a better
> suggestion :-)
>
> It looks okay to me otherwise and thanks for your work.
>
> cheers,
> Hiren
Re: Add netbw option to systat
On Thu, Jul 10, 2014 at 11:50 PM, Bruce Evans wrote:

> On Thu, 10 Jul 2014, John Baldwin wrote:
>
>> On Wednesday, July 02, 2014 8:54:41 pm hiren panchasara wrote:
>>> On Wed, Jul 2, 2014 at 4:50 PM, Bryan Venteicher wrote:
>>>> I'd like to commit this if anybody else thinks they'd find it useful.
>>>>
>>>> http://people.freebsd.org/~bryanv/patches/systat-netbw.patch
>>> ...
>>
>> 4) Should numtok() just be humanize_number? Or rather, would it simplify
>> the code to use humanize_number? (It might not, but if it does, I
>> think that would be preferable.)
>
> No, nothing should use dehumanize(scientificize)_number(). It is a
> badly designed API that doesn't even support unsigned numbers or
> intmax_t. But numtok() takes a floating point arg. (It is always
> used for rates that really do need floating point or perhaps a
> quotient of integers. Except under #if 0, it is called on integer
> args. It doesn't support this case since it always generates a
> %5.Nf format with N > 0 (except for numbers < 0.001 it prints nothing,
> perhaps because %f format doesn't work well for this case).
>
> systat already has the better functions putint(), putfloat() and
> putlongdouble(). Unfortunately, these are static in vmstat.c. They
> should be used throughout systat to get a consistent format.
> numtok() takes a double arg and could be handled at some cost in
> efficiency by putlongdouble(). (putlongdouble() only exists due
> to design errors in libdevstat. Old parts of systat -v use floats
> and floats are more than adequate, but libdevstat uses long doubles.
> Probably downcasting the long doubles to float would work in
> systat -v, but putfloat() was cloned to create putlongdouble()
> instead.)
>
> putlongdouble() prints the output, but numtok() formats the output for
> printing and returns it in a static buffer with MAXINDEXES = 8
> generations so that this method is not too fragile. In general, direct
> printing is easier to use, but here the output has to be printed at a
> certain place in the window. putlongdouble() takes coordinates for
> each call and prints using move(), addch() and addstr(). It has more
> control over padding characters than printf() can provide or
> humanize_number() can dream of, and uses this to pad with '*' in some
> cases (mainly for numbers that cannot be formatted to fit in the desired
> space). Callers must pass some format info, especially the field width,
> in each call. numtok() basically hard-codes a field width of 6 and a
> format of %5.N%c where %c is the suffix. This format wastes 1 character
> for the suffix when the suffix is ' '. putlongdouble() and even
> dehumanize_number() avoid this wastage. This is more critical when the
> field width is < 6. N > 0 and the decimal point also waste a lot of
> space. putlongdouble() has complications to print 100% as 100% and
> not 100.0% (the latter is 2 characters wider, and especially wasteful).
> dehumanize_number() doesn't have any of the complications for the
> decimal point since it doesn't support floating point.
>
> systat -v (vmstat.c) is mostly written in KNF, but the patch has mounds
> of style bugs, e.g.:
>

I was trying to keep the diff small with upstream. I'll make a cleanup
sweep and incorporate your comments. Thanks.

> +void
> +shownetbw(void)
> +{
> +	double delta_time;
> +	struct mytcpcb *elm, *telm;
> +	int row;
> +
> +	delta_time = (double)(tv_curr.tv_sec - tv_last.tv_sec) - 1.0 +
> +		     (tv_curr.tv_usec + 100 - tv_last.tv_usec) / 1e6;
> +	if (delta_time < 0.1)
> +		return;
> +
> +	mvwprintw(wnd, 0, 0,
> +		  "tcp accepts %s connects %s "
> +		  " rcv %s snd %s rexmit %s",
> +		  numtok(DELTARATE(tcps_accepts)),
> +		  numtok(DELTARATE(tcps_connects) - DELTARATE(tcps_accepts)),
> +		  numtok(DELTARATE(tcps_rcvbyte)),
> +		  numtok(DELTARATE(tcps_sndbyte)),
> +		  numtok(DELTARATE(tcps_sndrexmitbyte)));
> + ...
>
> In KNF, the continuation indent is 4. This helps minimize line splitting,
> and lines are not split unnecessarily. Fixing this and other style bugs
> and pessimizations gives:
>
> @ 	struct mytcpcb *elm, *telm;
> @ 	double delta_time;
> @ 	int row;
> @
> @ 	delta_time = tv_curr.tv_sec - tv_last.tv_sec +
> @ 	    (tv_curr.tv_usec - tv_last.tv_usec) * 1e-6;
> @ 	if (delta_time < 0.1)
> @ 		return
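The simplified elapsed-time expression suggested in the review above can be tried on its own. This is a minimal sketch, assuming tv_last and tv_curr are struct timeval samples as in the quoted patch; the helper name elapsed_seconds is made up for illustration.

```c
#include <assert.h>
#include <sys/time.h>

/*
 * Seconds elapsed between two struct timeval samples, using the single
 * expression from the review. Hypothetical helper for illustration.
 */
static double
elapsed_seconds(const struct timeval *tv_last, const struct timeval *tv_curr)
{
	/*
	 * The microsecond difference may be negative; adding it to the
	 * whole-second difference as a signed value needs no manual borrow.
	 */
	return ((double)(tv_curr->tv_sec - tv_last->tv_sec) +
	    (double)(tv_curr->tv_usec - tv_last->tv_usec) * 1e-6);
}
```

With tv_last = 10.750000 s and tv_curr = 12.250000 s the seconds differ by 2 and the microseconds by -500000, giving roughly 1.5 s.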
Re: To SMP or not to SMP
----- Original Message -----
> From: "John Baldwin"
> To: freebsd-net@freebsd.org
> Cc: "Barney Cordoba", "Peter Jeremy"
> Sent: Friday, January 11, 2013 9:39:17 AM
> Subject: Re: To SMP or not to SMP
>
> On Thursday, January 10, 2013 02:36:59 PM Peter Jeremy wrote:
> > On 2013-Jan-07 18:25:58 -0800, Barney Cordoba wrote:
> > >I have a situation where I have to run 9.1 on an old single core
> > >box. Does anyone have a handle on whether it's better to build a
> > >non SMP kernel or to just use a standard SMP build with just the
> > >one core?
> >
> > Another input for this decision is kern/173322. Currently on x86,
> > atomic operations within kernel modules are implemented using calls
> > to code in the kernel, which do or don't use lock prefixes depending
> > on whether the kernel was built as SMP. My proposed change changes
> > kernel modules to inline atomic operations but always include lock
> > prefixes (effectively reverting r4). I'd appreciate anyone who
> > feels like testing the impact of this change.
>
> Presumably a locked atomic op is cheaper than a function call then?
> The current setup assumes the opposite.
>
> I think we should actually do this for atomics in modules on x86:
>
> 1) If a module is built standalone, it should do whichever is cheaper:
>    a function call or always use "LOCK".
>
> 2) If a module is built as part of the kernel build, it should use inlined
>    atomics that match what the kernel does. Thus, modules built with a
>    non-SMP kernel would use inlined atomic ops that do not use LOCK. We
>    have a way to detect this now (some HAVE_FOO #define added in the past
>    few years) that we didn't back when this bit of atomic.h was written.
>

It would be nice to have the LOCK variants available even on UP
kernels in a non-hackish way. For VirtIO, we need to handle a guest
UP kernel running on an SMP host. This could be either a #define that
forces the SMP atomics to be inlined, or variants exposed with an
_smp suffix.

VirtIO currently uses mb() to enforce ordering. I have a patch to
change it to use atomic(9), but can only do so when VirtIO is included
in an SMP kernel (among other constraints - must have 16-bit atomic
operations too).

(FreeBSD's VirtIO is x86 only for now - but that will be changing
soon; I haven't looked at whether other archs' atomic(9) behaves
differently for UP/SMP.)

> --
> John Baldwin
VMware vmxnet2 driver
Hi,

Over the past few months, I've been porting OpenBSD's vmxnet driver
(if_vic) to FreeBSD [1]. It has reached a state where I'd like to draw
attention to it for those not subscribed to svn-projects.

Most of the original OpenBSD driver - the attach, init, and Tx/Rx
paths - has been rewritten. I added support for vmxnet2 - IPv4 TSO
and checksum offloading. Code for LRO was added too, but I cannot
figure out how to get it enabled on the hypervisor.

Unfortunately, the driver tends to be no faster than the emulated em
device, which certainly is not the desired outcome for a paravirtualized
device. Nothing in the code jumps out as an obvious performance killer.
The original OpenBSD driver suffers from the same lower performance, so
it was not introduced during the port. I casually suspect the vmxnet
code in VMware has not gotten much attention since vmxnet3 was
introduced.

The performance issue is gating this from being merged into HEAD, but
I don't have the spare cycles at the moment to really investigate this,
and I need to spend time getting VirtIO into better shape.

My ultimate goal is to have a BSD licensed vmxnet3 driver included in
FreeBSD to remove the need to use the poor one that's a part of the
VMware tools. There is work in progress to get the necessary
documentation to make that happen, but there is no firm date yet
(likely later this year, hopefully in time for 10.0).

Any testing or performance data is welcome. For bulk TCP transfers,
if_vic will tend to be faster than em (~1/2 an order of magnitude) due
to TSO, but I don't think that warrants merging into HEAD yet.

Bryan

[1] - http://svnweb.freebsd.org/base/projects/vmxnet/sys/dev/vmware/vmxnet/
Re: To SMP or not to SMP
----- Original Message -----
> From: "John Baldwin"
> To: freebsd-net@freebsd.org
> Cc: "Konstantin Belousov", "Bryan Venteicher", "Peter Jeremy"
> Sent: Monday, January 14, 2013 3:57:58 PM
> Subject: Re: To SMP or not to SMP
>
> On Monday, January 14, 2013 4:07:56 pm Konstantin Belousov wrote:
> > On Mon, Jan 14, 2013 at 03:07:50PM -0500, John Baldwin wrote:
> > > On Sunday, January 13, 2013 1:15:13 am Bryan Venteicher wrote:
> > > >
> > > > ----- Original Message -----
> > > > > From: "John Baldwin"
> > > > > To: freebsd-net@freebsd.org
> > > > > Cc: "Barney Cordoba", "Peter Jeremy"
> > > > > Sent: Friday, January 11, 2013 9:39:17 AM
> > > > > Subject: Re: To SMP or not to SMP
> > > > >
> > > > > On Thursday, January 10, 2013 02:36:59 PM Peter Jeremy wrote:
> > > > > > On 2013-Jan-07 18:25:58 -0800, Barney Cordoba wrote:
> > > > > > >I have a situation where I have to run 9.1 on an old single core
> > > > > > >box. Does anyone have a handle on whether it's better to build a
> > > > > > >non SMP kernel or to just use a standard SMP build with just the
> > > > > > >one core?
> > > > > >
> > > > > > Another input for this decision is kern/173322. Currently on x86,
> > > > > > atomic operations within kernel modules are implemented using calls
> > > > > > to code in the kernel, which do or don't use lock prefixes depending
> > > > > > on whether the kernel was built as SMP. My proposed change changes
> > > > > > kernel modules to inline atomic operations but always include lock
> > > > > > prefixes (effectively reverting r4). I'm appreciate anyone who
> > > > > > feels like testing the impact of this change.
> > > > >
> > > > > Presumably a locked atomic op is cheaper than a function call then?
> > > > > The current setup assumes the opposite.
> > > > >
> > > > > I think we should actually do this for atomics in modules on x86:
> > > > >
> > > > > 1) If a module is built standalone, it should do whichever is cheaper:
> > > > >    a function call or always use "LOCK".
> > > > >
> > > > > 2) If a module is built as part of the kernel build, it should use inlined
> > > > >    atomics that match what the kernel does. Thus, modules built with a
> > > > >    non-SMP kernel would use inlined atomic ops that do not use LOCK. We
> > > > >    have a way to detect this now (some HAVE_FOO #define added in the past
> > > > >    few years) that we didn't back when this bit of atomic.h was written.
> > > > >
> > > >
> > > > It would be nice to have the LOCK variants available even on UP
> > > > kernels in non-hackish way. For VirtIO, we need to handle an guest
> > > > UP kernel running on an SMP host. Whether this is an #define that
> > > > forces the SMP atomics to be inlined, or if they're exposed with
> > > > an _smp suffix.
> > Could you please, clarify why does UP kernel needs it ?
> > Shouldn't the hypervisor context switching provide necessary serialization
> > anyway ?
>
> I thought this, too, but in the case of virtio you are presumably
> synchronizing with other threads in the hypervisor itself which might
> be running concurrently on another physical CPU.
>

Yes, that is the case to be concerned about. Although, thinking about
this a bit more, in VirtIO (at least the current spec), all the shared
fields are updated by either the host or guest, not both, so a UP kernel
can get by without the LOCK, correct?

> --
> John Baldwin
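The point raised in this reply, that each shared VirtIO field has exactly one writer, means that publishing such a field needs ordering but not a locked read-modify-write. Below is a minimal sketch in portable C11 atomics of one side publishing an index that only it writes. The ring layout and every name (struct ring, ring_publish, ring_consume) are hypothetical and far simpler than a real virtqueue; in particular there is no full-ring check.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

#define RING_SIZE 8	/* power of two; hypothetical size */

struct ring {
	uint32_t slots[RING_SIZE];	/* written only by the producer */
	_Atomic uint16_t prod_idx;	/* written only by the producer */
	uint16_t cons_idx;		/* private to the consumer */
};

static void
ring_init(struct ring *r)
{
	atomic_init(&r->prod_idx, 0);
	r->cons_idx = 0;
}

/* Producer: fill the slot, then publish it with a release store. */
static void
ring_publish(struct ring *r, uint32_t value)
{
	uint16_t idx = atomic_load_explicit(&r->prod_idx,
	    memory_order_relaxed);

	r->slots[idx % RING_SIZE] = value;
	/* Plain store, no read-modify-write: nobody else writes prod_idx. */
	atomic_store_explicit(&r->prod_idx, (uint16_t)(idx + 1),
	    memory_order_release);
}

/* Consumer: the acquire load pairs with the release store above. */
static bool
ring_consume(struct ring *r, uint32_t *value)
{
	uint16_t idx = atomic_load_explicit(&r->prod_idx,
	    memory_order_acquire);

	if (idx == r->cons_idx)
		return (false);		/* nothing new published */
	*value = r->slots[r->cons_idx % RING_SIZE];
	r->cons_idx++;
	return (true);
}
```

On x86, compilers typically emit ordinary moves for release stores and acquire loads, with no LOCK prefix. This sketch covers only the publish/consume pairing; a place that must order a store before a later load would still need a full fence such as the mb() mentioned earlier in the thread.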
Re: To SMP or not to SMP
----- Original Message -----
> From: "Konstantin Belousov"
> To: "Bryan Venteicher"
> Cc: "John Baldwin", "Peter Jeremy", freebsd-net@freebsd.org
> Sent: Tuesday, January 15, 2013 4:42:16 AM
> Subject: Re: To SMP or not to SMP
>
> On Mon, Jan 14, 2013 at 04:12:09PM -0600, Bryan Venteicher wrote:
> >
> > ----- Original Message -----
> > > From: "John Baldwin"
> > > To: freebsd-net@freebsd.org
> > > Cc: "Konstantin Belousov", "Bryan Venteicher", "Peter Jeremy"
> > > Sent: Monday, January 14, 2013 3:57:58 PM
> > > Subject: Re: To SMP or not to SMP
> > >
> > > On Monday, January 14, 2013 4:07:56 pm Konstantin Belousov wrote:
> > > > On Mon, Jan 14, 2013 at 03:07:50PM -0500, John Baldwin wrote:
> > > > > On Sunday, January 13, 2013 1:15:13 am Bryan Venteicher wrote:
> > > > > >
> > > > > > ----- Original Message -----
> > > > > > > From: "John Baldwin"
> > > > > > > To: freebsd-net@freebsd.org
> > > > > > > Cc: "Barney Cordoba", "Peter Jeremy"
> > > > > > > Sent: Friday, January 11, 2013 9:39:17 AM
> > > > > > > Subject: Re: To SMP or not to SMP
> > > > > > >
> > > > > > > On Thursday, January 10, 2013 02:36:59 PM Peter Jeremy wrote:
> > > > > > > > On 2013-Jan-07 18:25:58 -0800, Barney Cordoba wrote:
> > > > > > > > >I have a situation where I have to run 9.1 on an old single core
> > > > > > > > >box. Does anyone have a handle on whether it's better to build a
> > > > > > > > >non SMP kernel or to just use a standard SMP build with just the
> > > > > > > > >one core?
> > > > > > > >
> > > > > > > > Another input for this decision is kern/173322. Currently on x86,
> > > > > > > > atomic operations within kernel modules are implemented using calls
> > > > > > > > to code in the kernel, which do or don't use lock prefixes depending
> > > > > > > > on whether the kernel was built as SMP. My proposed change changes
> > > > > > > > kernel modules to inline atomic operations but always include lock
> > > > > > > > prefixes (effectively reverting r4). I'm appreciate anyone who
> > > > > > > > feels like testing the impact of this change.
> > > > > > >
> > > > > > > Presumably a locked atomic op is cheaper than a function call then?
> > > > > > > The current setup assumes the opposite.
> > > > > > >
> > > > > > > I think we should actually do this for atomics in modules on x86:
> > > > > > >
> > > > > > > 1) If a module is built standalone, it should do whichever is cheaper:
> > > > > > >    a function call or always use "LOCK".
> > > > > > >
> > > > > > > 2) If a module is built as part of the kernel build, it should use inlined
> > > > > > >    atomics that match what the kernel does. Thus, modules built with a
Re: VMware vmxnet2 driver
- Original Message -
> From: "Ivan Voras"
> To: freebsd-net@freebsd.org
> Sent: Friday, January 18, 2013 4:54:12 AM
> Subject: Re: VMware vmxnet2 driver
>
> On 14/01/2013 07:42, Bryan Venteicher wrote:
>
> > Any testing or performance data is welcome. For bulk TCP transfers,
> > if_vic will tend to be faster than em (~1/2 a magnitude) due to TSO,
> > but I don't think that warrants merging into HEAD yet.
>
> Considering that from your description the current situation is:
>
> * The driver isn't *worse* than either em or the "official"
>   VMWare driver (right?)
> * There is currently no vmxnet driver at all in HEAD
>
> ... I don't think including the driver will harm anyone or anything,
> but it may make things a bit simpler when configuring VMs.

It is typically no better than em (*) - but better in certain cases
with TSO. The official driver didn't compile on HEAD and I couldn't
bring myself to spend the time to fix it. I'll look into it this
weekend and do an initial comparison.

A vmxnet3 driver would be far more useful to have in the tree.

* I'm running ESXi nested in VMware Fusion but I don't think that
  would explain the discrepancy.
Re: Performance problem with slow link behind fast gateway
Hi,

On Tue, Sep 9, 2014 at 4:42 PM, wrote:

> All,
>
> I'm seeing some performance problems with a slowish VPN connection behind
> a fast gateway, the setup looks like this:
>
> [client (zandbak) (DSL connection)] -- 'VPN tunnel' -- [Gateway (vps) using NAT on 1G] -- 'Internet'
>
> Transfers from the gateway to the client are reasonably fast (easily
> within usable range for me):
> root@zandbak:/usr/home/rob # scp rob@gateway:test_file ./
> test_file                                  100%   10MB 445.2KB/s   00:23
>
> Transfers from the internet to the gateway are fast:
> root@vps:/usr/home/rob # fetch -4 "http://149.20.53.23/pub/FreeBSD/releases/amd64/amd64/ISO-IMAGES/10.0/FreeBSD-10.0-RELEASE-amd64-bootonly.iso"
> FreeBSD-10.0-RELEASE-amd64-bootonly.iso 100% of 209 MB 10 MBps 00m20s
>
> But transfers from the client to the internet through the tunnel are
> showing a very degraded connection speed, the speed jumps up and down but
> averages at around 20kBps:
> root@zandbak:/usr/home/rob # fetch "http://149.20.53.23/pub/FreeBSD/ISO-IMAGES-amd64/10.0/FreeBSD-10.0-RELEASE-amd64-bootonly.iso"
> FreeBSD-10.0-RELEASE-amd64-bootonly.iso 0% of 209 MB 8275 Bps 07h27m
>
> I've tried to eliminate some variables:
> -VPN: tinc as a L2 VPN and openVPN as a L3 VPN, results are the same
> -NAT: pf and ipfw, results are the same
>
> I suspect that there's a problem with the fast link receiving too much
> data and once the buffers are full dropping packets although I'm not sure
> if this is actually the problem.
> My question is: how can I debug this issue?

On the vtnet0 interface in your KVM VM, disable checksum offloading. What
KVM/QEMU VirtIO provides as the "checksum" in situations like this does not
work well with what FreeBSD expects.

Fixing this has been on my todo list for a while, but it is a moderate
amount of work and touches many places in the stack. I have plans to do
mbuf related work later this year, and was planning to finally fix this
issue as well.

> Below some system information, I can supply more info if needed
>
> Thanks!
> Rob Evers
>
> System info:
> Gateway: This is a VPS on KVM
>
> root@vps:/usr/home/rob # uname -a
> FreeBSD vps.debank.tv 10.0-STABLE FreeBSD 10.0-STABLE #5 r268727M: Wed
> Jul 16 13:17:24 NZST 2014 r...@vps.debank.tv:/usr/obj/usr/src/sys/GENERIC
> amd64
>
> root@vps:/usr/home/rob # ifconfig vtnet0
> vtnet0: flags=8843 metric 0 mtu 1500
>         options=6c00ab HWCSUM,VLAN_HWTSO,LINKSTATE,RXCSUM_IPV6,TXCSUM_IPV6>
>         ether 00:16:3c:55:17:b9
>         inet 192.227.xxx.xxx netmask 0xff00 broadcast 192.227.xxx.xxx
>         inet6 fe80::216:3cff:fe55:17b9%vtnet0 prefixlen 64 scopeid 0x1
>         nd6 options=21
>         media: Ethernet 10Gbase-T
>         status: active
>
> root@vps:/usr/home/rob # ifconfig tap0
> tap0: flags=8843 metric 0 mtu 1500
>         options=8
>         ether 00:bd:61:01:00:00
>         inet6 fd7c:3e16:580b:4ccf::50 prefixlen 64
>         inet6 fe80::2bd:61ff:fe01:0%tap0 prefixlen 64 scopeid 0x4
>         inet 172.16.143.50 netmask 0xff00 broadcast 172.16.143.255
>         nd6 options=61
>         media: Ethernet autoselect
>         status: active
>         Opened by PID 61485
>
> Client: This is a VM on bhyve
>
> root@zandbak:/usr/home/rob # uname -a
> FreeBSD zandbak 10.0-RELEASE-p7 FreeBSD 10.0-RELEASE-p7 #0: Tue Jul 8
> 06:37:44 UTC 2014
> r...@amd64-builder.daemonology.net:/usr/obj/usr/src/sys/GENERIC
> amd64
>
> root@zandbak:/usr/home/rob # ifconfig vtnet0
> vtnet0: flags=8943 metric 0 mtu 1500
>         options=80028
>         ether 52:54:00:13:fd:78
>         inet 192.168.1.129 netmask 0xff00 broadcast 192.168.1.255
>         inet6 fe80::5054:ff:fe13:fd78%vtnet0 prefixlen 64 scopeid 0x1
>         nd6 options=29
>         media: Ethernet 10Gbase-T
>         status: active
>
> root@zandbak:/usr/home/rob # ifconfig tap0
> tap0: flags=8843 metric 0 mtu 1500
>         options=8
>         ether 00:bd:3d:94:05:00
>         inet 172.16.143.55 netmask 0xff00 broadcast 172.16.143.255
>         nd6 options=29
>         media: Ethernet autoselect
>         status: active
>         Opened by PID 1411
Re: UDP/IPv6 handling
On Wed, Oct 1, 2014 at 11:58 AM, Michael Tuexen <michael.tue...@lurchi.franken.de> wrote:

> Dear all,
>
> in udp6_input() we have the following code:
>
>         if (nxt == IPPROTO_UDP && plen != ulen) {
>                 UDPSTAT_INC(udps_badlen);
>                 goto badunlocked;
>         }
>         /*
>          * Checksum extended UDP header and data.
>          */
>         if (uh->uh_sum == 0) {
>                 if (ulen > plen || ulen < sizeof(struct udphdr)) {
>                         UDPSTAT_INC(udps_nosum);
>                         goto badunlocked;
>                 }
>         }
>
> I'm trying to understand the UDP code path...

I too was recently confused by this code. I pointed out one issue to
kevlo@ recently, but it still kind of seemed like the UDP-Lite was
mismerged to IPv6.

> So (ulen > plen) can't be true. I'm wondering why do we only check the
> ulen is not too short only in the case when the UDP checksum is zero.
> A zero checksum should also never happen.

I hope to have a patch for RFC6935 [1] soon so a zero checksum may be
allowed if the inp/udpcb is configured for it.

> I think we should check for ulen < sizeof(struct udphdr) in any case.

I think previously, the checks in ip6_input(), IP6_EXTHDR_CHECK(), and
plen == ulen made this unnecessary. I think we'd want to do it for
UDP-Lite if ulen was not initially zero.

[1] - http://tools.ietf.org/html/rfc6935

> Opinions?
>
> Best regards
> Michael
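As a rough illustration of the "check ulen in any case" suggestion from this message, here is a minimal standalone sketch for the plain UDP case only. The helper name `udp6_len_ok` and the local `struct udp_hdr` are hypothetical stand-ins, not the kernel's udp6_input() or struct udphdr, and UDP-Lite and jumbo payloads are deliberately left out.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the 8-byte UDP header (not the kernel type). */
struct udp_hdr {
	uint16_t sport;
	uint16_t dport;
	uint16_t ulen;
	uint16_t sum;
};

/*
 * plen: UDP payload length derived from the IPv6 header, host byte order.
 * ulen: length field read from the UDP header, host byte order.
 * Returns true when both lengths are acceptable for plain UDP.
 */
static bool
udp6_len_ok(size_t plen, size_t ulen)
{
	/* Applied unconditionally, not only when the checksum is zero. */
	if (ulen < sizeof(struct udp_hdr))
		return (false);
	/* For plain UDP the two lengths must agree exactly. */
	if (plen != ulen)
		return (false);
	return (true);
}
```

With this ordering the header-size check no longer depends on the checksum field, which is the point raised in the thread; whether the real code path needs it is exactly what the reply discusses.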
Re: SIOCSVH, SIOCGVH ioctl(2) and virtio ethernet driver
On Fri, Dec 26, 2014 at 8:09 AM, Oleg Ginzburg wrote:

> is it possible to use the carp(4) protocol with
> vtnet(4) interfaces ( which is used, for example, in bhyve(8) )
> Currently, the standard carp init operation causes an SIOCGVH error:
>
> /sbin/ifconfig vtnet0 vhid 1 advskew 100 pass pass 10.10.10.10/24 alias
> ifconfig: SIOCGVH: Protocol not supported

You probably don't have the carp(4) module loaded.
Re: vmxnet3 driver bug?
On Tue, Oct 17, 2017 at 9:36 AM, Lewis Donzis wrote:

> The VMXNET3 driver appears to have a bug that prevents it from correctly
> reporting when the link goes down.
>
> There are two lines of code that should be deleted in
> /usr/src/sys/dev/vmware/vmxnet3/if_vmx.c:
>
> @@ -3619,8 +3619,6 @@ vmxnet3_media_status(struct ifnet *ifp, struct ifmediareq *ifmr)
>         VMXNET3_CORE_LOCK(sc);
>         if (vmxnet3_link_is_up(sc) != 0)
>                 ifmr->ifm_status |= IFM_ACTIVE;
> -       else
> -               ifmr->ifm_status |= IFM_NONE;
>         VMXNET3_CORE_UNLOCK(sc);
>  }
>
> IFM_NONE doesn’t belong in the status flags and, coincidentally, is
> defined with an identical value as IFM_ACTIVE, so it indicates that link is
> always active.

This should be fixed in r326309 that I'll merge to the stable branches in a week.

> Thanks,
> lew
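To make the reported failure mode concrete, here is a small standalone sketch of why the `else` branch hides link-down. The `X_`-prefixed constants and both helper functions are hypothetical: the values are chosen only so that the "none" and "active" constants coincide, as the report describes, and the initial "valid" bit is an assumption that is not visible in the quoted diff.

```c
#include <stdbool.h>

/* Hypothetical stand-ins; the report says the last two share one value. */
#define X_IFM_AVALID 0x1
#define X_IFM_ACTIVE 0x2
#define X_IFM_NONE   0x2

/* Status as computed before the fix: both branches set the same bit. */
static int
media_status_buggy(bool link_up)
{
	int status = X_IFM_AVALID;

	if (link_up)
		status |= X_IFM_ACTIVE;
	else
		status |= X_IFM_NONE;
	return (status);
}

/* Status with the else branch removed, as in the proposed diff. */
static int
media_status_fixed(bool link_up)
{
	int status = X_IFM_AVALID;

	if (link_up)
		status |= X_IFM_ACTIVE;
	return (status);
}
```

In the buggy variant the active bit is set whether or not the link is up, so a caller testing that bit can never observe a down link; the fixed variant leaves the bit clear when the link is down.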