On Sun, 25 Jan 2015, Dave Taht wrote:
I want to make clear that I support dlang's design in the abstract... and
am just arguing because it is a slow day.
I welcome challenges to the design, it's how I improve things :-)
On Sat, Jan 24, 2015 at 10:44 PM, David Lang <da...@lang.hm> wrote:
On Sat, 24 Jan 2015, Dave Taht wrote:
to clarify, the chain of comments was
1. instead of bridging I should route
2. network manager would preserve the IPv4 address to prevent breaking
established connections.
I was explaining why that can't work. If you are moving between networks that are each routed independently, they either need different address ranges (in which case the old IP simply won't work), or they each need to NAT to get to the outside (in which case the IP may stay the same, but established connections will still break, since the new router won't have the NAT entries for them).
Hmm? The first thing I ever do to a router is renumber it to a unique IP
address range, and rename the subnet in dns to something unique. The 3 sed
lines for this are on a cerowrt web page somewhere. Adding ipv6 statically is
a pita, but doable with care and a uci script, and mildly more doable as hnetd
matures.
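From memory the sed lines are of this shape (illustrative prefixes and names, not the real page's lines):

  # renumber the stock prefix and rename the domain (examples made up)
  sed -i 's/172\.30\.42\./172.20.4./g' /etc/config/network
  sed -i 's/172\.30\.42\./172.20.4./g' /etc/config/dhcp
  sed -i 's/home\.lan/ap4.example.net/g' /etc/config/dhcp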
I run local dns services on each router in the hope that at least some lookups will be cached, and a local dhcp server to serve addresses out of that range. I turn off dhcp default route fetching on each router's external interface and use babel instead to find the right route(s) out of the system.
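On cerowrt that boils down to something like this (a sketch; "wan" stands in for whatever the external interface is called in /etc/config/network):

  uci set network.wan.defaultroute='0'   # don't take a default route via dhcp
  uci set network.wan.peerdns='0'        # keep using the local dns server
  uci commit network
  /etc/init.d/network restart
  # babeld, configured separately, then provides the egress routes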
On the NAT front, there is no NAT on the internal routers, just a flat address space (172.20.0.0/14 in my case). I push all the NAT to the main egress gateway(s), and in a case like yours would probably use multiple external IPs and SNAT rather than masquerading the entire subnet onto one, to free up port space. You rapidly run out of ports in a NATted environment with that many users. I've had to turn the NAT timeouts down for UDP in particular to truly unreasonable levels otherwise (20 seconds in some cases).
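Roughly, on the gateway (untested; the public addresses are documentation-range placeholders):

  # spread the flat 172.20.0.0/14 space across several public IPs
  # instead of masquerading everything onto one
  iptables -t nat -A POSTROUTING -o eth0 -s 172.20.0.0/14 \
      -j SNAT --to-source 198.51.100.10-198.51.100.13 --persistent
  # and shorten the udp conntrack timeouts from their defaults
  sysctl -w net.netfilter.nf_conntrack_udp_timeout=20
  sysctl -w net.netfilter.nf_conntrack_udp_timeout_stream=60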
hmm, we haven't seen anything like this, but it could be a problem we haven't
noticed because we haven't been looking for it.
Doing this I can get a quick status on what is up with "ip route", and by monitoring the activity on each IP range I can see whether traffic is actually being passed; a failure of a given gateway fails over to another, and so on.
There are a couple of snmp hacks to do things like monitor active leases, and smokeping/mrtg to access other stats. There are a couple of beagles on wifi that I ping on some APs. The beagles have not been very reliable for me, so they get power-cycled by Digital Loggers gear when they fail a local ping. In fact the main logging beagle failed entirely the other month, sigh.
I use the ad-hoc links on cerowrt as backups (if a node loses ethernet connectivity) and extenders (if there is no ethernet connectivity at all). As I have 5 different comcast exit nodes spread throughout the network, I use babel-pinger on each (see the sketch below) to see if they are up, and insert default routes into the mix that are automatically the shortest "distance" between a node and an exit gateway. If one gateway goes down, (usually) all the traffic switches over to the next nearest default gateway within 16 seconds or so, breaking all the NAT associations for the net they were on (sigh), as well as native ipv6 stuff, but it has happened so often without me noticing that it's nice not to have to worry.
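babel-pinger itself is tiny; the gist is something like this (a loose sketch, not the actual script; ge00 is cerowrt's name for the wan interface, the addresses are made up):

  # keep testing the exit; announce a default route only while it works
  while sleep 10; do
      if ping -c 2 -W 2 8.8.8.8 >/dev/null 2>&1; then
          ip route replace default via 10.1.1.1 dev ge00 metric 512
      else
          ip route del default via 10.1.1.1 dev ge00 metric 512 2>/dev/null
      fi
  done
  # babeld then redistributes it, e.g. in /etc/babeld.conf:
  #   redistribute ip 0.0.0.0/0 le 0 metric 128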
(I have a mostly failed attempt in play for doing better with ipv6 and
hnetd on a couple of exit nodes, but that isn't solid enough to deploy as
yet, so it's only sort of working in the yurtlab. I really wish I could buy
PI space for ipv6 somehow)
(I have been fiddling with dns anycast to try to get more redundancy on the main dns gateways. That works pretty well.)
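The trick, roughly (the service address is made up, and I'm assuming babel carries the route):

  # put the same dns service address on the loopback of every dns server
  ip addr add 172.20.0.53/32 dev lo
  # and let babeld announce it, e.g. in /etc/babeld.conf:
  #   redistribute local ip 172.20.0.53/32 allow
  # clients all point at 172.20.0.53; babel routes each of them to
  # the nearest server that is still up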
Now, your method is simpler! (although mine is mostly scripted) I imagine
you bridge everything on a vlan, and use a central dhcp/dns server to serve
up dhcp across (say) a 10.0.0.0/16 subnet. And by blocking local
multicast/broadcast, in particular, this scales across the 3k user
population. You've got a critical single point of failure in your gateway,
but at least that's only one, and I imagine you have that duplicated.
I have two wifi vlans, one for 5GHz (ESSID SCALE), and one for 2.4GHz (ESSID
SCALE-slow, no speed limits, but it does a great job of encouraging everyone who
can to use 5GHz :-) ) There is a central DHCP server and firewall that allocates
addresses across a /17 for each of the two networks. We don't set up active failover, but we have a spare box that we can put in if needed.
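The shape of the dhcp config, with made-up prefixes (ISC dhcpd syntax, purely as an illustration):

  # two /17 pools, one per wifi vlan (prefixes are placeholders)
  subnet 10.128.0.0 netmask 255.255.128.0 {   # SCALE (5GHz)
      range 10.128.0.100 10.128.127.200;
      option routers 10.128.0.1;
      option domain-name-servers 10.128.0.1;
  }
  subnet 10.129.0.0 netmask 255.255.128.0 {   # SCALE-slow (2.4GHz)
      range 10.129.0.100 10.129.127.200;
      option routers 10.129.0.1;
      option domain-name-servers 10.129.0.1;
  }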
The APs don't have any IP addresses on either wireless network. They have an IP
on a different VLAN that's used for management only. Makes it a bit harder for
any attackers to do anything to them.
Remember, we need to have it work for a few days at a shot
(In contrast my network is always broken somewhere, but unless two critical
nodes break, it's pretty redundant and loss is confined to a single AP -
my biggest problem is that I need to upgrade the firmware on about half the
network - which involves climbing trees - and my plan was to deploy hnetd
last year so I could roll out ipv6)
How do you deal with a dead AP that is up but not actually passing traffic?
Nagios type monitoring to detect that the AP isn't reachable on the wired
network and we send a runner to find out what's happening. About three years ago
we had a lot of problems with people unplugging the APs for some reason.
For the normal user that we are trying to support at a conference, it's a
win.
I'll note that we also block streaming sites (which has the side effect of
blocking some useful sites that share the same IPs, Amazon for example) to
help make things better for everyone else, even at the cost of limiting what
some people are able to do. Bandwidth is limited compared to the number of
people we have, and we have to make choices.
Blocking ads is also effective.
We use DNS to block things like this (or actually redirect the DNS to point to a
server that serves an image saying that they are being blocked by SCaLE), and
then we block port 53 to the outside to force people to use our DNS servers.
Somewhat heavy handed, but it works.
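In iptables terms the port 53 part is roughly (the resolver address here is made up; ours differs):

  # allow dns only to our own resolvers, reject everything else outbound
  iptables -A FORWARD -p udp --dport 53 -d 10.128.0.1 -j ACCEPT
  iptables -A FORWARD -p tcp --dport 53 -d 10.128.0.1 -j ACCEPT
  iptables -A FORWARD -p udp --dport 53 -j REJECT
  iptables -A FORWARD -p tcp --dport 53 -j REJECT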
Will you attempt to deploy ipv6?
We have been offering IPv6 routable addresses for a few years.
How many do you get and from whom?
I don't remember at the moment.
I am of course interested in how fq_codel performs on your ISP link. Are you planning on running it for your wifi?
I'm running OpenWRT on the APs but haven't done anything in particular to
activate it.
fq_codel is on by default in Barrier Breaker and later on all interfaces. I note that it doesn't scale anywhere near as well as we would like under contention, but that work is only beginning in Chaos Calmer. A thought I've had for an environment such as yours would be to rate limit each AP's ingress/egress ethernet interface to, say, 20mbits, thus pushing all the potential bloat into sqm on ethernet and out of the wifi (which would generally run faster). Might even force user uploads lower too (say 10mbit). Might not, and just rely on people retaining low expectations. :)
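Something like this on each AP, in sqm-scripts' uci syntax (eth0 and the rates are just the example numbers above):

  uci set sqm.eth0=queue
  uci set sqm.eth0.interface='eth0'
  uci set sqm.eth0.download='20000'   # kbit toward the users
  uci set sqm.eth0.upload='10000'     # kbit back toward the core
  uci set sqm.eth0.script='simple.qos'
  uci set sqm.eth0.enabled='1'
  uci commit sqm && /etc/init.d/sqm restart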
Was it on openwrt last year?
yes, most of what I did on the wireless side is in the paper at
https://www.usenix.org/conference/lisa12/technical-sessions/presentation/lang_david_wireless
The first year I did the network I had a total of one month to plan and buy APs, so I was running stock firmware. The second year I used DD-WRT and was very unhappy with it. I've been running OpenWRT since.
I'll check what we have on the firewall (a fairly up to date Debian build).
fq_codel has been a part of that for a long time.
I'd port over the sqm-scripts and use those, it's only a 1 line change.
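Even without the scripts, the bare-bones version on a Debian box is something like this (eth0 and the 95mbit shape rate are placeholders for whatever the uplink actually is):

  # simplest form: fq_codel as the root qdisc on the upstream interface
  tc qdisc replace dev eth0 root fq_codel
  # to actually control the bottleneck, shape below line rate first:
  tc qdisc replace dev eth0 root handle 1: htb default 11
  tc class add dev eth0 parent 1: classid 1:11 htb rate 95mbit
  tc qdisc add dev eth0 parent 1:11 fq_codel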
What's the best way to monitor the queues?
On each router?
I tend to use pdsh a lot, setting up a /etc/genders file for them all so I can do a

  pdsh -A 'tc -s qdisc show dev wlan0'   # or uptime, or wc -l /etc/dhcp.leases, or whatever
Been meaning to get around to something that used snmp instead for a while.
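The genders file is just hostnames and attributes, something like (all names made up):

  # /etc/genders
  ap01   ap,wifi
  ap02   ap,wifi
  gw01   gateway
  # then commands fan out by attribute:
  #   pdsh -g ap 'uptime'
  #   pdsh -g gateway 'ip route'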
I'm gathering info on each AP about the number of users currently connected and
the bandwidth used on all ports. I also have a central log from all APs which
shows the MAC addresses as they associate with each AP.
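The per-AP collection is simple shell stuff, roughly (assuming OpenWrt with iw installed; interface names vary):

  # count associated stations on the wifi interface
  iw dev wlan0 station dump | grep -c '^Station'
  # per-port byte counters for the bandwidth graphs
  cat /sys/class/net/eth0/statistics/rx_bytes
  cat /sys/class/net/eth0/statistics/tx_bytes
  # hostapd's association messages, which feed the central MAC log
  logread | grep hostapd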
So collecting the data in one place is the easy part; what I don't know is what I need to gather, from where, with what commands. Any suggestions for this are very welcome.
David Lang
_______________________________________________
Cerowrt-devel mailing list
Cerowrt-devel@lists.bufferbloat.net
https://lists.bufferbloat.net/listinfo/cerowrt-devel