On 04/10/2017 07:28, Walter Dnes wrote:
> I have some doubts about massive "hosts" files for adblocking. I
> downloaded one that listed 13,148 sites. I fed them through a script
> that called "host" for each entry, and saved the output to a text file.
> The result was 1,059 addresses. Note that some adservers have multiple
> IP address entries for the same name. A back-of-the-envelope analysis
> is that close to 95% of the entries in the large host file are invalid,
> and return "not found: 3(NXDOMAIN)".
>
> I'm not here to trash the people compiling the lists; the problem is
> that hosts files are the wrong tool for the job. Advertisers know about
> hosts files and deliberately generate random subdomain names with short
> lifetimes to invalidate the hosts files. Every week the sites are
> probably mostly renamed. Further analysis of the 1,059 addresses shows
> 810 unique entries, i.e. 249 duplicates. It gets even better. 44
> addresses show up in 52.84.146.xxx; I should probably block the entire
> /24 with one entry. There are multiple similar occurrences, which could
> be aggregated into small CIDRs. So the number of blocking rules is
> greatly reduced.
>
> I'm not a deep networking expert. My question is whether I'm better
> off adding iptables reject/drop rules or "reject routes", e.g...
>
>   route add -net 10.0.0.0 netmask 255.0.0.0 metric 1024 reject
>
> (an example from the "route" man page). iptables rules have to be
> duplicated coming and going to catch inbound and outbound traffic. A
> reject route only needs to be entered once. This exercise is intended
> to block web adservers, so another question is how web browsers react
> to route versus iptables blocking.
>
> While I'm at it (I did say I'm not an expert) is there another way to
> handle this? E.g. redirect "blocked CIDRs" via iptables or route to a
> local pixel image? Will that produce an immediate response by the web
> browser, versus timing out with "regular blocking"?
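A quick aside on the aggregation step first: it's easy to script. A
minimal sketch, assuming a plain-text file of IPv4 addresses, one per
line (the file and script names are made up; ipaddress ships with
Python 3's standard library):

  import ipaddress
  import sys

  # read IPv4 addresses, one per line; building a set also discards
  # exact duplicates (your 249)
  with open(sys.argv[1]) as f:
      nets = {ipaddress.ip_network(line.strip()) for line in f if line.strip()}

  # collapse_addresses() folds sibling networks into supernets where
  # possible; note it only yields a /24 when all 256 hosts are present,
  # so your 44-in-a-/24 case stays as 44 /32s unless you widen them
  # yourself
  for net in ipaddress.collapse_addresses(nets):
      print(net)

Run it as "python3 collapse.py addrs.txt". Whether to over-block 44
scattered hosts up to a whole /24 is a judgment call the script won't
make for you.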
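To make the two mechanisms you're weighing concrete, this is roughly
what they look like with current tools (iproute2 syntax instead of the
legacy "route" command; your 52.84.146.0/24 reused as a placeholder):

  # a single route entry covers all outbound traffic to the subnet;
  # "blackhole" discards silently, "unreachable" returns an immediate
  # error the browser can show instead of waiting for a timeout
  ip route add blackhole 52.84.146.0/24

  # the iptables equivalent: REJECT errors out immediately,
  # -j DROP would leave the browser hanging until it times out
  iptables -A OUTPUT -d 52.84.146.0/24 -j REJECT

And since this traffic is all initiated from your side, a single OUTPUT
rule generally suffices: if the SYN never leaves your box, there is no
inbound leg to write a duplicate rule for.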
This is a complex problem with no cut-and-dried solution. It's real
life, and as you know, real life is murky.

Let's define the real problem you want to solve: there's a bunch of ad
servers out there, and you want them to disappear. Or more accurately,
you want their traffic to disappear from *your* wires.

There are really three approaches, as you know:

- redefine the hostname to be a blackhole (e.g. 127.0.0.1)
- find the addresses or subnets and drop/reject the packets with iptables
- find the subnets (sometimes the individual hosts) and route them into
  a blackhole

Each has its strengths and weaknesses:

- Packet filters work best at the TCP/UDP/ICMP layer, where you have an
  address and often a port.
- Routing works best at the IP layer, where you have whole chunks of
  subnets and tell the router what to do with all traffic for an entire
  subnet.
- Hosts files work best at the name layer, where you have DNS names.

Your problem slots in somewhere between a firewall and a routing
solution, which explains why you can't decide. Hosts files suck major
big eggs for this, as you know; people still use them because the
approach seems legit (but isn't) and they understand it, whereas they
don't understand the other two. Ad providers are well aware of this.

I was surprised to see 52.84.146.0/24 show up in your mail, as that is
an Amazon AWS range. Yes, you could null-route that subnet, but it's
Amazon, and maybe there are hosts in there that you DO want to use.

I'd suggest you use a packet filter, but not on Linux and certainly not
iptables. That thing is a god-awful mess, looking like it was built by
unsupervised schoolkids masquerading as interns. The best tool for this
is the pf packet filter, but it runs on FreeBSD. Get yourself a spare
machine, load pfSense on it (it's an appliance, like OpenWrt) and drop
the traffic from all offensive addresses. Drop, not reject.

You could in theory do the same thing with iptables, but the ruleset
will quickly drive you nuts. Perhaps the ipset plugin would help; I've
been meaning to check it out for ages and never got around to it.

--
Alan McKinnon
alan.mckin...@gmail.com
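P.S. If you beat me to trying ipset, the shape of it is roughly this
(the set name "adblock" is arbitrary; assumes the ipset userspace tool
and kernel support are installed):

  # one set holds any number of CIDR blocks; lookups are hash-based,
  # so thousands of entries cost about the same as one. A single
  # iptables rule then matches the entire set.
  ipset create adblock hash:net
  ipset add adblock 52.84.146.0/24
  iptables -A OUTPUT -m set --match-set adblock dst -j DROP

That keeps the ruleset to one line no matter how long the blocklist
grows, which is exactly the part of plain iptables that drives you nuts.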