On 04/10/2017 07:28, Walter Dnes wrote:
>   I have some doubts about massive "hosts" files for adblocking.  I
> downloaded one that listed 13,148 sites.  I fed them through a script
> that called "host" for each entry, and saved the output to a text file.
> The result was 1,059 addresses.  Note that some adservers have multiple
> IP address entries for the same name.  A back-of-the-envelope analysis
> is that close to 95% of the entries in the large host file are invalid,
> and return "not found: 3(NXDOMAIN)".
> 
>   I'm not here to trash the people compiling the lists; the problem is
> that hosts files are the wrong tool for the job.  Advertisers know about
> hosts files and deliberately generate random subdomain names with short
> lifetimes to invalidate the hosts files.  Every week the sites are
> probably mostly renamed.  Further analysis of the 1,059 addresses shows
> 810 unique entries, i.e. 249 duplicates.  It gets even better.  44
> addresses show up in 52.84.146.xxx; I should probably block the entire
> /24 with one entry.  There are multiple similar occurrences, which could
> be aggregated into small CIDRs.  So the number of blocking rules is
> greatly reduced.
> 
>   I'm not a deep networking expert.  My question is whether I'm better
> off adding iptables reject/drop rules or "reject routes", e.g...
> 
> route add -net 10.0.0.0 netmask 255.0.0.0 metric 1024 reject
> 
> (an example from the "route" man page).  iptables rules have to be
> duplicated coming and going to catch inbound and outbound traffic.  A
> reject route only needs to be entered once.  This exercise is intended
> to block web adservers, so another question is how web browsers react to
> route versus iptables blocking.
> 
>   While I'm at it (I did say I'm not an expert) is there another way to
> handle this?  E.g. redirect "blocked CIDRs" via iptables or route to a
> local pixel image?  Will that produce an immediate response by the web
> browser, versus timing out with "regular blocking"?
> 


This is a complex problem with no cut-and-dried solution. It's real life
and as you know real life is murky.

Let's define the real problem you want to solve: there's a bunch of ad
servers out there, and you want them to disappear. Or more accurately,
you want their traffic to disappear from *your* wires.

There are really 3 approaches, as you know:
- redefine the hostname to be a blackhole (e.g. 127.0.0.1)
- find the addresses or subnets and drop/reject the packets with iptables
- find the subnets (sometimes the individual hosts) and route them into a
  blackhole

Each has its strengths and weaknesses:
- packet filters work best at the TCP/UDP/ICMP layer, where you have an
  address and often a port
- routing works best at the IP layer, where you have whole subnets and
  tell the router what to do with all traffic for that entire subnet
- hosts files work best at the name layer, where you have DNS names
Your problem seems to slot in somewhere between a firewall and a routing
solution, which explains why you can't decide. Hosts files suck major
big eggs for this, as you know; people still use them because the
approach seems legit (but isn't) and they understand it, whereas they
don't understand the other 2.

Ad providers are well aware of this. I was surprised to see
52.84.146.0/24 show up in your mail, as that is Amazon's AWS range. Yes,
you could null-route that subnet, but it's Amazon and maybe there are
hosts in there that you DO want to use.
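
Before null-routing a whole /24, it may help to see how densely the
listed addresses actually sit inside it. Here is a minimal sketch using
Python's standard ipaddress module. The function names, the threshold
and the sample addresses (taken from the documentation ranges) are made
up for illustration, not taken from your list:

```python
import ipaddress
from collections import Counter


def count_by_slash24(addresses):
    """Count how many of the listed IPv4 addresses fall into each /24."""
    counts = Counter()
    for addr in addresses:
        # strict=False masks off the host bits, giving the enclosing /24
        net = ipaddress.ip_network(f"{addr}/24", strict=False)
        counts[net] += 1
    return counts


def dense_networks(addresses, threshold):
    """Return the /24 networks holding at least `threshold` listed addresses."""
    counts = count_by_slash24(addresses)
    return sorted(net for net, n in counts.items() if n >= threshold)


# Hypothetical sample data, not real ad servers
sample = ["192.0.2.10", "192.0.2.77", "192.0.2.200", "198.51.100.7"]
for net in dense_networks(sample, threshold=3):
    print(net)  # prints 192.0.2.0/24
```

A /24 that only holds a handful of listed addresses is probably better
handled host by host, especially in a shared range like Amazon's.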

I'd suggest you use a packet filter, but not on Linux and certainly not
iptables. That thing is a god-awful mess looking like it was built by
unsupervised schoolkids masquerading as interns. The best tool for this
is the pf packet filter, but it runs on FreeBSD. Get yourself a spare
machine, load pfSense on it (it's an appliance like wrt) and drop the
traffic from all offensive addresses. Drop, not reject.

You could in theory do the same thing with iptables, but the ruleset
will quickly drive you nuts. Perhaps ipset would help; I've been meaning
to check it out for ages and never got around to it.
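
For what it's worth, here is a rough, untested sketch of how the ipset
route might look: collapse the addresses into as few CIDRs as possible
and write them out in the format "ipset restore" reads. The set name
"adblock", the function name and the sample addresses are assumptions
for illustration only:

```python
import ipaddress


def ipset_restore_lines(entries, set_name="adblock"):
    """Build 'ipset restore' input from IPv4 addresses or CIDR strings.

    Adjacent and overlapping networks are merged first, so the set ends
    up with as few entries as possible.
    """
    nets = [ipaddress.ip_network(e, strict=False) for e in entries]
    lines = [f"create {set_name} hash:net"]
    for net in ipaddress.collapse_addresses(nets):
        lines.append(f"add {set_name} {net}")
    return lines


# Hypothetical sample data from the documentation ranges
for line in ipset_restore_lines(["192.0.2.0/25", "192.0.2.128/25", "198.51.100.7"]):
    print(line)

# The output would be fed to "ipset restore", and a single rule such as
#   iptables -I OUTPUT -m set --match-set adblock dst -j DROP
# would then cover every entry in the set (assumed usage, not tested here).
```

The point of the design is that the iptables ruleset stays at one rule
no matter how many addresses are in the set.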


-- 
Alan McKinnon
alan.mckin...@gmail.com

