Hello,

On Wed, Oct 4, 2017 at 12:28 AM, Walter Dnes <waltd...@waltdnes.org> wrote:
> I have some doubts about massive "hosts" files for adblocking. I
> downloaded one that listed 13,148 sites. I fed them through a script
> that called "host" for each entry, and saved the output to a text file.
> The result was 1,059 addresses. Note that some adservers have multiple
> IP address entries for the same name. A back-of-the-envelope analysis
> is that close to 95% of the entries in the large hosts file are invalid,
> and return "not found: 3(NXDOMAIN)".
>
> I'm not here to trash the people compiling the lists; the problem is
> that hosts files are the wrong tool for the job. Advertisers know about
> hosts files and deliberately generate random subdomain names with short
> lifetimes to invalidate them. Every week the sites are probably mostly
> renamed. Further analysis of the 1,059 addresses shows 810 unique
> entries, i.e. 249 duplicates. It gets even better: 44 addresses fall in
> 52.84.146.xxx, so I could probably block the entire /24 with one entry.
> There are multiple similar occurrences, which could be aggregated into
> small CIDRs, so the number of blocking rules is greatly reduced.
>
> I'm not a deep networking expert. My question is whether I'm better
> off adding iptables reject/drop rules or "reject routes", e.g...
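The dedup-and-aggregate step described above can be sketched with Python's standard `ipaddress` module. This is only an illustration of the idea; the function name and the `threshold` for collapsing a /24 are my own assumptions, not anything from the original post:

```python
import ipaddress
from collections import defaultdict

def aggregate(addrs, threshold=8):
    """Deduplicate resolved adserver IPs and group them by /24.

    If a /24 contains at least `threshold` addresses (an arbitrary
    illustrative cutoff), emit the whole /24 as one blocking entry;
    otherwise emit the individual addresses.
    """
    buckets = defaultdict(set)
    for a in set(addrs):                      # drop duplicate entries first
        ip = ipaddress.ip_address(a)
        net = ipaddress.ip_network(a + "/24", strict=False)
        buckets[net].add(ip)

    rules = []
    for net, ips in buckets.items():
        if len(ips) >= threshold:
            rules.append(str(net))            # one CIDR entry covers them all
        else:
            rules.extend(str(ip) for ip in sorted(ips))
    return sorted(rules)
```

With the 52.84.146.xxx case from the post, the 44 addresses would collapse into the single entry 52.84.146.0/24, while isolated addresses elsewhere stay as individual rules.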
If you want to filter connections based on IP, then use iptables or the newer alternative, nftables. Nftables is faster and more configurable; I suggest reading the Wikipedia page before the official documentation: https://en.wikipedia.org/wiki/Nftables.

If you want to block advertisements, you should use a content-aware system that is integrated into a browser and maintained by lots of people at the same time. You should also consider blocking JavaScript.

Cheers,
     R0b0t1
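P.S. For reference, a minimal nftables ruleset of the kind described above might look like the following config fragment. The table and set names are illustrative assumptions, and the addresses are the example /24 from the post plus a documentation address:

```
# Illustrative nftables fragment; load with `nft -f <file>`.
table inet adblock {
    set adservers {
        type ipv4_addr
        flags interval    # allows CIDR entries alongside single addresses
        elements = { 52.84.146.0/24, 203.0.113.7 }
    }
    chain output {
        type filter hook output priority 0; policy accept;
        ip daddr @adservers reject
    }
}
```

The `flags interval` option is what lets one set hold both individual addresses and aggregated CIDR blocks, so the collapsed rule list maps onto a single set.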