It looks like what my suggested test is actually finding is physical sites which tend to use large numbers of virtually hosted domains on their web servers. Spammers are merely a subset of this group - but the subset I look at the most. Jdow's point about very long chains of subdomains is real - it is too bad that there is no common syntax for "allow anything one or N levels deep", just the "allow anything" case. Also, Keith "said" subdomains in a context where hosts would seem more appropriate (though maybe he did mean that his users get subdomains, not just virtual hosts - it certainly takes a few tricks not built directly into Apache to do that, but it is possible).
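To make the "allow anything" case concrete, here is a minimal zone-file sketch (example.com, ns1 and the 192.0.2.x addresses are placeholders, not anyone's real setup):

    $ORIGIN example.com.
    $TTL 3600
    @       IN  SOA  ns1.example.com. hostmaster.example.com. (
                         2004010101  ; serial
                         3600        ; refresh
                         900         ; retry
                         604800      ; expire
                         3600 )      ; negative-caching TTL
    @       IN  NS   ns1.example.com.
    ns1     IN  A    192.0.2.53
    ; one record answers for every otherwise-undefined name under the zone
    *       IN  A    192.0.2.1

With that single "*" record, foo.example.com, bar.baz.example.com and a.b.c.d.example.com all resolve to 192.0.2.1 (unless a more specific name exists in the zone) - there is simply no way to write "match one label only" or "match at most N labels".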
Obviously, I've been biased by looking at more spam domains than clean ones. Still, the inherent flaw that a wildcard allows unlimited levels of indirection is only one argument against its use - simplicity of sharing zone files is the best argument for its use in the cases that have come up.

A few interesting points:

- It seems that some of the cases mentioned may be relying on what the BIND9 documentation describes as bugs in BIND8 and earlier (e.g. subdomains sharing NS records with their parents).
- BIND9 makes clear that wildcards *cannot* be used with DNSSEC-secured zones. While no one expects spammers to switch to DNSSEC, it seems that, longer term, all the cited "legitimate" cases (so far, virtual hosting of large numbers of domains and/or hosts) would be better served by using "nsupdate" to add hosts instead of unlimited subdomains, *or* an LDAP interface like "slapd", *or* the BIND9 database capability.
- Wildcards are also disallowed for "link-local multicast" (think wireless and/or cheap IPv6 link-local-only devices).
- Worth mentioning is the historic "bug" in the resolver when a wildcarded CNAME was used in the domain edu.com and all communication between any ".com" and any ".edu" domain was suddenly broken.

I guess my only remaining point (even though everyone affected will dislike it) is this: since the majority of all email is spam, and a likely (though untested) scenario is that the majority of spam uses wildcarded domains while an unknown amount of ham does too (significant, I believe, but relatively small by comparison), the question becomes not whether the test is valid - *IT IS* - but what its FP rate is, and what weight should be assigned to it. Clearly it is not going to be in the class of the SURBLs, but it would seem that the amount of email mentioning blog domains at large virtually hosted sites is vanishingly small. I would wager that the FP rate is lower than that of the DNS_FROM_RFC_ABUSE or DNS_FROM_RFC_POST rules, one or the other of which hits most "free mail" (with the added tag line of something like "Get your Free email at XYZ.tld") and nearly every cable Internet operator in the US.

BTW, locally I lower the scores for these (and similar local URI rules), then use meta rules to recognize when more than one is hit and assign a slightly higher than default value in those cases. Maybe something similar is appropriate here (e.g. if Bayes > 60% *and* wildcards are used, add X points, but wildcards alone are only scored as Y points; see the P.S. below for a sketch). I actually use many rules like this and think that, given time and a larger corpus to check against, the default SA system should do likewise.

Anyway, we have clearly found a common case (physically large sites with large numbers of virtually hosted domains/sites) which will FP on a rule such as I originally proposed. Though I don't think it comes anywhere close to the "Middle Initial" rule - a test mass-check run would quickly show whether there is any merit. I would expect a very high SPAM%, a small but significant HAM%, and an S/O ratio we can only guess at. I would also expect a low overlap rate, which would argue for the value of meta rules in reducing the cost of FPs.

Paul Shupak
[EMAIL PROTECTED]
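P.S. A rough sketch of the meta-rule idea in SpamAssassin config syntax. LOCAL_URI_WILDCARD is a hypothetical name for the proposed wildcard test, the scores are only placeholders for the "X" and "Y" above, and the BAYES_* rules listed should be whichever stock Bayes buckets cover "> 60%" in your rule set:

    # Keep the wildcard test cheap on its own ("Y points")
    score LOCAL_URI_WILDCARD     0.5

    # Add the real weight ("X points") only when Bayes already says > 60%
    meta  LOCAL_WILD_AND_BAYES   (LOCAL_URI_WILDCARD && (BAYES_60 || BAYES_80 || BAYES_99))
    score LOCAL_WILD_AND_BAYES   1.5

The same pattern works for DNS_FROM_RFC_ABUSE / DNS_FROM_RFC_POST plus the local URI rules I mentioned - score each one low, then let a meta rule add the extra points only when several fire together.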