On Wed, 03 Sep 2014 15:47:05 -0600 Jesse Norell wrote:
> Hmm, ok. Without "hapaxes" enabled, how many hits on a token do you
> need for it to start being useful?

Don't disable hapaxes without good empirical evidence that doing so
gives you a benefit. Typically it reduces accuracy without any benefit
at all. The documentation used to say it greatly reduces the size of
the database, but that's not true.

> I actually meant to clarify that a plugin is what would need to
> perform the IP lookup and add it as a bayes token. You can't say
> "increment URL_IP:x.x.x.x spam token count when training" in the rule
> language. (I've written some rules, never delved into plugins.)

DNS requests are normally sent out as early as possible. Bayes and other
text-based rules are then processed in parallel with the network
round-trips. If you try this, don't be surprised if it's unreliable, or
if it doesn't work at all unless you defer Bayes.