On Sat, 20 Dec 2014 12:35:04 +0100
Axb wrote:

> On 12/18/2014 06:27 PM, RW wrote:
> > Unless there's a bug, the fact that those disclaimer phrases got
> > through suggests that these rules are either intended to be very
> > much more aggressive than the SOUGHT rules, or the ham corpus
> > isn't good enough.
> >
> as the rules were generated with donated corpus data, you're more
> than welcome to send me an archive of ham samples to avoid these
> potential issues.

Most of the hits were in mailing list folders, some were in this list.

Most of your rules are sensible, but a minority look like they are
picking up on text lifted from legitimate mail. Some of these are still
good rules because the text contains mistakes.

IIRC Justin Mason used to check new sought sub-rules manually before
releasing them.