I was actually thinking of removing the 3-in-a-line restriction and splitting the rule into at least two pieces: naughty words, and definite signs of porn spam. "fuck" falls in the former category, "cum" in the latter. I'll basically check each word's frequency in the corpus and separate the words at some ratio of spam:nonspam presence.
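
A rough sketch of that frequency check, in Python rather than as actual SpamAssassin code. The function name, the whitespace tokenizer, the add-one smoothing, and the default threshold of 10 are all assumptions made for illustration; the real cut-off ratio would have to come from looking at the corpus.

```python
from collections import Counter


def split_by_spam_ratio(words, spam_msgs, ham_msgs, threshold=10.0):
    """Split `words` into (naughty, definite_spam) lists.

    A word is "definite spam" when the fraction of spam messages that
    contain it is at least `threshold` times the fraction of nonspam
    messages that contain it. `threshold` is a hypothetical value.
    """
    def presence(msgs):
        # Count each word at most once per message (presence, not raw count).
        counts = Counter()
        for msg in msgs:
            tokens = set(msg.lower().split())  # naive whitespace tokenizer
            counts.update(w for w in words if w in tokens)
        return counts

    spam, ham = presence(spam_msgs), presence(ham_msgs)
    naughty, definite = [], []
    for w in words:
        # Normalise by corpus size; add-one smoothing avoids division by zero.
        spam_rate = (spam[w] + 1) / (len(spam_msgs) + 1)
        ham_rate = (ham[w] + 1) / (len(ham_msgs) + 1)
        if spam_rate / ham_rate >= threshold:
            definite.append(w)
        else:
            naughty.append(w)
    return naughty, definite
```

Words are expected in lower case. Counting presence per message rather than raw occurrences keeps one very repetitive spam from dominating the ratio.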

C

On Thu, 2002-02-07 at 07:12, Shane Williams wrote:
> I was looking at the porn expressions and scoring, and thought of an
> idea to shoot by everybody.
>
> If I'm reading the PORN_3 rule correctly, you must have three of the
> listed strings within 15 characters of each other, and this scores .7
> if caught.
>
> Two things seem strange about this. First, how often would two of
> these strings in close proximity not be pretty spammy? And if there
> are actually three of them in a row, shouldn't it score higher than .7
>
> What I was thinking was to reproduce the rule in full, except change
> the 3 to a 2 and then score the rules as such. Two of these strings
> in proximity would score slightly lower than 3 in proximity.
>
> --
> Public key #7BBC68D9 at            | Shane Williams
> http://pgp.mit.edu/                |
> =----------------------------------+-------------------------------
> All syllogisms contain three lines | [EMAIL PROTECTED]
> Therefore this is not a syllogism  | www.gslis.utexas.edu/~shanew
>
> _______________________________________________
> Spamassassin-talk mailing list
> [EMAIL PROTECTED]
> https://lists.sourceforge.net/lists/listinfo/spamassassin-talk