Hello,

I have a fairly large archive of phishing mail (bank and mail accounts), in which many words and phrases repeat.

I was thinking about processing them manually and creating rules, but that would be a lot of work. I remember that the SOUGHT ruleset used to contain phrases that appear repeatedly, so I'd like to try the same approach, if possible.

So far I have found:
- a description of how it works: https://taint.org/2007/03/05/134447a.html
- a script to search a corpus:
  https://svn.apache.org/repos/asf/spamassassin/trunk/masses/rule-dev/seek-phrases-in-corpus
  which seems to use plugins (Dumptext.pm, GrepRenderedBody.pm) that I found at:
  https://svn.apache.org/repos/asf/spamassassin/branches/3.3/masses/plugins/


Are these still working, or are there newer versions?

Does anyone have hints on how to process a phish archive?

I mean, I could apparently weed out any repeated non-phish phrases by hand to avoid FPs, or check manually which mail they hit, so I wouldn't need to keep much ham mail.
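
To make the idea concrete, here is a rough sketch of what I have in mind. It is plain Python, not the SpamAssassin scripts above, and it does not claim to reproduce how seek-phrases-in-corpus works. The function names, the 4-word phrase length and the "at least 2 messages" threshold are all made up for illustration, and the input is assumed to be already-decoded message bodies as strings.

```python
from collections import Counter


def phrases(text, n=4):
    """Return the set of distinct n-word phrases in one message body."""
    words = text.lower().split()
    return {" ".join(words[i:i + n]) for i in range(len(words) - n + 1)}


def repeated_phrases(phish_bodies, ham_bodies=(), n=4, min_hits=2):
    """Phrases found in at least min_hits phish messages and in no ham message.

    Hypothetical helper: n and min_hits are arbitrary illustrative defaults.
    """
    hits = Counter()
    for body in phish_bodies:
        # phrases() returns a set, so each message counts a phrase only once
        hits.update(phrases(body, n))

    ham_seen = set()
    for body in ham_bodies:
        ham_seen |= phrases(body, n)

    candidates = (p for p, c in hits.items() if c >= min_hits and p not in ham_seen)
    # Most frequent first, ties broken alphabetically
    return sorted(candidates, key=lambda p: (-hits[p], p))
```

Even a small ham sample passed as ham_bodies would drop the most obvious common phrases; whatever remains would still need checking by hand before it becomes a rule.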

--
Matus UHLAR - fantomas, uh...@fantomas.sk ; http://www.fantomas.sk/
Warning: I wish NOT to receive e-mail advertising to this address.
Varovanie: na tuto adresu chcem NEDOSTAVAT akukolvek reklamnu postu.
Due to unexpected conditions Windows 2000 will be released
in first quarter of year 1901
