Hello,
I have a fairly large archive of phishing mail (bank and mail accounts), in
which many words and phrases repeat.
I was thinking about processing them manually and creating rules, but that
would be a lot of work.
I remember that the SOUGHT ruleset used to contain phrases that appear
repeatedly in spam, so I'd like to use the same approach, if possible.
So far I have found:
- a description of how it works: https://taint.org/2007/03/05/134447a.html
- a script to search for phrases in a corpus:
https://svn.apache.org/repos/asf/spamassassin/trunk/masses/rule-dev/seek-phrases-in-corpus
which seems to use plugins (Dumptext.pm, GrepRenderedBody.pm) that I found at:
https://svn.apache.org/repos/asf/spamassassin/branches/3.3/masses/plugins/
Do these still work, or are there newer versions?
Does anyone have hints on how to process a phish archive?
I mean, I could apparently weed out any repeated non-phish phrases by hand
to avoid FPs, or check manually which mail they hit, so I wouldn't need to
keep much ham mail.
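
To illustrate the idea, here is a minimal sketch in Python. It is not the
SOUGHT / seek-phrases-in-corpus algorithm, only a naive word n-gram count.
The function names (ngrams, repeated_phrases), the phrase length n=4 and
the min_hits threshold are made up for illustration, and it assumes the
message bodies have already been extracted as plain text. A small ham
sample is optional and is only used to drop phrases that also occur there.

```python
import re
from collections import Counter


def ngrams(text, n):
    """Yield lowercase word n-grams from a message body."""
    words = re.findall(r"[a-z0-9']+", text.lower())
    for i in range(len(words) - n + 1):
        yield " ".join(words[i:i + n])


def repeated_phrases(phish_bodies, ham_bodies=(), n=4, min_hits=2):
    """Return (phrase, count) pairs seen in at least min_hits phish
    messages and in no ham message, most frequent first."""
    phish_counts = Counter()
    for body in phish_bodies:
        # set(): count each phrase once per message, not per occurrence
        phish_counts.update(set(ngrams(body, n)))

    ham_seen = set()
    for body in ham_bodies:
        ham_seen.update(ngrams(body, n))

    hits = [(phrase, count) for phrase, count in phish_counts.items()
            if count >= min_hits and phrase not in ham_seen]
    return sorted(hits, key=lambda pc: (-pc[1], pc[0]))


if __name__ == "__main__":
    # Hypothetical sample bodies, not taken from any real corpus.
    phish = [
        "Dear customer, please verify your account now to avoid suspension.",
        "Please verify your account now or it will be closed.",
    ]
    ham = ["Could you please verify your account number with the bank?"]
    for phrase, count in repeated_phrases(phish, ham):
        print(count, phrase)
```

Phrases that survive would still need a manual look before becoming rules,
since a small ham sample cannot catch every possible FP.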
--
Matus UHLAR - fantomas, uh...@fantomas.sk ; http://www.fantomas.sk/
Warning: I wish NOT to receive e-mail advertising to this address.
Varovanie: na tuto adresu chcem NEDOSTAVAT akukolvek reklamnu postu.
Due to unexpected conditions Windows 2000 will be released
in first quarter of year 1901