Dear fellow SpamAssassin users,

I'm contacting you as a member of ULYSSIS. ULYSSIS is a student non-profit organisation at the University of Leuven trying to make computers and technology more approachable and available to students. As part of this objective, we run a hosting service within our university's network for student organisations, student unions and individuals at our university.

We've battled spam from time to time, since we seem to attract a lot of it in exotic languages, which is rather good at circumventing commonly used methods. This has led us to resort to some custom rulesets, mostly against targeted French and SEO spam that often comes from very respectable servers and very normal addresses.

Because SEO spam specifically has been adapting quite well to any rule we think of (finding alternative ways of saying the same thing time and time again), I was hoping to write a rule that basically boils down to "give some spam score to emails that contain the word SEO 3 or more times", to push those already being detected by other rules over the edge. To be clear, this will be a low-score rule. I'm aware that ham can perfectly well contain that word 3 times, just like this email for example. While investigating, I started wondering how to handle the fact that some spam will just have a plain text body while other spam will also include an HTML part, which means the count may suddenly double or halve. Beyond that, it seems quite hacky to use a regex that boils down to something like /\bSEO\b.*\bSEO\b.*\bSEO\b/i instead of something that is properly aware of the count of certain words.
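
To make the kind of counting I have in mind concrete, here is a minimal sketch in Python. It is not SpamAssassin configuration, just an illustration of the logic. The function names, the threshold of 3 and the idea of taking the highest count over the rendered parts (so that a text+HTML message is not counted twice) are my own assumptions, not anything SpamAssassin provides under these names.

```python
import re

# Case-insensitive match on the standalone word "SEO".
SEO_WORD = re.compile(r"\bSEO\b", re.IGNORECASE)

# The "hacky" alternative: three occurrences chained in one pattern.
# Without re.DOTALL, "." does not match newlines, so occurrences
# spread over several lines are missed.
SEO_CHAINED = re.compile(r"\bSEO\b.*\bSEO\b.*\bSEO\b", re.IGNORECASE)


def count_seo(text: str) -> int:
    """Count standalone occurrences of the word SEO in one text part."""
    return len(SEO_WORD.findall(text))


def count_across_parts(parts: list[str]) -> int:
    """Hypothetical helper: given the rendered text of each body part
    (for example the plain text part and the text of the HTML part),
    use the highest count so duplicated content is not counted twice."""
    return max((count_seo(part) for part in parts), default=0)


def exceeds_threshold(parts: list[str], threshold: int = 3) -> bool:
    """True when the word appears at least `threshold` times."""
    return count_across_parts(parts) >= threshold


def chained_match(text: str) -> bool:
    """The chained-regex approach, for comparison."""
    return SEO_CHAINED.search(text) is not None
```

With a plain part and an HTML part that both render to the same three mentions, exceeds_threshold() sees a count of 3 rather than 6, whereas the chained regex only tells me "at least three on one line" and nothing more.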

Since I sort of expected SpamAssassin to have a solution for both the text vs. text+HTML problem and the counting problem, I asked around on IRC but was pointed here. So any suggestions or pointers are more than welcome. I'm not sure whether any more information is required, but feel free to ask questions or correct my presumptions if necessary.

Kind regards,
Bert Van de Poel
ULYSSIS
University of Leuven
