On Fri, 2015-01-02 at 09:15 +0100, Joolee wrote:
> You can start with http://homoglyphs.net/?unicodepos=1 and the search term
> homoglyphs might get you even more extensive lists.
>
I realised that this was spam containing homoglyphs: a look at the message showed it to be using an abnormal size and font so, since I have my reader set up to display plain text rather than HTML, I knew that there would be only HTML in the body.

What I was asking about was how to write a regex that would match the obfuscation encoding. I've had several attempts at it now. The resulting regexes pass SA lint tests and match example spam when run as, for instance:

    grep -P '\&\#959;' <saved_spam.txt

but don't generate hits when used in an SA body rule such as:

    body MG_OBFUSCATION /\&\#959;/

Martin
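
One possible explanation, offered as an assumption rather than anything confirmed in this thread: SpamAssassin documents that body rules run against rendered text with the HTML tags removed, so the entity may already have been decoded to the character it names by the time the rule is tried, whereas grep sees the raw saved file. The Python sketch below only imitates that rendering step with html.unescape. The sample string and the variable names are hypothetical, and this is not SpamAssassin's own code.

```python
import html
import re

# Hypothetical fragment standing in for the HTML part of the saved spam.
raw_html = "<p>Disc&#959;unt &#959;ffer</p>"

# Same idea as the grep test: look for the literal entity text.
entity_pattern = re.compile(r"&#959;")

# Rough stand-in for rendering: drop the tags, then decode the entities.
# (Assumption: this approximates what a body rule is matched against.)
rendered = html.unescape(re.sub(r"<[^>]+>", "", raw_html))

# Pattern for the decoded character, GREEK SMALL LETTER OMICRON (U+03BF).
omicron_pattern = re.compile("\u03bf")

raw_hits = len(entity_pattern.findall(raw_html))
rendered_entity_hits = len(entity_pattern.findall(rendered))
rendered_omicron_hits = len(omicron_pattern.findall(rendered))

print("entity hits in raw HTML:     ", raw_hits)
print("entity hits in rendered text:", rendered_entity_hits)
print("omicron hits in rendered text:", rendered_omicron_hits)
```

If that is what is happening, a rawbody rule, which SpamAssassin documents as seeing the text with the HTML tags still present, may be the better home for the entity pattern, while a body rule would need to look for the decoded character instead.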