On Tue, 9 Dec 2003 07:13:11 -0500, Scott Sprunger <[EMAIL PROTECTED]>
posted to spamassassin-talk:
 > I wanted to test a theory so I've been trying to come up with a
 > rule that will catch encoded strings in the subject of a message.
 > So far I've tried the rules below, but none of them are hitting.
 > Any suggestions?
 >
 > rawbody  T_SBJT_ENC /^Subject: ?=\?(us\-ascii|iso\-8859\-1|windows\-1251)\?/i
 > describe T_SBJT_ENC Subject uses encoding (us-ascii, ISO or windows)
 > score           T_SBJT_ENC .01
 >
 > full     T_SBJT_ENC /^Subject: ?=\?(us\-ascii|iso\-8859\-1|windows\-1251)\?/i
 > describe T_SBJT_ENC Subject uses encoding (us-ascii, ISO or windows)
 > score           T_SBJT_ENC .01
 >
 > header   T_SBJT_ENC Subject =~ /=\?(us\-ascii|iso\-8859\-1|windows\-1251)\?/i
 > describe T_SBJT_ENC Subject uses encoding (us-ascii, ISO or windows)
 > score           T_SBJT_ENC .01
<...>
 > What I'm looking for are subject headers as shown below:
 >
 > Subject: =?us-ascii?B?MCBNZW4sIGl0IHJlYWxseSB3b3JrcyEgZnA=?= iwsgfb
 > Subject: =?iso-8859-1?b?SSBhbSBub3cgdG90YWxseSBkZWJ0IGZyZWU=?=
 > Subject: =?windows-1251?B?QmExayBmaTF0ZXJzPyAtIGZvcmdldA==?=

Assuming the Subject:raw thingy works (just saw that in another reply)
you will need to clarify a couple of issues for yourself.

Are you looking for any RFC2047 encoding or specifically the base64
type? The Subjects you list could have been encoded (validly) like
this just as well:

  Subject: =?us-ascii?Q?0_Men,_it_really_works!_fp?= iwsgfb
  Subject: =?iso-8859-1?Q?I_am_now_totally_debt_free?=
  Subject: =?windows-1251?Q?Ba1k_fi1ters=3F_-_forget?=

(or, seeing as they are in fact all just pure 7bit us-ascii, of course

  Subject: 0 Men, it really works! fp iwsgfb
  Subject: I am now totally debt free
  Subject: Ba1k fi1ters? - forget

so it's a pretty safe assumption that the encoding was used purely for
obfuscation, or out of incompetence. But I digress ...)
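
If you want to convince yourself that the B and Q forms really carry
the same text, here's a minimal Python sketch (Python is just my pick
for illustration; SpamAssassin itself doesn't use any of this) that
decodes a single encoded-word by hand. The pattern is simplified and
decode_encoded_word is a made-up name; it assumes the charset label is
one Python's codecs know. Note that a literal "?" has to be written as
=3F inside Q-encoded text, or it would end the encoded-word early.

```python
import base64
import binascii
import re

# One RFC 2047 encoded-word: =?charset?encoding?encoded-text?=
# (simplified pattern, good enough for the sample subjects)
ENCODED_WORD = re.compile(r"=\?([^?]+)\?([BbQq])\?([^?]*)\?=")


def decode_encoded_word(word: str) -> str:
    """Decode a single RFC 2047 encoded-word (hypothetical helper)."""
    match = ENCODED_WORD.fullmatch(word)
    if match is None:
        raise ValueError(f"not an encoded-word: {word!r}")
    charset, encoding, text = match.groups()
    if encoding.upper() == "B":
        raw = base64.b64decode(text)
    else:
        # Q encoding: "_" means a space and "=XX" is a hex-escaped byte;
        # header=True makes a2b_qp turn underscores into spaces.
        raw = binascii.a2b_qp(text.encode("ascii"), header=True)
    return raw.decode(charset)


if __name__ == "__main__":
    pairs = [
        ("=?us-ascii?B?MCBNZW4sIGl0IHJlYWxseSB3b3JrcyEgZnA=?=",
         "=?us-ascii?Q?0_Men,_it_really_works!_fp?="),
        ("=?iso-8859-1?b?SSBhbSBub3cgdG90YWxseSBkZWJ0IGZyZWU=?=",
         "=?iso-8859-1?Q?I_am_now_totally_debt_free?="),
        ("=?windows-1251?B?QmExayBmaTF0ZXJzPyAtIGZvcmdldA==?=",
         "=?windows-1251?Q?Ba1k_fi1ters=3F_-_forget?="),
    ]
    for b_word, q_word in pairs:
        b_text = decode_encoded_word(b_word)
        q_text = decode_encoded_word(q_word)
        # Each line shows whether both forms decode identically.
        print(b_text == q_text, repr(b_text))
```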

Are you looking for ISO-8859-1 in particular, or could this be
extended to cover other cases? For instance, 8859-15 is virtually
identical to -1 except for the Euro sign and a few other minor tweaks,
and is getting more and more widespread. All of the 8859 sets are
identical to US-ASCII in the lower half (byte values 0-127), so any of
them could be used to encode a message which is in fact plain US-ASCII.
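
A quick way to check the lower-half claim is to ask Python's codecs. A
minimal sketch, assuming Python's iso-8859-N codecs are faithful to the
standards (there is no ISO-8859-12, hence the gap); the function name is
made up for this example. It also shows byte 0xA4, which is the generic
currency sign in -1 but the Euro sign in -15.

```python
# ISO-8859 parts available as Python codecs (ISO-8859-12 was never published).
ISO_8859_PARTS = [n for n in range(1, 17) if n != 12]


def lower_half_matches_ascii(part: int) -> bool:
    """True if bytes 0x00-0x7F decode the same in ISO-8859-<part> as in ASCII."""
    lower_half = bytes(range(128))
    return lower_half.decode(f"iso-8859-{part}") == lower_half.decode("ascii")


if __name__ == "__main__":
    for part in ISO_8859_PARTS:
        print(f"iso-8859-{part}", lower_half_matches_ascii(part))
    # The upper halves differ: compare byte 0xA4 in -1 and -15.
    print(b"\xa4".decode("iso-8859-1"), b"\xa4".decode("iso-8859-15"))
```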

As a minor terminological nit, there are many ISO character sets other
than 8859, so your rule descriptions are not entirely accurate. In
particular, ISO-646 is basically plain ole 7-bit ASCII and ISO-10646 is
Unicode. There is also ISO-2022, which is used e.g. in Japan (as
ISO-2022-JP). (And then of course there's a whole lot of ISO standards
which do not standardize character encodings at all, but something else
entirely :-)

Anyway, assuming base64 encoding is the target here and that all kinds
of ISO-8859 should trigger the rules, here's an attempt at synthesis:

  header   T_SBJT_ENC Subject:raw =~ /=\?(us\-ascii|iso\-8859\-[1-9][0-9]?|windows\-1251)\?b\?/i
  describe T_SBJT_ENC Subject uses RFC2047 base64 encoding
  score    T_SBJT_ENC .01
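
Before dropping the regex into a config, you can sanity-check it
against the sample subjects. A minimal Python sketch, assuming Perl's
/i maps to re.IGNORECASE and that a header test matches anywhere in the
raw Subject value (hence search rather than match); rule_hits is just a
name for this example, not anything SpamAssassin provides.

```python
import re

# Python translation of the Perl regex in the rule above; /i -> re.IGNORECASE.
SBJT_ENC = re.compile(
    r"=\?(us\-ascii|iso\-8859\-[1-9][0-9]?|windows\-1251)\?b\?",
    re.IGNORECASE,
)


def rule_hits(raw_subject: str) -> bool:
    """Mimic a header =~ test: True if the pattern occurs anywhere in the value."""
    return SBJT_ENC.search(raw_subject) is not None


if __name__ == "__main__":
    samples = [
        "=?us-ascii?B?MCBNZW4sIGl0IHJlYWxseSB3b3JrcyEgZnA=?= iwsgfb",
        "=?iso-8859-1?b?SSBhbSBub3cgdG90YWxseSBkZWJ0IGZyZWU=?=",
        "=?windows-1251?B?QmExayBmaTF0ZXJzPyAtIGZvcmdldA==?=",
        "=?iso-8859-1?Q?I_am_now_totally_debt_free?=",  # Q form: no hit
        "I am now totally debt free",                   # plain: no hit
    ]
    for subject in samples:
        print(rule_hits(subject), subject)
```

The [1-9][0-9]? part also accepts labels that don't exist, such as
iso-8859-12 or iso-8859-99, which seems harmless for a test rule.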

I guess subjects in some ISO-8859 variants (say -5 for Cyrillic or -7
for Greek, where almost every character falls outside ASCII) would
legitimately use base64 most of the time, but unless you're using one
of those encodings regularly, this shouldn't matter much in practice.
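
The reason such charsets lean on base64 is plain arithmetic: Q encoding
costs three characters ("=XX") for every byte outside printable ASCII,
while base64 costs a flat four characters per three bytes. Here's a
rough Python sketch comparing the two lengths; the "pick whichever is
shorter" rule and the function names are my own assumptions for
illustration, not what any particular mailer is known to do.

```python
import base64


def q_length(raw: bytes) -> int:
    """Length of RFC 2047 'Q' encoded-text for raw bytes in an unstructured header."""
    length = 0
    for byte in raw:
        if byte == 0x20:
            length += 1  # a space is written as "_"
        elif 0x21 <= byte <= 0x7E and chr(byte) not in "=?_":
            length += 1  # other printable ASCII can stay literal
        else:
            length += 3  # everything else becomes "=XX"
    return length


def b_length(raw: bytes) -> int:
    """Length of the 'B' (base64) encoded-text for the same bytes."""
    return len(base64.b64encode(raw))


def shorter_encoding(text: str, charset: str) -> str:
    """Hypothetical heuristic: pick whichever of Q or B yields shorter text."""
    raw = text.encode(charset)
    return "Q" if q_length(raw) <= b_length(raw) else "B"


if __name__ == "__main__":
    for text, charset in [("I am now totally debt free", "iso-8859-1"),
                          ("Привет, мир", "iso-8859-5")]:
        raw = text.encode(charset)
        print(charset, q_length(raw), b_length(raw), shorter_encoding(text, charset))
```

For the English sample that comes to 26 characters for Q against 36 for
B; for the short Cyrillic sample it is 29 for Q against 16 for B.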

/* era */

-- 
The email address era     the contact information   Just for kicks, imagine
at iki dot fi is heavily  link on my home page at   what it's like to get
spam filtered.  If you    <http://www.iki.fi/era/>  500 pieces of spam for
want to reach me, see     instead.                  each wanted message.


