Re: [SAtalk] Skipping multipart/related is bad

Nels Lindquist Tue, 19 Mar 2002 14:21:57 -0800

On 19 Mar 2002 at 14:19, Craig Hughes wrote:

> First step towards being on top of the bug list is being on the buglist
> at all -- and the first step towards being on the buglist is for the
> person who identifies a bug to enter it on the buglist.
> 
> http://bugzilla.spamassassin.org/
> 
> C
> 
> On Tue, 2002-03-19 at 10:03, Bart Schaefer wrote:
> > On Tue, 19 Mar 2002, Bart Schaefer wrote:
> > 
> > > SA should apply body tests to any text parts within a multipart/related.
> > 
> > I just looked at the source of PerMsgStatus.pm for the first time ...
> > 
> > It never occurred to me that SpamAssassin could lack a proper MIME parser.  
> > Any nested multipart containing a base64'd sub-part can totally defeat all
> > body checks, and even if there's only one level of multipart the base64
> > decoder fails if the lines are not exactly 76 characters long (the spec
> > only requires that they be NO MORE than 76).
> > 
> > This belongs at the very top of the bug list, if you ask me.
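
To illustrate the line-length point quoted above: the MIME spec (RFC 2045) caps base64 lines at 76 characters but allows shorter ones, so a decoder should not depend on the wrap width. Here is a minimal Python sketch of a tolerant decoder. SpamAssassin itself is Perl, and `decode_base64_body` is a hypothetical helper for illustration, not anything taken from PerMsgStatus.pm.

```python
import base64


def decode_base64_body(body: str) -> bytes:
    """Decode a base64 MIME body regardless of how its lines are wrapped."""
    # Remove every whitespace character (CR, LF, spaces, tabs) so that the
    # wrap width of the encoded text cannot affect the result.
    compact = "".join(body.split())
    return base64.b64decode(compact)


def wrap(text: str, width: int) -> str:
    """Split text into CRLF-separated lines of at most `width` characters."""
    return "\r\n".join(text[i:i + width] for i in range(0, len(text), width))


payload = b"Click here to claim your FREE prize! " * 5
encoded = base64.b64encode(payload).decode("ascii")

wrapped_76 = wrap(encoded, 76)  # the maximum line length the spec allows
wrapped_60 = wrap(encoded, 60)  # shorter lines, which are equally legal

# Both wrappings decode to the same bytes.
same_result = decode_base64_body(wrapped_76) == decode_base64_body(wrapped_60)
```

Stripping whitespace before decoding is one simple way to be independent of line length; a real parser would also need to decide how to handle stray non-alphabet characters.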


I posted a couple of messages about problems with base64 encoding 
back in January and didn't get a single reply, not even a pointer to 
the buglist.

Is this the same issue? (see attached)
----
Nels Lindquist <*>
Information Systems Manager
Morningstar Air Express Inc.

--- Begin Message ---

Hi there.

I'm new to the SpamAssassin world, so I hope this issue hasn't been 
discussed to death already.

SpamAssassin first came to my attention on the MIMEDefang mailing 
list when SA integration was added to 'Fang.  I've been playing with 
it for a little while now, and I've been very pleased with the 
results.

However, I noticed that a few messages which were clearly spam were 
sneaking through just under the threshold.  It wasn't immediately 
obvious why this should be since the messages had all the hallmarks 
of blatant spam which typically gets quite a high score.

I determined that the messages in question had their bodies base64 
encoded.  It would appear that SpamAssassin isn't decoding the base64 
strings prior to doing its analysis.

Since I'd been in the habit of manually decoding such messages so 
that I could report them via SpamCop, I did some testing.

I took one of the messages which had scored 4.2 (threshold is the 
default 5.0), manually decoded the base64 string, and built a message 
using the same headers and the decoded message body.

Using the commandline version of SA 2.01, I checked the decoded 
message and the score jumped to 19.1.  Checking with SA 1.5 on a 
different machine, the score was 25.

I had similar results with other messages.  Some of them weren't as 
dramatic, but nearly all of the scores were bumped over the spam 
threshold due to hits from the decoded message body.

Has anyone considered adding base64 decoding functionality to 
SpamAssassin in order to improve its accuracy on such messages?
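
For anyone who wants to reproduce the manual decoding step, here is a rough sketch using Python's standard `email` package rather than SpamAssassin's own Perl code. The sample message, the addresses, and the `decoded_body` helper are all made up for illustration.

```python
import base64
from email import message_from_string


def decoded_body(raw_message: str) -> str:
    """Return the body of a single-part message with its transfer encoding undone."""
    msg = message_from_string(raw_message)
    # decode=True reverses the Content-Transfer-Encoding (base64 here).
    payload = msg.get_payload(decode=True)
    charset = msg.get_content_charset() or "us-ascii"
    return payload.decode(charset, errors="replace")


body_text = "MAKE MONEY FAST! Click here now.\n"
raw = (
    "From: sender@example.com\n"
    "Subject: hello\n"
    "MIME-Version: 1.0\n"
    "Content-Type: text/plain; charset=us-ascii\n"
    "Content-Transfer-Encoding: base64\n"
    "\n"
    + base64.encodebytes(body_text.encode("ascii")).decode("ascii")
)

# A phrase-based body test sees nothing in the raw text ...
hit_in_raw = "MAKE MONEY FAST" in raw
# ... but matches once the body has been decoded.
hit_in_decoded = "MAKE MONEY FAST" in decoded_body(raw)
```

The decoded text could then be pasted under the original headers and re-scored, which is essentially the test described above.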
----
Nels Lindquist <*>
Information Systems Manager
Morningstar Air Express Inc.


_______________________________________________
Spamassassin-talk mailing list
[EMAIL PROTECTED]
https://lists.sourceforge.net/lists/listinfo/spamassassin-talk

--- End Message ---

--- Begin Message ---

Okay, I think I was mistaken about lack of base64 decoding being the 
problem.

Does SpamAssassin recurse over subparts in multipart messages?  It 
appears that for multipart messages of the form:

Headers (Content-Type: multipart/alternative;)
 |
 ---Part 1 (plain text)
 (blank)
 |
 ---Part 2 (base64 encoded HTML)
 (encoded content)

Part 2 is not being processed as part of the message body.

Am I missing something?
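
To make the question concrete, here is a small sketch of what "recursing over subparts" could look like, again using Python's standard `email` package and not SpamAssassin's actual code. The `text_parts` helper and the sample message are hypothetical. The sample nests the multipart/alternative layout shown above inside a multipart/related container so that more than one level is exercised.

```python
from email import message_from_bytes
from email.mime.multipart import MIMEMultipart
from email.mime.text import MIMEText


def text_parts(msg):
    """Yield (content_type, decoded_text) for every text/* part at any depth."""
    # walk() visits the message and all nested subparts, depth first.
    for part in msg.walk():
        if part.get_content_maintype() != "text":
            continue  # skips multipart containers and non-text attachments
        payload = part.get_payload(decode=True) or b""
        charset = part.get_content_charset() or "us-ascii"
        yield part.get_content_type(), payload.decode(charset, errors="replace")


html = "<html><body>FREE prize, click here</body></html>"

inner = MIMEMultipart("alternative")
inner.attach(MIMEText("", "plain"))         # Part 1: blank plain text
inner.attach(MIMEText(html, "html", "utf-8"))  # Part 2: HTML, base64 encoded
outer = MIMEMultipart("related")
outer.attach(inner)

# Serialize and re-parse to mimic a message arriving off the wire.
parsed = message_from_bytes(outer.as_bytes())
parts = list(text_parts(parsed))
```

With this approach the base64 HTML in Part 2 is decoded and available for body tests even though Part 1 is empty.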

On 28 Jan 2002 at 16:09, Nels Lindquist wrote:

> I'm new to the SpamAssassin world, so I hope this issue hasn't been 
> discussed to death already.
> 
> SpamAssassin first came to my attention on the MIMEDefang mailing 
> list when SA integration was added to 'Fang.  I've been playing with 
> it for a little while now, and I've been very pleased with the 
> results.
> 
> However, I noticed that a few messages which were clearly spam were 
> sneaking through just under the threshold.  It wasn't immediately 
> obvious why this should be since the messages had all the hallmarks 
> of blatant spam which typically gets quite a high score.
> 
> I determined that the messages in question had their bodies base64 
> encoded.  It would appear that SpamAssassin isn't decoding the base64 
> strings prior to doing its analysis.
> 
> Since I'd been in the habit of manually decoding such messages so 
> that I could report them via SpamCop, I did some testing.
> 
> I took one of the messages which had scored 4.2 (threshold is the 
> default 5.0), manually decoded the base64 string, and built a message 
> using the same headers and the decoded message body.
> 
> Using the commandline version of SA 2.01, I checked the decoded 
> message and the score jumped to 19.1.  Checking with SA 1.5 on a 
> different machine, the score was 25.
> 
> I had similar results with other messages.  Some of them weren't as 
> dramatic, but nearly all of the scores were bumped over the spam 
> threshold due to hits from the decoded message body.
> 
> Has anyone considered adding base64 decoding functionality to 
> SpamAssassin in order to improve its accuracy on such messages?
----
Nels Lindquist <*>
Information Systems Manager
Morningstar Air Express Inc.



--- End Message ---
