1.  Just posted a bunch of rule ideas.  Should keep you busy for a while ;).  Mostly deductions at this point.  But I plan to work on some good spam detection ideas later tonight.

Theo Van Dinter wrote:
It depends on your client.  I've seen 30 byte text messages take 4k in HTML.
  
But why waste the extra 30 bytes, when most people (especially if you know they can receive HTML mail).  Most clients automatically reply in the format they receive.  Most users don't even realize the difference between formats.
I don't know, being able to catch 59% of my spam with a 99.4% correct
hit rate is in no way "outdated".

Does that mean you'll have the same results?  No.  But that's why you
can submit mass-check results. ;)
The problem I see is that HTML mail is intended to replace plain text email, and is tending to do so.  As a result, more false positives will occur in 6 months, 1 year, and so on.  Why focus on and rely so heavily on what we know will be causing this, rather than work to perfect spam detection in ways that catch ONLY spammers?  SpamAssassin should not add so many points, simply based on the format, for what can be legitimate email fully compliant with all specifications and standards.  It's asking for false positives.  The rule should exist, but it should not be so heavy, since it will become more of an issue as time goes on.

Rumor is that the new version of Outlook in the works has expanded HTML mail capabilities as well.  As HTML becomes more common, plain text will disappear, since there is no need for it.  It doesn't take too much to strip the tags and turn images into [img].  It's very easy to go to plain text if the software isn't designed for HTML mail.  I even saw a module for a mail server (don't remember which) that can do this automatically on the server side (for businesses that run Eudora and allow WAP email checking, which isn't good with HTML).
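
To give a rough idea of how little it takes to strip the tags and turn images into [img], here is a minimal sketch in Python using only the standard library's html.parser.  The names PlainTextExtractor and html_to_text are made up for this illustration; this is not the server module mentioned above, and it makes no attempt to handle scripts, stylesheets, or multipart messages.

```python
from html.parser import HTMLParser


class PlainTextExtractor(HTMLParser):
    """Hypothetical helper: keeps text, replaces <img> with an [img] marker."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.parts = []

    def handle_starttag(self, tag, attrs):
        if tag == "img":
            # Images cannot be shown in plain text, so leave a placeholder.
            self.parts.append("[img]")
        elif tag in ("br", "p", "div"):
            # Treat common block-level tags as line breaks.
            self.parts.append("\n")

    def handle_data(self, data):
        # Text between tags is kept as-is; entities are already decoded.
        self.parts.append(data)


def html_to_text(html):
    """Return a plain-text rendering of an HTML fragment (illustrative only)."""
    parser = PlainTextExtractor()
    parser.feed(html)
    parser.close()
    text = "".join(parser.parts)
    # Collapse runs of whitespace on each line and drop empty lines.
    lines = (" ".join(line.split()) for line in text.splitlines())
    return "\n".join(line for line in lines if line)
```

A client or gateway that can't render HTML could call something like html_to_text(body) on the HTML part before display.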

The rule is pushing for more good email to go to jail.

As far as the amount of spam caught by this rule, from a visual examination of about 30 HTML emails, all are caught by:

BODY: HTML link text says "click here"
MAILTO_LINK        (0.2 points)  BODY: Includes a URL link to send an email
MAILTO_TO_SPAM_ADDR (0.6 points)  URI: Includes a link to a likely spammer email address
FORGED_MUA_OUTLOOK (1.0 points)  Forged mail pretending to be from MS Outlook
CLICK_BELOW        (0.3 points)  Asks you to click below
HTML_IMAGE_RATIO_06 (1.0 points)  BODY: HTML has a low ratio of text to image area

and other HTML-based rules.  They tend to get the job done of detecting them, since spammers like HTML for a reason: images and links.  And SpamAssassin knows that.
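
To make the arithmetic concrete, here is a small Python sketch of additive scoring using the points listed above.  It is only an illustration, not SpamAssassin's actual code: the 5.0 threshold is an assumption, the function names are invented, and the "click here" hit is left out because its rule name and score are not shown in the list.

```python
# Scores copied from the rule hits listed above.
RULE_SCORES = {
    "MAILTO_LINK": 0.2,
    "MAILTO_TO_SPAM_ADDR": 0.6,
    "FORGED_MUA_OUTLOOK": 1.0,
    "CLICK_BELOW": 0.3,
    "HTML_IMAGE_RATIO_06": 1.0,
}

# Assumed threshold for this sketch; a real installation may use another value.
THRESHOLD = 5.0


def total_score(hits, extra_points=0.0):
    """Sum the points of the rules that hit, plus any extra format-based points."""
    return round(sum(RULE_SCORES[name] for name in hits) + extra_points, 2)


def is_spam(hits, extra_points=0.0, threshold=THRESHOLD):
    """A message is flagged once its total reaches the threshold."""
    return total_score(hits, extra_points) >= threshold
```

With these numbers, total_score(RULE_SCORES) comes to 3.1 for a message hitting all five rules.  The extra_points argument stands in for a hypothetical format-only rule: the heavier it is, the fewer content-based hits it takes to push a legitimate HTML message over the line.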
 
I must be crazy, I thought that is how SA worked.
It is, but that's what it should be relying on: rules that detect spam, not a bad (or modern) email client.
been really analyzing spam by hand over the past few weeks to create a 
few bugs with some improved rule ideas.  Expect to see quite a few over 
  

Good, we like new rule ideas.
Good, then you'll get a lot of them as I analyze more mail.
Yes, the 97% catch rate that I have now is horrible...

  
That's why we tell people not to delete their messages by default.
Very true, but how many heed that advice?  And how many check their convicted spam more than once a day?  And how many look closely?  Appealing a conviction is hard.  Most serve life sentences in SpamJail.  Perhaps the justice system is corrupt, or the jury is too lazy to look closely, but it seems the best approach is to zero in better on the bad spam, and keep the good mail out.
-- 
Robert J. Accettura
[EMAIL PROTECTED]
