I've seen different results than what you are reporting.

Almost all of the hits for GIBBERISH that also set off ANTIGIBBERISH are E-mails containing base64 attachments.  When you see a spam trigger both of these, it's likely because it was sent in base64, and it should trip Declude's BASE64 test instead.  GIBBERISHSUB has a similar problem with base64 encoding and gives no score when the subject is encoded that way.  Although a base64-encoded ISO-8859 subject can be highly indicative of spam, detecting that is a job for a different filter.
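
For anyone who wants to experiment with that "different filter", here is a rough Python sketch of how a base64-encoded subject can be recognized.  It is not part of the GIBBERISHSUB filter or of Declude; the function names are made up for illustration.  It only looks for RFC 2047 "encoded words" that use the B (base64) encoding and reports which charsets (ISO-8859-1 and so on) they declare.

```python
import re
from typing import List

# RFC 2047 "encoded word" using the B (base64) encoding, for example
# =?ISO-8859-1?B?SGVsbG8=?=   (charset ? encoding ? encoded text)
ENCODED_WORD_B = re.compile(r"=\?([^?]+)\?[Bb]\?([^?]*)\?=")


def subject_is_base64_encoded(subject: str) -> bool:
    """True if the raw Subject header contains at least one B-encoded word."""
    return ENCODED_WORD_B.search(subject) is not None


def encoded_charsets(subject: str) -> List[str]:
    """Charsets declared by the B-encoded words, normalized to upper case."""
    return [m.group(1).upper() for m in ENCODED_WORD_B.finditer(subject)]


if __name__ == "__main__":
    raw = "=?iso-8859-1?B?SGVsbG8gd29ybGQ=?="
    print(subject_is_base64_encoded(raw))  # True
    print(encoded_charsets(raw))           # ['ISO-8859-1']
```

A real filter would work on the raw header before any decoding, because that raw header is the text the gibberish patterns see.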

These filters are designed to work within the capabilities of Declude.  Triggering two tests only to have one cancel the other out is undesirable, but it is necessary.  If you want to judge how well they work, you have to pay attention to the score they give.  If they give no score, that technically isn't a hit as far as the design goes.  95% of the hits on the body filter that also trigger the anti test are caused by base64 encoding.  That includes any E-mail with an attachment or with inline content such as text in a non-Western European language, the occasional valid E-mail that uses the encoding needlessly, and some spam that is trying to get past text filters.

If you receive a lot of E-mail that is base64-encoded because it is written in non-Western European languages, these filters will tag much of it but won't add any score to it.  The intended target is English spam that isn't base64-encoded, and the filters work pretty well there.

Matt


Frederick Samarelli wrote:
I assume you are using all four of these items at one time.

GIBBERISHSUB
ANTIGIBBERISHSUB
GIBBERISH
ANTIGIBBERISH

I have noticed that almost all spam that sets off GIBBERISHSUB/GIBBERISH
also sets off ANTIGIBBERISHSUB/ANTIGIBBERISH, making the tests unproductive.

Fred


----- Original Message ----- 
From: "Matthew Bramble" <[EMAIL PROTECTED]>
To: <[EMAIL PROTECTED]>
Sent: Monday, September 15, 2003 4:29 PM
Subject: [Declude.JunkMail] GIBBERISH and GIBBERISHSUB filters updated


  
They're still a work in progress, of course, but most of the major
sources of false positives (FPs) seem to have been fixed.

The major changes are that each test has been split into two files,
one for positives and one for counterbalancing false positives.  This
reduces the possibility of crediting too much back to any E-mail.  It
also makes testing a lot easier: any E-mail that fails the main filter
but not the "anti" filter gets scored, and those that fail both don't.
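
The scoring rule above, as a minimal Python sketch.  This is only an illustration, not Declude's implementation or configuration syntax.  The function name gibberish_score and the weight of 5 are hypothetical.  In this sketch the anti filter can only cancel a hit.  It never subtracts more than the main filter added, which is one way to read "reduces the possibility of crediting too much back".

```python
# Hypothetical weight: the real value is whatever your own Declude
# configuration assigns to the GIBBERISH test.
GIBBERISH_WEIGHT = 5


def gibberish_score(main_hit: bool, anti_hit: bool,
                    weight: int = GIBBERISH_WEIGHT) -> int:
    """Score an E-mail only if it fails the main filter but not the anti filter."""
    if main_hit and not anti_hit:
        return weight
    return 0  # no main hit, or the anti filter counterbalanced it


if __name__ == "__main__":
    # Print all four combinations of main/anti hits and the resulting score.
    for main_hit in (False, True):
        for anti_hit in (False, True):
            print(main_hit, anti_hit, gibberish_score(main_hit, anti_hit))
```

Only the (True, False) combination produces a score.  That is why an E-mail that trips both tests should not be counted as a hit when you evaluate the filters.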

The GIBBERISHSUB filter is pretty much there; the only things I expect
to add are exceptions in the ANTIGIBBERISHSUB filter.  Those exceptions
should cover words, acronyms and stock market symbols, and they should
match the corresponding exceptions in the ANTIGIBBERISH filter.

The GIBBERISH filter similarly has ANTIGIBBERISH as a counterbalance.
Some patterns are listed in both files when they usually indicate spam
but occasionally misfire, which makes monitoring easier.  The test will
no longer interfere with BASE64, except that it will add extra score to
any base64-encoded content that isn't identified as such anywhere in the
headers or message body.  This is not a bad thing, because untagged
base64 is very highly indicative of spam.  I have also found that many
spams are caught because they contain gibberish only in the message
boundary.  Normal mail clients build boundaries from time stamps in
decimal or hexadecimal form, so they won't trip the test.  Spammers also
tend to create fake directories made of gibberish in their links, and
the filter will detect those as well.  Unfortunately, some legitimate
mailers are random enough to get caught, and they are being tracked in
the "anti" file.
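
Two of the ideas in the paragraph above could be approximated roughly like this in Python.  These are hypothetical heuristics written for illustration.  They are not the patterns in the GIBBERISH or ANTIGIBBERISH files.  The 60 to 76 character line length (MIME encoders commonly wrap base64 at 76 characters), the three-line minimum and the 10-character run threshold are all assumptions.  The first function flags base64-looking content in a message that never mentions "base64".  The second flags a MIME boundary containing a long run that is not pure hex, on the assumption that normal clients use decimal or hex time stamps.

```python
import re

# Assumption: a line of 60-76 characters drawn only from the base64
# alphabet (optionally ending in '=' padding) looks like encoder output.
B64_LINE = re.compile(r"[A-Za-z0-9+/]{60,76}={0,2}")


def has_untagged_base64(message: str, min_lines: int = 3) -> bool:
    """True if several base64-looking lines appear but 'base64' is never declared."""
    b64_lines = sum(1 for line in message.splitlines()
                    if B64_LINE.fullmatch(line.strip()))
    declared = "base64" in message.lower()
    return b64_lines >= min_lines and not declared


# Hypothetical boundary check: split the boundary into alphanumeric runs and
# treat any long run that is not pure hex (time stamps are decimal or hex)
# as random-looking.
BOUNDARY = re.compile(r'boundary="?([^";\r\n]+)"?', re.IGNORECASE)
HEX_RUN = re.compile(r"[0-9A-Fa-f]+")


def boundary_looks_random(message: str, min_run: int = 10) -> bool:
    """True if the first MIME boundary contains a long non-hex alphanumeric run."""
    match = BOUNDARY.search(message)
    if not match:
        return False
    runs = re.split(r"[^A-Za-z0-9]+", match.group(1))
    return any(len(run) >= min_run and not HEX_RUN.fullmatch(run)
               for run in runs)
```

With these thresholds, an Outlook-style boundary such as ----=_NextPart_000_0012_01C37BB8.5A4E7C20 is not flagged.  A boundary like xqzkwvjtrpml is flagged.  A legitimate mailer with random-looking boundaries would be flagged too, which is the kind of case the "anti" file exists for.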

I haven't had time to massage the comments, but I wanted to put this out
for testing because it resolves many of the false positives.  Please let
me know if you have a nomination for counterbalancing measures, such as
words, mail clients, bulk mailers, etc.  Samples of the offending code
are helpful, because a literal exception might not be the best way
around a problem.  For instance, I just took care of an MS Word mail
issue by exempting XML tags instead of one particular string of
characters.

You can download these filters, plus the OBFUSCATION filter, at the
following locations:

GIBBERISH and ANTIGIBBERISH
http://www.mailpure.com/decludefilters/gibberish/Gibberish_09-15-2003.txt
http://www.mailpure.com/decludefilters/gibberish/AntiGibberish_09-15-2003.txt

GIBBERISHSUB and ANTIGIBBERISHSUB
http://www.mailpure.com/decludefilters/gibberishsub/GibberishSub_09-15-2003.txt
http://www.mailpure.com/decludefilters/gibberishsub/AntiGibberishSub_09-15-2003.txt

OBFUSCATION
http://www.mailpure.com/decludefilters/obfuscation/Obfuscation_09-14-2003c.txt

Recommendations on how best to obscure the files long-term would be
appreciated.  It shouldn't be anything too convoluted, like maybe a
secret handshake or something :)

Matt

    
