|
I've seen different results than what you are reporting. Almost all of the hits for GIBBERISH that also set off ANTIGIBBERISH are E-mails containing base64 attachments. When you see a spam trigger both of these, it's likely because it was sent in base64, and it should trip Declude's BASE64 test instead. GIBBERISHSUB has a similar problem with base64 encoding, and gives no score when it is found. Although this can be highly indicative of spam if ISO-8859 is encoded in the subject, that's a job for a different filter.

These filters are designed to work within the capabilities of Declude, and while triggering multiple tests only to have them cancel each other out is undesirable, it is necessary. If you want to figure out how well they work, you have to pay attention to the score that they give. If a message gets no score, technically that's not a hit as far as the design goes.

95% of the hits on the body filter that trigger the anti test are because of base64 encoding. That includes any E-mail with an attachment or inline attached content such as non-Western European language text, occasionally a valid E-mail needlessly using that encoding, or in some cases spam that is trying to get past text filters. If you see a lot of E-mails containing base64 encoding because of non-Western European languages, then these filters will tag a lot of that E-mail, but not add score to it. The intended target is English spam that isn't base64 encoded, and it works pretty well there.

Matt

Frederick Samarelli wrote:

I assume you are using all four of these items at one time.

GIBBERISHSUB
ANTIGIBBERISHSUB
GIBBERISH
ANTIGIBBERISH

I have noticed that almost all spam that sets off GIBBERISHSUB/GIBBERISH will also set off ANTIGIBBERISHSUB/ANTIGIBBERISH, making the test nonproductive.
Fred

----- Original Message -----
From: "Matthew Bramble" <[EMAIL PROTECTED]>
To: <[EMAIL PROTECTED]>
Sent: Monday, September 15, 2003 4:29 PM
Subject: [Declude.JunkMail] GIBBERISH and GIBBERISHSUB filters updated

They're still a work in progress of course, but most of the major sources of FPs seem to have been fixed. The major change is that both tests have been split into two files, one for positives and one for counterbalancing false positives. This reduces the possibility of crediting too much back to any E-mail. It also makes testing a lot easier, as any message that fails the main filter and doesn't fail the "anti" filter gets scored, while those that fail both don't.

The GIBBERISHSUB filter is pretty much there, with the only things that I expect to add being exceptions in the ANTIGIBBERISHSUB filter. Those exemptions should be for words, acronyms and stock market symbols, and they should match the same exemptions in the ANTIGIBBERISH filter. The GIBBERISH filter similarly has ANTIGIBBERISH as a counterbalance. Some things are listed in both files if they only occasionally throw false positives, which makes monitoring easier.

The test will no longer interfere with BASE64, except that it will add extra score to any base64 encoded content that isn't tagged anywhere in the headers or message body as being such. This is not a bad thing, because that would be very highly indicative of spam.

I have also found that many spams are caught because they contain gibberish in the message boundary only. Normal mail clients use time stamps, in either decimal or hexadecimal form, so they won't trip the test. Spammers also tend to create fake directories in their links that are made from gibberish, and this will detect that as well. Unfortunately, some legitimate mailers are random enough to get caught, and they are being kept track of in the "anti" file.
I haven't had time to massage the comments, but wanted to put this out for testing because it resolves many of the false positives. Please let me know if you have a nomination for counterbalancing measures, such as words, mail clients, bulk mailers, etc. Offending code is helpful because a literal exception might not be the best way around it. For instance, I just took care of an MS Word mail issue by exempting XML tags instead of one particular string of characters.

You can download those filters plus the OBFUSCATION filter at the following locations:

GIBBERISH and ANTIGIBBERISH
http://www.mailpure.com/decludefilters/gibberish/Gibberish_09-15-2003.txt
http://www.mailpure.com/decludefilters/gibberish/AntiGibberish_09-15-2003.txt
|
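
The counterbalancing rule discussed in this thread (a message that fails the main filter but not the "anti" filter gets scored, one that fails both does not) can be sketched in a few lines of Python. This is only an illustration and not Declude configuration: the names `pair_score`, `tally`, and `ASSUMED_WEIGHT` are hypothetical, the weight of 5 is made up because the thread gives no actual score values, and treating an "anti"-only hit as zero is an assumption the thread does not spell out.

```python
from collections import Counter

# Hypothetical weight; the thread does not state the real score values.
ASSUMED_WEIGHT = 5


def pair_score(hit_main: bool, hit_anti: bool, weight: int = ASSUMED_WEIGHT) -> int:
    """Net score for one main/anti filter pair (e.g. GIBBERISH/ANTIGIBBERISH).

    Only a main hit without an anti hit adds score. Assumption: an anti hit
    on its own contributes nothing.
    """
    if hit_main and not hit_anti:
        return weight
    return 0


def tally(results):
    """Count outcomes for an iterable of (hit_main, hit_anti) pairs.

    'scored' is the only category that counts as a real hit by the design
    described above; 'cancelled' covers messages that tripped both filters.
    """
    counts = Counter()
    for hit_main, hit_anti in results:
        if not hit_main:
            counts["no_hit"] += 1
        elif hit_anti:
            counts["cancelled"] += 1
        else:
            counts["scored"] += 1
    return counts
```

When judging how well the filters work, counting the "scored" bucket rather than raw GIBBERISH hits matches the point made in the reply: a message that trips both filters, such as one with a base64 attachment, is not a hit as far as the design goes.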
