ING Direct mail FPing on TVD_ rules
I just received a mistagged-ham report from a customer showing two stock rules hitting a legitimate email from ING Direct. The total score was 6.4, even with -3.5 from BAYES_00. I've asked whether I can pass the message on for analysis.

Stock scores:

  score TVD_PH_SUBJ_ACCOUNTS_POST 2.602 2.607 2.497 3.099 # n=2
  score TVD_SUBJ_ACC_NUM 0.001 2.199 2.199 2.198 # n=1

I've dropped them down like so:

  score TVD_PH_SUBJ_ACCOUNTS_POST 1.1 1.1 1.0 1.5
  score TVD_SUBJ_ACC_NUM 0.001 1.199 1.199 1.198 # n=1

The full set of hits (mostly stock rules; the ones that aren't have low advisory scores):

Content analysis details:   (6.4 points, 5.0 required)

 pts rule name                 description
---- ------------------------- ----------------------------------------------
 1.0 SUBJ_YOUR_DEBT            Subject contains "Your Bills" or similar
 2.2 TVD_SUBJ_ACC_NUM          Subject has spammy looking monetary reference
 3.1 TVD_PH_SUBJ_ACCOUNTS_POST TVD_PH_SUBJ_ACCOUNTS_POST
-0.0 T_RP_MATCHES_RCVD         Envelope sender domain matches handover relay domain
 0.1 PERSONAL_INFO_11          BODY: PERSONAL_INFO_11
-3.5 BAYES_00                  BODY: Bayes spam probability is 0 to 1%
                               [score: 0.]
 0.0 HTML_MESSAGE              BODY: HTML included in message
 0.7 MIME_HTML_ONLY            BODY: Message only has text/html MIME parts
 1.1 HTML_IMAGE_ONLY_16        BODY: HTML: images with 1200-1600 bytes of words
 0.0 TO_EQ_FM_HTML_ONLY        To == From and HTML only
 0.0 TO_EQ_FM_DIRECT_MX        To == From and direct-to-MX
 1.7 TO_EQ_FM_HTML_DIRECT      To == From and HTML only, direct-to-MX

-kgd
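To sanity-check what the lowered scores do to this particular message, here is a small Python sketch that re-totals the report. It assumes the displayed (rounded to 0.1) points are close enough to the real values, and it relies on the documented meaning of the four columns on a "score" line: score sets 0 to 3, where set 3 means Bayes and network tests are both enabled (which matches the 3.1 and 2.2 shown above). The names hits, lowered, score_set and rescore are illustrative only.

```python
# Rule hits and displayed points copied from the report (rounded to 0.1).
hits = {
    "SUBJ_YOUR_DEBT": 1.0,
    "TVD_SUBJ_ACC_NUM": 2.2,
    "TVD_PH_SUBJ_ACCOUNTS_POST": 3.1,
    "T_RP_MATCHES_RCVD": -0.0,
    "PERSONAL_INFO_11": 0.1,
    "BAYES_00": -3.5,
    "HTML_MESSAGE": 0.0,
    "MIME_HTML_ONLY": 0.7,
    "HTML_IMAGE_ONLY_16": 1.1,
    "TO_EQ_FM_HTML_ONLY": 0.0,
    "TO_EQ_FM_DIRECT_MX": 0.0,
    "TO_EQ_FM_HTML_DIRECT": 1.7,
}

# The lowered four-column scores from the local config above.
lowered = {
    "TVD_PH_SUBJ_ACCOUNTS_POST": (1.1, 1.1, 1.0, 1.5),
    "TVD_SUBJ_ACC_NUM": (0.001, 1.199, 1.199, 1.198),
}

REQUIRED = 5.0


def score_set(use_bayes: bool, use_net: bool) -> int:
    """Column index of a four-score line: 2 * Bayes + 1 * network tests."""
    return 2 * int(use_bayes) + int(use_net)


def rescore(hits, overrides, use_bayes=True, use_net=True):
    """Sum the hits, swapping in overridden scores from the active score set."""
    idx = score_set(use_bayes, use_net)
    total = 0.0
    for rule, pts in hits.items():
        if rule in overrides:
            pts = overrides[rule][idx]
        total += pts
    return total


before = sum(hits.values())
after = rescore(hits, lowered)
print(f"before={before:.1f} after={after:.1f} required={REQUIRED}")
```

With the lowered set 3 scores the same message comes out at about 3.8 instead of 6.4, under the 5.0 threshold.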
Re: How many Froms?
On Wed, 28 Apr 2010, Frank Heydlauf wrote:

> Hi,
>
> On Wed, Apr 28, 2010 at 08:05:27PM +0100, Martin Gregorie wrote:
[snip..]
> >> Or could I just use a rule like:
> >>
> >> header From =~ /\...@.*\@/
>
> This regex matches, e.g.
>
> From: u...@example.com
>
> which is a common "auto expansion" of many MUAs when
> no sender real-name is configured.
> Just try it on your own mail folder.

There's an easy fix for that FP: just use the 'From:addr =~ ' variant of the header rule. That ignores the "comment" part of the 'From:' address and only examines the stuff inside the '<>' part.

--
Dave Funk                                  University of Iowa
                                           College of Engineering
319/335-5751   FAX: 319/384-0549           1256 Seamans Center
Sys_admin/Postmaster/cell_admin            Iowa City, IA 52242-1527
#include
Better is not better, 'standard' is better. B{
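A rough Python illustration of the difference between testing the whole From header and testing only the address. The two-'@' pattern is a hypothetical stand-in (the regex in the quoted rule is truncated in the archive), and email.utils.parseaddr is only used to approximate what SpamAssassin's From:addr does; it is not SpamAssassin's own header parser.

```python
import re
from email.utils import parseaddr

# Hypothetical stand-in pattern: a From value containing two '@' signs.
TWO_ATS = re.compile(r"@.*@")


def matches_full_header(from_header: str) -> bool:
    # Like a plain 'header From =~' rule: the display-name ("comment")
    # part is part of the text being matched.
    return bool(TWO_ATS.search(from_header))


def matches_addr_only(from_header: str) -> bool:
    # Approximation of 'From:addr': match only the bare address.
    _name, addr = parseaddr(from_header)
    return bool(TWO_ATS.search(addr))


# A common MUA auto-expansion when no real name is configured.
hdr = '"user@example.com" <user@example.com>'
print(matches_full_header(hdr), matches_addr_only(hdr))
```

The full-header test fires on the auto-expanded display name, while the address-only test does not, which is the FP the 'From:addr' variant avoids.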
Re: Filtering zip spam
Hi,

> Alex, does Bayes understand/check INSIDE zips, at least for file
> properties? If not, then it is inherently limited (just in this

I'm not sure if you're asking me rhetorically here. I really don't know. Is it enough that Bayes finds the encoded string as the attachment and matches that against other strings, or must it first be expanded into its real content?

> context), which is a big part of why this is such an effective
> technique. Adding that to Bayes should be relatively
> straightforward, and should make zips less attractive to spammers.

It's almost too obvious an addition, which makes me wonder why it hasn't been done before.

> One simple approach is to score all "small" zips, then meta that
> with other characteristics, like ANY blocklist hit, "unusual"
> nation of origin, etc.

That's a good one. I'm not sure I'm at the point of writing rules to match on attachment size, however.

> That's how I first handled zips, a few years ago, and it's fairly
> effective. Small zips in ham are VERY unusual, and typically are

Again, it's so obvious once you mention it that I'm surprised it's not in the default rules, given that you've been doing it for a while. Is there some side effect or drawback that would prevent it from being rolled into a real SA release?

> To avoid FPs, I'm using the RealName-based rules I described almost
> three years ago (I have several "skip" rules daisy-chained off

I'll have to locate those; I didn't have much luck finding them after a quick search. It's not the Google "I'm feeling lucky" discussion, right?

# Is this even still relevant?
http://old.nabble.com/Googlepages---Livefilestore-spams-td14715808.html

> Alex, as with all rules, it really depends on your ham ecology.

I agree to an extent, but there is a common reference point that we all share, and I'd like to at least find that.

> Feel free to share more info about yours (we need the equivalent
> of the Geek Code for ham ecology!).
> When you first started
> posting, I briefly assumed you were a college student, then
> gradually realized you have decent volume and diversity. :)

I appreciate that. I've been working with Linux since the beginning, but I'm not a real Perl programmer.

> As I mentioned in a post in January, I had noticed a consistent
> value in an Image properties field which I was calculating, but
> not (at the time) exporting.

Is this it?

# Re: pill image spam learns to walk
http://marc.info/?l=spamassassin-users&m=126327771510366&w=2

Is there any progress on your work from that which might benefit us here?

> Entire zip:
> - number of files
> - compression ratio (i.e. across ALL files)

Isn't this what the ClamAV and Sanesecurity signatures are for?

Thanks,
Alex
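The "score small zips, then meta that with other characteristics" idea and the whole-zip statistics quoted in this thread (number of files, compression ratio across all files) could be prototyped outside SpamAssassin along these lines. This is a Python sketch using the standard zipfile module; the 64 KB "small" threshold and the blocklist_hit / unusual_country inputs are hypothetical placeholders, not existing SA rules or plugins.

```python
import io
import zipfile

# Hypothetical threshold for a "small" zip; tune it against your own ham.
SMALL_ZIP_BYTES = 64 * 1024


def zip_stats(data: bytes) -> dict:
    """Whole-zip properties: size, number of files, overall compression ratio."""
    with zipfile.ZipFile(io.BytesIO(data)) as zf:
        members = [info for info in zf.infolist() if not info.is_dir()]
        packed = sum(info.compress_size for info in members)
        unpacked = sum(info.file_size for info in members)
    ratio = unpacked / packed if packed else 0.0  # across ALL files
    return {"size": len(data), "files": len(members), "ratio": ratio}


def small_zip_meta(data: bytes, blocklist_hit: bool, unusual_country: bool) -> bool:
    """Meta-style AND: a small zip alone is not enough, it needs another signal."""
    small = zip_stats(data)["size"] < SMALL_ZIP_BYTES
    return small and (blocklist_hit or unusual_country)


# Build a tiny zip in memory to try it out.
buf = io.BytesIO()
with zipfile.ZipFile(buf, "w", zipfile.ZIP_DEFLATED) as zf:
    zf.writestr("invoice.exe", b"MZ" + b"\x00" * 4096)
sample = buf.getvalue()
print(zip_stats(sample))
print(small_zip_meta(sample, blocklist_hit=True, unusual_country=False))
```

Keeping the size test as one input to an AND, rather than scoring it on its own, mirrors the meta approach described above and limits FPs from the occasional small zip in ham.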
Bayes spam and ham out of proportion
I've been running SpamAssassin for years. I'm using auto-learn with very conservative thresholds. However, after several years of use my spam database is about three times larger than my ham database, and I'm starting to see false positives. Is there a way to "shrink" the spam database?

Thanks,
Frank
--
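One thing worth checking before shrinking anything: Graham/Robinson-style Bayes filters compute each token's spam probability from per-class rates (the count divided by the number of spam or ham messages learned), so a 3:1 spam:ham imbalance does not by itself push a neutral token toward spam. The Python sketch below assumes that style of calculation; whether SpamAssassin's Bayes normalizes exactly this way is an assumption to verify, and the strength s and prior x values are illustrative, not SA defaults.

```python
def token_spam_prob(spam_count, ham_count, nspam, nham, s=1.0, x=0.5):
    """Robinson-style spam probability for one token (illustrative sketch).

    spam_count / ham_count: learned spam / ham messages containing the token.
    nspam / nham: total spam / ham messages learned.
    s, x: strength and neutral prior; hypothetical values, not SA defaults.
    """
    spam_rate = spam_count / nspam if nspam else 0.0
    ham_rate = ham_count / nham if nham else 0.0
    if spam_rate + ham_rate == 0:
        return x  # token never seen: fall back to the neutral prior
    # Normalizing by class size is what makes the spam:ham ratio matter less.
    p = spam_rate / (spam_rate + ham_rate)
    n = spam_count + ham_count
    # Shrink towards x when the token has been seen only a few times.
    return (s * x + n * p) / (s + n)


def naive_fraction(spam_count, ham_count):
    # Without normalization: raw share of sightings that were spam.
    return spam_count / (spam_count + ham_count)


# 3:1 database, token appearing in 10% of each class.
print(token_spam_prob(300, 100, nspam=3000, nham=1000))  # 0.5
print(naive_fraction(300, 100))                           # 0.75
```

If per-class rates are what drive the result, the FPs may be coming from specific tokens rather than from the overall size ratio; "sa-learn --dump magic" reports the nspam and nham counts if you want to see the actual numbers.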