[Copfilter] Copy of quarantined email - *** SPAM *** [8.9/7.0] ING Direct mail FPing on TVD_ rules

2010-04-28 Thread babedh-d...@biggdog.biz
I just received a mistagged-ham report from a customer showing two stock 
rules hit on a legitimate email from ING Direct - total score was 6.4, 
even with -3.5 from BAYES_00.  I've asked if I can pass the message on 
for analysis.

Stock scores:
score TVD_PH_SUBJ_ACCOUNTS_POST 2.602 2.607 2.497 3.099 # n=2
score TVD_SUBJ_ACC_NUM 0.001 2.199 2.199 2.198 # n=1

I've dropped them down like so:
score TVD_PH_SUBJ_ACCOUNTS_POST 1.1 1.1 1.0 1.5
score TVD_SUBJ_ACC_NUM 0.001 1.199 1.199 1.198 # n=1


The full set of hits (mostly stock rules; the ones that aren't have low 
advisory scores):

Content analysis details:   (6.4 points, 5.0 required)

 pts  rule name                  description
----  -------------------------  ----------------------------------------------------
 1.0  SUBJ_YOUR_DEBT             Subject contains "Your Bills" or similar
 2.2  TVD_SUBJ_ACC_NUM           Subject has spammy looking monetary reference
 3.1  TVD_PH_SUBJ_ACCOUNTS_POST  TVD_PH_SUBJ_ACCOUNTS_POST
-0.0  T_RP_MATCHES_RCVD          Envelope sender domain matches handover relay domain
 0.1  PERSONAL_INFO_11           BODY: PERSONAL_INFO_11
-3.5  BAYES_00                   BODY: Bayes spam probability is 0 to 1%
                                 [score: 0.]
 0.0  HTML_MESSAGE               BODY: HTML included in message
 0.7  MIME_HTML_ONLY             BODY: Message only has text/html MIME parts
 1.1  HTML_IMAGE_ONLY_16         BODY: HTML: images with 1200-1600 bytes of words
 0.0  TO_EQ_FM_HTML_ONLY         To == From and HTML only
 0.0  TO_EQ_FM_DIRECT_MX         To == From and direct-to-MX
 1.7  TO_EQ_FM_HTML_DIRECT       To == From and HTML only, direct-to-MX
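
As a rough check on what the lowered scores would do to this message, the
arithmetic can be sketched in Python. This assumes the fourth score column is
the active one (the stock values 3.099 and 2.198 round to the 3.1 and 2.2 in
the report); the variable names are illustrative only and nothing here is part
of SpamAssassin itself.

```python
# Rounded per-rule scores copied from the report above.
REPORTED_HITS = {
    "SUBJ_YOUR_DEBT": 1.0,
    "TVD_SUBJ_ACC_NUM": 2.2,
    "TVD_PH_SUBJ_ACCOUNTS_POST": 3.1,
    "T_RP_MATCHES_RCVD": -0.0,
    "PERSONAL_INFO_11": 0.1,
    "BAYES_00": -3.5,
    "HTML_MESSAGE": 0.0,
    "MIME_HTML_ONLY": 0.7,
    "HTML_IMAGE_ONLY_16": 1.1,
    "TO_EQ_FM_HTML_ONLY": 0.0,
    "TO_EQ_FM_DIRECT_MX": 0.0,
    "TO_EQ_FM_HTML_DIRECT": 1.7,
}
REQUIRED = 5.0  # threshold shown in the report

# Assumption: the fourth column of each lowered "score" line applies here.
LOWERED = {
    "TVD_PH_SUBJ_ACCOUNTS_POST": 1.5,
    "TVD_SUBJ_ACC_NUM": 1.198,
}


def total(hits):
    """Sum the per-rule scores, rounded to hide float noise."""
    return round(sum(hits.values()), 3)


before = total(REPORTED_HITS)
after = total({**REPORTED_HITS, **LOWERED})

print(f"before: {before} (spam: {before >= REQUIRED})")
print(f"after:  {after} (spam: {after >= REQUIRED})")
```

Under that assumption the same message would land below the 5.0 threshold
with the lowered scores.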

-kgd



Re: How many Froms?

2010-04-28 Thread babedh-d...@biggdog.biz
On Wed, 28 Apr 2010, Frank Heydlauf wrote:

> Hi,
>
> On Wed, Apr 28, 2010 at 08:05:27PM +0100, Martin Gregorie wrote:
[snip..]
> >> Or could I just use a rule like:
> >>
> >> header From =~ /\...@.*\@/
>
> This regex matches, e.g.,
>
> From: u...@example.com <u...@example.com>
>
> which is a common "auto expansion" of many MUAs when
> no sender real-name is configured.
> Just try it on your own mail folder.

There's an easy fix for that FP: just use the 'From:addr =~'
variant of the header rule. That ignores the "comment" part
of the 'From:' header and only examines the address itself,
i.e. the stuff inside the '<>' part.
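
A rough Python analogue of the difference, for illustration only (this is not
how SpamAssassin implements 'From:addr'). It assumes the rule is meant to
catch two '@' signs in the header, uses `@.*@` as a stand-in for the pattern
quoted above, and uses a made-up address with a quoted display name.

```python
import re
from email.utils import parseaddr

# Stand-in pattern: any two '@' signs in the text being tested.
DOUBLE_AT = re.compile(r"@.*@")

# Hypothetical header value where the MUA reused the address as the real name.
FROM_HEADER = '"user@example.com" <user@example.com>'


def matches_whole_header(value):
    """Test the pattern against the complete From: value."""
    return DOUBLE_AT.search(value) is not None


def matches_address_only(value):
    """Test the pattern against the bare address, ignoring the display name."""
    _name, addr = parseaddr(value)
    return DOUBLE_AT.search(addr) is not None


print(matches_whole_header(FROM_HEADER))   # display name plus address
print(matches_address_only(FROM_HEADER))   # address only
```

The whole-header test sees both copies of the address and fires; the
address-only test sees a single '@' and does not.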

-- 
Dave Funk  University of Iowa
College of Engineering
319/335-5751   FAX: 319/384-0549   1256 Seamans Center
Sys_admin/Postmaster/cell_admin   Iowa City, IA 52242-1527
#include 
Better is not better, 'standard' is better. B{



Re: Filtering zip spam

2010-04-29 Thread babedh-d...@biggdog.biz
Hi,

> Alex, does Bayes understand/check INSIDE zips, at least for file
> properties?  If not, then it is inherently limited (just in this

I'm not sure if you're asking me rhetorically here. I really don't
know. Is it enough that bayes finds the encoded string as the
attachment, and matches that against other strings or must it be
expanded first into its real content?

> context), which is a big part of why this is such an effective
> technique.  Adding that to Bayes should be relatively straight
> forward, and should make zips less attractive to spammers.

It seems almost too obvious an addition, which makes me wonder why it
hasn't been done already.

> One simple approach is to score all "small" zips, then meta that
> with other characteristics, like ANY blocklist hit, "unusual"
> nation of origin, etc.

That's a good one. I'm not sure I'm at the point of writing rules to
match on attachment size, however.
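
For what it's worth, the "small zip plus any other signal" idea can be
sketched outside SA in a few lines of Python. Everything named here is an
assumption for illustration: the 50,000-byte cutoff, the helper names, and
the `other_signals` dict standing in for blocklist or country checks.

```python
import io
import zipfile
from email.message import EmailMessage

SMALL_ZIP_BYTES = 50_000  # hypothetical cutoff, tune for your own ham


def small_zip_attachments(msg, limit=SMALL_ZIP_BYTES):
    """Return (filename, size) for each .zip attachment smaller than limit."""
    hits = []
    for part in msg.iter_attachments():
        name = part.get_filename() or ""
        payload = part.get_payload(decode=True) or b""
        if name.lower().endswith(".zip") and len(payload) < limit:
            hits.append((name, len(payload)))
    return hits


def meta_small_zip(msg, other_signals):
    """Fire only when a small zip coincides with at least one other signal."""
    return bool(small_zip_attachments(msg)) and any(other_signals.values())


def make_demo_message():
    """Build a throwaway message carrying a tiny in-memory zip."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w", zipfile.ZIP_DEFLATED) as zf:
        zf.writestr("invoice.txt", "hello " * 100)
    msg = EmailMessage()
    msg["Subject"] = "demo"
    msg.set_content("see attachment")
    msg.add_attachment(buf.getvalue(), maintype="application",
                       subtype="zip", filename="invoice.zip")
    return msg


demo = make_demo_message()
print(small_zip_attachments(demo))
print(meta_small_zip(demo, {"blocklist_hit": True, "odd_country": False}))
print(meta_small_zip(demo, {"blocklist_hit": False, "odd_country": False}))
```

The small zip alone does nothing; it only counts when something else about
the message already looks suspicious.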

> That's how I first handled zips, a few years ago, and it's fairly
> effective.  Small zips in ham are VERY unusual, and typically are

Again, it's so obvious once you mention it that I'm surprised it's not
in the default rules, given that you've been doing it for a while. Is
there some side-effect or drawback that would prevent it from being
rolled into a real SA release?

> To avoid FPs, I'm using the RealName-based rules I described almost
> three years ago (I have several "skip" rules daisy-chained off

I'll have to locate those. I didn't have much luck finding them with a
quick search. It's not the Google "I'm feeling lucky" discussion, right?

# Is this even still relevant?
http://old.nabble.com/Googlepages---Livefilestore-spams-td14715808.html

> Alex, as with all rules, it really depends on your ham ecology.

I agree to an extent, but there is a common reference point that we
all have, and I'd like to at least find that.

> Feel free to share more info about yours (we need the equivalent
> of the Geek Code for ham ecology!).  When you first started
> posting, I briefly assumed you were a college student, then
> gradually realized you have decent volume and diversity. :)

I appreciate that. I've been working with Linux since the beginning,
but I'm not a real Perl programmer.

> As I mentioned in a post in January, I had noticed a consistent
> value in an Image properties field which I was calculating, but
> not (at the time) exporting.

Is this it?

# Re: pill image spam learns to walk
http://marc.info/?l=spamassassin-users&m=126327771510366&w=2

Is there any progress on your work from that, which might benefit us here?

> Entire zip:
>    - number of files
>    - compression ratio (i.e. across ALL files)

Isn't this what the clamav and sanesecurity sigs are for?
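
For reference, the two whole-archive properties quoted above (number of
files, compression ratio across all files) are cheap to compute. A minimal
sketch with Python's zipfile module; the function name and the demo archive
are made up for illustration, and the ratio is defined here as compressed
bytes divided by uncompressed bytes.

```python
import io
import zipfile


def zip_stats(data):
    """Return file count and overall compression ratio for a zip in memory."""
    with zipfile.ZipFile(io.BytesIO(data)) as zf:
        files = [info for info in zf.infolist() if not info.is_dir()]
    raw = sum(info.file_size for info in files)
    packed = sum(info.compress_size for info in files)
    return {
        "files": len(files),
        "raw_bytes": raw,
        "packed_bytes": packed,
        # Ratio across ALL files; None for an archive with no data.
        "ratio": packed / raw if raw else None,
    }


def make_demo_zip():
    """Build a small two-file archive of highly repetitive text."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w", zipfile.ZIP_DEFLATED) as zf:
        zf.writestr("a.txt", "a" * 1000)
        zf.writestr("b.txt", "spam " * 400)
    return buf.getvalue()


stats = zip_stats(make_demo_zip())
print(stats)
```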

Thanks,
Alex



Bayes spam and ham out of proportion

2010-04-29 Thread babedh-d...@biggdog.biz
I've been running SpamAssassin for years.  I am using auto-learn with very
conservative thresholds.  However, after several years of usage my spam
database is about three times larger than my ham database and I am starting
to see false positives.

Is there a way to "shrink" the spam database?

Thanks
Frank

--