Botnet spam not being caught

2009-06-13 Thread MySQL Student
Hi all,

I'm using SA-3.2.5 on Linux and my system is being deluged with spam that
isn't being caught, apparently from botnets. I'm using botnet-0.7. The
subject is random and the "Received from" header is always an unresolvable
IP. Is there a more robust botnet plugin that may be more effective?
Botnet-v08 was catching too many FPs (its score was too high). The body is also
quite random -- enough so as to keep bayes usually at 50 or less.

Is there a later version of SA that's stable?

Here's the relevant headers:

Received: from [78.97.185.89] (unknown [78.97.185.89])
Message-ID: 
Subject: Where is this bar?
MIME-Version: 1.0
Content-Type: text/html; charset="utf-8"
Content-Transfer-Encoding: 7bit
Date: Sat, 13 Jun 2009 04:05:44 -0400 (EDT)
X-Virus-Scanned: by amavisd-new at mydomain.com
X-Spam-Status: No, hits=4.9 tagged_above=-300.0 required=5.0 use_bayes=1
 tests=BAYES_50, BOTNET, HTML_MESSAGE, MIME_HTML_ONLY, RDNS_NONE,
URIBL_BLACK
X-Spam-Level: 

The body is HTML and contains the following:

Click here to view this message as a web page.

Copyright © 2002-2009 by the Pyahqql, Inc.
All rights reserved.

Click here if this picture is blocked

Home  |  Contact Us  |  Privacy Policy  |  Terms of Use | Unsubscribe |

Where can I go from here?

Thanks,
Alex


Re: Botnet spam not being caught

2009-06-13 Thread MySQL Student
Hi John,

> Botnet seems to have caught that just fine (it's listed in the rules
> which were triggered).  The problem is either that you're running it
> at a lower score (which you could also do for Botnet0.8 if you wanted
> to upgrade -- their default scores are exactly the same), or you need
> other rules/configs to supplement your overall scoring system.


Yes, I didn't intend to blame it on botnet; I realize the rule is being
triggered.

I guess I was concerned about raising the score above my current 1.5, and
was thinking that instead some other rule was available, or being used by
someone on the list, in conjunction with botnet to catch these.

If not, can you recommend an approach on calculating the right score for
botnet for my environment, so it doesn't tag so many FPs, or what an
appropriate value should be with my threshold being set to 5.0?

Thanks again,
Alex


Re: Botnet spam not being caught

2009-06-13 Thread MySQL Student
Hi Charles,

>> Received: from [78.97.185.89] (unknown [78.97.185.89])
>> Message-ID: 
>>
>
> Do they all have message ID's that include the IP?


Yeah, great, it looks like they all do. Would something like this work?

header     MYMSGIP  Message-ID =~ /78.97.185.89/
score      MYMSGIP  0.3
describe   MYMSGIP  Message-ID from botnet

Can someone help to write a rule that wildcards this safely?
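
One way to reason about a wildcarded version of that rule is to prototype the pattern outside SpamAssassin first. The sketch below is Python rather than SA rule syntax, and the sample Message-ID values are invented for illustration (the real Message-ID did not survive in the headers above). It escapes the dots and uses lookarounds so that a dotted quad cannot match inside a longer run of digits.

```python
import re

# Dotted-quad shape: four groups of 1-3 digits separated by literal dots.
# The lookarounds reject matches embedded in longer digit/dot sequences.
IP_IN_MSGID = re.compile(r"(?<![\d.])(\d{1,3}(?:\.\d{1,3}){3})(?![\d.])")

def msgid_contains_ip(message_id):
    """True if the Message-ID embeds something shaped like an IPv4 address."""
    return IP_IN_MSGID.search(message_id) is not None

def msgid_matches_relay(message_id, relay_ip):
    """True if the Message-ID embeds exactly the relay's IP address."""
    return relay_ip in IP_IN_MSGID.findall(message_id)

# Hypothetical Message-ID values, made up for this example.
print(msgid_contains_ip("<20090613.4444@[78.97.185.89]>"))
print(msgid_matches_relay("<20090613.4444@[78.97.185.89]>", "78.97.185.89"))
print(msgid_contains_ip("<abc123@mail.example.com>"))
```

Once a pattern behaves as intended here, the same regex idea would still need testing as an SA header rule against real ham before giving it any score.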

> Also give a bit more score to the RDNS rules

Yeah, great idea. It's currently only 0.1.

I also see BOTNET_NORDNS in Botnet.cf, but it isn't being triggered. It's
also weighted at 0.0. Is there a reason for this?

> You also might want to block that line that says "if picture is blocked".

There's a couple of variations, but this also looks like it would work well.

Thanks,
Alex


Debugging and scripting

2009-06-18 Thread MySQL Student
Hi. I'm relatively new to spamassassin and perl scripting, and I must
already be doing a few things wrong that I hoped the list could help me to
solve. I'm receiving the following output when running
"spamassassin -D < spam-test.txt 2>&1 | less":

[32692] warn: Number found where operator expected at (eval 607) line 1,
near "0  0"
[32692] warn:  (Missing operator before 0?)

Where is this coming from? Perhaps local.cf, but where? It's not line 607.

I'm also having a problem with one of my rules:

[32692] info: config: invalid expression for rule LOCAL_XPS: "Subject =~
/Free\ DELL\ XPS/i": syntax error

Here is the full rule:

meta       LOCAL_XPS  Subject =~ /Free\ DELL\ XPS/i
score      LOCAL_XPS  1.5
describe   LOCAL_XPS  Rule by AS: XPS Dell

Do I need the backslashes to escape the spaces? Will that match that pattern
anywhere on the line, or only that text on the line?

Can you explain to me the meaning of '(.+)' as in:

header   LOCAL_RULE1  Subject =~ /(.+)Spam\ Sample(.+)/i
score    LOCAL_RULE1  5.0
describe LOCAL_RULE1  Subject Spam Sample

How about without the parens?

I believe this is somehow causing emails to hit the "MISSING_SUBJECT" rule,
even though the email clearly has a subject.

Any help greatly appreciated.
Thanks,
Alex


Re: Debugging and scripting

2009-06-18 Thread MySQL Student
Hi Dan,

> Do I need the backslashes to escape the spaces?
>
> no, although \s would be fine.
>

Okay, so either \s or nothing at all works just the same?


> this can be much more effectively written as:
> /.spam\ssample./i


> That will match the words "spam sample" in the subject as long as there
> is at least 1 character before and one after.


But you had previously written that /Spam Sample/ will also match that text
anywhere on the line. Is that not the case?

Thanks again,
Alex


Re: Debugging and scripting

2009-06-19 Thread MySQL Student
Hi Matus (and list :-)


> I'm not Dan. This is a mailing list. Many people read it and many can
> respond to your mail.


Yes, thanks, I had responded to him directly and probably didn't need to,
but the reply-to must not be set to the list address?

> /spam sample/ will match the text anywhere on the line.
> /.spam sample./ will match the text anywhere on the line, except at the
> beginning and the end, since it must be preceded and followed by at least
> one character.
>
> /(.+)spam sample(.+)/ will match exactly the same, but the match will be
> slower since the (.+) will need to compare all text before/after the "spam
> sample" and store them both to capture buffers.


Okay, that's great. Thanks so much for your help.
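
The three patterns discussed above can be compared directly. This is a small Python sketch rather than SA rule syntax; for these simple patterns Python's `re.search` behaves like an unanchored Perl match. The sample subjects are made up.

```python
import re

# The three variants from the thread, all case-insensitive.
patterns = {
    "bare": re.compile(r"spam sample", re.I),
    "dots": re.compile(r".spam sample.", re.I),
    "captures": re.compile(r"(.+)spam sample(.+)", re.I),
}

def matching_patterns(subject):
    """Return the names of the patterns that match somewhere in the subject."""
    return [name for name, rx in patterns.items() if rx.search(subject)]

# Hypothetical subjects: the phrase alone, and the phrase with text around it.
for subject in ["spam sample", "a spam sample here", "Re: Spam Sample!"]:
    print(repr(subject), matching_patterns(subject))
```

Only the bare pattern matches a subject that consists of nothing but the phrase; the other two need at least one character on each side.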

Best regards,
Alex


BAYES_99 score & lint

2009-06-22 Thread MySQL Student
Hi all,

When I run "spamassassin -D --lint", I receive this output:

[14406] info: rules: meta test LOCAL_BAYES_RTF has dependency 'BAYES_99'
with a zero score

Which is it saying has a zero score?

BAYES_99 in 50_scores.cf is shown as:

score BAYES_99 0 0 3.5 3.5

The LOCAL_BAYES_RTF is a meta rule that combines BAYES_99 with a mimeheader
rule with 0.1 score that catches RTF files.
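
For reference, the Mail::SpamAssassin::Conf documentation describes a four-value score line as one score per score set: set 0 (Bayes and network tests both off), set 1 (network tests only), set 2 (Bayes only), set 3 (both on). The Python sketch below only illustrates that selection; the function name is hypothetical and this is not SpamAssassin's own code.

```python
def active_score(score_line, use_bayes, use_net):
    """Pick the score that applies for the given Bayes/network settings."""
    parts = score_line.split()
    if len(parts) < 3 or parts[0] != "score":
        raise ValueError("expected: score RULE_NAME value [value value value]")
    values = [float(v) for v in parts[2:]]
    if len(values) == 1:
        return values[0]          # a single score applies to every score set
    if len(values) != 4:
        raise ValueError("expected one or four score values")
    # Score set index: bit 1 = Bayes enabled, bit 0 = network tests enabled.
    return values[(2 if use_bayes else 0) + (1 if use_net else 0)]

line = "score BAYES_99 0 0 3.5 3.5"
print(active_score(line, use_bayes=False, use_net=True))
print(active_score(line, use_bayes=True, use_net=True))
```

If lint happens to evaluate a score set in which Bayes is unavailable, BAYES_99 would resolve to 0, which would be consistent with the warning. That is an inference, not something the lint output confirms.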

Ideas greatly appreciated.

Thanks,
Alex


Re: BAYES_99 score & lint

2009-06-22 Thread MySQL Student
>
>
> Post your entire scoring block for LOCAL_BAYES_RTF


meta       LOCAL_BAYES_RTF  (BAYES_99 && LOCAL_CTYP_RTF)
score      LOCAL_BAYES_RTF  1.5
describe   LOCAL_BAYES_RTF  Rule by AS: Probably an Inline RTF spam

mimeheader LOCAL_CTYP_RTF   Content-Type =~ /^application\/octet-stream.\.rtf/i
score      LOCAL_CTYP_RTF   0.1
describe   LOCAL_CTYP_RTF   Rule by AS: Content-Type: RTF

I also looked a bit further, and don't see where else BAYES_99 might be
redefined, and I'm sure that it's scoring above zero.

Is there a way to print out all the rules with their scores?

Thanks,
Alex


SA & amavisd & scanning attachments

2009-07-02 Thread MySQL Student
Hi,

I'm not sure this is an SA question specifically, but perhaps an amavisd-new
question that I hoped someone could help me to answer.

I'm using amavisd-new, postfix, and spamassassin for multiple domains. I'd
like to know if it's possible to permit per-domain forwarding of certain
attachment types while stripping others?

It appears $banned_filename_re is sitewide, but I thought there might be
another way to permit attachments on a per-domain basis?

Thanks,
Alex


Re: perms problems galore

2009-07-04 Thread MySQL Student
Hi,

I guess I have more of a general sa-update question. I have sa-update
running against updates.spamassassin.org and these others:

70_sare_stocks.cf.sare.sa-update.dostech.net
70_sc_top200.cf.sare.sa-update.dostech.net
70_sare_adult.cf.sare.sa-update.dostech.net
90_2tld.cf.sare.sa-update.dostech.net

They never seem to update, however. Am I doing something wrong? Are there
others I should consider?

Thanks,
Alex

On Fri, Jul 3, 2009 at 11:05 PM, Gene Heskett wrote:

> Greetings all;
>
> I _thought_ I had sa-update running ok, but it seemed that the effectiveness
> was stagnant, so I found the cron entry that was running sa-update &
> discovered a syntax error there, which when I fixed it, disclosed that I had
> all sorts of perms problems that I don't seem to be able to fix readily.
>
> sa-update is being run as the user saupdate, which is a member of the group
> mail.  I have made the whole /var/lib/spamassassin/keys tree an saupdate:mail,
> with very limited rights as in:
> drw--- 2 saupdate mail 4096 2008-12-19 16:05 keys
>
> But sa-update appears not to have perms to access or create gpg keys there.
> --
>
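
The permission string in the quoted listing is truncated, so the exact mode is unknown. If the directory really lacks the owner's execute bit, that alone could explain the symptom: on POSIX systems a directory needs the search (x) bit before files inside it can be opened or created. A minimal Python sketch of that check, using a throwaway temporary directory as a stand-in for the keys directory:

```python
import os
import stat
import tempfile

def owner_can_use_directory(path):
    """True if the owner has read, write, and search (execute) bits on path."""
    mode = stat.S_IMODE(os.stat(path).st_mode)
    needed = stat.S_IRUSR | stat.S_IWUSR | stat.S_IXUSR
    return (mode & needed) == needed

# Stand-in directory; the real path would be the sa-update keys directory.
with tempfile.TemporaryDirectory() as base:
    keys = os.path.join(base, "keys")
    os.mkdir(keys)
    os.chmod(keys, 0o600)                    # rw------- : no search bit
    print(owner_can_use_directory(keys))
    os.chmod(keys, 0o700)                    # rwx------ : owner can traverse
    print(owner_can_use_directory(keys))
```

This only inspects the owner bits; it does not check who owns the directory or which user sa-update actually runs as.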


Spam troubleshooting

2009-07-04 Thread MySQL Student
Hi all,

I am stuck trying to figure out why the attached spam isn't caught properly.
In fact, BAYES_99 isn't flagged
and I know it should be, and the total score is 0.0, despite several rules
being flagged. The LOCAL_BODY_1577053434 and LOCAL_BODY_4046600451 both
catch the phone numbers and have a 2.01 value.

The X-MailCleaner headers were there when I received the email. I've
obfuscated our customer's domain for security.

Any ideas greatly appreciated. Where can I start? Am I doing something wrong
or is there something in the header that is reducing the score?

Thanks so much.
Best regards,
Alex


phone-spam-out.txt.gz
Description: GNU Zip compressed data


Re: Spam troubleshooting

2009-07-05 Thread MySQL Student
Hi,

> spamassassin 2>&1 -D --lint
>
> search here for missing perl modules


How effective are razor/pyzor and SPF/DKIM? I've always been a bit hesitant
to use any of those.

> and the spam mail have all_trusted ?, you trust a spammer in
> trusted_networks


trusted_networks isn't at all defined. It looks like it was previously
defined with just 127.0.0.1, but it's now commented out. What should it be?
You are referring to the spamassassin trusted_networks, not postfix, right?

Thanks,
Alex


Re: Spam troubleshooting

2009-07-05 Thread MySQL Student
Hi again,

> and the spam mail have all_trusted ?, you trust a spammer in
> trusted_networks


I meant to add, how can I determine which IP it was that is being trusted,
anyway?

Thanks again,
Alex


Spam gathering contact details

2009-07-05 Thread MySQL Student
Hi,

I'm receiving a lot of spam that I can't catch containing fields where the
recipient is supposed to enter their contact details, like this:

Full Legal Name :
Address :
City :
State :
Zip code :
Country :
Nationality :
Home and Cell # :

I've added specific rules that look for, say /Full Legal Name :/, but it
otherwise only hits BAYES_99. Does anyone have any suggestions for catching
these more effectively?
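
One idea, rather than matching a single label, is to count how many empty "Label :" lines a message contains and only react above some threshold. The Python sketch below illustrates the counting only; the regex, the function name, and any threshold are my own assumptions, not an existing SpamAssassin rule.

```python
import re

# A line holding only a short label, a colon, and nothing after it.
EMPTY_FIELD = re.compile(r"^[ \t]*[A-Za-z#&/ ]{2,40}[ \t]*:[ \t]*$", re.M)

def count_empty_fields(body):
    """Count lines that look like blank form fields waiting to be filled in."""
    return len(EMPTY_FIELD.findall(body))

# The field list quoted in the message above.
sample = """Full Legal Name :
Address :
City :
State :
Zip code :
Country :
Nationality :
Home and Cell # :
"""

print(count_empty_fields(sample))
print(count_empty_fields("Subject: hello\nJust a normal sentence."))
```

A count-based test is less brittle than one exact phrase, though legitimate forms (job applications, registrations) would need checking for false positives.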

Thanks,
Alex


Re: Spam gathering contact details

2009-07-05 Thread MySQL Student
Hi,

> ...actually, the rules sandbox in svn has been rearranged a bit since that
> announcement. The current ruleset lives here:
>
>
> http://svn.apache.org/viewvc/spamassassin/trunk/rulesrc/sandbox/jhardin/20_fillform.cf
>
> The updated ReplaceTags.pm is available at:
>
>
> http://svn.apache.org/viewvc/spamassassin/trunk/lib/Mail/SpamAssassin/Plugin/ReplaceTags.pm


Okay, I've updated both, and it's already catching some. Thanks so much,
John.

However, it still doesn't catch the original one because the score is too low,
although it's close. It only hits FILL_THIS_FORM_SHORT/LONG and the rule
I created for /Full Legal Name/. I've added it to bayes, but for some reason
it isn't being tagged.

How did you determine the scores for FILL_THIS_FORM? How safe would it be to
raise each by 0.5?

Thanks,
Alex


Re: Spam troubleshooting

2009-07-05 Thread MySQL Student
Hi,

> ALL_TRUSTED is a bit odd. If you look back through the debug, it
> has identified untrusted relays:
>
> [11689] dbg: metadata: X-Spam-Relays-Untrusted: [ ip=194.230.33.137
> rdns=mx.xm-rz.net helo=mail.xm-rz.net by=myhost.mydomain.com ident=
> envfrom= intl=0 id=B94C2118004 auth= msa=0 ] [ ip=62.2.104.4 rdns=
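
To see which relays SA considered untrusted, the debug line quoted above can be split into its key=value fields. This Python sketch is only a reading aid built on the format visible in that line; it assumes values contain no spaces and ignores any block that is cut off before its closing bracket.

```python
import re

def parse_relays(debug_line):
    """Return one dict of key/value fields per complete [ ... ] relay block."""
    _, _, payload = debug_line.partition("X-Spam-Relays-Untrusted:")
    relays = []
    for block in re.findall(r"\[([^\[\]]*)\]", payload):
        fields = {}
        for token in block.split():
            key, sep, value = token.partition("=")
            if sep:                      # keep empty values such as "ident="
                fields[key] = value
        if fields:
            relays.append(fields)
    return relays

# The (truncated) debug line from the quoted message.
line = ("[11689] dbg: metadata: X-Spam-Relays-Untrusted: [ ip=194.230.33.137 "
        "rdns=mx.xm-rz.net helo=mail.xm-rz.net by=myhost.mydomain.com ident= "
        "envfrom= intl=0 id=B94C2118004 auth= msa=0 ] [ ip=62.2.104.4 rdns=")

for relay in parse_relays(line):
    print(relay["ip"], relay["rdns"], relay["helo"])
```
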


Yes, after noticing xm-rz and t-p.com in 'Received:' headers on several of
these, I've since added a header rule to add points for those relays. Is
this the proper way to do it?

header LOCAL_RECVD_TP   Received =~ /.\.t-p\.com/
score  LOCAL_RECVD_TP   3.6
describe   LOCAL_RECVD_TP   Recvd from botnet

Thanks,
Alex


Re: Spam troubleshooting

2009-07-05 Thread MySQL Student
Hi again,

I have more information on those untrusted hosts.

>> ALL_TRUSTED is a bit odd. If you look back through the debug, it
>> has identified untrusted relays:
>>
>> [11689] dbg: metadata: X-Spam-Relays-Untrusted: [ ip=194.230.33.137
>> rdns=mx.xm-rz.net helo=mail.xm-rz.net by=myhost.mydomain.com ident=
>> envfrom= intl=0 id=B94C2118004 auth= msa=0 ] [ ip=62.2.104.4 rdns=
>
>
Now, for some reason, when I run this spam through SA, I see this:

X-Spam-Report:
* -4.0 RCVD_IN_DNSWL_MED RBL: Sender listed at http://www.dnswl.org/,
*  medium trust
*  [194.230.33.137 listed in list.dnswl.org]
*  0.0 STOX_REPLY_TYPE STOX_REPLY_TYPE
*  3.6 LOCAL_RECVD_TP Recvd from botnet
*  3.6 LOCAL_RECVD_XM Recvd from botnet
*  2.0 LOCAL_BODY_4046600451 BODY: This message contained the string
*  "1.845.709.8044"
*  2.0 LOCAL_BODY_1577053434 BODY: This message contained the string
*  "845.709.8044"
X-Spam-Status: Yes, score=7.2 required=5.0 tests=LOCAL_BODY_1577053434,
 LOCAL_BODY_4046600451,LOCAL_RECVD_TP,LOCAL_RECVD_XM,RCVD_IN_DNSWL_MED,
 STOX_REPLY_TYPE shortcircuit=no autolearn=disabled version=3.2.5

What the hell is RCVD_IN_DNSWL_MED and why is it trusted in dnswl.org?
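
To make the arithmetic in that report explicit, here is a small Python sketch that pulls the per-rule scores out of the report text and adds them up. The parsing regex is my own and only covers the line shapes seen in this report.

```python
import re

REPORT = """\
* -4.0 RCVD_IN_DNSWL_MED RBL: Sender listed at http://www.dnswl.org/,
*  medium trust
*  [194.230.33.137 listed in list.dnswl.org]
*  0.0 STOX_REPLY_TYPE STOX_REPLY_TYPE
*  3.6 LOCAL_RECVD_TP Recvd from botnet
*  3.6 LOCAL_RECVD_XM Recvd from botnet
*  2.0 LOCAL_BODY_4046600451 BODY: This message contained the string
*  "1.845.709.8044"
*  2.0 LOCAL_BODY_1577053434 BODY: This message contained the string
*  "845.709.8044"
"""

# A score line starts with "*", a signed decimal score, then the rule name.
SCORE_LINE = re.compile(r"^\*\s+(-?\d+\.\d+)\s+([A-Z0-9_]+)\b", re.M)

def rule_scores(report):
    """Return (rule, score) pairs, skipping continuation lines."""
    return [(name, float(score)) for score, name in SCORE_LINE.findall(report)]

scores = rule_scores(REPORT)
total = round(sum(score for _, score in scores), 1)
without_dnswl = round(sum(s for name, s in scores if name != "RCVD_IN_DNSWL_MED"), 1)
print(total, without_dnswl)
```

The whitelist hit subtracts 4.0 points, so the same message would score 11.2 instead of 7.2 if the relay were not listed.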

Thanks,
Alex


Re: Spam troubleshooting

2009-07-06 Thread MySQL Student
Hi,

> have any of you tried going to dnswl.org homepage ?, even tried to lookup
> the ip ?, got refused submit of new ticket ?


Yes, I went to the site, but didn't try to resolve either of them because I
knew they were already on the list. They now appear to no longer be on the
list. Now I know to submit a ticket.

Thanks,
Alex


Eliminating unnecessary rules

2009-07-22 Thread MySQL Student
Hi,

I have created a routine where I can enter a string into a text file
and it gets converted into a set of rules that form a cf file. They
are all of the form LOCAL_RULE_N, where N is a random 6-digit number.
Two points are added if the rule is triggered. There are now about
3800 of these rules, dating back chronologically about a year or so.

I've learned a lot over the past year, and I now think some of these
patterns may be catching valid mail, so I'd like to figure out how
best to prune at least the ones that are no longer triggered or are
triggered but don't cause the email to become spam. IOW, the message
would be spam regardless of whether the rule fired.

What is the best way to do this? An awk script on mail.log over the
past few weeks? How can I wildcard the script with so many rules, and
when they have random numbers at the end?
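
A rough sketch of the kind of log tally I have in mind, in Python rather than awk. It assumes amavisd log lines that carry `hits=`, `kill=` and a `tests=` list on one line, and that every LOCAL_RULE_N is worth 2.0 points; the sample lines are invented. A rule counts as "decisive" when the message is at or above the kill level but would have fallen below it without that rule's points.

```python
import re
from collections import Counter

LOCAL_RULE = re.compile(r"\bLOCAL_RULE_\d+\b")
HITS = re.compile(r"\bhits=(-?\d+(?:\.\d+)?)")
KILL = re.compile(r"\bkill=(-?\d+(?:\.\d+)?)")
RULE_POINTS = 2.0   # assumption: each generated rule adds two points

def tally(log_lines):
    """Count how often each local rule fired and how often it was decisive."""
    fired, decisive = Counter(), Counter()
    for line in log_lines:
        hits, kill = HITS.search(line), KILL.search(line)
        if not (hits and kill):
            continue
        total, threshold = float(hits.group(1)), float(kill.group(1))
        for rule in set(LOCAL_RULE.findall(line)):
            fired[rule] += 1
            if total >= threshold and total - RULE_POINTS < threshold:
                decisive[rule] += 1
    return fired, decisive

# Hypothetical log lines for illustration only.
sample_log = [
    "amavis[1]: (1-01) SPAM, Yes, hits=40.6 tag2=5.0 kill=5.0 tests=BAYES_99, LOCAL_RULE_123456, RDNS_NONE",
    "amavis[1]: (1-02) SPAM, Yes, hits=6.1 tag2=5.0 kill=5.0 tests=HTML_MESSAGE, LOCAL_RULE_123456",
    "amavis[1]: (1-03) Passed, No, hits=2.0 tag2=5.0 kill=5.0 tests=LOCAL_RULE_654321",
]

fired, decisive = tally(sample_log)
print(dict(fired), dict(decisive))
```

Rules that never appear in `fired` over a few weeks are candidates for removal; rules that fire but are never decisive are candidates for review. When several local rules hit the same message, each is judged on its own here.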

I'm still surprised how many are hitting for things like "Acai Berry"
or "PO Box 1845 | Ft. Worth | TX", for example.

Thanks for any ideas.
Alex


Re: Spam troubleshooting

2009-07-22 Thread MySQL Student
>> How effective are razor/pyzor and SPF/DKIM?
>
> very effective, razor/pyzor altogether with DCC.
>
> SPF also helps much, although it should be implemented at SMTP level and
> refuse all messages that cause (hard) fail.
>
> While DKIM is currently in SA, the only place it currently applies is
> whitelisting, since it has scores of +/-0.001. Different scores were
> mentioned here, but not incorporated into SA scores yet.
>
>> I've always been a bit hesitant
>> to use any of those.
>
> Why?

Because how often do spammers have DNS entries with valid SPF or DKIM
information? How often do spammers use compromised hosts with valid
SPF or DKIM information?

Will they help with emails that only contain a random URL and a line
or two of text, like:

: Get your Nursing Degree here
http://spamsite.com/

Or would that be DCC? Often times these types of emails get through,
apparently before the URL is listed in spamcop, SURBL, or URIBL_BLACK?

Can I also ask where the best place to start with to implement razor
and/or pyzor in SA3.2 on Linux with postfix?

Thanks,
Alex


Re: boosting PBL score suggestions

2009-07-22 Thread MySQL Student
> when that was set a couple of years back, PBL had a few FPs -- the FP
> rate has dropped greatly since then, going by recent ruleqa results.
> go ahead and bump it up.

I just checked many of my FPs that have RCVD_IN_PBL, and increasing
the score there would sure help me too! Thanks for spotting that,
Aaron.

Best,
Alex


whitelist_from questions

2009-07-22 Thread MySQL Student
Hi all,

Some time ago someone had mentioned to never use whitelist_from but
instead use whitelist_from_rcvd. Where is whitelist_from_rcvd
documented? It doesn't appear in the SA docs in the same place that
whitelist_from is listed.

So, forever I have been using whitelist_from and have probably a
thousand entries. Given that it doesn't appear to be well documented,
Is it okay to do a one-to-one translation of my whitelist_from rules
to whitelist_from_rcvd?

Do these entries have to be in local.cf, or can I create a
whitelist_from.cf file to place them in?

Thanks,
Alex


Re: whitelist_from questions

2009-07-22 Thread MySQL Student
> It is documented on the Mail::SpamAssassin::Conf man page just like
> whitelist_from.

Ugh, thanks.

> whitelist_from_rcvd a...@lists.sourceforge.net sourceforge.net
> Use this to supplement the whitelist_from addresses with a check against the
> Received headers. The first parameter is the
> address to whitelist, and the second is a string to match the relay’s rDNS.

Okay, so for example if I was going to whitelist j...@orbitz.com, the
appropriate line would be:

whitelist_from_rcvd j...@orbitz.com psmtp.com

psmtp.com is the domain that controls mail for orbitz, according to
the MX records.

Thanks,
Alex


Re: sa-stats.pl and SpamAssassin 3.2.4

2009-07-22 Thread MySQL Student
Hi,

Are spamd and amavisd-new mutually exclusive?

I'm also trying to use sa-stats.pl, and it is reporting zeros because
I've just learned it relies on spamd, which I'm apparently not using.

Here is the relevant log information from line in my mail.log:

Jul 22 00:01:24 mail02 amavis[30729]: (30729-266) SPAM,
 -> , Yes, hits=40.6
tag1=-300.0 tag2=5.0 kill=5.0 use_bayes=1 tests=BAYES_99, BODY_8BITS,
BOTNET, FORGED_YAHOO_RCVD, FROM_ILLEGAL_CHARS, HEAD_ILLEGAL_CHARS,
HTML_IMAGE_RATIO_02, HTML_MESSAGE, HTML_TAG_BALANCE_BODY,
MIME_HTML_ONLY, MIME_HTML_ONLY_MULTI, MPART_ALT_DIFF, MSGID_RANDY,
RCVD_DOUBLE_IP_LOOSE, RCVD_HELO_IP_MISMATCH, RCVD_IN_XBL,
RCVD_NUMERIC_HELO, RDNS_NONE, REPTO_QUOTE_YAHOO,
SUBJECT_NEEDS_ENCODING, SUBJ_ILLEGAL_CHARS, TVD_RCVD_IP, TVD_RCVD_IP4,
quarantine spam-d55bdeb21a3775a8f250921df74e14d7-20090722-000123-30729-266
(spam-quarantine)

Jul 22 00:01:24 mail02 amavis[30729]: (30729-266) TIMING [total 785
ms] - SMTP EHLO: 1 (0%), SMTP pre-MAIL: 1 (0%), create email.txt: 0
(0%), SMTP pre-DATA-flush: 1 (0%), SMTP DATA: 80 (10%), body hash: 0
(0%), mime_decode: 6 (1%), get-file-type: 13 (2%), decompose_part: 1
(0%), parts: 0 (0%), AV-scan-1: 4 (0%), AV-scan-2: 6 (1%), SA msg
read: 13 (2%), SA parse: 2 (0%), SA check: 519 (66%), write-header: 25
(3%), save-to-local-mailbox: 8 (1%), delete email.txt: 105
(13%),unlink-1-files: 0 (0%), rundown: 0 (0%)

Can sa-stats.pl be configured to parse this output? Other ideas?

Thanks,
Alex


Re: Lotto/Money & email address spam

2009-07-22 Thread MySQL Student
> Please use pastebin.

Yes, will do, thanks.

>>It hit BAYES_99, but that's it. Are there any rules that pertain to
>>'loan' or this type of mail that can somehow block these?
>
> FreeMail.pm and the SOUGHT_FRAUD rules.

Some time ago you were speaking about the AOL tunome.com freemail
domain, and that Dan was going to create an updated list. Any progress
on that?

I thought FreeMail was part of SA proper, but apparently not. Who
maintains that, and how do I find it?

I found the SOUGHT_FRAUD rules in jm's sandbox. Are those the proper
ones to use? Are the testing ones safe?

Thanks,
Alex


Re: Lotto/Money & email address spam

2009-07-22 Thread MySQL Student
Hi,

>> I found the SOUGHT_FRAUD rules in jm's sandbox. Are those the proper ones
>> to use? Are the testing ones safe?
>
> Subscribe your sa-update to the sought rules channel. The rulesets are
> regenerated too often for manual maintenance to be feasible.

Okay, I have configured sa-update to download the following rulesets:

70_sare_stocks.cf.sare.sa-update.dostech.net
70_sc_top200.cf.sare.sa-update.dostech.net
sought.rules.yerp.org
updates.spamassassin.org

Do people have a script that lints the rules, copies them to
/etc/mail/spamassassin/ and restarts amavisd? SA should automatically
pick up on the new rules, correct?

I'm somewhat concerned about there being some type of error and SA
failing, or a typo in a rule that is now catching all my mail as spam?

Also, the system this is running on has a really old compiler and
glibc that are incompatible with sa-compile. Can the rules be compiled
on another system and migrated to the server where SA is running? An
upgrade is planned for late in the year, but it's just too involved to
do now :-(

Thanks,
Alex


Re: Spam troubleshooting

2009-07-22 Thread MySQL Student
>> Can I also ask where the best place to start with to implement razor
>> and/or pyzor in SA3.2 on Linux with postfix?
>
> EHM? implement it on your mailserver...

Heh, no, I mean where can I go to learn how to implement it? Where's
the docs? :-)

I think I'm headed towards razor first, as it doesn't require python
and appears to be simpler and more effective, even?

Thanks,
Alex


URL Block Lists

2009-07-22 Thread MySQL Student
Hi,

What is the preferred list of URL block lists that everyone uses? I'm
currently using SURBL and a few others, but oftentimes there are URLs
like 'learningbetter.net' that aren't tagged.

We've set up our own internal URL block list that gets trained
manually by inspecting email visually, until the URL is added to URIBL
or SURBL, but I must be missing something, because lately there are
far too many not being tagged.

Thanks,
Alex


Re: Lotto/Money & email address spam

2009-07-22 Thread MySQL Student
>> I thought FreeMail was part of SA proper, but apparently not. Who
>> maintains that, and how do I find it?
>
> You need three files:
> http://sa.hege.li/FreeMail.pm
> http://sa.hege.li/FreeMail.cf
> http://sa.hege.li/freemail_domains.cf
>
> And it's also worthwhile to add the
> 90_sare_freemail.cf.sare.sa-update.dostech.net channel to sa-updates

To update my previous post, I've now also added the 90_sare_freemail channel.

Wouldn't it be more efficient or effective to combine the two lists,
90_sare_freemail and freemail_domains?

Thanks for putting up with my newbie questions.
Best,
Alex


Re: Lotto/Money & email address spam

2009-07-23 Thread MySQL Student
Hi,

>> Please don't paste examples to this list.
>>
>> Please post them to pastebin (or a similar service) and then include the
>> link.
..

Yes, understood. FWIW, I know enough to not post an entire message
with headers to the list -- I'm sure half the time it would be
filtered anyway. This time it was just a snippet, but in the future
I'll post even those online, too.

Thanks,
Alex


Re: Lotto/Money & email address spam

2009-07-23 Thread MySQL Student
Hi,

> sa-update lint checks the rules in a sandbox, and does not update the
> local channel, if there are any issues. Moreover, do NOT copy these
> updates to your site config dir -- but keep it in the update dir where
> sa-update puts them [1]. SA knows how to use them instead of the
> "install-time" default conf.

Okay, great. That is what I have now done. I actually have multiple
mail servers, none of which have direct access to the Internet other
than inbound SMTP, so I have sa-update running on another box, which
creates a tarball, which is then scp'd to the mail servers and
extracted.

For me, this now means the sa-update channels are in
/var/lib/spamassassin/3.0005/ and my local site-config is
/etc/mail/spamassassin, where local.cf and init.pre reside.

I also spent much of the day reading docs. I've worked with Linux now
for many years, and have been involved with SA, just not to the level
that I'm involved now.

> It's a rather bizarre picture I'm sensing here. From your recent posts I
> understand you are running a mail server for a large organization. Yet
> there is this cannonade with rather basic questions...

guenther, I knew you were a smart guy :-)

Yes, there is a bigger picture; hopefully I get some cred for trying
to tackle this on my own (with the help of others more experienced).

Anyway, I'm trying to use sa-update to install the SOUGHT rules, and
linting them shows this:

[17021] warn: config: invalid regexp for rule __SEEK_AY2NNY: /This
place is so exclusive, how did you get an invite\x{e2}\x{80}\x{a6} /:
/This place is so exclusive, how did you get an
invite\x{e2}\x{80}\x{a6} /: Can't use \x{} without 'use utf8'
declaration

I'm using perl-5.6.0; is that the cause?

Thanks again,
Alex


Re: whitelist_from questions

2009-07-23 Thread MySQL Student
Hi,

> Firstly, before you convert all these to whitelist_from_rcvd, perhaps you
> ought to ask yourself whether you really need 1000 entries on your
> whitelist.

I'm surprised you were the first to make that very comment, so thanks.

> Does mail from these addresses actually get miscategorised as
> spam, or would SA get it right without the whitelist?

Mail was being tagged as spam, and the organization became concerned
that others would be tagged, so it seemed anytime there was a
high-profile external business contact that they couldn't risk being
tagged, they had it added to the whitelist.

The list used to be much larger until we spent quite a while (months
and months) going through it with them to prune it.

I don't doubt that if we removed a substantial amount of them that SA
would do what's right, but there doesn't seem to be any scientific way
to do that successfully.

> Secondly, don't forget about whitelist_from_spf. If a domain has an SPF
> record, this is a better solution than whitelist_from_rcvd as it avoids the
> need for *you* to work out which are the outgoing servers.

Is there a way to script that for the 1000 or so entries, to see which
have SPF records?
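
Something along these lines is what I'm picturing, sketched in Python. It only covers the two pieces that need no network access: pulling the unique domains out of whitelist_from lines, and deciding whether a set of TXT strings contains an SPF record (one that begins with "v=spf1"). Fetching the TXT records themselves is left to whatever resolver is at hand, for example `dig +short TXT domain`. The function names and sample addresses are made up.

```python
def whitelist_domains(config_text):
    """Collect the unique, non-wildcard domains from whitelist_from lines."""
    domains = set()
    for line in config_text.splitlines():
        parts = line.split()
        if len(parts) >= 2 and parts[0] == "whitelist_from":
            for address in parts[1:]:
                _, sep, domain = address.rpartition("@")
                if sep and domain and "*" not in domain and "?" not in domain:
                    domains.add(domain.lower())
    return sorted(domains)

def has_spf(txt_records):
    """True if any TXT string is an SPF record (starts with 'v=spf1')."""
    for record in txt_records:
        text = record.strip().strip('"').lower()
        if text == "v=spf1" or text.startswith("v=spf1 "):
            return True
    return False

# Hypothetical config and TXT data for illustration.
config = """whitelist_from joe@example.com *@Example.ORG
# whitelist_from old@example.net
whitelist_from ann@example.com
"""
print(whitelist_domains(config))
print(has_spf(['"v=spf1 include:_spf.example.com ~all"']))
print(has_spf(['"google-site-verification=abc"']))
```
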

> Lastly, if you do use whitelist_from_rcvd, remember that there may be
> multiple outgoing servers for a given domain, and worse they may change over
> time.

Yeah, I thought of that too, so it doesn't sound like that's going to
work well here.

Thanks,
Alex


Re: Low Scoring Lotto Spam

2009-07-27 Thread MySQL Student
Hi,

>        *  3.0 RCVD_IN_UCEPROTECT2 RBL: Received via a relay in
>        *      dnsbl-2.uceprotect.net
>        *      [81.202.69.68 listed in dnsbl-2.uceprotect.net]
>        *  2.0 RCVD_IN_UCEPROTECT3 RBL: Received via a relay in
>        *      dnsbl-3.uceprotect.net
>        *      [81.202.69.68 listed in dnsbl-3.uceprotect.net]

How successful have you been with the UCEPROTECT lists? Seems like a
nice project. How come more people aren't using it?

IOW, you seemed to be the only one of the four or five people that
posted their output from this lotto spam. Why such a disparity in the
rules that people use?

Thanks,
Alex


Re: whitelist_from questions

2009-07-27 Thread MySQL Student
Hi,

I'm looking at an email that appears to be from one of the users on the
whitelist, but instead was from:

   From probesqt...@segunitb1.freeserve.co.uk  Mon Jul 27 19:49:19 2009

Why can't a comparison be made between the "From:" info and the actual
sender? Is this because of virtual domains and/or users?

Thanks,
Alex


Upgrading perl modules for SA

2009-07-30 Thread MySQL Student
Hi,

I recently upgraded perl from 5.6.0 to perl-5.10.0, along with all the
modules necessary for sa-3.2.5 and amavisd-new (an old version still).
I'm now having a problem that I really don't understand:

Jul 30 14:24:30 bigship amavis[1757]: (01757-175) TROUBLE in
check_mail: decoding2-get-file-types FAILED: 'file' utility
(/usr/bin/file) failed, status=1 (256 ) at /usr/sbin/amavisd line
4019.

Jul 30 14:24:30 bigship amavis[1757]: (01757-175) PRESERVING EVIDENCE
in /var/amavis/amavis-20090730T142430-01757

The amavisd children are running as a regular user. When I su to that
user and run "/usr/bin/file" with the files listed above, it
successfully returns the correct type of file. The lines in amavisd
surrounding 4019 are:

$file ne '' or die "Unix utility file(1) not available, but is needed";
for my $part (@$partslist) {
my($filename) = "$tempdir/parts/$part";
my($filetype) = '';
my($proc_fh) = run_command(undef, undef, $file, $filename);
while( defined($_ = $proc_fh->getline) ) { $filetype .= $_ }
my($err); $proc_fh->close or $err=$!; my($ret) = retcode($?);  # <= line 4019
$ret==0 or die "'file' utility ($file) failed, status=$ret ($? $err)";

chomp($filetype); my($taint) = substr($filetype,0,0);
# remove file name
$filetype = $1.$taint  if $filetype=~/^.+?:[\t ](.*)$(?!\n)/s;
section_time('get-file-type');
local($_) = $filetype;  my($ty);

# try to classify some common types and give them short type name
# _last_ match wins!

Running spamassassin --lint returns no errors or warnings. Amavis
complains that I'm missing a few modules, like SPF, DKIM, and
IO::Socket::SSL, but I don't think they're related, and I guess they
weren't on there before when it was working fine.

Thanks,
Alex


Re: Upgrading perl modules for SA

2009-07-30 Thread MySQL Student
Hi,

>> check_mail: decoding2-get-file-types FAILED: 'file' utility
>> (/usr/bin/file) failed, status=1 (256 ) at /usr/sbin/amavisd line

> How's this a SA question?

Yes, my apologies. I don't know enough about amavis yet, and thought
it may be related to all the modules I upgraded, and not amavis
itself. I've since reverted my changes back to perl-5.6.0, and I'm going
to subscribe to that list too.

I also upgraded Berkeley DB to db4 and have left db3, db2, and db1 on
the system too. However, now I'm having a problem with bayes:

[10496] dbg: bayes: tie-ing to DB file R/O /home/sscan/.spamassassin/bayes_toks
[10496] dbg: bayes: tie-ing to DB file R/O /home/sscan/.spamassassin/bayes_seen
[10496] dbg: bayes: found bayes db version 0
[10496] warn: bayes: bayes db version 0 is not able to be used,
aborting! at /usr/lib/perl5/site_perl/5.6.0/Mail/SpamAssassin/BayesStore/DBM.pm
line 196.

I guess I don't understand the logic, because around 196 is the
following, which appears to say that if $self->_check_db_version
doesn't equal zero, then fail, but we know it equals version zero from
what is stated above...

  $self->{db_version} = ($self->get_storage_variables())[6];
  dbg("bayes: found bayes db version ".$self->{db_version});

  # If the DB version is one we don't understand, abort!
  if ($self->_check_db_version() != 0) {
    warn("bayes: bayes db version ".$self->{db_version}." is not able to be used, aborting!");
    $self->untie_db();
    return 0;
  }
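
One possible reading of that check, sketched in Python: if `_check_db_version` is a three-way comparison between the version found in the file and the version the code expects, then "!= 0" tests the comparison result, not the version number itself. This is an assumption about the helper, which is not shown above; the expected version of 3 is likewise assumed, based on the "bayes db version" value that `sa-learn --dump magic` reports for a working database.

```python
EXPECTED_DB_VERSION = 3   # assumption, see the note above

def check_db_version(found, expected=EXPECTED_DB_VERSION):
    """Three-way compare: -1 if found is older, 0 if equal, 1 if newer."""
    return (found > expected) - (found < expected)

def can_use(found):
    """The database is usable only when the comparison result is 0."""
    return check_db_version(found) == 0

for version in (0, 2, 3, 4):
    print(version, check_db_version(version), can_use(version))
```

Under that reading, a file reporting version 0 compares as older than expected and is rejected, which matches the warning.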

Thanks,
Alex


Bayes training

2009-08-03 Thread MySQL Student
Hi,

We have accumulated quite a large list of whitelisted users, primarily
because they were previously tagged incorrectly. I've extracted a copy
of all whitelisted mail into a separate mbox.

Certainly there is some spam in there as well, but assuming I only
learn the ham, would it make sense to train bayes using the emails
from this folder? It's all business-related, but I'm concerned that it
may have things in the email that caused it to be tagged in the first
place, like excessive HTML, sent from a host with no reverse DNS, etc.
-- all the reasons for it being whitelisted in the first place.

Looking at the logs before the addresses were added to the whitelist,
I see quite a few that were BAYES_99, probably because they resemble
mailing lists, such as those from networkworld, for example. IOW, I
wouldn't want to whitelist an email from networkworld.com, but one of
the company's partners could send the company an email that had many
of those characteristics.

Someone may also send them a one-line email with a small GIF as an
attachment, such as their corporate logo in their signature. This
would be a valid email, but also very much resembles the
characteristics of a typical spam.

This is all being done to hopefully train bayes to better recognize
corporate email, and hopefully cut down on the number of whitelisted
senders that must be added in the future (or, corporate email that
gets tagged then must be whitelisted).

Ideas greatly appreciated.
Thanks,
Alex


Upgrading bayes DB

2009-08-04 Thread MySQL Student
Hi,

I'm still working on my bayes training project, but also trying to
upgrade the bayes DB due to upgrading perl and all the associated
modules. I started with this output from "sa-learn --dump magic"

0.000          0          3          0  non-token data: bayes db version
0.000          0       1786          0  non-token data: nspam
0.000          0       3698          0  non-token data: nham
0.000          0     198349          0  non-token data: ntokens
0.000          0  929232460          0  non-token data: oldest atime
0.000          0 1249369370          0  non-token data: newest atime
0.000          0 1249369387          0  non-token data: last journal sync atime
0.000          0 1249342872          0  non-token data: last expiry atime
0.000          0          0          0  non-token data: last expire atime delta
0.000          0          0          0  non-token data: last expire reduction count

After the upgrade (sa-learn --sync -D), it zeroed the nham and nspam.
How could this happen? What could I have done wrong? This is after the
upgrade:

0.000          0          3          0  non-token data: bayes db version
0.000          0          0          0  non-token data: nspam
0.000          0          0          0  non-token data: nham
0.000          0          0          0  non-token data: ntokens
0.000          0 1249438016          0  non-token data: oldest atime
0.000          0 1249438016          0  non-token data: newest atime
0.000          0 1249438016          0  non-token data: last journal sync atime
0.000          0 1249438016          0  non-token data: last expiry atime
0.000          0          0          0  non-token data: last expire atime delta
0.000          0          0          0  non-token data: last expire reduction count

It seemed to indicate that it was upgrading from db version 0 to db
version 2, then db version 3, although the first sa-learn output shows
that it was already version 3.
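
To compare two such dumps without eyeballing them, something like the following might help. It is only a sketch: it assumes the value of interest sits in the third column, as in the output above, and the function names are made up for illustration.

```python
def parse_dump_magic(text):
    """Map each 'non-token data' label to its integer value (third column)."""
    values = {}
    for line in text.splitlines():
        if "non-token data:" not in line:
            continue  # skip wrapped or unrelated lines
        fields, label = line.split("non-token data:", 1)
        parts = fields.split()
        if len(parts) >= 3:
            values[label.strip()] = int(parts[2])
    return values


def changed_fields(before, after):
    """Return {label: (old, new)} for every label whose value differs."""
    return {k: (v, after.get(k)) for k, v in before.items() if v != after.get(k)}


if __name__ == "__main__":
    before = parse_dump_magic(
        "0.000  0  3  0  non-token data: bayes db version\n"
        "0.000  0   1786  0  non-token data: nspam\n"
        "0.000  0   3698  0  non-token data: nham\n"
    )
    after = parse_dump_magic(
        "0.000  0  3  0  non-token data: bayes db version\n"
        "0.000  0  0  0  non-token data: nspam\n"
        "0.000  0  0  0  non-token data: nham\n"
    )
    print(changed_fields(before, after))
```

Saving the dump to a file before any upgrade step and diffing it afterwards would show straight away which step reset the counters.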

Thanks,
Alex


RelayCountry Config

2009-08-06 Thread MySQL Student
Hi,

I'm trying to configure RelayCountry. I have it installed, and SA recognizes it:

# spamassassin --lint -D 2>&1|grep -i country
[4278] dbg: diag: module installed: IP::Country::Fast, version 604.001
[4278] dbg: plugin: loading Mail::SpamAssassin::Plugin::RelayCountry from @INC
[4278] dbg: plugin: Mail::SpamAssassin::Plugin::RelayCountry=HASH(0x8fb9648) implements 'extract_metadata', priority 0
[4278] dbg: plugin: Mail::SpamAssassin::Plugin::RelayCountry=HASH(0x8fb9648) implements 'parsed_metadata', priority 0

I've loaded the plugin and added "add_header" according to the wiki page:

add_header all Relay-Country _RELAYCOUNTRY_
loadplugin Mail::SpamAssassin::Plugin::RelayCountry

I can create rules for each country I'd like to identify, and that
successfully adds it to the header:

header    RELAYCOUNTRY_RU  X-Relay-Countries =~ /RU/
describe  RELAYCOUNTRY_RU  Relayed through Russian Federation
score     RELAYCOUNTRY_RU  2.0

I was hoping to also have the X-Spam-Countries header added, but that
doesn't seem to work. I'm using v3.2.5, so it has the
RelayCountries.pm patch to add that support. What am I missing?

Somewhat of a basic question, but once I do manage to get that header
working, I know I can parse that and make decisions based on it. Are
there any pre-written perl routines or utilities that can make that
information useful?

Also, I believe I read it adds bayes metadata to the email. Is that
just through the additional headers or is it supposed to add something
else?

Thanks,
Alex


Re: RelayCountry Config

2009-08-06 Thread MySQL Student
Hi,

> I don't know if it makes a difference, but I call it Relay-Countries to
> match the name of the pseudo-header used in the tests
>
> add_header all Relay-Countries          _RELAYCOUNTRY_

It doesn't appear to make a difference. I must be doing something else
wrong. Using "spamassassin --lint -D 2>&1 | less" shows the
X-Relay-Countries header, but it's null:

# spamassassin --lint -D 2>&1 | egrep -i 'relay|country|countries'

[23760] dbg: diag: module installed: IP::Country::Fast, version 604.001
[23760] dbg: config: read file /etc/mail/spamassassin/70_relay_country.cf
[23760] dbg: plugin: loading Mail::SpamAssassin::Plugin::RelayCountry from @INC
[23760] dbg: plugin: loading Mail::SpamAssassin::Plugin::RelayEval from @INC
[23760] dbg: Botnet: adding (\b|\d)relay(\b|\d) to botnet_serverwords
[23760] dbg: Botnet: adding (\b|\d)relay(\b|\d) to botnet_serverwords
[23760] dbg: metadata: X-Spam-Relays-Trusted:
[23760] dbg: metadata: X-Spam-Relays-Untrusted:
[23760] dbg: metadata: X-Spam-Relays-Internal:
[23760] dbg: metadata: X-Spam-Relays-External:
[23760] dbg: plugin: Mail::SpamAssassin::Plugin::RelayCountry=HASH(0x8fb9698) implements 'extract_metadata', priority 0
[23760] dbg: metadata: X-Relay-Countries:
[23760] dbg: plugin: Mail::SpamAssassin::Plugin::RelayCountry=HASH(0x8fb9698) implements 'parsed_metadata', priority 0
[23760] dbg: rules: ran eval rule NO_RELAYS ==> got hit (1)
[23760] dbg: Botnet: no trusted relays
[23760] dbg: check:
tests=MISSING_DATE,MISSING_HEADERS,MISSING_SUBJECT,NO_RECEIVED,NO_RELAYS,RELAYCOUNTRY_LOW

I've added your rules in 70_relay_country.cf, and they trigger in the
"tests=", but the header isn't added.

I've added the "add_header" in init.pre, above the loadplugin line as
well as adding it in local.cf when it didn't work in init.pre.

I've also checked email that has actually been tagged by these rules,
and not just from a "-D" run, and it's not there either.

Thanks again,
Alex


Anti-Phishing and Spear-Phishing Version 2

2009-08-06 Thread MySQL Student
Hi,

Has anyone tried the phishing rules generated by Julian Field and
developed by Google? It looks really neat:

http://www.jules.fm/Logbook/files/anti-phishing-v2.html

It's basically a list of 3.5k email addresses found in email thought
to be spam. Looks to be developed by Google, so it's "safe?"

Thanks,
Alex


Re: RelayCountry Config

2009-08-06 Thread MySQL Student
Hi,

>> [23760] dbg: metadata: X-Relay-Countries:
>>
> The --lint test is *NOT* valid for this. --lint is *ONLY* to verify your
> config files are parseable.

Yes, thanks, I should have known that, and I think I did. I mentioned
in the previous post that I tried it with a real message, and even
viewed a number already in quarantine, and the same result.

I found this message on nabble:

http://www.nabble.com/Question-about-RelayCountry-td18309349.html#a18339974

Same problem, back in '08, with no resolution. I even downgraded to the
IP::Country::Fast released in Jan 09, and no difference.

Could this be a problem with one of the modules, or is this most
likely a configuration issue?

What I don't understand is that it knows which country it's relayed
through, because it prints the rules in the "tests=" section:

X-Spam-Status: Yes, hits=21.8 tag1=-300.0 tag2=4.9 kill=4.9
 use_bayes=1 tests=BAYES_50, BODY_ENHANCEMENT, BOTNET,
FH_HELO_EQ_D_D_D_D, RDNS_NONE,  RELAYCOUNTRY_UK, SARE_ADULT2,
SARE_RECV_IP_FROMIP3, URIBL_AB_SURBL, URIBL_BLACK, []

Curiously, why doesn't it print them each in a column with
description, instead of all together?

Thanks,
Alex


Re: RelayCountry Config

2009-08-06 Thread MySQL Student
Hi,

> This is also why the plugin works and you do get the per-country rule
> hits, but don't get the SA Relay-Countries header.

Yes, you are correct. Thanks for the lead and the explanation. Here's
a thread that talks about how to add the header for amavisd:

http://www.mail-archive.com/amavis-u...@lists.sourceforge.net/msg12416.html

I'm not sure it's really necessary after all, though, because the
rules work without it, and it still doesn't print the header in
quarantined mail.

> char *t="\10pse\0r\0dtu...@ghno\x4e\xc8\x79\xf4\xab\x51\x8a\x10\xf4\xf4\xc4";
> main(){ char h,m=h=*t++,*x=t+2*h,c,i,l=*x,s=0; for (i=0;i (c=*++x); c&128 && (s+=h); if (!(h>>=1)||!t[s+h]){ putchar(t[s]);h=m;s=0; }}}

How did you get line noise from your modem to look so much like perl code? :-)

Thanks,
Alex


Re: RelayCountry Config

2009-08-06 Thread MySQL Student
Hi,

> I find ordinary header and meta rules are all I need:
>
> http://pastebin.com/f5e5232d1

Among those rules you have:

meta RELAYCOUNTRY_MED   ! RELAYCOUNTRY_HIGH && (
__RELAYCOUNTRY_AF || __RELAYCOUNTRY_AS || __RELAYCOUNTRY_EU_S ||
__RELAYCOUNTRY_OC_S || __RELAYCOUNTRY_AM_S )

It's probably hard to read, but doesn't this exclude the US?
RELAYCOUNTRY_AM_S are all the Americas except US and CA. If I
understand correctly, this says NOT RELAYCOUNTRY_HIGH and all
countries except US and CA, which means that RELAYCOUNTRY_MED would
trigger on all US and CA relays.
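
One way to check that reading is to evaluate the expression by hand. The sketch below does so in Python; it assumes a relay in the US or CA hits none of the five regional subrules (since __RELAYCOUNTRY_AM_S is described as the Americas minus US and CA), and the function name is made up for illustration.

```python
# Subrule names are taken from the quoted meta rule.
REGIONAL = {
    "__RELAYCOUNTRY_AF",
    "__RELAYCOUNTRY_AS",
    "__RELAYCOUNTRY_EU_S",
    "__RELAYCOUNTRY_OC_S",
    "__RELAYCOUNTRY_AM_S",
}


def relaycountry_med(hits):
    """!RELAYCOUNTRY_HIGH && (AF || AS || EU_S || OC_S || AM_S) over a set of hit rules."""
    return "RELAYCOUNTRY_HIGH" not in hits and bool(REGIONAL & hits)


if __name__ == "__main__":
    print(relaycountry_med(set()))                    # US/CA relay: no subrule hit
    print(relaycountry_med({"__RELAYCOUNTRY_AM_S"}))  # e.g. a South American relay
    print(relaycountry_med({"__RELAYCOUNTRY_AS", "RELAYCOUNTRY_HIGH"}))
```

Under that assumption the meta can only fire when at least one regional subrule hits, so a message relayed solely through US or CA hosts would not trigger RELAYCOUNTRY_MED.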

Thanks,
Alex


Scores, razor, and other questions

2009-08-07 Thread MySQL Student
Hi,

After another day of hacking, I have a handful of general questions
that I hoped you could help me to answer.

- How can I find the score of a particular rule, without having to use
grep? I'm concerned that I might find it at some score, only for it to
be redefined somewhere else that I didn't catch. Something I can do
from the command-line?

- How do I find out what servers razor is using? What is the current
license now that it's hosted on sf, or are the query servers not also
running there? It doesn't list any restrictions on the web site.

- The large majority of the spam that I receive these days is a result
of a URL not being listed in one of the SBLs. I'm using SURBL, URIBL,
and spamcop. For example, I caught guadelumbouis.com several hours
ago, and it's still not listed in any of the SBLs. Am I doing
something wrong or am I missing an SBL? Has anyone else's spam with
URLs increased a lot lately?

Thanks,
Alex


Elusive spam

2009-08-12 Thread MySQL Student
Hi,

I'm having trouble catching a particular type of spam, and hoped
someone had some time to take a look:

http://pastebin.com/d57336542

It doesn't match RAZOR2, or any of the URI lists, and it's only
BAYES_50. I have a pretty well-established BAYES db, so I'm surprised
it's only BAYES_50. What can I do to block spam like this in the
future?

Thanks,
Alex


Re: Elusive spam

2009-08-12 Thread MySQL Student
Hi,

>> Maybe this will sound dumb but wouldn't it be perfectly
>> safe to blacklist "example.com" after all, that isn't a
>> domain you're ever going to get mail from.
>
> I could be wrong, but I'm guessing the example.com is the OP's munging.

Yes, that's correct. My apologies.

Best,
Alex


Re: Elusive spam

2009-08-12 Thread MySQL Student
Hi,

> Are we to make guesses on what else might be munged?
> Is just example.com munged or the 172.0.0.1 also munged?

Just the domain was munged. Thanks for the info. I should have been
able to figure that out.

Thanks,
Alex


Re: Elusive spam

2009-08-12 Thread MySQL Student
Hi,

> it hits spamhaus, and spamcop, what more do you want ?
>
> meta haus_cop (spamhaus && spamcop)
> score haus_cop 5

X-Spam-Status: No, hits=4.8 tagged_above=-300.0 required=5.0 use_bayes=1
 tests=BAYES_50, DATE_IN_PAST_03_06, RCVD_IN_BL_SPAMCOP_NET,
 RCVD_IN_SORBS_WEB, RCVD_IN_XBL, RELAYCOUNTRY_US, URI_HEX

50_scores.cf:score RCVD_IN_BL_SPAMCOP_NET 0 2.188 0 1.960 # n=0 n=2
50_scores.cf:score RCVD_IN_XBL 0 2.896 0 3.033 # n=0 n=2
70_relay_country.cf:score   RELAYCOUNTRY_US 0.1
50_scores.cf:score RCVD_IN_SORBS_WEB 0 1.117 0 0.619 # n=0 n=2
50_scores.cf:score BAYES_50 0 0 0.001 0.001
50_scores.cf:score URI_HEX 1.777 1.316 1.395 0.368
50_scores.cf:score DATE_IN_PAST_03_06 2.299 1.394 1.306 0.044

Something doesn't seem right. Am I adding them wrong? It sure seems to
equal more than 5.0. Is it possible the rules are being scored
differently in another location?
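
For what it's worth, the addition can be checked mechanically. This sketch assumes the four columns in 50_scores.cf are the four scoresets (no bayes/no net, no bayes/net, bayes/no net, bayes+net) and that this scan used the last one, since use_bayes=1 and network rules hit. The helper names are made up for illustration.

```python
# Scores copied from the grep output above; a single value applies to every scoreset.
SCORES = {
    "RCVD_IN_BL_SPAMCOP_NET": (0, 2.188, 0, 1.960),
    "RCVD_IN_XBL": (0, 2.896, 0, 3.033),
    "RELAYCOUNTRY_US": (0.1,),
    "RCVD_IN_SORBS_WEB": (0, 1.117, 0, 0.619),
    "BAYES_50": (0, 0, 0.001, 0.001),
    "URI_HEX": (1.777, 1.316, 1.395, 0.368),
    "DATE_IN_PAST_03_06": (2.299, 1.394, 1.306, 0.044),
}


def score_for(rule, scoreset):
    """Pick the column for the given scoreset (0-3)."""
    values = SCORES[rule]
    return values[0] if len(values) == 1 else values[scoreset]


def total(rules, scoreset):
    """Sum the scores of all rules that hit, rounded to three decimals."""
    return round(sum(score_for(rule, scoreset) for rule in rules), 3)


if __name__ == "__main__":
    print(total(SCORES, 3))
```

With those assumptions the listed scores add up to 6.125, not 4.8, which would be consistent with at least one of these rules being scored differently somewhere else in the configuration.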

The meta rule is a good one. I'll create that now.

Thanks,
Alex


Re: Elusive spam

2009-08-12 Thread MySQL Student
Hi,

> 50_scores.cf:score RCVD_IN_BL_SPAMCOP_NET 0 2.188 0 1.960 # n=0 n=2
> 50_scores.cf:score RCVD_IN_XBL 0 2.896 0 3.033 # n=0 n=2
> 70_relay_country.cf:score           RELAYCOUNTRY_US 0.1
> 50_scores.cf:score RCVD_IN_SORBS_WEB 0 1.117 0 0.619 # n=0 n=2
> 50_scores.cf:score BAYES_50 0 0 0.001 0.001
> 50_scores.cf:score URI_HEX 1.777 1.316 1.395 0.368
> 50_scores.cf:score DATE_IN_PAST_03_06 2.299 1.394 1.306 0.044
>
> Something doesn't seem right. Am I adding them wrong? It sure seems to
> equal more than 5.0. Is it possible the rules are being scored
> differently in another location?

It does look like the XBL scores may have been modified in another
config file by a previous admin, ugh. Thanks, now I know.

Thanks,
Alex


Post trips pastebin spam filter

2009-08-12 Thread MySQL Student
Hi,

I have another spam message that is very elusive, and thought someone
might be able to take a look. I tried to post it to pastebin, and its
spam filter apparently catches it, and prevents me from posting. It's
definitely in the header.

Is there something else I can do to post it, or does someone know how
their spam filter works? I tried even obfuscating the spam URLs, but
it still catches it.

The spam has BAYES_99, and is also DKIM signed and verified, and
passes SPF, and despite having "Congratulations!", "Wal-Mart" and
several URLs in the body, it's not caught.

Thanks,
Alex


Re: Barracuda RBL in first place

2009-08-15 Thread MySQL Student
Hi,

>                            Unknown user 32.00% (32.00%)            87427696
>                              Greylisted 24.88% (16.92%)            46225401
>                               Throttled 11.03% (5.64%)             15399444
>                     Relay access denied 0.01%  (0.00%)                 7034
>                   Bogus DNS (Broadcast) 0.01%  (0.00%)                11692
>              Bogus DNS (RFC 1918 space) 0.07%  (0.03%)                82135
>                         Spoofed Address 0.26%  (0.12%)               319551
>                      Unclassified Event 0.77%  (0.35%)               949388
>                 Temporary Local Problem 0.01%  (0.00%)                 8165
>             Require FQDN sender address 0.04%  (0.02%)                51022
>          Require FQDN for HELO hostname 8.97%  (4.02%)             10988455

[...]

Can I ask how you produced those stats? They look very helpful.

Thanks,
Alex


Re: Barracuda RBL in first place

2009-08-15 Thread MySQL Student
Hi,

>> What log script do you good people use to generate the list above ? Is it
>> a home brew or one we can download so we can compare our own hits ?
>
> http://www.rulesemporium.com/programs/sa-stats.txt

Any chance someone knows where there is a compatible one that parses
amavisd instead of spamd? I've tried, but guess I don't know enough
perl to get it right.

Any chance someone has a bit of time to hack on it on this lazy
Saturday afternoon? :-)

Thanks,
Alex


Counting RAZOR2 hits

2009-08-15 Thread MySQL Student
Hi,

I thought "grep -c RAZOR2_CHECK" through my mail logs would give me a
good approximation of the number of times RAZOR2 was consulted, but
that doesn't seem to be the case. There are some mails that don't have
it listed in the "tests=" section.

I've also tried the razor-* commands, and they don't appear to be able
to help here either. What am I missing?

Does RAZOR2_CHECK mean that it was found in the RAZOR2 db, or that it
merely consulted the db?

Thanks,
Alex


Re: Barracuda RBL in first place

2009-08-16 Thread MySQL Student
Hi,

> So perhaps instead of adding another RBL, maybe some admins need to
> consider adding in some HELO checking / rejection.

Can you explain a bit more here? What are you checking for, that the
host is valid?

Thanks,
Alex


Re: Counting RAZOR2 hits

2009-08-17 Thread MySQL Student
Hi,

> You can also set your min_cf in your razor config files, which will
> affect when the RAZOR2_CHECK rule fires. This does work in SpamAssassin,
> as I have over-ridden the min_cf on my own system, and have done so for
> years.

Thanks to everyone for their great ideas thus far. I'm looking forward
to working through it to learn more.

I'm seeing a lot of FNs that include various RAZOR rules, but still
don't have enough points to be tipped. Are there meta rules that
people have created and can share that might help?

How about combining it with BOTNET? The ones that have BAYES_99 and
most of the SURBLS and RAZOR* are all properly tagged already, but
many only have BAYES_50.

Some have only RAZOR2_CHECK and contain an inline image.

X-Spam-Status: No, hits=4.1 tagged_above=-300.0 required=5.0 use_bayes=1
 tests=BAYES_50, HTML_MESSAGE, RAZOR2_CF_RANGE_51_100,
 RAZOR2_CF_RANGE_E8_51_100, RAZOR2_CHECK, RDNS_NONE, RELAYCOUNTRY_US,
 SPF_HELO_PASS, SPF_PASS

score RAZOR2_CHECK 0 0.9 0 0.9
score RAZOR2_CF_RANGE_51_100 0 0.8 0 0.8
score RAZOR2_CF_RANGE_E4_51_100 0 1.8 0 1.8
score RAZOR2_CF_RANGE_E8_51_100 0 1.5 0 1.5

I see now that RAZOR2_CF_RANGE_E8_51_100 should also be at least 1.8,
which I've now changed.

Does everyone do their own mass-checks these days? How do you go about
analyzing the FNs to figure out why they aren't caught and adjust the
scores? Of course they need to be looked at individually for
additional patterns, but how are the scores best "personalized" of the
rules that are triggered?

Thanks,
Alex


Re: sa-update: stuck at 795855?

2009-08-19 Thread MySQL Student
Hi,

> The problem is that the spammers test with the SA rulesets as soon
> as they are released, which is why the rulesets become ineffective.

I'm not sure I agree with that. If this were the case, I would have a
lot less spam with scores of 50 or more, which obviously aren't even
trying to do something as easy as pass it through SA first.

Also, couldn't we then draw conclusions from this that, since vendors
like Symantec have rules which never are seen by spammers, that their
rules are better?

Incidentally, are there technologies that vendors like Symantec,
Proofpoint, Cisco, Google, etc, use that we don't have or don't have
access to?

Thanks,
Alex


Re: Assistence needed with spamassasin under RedHat 5.2

2009-08-19 Thread MySQL Student
Hi,

> spamassasin.  I have a test message which is genuine.  Running this through
> spamassasin with -t (test) mode as described below gives the output below:
>
> Running : spamassassin -t /tmp/rose2 gives at the bottom the following
> (edited for privacy) report.

Try adding some debugging output, and first look for something obviously wrong:

# spamassassin -D -t /tmp/rose2 2>&1 | less

Go line-by-line looking for something that stands out as obviously wrong.

Consider obfuscating your message, replacing your domain with
"example.com", for instance, and uploading it to pastebin.com. Then
post a link here so we can all view the message for further ideas.

Regards,
Alex


Re: gpgkey failures with sa-update

2009-08-19 Thread MySQL Student
Hi,

> list.  No errors reported then, and I've now forgotten the url. www.yerp.org
> now gets me a webmail login screen, so obviously that wasn't it.  Toss that
> url to me and I'll replay it again.

You should be able to search through your browser history, no?

With Firefox v3.5, you can also just type "yerp" in the location bar,
and it will do a more aggressive search through your previous URLs for
anything containing those letters.

Regards,
Alex


Re: spam mail with flagged style images

2009-08-20 Thread MySQL Student
Hi,

> Text added to e-mail is a bogus one, never repeated, same as the old styled
> spam mail with attached images. The OCR doesn't detect nothing, I understand
> because of flagged effect. Also, image file name changes, if it have.

A few of these have slipped through on my systems, but for the most
part, these rules have worked here:

mimeheader AS_090505_CDIS_INLINE  Content-Disposition =~ /inline/
score  AS_090505_CDIS_INLINE  0.5
describe   AS_090505_CDIS_INLINE  Rule by AS: Content-Disposition: inline

mimeheader AS_090508_CTYP_PNG Content-Type =~ /image\/png/
score  AS_090508_CTYP_PNG 0.5
describe   AS_090508_CTYP_PNG Rule by AS: Content-Type: PNG

mimeheader AS_090508_CTYP_JPG Content-Type =~ /image\/jpg/
score  AS_090508_CTYP_JPG 0.5
describe   AS_090508_CTYP_JPG Rule by AS: Content-Type: JPG

mimeheader AS_090508_CTYP_JPEG Content-Type =~ /image\/jpeg/
score  AS_090508_CTYP_JPEG 0.5
describe   AS_090508_CTYP_JPEG Rule by AS: Content-Type: JPEG

meta   AS_090508_PNGSPAM  (AS_090505_CDIS_INLINE && AS_090508_CTYP_PNG)
score  AS_090508_PNGSPAM  0.5
describe   AS_090508_PNGSPAM  Rule by AS: Probably an Inline PNG spam

meta   AS_090508_JPGSPAM  (AS_090505_CDIS_INLINE && AS_090508_CTYP_JPG)
score  AS_090508_JPGSPAM  0.5
describe   AS_090508_JPGSPAM  Rule by AS: Probably an Inline JPEG spam

meta       AS_090508_JPEGSPAM  (AS_090505_CDIS_INLINE && AS_090508_CTYP_JPEG)
score      AS_090508_JPEGSPAM  0.5
describe   AS_090508_JPEGSPAM  Rule by AS: Probably an Inline JPEG spam

meta       LOCAL_BOTNET_JPG  (BOTNET && AS_090508_JPGSPAM)
score      LOCAL_BOTNET_JPG  1.5
describe   LOCAL_BOTNET_JPG  Rule by AS: Probably an Inline JPEG spam

meta       LOCAL_BOTNET_JPEG  (BOTNET && AS_090508_JPEGSPAM)
score      LOCAL_BOTNET_JPEG  1.5
describe   LOCAL_BOTNET_JPEG  Rule by AS: Probably an Inline JPEG spam

The LOCAL_* are mine, adapted from others I found some time ago. I'd be
interested in people's input on these. Can they be simplified? Do you
agree with the scoring?

How about bayes poisoning? The messages also all have random text,
mostly spelled correctly, but nonsensical. If they are trained, could
it adversely affect my bayes db?

Thanks,
Alex


Junkmailfilter rules

2009-08-20 Thread MySQL Student
Hi,

I've been using the junkmailfilter rules for a few days now, and it's
doing quite well. It occurred to me that I might be able to use the
RCVD_IN_JMF_W rule to filter whitelisted domain mail, and use that to
train bayes ham.

Would this work? There of course would be mail from
constantcontact.com, mailing list mail, "newsletters", etc, that all
contain a lot of HTML and other components that could equally be seen
in spam.

How do people typically train bayes ham? I can't rely on my users not
to mix up spam and ham, surely corrupting the database.

I did find this in one of the emails, passed through delivery.net:

X-Spam-Status: No, hits=4.9 tagged_above=-300.0 required=5.0 use_bayes=1
 tests=BAYES_50, BOTNET, DKIM_SIGNED, DKIM_VERIFIED, HTML_MESSAGE,
 RAZOR2_CF_RANGE_51_100, RAZOR2_CF_RANGE_E4_51_100, RAZOR2_CHECK,
 RCVD_IN_JMF_W, RELAYCOUNTRY_US, SPF_HELO_PASS, SPF_PASS

It was a citibank credit card email. How could it be in RAZOR and also
whitelisted, and BOTNET? Certainly there were no domains in there that
it was relayed through that were part of a botnet.

Ideas greatly appreciated.
Thanks,
Alex


Re: spam mail with flagged style images

2009-08-20 Thread MySQL Student
Hi,

>> mimeheader AS_090508_CTYP_PNG Content-Type =~ /image\/png/
>> mimeheader AS_090508_CTYP_JPG Content-Type =~ /image\/jpg/
>> mimeheader AS_090508_CTYP_JPEG Content-Type =~ /image\/jpeg/
>
> All scored the same. Can be written as a single rule.

I've spent some time and tried to refine my rules based on your
advice, guenther. Can I ask you to check them over again and see if
this is any better, or at least more inclusive?

mimeheader LOC_CDIS_INLINE  Content-Disposition =~ /inline/
score  LOC_CDIS_INLINE  0.1
describe   LOC_CDIS_INLINE  Content-Disposition: inline

mimeheader LOC_CTYP_IMG  ((Content-Type =~ /image\/png/) ||
(Content-Type =~ /image\/jpg/) || (Content-Type =~ /image\/jpeg/) ||
(Content-Type =~ /^application\/octet-stream.\.rtf/))
score  LOC_CTYP_IMG 0.1
describe   LOC_CTYP_IMG  Content-Type: PNG-JPG-JPEG-RTF

meta   LOC_IMGSPAM  (LOC_CDIS_INLINE && LOC_CTYP_IMG)
score  LOC_IMGSPAM  0.1
describe   LOC_IMGSPAM  Probably inline image

meta   LOC_BOTNET_IMG   ((BOTNET && LOC_IMGSPAM) || (BAYES_99 &&
LOC_IMGSPAM))
score  LOC_BOTNET_IMG   1.5
describe   LOC_BOTNET_IMG   Probably inline image spam

> Generally, no.  A spam advertising body part enhancers also has
> correctly spelled words. Training them doesn't "poison" Bayes either.
> And there usually are still useful tokens around.

That's great, thanks!

Thanks,
Alex


Re: spam mail with flagged style images

2009-08-21 Thread MySQL Student
Hi,

> mimeheader LOC_CTYP_IMG  ((Content-Type =~ /image\/png/) ||
> (Content-Type =~ /image\/jpg/) || (Content-Type =~ /image\/jpeg/) ||

I thought this passed through my --lint, but I only caught it the
second time. I was looking around for the (new) right way to do it,
and found this in 80_additional.cf:

mimeheader __ANY_IMAGE_ATTACH   Content-Type =~ /image\/(?:gif|jpeg|png)/

Now I know. Does the rest look like it will work as expected?
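
As a quick sanity check of what that single pattern does and doesn't cover, here is the same regex tried in Python. This only exercises the regular expression itself, not SpamAssassin's mimeheader handling, and the helper name is made up.

```python
import re

# Same alternation as the __ANY_IMAGE_ATTACH rule quoted above.
ANY_IMAGE = re.compile(r"image/(?:gif|jpeg|png)")


def is_image_type(content_type):
    """True if the Content-Type value matches the single combined pattern."""
    return ANY_IMAGE.search(content_type) is not None


if __name__ == "__main__":
    for value in ('image/png; name="a.png"', "image/jpeg", "image/jpg", "text/html"):
        print(value, is_image_type(value))
```

Note that "image/jpg" is not matched by this alternation; if that nonstandard type needs catching as well, something like `jpe?g` in place of `jpeg` would presumably cover both.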

Thanks,
Alex


Re: lottery message scored hammy by bayes

2009-08-25 Thread MySQL Student
Hi,

> If you're using autolearning, what are your learning thresholds?

What do you recommend for thresholds? I'm considering using
autolearning, but very concerned about corrupting the database. I
think I would use something like +15 for spam.

There are FNs on occasion in the 2.x range with low bayes numbers (or
BAYES_50) that I wouldn't want to be tagged as ham. Should that be a
concern?

Even mail that has been whitelisted could also contain spam, so would
a ham threshold of like -100 work, or present the same problem?

Thanks,
Alex


Training spam as ham and forwarding

2009-08-26 Thread MySQL Student
Hi SA users,

I have a few messages found in the quarantine that I need to train as
ham because they were marked as spam incorrectly. To do this, I added
the following to the top of the file so it becomes a normal email:

 From DUMMY-LINE Thu Jan  1 00:00:00 1970

Is this correct? (without the leading spaces)
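
For more than a handful of files, the same step could be scripted. A minimal sketch, assuming the quarantined file is a raw message that only lacks the mbox "From " separator line; the function name and default values are just the dummy line from above, not anything a particular tool requires.

```python
def add_mbox_from_line(raw_message,
                       sender="DUMMY-LINE",
                       date="Thu Jan  1 00:00:00 1970"):
    """Prepend an mbox 'From ' separator unless the message already has one."""
    if raw_message.startswith("From "):
        return raw_message
    return f"From {sender} {date}\n{raw_message}"


if __name__ == "__main__":
    raw = "Return-Path: <sender@example.com>\nSubject: test\n\nbody\n"
    print(add_mbox_from_line(raw))
```

The check on the first line keeps the function safe to run twice on the same file.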

I can now accurately access and index it using pine, whereas before it
didn't acknowledge it as a normal email. I'd also now like to forward
it to the intended recipient as an attachment, but the recipient isn't
able to read it as a normal email, but instead as plain text. How can
I accomplish this?

Are there mail tools, like procmail or formail, I believe, that were
designed to automate this?

Does anyone request ham from their users to be trained by bayes, or
is autolearning typically the only way (or only real effective way) to do this?

Also, on another note, how can I have all email destined for a
particular user sent to them, including spam? This is what all_spam_to
is for, correct?

Thanks,
Alex


Google/Yahoo Spam

2009-08-27 Thread MySQL Student
Hi all,

I'm seeing an increase in Google Reader and yahoo
groups/personals/profile spam. Here's an example of the Google Reader
spam:

http://pastebin.com/m1021fc5f

Any ideas on how to catch this one? For the Yahoo spam (with links to
yahoo sites ending in '/1'), I've created these:

uri        LOC_YAHOO1 m{http://groups\.yahoo\.com\/}i
score      LOC_YAHOO1 0 1.5 0 1.5
describe   LOC_YAHOO1 Contains groups.yahoo.com uri

uri        LOC_YAHOO2 m{http://profile\.yahoo\.com\/}i
score      LOC_YAHOO2 0 1.5 0 1.5
describe   LOC_YAHOO2 Raw body contains profile.yahoo

uri        LOC_YAHOO3 m{http://personals\.yahoo\.com\/}i
score      LOC_YAHOO3 0 1.5 0 1.5
describe   LOC_YAHOO3 Raw body contains personals.yahoo

They're somewhat pared down because I'm not very good at pattern
matching, so thought someone could improve on this?

Thanks,
Alex


Converting spam to email message

2009-08-27 Thread MySQL Student
Hi all,

I thought I understood, but I'm still having trouble converting a
message in the quarantine back into a normal email message that I can
forward on to a recipient. Does anyone know how to do this?

Thanks so much.
Best regards,
Alex


Re: Converting spam to email message

2009-08-27 Thread MySQL Student
Hi,

>> I thought I understood, but I'm still having trouble converting a
>> message in the quarantine back into a normal email message that I can
>> forward on to a recipient. Does anyone know how to do this?
>
> Maybe I missed something, but SpamAssassin doesn't have a quarantine.
>
> http://wiki.apache.org/spamassassin/SpamQuarantine

Yes, my apologies. I guess it would then be amavisd-new that's
managing the quarantine.

I didn't realize that amavisd manipulated the mail in that way.
Hopefully someone can still help.

Thanks,
Alex


Re: Porn-portal spammers

2009-08-29 Thread MySQL Student
Hi,

> I am getting rather tired from messages spamming porn-portals. They typically
> originate from hotmail.com, and advertise a porn-portal based on
> google.com/groups, google.com/reader, groups.yahoo.com, pipes.yahoo.com,
> spaces.live.com, docs.google.com, sites.google.com and livejournal.com.

This was posted by Martin a week or so ago in response to a similar
question by me:

This should catch your set and more:

uri      LOC_YAHOO /^http:.{1,40}\.yahoo[.,]com/i
score    LOC_YAHOO 0 1.5 0 1.5
describe LOC_YAHOO Contains *.yahoo.com uri

Or, if you want to be more specific, try this:

uri      LOC_YAHOO /^http:\/\/(groups|profile|personals)\.yahoo[.,]com/i
score    LOC_YAHOO 0 1.5 0 1.5
describe LOC_YAHOO Contains yahoo.com groups/profile/personals uri

Does this help?

Best regards,
Alex


Re: 3.3.0 alpha 2 on production mail servers / clusers ???

2009-08-29 Thread MySQL Student
Hi,

> On Saturday August 29 2009 19:47:32 R-Elists wrote:
>> have many, or any of you folks on the list migrated your production servers
>> to the 3.3.0 alpha 2 or later release?
>
> We are certainly one of them (actually running CVS head,
> which is pretty close to alpha2). About 1000 users here.

Do we have an idea of a timeline for the next release and/or
production release currently?

How about dependencies? Will perl-5.8 work okay? What modules will
need to be updated? How about for use with amavis? Will I need to
upgrade that?

A list of the top five best new features would also be great! *salivates* :-)

I'm trying to anticipate what I can do ahead of time to get it into
place as soon as possible.

Thanks,
Alex


Shortcircuit info

2009-08-31 Thread MySQL Student
Hi all,

I'm trying to understand how shortcircuit works to ease some of the
load on the servers. First, does anyone have any recommended metas that
they use in their environment that might help?

Can I add shortcircuit to an existing rule, or does the rule have to
be designed to be used with shortcircuit? In other words, I have a
meta that combines spamcop with spamhaus:

meta         META_HAUS_COP  (RCVD_IN_BL_SPAMCOP_NET && RCVD_IN_XBL)
describe     META_HAUS_COP  Contains SPAMHAUS XBL and SPAMCOP
score        META_HAUS_COP  0 4.0 0 4.0
shortcircuit META_HAUS_COP  spam

In order for it to be actually shortcircuited, however, I have to make
the score 100, correct?

Thanks,
Alex


URL rule creation question

2009-09-10 Thread MySQL Student
Hi all,

I've seen this pattern in spam quite a bit lately:

href="http://doubleheaderover.com/jazert/html/?39.6d.3d.31.66.67.6b.79.77.63.77.63.65.6e.74.69.6e.6e.69
.61.6c.5f.68.31.33.33.2e.6f.39.39.41.4d.2e.30.30.45.33.39.2e.30.32.30.61.64.6b.37.61.76.61.67.63.31.66.
62.2e.6a.61.7a.65.72.74.2e.68.74.6d.6c3az8fO"

Would it be reasonable to create a rule that looks for this two-char
then dot pattern, or is it reasonable that it might appear in a
legitimate email too frequently? If possible, how would you create a
rule to capture this?
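
A rough sketch of what such a pattern could look like, tried in Python first. The threshold of ten consecutive "two hex characters plus dot" groups after the question mark is an arbitrary assumption, chosen to stay clear of things like dotted version numbers or IP addresses; the names are made up.

```python
import re

# Hypothetical threshold: ten or more "two hex chars + dot" groups right after '?'.
HEX_DOT_RUN = re.compile(r"\?(?:[0-9a-f]{2}\.){10,}", re.IGNORECASE)


def looks_like_hex_dot_url(url):
    """True if the query string starts with a long run of dot-separated hex pairs."""
    return HEX_DOT_RUN.search(url) is not None


if __name__ == "__main__":
    spam = ("http://doubleheaderover.com/jazert/html/"
            "?39.6d.3d.31.66.67.6b.79.77.63.77.63.65.6e.74")
    print(looks_like_hex_dot_url(spam))
    print(looks_like_hex_dot_url("http://example.com/page?ip=10.20.30.40"))
```

The expression is plain regex and should carry over to a uri rule, though that is untested here, and how often legitimate mail contains such runs would still need checking against a ham corpus.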

Thanks,
Alex


JMF whitelist and RAZOR conflict

2009-09-10 Thread MySQL Student
Hi,

I have several emails that are tagged with RCVD_IN_JMF_W,
SPF_SOFTFAIL, and RAZOR2_CHECK such as this one:

http://pastebin.com/m4a4d990e

Are the criteria for being listed on the JMF_W simply that it contains
a domain that is whitelisted, despite whether it contains another URL
that is blacklisted?

Would I be advised to make the JMF_W score very low, or create a meta
that doesn't really whitelist it unless it isn't also blacklisted?

meta META_NOT_JMF_RAZOR    (RCVD_IN_JMF_W && !RAZOR2_CHECK)

It also appears to spoof the kraftfoods.com mail server, correct? Is
there a possible rule to be created here?

Thanks,
Alex


Re: JMF whitelist and RAZOR conflict

2009-09-10 Thread MySQL Student
Hi,

>> http://pastebin.com/m4a4d990e
>>
>> Is the criteria for being listed on the JMF_W simply that it contains
>> a domain that is whitelisted, despite whether it contains another URL
>> that is blacklisted?
>
> I'm not sure what you are saying here, it's not as if the people
> running the whitelist could lookup the IP address on razor.

I'm saying that it appears odd that it would be listed on both RAZOR
and JMF_W, unless the JMF_W found the kraftfoods.com URL and the RAZOR
rules found the bogus
http://ADSENSETREASUREONLINE.yolasite.com URL. Unless the yolasite.com
is a legitimate kraftfoods site?

>> meta META_NOT_JMF_RAZOR    (RCVD_IN_JMF_W && !RAZOR2_CHECK)
>
> Why RAZOR2_CHECK? Why not other positive scoring rules? The trouble is
> that the whitelist rule is then pointless. Set its score at a value
> that's commensurate with its effectiveness on your email.

Does my question now make sense? I was looking at it from more of a
validation point of view for JMF_W, because of the apparent conflict
with RAZOR.

>> It also appears to spoof the kraftfoods.com mail server, correct? Is
>> there a possible rule to be created here?
>
> No, it was almost certainly sent through kraftfoods.com. It's based on
> an IP address recorded by your trusted network.

Maybe I should have used a better example. Can I ask you to look at this one?

http://pastebin.com/m7d61b26f

This uses IP 66.132.135.108 as its URL (xybersleuth.com), and unless
that is a legitimate site, something is wrong. This email hits JMF_W,
RAZOR2_CF_RANGE_51_100, and URIBL_BLACK in the same message, although
it has a very low bayes score. Which is correct?

Thanks,
Alex


Re: URL rule creation question

2009-09-11 Thread MySQL Student
Hi,

> The 'doubleheaderover' domain currently shows up in Razor(E8),
> uribl_black, surbl_jp, and invaluement.
>
> But it wasn't in all of those when he first started posting about it.

Yes, that's correct. Thanks for your help. That's already caught a
few. I have another that I thought you could help with.

I'd like to create a rule that matches a specific letter and up to 5
spaces after it, repeated ten times. I'm thinking something like this:

/s\ {5}o\ {5}n\ {5}i\ {5}c\ {5}\ m\ {5}e\ {5}d\ {5}i\ {5}a/i

I'm still learning regexes, so hopefully this isn't too far off. The
opportunities for rules are coming faster than my ability to learn.

Thanks,
Alex
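
A note on the quantifier in the attempt above: {5} matches exactly five
spaces, while "up to 5" is usually written {0,5}. The sketch below builds
such a pattern in Python so it can be tried against sample text. The helper
name spaced_word_pattern and the max_gap parameter are hypothetical, and this
is only one way to express the idea, not a tested SpamAssassin rule.

```python
import re


def spaced_word_pattern(word: str, max_gap: int = 5) -> re.Pattern:
    """Build a regex matching `word` with 0..max_gap spaces between letters."""
    gap = " {0,%d}" % max_gap
    # Escape each character so any punctuation in `word` is matched literally;
    # spaces in `word` itself are dropped because the gap already covers them.
    body = gap.join(re.escape(ch) for ch in word if ch != " ")
    return re.compile(body, re.IGNORECASE)


pattern = spaced_word_pattern("sonicmedia")

five = "s" + " " * 5 + "onicmedia"
six = "s" + " " * 6 + "onicmedia"

print(bool(pattern.search("S o n i c M e d i a")))  # one space per gap
print(bool(pattern.search(five)))                   # gap of exactly five
print(bool(pattern.search(six)))                    # gap of six is too wide
```

The generated expression has the form s {0,5}o {0,5}n ... with the /i flag,
which is the shape a hand-written rule would take as well.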


Re: JMF whitelist and RAZOR conflict

2009-09-11 Thread MySQL Student
Hi,

>> I have several emails that are tagged with RCVD_IN_JMF_W,
>> SPF_SOFTFAIL, and RAZOR2_CHECK such as this one:
>> http://pastebin.com/m4a4d990e
>
> why accept SPF_SOFTFAIL ?
>
> cant this be solved ?

I don't understand. I'm still learning how the SPF rules work.
Shouldn't I be adding points for an SPF_FAIL? This indicates a spoof
attempt, no?

> are you receiving forwarded emails from spf domains ?

If I understand correctly, no. I have no relationship with any
external source and their SPF records.

> if so add the forward ip to trusted_networks (so spf will be disabled from
> this hosts)

Do you mean to avoid the processing overhead? IOW, don't bother
checking SPF records for trusted domains?

>> Is the criteria for being listed on the JMF_W simply that it
>> contains a domain that is whitelisted, despite whether it
>> contains another URL that is blacklisted?
>
> this is spamassassin working, if there is a blacklisted domain add it to
> your uribl_skip_domain list

Ah, you mean if the domain is erroneously on the blacklist, right?

>> Would I be advised to make the JMF_W score very low, or create a
>> meta that doesn't really whitelist it unless it isn't also blacklisted?
>
> this is ip and not domains

On a somewhat related note, how does BOTNET differ from RDNS_NONE?
What is the logic behind the BOTNET rule? Is there some known list
that it's checking, or is it just likely to be a dynamic IP or
compromised host if it doesn't have a reverse DNS entry?

Thanks so much for the clarification, and confirmation about Gevalia/Kraft.

Thanks,
Alex


Re: URL rule creation question

2009-09-12 Thread MySQL Student
>>> \s is the proper way to represent whitespace.
>>
>> lol, yes, I know that; I was actually trying to match 's' and the
>> slash is the start of the pattern match.
>
> I wasn't referring to the beginning of the RE.

Yeah, I realized that just after I sent this, if anyone cares :-)

Thanks again,
Alex


URIBL_BLACK vs RCVD_IN_JMF_W

2009-09-18 Thread MySQL Student
Hi,

I have been going through about 15MB of email generated from a
procmail recipe searching for RCVD_IN_JMF_W, and you would not believe
how many also match URIBL_BLACK or URIBL_GREY. Call me naive, but are
there really that many providers that are unaware their clients are
sending spam? (okay, rhetorical question :-)

IOW, I guess this email is more of an informational note to those who
may not be aware, and perhaps for others to comment on whether they
even use it?

The winner for me was a Bank of America scam with the following two relays:

Received: from User (channelf.5460.net [61.137.93.80])
Received: from ortiz.unizar.es (ortiz.unizar.es [155.210.1.52])

No b-of-a relays, of course. This message also hit RAZOR2_CHECK and SPF_FAIL.

There's also a money scam that passed through nasa.gov, hit
RCVD_IN_JMF_W, and a few fraud rules:

Received: from ALTPHYEMBEVSP30.RES.AD.JPL ([128.149.137.84]) by
Received: from mail.jpl.nasa.gov (altvirehtstap02.jpl.nasa.gov [128.149.137.73])
Received: from mail.jpl.nasa.gov (sentrion2.jpl.nasa.gov [128.149.139.106])

X-Spam-Status: No, hits=1.1 tagged_above=-300.0 required=5.0 use_bayes=1
 tests=AE_ADVICE_WITH_MONEY, AE_FRAUD_ADVICE, BAYES_50, LOTS_OF_MONEY,
 MILLION_USD, MONEY_TO_NO_R, RCVD_IN_DNSWL_MED, RCVD_IN_JMF_W, RELAYCOUNTRY_US

I have RCVD_IN_JMF_W set to -0.5 points. It was also listed in
RCVD_IN_DNSWL_MED? Running it a bit later, it scored as spam with the
RAZOR rules:

X-Spam-Report:
*  0.9 RAZOR2_CHECK Listed in Razor2 (http://razor.sf.net/)
* -0.5 RCVD_IN_JMF_W RBL: Sender listed in JMF-WHITE
*  [128.149.139.106 listed in hostkarma.junkemailfilter.com]
* -4.0 RCVD_IN_DNSWL_MED RBL: Sender listed at http://www.dnswl.org/,
*  medium trust
*  [128.149.139.106 listed in list.dnswl.org]
*  0.0 RELAYCOUNTRY_US Relayed through United States
*  1.0 AE_FRAUD_ADVICE BODY: Someone offering free advice
*  1.8 MILLION_USD BODY: Talks about millions of dollars
*  2.1 RAZOR2_CF_RANGE_E4_51_100 Razor2 gives engine 4 confidence level
*  above 50%
*  [cf:  56]
*  0.9 RAZOR2_CF_RANGE_51_100 Razor2 gives confidence level above 50%
*  [cf:  56]
*  0.0 LOTS_OF_MONEY Huge... sums of money
*  2.0 AE_ADVICE_WITH_MONEY Has advice and mentions much money
*  1.0 MONEY_TO_NO_R Lots of money and bare, missing or undisclosed To
*  0.2 MONEY_INHERIT Lots of money from a dead guy
X-Spam-Relay-Country: US US US
X-Spam-Status: Yes, score=5.4 required=5.0 tests=AE_ADVICE_WITH_MONEY,
AE_FRAUD_ADVICE,LOTS_OF_MONEY,MILLION_USD,MONEY_INHERIT,MONEY_TO_NO_R,
RAZOR2_CF_RANGE_51_100,RAZOR2_CF_RANGE_E4_51_100,RAZOR2_CHECK,
RCVD_IN_DNSWL_MED,RCVD_IN_JMF_W,RELAYCOUNTRY_US shortcircuit=no
autolearn=disabled version=3.2.5

Thanks,
Alex
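
To make the effect of the two whitelist entries concrete, the per-rule scores
in the X-Spam-Report above can be added up. This is a small Python sketch of
that arithmetic only; the numbers are copied from the report, and the variable
names are arbitrary.

```python
from decimal import Decimal

# Per-rule scores copied from the X-Spam-Report above.
report = {
    "RAZOR2_CHECK": "0.9",
    "RCVD_IN_JMF_W": "-0.5",
    "RCVD_IN_DNSWL_MED": "-4.0",
    "RELAYCOUNTRY_US": "0.0",
    "AE_FRAUD_ADVICE": "1.0",
    "MILLION_USD": "1.8",
    "RAZOR2_CF_RANGE_E4_51_100": "2.1",
    "RAZOR2_CF_RANGE_51_100": "0.9",
    "LOTS_OF_MONEY": "0.0",
    "AE_ADVICE_WITH_MONEY": "2.0",
    "MONEY_TO_NO_R": "1.0",
    "MONEY_INHERIT": "0.2",
}

# Decimal avoids binary floating-point noise when summing one-decimal scores.
scores = {rule: Decimal(value) for rule, value in report.items()}
total = sum(scores.values())
whitelists = scores["RCVD_IN_JMF_W"] + scores["RCVD_IN_DNSWL_MED"]

print("total:", total)                            # 5.4
print("whitelist contribution:", whitelists)      # -4.5
print("without whitelists:", total - whitelists)  # 9.9
```

The total agrees with the score=5.4 in the X-Spam-Status line. The two
whitelist rules together take 4.5 points off, and almost all of that comes
from RCVD_IN_DNSWL_MED rather than from RCVD_IN_JMF_W at -0.5.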


Re: Problems with high spam

2009-09-18 Thread MySQL Student
Hi,

> also, if using amavisd, putting its temp dir in RAM speeds up scanning, and
> it is considered safe, since the MTA still has the message on disk as a backup :)

How about mounting /var with noatime? Does anyone do that? Do you
think it helps? What Linux filesystem is best suited for this? ext4?

Thanks,
Alex


Re-running SA on an mbox

2009-09-20 Thread MySQL Student
Hi,

I have an mbox with about 100 messages in it from a few days ago.
The mbox is a combination of spam and ham. What is the best way to run
SA through these messages again, so I can catch the ones that have
URLs in them that weren't on the blacklist at the time they were
received?

Must I break them all apart to do this, or can SA somehow parse the
whole mbox? If not, what program do you suggest I use to accomplish
this?

Thanks,
Alex


Re: Re-running SA on an mbox

2009-09-20 Thread MySQL Student
Hi,

> Do you just want to re-scan the whole mbox and see what rules hit now
> for research reasons?

That's a good start, but I'd like to see if I can break out the ham to
train bayes.

> There's no way to (directly) get SA to modify email that's already in an
> mbox file. The mass-check and sa-learn tools can read them, but nothing
> in SA can write to that. However, there might be a utility out there to
> do this (although I'm not aware of any)..

Yeah, that's kind of what I thought. Maybe a program that can split
each message back into an individual file? Would procmail even help
here? Or even a simple shell script that looks for '^From ', redirects
it to a file, runs spamassassin -d on it, then re-runs SA on each
file? I could then concatenate each of them back together and pass it
through sa-learn.

Thanks,
Alex
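
The "split each message back into an individual file" idea can also be
sketched with Python's standard mailbox module, which already knows how to
find the '^From ' separators. This is an illustrative alternative to a shell
script, not something proposed in the thread; the function name split_mbox
and the numbered .eml file names are assumptions.

```python
import mailbox
from pathlib import Path


def split_mbox(mbox_path: str, out_dir: str) -> int:
    """Write each message in an mbox to its own numbered file.

    Returns the number of messages written.
    """
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    # create=False: fail instead of silently creating an empty mailbox.
    box = mailbox.mbox(mbox_path, create=False)
    count = 0
    try:
        for count, message in enumerate(box, start=1):
            # as_bytes() re-serialises the message without the mbox "From "
            # line, so the output is not guaranteed to be byte-identical.
            (out / f"{count:05d}.eml").write_bytes(message.as_bytes())
    finally:
        box.close()
    return count
```

Calling split_mbox("mbox-name.mbox", "msgs") would leave one file per message
in msgs/, which can then be fed to other tools one at a time. Work on a copy
of the mailbox rather than a live spool.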


Re: Re-running SA on an mbox

2009-09-20 Thread MySQL Student
Hi,

> You probably want "spamassassin --mbox". :)
> It won't modify the messages in-place, but you can do something like
> "spamassassin --mbox infile > outfile".

My apologies if it wasn't clear, but these messages have already been
marked by SA. Some are ham, and the rest are FNs that I'd like to
re-run through SA, in hopes of it now properly detecting them as spam.

Thank you all for your help. The "mbox split" suggestion is a good
one. I'll follow that route and post my experience later.

Thanks again,
Alex


Re: Re-running SA on an mbox

2009-09-20 Thread MySQL Student
Hi,

>> You probably want "spamassassin --mbox". :)
>> It won't modify the messages in-place, but you can do something like
>> "spamassassin --mbox infile > outfile".
>
> My apologies if it wasn't clear, but these messages have already been

Wait, my mistake. I read that too fast. Does that work, and rewrite
the X-Spam-Status header?

Guess I could find out for myself, but it just contradicts my
experience and info I've learned previously.

Thanks again,
Alex


Re: Re-running SA on an mbox

2009-09-21 Thread MySQL Student
Hi,

>> Thank you all for your help. The "mbox split" suggestion is a good
>> one. I'll follow that route and post my experience later.
>
> formail -s is the way to go.

I thought about that as a component of procmail. Sounds great.

Thanks,
Alex


Re: Re-running SA on an mbox

2009-09-21 Thread MySQL Student
> but this will invalidate DKIM headers if those headers are signed; is
> spamassassin aware of this problem? (in general)

Are you saying there is a bug?

> mutt -f mbox
>
> in mutt save to another folder if misclassified

Yes, I use pine for that, but would like to eliminate as many of the
FNs as possible, particularly ones that I can't determine visually.

Thanks,
Dave


Re: Re-running SA on an mbox

2009-09-21 Thread MySQL Student
Hi,

> IIRC you previously mentioned using Pine. Just in case you're not aware
> the default format for Pine/Alpine is MBX, an extended version of
> MBOX. You can tell the difference because MBX mailboxes start with a
> dummy email that's hidden by the software.

It seems that if you save messages into a separate folder it does not
add the DUMMY information at the top. I believe this is why the system
was set up to use "mbox" and not "mbx". Does this sound correct?

> I'd be very wary about allowing any tool to modify an MBX file unless
> you know it's safe. Where locking is an issue, Mark Crispin recommends
> that they only be accessed via the c-client library.

This isn't the actual spool file, but a copy in the home directory.

Thanks,
Alex


Re: Re-running SA on an mbox

2009-09-21 Thread MySQL Student
Hi,

It's certainly not a fast operation, but using the following will
split an mbox into individual messages:

export FILENO=0
mkdir msgs
formail -s sh -c 'cat - >msgs/$FILENO' < mbox-name.mbox

I also created a loop that would strip all the SA headers from the messages:

for file in *; do
    echo "Processing: $file"
    spamassassin -d < "$file" > "$file.txt"
done

This worked for a few hundred of the messages, but then started to
fail on my production system with:

[22135] warn: bayes: cannot open bayes databases
/home/user/.spamassassin/bayes_* R/W: lock failed: File exists

How can I tell when another process is using the database and when it
is free for my script to use?

Is there a faster way to run spamassassin just to strip the SA headers?

Maybe there is a faster way, like passing the messages through the
running amavisd instead of having to restart spamassassin each time to
re-process each message?

Thanks,
Alex
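
On the question of a faster way to strip the SA headers: if the only markup
is a set of X-Spam-* headers, like the ones shown elsewhere in this thread,
they can be removed without starting spamassassin (and so without touching
the bayes database) at all. The Python sketch below does only that. It is an
assumption-laden stand-in for "spamassassin -d", not an equivalent: it does
not undo a rewritten Subject or a message wrapped in a report. The function
name strip_spam_headers is hypothetical.

```python
import email
from email.message import Message


def strip_spam_headers(raw: bytes) -> bytes:
    """Remove every X-Spam-* header from a raw RFC 822 message."""
    msg: Message = email.message_from_bytes(raw)
    # Collect the names first so the header list is not modified mid-iteration.
    names = {key for key in msg.keys() if key.lower().startswith("x-spam-")}
    for name in names:
        del msg[name]  # deletes all occurrences of that header name
    return msg.as_bytes()
```

Applied to each file under msgs/, this touches no lock files, which is the
part of the original loop that failed on the production system.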


Re: Re-running SA on an mbox

2009-09-22 Thread MySQL Student
Hi,

> Try using a local SA setup for stripping the headers. By local, I mean
> don't use your main production SA - run a separate copy with its own
> (cut down) configuration and all data base accesses and UBL calls etc
> turned off.

Much better idea, thanks. Thanks for the script, too.

Best,
Alex


New money/fraud spam

2009-09-27 Thread MySQL Student
Hi John,

Another batch of money spam attached. Everything is the same as the last time.

Thanks,
Alex


money-spam-092709.gz
Description: GNU Zip compressed data


Re: New money/fraud spam

2009-09-27 Thread MySQL Student
Okay, my bad, please ignore. Damn google auto-complete.

Alex

On Sun, Sep 27, 2009 at 6:46 PM, MySQL Student  wrote:
> Hi John,
>
> Another batch of money spam attached. Everything is the same as the last time.
>
> Thanks,
> Alex
>


Sought regex problem

2009-09-27 Thread MySQL Student
Hi,

I posted bug 6198 a few weeks ago, and there have been no comments or
fixes on it in two weeks, and I'm unsure what to do next. It's either
not a bug and I'm doing something wrong or it's not significant enough
to bother with the focus on v3.3.

Thought someone might have some ideas here? I'm using perl-5.6. Anyone
else using perl-5.6 with the sought rules?

[13204] dbg: config: read file /var/lib/spamassassin/3.002005/sought_rules_yerp_
org/20_sought.cf
[13204] warn: config: invalid regexp for rule __SEEK_D52BRW: / Don\'t want to
lose your potential of a lover\? Lucky you are, in 21th century all bed-related
male problems can be solved by the powerful remedy, the all-mighty blue caplet\!
This solution will give you the right support for 50\(\!\) hours\. Rock-like and
ready to go\. more\x{bb}/: / Don\'t want to lose your potential of a lover\?
Lucky you are, in 21th century all bed-related male problems can be solved by /:
Can't use \x{} without 'use utf8' declaration

Maybe it's a perl module that's incompatible?

Ideas greatly appreciated.
Thanks,
Alex


Re: Sought regex problem

2009-09-27 Thread MySQL Student
Hi,

>> [13204] dbg: config: read
>> file /var/lib/spamassassin/3.002005/sought_rules_yerp_
>> org/20_sought.cf [13204] warn: config: invalid regexp for rule
>> __SEEK_D52BRW:
>
>  grep doesn't find   __SEEK_D52BRW in my copy of the rules.

This was from the sa-update when I submitted the bug report.

Thanks to all for the feedback and the update to the bugzilla. I'm in
the process of upgrading perl, but there are still a few applications
that depend on it.

Mark suggested in the bugzilla update that I "change SpamAssassin to
add 'use utf8' into code generated from rules when it sees it is being
run with a pre-5.8 version of perl." How do I do this for the time
being?

Thanks,
Alex


Re: Hostkarma Blacklist Climbing the Charts

2009-09-28 Thread MySQL Student
Hi,

> header RCVD_IN_JMF_W eval:check_rbl_sub('JMF-lastexternal', '127.0.0.1')
> describe RCVD_IN_JMF_W Sender listed in JMF-WHITE
> tflags RCVD_IN_JMF_W net nice
> score RCVD_IN_JMF_W -5

Hopefully my comment isn't out of place with the current discussion of
JMF/Hostkarma. I think this is not only a really bad default score,
but it should be reduced to -0.5 or perhaps not used at all.

I have a money/fraud email that hit RCVD_IN_JMF_W that passed through
these servers:

Received: from 41.220.75.3
Received: from webmail.stu.qmul.ac.uk (138.37.100.37) by mercury.stu.qmul.ac.uk
Received: from qmwmail2.stu.qmul.ac.uk ([138.37.100.210]
Received: from mail2.qmul.ac.uk (mail2.qmul.ac.uk [138.37.6.6])

It also hit these other rules:

X-Spam-Status: No, hits=1.3 tagged_above=-300.0 required=5.0 use_bayes=1
 tests=AE_GBP, BAYES_50, LOTS_OF_MONEY, LOTTERY_PH_004470,
LOTTO_RELATED, MONEY_TO_NO_R, RCVD_IN_DNSWL_MED, RCVD_IN_JMF_W,
RELAYCOUNTRY_UK, SPF_FAIL, SPF_HELO_FAIL

Unless I'm really missing something, which of these servers does
JMF/Hostkarma have whitelisted that shouldn't be?

This happens time after time.

Thanks,
Alex


>
> header RCVD_IN_JMF_BL eval:check_rbl_sub('JMF-lastexternal', '127.0.0.2')
> describe RCVD_IN_JMF_BL Sender listed in JMF-BLACK
> tflags RCVD_IN_JMF_BL net
> score RCVD_IN_JMF_BL 3.0
>
> header RCVD_IN_JMF_BR eval:check_rbl_sub('JMF-lastexternal', '127.0.0.4')
> describe RCVD_IN_JMF_BR Sender listed in JMF-BROWN
> tflags RCVD_IN_JMF_BR net
> score RCVD_IN_JMF_BR 1.0
> ===8<---
>
> You pick the names and then the world can use them. The JMF names are out
> there today.
>
> {^_^}    Joanne
>


Re: Hostkarma white list

2009-09-29 Thread MySQL Student
Hi,

> For those of you getting spam from IPs/Hostnames on my hostkarma
> white list, if you could email me a list of false hits (IP or host name) I
> could probably clean out the bad entries in the white list pretty quick.

I'm not sure this is the best approach. I have a procmail recipe that
filters specifically the JMF_W and I go through it every day before
training the folder as ham. I'd say around a quarter of the messages
are spam.

How many entries on the whitelist? How were they added? I'd almost
rather start from scratch (or from a more proven list) with a
percentage known to be valid and build from there.

At the least, wouldn't it be best to move the default score closer to
zero on your wiki page for the time being?

Maybe another method for submitting FPs rather than emailing them to
you could be created?

Wouldn't the veracity of the list be better assured if you built the
list from a pile of known ham?

Mail originating from priorityoneemail.com [69.10.237.52] would be one
prime suspect for removal consideration.

On a somewhat related topic, how do people classify topica.com? That
is one that for sure sends junk, but it looks like people may actually
request it, heh.

Thanks,
Alex


Re: .cn Oddity

2009-10-02 Thread MySQL Student
Hi All,

Regarding the .cn oddity, I added these to my rules, and of about 79k
messages today so far, I have the following:

uri LOC_URI_CN  m;^https?://[^/?]+\.cn\b;
uri T_CN_8_URL  /[\/.]+\w{8}\.cn(?:$|\/|\?)/i

LOC_URI_CN: 2926
T_CN_8_URL: 1634

HTH,
Alex
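
For anyone who wants to see what the two expressions above do and do not
match before adding them, here is a small Python sketch. The regexes are
the same as in the uri rules with only the Perl delimiters dropped; the
sample URLs and the helper name hits are made up for illustration.

```python
import re

# Python translations of the two uri rules above.
LOC_URI_CN = re.compile(r"^https?://[^/?]+\.cn\b")
T_CN_8_URL = re.compile(r"[/.]+\w{8}\.cn(?:$|/|\?)", re.IGNORECASE)


def hits(url: str) -> list:
    """Return the names of the rules whose pattern matches the URL."""
    rules = {"LOC_URI_CN": LOC_URI_CN, "T_CN_8_URL": T_CN_8_URL}
    return [name for name, rx in rules.items() if rx.search(url)]


for url in (
    "http://abcdefgh.cn/",       # eight-character label under .cn
    "http://example.cn/page",    # seven-character label
    "http://example.com/a.cn",   # ".cn" appears only in the path
):
    print(url, hits(url))
```

The first URL hits both rules, the second only LOC_URI_CN, and the third
neither, which fits T_CN_8_URL counting a subset of the LOC_URI_CN hits in
the numbers above.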


Re: OT bad news

2009-10-06 Thread MySQL Student
Hi,

> It's a shame that, living in Denver, I will be *just* out of range of
> hearing the screams as the mailspools fill with viruses, malware, and
> massive payloads of Spanish Prisoner spams.

Aw, c'mon now. Yes, I agree SA is a better solution, but Microsoft
didn't get to be a multi-billion-dollar company solely because of its
marketing. Certainly a competent admin following some SANS guides can
secure an Exchange box to sufficiently avoid it getting hacked, and a
properly-installed version of Symantec will keep most spam away.

It /is/ possible, I suppose :-)

I'd bet that if he kept the FreeBSD box in place and just told his
boss he "upgraded" to Exchange, they'd never even know :-)

Regards,
Alex


Re: Uppercase E-mail in Latin America

2009-10-06 Thread MySQL Student
Hi,

> doesnt it appear to everyone else that this has the (slim to none) makings
> of a new urban legend?

I have to admit that when Warren posted this, I went to snopes to
check, and there was nothing there :-)

Regards,
Alex


Re: SpamAssassin Ruleset Generation

2009-10-06 Thread MySQL Student
Hi,

> Other than the sought rules, all the rules are manually generated? Is there
> any statistics on how frequently are new rules/regex adopted by
> spamassasssin? Who are the people who write them? Any details related to

Information on Justin Mason's SOUGHT rules is here:

http://taint.org/2007/08/15/004348a.html

Use sa-update to update your SA rules once or twice per day with the
new stuff. His ongoing development work is here:

http://svn.apache.org/viewvc/spamassassin/trunk/rulesrc/sandbox/jm/?sortby=date

HTH,
Alex


Re: Subject Rewrite Based on Score

2009-10-08 Thread MySQL Student
Hi,

>  I actually would be doing that but the filter does not know how to
>  handle int(), so I would have to build a filter for all possible number
>  combinations, but if I could just get SA to do the basic math for me and
>  write a header or subject I can filter off of that.

We do something similar here using a procmail/formail script which
calls a perl script to match on X-Spam-Status and then rewrite the
subject with the bayes score prepended. We then use a few procmail
rules to filter the mail based on the bayes score for analysis.

Regards,
Alex
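
The setup described above uses procmail, formail, and a perl script, none of
which are shown. As a rough illustration of the same idea, here is a Python
sketch that reads X-Spam-Status, finds the BAYES_nn rule, and prepends it to
the Subject. The function name tag_subject_with_bayes and the "[BAYES_nn]"
tag format are assumptions, not the actual script.

```python
import email
import re

# BAYES_00 ... BAYES_99; three digits allowed in case a BAYES_999 rule exists.
BAYES_RE = re.compile(r"\bBAYES_(\d{2,3})\b")


def tag_subject_with_bayes(raw: bytes) -> bytes:
    """Prepend "[BAYES_nn]" to the Subject if X-Spam-Status lists a BAYES rule."""
    msg = email.message_from_bytes(raw)
    status = str(msg.get("X-Spam-Status", ""))
    found = BAYES_RE.search(status)
    if found is None:
        return raw  # no bayes rule hit; leave the message untouched
    new_subject = "[BAYES_%s] %s" % (found.group(1), msg.get("Subject", ""))
    if "Subject" in msg:
        msg.replace_header("Subject", new_subject)  # keeps header position
    else:
        msg["Subject"] = new_subject
    return msg.as_bytes()
```

A mail filter can then sort on the subject prefix alone, without needing to
do any arithmetic on the score itself.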


Re: Subject Rewrite Based on Score

2009-10-08 Thread MySQL Student
Hi,

> That sounds overly complicated and like a lot of wasted cycles. Calling
> a Perl script for each message? What you just described sounds a hell of a
> lot like this light-weight SA configuration:

Yes, I should have mentioned that it is a copy of the mail that users
receive and only visible by a single account. It also only occurs once
every four hours as the mail is pulled from the spool.

Regards,
Alex

