Loren Wilton wrote:
If you are only correctly classifying 50% of the spam (you said 100 caught
to 100 missed, I think) then you have SERIOUS problems of some sort.
----
Yeah, well, I try not to be too reactionary on computer
things like this -- especially when it could just be a
matter of flipping a config switch somewhere and things get
instantly better. While the number of spams getting through
is significantly higher, probably 75-80% of them are duplicate
emails sent to multiple email addresses -- including some
blacklisted To-addresses. Apparently, the spammer isn't being
kind enough to send the spam to the blacklisted To-addresses first,
and with the new spamc client, sendmail notices the lower load
average and likely allows more parallel incoming instances to
process incoming email before a given spam gets "locked out".
I suppose this could be a "downside" of this efficiency, but
previous to this I never saw multiple instances of these
simple spams get through **undetected**. This makes me think
it isn't just the increased efficiency causing problems as
I would have expected at least one or two duplicate spams
that wouldn't have been caught by "other means" (than being
sent to a blacklisted To-addr).
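
One way to put a number on the "75-80% are duplicates" estimate is to hash the body of each missed spam and count repeats. This is only a minimal sketch in Python, assuming the missed spams are available as raw message strings. `body_digest` and `duplicate_fraction` are hypothetical helper names, and exact-body matching is an assumption: spam that is randomized per recipient would not be grouped.

```python
import email
import hashlib
from collections import Counter

def body_digest(raw_message):
    """Hash the whitespace-normalized payload of a raw RFC 822 message."""
    msg = email.message_from_string(raw_message)
    parts = []
    for part in msg.walk():
        if part.is_multipart():
            continue  # only leaf parts carry text
        payload = part.get_payload()  # undecoded text of this leaf part
        parts.append(" ".join(str(payload).split()))
    joined = "\n".join(parts).encode("utf-8", "replace")
    return hashlib.sha1(joined).hexdigest()

def duplicate_fraction(raw_messages):
    """Fraction of messages whose body digest is shared with another message."""
    counts = Counter(body_digest(m) for m in raw_messages)
    total = sum(counts.values())
    dupes = sum(n for n in counts.values() if n > 1)
    return dupes / total if total else 0.0

# Made-up sample messages (not taken from the real mail spool).
sample = [
    "To: a@example.org\nSubject: Re[6]:\n\nbuy now\n",
    "To: b@example.org\nSubject: Re[6]:\n\nbuy now\n",
    "To: c@example.org\nSubject: hello\n\nlunch tomorrow?\n",
]
print(duplicate_fraction(sample))
```

The headers are deliberately left out of the hash, since the same spam sent to different To-addresses differs only there.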
As a
happy 2.63 user that upgraded to 3.0.4, it took a little minor fiddling, but
by and large things are *much* better now, and they were good before.
-- *(oh the salt, the salt [in the wound]...:-) )* ---
Also, you mentioned training with 'old spam' and 'new ham'. Presumably you
were talking about bayes training. Really training with new spam,
especially the stuff slipping through, would be the right thing to do. Spam
has changed considerably in character in just the last 6 months.
----
Sorry, unclear: I archive current spams after "sa-learn"ing on
them, so "archives" contain anything older than whatever I
haven't processed "recently". With SA 2.63, I'd go through my
Junk email folder sometimes as infrequently as once/month and find
maybe 6-10 emails that should have gone to subscribed lists or
were from recent online vendors that sent me spammy looking
receipts (although those were rare). I'd drop them in my "despam"
folder for later "ham learning". But sifted folders of junk
email I process (sa-learn-junk) in bulk and then archive.
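
The learn-then-archive routine described above could be scripted roughly as follows. This is a sketch under assumptions, not the actual "sa-learn-junk" script: it assumes the folders are in mbox format, the paths are placeholders, and `learn_command` and `learn` are hypothetical helper names. The `--spam`, `--ham` and `--mbox` options are the ones sa-learn itself provides.

```python
import shutil
import subprocess

def learn_command(kind, mbox_path):
    """Build the sa-learn command line for one mbox folder; kind is 'spam' or 'ham'."""
    if kind not in ("spam", "ham"):
        raise ValueError("kind must be 'spam' or 'ham'")
    return ["sa-learn", "--" + kind, "--mbox", mbox_path]

def learn(kind, mbox_path, dry_run=True):
    """Print the command; run it only if dry_run is False and sa-learn is installed."""
    cmd = learn_command(kind, mbox_path)
    print(" ".join(cmd))
    if not dry_run and shutil.which("sa-learn"):
        subprocess.run(cmd, check=True)
    return cmd

# Placeholder paths (assumptions), shown as a dry run only.
learn("spam", "mail/Junk")    # sifted junk, learned before it is archived
learn("ham", "mail/despam")   # false positives rescued from the Junk folder
```

Feeding the spam that currently slips through to the `spam` run, rather than only older archives, is what the advice quoted above amounts to.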
Suggestion: let us see the full list of SA hits on some of the stuff
slipping through.
The full list of SA hits? -- for that message, that was it. Here's another
that passed. Note, there is a weird header "X-SA-Do-Not-Rej: Yes" which doesn't
look normal:
---junk email that passed as ham; sent to multiple email accounts---
Received: (qmail 16547 invoked from network); 16 Sep 2005 18:08:51 -0000
Received: from unknown (HELO thaimail.org) ([202.150.81.42])
(envelope-sender <[EMAIL PROTECTED]>)
by mail7.sea5.speakeasy.net (qmail-ldap-1.03) with SMTP
for <[EMAIL PROTECTED]>; 16 Sep 2005 18:08:49 -0000
From: "Molnar Chris" <[EMAIL PROTECTED]>
To: "Siedler Clemens" <[EMAIL PROTECTED]>
Subject: Re[6]:
Date: Fri, 16 Sep 2005 18:09:04 +0000
Message-ID: <[EMAIL PROTECTED]>
X-SA-Do-Not-Rej: Yes
MIME-Version: 1.0
Content-Type: multipart/alternative;
boundary="----=_NextPart_000_40CE_1F627A89.B53D40CE"
X-Priority: 3
X-MSMail-Priority: Normal
X-Mailer: Microsoft Outlook Express 6.00.2900.2527
X-MimeOLE: Produced By Microsoft MimeOLE V6.00.2900.2527
X-Spam-DCC: :
X-Spam-Checker-Version: SpamAssassin 3.0.4 (2005-06-05) on
ishtar.sc.tlinx.org
X-Spam-Level: ***
X-Spam-Status: No, score=3.5 required=4.8 tests=BAYES_99,HTML_MESSAGE
autolearn=no version=3.0.4
X-Spam-Pyzor:
X-Spam-Report:
* 0.0 HTML_MESSAGE BODY: HTML included in message
* 3.5 BAYES_99 BODY: Bayesian spam probability is 99 to 100%
* [score: 1.0000]
Status:
X-Status:
X-Keywords: Junk
--------------------------------------------------------------------------------
In a rejected email, I see many more tests:
------------------junk email correctly labeled------------------
Subject: ***SPAM*** Athena, Electric-chair for little or no-cost
MIME-Version: 1.0
X-Mailid: 6977
Content-Type: multipart/alternative; boundary="==8aa9d3a4cb398b"
Date: Thu, 15 Sep 2005 14:56:00 -0700
X-Spam-Prev-Subject: Athena, Electric-chair for little or no-cost
X-Spam-DCC: :
X-Spam-Flag: YES
X-Spam-Checker-Version: SpamAssassin 3.0.4 (2005-06-05) on
ishtar.sc.tlinx.org
X-Spam-Level: ******
X-Spam-Status: Yes, score=6.9 required=4.8 tests=BAYES_99,HTML_90_100,
HTML_IMAGE_ONLY_20,HTML_IMAGE_RATIO_02,HTML_MESSAGE,HTML_WEB_BUGS,
MIME_HTML_MOSTLY,MPART_ALT_DIFF,MSGID_FROM_MTA_HEADER,
MSGID_FROM_MTA_ID autolearn=no version=3.0.4
X-Spam-Pyzor:
X-Spam-Report:
* 1.7 MSGID_FROM_MTA_ID Message-Id for external message added locally
* 0.4 HTML_IMAGE_ONLY_20 BODY: HTML: images with 1600-2000 bytes of words
* 0.0 HTML_IMAGE_RATIO_02 BODY: HTML has a low ratio of text to image area
* 0.0 HTML_90_100 BODY: Message is 90% to 100% HTML
* 0.0 HTML_WEB_BUGS BODY: Image tag intended to identify you
* 1.0 MIME_HTML_MOSTLY BODY: Multipart message mostly text/html MIME
* 0.0 HTML_MESSAGE BODY: HTML included in message
* 0.1 MPART_ALT_DIFF BODY: HTML and text parts are different
* 3.5 BAYES_99 BODY: Bayesian spam probability is 99 to 100%
* [score: 1.0000]
* 0.1 MSGID_FROM_MTA_HEADER Message-Id was added by a relay
----------------------------------------------------------------------------------
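
To make the difference between the two X-Spam-Status headers explicit, the test lists can be diffed. The following is a minimal Python sketch: the two header values are copied from the messages above, while `parse_spam_status` is a hypothetical helper that only assumes the `score=`, `required=` and `tests=` fields appear as they do in these samples.

```python
import re

def parse_spam_status(value):
    """Parse an X-Spam-Status value into (verdict, score, required, tests)."""
    flat = " ".join(value.split())  # undo header folding
    verdict = flat.split(",", 1)[0].strip()
    score = float(re.search(r"score=(-?[\d.]+)", flat).group(1))
    required = float(re.search(r"required=(-?[\d.]+)", flat).group(1))
    # The test list runs from "tests=" up to the next known field.
    m = re.search(r"tests=(.*?)(?:\s+autolearn=|\s+version=|$)", flat)
    tests = {t.strip() for t in m.group(1).split(",") if t.strip()}
    return verdict, score, required, tests

missed = ("No, score=3.5 required=4.8 tests=BAYES_99,HTML_MESSAGE\n"
          " autolearn=no version=3.0.4")
caught = ("Yes, score=6.9 required=4.8 tests=BAYES_99,HTML_90_100,\n"
          " HTML_IMAGE_ONLY_20,HTML_IMAGE_RATIO_02,HTML_MESSAGE,HTML_WEB_BUGS,\n"
          " MIME_HTML_MOSTLY,MPART_ALT_DIFF,MSGID_FROM_MTA_HEADER,\n"
          " MSGID_FROM_MTA_ID autolearn=no version=3.0.4")

m_verdict, m_score, m_required, m_tests = parse_spam_status(missed)
c_verdict, c_score, c_required, c_tests = parse_spam_status(caught)

only_in_caught = sorted(c_tests - m_tests)
print("missed:", m_verdict, m_score, "of", m_required)
print("tests that fired only on the caught message:")
for name in only_in_caught:
    print("  " + name)
```

Both messages hit BAYES_99 for 3.5 points, which on its own stays below the 4.8 threshold; the caught message gets over it through the eight additional header and HTML tests.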
It seems likely this is a "default" configuration problem, or my
slight tinkering with a default config that sometimes turns off most
of the useful tests.
There's maybe a 20% chance this is related to having installed 3.0.2 from the SuSE
package, then upgrading components to 3.0.4 through CPAN and manually copying them
over the non-standard placement of the 3.0.2 images (like the location for spamd).
Example:
*Ignoring docs*
SuSE's SA rpm has RC scripts and:
/usr/bin/sa-learn
/usr/bin/spamassassin
/usr/bin/spamc
/usr/sbin/spamd -- *non-standard loc*; but I copied the 3.0.4 spamd from /usr/bin
over the 3.0.2 location and verified that the correct version of the daemon starts.
-----
SuSE's perl-spamassassin RPM has more files that could provide for
some non-standard placement. They have:
/etc/mail/spamassassin/init.pre
/etc/mail/spamassassin/local.cf
Then the various module files under "vendor/<perl
version>/Mail/SpamAssassin".
All of the rule files are placed under /usr/share/spamassassin/
Note - the rule files under /usr/share/spamassassin are marked as requiring
SA version 3.000004, so it seems this directory is the correct dir for
the rule-base.
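
The version check on the rule files can be repeated mechanically after a mixed RPM/CPAN upgrade. Below is a rough Python sketch that looks for the `require_version` line in each `*.cf` file. The directory and the expected value 3.000004 come from the note above; `required_version`, `scan_rules` and `mismatched` are hypothetical helper names, and a file without such a line is simply reported as `None`.

```python
import re
from pathlib import Path

REQUIRE_RE = re.compile(r"^\s*require_version\s+(\S+)", re.MULTILINE)

def required_version(cf_text):
    """Return the version named on the first require_version line, or None."""
    m = REQUIRE_RE.search(cf_text)
    return m.group(1) if m else None

def scan_rules(rules_dir):
    """Map each *.cf file name in rules_dir to its required version (or None)."""
    versions = {}
    for cf in sorted(Path(rules_dir).glob("*.cf")):
        versions[cf.name] = required_version(cf.read_text(errors="replace"))
    return versions

def mismatched(versions, expected):
    """Names of rule files whose required version differs from expected."""
    return sorted(name for name, v in versions.items() if v != expected)

# Yields an empty result if the directory does not exist on this machine.
versions = scan_rules("/usr/share/spamassassin")
print(len(versions), "rule files; mismatched:", mismatched(versions, "3.000004"))
```

Any leftover 3.0.2 rule file would show up in the mismatched list with 3.000002 instead.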
====================
So ... I'm wondering...why were most of the local tests disabled? Is
it related to the "X-SA-Do-Not-Rej" header? That could be a "hole"....I note
that none of my "valid" email has that header in it -- but I
did notice one or more spams that passed through with that header.
Could be coincidence?
Linda
p.p.s. -- This is truly an annoying problem if it occurs, and I don't
know if it will occur on this message with all the junk-marked headers
that are included, but when does the "SA" list reject
content about "Spm" because an email discussing it contains too much
content that looks like "Spm"? Seems there should be a way to include
or attach content (I don't think most modern emailers have
rot13 enc/decoders built-in :-)) ...?