e scores and that should take care of it all.
Be *very* careful with some of those rules. Many of them match
bounce messages in general, not just virus bounces, so you'll
never know when your mail isn't delivered. And some of them,
like matching any message that has "approved" o
&&
!__SUBJ_EQ_BANG && __SUBJ_ENCODED
describe L_SUBJ_GRATUITOUS_ENCODING Subject is encoded unnecessarily
score L_SUBJ_GRATUITOUS_ENCODING 1
I still don't score it very high, since some people's mail
programs are set to use subject encoding even when the subject
contains
mingly genuine messages here:
http://www.winehq.com/hypermail/wine-users/2003.09.txt
http://lists.ira.uka.de/pipermail/javaparty-users.mbox/javaparty-users.mbox
--
Keith C. Ivey <[EMAIL PROTECTED]>
Washington, DC
---
This SF.net email
hes [a-z] can't be '=', the negative lookahead
ends up doing nothing.
--
Keith C. Ivey <[EMAIL PROTECTED]>
Washington, DC
give the relay multiple names to allow things to pass. Or so I
> would think?
Definitely too restrictive, though it might work for big ISPs
like AOL. An IP address can only have one reverse DNS, so
servers that handle multiple domains won't match the way you
want them to.
--
oung adult"
is what you get if you take the most harmless word from each of
the two sets. It's likely that the creator of the rule didn't
consider every possible pair and didn't notice that "young
adult" was not a porn indicator.
--
Keith C. Ivey <[EMAIL PROTECT
nd it should be fixed.
I reported it a while back and submitted a suggested patch, but
nothing seems to have happened:
http://bugzilla.spamassassin.org/show_bug.cgi?id=2619
--
Keith C. Ivey <[EMAIL PROTECTED]>
Washington, DC
Peter Kiem <[EMAIL PROTECTED]> wrote:
> I thought the from rule worked on the envelope sender of the email and not
> the easily forged from header :(
What makes you think the envelope sender isn't easily forged?
--
Keith C. Ivey <[EMAIL PROTEC
> me?
In your case, you should get rid of the comma, since the regex
matches the same messages without it. Think about it: Any
message that has three or more of those lines also has three of
those lines. So once you've found three, there's no point in
continuing to look.
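
A quick way to convince yourself of this is to try both quantifiers on the same input. This is only a sketch: the "X-Example:" line is a made-up stand-in for the lines the rule looks for, and Python's re module is used because `{3}` and `{3,}` behave the same way there as in Perl for a yes/no match.

```python
import re

# Hypothetical stand-in for the repeated line the rule is looking for.
LINE = r"(?:^X-Example: .*\n)"

at_least_three = re.compile(LINE + "{3,}", re.MULTILINE)
exactly_three = re.compile(LINE + "{3}", re.MULTILINE)


def same_verdict(text):
    """True when both patterns agree on whether the text matches at all."""
    return bool(at_least_three.search(text)) == bool(exactly_three.search(text))


# Messages containing 0 to 5 of the lines.
samples = ["X-Example: a\n" * n for n in range(6)]
print([same_verdict(s) for s in samples])
```

The two patterns differ only in how much text they consume, which doesn't matter to a rule that just asks whether the message matched.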
--
Keith C. I
.
--
Keith C. Ivey <[EMAIL PROTECTED]>
Washington, DC
fault (too many false
positives, I assume).
--
Keith C. Ivey <[EMAIL PROTECTED]>
Washington, DC
n't really get bounded
> in a URL.
Could you give an example? A domain name in a URL should never
have word characters adjacent to it, so putting '\b' before and
after should work fine.
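
Here is a small sketch of what that looks like in practice. The domain "example.com" and the test URLs are placeholders, not anything from the thread, and Python's re is used only because its '\b' has the same meaning as Perl's here.

```python
import re

# "example.com" is a placeholder domain for illustration.
domain = re.compile(r"\bexample\.com\b")

urls = [
    "http://example.com/",            # '/' on both sides of the name
    "http://www.example.com:8080/x",  # '.' before it, ':' after it
    "http://notexample.com/",         # word character before: no boundary
    "http://example.community/",      # word character after: no boundary
]

hits = [bool(domain.search(u)) for u in urls]
print(hits)
```

The first two URLs match and the last two don't, because '\b' only succeeds where a word character meets a non-word character.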
--
Keith C. Ivey <[EMAIL PROTECTED]>
Washington, DC
think
VERP would help in this situation.
Putting such things in the "From:" header in place of the
actual author's name and address would make it more difficult
to sort through a folder of SATalk messages, since the author
is often useful in determining
er is substituted for another ("PayPaI").
Sometimes the same character is used in different words to
represent different letters ("[EMAIL PROTECTED]", "[EMAIL PROTECTED]").
The solution is anything but simple.
--
Keith C. Ivey <[EMAIL PROTECTED]>
Washington, DC
, but perhaps 10
such strings would be. Similarly with empty HTML markup, like
''.
--
Keith C. Ivey <[EMAIL PROTECTED]>
Washington, DC
STY, IMPOTENCE, ITS_LEGAL,
OPT_IN_CAPS, and PENIS_ENLARGE2, along with BAYES_99
(presumably because the other site's SA has learned some of his
previous messages). The message was perfectly legitimate and
didn't contain anything that you'd notice as spammy-sounding
when you rea
d a tab-separated text file attached
containing a table from a database in which all the text was
uppercase.
Rules almost always match messages you didn't intend them to
match. That's one reason why it's almost always a bad idea to
assign a large score to any single rule.
--
K
ut what words are reasonable to have in spam and
nonspam mail.
I have a custom rule for "vicodin" and other drug names, but I
haven't scored it 5.5. It is rare for spam to trigger only one
rule, so a few points are enough.
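
To make the arithmetic concrete, here is a rough sketch of how scores add up against the threshold. The rule names and scores are invented for illustration; the only real value is the 5.0 threshold, which is SpamAssassin's default.

```python
REQUIRED = 5.0  # SpamAssassin's default threshold


def is_spam(hits, scores, required=REQUIRED):
    """Sum the scores of the rules that hit and compare with the threshold."""
    return sum(scores[name] for name in hits) >= required


# Hypothetical rules and scores, not taken from any real rule set.
modest = {"L_DRUG_NAME": 1.5, "L_HTML_TRICK": 2.0, "L_ODD_HEADER": 2.2}
heavy = {"L_DRUG_NAME": 5.5}

# A legitimate message that happens to mention a drug name.
print(is_spam(["L_DRUG_NAME"], modest))
# Spam that trips the drug rule along with two others.
print(is_spam(["L_DRUG_NAME", "L_HTML_TRICK", "L_ODD_HEADER"], modest))
# The same legitimate message when the single rule is scored 5.5.
print(is_spam(["L_DRUG_NAME"], heavy))
```

With the modest score the legitimate message stays under the threshold while the spam still goes over it; with 5.5 the one rule alone is enough to flag the legitimate message.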
--
Keith C. Ivey <[EMAIL PROTECTED]>
Washington
..
There is a way: the Bayesian analysis. If "mortgage" never
appears in nonspam and often appears in spam, then messages
containing the word will very quickly start getting BAYES_99.
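
As a very rough illustration of why that happens, here is a simplified per-token probability. This is NOT SpamAssassin's actual Bayes formula (which combines many tokens and applies corrections); the function name and the message counts are assumptions made up for the sketch.

```python
def token_spam_probability(spam_hits, spam_total, ham_hits, ham_total):
    """Naive estimate of how spammy a single token is (simplified, not SA's formula)."""
    spam_rate = spam_hits / spam_total
    ham_rate = ham_hits / ham_total
    if spam_rate + ham_rate == 0:
        return 0.5  # token never seen: no evidence either way
    return spam_rate / (spam_rate + ham_rate)


# Hypothetical counts: the word is in 40 of 200 learned spam messages
# and in none of 200 learned nonspam messages.
print(token_spam_probability(40, 200, 0, 200))
# A word seen equally often in both kinds of mail carries no signal.
print(token_spam_probability(10, 200, 10, 200))
```

A token that only ever shows up in spam ends up at the top of the scale, so it pulls the combined result for a message strongly toward spam.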
--
Keith C. Ivey <[EMAIL PROTECTED]>
Washington, DC
ave you seen any site other than Yahoo using that format for a
redirector? I haven't, and I've seen plenty of redirectors
that don't use it. I wouldn't expect that rule to be any
better than Mike Kuentz's version, but I guess it wouldn't
hurt.
-
http:/taint.org
But this causes endless redirection and crashes Mozilla 1.4:
http://srd.yahoo.com/*http/taint.org
--
Keith C. Ivey <[EMAIL PROTECTED]>
Washington, DC
f an abused unrestricted redirector
The "illuminating" part is just a random word. It will be
different in the next message. I'd make it
uri YAHOO_REDIR /srd.yahoo.com\/drst\/.*\*http:/
--
Keith C. Ivey <[EMAIL PROTECTED]>
Washington, DC
ould /\bs\.?e\.?x\b/ work?
No, because that matches "sex". But this would work:
/\b(?!sex)s\.?e\.?x\b/
The negative lookahead prevents it from matching "sex" if it's
unobfuscated.
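
If you want to check the two patterns side by side, something like this sketch will do it. The test strings are my own, and Python's re is used here only because lookaheads and '\b' work the same way as in Perl.

```python
import re

naive = re.compile(r"\bs\.?e\.?x\b")
guarded = re.compile(r"\b(?!sex)s\.?e\.?x\b")

# Plain spelling first, then three obfuscated variants.
words = ["sex", "s.e.x", "se.x", "s.ex"]

naive_hits = [bool(naive.search(w)) for w in words]
guarded_hits = [bool(guarded.search(w)) for w in words]
print(naive_hits)
print(guarded_hits)
```

The first pattern hits all four strings, while the second skips only the plain spelling and still catches every dotted variant.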
--
Keith C. Ivey <[EMAIL PROTECTED]>
Washington, DC
Something like this?
header WEIRD_X_HEADER ALL =~ /\nX-[a-z]*[bcdfghjklmnpqrstvwxz]{4}[a-z]*: /
--
Keith C. Ivey <[EMAIL PROTECTED]>
Washington, DC
the middle of words unless put there intentionally.
--
Keith C. Ivey <[EMAIL PROTECTED]>
Washington, DC
me messages that have already
been identified as spam by autolearning. There's no need to
separate out the already-learned messages first.
That said, you certainly don't want to keep all your spam in
one giant folder that you learn over and over every night. You
should move or delet
ich allows you to see all the headers
easily, save the mail in its original form, and forward a
message in various ways -- as an attachment, as text included
in another message, or "bounced" (just adding "Resent"
headers).
--
Keith C. Ivey <[EMAIL PROTECTED]>
Washington, DC
ur version doesn't match if there are attributes
(which happens quite often on BODY).
--
Keith C. Ivey <[EMAIL PROTECTED]>
Washington, DC
ss, then
you're starting with a clean slate, which means that your
earlier sending of GTUBE is forgotten.
--
Keith C. Ivey <[EMAIL PROTECTED]>
Washington, DC
r the current rules but are open to
abuse (and spammers are using them), so any new rule is likely
to be much less specific.
Are you saying Yahoo will get it right next time, and they'll
check to see what the current state of the SA rules is when
they decide on their URL format?
--
Keit
be a
problem though.
> To try to curb the FPs for tests within the {1,5} range, I will experiment
> with the following rule:
>
> full MY_FULL_OBFU_HTML /([\s>]\w+<[\w\s\/\$&;]{1,6}>\w+){2,}/
That will only match when one word is interrupted by more than
one obfuscati
slashes are normally used unless there are
slashes in the pattern itself, in which case another delimiter
is often used to avoid the need to backslash the slashes in the
pattern. For more, see the Perl documentation:
http://perldoc.com/perl5.8.0/pod/perlop.html#m-PATTERN-cgi
.
>
> It is tough to remember everything SA looks for. Does 2.60 have
> something like this? Comments?
Look at the NORMAL_HTTP_TO_IP and WEIRD_PORT tests in
20_uri_test.cf.
--
Keith C. Ivey <[EMAIL PROTECTED]>
Washington, DC
Todd Joseph <[EMAIL PROTECTED]> wrote:
> He could be one of these folks:
>http://www.tbray.org/ongoing/When/200x/2003/10/12/SpamPlan27.
More likely one of these:
http://www.rhyolite.com/anti-spam/you-might-be.html
--
Keith C. Ivey <[EMAIL PROTECTED]&
iles), and it doesn't load files that aren't named
specifically in the program.
--
Keith C. Ivey <[EMAIL PROTECTED]>
Washington, DC
that effect.
I think you're referring to the text here:
http://spamassassin.org/where.html
It says that that *page* has been superseded, not the site as a
whole.
Now if only I could figure out why I keep getting redirected to
the Australian mirror of the spamassassin.org site.
--
Keith C. Iv
l that improve the Bayes
> scores for similar future messages?
I've had that happen too, especially for Nigerian scam mail,
for some reason. Running sa-learn on them should help. That's
what I've been doing.
--
Keith C. Ivey <[EMAIL PROTECTED]>
Washington, DC
of
> it" (from `perldoc perlre`)
>
> As a word boundary, would not \b also match . , / ?
No, \b matches a *boundary*, not a character. It would match
the spot between any of those characters and a letter/number/
underscore, n
kups, whereas spammers often send
mail directly to the backups in an attempt to bypass filtering.
"Received" tokens that are good nonspam indicators include some
indicating that the mail came from servers at organizations
that frequently send legitimate mail to our users.
It seems to me
for the lines like this (there are
two):
$tag =~ s/_HITS_/sprintf("%05.2f", $self->{hits})/e;
Changing the "%05.2f" to "%04.1f" (or whatever you prefer)
should do it. There's no need to recompile anything. Just
restart spamd if you'
ant tokens being purged periodically, so the added
tokens aren't increasing the size.
The people who developed the Bayes tokenizing for SA have done
analysis on how effective various strategies are, and I'm
inclined to trust their analysis unless
earn --dump magic", if
you're using 2.60), then Bayes scoring will be disabled until
more messages are autolearned.
--
Keith C. Ivey <[EMAIL PROTECTED]>
Washington, DC
and
look more carefully at the low-scoring messages at the top.
--
Keith C. Ivey <[EMAIL PROTECTED]>
Washington, DC
ust having their mail eaten and never knowing it wasn't
delivered.
--
Keith C. Ivey <[EMAIL PROTECTED]>
Washington, DC
la isn't that
good.
--
Keith C. Ivey <[EMAIL PROTECTED]>
Washington, DC
you can write it
this way:
/[>\s]\w<[-\w\s\$&!]{0,150}>\w\W/
I must admit I'm puzzled about why Larry wants to limit the
pattern to having only one letter on each side of the angle-
bracketed stuff.
--
Keith C. Ivey <[EMAIL PROTECTED]>
Washington, DC
table for everyone. I do use some RBLs to
refuse mail (with qmail and qpsmtpd and the dnsbl plugin), but
others, which I think have more false positives, I just use in
SA to increase the spam score.
--
Keith C. Ivey <[EMAIL PROTECTED]>
Washington, DC
n't making any
difference.
It seems to me that your rule is going to have a fair number of
false positives, though. For example, '' often shows up
between words with no intervening whitespace, and depending on
what's used to produce the HTML I wouldn't be that surprised to
're not catching that. For example, 'A' instead
of being '&#65;' can be represented as '&#x41;'. You could
combine parts of your two regexes to match those. Also, you
can have leading 0's in the numbers, so '&#65;' can be written
as '&#065;' (or '&#0065;
ome bugs.
Another thing that we should be checking for is stuff like
this:
> http://ewtajsland.b&#
> 105;z/rmp6651/">Visit_to_begin_your_order
There's a test for something similar, SPAM_FORM_ACTION, but it
needs to be expanded to test for HREFs as well, as for URLs
4 FAKE_HELO_AOL Host HELO did not match rDNS:
> aol.com
Can you post the headers from some of those messages? Is your
mail server not putting the rDNS into the headers?
I'd lower the scores for those tests in local.cf for the time
being.
--
Keith C. Iv
turn,
line feed, form feed),
followed by 1 to 5 word characters (letters, numbers, and
underscores),
followed by '<',
followed by an optional '/',
followed by an optional single whitespace character,
followed by 6 to 150 word or whitespace characters,