Re: [SAtalk] W32.Novarg.A@mm virus

2004-01-28 Thread Keith C. Ivey
e scores and that should take care of it all. Be *very* careful with some of those rules. Many of them match bounce messages in general, not just virus bounces, so you'll never know when your mail isn't delivered. And some of them, like matching any message that has "approved" o

Re: [SAtalk] Munged (encoded) Subject

2004-01-14 Thread Keith C. Ivey
&& !__SUBJ_EQ_BANG && __SUBJ_ENCODED
describe L_SUBJ_GRATUITOUS_ENCODING Subject is encoded unnecessarily
score L_SUBJ_GRATUITOUS_ENCODING 1
I still don't score it very high, since some people's mail programs are set to use subject encoding even when the subject contains

Re: [SAtalk] Rules for word-jumble spam

2004-01-11 Thread Keith C. Ivey
mingly genuine messages here: http://www.winehq.com/hypermail/wine-users/2003.09.txt http://lists.ira.uka.de/pipermail/javaparty-users.mbox/javaparty-users.mbox -- Keith C. Ivey <[EMAIL PROTECTED]> Washington, DC --- This SF.net email

RE: [SAtalk] detecting large collections of random words

2004-01-08 Thread Keith C. Ivey
hes [a-z] can't be '=', the negative look-ahead ends up doing nothing. -- Keith C. Ivey <[EMAIL PROTECTED]> Washington, DC

Re: [SAtalk] Useful to compare sender domain with relay?

2004-01-02 Thread Keith C. Ivey
give the relay multiple names to allow things to pass. Or so I > would think? Definitely too restrictive, though it might work for big ISPs like AOL. An IP address can only have one reverse DNS, so servers that handle multiple domains won't match the way you want them to. --

RE: [SAtalk] Re: False positives

2003-12-30 Thread Keith C. Ivey
oung adult" is what you get if you take the most harmless word from each of the two sets. It's likely that the creator of the rule didn't consider every possible pair and didn't notice that "young adult" was not a porn indicator. -- Keith C. Ivey <[EMAIL PROTECT

Re: [SAtalk] Re: False positives

2003-12-29 Thread Keith C. Ivey
nd it should be fixed. I reported it a while back and submitted a suggested patch, but nothing seems to have happened: http://bugzilla.spamassassin.org/show_bug.cgi?id=2619 -- Keith C. Ivey <[EMAIL PROTECTED]> Washington, DC --- This SF

Re: [SAtalk] Re: Having trouble coding a local rule

2003-12-28 Thread Keith C. Ivey
Peter Kiem <[EMAIL PROTECTED]> wrote: > I thought the from rule worked on the envelope sender of the email and not > the easily forged from header :( What makes you think the envelope sender isn't easily forged? -- Keith C. Ivey <[EMAIL PROTEC

Re: [SAtalk] RD: "justified" HTML

2003-12-15 Thread Keith C. Ivey
> me? In your case, you should get rid of the comma, since the regex matches the same messages without it. Think about it: Any message that has three or more of those lines also has three of those lines. So once you've found three, there's no point in continuing to look. -- Keith C. I
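
The point about the comma can be checked with a small sketch. Python's `re` module is used here as a stand-in for Perl's regex engine, and the `<br>` line pattern is a hypothetical placeholder, since the rule under discussion is not shown in this snippet.

```python
import re

# Hypothetical stand-in for "those lines"; the real pattern is not in the snippet.
LINE = r"(?:<br>\n)"

at_least_three = re.compile(LINE + r"{3,}")  # quantifier with the comma
exactly_three = re.compile(LINE + r"{3}")    # quantifier without the comma

# Messages containing 0 to 5 consecutive matching lines.
samples = ["<br>\n" * n for n in range(6)]

# For a yes/no rule, both forms hit exactly the same messages.
results = [
    (bool(at_least_three.search(s)), bool(exactly_three.search(s)))
    for s in samples
]
```

The two forms differ only in how much text the match consumes, which does not matter when the rule only asks whether a match exists.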

Re: [SAtalk] re: X-Spam-Status: No, hits=-2.1 required=1.0

2003-11-22 Thread Keith C. Ivey
. -- Keith C. Ivey <[EMAIL PROTECTED]> Washington, DC

Re: [SAtalk]

2003-11-19 Thread Keith C. Ivey
fault (too many false positives, I assume). -- Keith C. Ivey <[EMAIL PROTECTED]> Washington, DC

RE: Re[2]: [SAtalk] Sanity checking new uri rules?

2003-11-18 Thread Keith C. Ivey
n't really get bounded > in a URL. Could you give an example? A domain name in a URL should never have word characters adjacent to it, so putting '\b' before and after should work fine. -- Keith C. Ivey <[EMAIL PROTECTED]> Washington, DC -
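
A minimal sketch of the `\b` suggestion, using Python's `re` as a stand-in for Perl. The domain `example.com` and the helper name `hits` are assumptions made for illustration only.

```python
import re

# Hypothetical domain; '\b' on both sides keeps the match from
# starting or ending in the middle of a longer word.
domain_rule = re.compile(r"\bexample\.com\b")

def hits(url: str) -> bool:
    """Return True if the bounded domain pattern matches the URL."""
    return domain_rule.search(url) is not None
```

In a URL the domain is normally surrounded by characters such as `/`, `.`, `:` or `@`, none of which are word characters, so the boundaries are found where expected.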

Re: [SAtalk] LIST ADMIN - SUGGESTION: pool.com

2003-11-15 Thread Keith C. Ivey
think VERP would help in this situation. Putting such things in the "From:" header in place of the actual author's name and address would make it more difficult to sort through a folder of SATalk messages, since the author is often useful in determining

RE: [SAtalk] Is punctuation really needed? (fwd)

2003-11-10 Thread Keith C. Ivey
er is substituted for another ("PayPaI"). Sometimes the same character is used in different words to represent different letters ("[EMAIL PROTECTED]", "[EMAIL PROTECTED]"). The solution is anything but simple. -- Keith C. Ivey <[EMAIL PROTECTED]> Washington, DC

Re: [SAtalk] Re: Accumulator rules (Re: 'random' character sets)

2003-11-07 Thread Keith C. Ivey
, but perhaps 10 such strings would be. Similarly with empty HTML markup, like ''. -- Keith C. Ivey <[EMAIL PROTECTED]> Washington, DC

Re: [SAtalk] Re: 'random' character sets

2003-11-07 Thread Keith C. Ivey
STY, IMPOTENCE, ITS_LEGAL, OPT_IN_CAPS, and PENIS_ENLARGE2, along with BAYES_99 (presumably because the other site's SA has learned some of his previous messages). The message was perfectly legitimate and didn't contain anything that you'd notice as spammy-sounding when you rea

Re: [SAtalk] scoring system and values...

2003-11-07 Thread Keith C. Ivey
d a tab-separated text file attached containing a table from a database in which all the text was uppercase. Rules almost always match messages you didn't intend them to match. That's one reason why it's almost always a bad idea to assign a large score to any single rule. -- K

Re: [SAtalk] scoring system and values...

2003-11-07 Thread Keith C. Ivey
ut what words are reasonable to have in spam and nonspam mail. I have a custom rule for "vicodin" and other drug names, but I haven't scored it 5.5. It is rare for spam to trigger only one rule, so a few points are enough. -- Keith C. Ivey <[EMAIL PROTECTED]> Washington

Re: [SAtalk] scoring system and values...

2003-11-07 Thread Keith C. Ivey
.. There is a way: the Bayesian analysis. If "mortgage" never appears in nonspam and often appears in spam, then messages containing the word will very quickly start getting BAYES_99. -- Keith C. Ivey <[EMAIL PROTECTED]> Washington, DC --

RE: [SAtalk] Abused redirector URLs ?

2003-11-07 Thread Keith C. Ivey
ave you seen any site other than Yahoo using that format for a redirector? I haven't, and I've seen plenty of redirectors that don't use it. I wouldn't expect that rule to be any better than Mike Kuentz's version, but I guess it wouldn't hurt. -

Re: [SAtalk] Abused redirector URLs ?

2003-11-06 Thread Keith C. Ivey
http:/taint.org But this causes endless redirection and crashes Mozilla 1.4: http://srd.yahoo.com/*http/taint.org -- Keith C. Ivey <[EMAIL PROTECTED]> Washington, DC

Re: [SAtalk] Abused redirector URLs ?

2003-11-06 Thread Keith C. Ivey
f an abused unrestricted redirector The "illuminating" part is just a random word. It will be different in the next message. I'd make it uri YAHOO_REDIR /srd.yahoo.com\/drst\/.*\*http:/ -- Keith C. Ivey <[EMAIL PROTECTED]> Washington, DC ---
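
As a rough check of the suggested pattern, here it is transcribed into Python's `re` syntax (no `/` delimiters, so the slashes need no backslashes). The sample URLs and the target host `spammer.example` are hypothetical.

```python
import re

# Same pattern as the YAHOO_REDIR rule above. The unescaped dots are kept
# as written, so they match any character, not only a literal '.'.
yahoo_redir = re.compile(r"srd.yahoo.com/drst/.*\*http:")

# Hypothetical URLs: the word after /drst/ changes from message to message.
url_a = "http://srd.yahoo.com/drst/illuminating/12345/*http://spammer.example/"
url_b = "http://srd.yahoo.com/drst/zebra/987/*http://spammer.example/"
url_plain = "http://srd.yahoo.com/drst/about.html"

matches = [bool(yahoo_redir.search(u)) for u in (url_a, url_b, url_plain)]
```

Because `.*` absorbs whatever follows `/drst/`, the rule does not depend on the random word.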

Re: [SAtalk] RegEx Question

2003-11-04 Thread Keith C. Ivey
ould /\bs\.?e\.?x\b/ work? No, because that matches "sex". But this would work: /\b(?!sex)s\.?e\.?x\b/ The negative lookahead prevents it from matching "sex" if it's unobfuscated. -- Keith C. Ivey <[EMAIL PROTECTED]> Washington, DC ---
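
The effect of the negative lookahead can be seen in a short sketch. Python's `re` supports the same `(?!...)` syntax as Perl for this pattern; the helper name `is_obfuscated` is made up for the example.

```python
import re

# Matches s, e, x with optional dots between them, but the lookahead
# rejects the plain, unobfuscated spelling.
obfuscated = re.compile(r"\b(?!sex)s\.?e\.?x\b")

def is_obfuscated(text: str) -> bool:
    """Return True only when a dotted variant is present."""
    return obfuscated.search(text) is not None
```

The lookahead is checked at the position right after the first `\b`, before any characters are consumed, so it only has to rule out the three literal letters.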

Re: [SAtalk] X-pvkhgmeblyqcmv header

2003-11-02 Thread Keith C. Ivey
Something like this? header WEIRD_X_HEADER ALL =~ /\nX-[a-z]*[bcdfghjklmnpqrstvwxz]{4}[a-z]*: / -- Keith C. Ivey <[EMAIL PROTECTED]> Washington, DC
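
A quick way to try the pattern outside SpamAssassin, sketched in Python. The header blocks below are invented test data; only the header name `X-pvkhgmeblyqcmv` comes from the thread subject.

```python
import re

# An all-lowercase X- header name containing four consonants in a row.
weird_x_header = re.compile(r"\nX-[a-z]*[bcdfghjklmnpqrstvwxz]{4}[a-z]*: ")

# Hypothetical header blocks for illustration.
headers_odd = "Received: from mail.example\nX-pvkhgmeblyqcmv: abc\nSubject: hi\n"
headers_ok = "Received: from mail.example\nX-mailer: Foo 1.0\nSubject: hi\n"

found_odd = weird_x_header.search(headers_odd) is not None
found_ok = weird_x_header.search(headers_ok) is not None
```

Note that `y` is not in the consonant class and that the pattern is case-sensitive, so mixed-case names such as `X-Spam-Status` cannot match.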

Re: [SAtalk] Exessive HTML Code

2003-10-29 Thread Keith C. Ivey
the middle of words unless put there intentionally. -- Keith C. Ivey <[EMAIL PROTECTED]> Washington, DC

Re: [SAtalk] Can I delete ham/spam email once I run sa-learn on them w/o impacting the database?

2003-10-29 Thread Keith C. Ivey
me messages that have already been identified as spam by autolearning. There's no need to separate out the already-learned messages first. That said, you certainly don't want to keep all your spam in one giant folder that you learn over and over every night. You should move or delet

Re: [SAtalk] White & black lists on server

2003-10-28 Thread Keith C. Ivey
ich allows you to see all the headers easily, save the mail in its original form, and forward a message in various ways -- as an attachment, as text included in another message, or "bounced" (just adding "Resent" headers). -- Keith C. Ivey <[EMAIL PROTECTED]> Washington, DC

Re: [SAtalk] Filtering mailsl without text

2003-10-27 Thread Keith C. Ivey
ur version doesn't match if there are attributes (which happens quite often on BODY). -- Keith C. Ivey <[EMAIL PROTECTED]> Washington, DC

RE: [SAtalk] Totally whitelisting someone?

2003-10-26 Thread Keith C. Ivey
ss, then you're starting with a clean slate, which means that your earlier sending of GTUBE is forgotten. -- Keith C. Ivey <[EMAIL PROTECTED]> Washington, DC

Re: [SAtalk] [RD] yahoo redirect

2003-10-25 Thread Keith C. Ivey
r the current rules but are open to abuse (and spammers are using them), so any new rule is likely to be much less specific. Are you saying Yahoo will get it right next time, and they'll check to see what the current state of the SA rules is when they decide on their URL format? -- Keit

RE: [SAtalk] [RD] Popcorn, Backhair, and Weeds

2003-10-25 Thread Keith C. Ivey
be a problem though. > To try to curb the FPs for tests within the {1,5} range, I will experiment > with the following rule: > > full MY_FULL_OBFU_HTML /([\s>]\w+<[\w\s\/\$&;]{1,6}>\w+){2,}/ That will only match when one word is interrupted by more than one obfuscati

RE: [SAtalk] [RD] Trojaned machines

2003-10-24 Thread Keith C. Ivey
slashes are normally used unless there are slashes in the pattern itself, in which case another delimiter is often used to avoid the need to backslash the slashes in the pattern. For more, see the Perl documentation: http://perldoc.com/perl5.8.0/pod/perlop.html#m-PATTERN-cgi

Re: [SAtalk] [RD] Trojaned machines

2003-10-22 Thread Keith C. Ivey
. > > It is tough to remember everything SA looks for. Does 2.60 have > something like this? Comments? Look at the NORMAL_HTTP_TO_IP and WEIRD_PORT tests in 20_uri_test.cf. -- Keith C. Ivey <[EMAIL PROTECTED]> Washington, DC --- Th

Re: [SAtalk] Looking for some interview subjects

2003-10-22 Thread Keith C. Ivey
Todd Joseph <[EMAIL PROTECTED]> wrote: > He could be one of these folks: >http://www.tbray.org/ongoing/When/200x/2003/10/12/SpamPlan27. More likely one of these: http://www.rhyolite.com/anti-spam/you-might-be.html -- Keith C. Ivey <[EMAIL PROTECTED]>

Re: [SAtalk] Re: Re: adding SPAM hits score to headers

2003-10-20 Thread Keith C. Ivey
iles), and it doesn't load files that aren't named specifically in the program. -- Keith C. Ivey <[EMAIL PROTECTED]> Washington, DC

Re: [SAtalk] documentation (was: Using SPAMD ?)

2003-10-19 Thread Keith C. Ivey
that effect. I think you're referring to the text here: http://spamassassin.org/where.html It says that that *page* has been superseded, not the site as a whole. Now if only I could figure out why I keep getting redirected to the Australian mirror of the spamassassin.org site. -- Keith C. Iv

Re: [SAtalk] Messages without Bayes score

2003-10-19 Thread Keith C. Ivey
l that improve the Bayes > scores for similar future messages? I've had that happen too, especially for Nigerian scam mail, for some reason. Running sa-learn on them should help. That's what I've been doing. -- Keith C. Ivey <[EMAIL PROTECTED]> Washington, DC ---

Re: [SAtalk] lint difference in 2.60

2003-10-19 Thread Keith C. Ivey
of > it" (from `perldoc perlre`) > > As a word boundary, would not \b also match . , / ? No, \b matches a *boundary*, not a character. It would match the spot between any of those characters and a letter/number/underscore, n
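
The distinction between a boundary and a character can be made concrete with a small sketch. Python's `\b` behaves the same way as Perl's for ASCII text; the sample strings are arbitrary.

```python
import re

text = "ab./cd"

# Every \b match is zero-width: it is a position, not a character.
boundaries = [(m.start(), m.end()) for m in re.finditer(r"\b", text)]

# \b never consumes punctuation, so the match covers only the word itself.
word = re.search(r"\bcom\b", "x.com/")
word_span = word.span() if word else None

# At the start of ".", no word character is adjacent, so there is no boundary.
dot_only = re.search(r"\b\.", ".")
```

Positions 1, 3 and 5 in `ab./cd` are absent from `boundaries` because both neighbours there are the same kind of character (word/word or non-word/non-word).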

Re: [SAtalk] bayes learning "To" and "Received" headers?

2003-10-19 Thread Keith C. Ivey
kups, whereas spammers often send mail directly to the backups in an attempt to bypass filtering. "Received" tokens that are good nonspam indicators include some indicating that the mail came from servers at organizations that frequently send legitimate mail to our users. It seems to me

Re: [SAtalk] Re: adding SPAM hits score to headers

2003-10-19 Thread Keith C. Ivey
for the lines like this (there are two): $tag =~ s/_HITS_/sprintf("%05.2f", $self->{hits})/e; Changing the "%05.2f" to "%04.1f" (or whatever you prefer) should do it. There's no need to recompile anything. Just restart spamd if you'
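
To see what the two format strings produce, here is a sketch using Python's `%` operator, which follows the same printf conventions as Perl's `sprintf` for these specifiers. The function name `format_hits` is made up for the example and is not part of SpamAssassin.

```python
def format_hits(hits: float, spec: str = "%05.2f") -> str:
    """Format a score with a printf-style spec (zero-padded, fixed width)."""
    return spec % hits

default_style = format_hits(7.3)             # two decimals, width 5
short_style = format_hits(7.3, "%04.1f")     # one decimal, width 4
negative_style = format_hits(-2.1, "%04.1f") # the sign counts toward the width
```

The leading `0` pads with zeros up to the stated width, which is useful when sorting messages by the score header as text.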

Re: [SAtalk] bayes learning "To" and "Received" headers?

2003-10-19 Thread Keith C. Ivey
ant tokens being purged periodically, so the added tokens aren't increasing the size. The people who developed the Bayes tokenizing for SA have done analysis on how effective various strategies are, and I'm inclined to trust their analysis unless

RE: [SAtalk] LOTS of mail being tagged wrong

2003-10-17 Thread Keith C. Ivey
earn --dump magic", if you're using 2.60), then Bayes scoring will be disabled until more messages are autolearned. -- Keith C. Ivey <[EMAIL PROTECTED]> Washington, DC

Re: [SAtalk] adding SPAM hits score to headers

2003-10-17 Thread Keith C. Ivey
and look more carefully at the low-scoring messages at the top. -- Keith C. Ivey <[EMAIL PROTECTED]> Washington, DC

Re: [SAtalk] Deleting e-mail from a blacklisted site on a mail relay

2003-10-15 Thread Keith C. Ivey
ust having their mail eaten and never knowing it wasn't delivered. -- Keith C. Ivey <[EMAIL PROTECTED]> Washington, DC

Re: [SAtalk] Possible rule - excessive punctuation in subject

2003-10-15 Thread Keith C. Ivey
la isn't that good. -- Keith C. Ivey <[EMAIL PROTECTED]> Washington, DC

Re: [SAtalk] Regex Detail (was RE: Popcorn, Backhair, and Weeds)

2003-10-15 Thread Keith C. Ivey
you can write it this way: /[>\s]\w<[-\w\s\$&!]{0,150}>\w\W/ I must admit I'm puzzled about why Larry wants to limit the pattern to having only one letter on each side of the angle-bracketed stuff. -- Keith C. Ivey <[EMAIL PROTECTED]> Washington, DC -
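
The one-letter restriction is easy to see in a sketch. The pattern is transcribed unchanged into Python's `re`; both sample strings are invented for illustration.

```python
import re

# One word character, an angle-bracketed run of up to 150 allowed
# characters, then exactly one more word character and a non-word character.
obfu = re.compile(r"[>\s]\w<[-\w\s\$&!]{0,150}>\w\W")

# Hypothetical samples: single letters split by comments vs. ordinary markup.
split_letters = "Buy V<!-- xyz -->i<!-- q -->agra now"
normal_markup = "Buy Vi<b>agra</b> now"

hit_split = obfu.search(split_letters) is not None
hit_normal = obfu.search(normal_markup) is not None
```

The second sample is not matched because more than one letter sits on each side of the tag, which is the limitation questioned above.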

Re: [SAtalk] Too many rules?

2003-10-15 Thread Keith C. Ivey
table for everyone. I do use some RBLs to refuse mail (with qmail and qpsmtpd and the dnsbl plugin), but others, which I think have more false positives, I just use in SA to increase the spam score. -- Keith C. Ivey <[EMAIL PROTECTED]> Washington, DC

RE: [SAtalk] Popcorn, Backhair, and Weeds

2003-10-14 Thread Keith C. Ivey
n't making any difference. It seems to me that your rule is going to have a fair number of false positives, though. For example, '' often shows up between words with no intervening whitespace, and depending on what's used to produce the HTML I wouldn't be that surprised to

RE: [SAtalk] More HTML Obfuscation: This One Made It Through

2003-10-13 Thread Keith C. Ivey
're not catching that. For example, 'A' instead of being 'A' can be represented as 'A'. You could combine parts of your two regexes to match those. Also, you can have leading 0's in the numbers, so 'A' can be written as 'A' (or 'A&#x
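
The quoted forms in this snippet appear to have been decoded by the archive, so every variant now displays as a plain 'A'. Assuming the message refers to decimal and hexadecimal numeric character references, a sketch of the equivalent spellings might look like this. The pattern `a_ref` is an illustration only and is not one of the regexes from the thread.

```python
import html
import re

# Assumed equivalent spellings of the letter 'A' as numeric references:
# decimal, decimal with a leading zero, hex, upper-case X, hex with zeros.
forms = ["&#65;", "&#065;", "&#x41;", "&#X41;", "&#x0041;"]

# The standard library decodes all of them to the same character.
decoded = [html.unescape(f) for f in forms]

# A single pattern covering both notations, allowing leading zeros.
a_ref = re.compile(r"&#(?:0*65|[xX]0*41);")
all_matched = all(a_ref.fullmatch(f) for f in forms)
```

A rule that looks only for one notation, or that does not allow leading zeros, would miss some of these spellings.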

Re: [SAtalk] More HTML Obfuscation: This One Made It Through

2003-10-13 Thread Keith C. Ivey
ome bugs. Another thing that we should be checking for is stuff like this: > http://ewtajsland.b&# > 105;z/rmp6651/">Visit_to_begin_your_order There's a test for something similar, SPAM_FORM_ACTION, but it needs to be expanded to test for HREFs as well, as for URLs

Re: [SAtalk] AOL addresses

2003-10-13 Thread Keith C. Ivey
4 FAKE_HELO_AOL Host HELO did not match rDNS: > aol.com Can you post the headers from some of those messages? Is your mail server not putting the rDNS into the headers? I'd lower the scores for those tests in local.cf for the time being. -- Keith C. Iv

RE: [SAtalk] Popcorn, Backhair, and Weeds

2003-10-11 Thread Keith C. Ivey
turn, line feed, form feed), followed by 1 to 5 word characters (letters, numbers, and underscores), followed by '<', followed by an optional '/', followed by an optional single whitespace character, followed by 6 to 150 word or whitespace characters,