-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

Hello Matt, Chris,

Monday, August 4, 2003, 11:03:04 PM, Matt wrote:

MK> At 09:45 PM 8/4/03 -0700, Robert Menschel wrote:
>>uri       L_u_time4more  /time4more\.net/i
>>describe  L_u_time4more  Body text references known spammer
>>score     L_u_time4more  9.00  # graphics-only spam Aug 4 03

MK> Personally, I tend to not go over 4.0, even on a sure-fire spam rule.
MK> This is mostly as a result of accepting the general spamassassin
MK> philosophy that any single rule shouldn't be enough. (with the
MK> exception of things like GTUBE)

I understand, but then I also firmly believe and accept the philosophy
that there's an exception to every rule. :-)  The philosophy of not
flagging spam with a single rule has two exceptions: 1) the blacklist,
and 2) when the spammer gives no other spam clues.

An extreme example of the latter would be those spam which contain
nothing but a single URL, itself not clearly spam, with no spamsign in
subject or other headers.

MK> Admittedly this isn't very likely to false positive, however rather
MK> than creating one rule worth 9, if at all possible I tend to create a
MK> handful of rules for the same spam which total 6-9.

I normally aim for the same. Since most false negatives get 50-75% of the
way to my required hits, I frequently need to just add 1-2 points to get
the email to score as spam.

MK> I would also try to improve the rule by framing it with \b's, or at
MK> least starting it with one.

MK> /\btime4more\.net\b/

That's a good enhancement, which I'm adding to my rule. This should match
"go to time4more.net." as well as http://www.time4more.net and
http://www.time4more.net/links/spampage.html
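
For the record, the revised rule would look something like this (keeping
my original score for now):

uri       L_u_time4more  /\btime4more\.net\b/i
describe  L_u_time4more  Body text references known spammer
score     L_u_time4more  9.00  # graphics-only spam Aug 4 03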

Tuesday, August 5, 2003, 6:53:35 AM, Chris wrote:

CS> This is exactly what MY_EVIL does and works great. My tips page talks
CS> about how you should mark these as Temp_My_EVIL, as they will
CS> eventually expire.

Good point.

CS> For those who may not see it, these are not the spam senders'
CS> domains, but the domains of the image hosts, often owned by spammers.
CS> Therefore the list is ever changing, like an RBL. So submissions of
CS> these to the Rule Emporium would be too lengthy. You would almost
CS> have to have an RBL for the rule :)

We could, however, set up a blacklist through a website, such that
anyone can submit an entry: a simple domain name such as time4more.net,
an IP address if that's the reference in the spam, or a more specific
URI (spaml3.time4more.net/spamdir or 123.234.56.78/spamdir). The web
system would track submissions and create a ruleset from them.

Initial score on first submission would be 0.1, with score increasing
perhaps to 1.0 as additional submissions/reports come in. We could also
have password-authorized trusted submitters, whose submissions would
score higher (allowing scores to get up to 2.5 perhaps).

Perhaps these scores would be doubled for those systems not using DNSBLs?

The system would then dump these scores into an ASCII file that could be
retrieved by anonymous FTP. This file could be stored as auto-uribl.cf
for those who can have multiple local.cf files, and could be
automatically added to the user_prefs file for people like me who are
limited to the user_prefs file. (Such rules wouldn't do any good unless
you use a system like mine that calls SA a second time.)
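
To illustrate, such an auto-uribl.cf might contain entries along these
lines (rule names and report counts are invented for the example; the
scores follow the tiers described above):

uri       AUTO_URIBL_TIME4MORE  /\btime4more\.net\b/i
describe  AUTO_URIBL_TIME4MORE  auto-uribl: 4 anonymous reports
score     AUTO_URIBL_TIME4MORE  0.7

uri       AUTO_URIBL_SPAMDIR    /123\.234\.56\.78\/spamdir/
describe  AUTO_URIBL_SPAMDIR    auto-uribl: trusted submitter report
score     AUTO_URIBL_SPAMDIR    2.5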

CS> This type of rule can also be combined with others. There is almost
CS> no chance of time4more.net showing up in a message at the same time
CS> as tastemysalad.com, so it is easier to combine.

Agreed. The only concern would be readability / editability. Rules which
get too long bother me from an esthetic perspective. (That wouldn't apply
if we develop an automated system.)

Since the rules are temporary, perhaps it'd be good to name them
something like L_u_Tmp_AugW1 (rule added first week in August). When we
then review the rules, we know to check in Sept and Oct whether this rule
has been superseded by a DNSBL. We know to check in Nov and Dec whether
the domains in this rule are no longer being used. We know in Jan and
Feb that if the rule is still active and beneficial, maybe we should
remove the Tmp flag.
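
For example (score illustrative, with tastemysalad.com borrowed from
Chris's example):

uri       L_u_Tmp_AugW1  /\b(?:time4more\.net|tastemysalad\.com)\b/i
describe  L_u_Tmp_AugW1  Temp: spammer image-host domains, added Aug week 1
score     L_u_Tmp_AugW1  2.0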


>>header    L_s_CorelWPOffice  Subject =~ /(?:Corel|WordPerfect).{1,15}Office/i

MK> More \b action, on general principle, although not strictly needed.

Agreed. Thanks.
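
Revised, it would be something like:

header    L_s_CorelWPOffice  Subject =~ /\b(?:Corel|WordPerfect)\b.{1,15}\bOffice\b/i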

CS> Yeah, I have the Norton SystemWorks rule like this. If you don't use
CS> WP Office, then by all means make a rule. But an ISP would shy away
CS> from this one.

Actually, we DO use WP Office. And we frequently share files from WP
Office. But we don't refer to WP Office as such in subject headings. Just
like we don't name each other in subject headings either.

As for an ISP, I would think it's still a valid rule; they'd just need to
be careful to score it low enough to be incremental rather than
definitional.

>>header    L_hr_lattelekom  Received =~ /lattelekom\.net/

MK> Seems fine, although a bit of a duplication of effort with DNSBL's..
MK> have you enabled them?

DNSBLs are enabled by my host. I wouldn't be without them.

This was a spam that didn't score from them -- apparently it's too new a
pathway. This should probably be given a temporary name/flag, and removed
once the DNSBLs catch up.
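
Renamed along those lines, it might look like this (score illustrative):

header    L_hr_Tmp_lattelekom  Received =~ /lattelekom\.net/
describe  L_hr_Tmp_lattelekom  Temp: spam pathway not yet listed in DNSBLs
score     L_hr_Tmp_lattelekom  2.0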

CS> Hmmm.....this is interesting. This would help me greatly if I listed
CS> IPs. My blocked IP access list has stopped a few legit emails that
CS> I've had to fix. However, if I had SA read in that list and simply
CS> score some points for matches, it would be less painful on FPs.

As an end user, with no access to procmail or similar, SA is my only
method of providing a fixed access list, and it works well for me.

Perhaps a web site like the one theorized above could provide a set of
Received header rules, by domain name and/or IP address, which indicate
spam, with the same scoring considerations applying (so provisional or
wrong submissions don't cause false positives).
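
As a rough sketch of what such entries might look like (reusing the
example IP address from above, with a deliberately modest score so the
rule stays incremental rather than definitional):

header    L_hr_BlockedIP_01  Received =~ /\[123\.234\.56\.78\]/
describe  L_hr_BlockedIP_01  Relay on my blocked-IP list
score     L_hr_BlockedIP_01  1.0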

CS> This comes down to the same problem of SA and my lack of Perl: having
CS> an SA function that reads a text file and looks for matches in the
CS> email, such as a list of domains or IPs. There was mention that 2.6
CS> might have some sort of eval like this. That would be sweet.

My understanding is that if you can create/update your local.cf, you also
have the ability to have multiple *.cf files in that directory, and all
of them will be used. So you can have a relays.cf file which you replace
daily or weekly, and it takes effect whenever spamd is restarted or SA is
run manually.

As for people like me with user_prefs files, if we can get rules
activated as in my system, it's simple enough to keep multiple *.cf files
in our $HOME/.spamassassin directory, which are normally invisible to SA,
and then "cat *.cf >user_prefs" on a daily basis to do the same type of
updating.

I am considering doing something like that to automate the implementation
of William Stearns' blacklist collection.

CS> I think your method has some potential. But most of the spam I see
CS> fake the domain names and come right from an open relay.

Can they be identified? If we discard the domain names, are there
reliable IP addresses which identify the open relays? If so, the same
idea should work with those.

Thanks to both of you for your ideas.

(Anyone else?)

Bob Menschel

-----BEGIN PGP SIGNATURE-----
Version: PGP 8.0

iQA/AwUBPzCGk5ebK8E4qh1HEQJh6ACfSmGnQ1I8gOzM/B229ch9B2ZBqdoAnizB
cl/71T7NtScCYIeHYJacoCkd
=6n1t
-----END PGP SIGNATURE-----



