On Sat, 15 Nov 2003, David B Funk wrote:

> On Fri, 14 Nov 2003, Carl R. Friend wrote:
> 
> >    For the assembled group -- is it possible to do a DB lookup,
> > either in an eval() or some other mechanism, in a "uri" rule?
> > If we could do a DB lookup on URIs (or, more properly, the
> > domain portion of URIs) I think that'd be a win (at, of course,
> > the expense in human time).
> >
> 
> I've been thinking about that exact topic. The Bayes engine
> already parses and tokenizes hostnames from URIs (the UD: tokens).
> If there were a hash DB made with the spam-site hostname as key and
> score,description as value (something like the sendmail access db)
> then it should be pretty easy to take those UD: tokens and do a
> lookup and add results to total score.

   As near as I can tell, there's no current way for a rule
to return a score -- they're boolean in nature.  That said,
even that would be a win for looking up domains in URIs and
whatnot.
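A minimal sketch of the lookup Dave describes. It is written in Python rather than SpamAssassin's own Perl. It assumes a dbm hash file keyed by lower-cased hostname, with "score,description" values. The function names, the file name, and the walk up through parent domains are illustrative choices, not existing SpamAssassin features. It reports both a boolean hit, which is all a rule can express today, and a summed score, which is what Dave's proposal would add. Because the file is reopened on every call, entries written by a separate updater (once it has closed or synced the file) are picked up without restarting anything:

```python
import dbm.dumb
import os
import tempfile


def candidate_keys(host):
    """Yield the host, then its parent domains, stopping before the bare TLD.

    e.g. "a.b.example" -> "a.b.example", "b.example"
    """
    labels = host.lower().rstrip(".").split(".")
    for i in range(max(1, len(labels) - 1)):
        yield ".".join(labels[i:])


def lookup_hosts(db_path, hosts):
    """Look up hostnames (such as those behind Bayes' UD: tokens) in a dbm file.

    Returns a list of (host, score, description) for every host that hits.
    Values are assumed to be stored as "score,description".
    """
    hits = []
    with dbm.dumb.open(db_path, "r") as db:  # reopened per call: "hot" updates
        for host in hosts:
            for key in candidate_keys(host):
                if key in db:
                    score, desc = db[key].decode().split(",", 1)
                    hits.append((host, float(score), desc))
                    break  # most specific match wins
    return hits


with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "spamsites")  # hypothetical file name
    with dbm.dumb.open(path, "c") as db:
        db["spam-site.example"] = "3.0,known spamvertised site"
    hits = lookup_hosts(path, ["WWW.Spam-Site.example", "good.example"])
    print(bool(hits), sum(score for _, score, _ in hits))  # True 3.0
```

A boolean rule would only use bool(hits); Dave's variant would add the summed scores to the message total.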

> Another advantage is that it would be possible to update the
> database 'hot' (i.e. without having to kill and restart spamd,
> the way you have to in order to update regex rules).

   Absolutely true, and that's another reason I'm tottering
down this path.  The tactic would be dog simple using the
"uri" syntax if that de-mimes and de-quoted-printables the
(probably obfuscated) URL and case-bangs it to all-lower (or
upper if that floats one's boat).
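The normalisation described above can be sketched with Python's standard email module, whose get_payload(decode=True) undoes base64 and quoted-printable transfer encodings, together with urllib.parse, whose hostname attribute is already lower-cased. The URL regex and the uri_hosts name are assumptions for this sketch. The regex is deliberately simple and will miss URLs that are obfuscated in other ways (HTML entities, %-escapes and so on):

```python
import email
import re
from urllib.parse import urlsplit

# Deliberately simple: scheme, then everything up to whitespace, a quote, or <>.
URL_RE = re.compile(r"https?://[^\s\"'<>]+", re.IGNORECASE)


def uri_hosts(raw_message):
    """Return the lower-cased hostnames of URLs in the text parts of a message."""
    msg = email.message_from_bytes(raw_message)
    hosts = []
    for part in msg.walk():
        if part.get_content_maintype() != "text":
            continue  # skip multipart containers and attachments
        payload = part.get_payload(decode=True) or b""  # undoes QP / base64
        text = payload.decode("latin-1")
        for url in URL_RE.findall(text):
            host = urlsplit(url).hostname  # urllib lower-cases this for us
            if host:
                hosts.append(host)
    return hosts


raw = (b"Content-Type: text/html\n"
       b"Content-Transfer-Encoding: quoted-printable\n"
       b"\n"
       b'Visit <a href=3D"http://WWW.Spam-=\nSite.Example/buy">now</a>\n')
print(uri_hosts(raw))  # ['www.spam-site.example']
```

Note that the soft line break ("=" at end of line) splitting the hostname is rejoined before the URL is matched.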

> It might even be possible to automate the updating of the
> database. (take hostnames found by Bayes in spam, do DNS lookup
> and add if IP in spamhaus nets, in trusted DSBLs, has short TTL,
> etc).

   Ideally it should be possible to determine fairly quickly
when a new spammer comes online with a new domain and to get
that domain into the databases promptly.  Unfortunately, who is
going to be willing to host such a notifier?  Once that IP addy
gets out, the poor bloke will get DDoSed out of existence pretty
quickly; spammers are getting known for that.
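Dave's automated feed could start from a check like the one below: given the addresses that a DNS lookup returned for a hostname seen in spam, decide whether any of them fall inside listed networks. The network list is a hypothetical stand-in (RFC 5737 documentation ranges, not real Spamhaus data), is_listed is an illustrative name, and the DNS lookup itself and the short-TTL test are left out of this sketch:

```python
import ipaddress

# Hypothetical stand-in for networks taken from a blocklist such as
# Spamhaus; these are documentation ranges, not real listings.
LISTED_NETS = [ipaddress.ip_network(n) for n in ("192.0.2.0/24", "198.51.100.0/24")]


def is_listed(resolved_ips, listed_nets=LISTED_NETS):
    """True if any address a hostname resolved to lies inside a listed network.

    IPv4 addresses never match IPv6 networks (and vice versa); the
    ipaddress module simply reports False for mixed versions.
    """
    for ip in resolved_ips:
        addr = ipaddress.ip_address(ip)
        if any(addr in net for net in listed_nets):
            return True
    return False


# Hostnames harvested from spam, with the addresses DNS gave back for them.
candidates = {"spam-site.example": ["192.0.2.17"], "news.example": ["203.0.113.5"]}
to_add = [host for host, ips in candidates.items() if is_listed(ips)]
print(to_add)  # ['spam-site.example']
```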

> I can see one of two different implementations:
> 1) Have the value be just "score,description" and synthesize the
> rule name from the hostname (EG:
> 
> 2) have the value be a triple, "name,score,description" and
> explicitly store all attributes:
> 
> 1) would be simpler to update and use up less memory,
> 2) would be more flexible and let you combine several different
> sites into one class of rule.
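The two value layouts can be compared with a small parser for each. Dave's example of a synthesized rule name is cut off above, so the URI_DB_<HOST> naming used here is purely a hypothetical choice, as are the function names. Splitting with a maximum count lets descriptions contain commas. The last helper shows the flexibility of layout 2: several hosts sharing one rule name collapse into a single hit:

```python
import re
from typing import NamedTuple


class UriRule(NamedTuple):
    name: str
    score: float
    description: str


def parse_pair(host, value):
    """Layout 1: value is "score,description"; the rule name is derived from the host."""
    score, desc = value.split(",", 1)
    name = "URI_DB_" + re.sub(r"[^A-Z0-9]", "_", host.upper())  # hypothetical scheme
    return UriRule(name, float(score), desc)


def parse_triple(value):
    """Layout 2: value is "name,score,description", stored explicitly."""
    name, score, desc = value.split(",", 2)
    return UriRule(name, float(score), desc)


def fired_rules(rules):
    """Collapse hits so each rule name counts once, however many hosts matched it."""
    return {r.name: r.score for r in rules}


print(parse_pair("spam-site.example", "3.0,spamvertised site"))
# UriRule(name='URI_DB_SPAM_SITE_EXAMPLE', score=3.0, description='spamvertised site')
hits = [parse_triple("PHARMA_SITES,4.5,pharmacy spam, various hosts"),
        parse_triple("PHARMA_SITES,4.5,pharmacy spam, various hosts")]
print(fired_rules(hits))  # {'PHARMA_SITES': 4.5}
```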

   I was thinking much more simply: if the domain is in the
database of known spamvertised sites it gets a "hit" and a fixed
value -- very simple, and hopefully very fast.  If one has two
classes of domain that one dislikes, one could open up two
databases, one having merely disliked sites in it and the other
having really detested sites in it, and score them differently.
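Carl's simpler scheme amounts to one boolean rule per database, each with a fixed score. In this sketch each "database" is just a Python set of domains (in practice it could be a dbm file with the domain as key). The rule names and the scores 1.0 and 4.0 are hypothetical, chosen only for illustration. Each rule fires at most once, however many matching URIs a message contains:

```python
# Each "database" is a set of already-normalised domains here; the rule
# names and fixed scores are hypothetical.
DATABASES = [
    ("URI_DISLIKED", 1.0, {"meh-deals.example"}),
    ("URI_DETESTED", 4.0, {"spam-site.example", "pills.example"}),
]


def score_hosts(hosts, databases=DATABASES):
    """Return {rule: fixed score} for every database containing any of the hosts."""
    fired = {}
    for rule, score, domains in databases:
        if any(h.lower() in domains for h in hosts):
            fired[rule] = score
    return fired


result = score_hosts(["meh-deals.example", "PILLS.example"])
print(result, sum(result.values()))  # {'URI_DISLIKED': 1.0, 'URI_DETESTED': 4.0} 5.0
```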

> Probably should also add some kind of time-stamp to each entry to
> facilitate automated updating.

   Nice idea.  This would also assist in purging stale entries
from a database, but I'm given to understand that that tactic
may not work well any longer as spammers are "retiring" sites
once they get into databases, waiting a period of time for them
to expire out of caches and blacklists, and then opening up
with them again.
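Dave's time-stamp idea could look roughly like this: each entry records when the host was last seen in spam, sightings refresh the stamp, and a purge drops entries older than some maximum age. A plain dict stands in for the hash DB, and record and purge are hypothetical names. Given the domain-recycling tactic Carl mentions, a generous max_age (or archiving rather than deleting purged entries) is probably the safer choice:

```python
import time


def record(db, host, score, desc, now=None):
    """Add or refresh an entry, stamping it with the time it was last seen in spam."""
    db[host] = (score, desc, time.time() if now is None else now)


def purge(db, max_age, now=None):
    """Delete entries not seen for more than max_age seconds; return the removed hosts."""
    now = time.time() if now is None else now
    stale = [h for h, (_, _, seen) in db.items() if now - seen > max_age]
    for h in stale:
        del db[h]
    return stale


DAY = 86400
db = {}
record(db, "old.example", 3.0, "spam", now=0)
record(db, "new.example", 3.0, "spam", now=90 * DAY)
print(purge(db, max_age=60 * DAY, now=100 * DAY))  # ['old.example']
print(list(db))  # ['new.example']
```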

> Dave Funk                                  University of Iowa
> <dbfunk (at) engineering.uiowa.edu>        College of Engineering
> 319/335-5751   FAX: 319/384-0549           1256 Seamans Center
> Sys_admin/Postmaster/cell_admin            Iowa City, IA 52242-1527

   Cool.  Please say, "Hi!" to Doug Jones over in the Computer Dept.
for me if you get a chance.  Small world....

+------------------------------------------------+---------------------+
| Carl Richard Friend (UNIX Sysadmin)            | West Boylston       |
| Minicomputer Collector / Enthusiast            | Massachusetts, USA  |
| mailto:[EMAIL PROTECTED]                        +---------------------+
| http://users.rcn.com/crfriend/museum           | ICBM: 42:22N 71:47W |
+------------------------------------------------+---------------------+



_______________________________________________
Spamassassin-talk mailing list
[EMAIL PROTECTED]
https://lists.sourceforge.net/lists/listinfo/spamassassin-talk
