On Sat, 15 Nov 2003, David B Funk wrote:

> On Fri, 14 Nov 2003, Carl R. Friend wrote:
>
> > For the assembled group -- is it possible to do a DB lookup,
> > either in an eval() or some other mechanism, in a "uri" rule?
> > If we could do a DB lookup on URIs (or, more properly, the
> > domain portion of URIs) I think that'd be a win (at, of course,
> > the expense in human time).
>
> I've been thinking about that exact topic. The Bayes engine
> already parses and tokenizes hostnames from URIs (the UD: tokens).
> If there were a hash DB made with the spam-site hostname as key and
> score,description as value (something like the sendmail access db)
> then it should be pretty easy to take those UD: tokens and do a
> lookup and add results to total score.

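A minimal sketch of the lookup described above, in Python rather than
anything SpamAssassin actually ships. It assumes a table keyed by
hostname whose value is a "score,description" string. The table
contents, the function names and the use of an in-memory dict instead
of an on-disk hash DB are all illustrative assumptions.

```python
from urllib.parse import urlsplit

# Hypothetical table: hostname -> "score,description" (entries are made up).
SPAM_HOSTS = {
    "pills.example": "3.5,Known spamvertised pharmacy site",
    "cheap-loans.example": "2.0,Known spamvertised loan site",
}


def host_of(uri):
    """Return the lower-cased hostname of a URI, or None if it has none."""
    host = urlsplit(uri).hostname
    return host.lower() if host else None


def score_uris(uris, table=SPAM_HOSTS):
    """Add up the scores of every distinct listed hostname in `uris`."""
    total = 0.0
    hits = []
    seen = set()
    for uri in uris:
        host = host_of(uri)
        if host is None or host in seen:
            continue  # no hostname, or this host was already counted
        seen.add(host)
        value = table.get(host)
        if value is None:
            continue  # hostname is not in the table
        # Split only on the first comma so descriptions may contain commas.
        score_text, description = value.split(",", 1)
        total += float(score_text)
        hits.append((host, description))
    return total, hits
```

Each hostname is counted once per message here, so repeating the same
link does not inflate the total. That is a design choice of the sketch,
not something the thread specifies.
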
   As near as I can tell, there's no current way for a rule to
return a score -- they're boolean in nature. That said, even that
would be a win for looking up domains in URIs and whatnot.

> Another advantage is that it would be possible to update the
> database 'hot' (IE without having to kill and restart spamd,
> the way that you have to do to update regex rules).

   Absolutely true, and that's another reason I'm tottering down
this path. The tactic would be dog simple using the "uri" syntax
if that de-mimes and de-quoted-printables the (probably obfuscated)
URL and case-bangs it to all-lower (or upper if that floats one's
boat).

> It might even be possible to automate the updating of the
> database. (take hostnames found by Bayes in spam, do DNS lookup
> and add if IP in spamhaus nets, in trusted DSBLs, has short TTL,
> etc).

   Ideally it should be possible to determine fairly quickly when
a new spammer comes online with a new domain and get them into the
databases with rapidity. Unfortunately, who is going to be willing
to host such a notifier? Once that IP addy gets out, the poor bloke
will get DDoSed out of existence pretty quickly; spammers are
getting known for that.

> I can see one of two different implementations:
> 1) Have the value be just "score,description" and synthesize the
> rule name from the hostname (EG:
>
> 2) have the value be a triple, "name,score,description" and
> explicitly store all attributes:
>
> 1) would be simpler to update and use up less memory,
> 2) would be more flexible and let you combine several different
> sites into one class of rule.

   I was thinking much more simply: if the domain is in the
database of known spamvertised sites it gets a "hit" and a fixed
value -- very simple, and hopefully very fast. If one has two
classes of domain that one dislikes, one could open up two
databases, one having merely disliked sites in it and the other
having really detested sites in it, and score them differently.

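A rough sketch of the simpler scheme, again in Python and again not
SpamAssassin code: each database is just a set of domains, membership
is a boolean hit, and every database carries one fixed score. The
rule names, scores and domains below are invented for illustration.

```python
from urllib.parse import urlsplit

# Hypothetical lists; in practice each would be an on-disk database.
DISLIKED = {"newsletter-blast.example"}
DETESTED = {"pills.example"}

# One fixed score per database (names and values are made up).
TIERS = [
    ("URI_DETESTED_DOMAIN", DETESTED, 4.0),
    ("URI_DISLIKED_DOMAIN", DISLIKED, 1.5),
]


def domain_hits(uri):
    """Return (rule name, fixed score) for every database listing the host."""
    # Lower-case the whole URI first so mixed-case obfuscation cannot dodge
    # the lookup; decoding MIME or quoted-printable is assumed to be done
    # before this point.
    host = urlsplit(uri.strip().lower()).hostname
    if not host:
        return []
    return [(name, score) for name, members, score in TIERS if host in members]
```

Because the lookup is only a membership test, adding or removing a
domain means changing the database and nothing else.
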
> Probably should also add some kind of time-stamp to each entry to
> facilitate automated updating.

   Nice idea. This would also assist in purging stale entries from
a database, but I'm given to understand that that tactic may not
work well any longer as spammers are "retiring" sites once they get
into databases, waiting a period of time for them to expire out of
caches and blacklists, and then opening up with them again.

> Dave Funk                                  University of Iowa
> <dbfunk (at) engineering.uiowa.edu>        College of Engineering
> 319/335-5751   FAX: 319/384-0549           1256 Seamans Center
> Sys_admin/Postmaster/cell_admin            Iowa City, IA 52242-1527

   Cool. Please say, "Hi!" to Doug Jones over in the Computer Dept.
for me if you get a chance. Small world....

+------------------------------------------------+---------------------+
| Carl Richard Friend (UNIX Sysadmin)            | West Boylston       |
| Minicomputer Collector / Enthusiast            | Massachusetts, USA  |
| mailto:[EMAIL PROTECTED]                       +---------------------+
| http://users.rcn.com/crfriend/museum           | ICBM: 42:22N 71:47W |
+------------------------------------------------+---------------------+

_______________________________________________
Spamassassin-talk mailing list
[EMAIL PROTECTED]
https://lists.sourceforge.net/lists/listinfo/spamassassin-talk