> -----Original Message-----
> From: Chris Santerre
--snip--
>
> > One of the big advantages of using a DB type system is that it
> > can be updated 'hot' on a running system. A system based upon
> > parsing a config file and creating an in-memory hash table would
> > require restarting spamd every time an update was made.
> >
> > If we want to have any hope of automating such a system, it
> > needs to be updatable 'hot' (note how Bayes operates).
>
> Again, am I the only one that thinks this should operate like
> an AWL? If it hits a spam threshold, then parse it for URLs.
> Match those against a good URL list so it can't be poisoned.
> Then adjust the score. You got yourself an ABL.

--snip--

> > I envision this working in a couple of possible ways, either
> > updated from a central site (EG the rules emporium) via
> > wget/rsync etc., or by a local engine that would use some kind of
> > heuristics on suspect host names found in potential spam (do DNS
> > lookups, use IPs that point to spammer nets, look at 'whois' data
> > for spammer hosting, look at DNS TTLs, etc.).
>
> I think each machine should handle their own. However it
> would be nice to be able to import files from others into
> your own DB.

Most of you guys get over my head quickly, so I am just adding my $.02 as food for thought.

This sounds a lot like the squidGuard blacklist implementation. You start with a base text file, one each for domains, urls, and regex. It is up to the administrator at each installation to convert them to db files. The squidGuard binary has a command-line option to create a file.db for each blacklist domains and urls file configured (via the conf file). A different command-line option will update the configured db files from a file.diff. Example: start with a urls file, create a urls.db file, and then update it with the urls.diff. The urls.diff file simply prefixes its entries with '+' or '-'. So if you want to share site-specific data, you can supply the .diff file.

What would be beneficial here, though, would be to build the update mechanism so it will apply any file#.diff entry (in the same manner that SA will parse any .cf file for rules) while ignoring existing file.db entries. A rough sketch of what I mean is appended below.

The porn domains list alone has 48,536 entries. My code builds this list fairly slowly using DB_File. BerkeleyDB, which cannot be used with squidGuard, would be similar but only slightly faster. I am not the only one to experience this, but then again, I am not the most efficient coder. squidGuard's build of the databases is extremely quick, but that may be attributed to the C code they developed. Regardless, the squidGuard authors mention in the documentation that even with a 1-million-entry database, the pre-built databases perform only marginally slower than the in-memory-only B-trees.

References:
  http://www.squidguard.org/doc/ (News and changes --> Major news and changes in 1.1.0.beta1)
  http://www.squidguard.org/config/ (The database)

--Larry
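
P.S. For anyone who wants to experiment, here is a rough, untested sketch of the build-then-patch step I am describing, done with DB_File the way my own code does it. The file names (domains, domains.diff, domains.db) are only placeholders for illustration; they are not what squidGuard itself uses, and this is not squidGuard's actual update code.

#!/usr/bin/perl
# Sketch: build domains.db from a plain-text domains list, then apply
# a domains.diff whose entries are prefixed with '+' (add) or '-' (remove).
# File names and the B-tree layout are assumptions for this example only.
use strict;
use warnings;
use DB_File;
use Fcntl qw(O_CREAT O_RDWR);

tie my %db, 'DB_File', 'domains.db', O_CREAT | O_RDWR, 0644, $DB_BTREE
    or die "cannot tie domains.db: $!";

# Initial build from the base text file (one domain per line).
if (open my $base, '<', 'domains') {
    while (my $line = <$base>) {
        chomp $line;
        next unless length $line;
        $db{$line} = 1;
    }
    close $base;
}

# Apply the diff: '+domain' adds an entry, '-domain' deletes one.
if (open my $diff, '<', 'domains.diff') {
    while (my $line = <$diff>) {
        chomp $line;
        if    ($line =~ /^\+(.+)/) { $db{$1} = 1;    }
        elsif ($line =~ /^-(.+)/)  { delete $db{$1}; }
    }
    close $diff;
}

untie %db;

Something like this could be run from cron whenever a fresh .diff arrives, so the db gets patched in place rather than rebuilt from the full list each time.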