> -----Original Message-----
> From: Chris Santerre
--snip--
>
> > One of the big advantages of using a DB type system is that it
> > can be updated 'hot' on a running system. A system based upon
> > parsing a config file and creating an in-memory hash table would
> > require restarting spamd every time an update was made.
> >
> > If we want to have any hope of automating such a system, it
> > needs to be updatable 'hot' (note how Bayes operates).
>
> Again, am I the only one that thinks this should operate like
> an AWL? If it hits a spam threshold, then parse it for URLs.
> Match those against a good URL list so it can't be poisoned.
> Then adjust the score. You got yourself an ABL.

--snip--

> > I envision this working in a couple of possible ways, either
> > updated from a central site (EG the rules emporium) via
> > wget/rsync etc., or by a local engine that would use some kind of
> > heuristics on suspect host names found in potential spam (do DNS
> > lookups, use IPs that point to spammer nets, look at 'whois' data
> > for spammer hosting, look at DNS TTLs, etc.).
>
> I think each machine should handle their own. However it
> would be nice to be able to import files from others into
> your own DB.

Most of you guys get over my head quickly, so I am just adding my $.02 as food for thought.

This sounds a lot like the squidGuard blacklist implementation. You start with a base text file, one each for domains, urls, and regex. It is up to the administrator at each installation to convert them to db files. The squidGuard binary has a command-line option to create a file.db for each blacklist domains and urls file configured (via the conf file). A different command-line option will update the configured db files from a file.diff. Example: start with a urls file, create a urls.db file, and then update it with the urls.diff. The urls.diff file simply prefixes its entries with '+' or '-'. So if you want to share site-specific data, you can supply the .diff file.

What would be beneficial here, though, would be to build the update mechanism so it will apply any file#.diff entry (in the same manner that SA will parse any .cf file for rules) while ignoring existing file.db entries. A rough sketch of what I mean is appended below.

The porn domains list alone has 48,536 entries. My code builds this list fairly slowly using DB_File. BerkeleyDB, which cannot be used with squidGuard, would be similar but only slightly faster. I am not the only one to experience this, but then again, I am not the most efficient coder. squidGuard's build of the databases is extremely quick, but that may be attributed to the C code they developed. Regardless, the squidGuard authors mention in the documentation that even with a 1-million-entry database, the pre-built databases perform only marginally slower than the in-memory-only B-trees.

References:
  http://www.squidguard.org/doc/ (News and changes --> Major news and changes in 1.1.0.beta1)
  http://www.squidguard.org/config/ (The database)

--Larry
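
P.S. For anyone who wants to experiment, here is a rough, untested sketch of the build-then-patch step I am describing, done with DB_File the way my own code does it. The file names (domains, domains.diff, domains.db) are only placeholders for illustration; they are not what squidGuard itself uses, and this is not squidGuard's actual update code.

#!/usr/bin/perl
# Sketch: build domains.db from a plain-text domains list, then apply
# a domains.diff whose entries are prefixed with '+' (add) or '-' (remove).
# File names and the B-tree layout are assumptions for this example only.
use strict;
use warnings;
use DB_File;
use Fcntl qw(O_CREAT O_RDWR);

tie my %db, 'DB_File', 'domains.db', O_CREAT | O_RDWR, 0644, $DB_BTREE
    or die "cannot tie domains.db: $!";

# Initial build from the base text file (one domain per line).
if (open my $base, '<', 'domains') {
    while (my $line = <$base>) {
        chomp $line;
        next unless length $line;
        $db{$line} = 1;
    }
    close $base;
}

# Apply the diff: '+domain' adds an entry, '-domain' deletes one.
if (open my $diff, '<', 'domains.diff') {
    while (my $line = <$diff>) {
        chomp $line;
        if    ($line =~ /^\+(.+)/) { $db{$1} = 1;    }
        elsif ($line =~ /^-(.+)/)  { delete $db{$1}; }
    }
    close $diff;
}

untie %db;

Something like this could be run from cron whenever a fresh .diff arrives, so the db gets patched in place rather than rebuilt from the full list each time.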