Hey there, sorry I haven't responded yet. I was actually figuring out how I would use these cool scripts. There are some GREAT ideas here, but I might be working counter to them. Let me explain how I do this now, and how I think updates would be done. That way we can talk about changing it to get a better regex:
Basically, you see the rules now in alphabetical order. That's because I cat >>'d all my lists together for the last few months, sorted, and ran uniq. My scripts for writing the rules work with two formats: one domain per line, or many domains per line separated by a pipe '|'. So I need whatever scripts we use to be able to deal with that kind of input. I'm going to hack Alex's code to take a list of one domain per line and convert it to X number per line with pipes for me. I did that part by hand!!!! OUCH!

Ok, so I have a huge list of domains, one per line. It INCLUDES the FPs that have been removed from the rules. I want that for reasons I'm about to explain.

Here is how I foresee updates being done. Start with the same process as before: a script pulls out all http: domains from my spam corpus of 5-20 days and strips it down to one domain per line. Now I run a hit-frequency script to see how many times a domain from the new list appears in the old list. I'm interested only in the ones with zero hits. Because the old list still contains the FPs, it also eliminates them from the update. It keeps track that way, so I never have to worry about akamaitech or whatever again. (There's a rough sketch of this step in the P.S. below.)

Now, taking the domains with 0 hits, I form a clean new list. I run the hacked Alex-code script (which I've yet to write) to convert the list to 15 domains per line. Then I run the script that makes the rules and tell it what rule number to start with (179). Poof, it makes the new rules. I cat >> them onto the old rules and we're done. (A sketch of that step is below too.)

However, now they're not in alpha order anymore: the first part of the list is in alpha, then the update is in alpha. Obviously the only way around that is to recreate the entire ruleset from one big list again, which I wanted to shy away from. I'd also have to keep a separate FP list to match against, rather than leaving them in the one list I check.

I'm looking to make this better in any way, while still keeping my sanity :)

Right now, each line of ~15 rules averages 5.75K of memory. That's .38K per domain :)

--Chris
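P.S. Since it might help the discussion, here's a rough, untested Python sketch of the hit-frequency step I described. The filenames and script name are just placeholders; the point is that anything already in the master list (including the FPs I deliberately left in it) gets dropped from the update.

    #!/usr/bin/env python
    # Untested sketch of the hit-frequency step.  Keep only domains from
    # the new pull that have ZERO hits against the master list; since the
    # master list still contains the known FPs, those get dropped here too.
    # Usage (placeholder names):  filter_new.py old_domains.txt new_domains.txt

    import sys

    def load_domains(path):
        # one domain per line; skip blank lines, normalize case
        with open(path) as fh:
            return set(line.strip().lower() for line in fh if line.strip())

    old_list = load_domains(sys.argv[1])   # master list, FPs left in on purpose
    new_list = load_domains(sys.argv[2])   # fresh pull from the spam corpus

    # zero hits against the old list == genuinely new and not a known FP
    for domain in sorted(new_list - old_list):
        print(domain)

Run it as "filter_new.py old_domains.txt new_domains.txt > zero_hits.txt" and the output is the clean new list, one domain per line.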
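P.P.S. And a rough sketch of the other half: joining the surviving domains 15 per line with pipes and writing the rules, numbered upward from a starting rule number (179 in this round). The 15-per-line and the starting number are from what I described above; the rule name "LOCAL_URI_BL", the score, and the exact regex layout are just placeholders, not what the real rule-writing script produces.

    #!/usr/bin/env python
    # Untested sketch of the join-and-emit step: read zero-hit domains
    # (one per line) on stdin, group them 15 to a line joined with '|',
    # and write one uri rule per group, numbered from START_AT.
    # Rule name, score, and regex layout are placeholders.

    import sys

    PER_LINE = 15
    START_AT = 179

    domains = [line.strip() for line in sys.stdin if line.strip()]

    for offset in range(0, len(domains), PER_LINE):
        group = domains[offset:offset + PER_LINE]
        num = START_AT + offset // PER_LINE
        # escape the dots so they match literally in the regex
        pattern = "|".join(d.replace(".", r"\.") for d in group)
        print("uri      LOCAL_URI_BL_%d   /(?:%s)/i" % (num, pattern))
        print("describe LOCAL_URI_BL_%d   URI blacklist batch %d" % (num, num))
        print("score    LOCAL_URI_BL_%d   3.0" % num)

Feed it the output of the filter script above and redirect to a file, then cat >> that onto the existing ruleset like I do now.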