On Sat, Jul 04, 2009 at 11:53:27AM +0300, Jari Fredriksson wrote:
> > Hello,
> >
> > I currently get several thousand shop/meds/pill/gen spams a day, and
> > since some of them get through my filters, I move them to my spam folder
> > manually and feed them to "sa-learn --spam". But this does not work...
> >
> > ...because the spammer's From: address is in the auto_whitelist.
> >
> > To me this seems to be a bug, because sa-learn should remove the From:
> > address from the auto_whitelist and then RESCAN this crap.
> >
> > Over the last two days I have uncompressed the spam archives from the last
> > 27 weeks (from this year), used "formail" to extract all From: addresses,
> > deduplicated them and used
> >
> > for FROM in ${LIST} ; do
> >   spamassassin --remove-addr-from-whitelist=${FROM}
> > done
> >
> > which took over 52 hours for 487000 addresses. Hell, I have a super fast
> > machine with 15000 RPM SCSI drives and 32 GByte of memory. That is 2.6
> > addresses per second...
You are loading a big perl program for every single address; what do you
expect? ;) You should edit the database directly. If you are not using SQL,
it's a bit trickier; you could modify trim_whitelist to do it, etc.

> Do you have SQL-based AWL? If not, it might be worth considering,
> given your amounts of email.
>
> With SQL
>
> for FROM in ${LIST} ; do
>   mysql -u spamassassin -psecret spamassassin <<EOF
> delete from awl where email='${FROM}' ;
> EOF
> done
>
> Should be MUCH faster.

It's possible that $FROM contains quote characters, so that should be
handled. It's always good practice, even though I doubt any of these
addresses contain SQL injections..

Also, you could just write all the SQL statements into a file first and
then run that file with a single mysql invocation. That avoids the same
pitfall as above (starting a new process for every address), though on a
smaller scale. ;)
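A minimal sketch of that last suggestion, assuming the SQL AWL table is
named awl with an email column (as in the query above), one bare address
per line in the input list, and MySQL's default sql_mode (i.e.
NO_BACKSLASH_ESCAPES is off, so backslash is an escape character). The
function name awl_delete_sql and the file names are made up for
illustration. It escapes backslashes and single quotes and turns every
address into a DELETE statement in one sed pass:

```shell
# Hypothetical helper: read addresses (one per line) on stdin and print
# one DELETE statement per address for the SQL AWL table "awl".
# Assumes MySQL's default sql_mode (backslash is an escape character).
awl_delete_sql() {
    # 1. drop blank lines
    # 2. escape backslashes first (\ -> \\)
    # 3. then double single quotes (' -> '')
    # 4. wrap the escaped address in a DELETE statement
    sed -e '/^[[:space:]]*$/d' \
        -e 's/\\/\\\\/g' \
        -e "s/'/''/g" \
        -e "s/.*/DELETE FROM awl WHERE email='&';/"
}

# Example usage (file names are placeholders):
#   awl_delete_sql < from-list.txt > delete-awl.sql
#   mysql -u spamassassin -p spamassassin < delete-awl.sql
```

This way sed and mysql each start once for the whole list instead of once
per address, which is the point of writing the statements to a file first.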