On Sun, Jul 25, 2010 at 04:43:36AM -0500, Jared Johnson wrote: > The plugin has the following advantages over the original:
Based on a single read-through of the code, I like most of it. Some assorted observations, though: - Parsing out the MIME sections is worthwhile, though it's really something that ought to be done in the core libs when plugins indicate they need it, or in a utility plugin if that's practical. This was something I meant to bring up a while back -- we have quite a few different content-scanning plugins, and if they're all obliged to invoke MIME::Parser repeatedly it'll be even more costly than that module already is. Not that I have time to do it myself, though. :P - For messages below a few hundred kb or so, consider MIME::Parser->output_to_core(), so we can avoid the fs churn. Annoyingly, MIME::Parser doesn't provide a mechanism to only consider particular mimetypes, or we could skip decoding non-text bodies. - It looks like you're scanning inside all parts of multipart MIME, where doing so only on text parts might be preferable. - The config really needs to live in a config file -- changing or expanding the format is fine if needed, but config that lives in code can't really be edited. I'm not bothered by having per-list defaults in the plugin, but one should be able to fully add a new list or alter an existing one without changing Perl code. - In the unwinding of HTML entity escaping -- I suggest doing & last. ">" is ">", not ">". > the SURBL two-level and three-level lists -- but the SURBL lists are > pruned to exclude stuff that we're pretty sure is never going to actually > get listed -- when was the last time you saw a spammer use .mil? > uribl.com never has, according to our datafeed :) We'll probably see compromised .gov and .mil sites hosting malware, much as we see in other domains. They're not that useful for classical spammer purposes since they can't be registered in the usual fashion. Devin -- Devin \ aqua(at)devin.com, IRC:Requiem; http://www.devin.com Carraway \ 1024D/E9ABFCD2: 13E7 199E DD1E 65F0 8905 2E43 5395 CA0D E9AB FCD2