On Mon, Jul 02, 2007 at 01:28:27PM -0700, Jo Rhett wrote:
> Both of these assume I know every person who needs to e-mail me, and
> everything they will send me.  Theo, you're active in enough open
> source projects to know better.
Well, you just said you were receiving a large amount of "system" type
mails, which for me would all be from my own/well defined set of systems.

> Well then we need to alter the code.  While bareword domain matching
> might make sense, it doesn't make sense for
> /a/valid/system/path/file.pl for "file.pl" to be checked.  Zero hits
> on spam corpus.

I think this is definitely a section of SA that could use some work, so ...
Patches welcome. :)

As a start, PerMsgStatus::_get_parsed_uri_list() is the function that
goes through the text looking for hostnames or domains.  It looks for
both schemed URIs (http://.../) and schemeless URIs, which is where
you're getting hit.  Everything else, such as URIDNSBL, keys off of that.

Random thought: URIDNSBL actually has a set of priorities when figuring
out which domains to query.  I wonder if the results would be
better/worse if the rules were based on the source type -- at least HTML
versus parsed, but could also be HTML tag, etc.

-- 
Randomly Selected Tagline:
"G: And are you using Windows or a Mac?
 T: Neither, I'm using Linux.
 G: Oh, you're a power user."    - Theo and his ex-ISP
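
To make the "file.pl" case concrete, here is a rough sketch of the kind of filter being discussed: pick up schemeless host-like tokens, but skip any token that is just the last component of a filesystem path. This is not SpamAssassin code and not how _get_parsed_uri_list() works internally; it is Python purely for illustration, and the pattern and the function name schemeless_candidates are made up for this example. It also assumes schemed URIs (http://.../) are collected by a separate pass, since the "/" check would skip them here.

```python
import re

# Hypothetical, heavily simplified pattern for a bare hostname such as
# "example.com". A real matcher would check the TLD against a known list.
SCHEMELESS = re.compile(
    r"(?<![\w.-])((?:[a-z0-9-]+\.)+[a-z]{2,})(?![\w.-])",
    re.IGNORECASE,
)


def schemeless_candidates(text):
    """Return host-like tokens that are not the tail of a path."""
    found = []
    for match in SCHEMELESS.finditer(text):
        start = match.start(1)
        # "file.pl" in "/a/valid/system/path/file.pl" is preceded by "/",
        # so treat it as a filename rather than a domain and skip it.
        if start > 0 and text[start - 1] == "/":
            continue
        found.append(match.group(1).lower())
    return found


if __name__ == "__main__":
    sample = "see /a/valid/system/path/file.pl or visit example.com today"
    print(schemeless_candidates(sample))
```

The trade-off is the obvious one: a bare "file.pl" with no leading path still matches, because .pl is a valid TLD, so this only removes the full-path case complained about above.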