On Wed, 10 Dec 2003 01:44:56 -0500, Bryan Hoover <[EMAIL PROTECTED]>
posted to spamassassin-talk:
> [EMAIL PROTECTED] wrote:
>> > /^reply-to:[EMAIL PROTECTED](\.org|\.net)[EMAIL PROTECTED](\.org|\.net)\$/igm
>> This is probably a sufficient pattern, but one distinguishing feature
>> in the examples was that the same address would be repeated twice.
> Think there were instances with two different addresses.

Then you can't use a backref after all.

>> Also the examples are in the .com domain so the restriction to .org/.net
>> is wrong.
> Sure.  Would have to add one for each domain.

Or just forego the ambition to keep an up-to-date list of all valid
TLDs in the world, and accept anything which looks vaguely like an
email address.  It's not exactly likely to get you a large number of
false positives anyhow.

>> I'd go with simply:
>>   /^Reply-to:\s+(\S+)\s+\1/i
> I like the \1 -- "backreference" as I've come to know.
>   /(^reply-to:([EMAIL PROTECTED]){2})/i

You are also requiring a space after the second occurrence.  You
shouldn't really be grabbing the spaces inside the parentheses anyway
as simply adding a bit of variation in the spaces would cause the
regex to fail.  I'd turn it around like this:

  /^Reply-to:\s*(\w+\@(\w+\.)+\w+)\s+\1/i

Actually I'd probably replace the \w:s with something which is better
tuned to match on domain names, as characters such as dash are valid
in domain names but not included in \w.  Also the examples Robert
posted had <>s around them.  So here we go again:

  /^Reply-to:\s*(<[-a-z0-9_.]+\@([-a-z0-9_]+\.)+[a-z]+>)\s+\1/i

Underscore is not technically valid in a domain name but you do see
them in practice anyway.

I'm not sure this is any better than what I originally posted, as I
haven't tested this properly.  My originally proposed rule would be
prone to false positives in case somebody had the same token twice
(or more :-) at the beginning of their Reply-To:, like so:

  Reply-To: Ma Ma Ma Belle <[EMAIL PROTECTED]>

Your mileage not included when stirred, etc.

>> I'm guessing the multi-line appearance was simply due to word wraps in
>> Robert's mail program, and not actually there in the original headers.

Oh, and even if the header was spread over two lines originally, SA
would have folded it back onto a single line before attempting to
match any rules.
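
For illustration only, the three patterns above can be compared side by side. The sketch below uses Python's re module rather than Perl (the syntax for these particular constructs is the same), and it is not how SpamAssassin itself evaluates rules. The sample headers, the example.* addresses, and the unfold() and hits() helpers are all made up for this sketch; unfold() is only a rough stand-in for the header unfolding SA performs.

```python
import re

# Patterns copied from the message; Perl's /i flag becomes re.IGNORECASE.
LOOSE = re.compile(r'^Reply-to:\s+(\S+)\s+\1', re.I)
WORD = re.compile(r'^Reply-to:\s*(\w+\@(\w+\.)+\w+)\s+\1', re.I)
STRICT = re.compile(
    r'^Reply-to:\s*(<[-a-z0-9_.]+\@([-a-z0-9_]+\.)+[a-z]+>)\s+\1', re.I)


def unfold(raw_header):
    """Hypothetical helper: join a folded header onto a single line.

    Assumption: continuation lines start with a space or tab. This only
    approximates what SpamAssassin does before matching rules.
    """
    return re.sub(r'\r?\n[ \t]+', ' ', raw_header)


def hits(pattern, header):
    """Return True if the pattern matches the unfolded header line."""
    return pattern.search(unfold(header)) is not None


# Made-up sample headers; none of these come from real mail.
SAME_TWICE = 'Reply-To: <spammer@example.com> <spammer@example.com>'
TWO_DIFFERENT = 'Reply-To: <one@example.com> <two@example.com>'
MA_MA = 'Reply-To: Ma Ma Ma Belle <belle@example.org>'
DASHED_BARE = 'Reply-To: bulk@mail-out.example.net bulk@mail-out.example.net'
DASHED = 'Reply-To: <bulk@mail-out.example.net>  <bulk@mail-out.example.net>'
FOLDED = 'Reply-To: <spammer@example.com>\n\t<spammer@example.com>'

SAMPLES = (SAME_TWICE, TWO_DIFFERENT, MA_MA, DASHED_BARE, DASHED, FOLDED)

if __name__ == '__main__':
    for name, pattern in (('loose', LOOSE), ('word', WORD), ('strict', STRICT)):
        for header in SAMPLES:
            print(f'{name:6} {hits(pattern, header)!s:5} {header!r}')
```

On these samples the loose pattern fires on the "Ma Ma Ma Belle" header, the \w version misses the dashed domain, and none of the three fires when the two addresses differ, which is the backref limitation noted at the top of this reply.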

/* era */

-- 
The email address era     the contact information   Just for kicks, imagine
at iki dot fi is heavily  link on my home page at   what it's like to get
spam filtered.  If you    <http://www.iki.fi/era/>  500 pieces of spam for
want to reach me, see     instead.                  each wanted message.