On Wed, 10 Dec 2003 01:44:56 -0500, Bryan Hoover <[EMAIL PROTECTED]>
posted to spamassassin-talk:
 > [EMAIL PROTECTED] wrote:
 >> > /^reply-to:[EMAIL PROTECTED](\.org|\.net)[EMAIL PROTECTED](\.org|\.net)\$/igm
 >> This is probably a sufficient pattern, but one distinguishing feature
 >> in the examples was that the same address would be repeated twice.
 > Think there were instances with two different addresses.

Then you can't use a backref after all.

 >> Also the examples are in the .com domain so the restriction to .org/.net
 >> is wrong.
 > Sure.  Would have to add one for each domain.

Or just forgo the ambition to keep an up-to-date list of all valid
TLDs in the world, and accept anything which looks vaguely like an
email address. It's not exactly likely to get you a large number of
false positives anyhow.

 >> I'd go with simply:
 >> /^Reply-to:\s+(\S+)\s+\1/i
 > I like the \1 -- "backreference" as I've come to know.
 > /(^reply-to:([EMAIL PROTECTED]){2})/i

You are also requiring a space after the second occurrence. You
shouldn't really be grabbing the spaces inside the parentheses anyway,
as simply adding a bit of variation in the spacing would cause the
regex to fail. I'd turn it around like this:

  /^Reply-to:\s*(\w+\@(\w+\.)+\w+)\s+\1/i

Actually I'd probably replace the \w's with something which is better
tuned to match on domain names, as characters such as the dash are valid
in domain names but not included in \w. Also, the examples Robert
posted had <> around them. So here we go again:

  /^Reply-to:\s*(<[-a-z0-9_.]+\@([-a-z0-9_]+\.)+[a-z]+>)\s+\1/i

Underscore is not technically valid in a domain name but you do see
them in practice anyway.
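
For anyone who wants to poke at the tightened pattern outside SA, here is
a minimal sketch in Python rather than Perl. The function name and the
example.com addresses are made up for illustration, and Perl's \@ becomes
a plain @ because Python does not interpolate it.

```python
import re

# Python translation of the tightened rule; group 1 is the whole <address>,
# and \1 requires that exact text to appear again after some whitespace.
STRICT = re.compile(
    r"^Reply-to:\s*(<[-a-z0-9_.]+@([-a-z0-9_]+\.)+[a-z]+>)\s+\1",
    re.IGNORECASE,
)

def repeated_reply_to(header: str) -> bool:
    """Return True if the header repeats the same bracketed address."""
    return STRICT.search(header) is not None

# Same address twice: matches.
print(repeated_reply_to("Reply-To: <joe@example.com> <joe@example.com>"))
# Two different addresses: the backreference fails, so no match.
print(repeated_reply_to("Reply-To: <joe@example.com> <bob@example.com>"))
# A dash in the domain and extra spacing are both tolerated.
print(repeated_reply_to("Reply-To: <joe@mail-host.example.com>   <joe@mail-host.example.com>"))
```

Only the first and third headers should be flagged; the second is the
"two different addresses" case where a backref cannot help.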

I'm not sure this is any better than what I originally posted, as I
haven't tested this properly. My originally proposed rule would be
prone to false positives in case somebody had the same token twice (or
more :-) at the beginning of their Reply-To:, like so:

  Reply-To: Ma Ma Ma Belle <[EMAIL PROTECTED]>
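
The false positive is easy to reproduce. The sketch below, in Python
rather than Perl, runs both the loose (\S+) rule and the stricter
bracketed-address rule against that kind of header; the variable names
and the example.com address are placeholders of mine, not anything from
the real samples.

```python
import re

# The simple rule: any non-space token followed by itself.
LOOSE = re.compile(r"^Reply-to:\s+(\S+)\s+\1", re.IGNORECASE)

# The stricter rule: the repeated token must look like <user@domain>.
STRICT = re.compile(
    r"^Reply-to:\s*(<[-a-z0-9_.]+@([-a-z0-9_]+\.)+[a-z]+>)\s+\1",
    re.IGNORECASE,
)

innocent = "Reply-To: Ma Ma Ma Belle <user@example.com>"
spammy = "Reply-To: <user@example.com> <user@example.com>"

loose_hits = [bool(LOOSE.search(h)) for h in (innocent, spammy)]
strict_hits = [bool(STRICT.search(h)) for h in (innocent, spammy)]

print("loose: ", loose_hits)   # the loose rule fires on both headers
print("strict:", strict_hits)  # the strict rule fires only on the repeated address
```

Note that the loose rule has no anchor after \1 either, so a display name
like "Al Alice" would also trip it, since "Al" is followed by whitespace
and then "Al" again as a prefix.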

Your mileage not included when stirred, etc.

 >> I'm guessing the multi-line appearance was simply due to word wraps in
 >> Robert's mail program, and not actually there in the original headers.

Oh, and even if the header was spread over two lines originally, SA
would have unfolded it back onto a single line before attempting to
match any rules.
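
To show what I mean by getting the header back onto a single line, here
is a rough Python sketch of RFC 2822-style unfolding (drop a line break
that is immediately followed by a space or tab). This is not SA's actual
code, just an illustration; the function name and sample header are
invented.

```python
import re

def unfold(raw_header: str) -> str:
    """Remove line breaks that are followed by whitespace (continuation lines)."""
    return re.sub(r"\r?\n(?=[ \t])", "", raw_header)

# A Reply-To header wrapped onto a continuation line that starts with a tab.
folded = "Reply-To: <joe@example.com>\r\n\t<joe@example.com>"
unfolded = unfold(folded)

print(repr(unfolded))
```

After unfolding, the two addresses are separated only by the tab, so a
rule anchored with a plain ^ and using \s+ between the addresses needs no
multi-line handling.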

/* era */

-- 
The email address era     the contact information   Just for kicks, imagine
at iki dot fi is heavily  link on my home page at   what it's like to get
spam filtered.  If you    <http://www.iki.fi/era/>  500 pieces of spam for
want to reach me, see     instead.                  each wanted message.


