On 1/30/06, Theo Van Dinter <[EMAIL PROTECTED]> wrote:
> On Mon, Jan 30, 2006 at 11:48:17AM -0500, Dan wrote:
> > <a href="http://123.123.123.123/fraud_uri">http://amazon.com/official_looking_path</a>
> > I can write a regexp that looks for an address in the <a> tag's body
> > that is different than in it's href, but I figured someone else had
> > already written one. Can someone point me in the right direction?
>
> You can't do this in a regexp, you need to write some code. There's already
> the check_https_ip_mismatch() function which looks for something similar to
> this. It turns out that href != anchor text is a pretty bad spam sign since
> it happens in ham all the time.

I was thinking of a regexp along the lines of:

/href=\"https?:\/\/[0-9]{1,3}(\.[0-9]{1,3}){3}[^>]+>http:\/\/\w/i

It's not perfect, but it would detect the above scenario. What does
check_https_ip_mismatch() do?

-Dan
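
For anyone who wants to try the pattern without setting up a rule, here is a minimal sketch of it in Python rather than Perl. The function name and the sample strings other than the first one are made up for illustration, and this only exercises the regexp itself. It is not a reimplementation of check_https_ip_mismatch(), whose behaviour I have not confirmed.

```python
import re

# Python translation of the Perl-style pattern from the message.
# The backslashes before the quote and the slashes are Perl delimiter
# escapes and are not needed in a Python raw string.
IP_HREF_WITH_URL_TEXT = re.compile(
    r'href="https?://[0-9]{1,3}(\.[0-9]{1,3}){3}[^>]+>http://\w',
    re.IGNORECASE,
)


def looks_like_ip_href_mismatch(html: str) -> bool:
    """Hypothetical helper: True if an href points at a dotted-quad IP
    while the text right after the tag starts with an http:// URL."""
    return IP_HREF_WITH_URL_TEXT.search(html) is not None


# The example from the original post.
phish = ('<a href="http://123.123.123.123/fraud_uri">'
         'http://amazon.com/official_looking_path</a>')

# Illustrative samples (assumptions, not taken from real mail).
plain = '<a href="http://amazon.com/">http://amazon.com/</a>'
https_text = '<a href="http://123.123.123.123/x">https://amazon.com/</a>'

print(looks_like_ip_href_mismatch(phish))
print(looks_like_ip_href_mismatch(plain))
print(looks_like_ip_href_mismatch(https_text))
```

The last sample shows one of the "not perfect" cases: the pattern requires the anchor text to start with http://, so anchor text beginning with https:// (or with no scheme at all) slips through.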