Craig Morrison wrote:

>Philip Prindeville wrote:
>  
>
>>I'm wondering what would be involved in putting in an HTML parser
>>that could call various rules to check things, like the case of:
>>
>><a href="http://www.foo.com/xyzzy">http://www.bar.com/aardvark</a>
>>
>>where the link disagrees with the text between the anchor tags (yeah, you
>>could limit it to partial matches on the host-portion)...
>>    
>>
>
>This is the functional equivalent of pissing in the wind. If you are 
>downwind, you are going to get wet.
>
>Anchor text in too many/most cases will not match the HREF. grep is 
>good, but it isn't good enough to catch all cases without significant 
>overhead. Anchor text is a descriptor, nothing more than that. It is not 
>a regurgitation of the link HREF.
>
>  
>

Usually it's not.  That's the point.  It's when the anchor text tries to
look like a URL that one needs to be suspicious.  At the very least, if
the anchor text starts with "https://" but the anchor URL looks like
"http://", I'd say that this is definite spam.
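
To make the idea concrete, here is a rough sketch in Python of the check
I have in mind. It is not an existing rule or plugin. The names
AnchorCollector and suspicious_anchors are made up for illustration, and
it uses only the standard library's html.parser and urllib.parse. It only
looks at anchors whose text itself parses as an http(s) URL, so ordinary
descriptive anchor text is never flagged.

```python
from html.parser import HTMLParser
from urllib.parse import urlsplit


class AnchorCollector(HTMLParser):
    """Collect (href, anchor text) pairs from an HTML fragment."""

    def __init__(self):
        super().__init__()
        self.pairs = []
        self._href = None   # href of the <a> currently open, if any
        self._text = []     # text chunks seen inside that <a>

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self._href = dict(attrs).get("href")
            self._text = []

    def handle_data(self, data):
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            self.pairs.append((self._href, "".join(self._text).strip()))
            self._href = None


def _scheme_and_host(url):
    """Return (scheme, host) in lowercase, or ("", "") if unparsable."""
    try:
        parts = urlsplit(url.strip())
        return parts.scheme.lower(), (parts.hostname or "").lower()
    except ValueError:
        return "", ""


def suspicious_anchors(html):
    """Return (href, text, reason) for anchors whose text looks like a
    URL but disagrees with the href on host or on scheme."""
    parser = AnchorCollector()
    parser.feed(html)
    parser.close()
    hits = []
    for href, text in parser.pairs:
        shown_scheme, shown_host = _scheme_and_host(text)
        if shown_scheme not in ("http", "https"):
            continue  # anchor text is a plain descriptor, skip it
        real_scheme, real_host = _scheme_and_host(href)
        if shown_host != real_host:
            hits.append((href, text, "host mismatch"))
        elif shown_scheme == "https" and real_scheme == "http":
            hits.append((href, text, "https text, http link"))
    return hits
```

Feeding it the example anchor from my earlier message should report a
host mismatch, while an anchor whose text is just "Click here" is skipped
entirely. A real rule would probably want the partial host matching
mentioned earlier rather than the exact comparison used here.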

Does anyone have a way of doing a statistical analysis of ham that contains
http(s?):// as the beginning of the anchor text?

-Philip

