One way that spammers could try to get around some of the URI rules (at least
for HTML-only spam) is to put the main part of the URI into a <BASE> tag, so
that the URIs pulled from <A HREF="mumble"> are relative and won't match rules
that look for domain names and "http://".  I've modified
get_decoded_stripped_body_text_array() so that it also takes URIs from BASE
tags.  If a spammer tries to hide http://sex-sex-sex.com/ in a <BASE> tag, it
will still be found, but URI rules that depend upon "http://" being present
in each link will still not work.
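
To make the harvesting idea concrete, here is a minimal sketch in Python
rather than SA's own Perl. It is not the actual
get_decoded_stripped_body_text_array() code; the URIHarvester class and
harvest_uris() function are hypothetical names used only for illustration.
It shows that pulling HREF values from both BASE and A tags recovers the
hidden domain, while the link itself stays relative.

```python
from html.parser import HTMLParser


class URIHarvester(HTMLParser):
    """Collect HREF values from <base> and <a> tags (hypothetical helper)."""

    def __init__(self):
        super().__init__()
        self.uris = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names before calling us.
        if tag in ("base", "a"):
            for name, value in attrs:
                if name == "href" and value:
                    self.uris.append(value)


def harvest_uris(html):
    """Return every BASE/A href found in the HTML, in document order."""
    parser = URIHarvester()
    parser.feed(html)
    parser.close()
    return parser.uris


sample = (
    '<html><head><BASE HREF="http://sex-sex-sex.com/"></head>'
    '<body><A HREF="unsub.html">remove me</A></body></html>'
)
harvested = harvest_uris(sample)
```

With this input the harvested list holds the BASE URI followed by the bare
"unsub.html", which is exactly the case where a rule anchored on "http://"
misses the link.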

One way to get around this would be to rewrite the URI rules to reduce
the dependency on the URI starting with "protocol://".  Since SA now harvests
URIs out of the message and hands them to the URI tester as an array of
strings, this shouldn't generate too many false positives.  A relative link
to an unsub page within an <A> element would still match the rule if the
"http://" were removed from the pattern.
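
As a rough illustration of relaxing a rule, the Python sketch below compares
a pattern anchored on "http://" with one where the protocol prefix is
optional. Both patterns are made up for this example and are not taken from
the SA rule set; the URI list stands in for the array of harvested strings.

```python
import re

# Hypothetical rule that insists on a leading "http://".
strict_rule = re.compile(r"^http://\S*(?:unsub|remove)", re.IGNORECASE)

# Same hypothetical rule with the "protocol://" prefix made optional,
# so a relative link can match as well.
relaxed_rule = re.compile(
    r"^(?:[a-z][a-z0-9+.-]*://)?\S*(?:unsub|remove)", re.IGNORECASE
)

# Stand-in for the array of URIs harvested from a message.
uris = ["http://sex-sex-sex.com/", "unsub.html", "http://example.com/unsub.cgi"]

strict_hits = [u for u in uris if strict_rule.search(u)]
relaxed_hits = [u for u in uris if relaxed_rule.search(u)]
```

Here the strict pattern only catches the absolute link, while the relaxed
one also catches the relative "unsub.html". Because the patterns run only
against harvested URIs and not against the whole body text, the looser match
has less room to hit unrelated prose.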

The other way to get around it would be to take the <BASE> URI and prepend
it to all of the relative URIs before handing them to the tests, but that seems
to me to be going overboard.
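
For comparison, the prepend approach can be sketched in a few lines of
Python using the standard library's urljoin(), which resolves a relative
reference against a base URI. The resolve_uris() helper is a hypothetical
name for this example and says nothing about how SA itself would do it.

```python
from urllib.parse import urljoin


def resolve_uris(base, uris):
    """Resolve each harvested URI against the BASE URI (hypothetical helper).

    Absolute URIs are returned unchanged; relative ones are joined to base.
    """
    return [urljoin(base, u) for u in uris]


resolved = resolve_uris(
    "http://sex-sex-sex.com/",
    ["unsub.html", "/img/a.gif", "http://example.com/x"],
)
```

After this step every URI handed to the tests starts with a protocol again,
so existing rules would work unchanged, at the cost of an extra pass over
the harvested URIs.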

-- 
Visit http://dmoz.org, the world's   | Give a man a match, and he'll be warm
largest human edited web directory.  | for a minute, but set him on fire, and
                                     | he'll be warm for the rest of his life.
[EMAIL PROTECTED]  ICQ: 132152059 |
