On Sat, Oct 15, 2011 at 12:38 AM, <dar...@chaosreigns.com> wrote:
> And I need to remind you that it hits almost as much ham as spam:
> http://ruleqa.spamassassin.org/20111008-r1180336-n/T_SPOOFED_URL/detail
>
> I agree it seems like we should be able to improve it.  Maybe make
> exceptions for known marketing trackers, as Adam Katz mentioned it has
> problems with.

Just to add a few more suggestions:
* Check whether the anchor's actual URL (the href URL) points to the
modal domain (the domain linked most frequently in the same email);
if it does not, treat the email as spam.
* Check the age of the href URL's domain via a Whois lookup (not all
domains expose a registration timestamp, though); if the age falls
below a certain threshold, treat it as spam.
* Check the domain's rank via a search engine; if the rank falls
below a certain threshold, treat it as spam.
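
A rough sketch of the modal-domain idea, in Python rather than as an
actual SpamAssassin rule. The function names are made up for
illustration, and it compares full hostnames instead of registered
domains (a simplifying assumption; a real test would probably want to
collapse subdomains first):

```python
from collections import Counter
from html.parser import HTMLParser
from urllib.parse import urlparse


class HrefCollector(HTMLParser):
    """Collect the href value of every <a> tag in an HTML body."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.hrefs.append(value)


def link_hosts(html_body):
    """Return the lowercased hostname of every link, in document order."""
    parser = HrefCollector()
    parser.feed(html_body)
    hosts = []
    for href in parser.hrefs:
        host = urlparse(href).hostname  # None for mailto:, relative links, etc.
        if host:
            hosts.append(host.lower())
    return hosts


def modal_host(hosts):
    """Return the most frequently linked hostname, or None if there are no links."""
    if not hosts:
        return None
    return Counter(hosts).most_common(1)[0][0]


def non_modal_hosts(html_body):
    """Return the sorted set of linked hostnames that differ from the modal one."""
    hosts = link_hosts(html_body)
    modal = modal_host(hosts)
    return sorted({h for h in hosts if h != modal})
```

Any hostname returned by non_modal_hosts() would be a candidate for
the check described above. Whether a single non-modal link should be
enough to score the message is a separate tuning question.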

Google already uses page rank to reduce false positives when
classifying phishing websites (the result is then distributed as a
blacklist to Firefox/Chrome via the Google Safe Browsing API). Whois
and modal-domain tests are also used in some proposed classifiers (but
I have no idea whether they are used in production yet).

This can be helpful because phishing URLs/domains are often
short-lived. IIRC the average uptime of a phishing page/domain is
~2 hours (off the top of my head; I didn't verify it, but it should
be close enough).

My concern is that this URL mismatch test might add too little value
(0.599 S/O) to justify any expensive optimizations. It might be more
productive to invest time in other, more promising tests and make
them better.

--
Regards,
Mahmoud Khonji
PGP Key: 0x92584ECA
