On Wed, 14 Mar 2012, Alex wrote:
I actually created a bunch of those already, and would appreciate if
someone would check my work:
uri LOC_WP m{https?://.[^/]+/(wp-content|modules/mod_wdbanners|wp-admin|wp-includes|cruise/wp-content|includes/|web/wp-content|google_recommends|mt-static)/}
describe LOC_WP Contains wordpress uri
score LOC_WP 0.5
meta LOC_WP_SHORT (LOC_WP && LOC_SHORT)
describe LOC_WP_SHORT Contains wp-content and short body
score LOC_WP_SHORT 0.6
meta LOC_WP_SUBJ (LOC_WP && MISSING_SUBJECT)
describe LOC_WP_SUBJ Contains wordpress uri and missing subject
score LOC_WP_SUBJ 1.2
LOC_SHORT is a meta that fires when the rawbody is lt 200 and the
message contains a URI.
These appear to hit quite a bit of ham, or possibly false negatives;
I'm not sure. WordPress URLs are obviously pretty frequent, but I don't
think 0.5 would be enough to push ham to spam.
That will FP, as almost any legit page ref from a WP site will have
"/wp-content/" in it. I was referring to "/wp-content/plugins/" as
being suspicious. But your idea of using it in metas with other
spammy characteristics is good.
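A rough, untested sketch of what I mean. The __LOC_WP_PLUGINS and
LOC_WP_PLUGINS_SHORT names and the score are placeholders I made up,
and LOC_SHORT is your existing meta:
uri __LOC_WP_PLUGINS m{/wp-content/plugins/}i
meta LOC_WP_PLUGINS_SHORT (__LOC_WP_PLUGINS && LOC_SHORT)
describe LOC_WP_PLUGINS_SHORT Contains wp-content/plugins uri and short body
score LOC_WP_PLUGINS_SHORT 0.6
The leading double underscore makes the uri test an unscored sub-rule,
so it only counts when combined with something else in a meta.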
One clue: "X-Originating-IP: [41.189.207.189]"
Check the various RBL hits on that address. ;)
Are there existing plugins for this?
Is there a way to check a range to see if it's part of a known
blacklisted botnet?
The "cbl.abuseat.org" RBL explicitly lists infected/bot-net machines
(and it does list that IP addr). So mail that contains a CBL-listed
IP addr anywhere in its headers is suspect.
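A minimal, untested sketch of a local rule for that, assuming the
"cbl.abuseat.org" zone can be queried directly from your site. The
LOC_RCVD_IN_CBL rule name, the 'loc-cbl' set name and the score are
placeholders:
header LOC_RCVD_IN_CBL eval:check_rbl('loc-cbl', 'cbl.abuseat.org.')
describe LOC_RCVD_IN_CBL Relay or originating IP is listed in CBL
tflags LOC_RCVD_IN_CBL net
score LOC_RCVD_IN_CBL 1.0
Because the set name has no "-lastexternal" suffix, check_rbl should
look at the untrusted relay addresses in the headers rather than only
the last hop. IIRC X-Originating-IP is in the default
originating_ip_headers list, so that address should be checked too,
but verify that against your own config before relying on it.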
--
Dave Funk University of Iowa
<dbfunk (at) engineering.uiowa.edu> College of Engineering
319/335-5751 FAX: 319/384-0549 1256 Seamans Center
Sys_admin/Postmaster/cell_admin Iowa City, IA 52242-1527
#include <std_disclaimer.h>
Better is not better, 'standard' is better. B{