Regex help

Kevin Miller Thu, 21 Apr 2011 14:55:16 -0700

Lately we've been receiving a lot of spam that waltzes right through the spam
filters. Yesterday I trained a thousand or more messages (we only get around
3,000 to 5,000 legitimate messages a day), but of course the spam keeps
changing slightly, and it usually comes from sources that aren't yet listed in
any RBL.

The spam is HTML mail, and one thing I've noticed is that the body contains a
large number of HTML line breaks (<br> tags), pushing the 'unsubscribe' link
below the bottom of the screen. (FWIW, clicking an unsubscribe link generally
triggers a TrendMicro warning that the site has been compromised.)

Anyway, I'm trying to write a local rule that matches 5 or more instances of
"<br>", but I'm not having much luck. I'm testing on the command line first,
just to get the syntax down.

What works:
I have a file called DomainLiterals.txt that contains repeating characters,
and this returns the expected results:
mkm@mis-mkm-lnx:~$ egrep \[10.]{3} DomainLiterals.txt 
you can add a line containing only [10.10.10.10] to /etc/mail/local-host-names 
where 10.10.10.10 is the IP address you 

However, doing this fails:
mxg:/var/spool/MailScanner/quarantine/20110421/nonspam # egrep \[<br>]{5,} p3LJZSnX024470
-bash: br: No such file or directory

The file p3LJZSnX024470 is just a plain text file in a quarantine directory.
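For comparison, here is a sketch of the same search with the pattern in single
quotes. Left unquoted, bash most likely reads "<br" as an input redirection
from a file named "br" (which would produce exactly the error above) and ">" as
an output redirection, so egrep never runs. Quoting alone is not enough,
though: "[<br>]" is a bracket expression matching any ONE of the characters
"<", "b", "r" or ">", so "{5,}" would repeat a single character rather than the
whole tag. (The working example behaves the same way: after the shell strips
the backslash, egrep receives "[10.]{3}", i.e. any three characters from "1",
"0" and ".".) Grouping with parentheses repeats the full "<br>". The sample
text and the function names count_runs and count_tags are hypothetical, for
illustration only.

```shell
#!/usr/bin/env bash
# Sketch: search for runs of <br> tags without the shell mangling the pattern.
# Single quotes stop bash from treating < and > as redirections and from
# brace-expanding {5,}.

# Hypothetical sample message body, for illustration only.
sample='Great offer!<br><br> <br><br><br>Click here to unsubscribe'

# (...) groups the whole tag so {5,} repeats "<br>" rather than one character.
# -i also catches <BR>; [[:space:]]* allows whitespace between tags;
# -c prints the number of matching LINES.
count_runs() {
  printf '%s\n' "$1" | grep -Eic '(<br>[[:space:]]*){5,}'
}

# grep works line by line, so a run split across lines is missed above.
# Counting every tag instead avoids that: -o prints each match on its own
# line and wc -l counts them.
count_tags() {
  printf '%s\n' "$1" | grep -Eio '<br>' | wc -l
}

count_runs "$sample"   # prints 1 (one line contains a run of 5+ tags)
count_tags "$sample"   # prints 5
```

Against the quarantined file itself, the equivalents would be
grep -Eic '(<br>[[:space:]]*){5,}' p3LJZSnX024470 for runs on a single line,
or grep -Eio '<br>' p3LJZSnX024470 | wc -l for a total count.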

What am I missing? Once I get the syntax right, I'll turn this into a body
rule and test it for a day or so with a score of 0.01. If it isn't hitting
legitimate mail, I'll bump the score up.
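Before raising the score, one way to gauge false positives is to count how
many already-quarantined nonspam messages would have matched. The sketch
below assumes each file in the directory is one raw message and uses a
threshold of five total <br> tags per file; the function name
files_with_many_br is hypothetical. It only sees tags that appear literally in
the file, so HTML parts sent base64-encoded will be missed. Separately,
SpamAssassin's body rules are matched after HTML tags have been removed, so a
rule that looks for <br> would probably need to be a rawbody rule instead.

```shell
#!/usr/bin/env bash
# Sketch: list files in a directory that contain at least N <br> tags
# (default 5), e.g. to estimate how often a rule would hit known-good mail.
# files_with_many_br is a hypothetical helper name, not an existing tool.
files_with_many_br() {
  local dir=$1 min=${2:-5} f n
  for f in "$dir"/*; do
    [ -f "$f" ] || continue                 # skip subdirectories, empty globs
    # -o prints each match on its own line, -i also catches <BR>;
    # wc -l then gives the total number of tags in the file.
    n=$(grep -Eio '<br>' "$f" | wc -l)
    if [ "$n" -ge "$min" ]; then
      printf '%s\t%s\n' "$n" "$f"           # tag count, then file name
    fi
  done
}

# Example, using the quarantine path from the post:
# files_with_many_br /var/spool/MailScanner/quarantine/20110421/nonspam
```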

Thanks...

...Kevin
--
Kevin Miller                Registered Linux User No: 307357
CBJ MIS Dept.               Network Systems Admin., Mail Admin.
155 South Seward Street     ph: (907) 586-0242
Juneau, Alaska 99801        fax: (907) 586-4500
