On Sat, 15 Jul 2017, Antony Stone wrote:

On Saturday 15 July 2017 at 11:19:54, mastered wrote:

Hi Nicola,

I'm not good at shell scripting, but this might be fine:

1 - Save the file as lista.txt

2 - Transform lista.txt into SpamAssassin rules:

cat lista.txt | sed s'/http:\/\///' | sed s'/\/.*//' | sed s'/\./\\./g' |
sed s'/^/\//' | sed s'/$/\\b\/i/' | nl |
awk '{print "uri;RULE_NR_"$1";"$2" describe;RULE_NR_"$1";Url;presente;nella;Blacklist;Ramsonware score;RULE_NR_"$1";5.0" }' > listone.txt
for i in $(sed -n p listone.txt); do echo "$i" ; done | sed s'/;/ /g' > blacklist.cf


If anyone can optimize it, I'd be happy.

My first comment would be "useless use of cat" :)

My second comment would be that you can combine sed commands into a single
string, separated by ; so that you only have to call sed itself once at the
start of all that:

sed 's/http:\/\///; s/\/.*//; s/\./\\./g; s/^/\//; s/$/\\b\/i/' lista.txt | nl .....
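
Taking both comments together, the whole conversion can be sketched as one sed call plus one awk call, with no temporary file and no for loop. This is only a sketch under assumptions: the function name urls_to_rules is made up for illustration, the input is assumed to be one http:// URL per line, and the rule names, description text and score are copied from the pipeline quoted above.

```shell
#!/bin/sh
# Hypothetical helper (name is illustrative): reads one URL per line on
# stdin and writes three SpamAssassin lines per host on stdout.
urls_to_rules() {
    # 1. drop the leading "http://" (other schemes are not handled here)
    # 2. drop the path, keeping only the host name
    # 3. escape dots so they match literally in the regex
    sed -e 's!^http://!!' -e 's!/.*!!' -e 's/\./\\./g' |
    awk 'NF {
        n++   # number the non-empty lines, as "nl" does by default
        printf "uri RULE_NR_%d /%s\\b/i\n", n, $1
        printf "describe RULE_NR_%d Url presente nella Blacklist Ramsonware\n", n
        printf "score RULE_NR_%d 5.0\n", n
    }'
}

# Example usage:
#   urls_to_rules < lista.txt > blacklist.cf
```

Printing the three lines directly from awk also avoids passing the regex through echo, which in some shells interprets backslash sequences such as \b.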

Another observation/optimization: use the Perl pattern-match delimiter specifier (e.g. "m!") to avoid delimiter collision.

The following two regexes are functionally equivalent but one is easier to write/read:

  /http:\/\/site\.com\/this\/that\/the\/other\//i

  m!http://site\.com/this/that/the/other/!i

The second one avoids "leaning toothpick syndrome": https://en.wikipedia.org/wiki/Leaning_toothpick_syndrome
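
The same trick works on the sed side, since sed accepts any character after "s" as the delimiter. A small sketch, assuming one full URL per line on stdin; the function names and the rule name passed as an argument are hypothetical, and only "." and "?" are escaped here, so URLs containing other regex metacharacters (or a literal "!") would need more care:

```shell
#!/bin/sh
# Two equivalent ways to strip the scheme; only the delimiter differs.
strip_scheme_slash() { sed 's/http:\/\///'; }
strip_scheme_bang()  { sed 's!http://!!'; }

# Hypothetical helper: turn a full URL into a targeted "uri" rule that
# uses the m!...! form, so the slashes in the URL need no escaping.
# $1 = rule name (illustrative only)
url_to_rule() {
    sed -e 's/[.?]/\\&/g' |
    awk -v name="$1" '{ printf "uri %s m!%s!i\n", name, $0 }'
}

# Example usage:
#   echo 'http://site.com/this/that/the/other/' | url_to_rule RULE_X
```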

Another way to use that data is to extract the hostnames and feed them into a local URI-dnsbl. "rbldnsd" is an easy-to-maintain, lightweight (low CPU/RAM overhead) way to run a local DNSbl for multiple purposes (e.g. an IP-address based list or a hostname-based URI-dnsbl). A URI-dnsbl has the advantage that names are easy to add (just 'cat' them onto the end of the data file with the appropriate suffix), and no daemon has to be restarted for the change to take effect.

Clearly it has a greater risk of FPs than a targeted rule that matches the specific URL of the malware. However, if the site was purpose-built by blackhats to disseminate malware, or is a legitimate site that has been compromised and isn't being maintained, then there's a high probability that it will be (ab)used again for other payloads. In that case blacklisting the hostname catches all future garbage too. IMHO, any site on that list with more than 3 entries or a registration age of less than a year is fair game for URI-dnsbl listing.
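
The "more than 3 entries" test is easy to sketch with standard tools. The following is only an illustration: the function name is made up, the input is assumed to be one URL per line, and the registration-age check is not covered at all (it would need a whois lookup). How the resulting names have to be written into the rbldnsd data file (suffix, return value) depends on the local zone setup, so check that against the rbldnsd man page.

```shell
#!/bin/sh
# Hypothetical helper: print every host that appears in MORE than
# $1 URLs of the list read from stdin.
hosts_with_more_than() {
    # strip the scheme and the path, then fold the host to lower case
    sed -e 's!^[a-zA-Z]*://!!' -e 's!/.*!!' |
    tr '[:upper:]' '[:lower:]' |
    sort | uniq -c |
    # uniq -c prints "count host"; keep hosts whose count exceeds the limit
    awk -v min="$1" 'NF == 2 && $1 > min { print $2 }'
}

# Example usage (the data-file name is an assumption):
#   hosts_with_more_than 3 < lista.txt >> uri-blacklist.data
```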

Looking at that data there are clearly several patterns that could be used to create targeted rules.


--
Dave Funk                                  University of Iowa
<dbfunk (at) engineering.uiowa.edu>        College of Engineering
319/335-5751   FAX: 319/384-0549           1256 Seamans Center
Sys_admin/Postmaster/cell_admin            Iowa City, IA 52242-1527
#include <std_disclaimer.h>
Better is not better, 'standard' is better. B{
