On 4/3/2018 9:27 AM, Leandro wrote:
We just created an URL signature algorithm to be able to query an entire URL at our URIBL:

https://spfbl.net/en/uribl/

Now we are able to blacklist any malicious shortener URL


Leandro,

Thanks for all you do! And good luck with that. But there are a few potential problems. When I analyzed Google's shorteners about a month ago, I found that a VERY large percentage of the most malicious shortened URLs came from spammers generating a unique shortener for each individual message/recipient address. This causes the following HUGE problems (at least for THESE particular shorteners) when publishing a full-URL DNSBL:

(1) Much of what you populate your rbldnsd file with is going to be totally ineffective for everyone, since each entry ONLY applies to the single email address where the spam was originally sent (where you had trapped it) - everyone else is going to get DIFFERENT shorteners in the spam from these same campaigns that is sent to their users.

(2) get ready for EXTREME rbldnsd bloat. You're gonna need a LOT of RAM eventually? And if you ever distribute via rsync, those are going to be HUGE rsync files (and then THEY will need a lot of RAM). Sadly, most of that bloat is going to come from entries that are doing absolutely nothing for anyone.

(3) You might be revealing your spam traps to the spammers. In cases where the spammers are sending that 1-to-1 spam with single-recipient shorteners, all they have to do is enumerate through their list of shorteners, checking them against your list - and they INSTANTLY get a list of every recipient address that triggers a listing on your DNSBL. If you want to destroy the effectiveness of your own DNSBL's spam traps - be my guest. But if you're getting 3rd party spam feeds (paid or free) - then know that you're screwing over your 3rd party spam feed's spam traps - and those OTHER anti-spam systems that rely on such feeds, which will then diminish in quality. (unless you are filtering OUT these MANY 1-to-1 shortener spams)

Maybe there are enough OTHER shorteners (where the same shortener is sent to multiple recipients) to make this worthwhile? But the bloat from the ones that are uniquely generated could be a challenge, and could potentially cause a MASSIVE number of useless queries. I'd be very interested to see what PERCENTAGE of such queries generated a hit!

Meanwhile, in the analysis I did about a month ago, about 80% of Google's shorteners found in egregious spams (that did this one-to-one shortener-to-recipient tactic) were all pointing at one of ONLY a dozen different spammers' domains. Therefore, following the shortener's redirect and then checking the domain found at the base of the link it redirects to is a more effective strategy for these - whereas, for THESE 80% of egregious Google shorteners, a full-URL lookup is worthless, consuming resources without a single hit.
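
To make the "follow the redirect, then check the destination's domain" idea concrete, here is a minimal Python sketch. It is only an illustration, not how any particular DNSBL does it: the function names are made up for this example, the base-domain extraction naively keeps the last two labels of the hostname (a real implementation would need something like the Public Suffix List), and the set of bad domains stands in for whatever domain blacklist you would actually query.

```python
import http.client
from urllib.parse import urlsplit


def first_redirect_target(short_url, timeout=5.0):
    """Ask the shortener where it redirects to, without following the redirect.

    Returns the Location header of the response, or None if there is none.
    (Hypothetical helper; performs a real network request when called.)
    """
    parts = urlsplit(short_url)
    if parts.scheme == "https":
        conn = http.client.HTTPSConnection(parts.netloc, timeout=timeout)
    else:
        conn = http.client.HTTPConnection(parts.netloc, timeout=timeout)
    try:
        conn.request("HEAD", parts.path or "/")
        return conn.getresponse().getheader("Location")
    finally:
        conn.close()


def base_domain(url):
    """Naive base domain: the last two labels of the hostname, lower-cased.

    ASSUMPTION: this is wrong for suffixes such as .co.uk; it is only a sketch.
    """
    host = (urlsplit(url).hostname or "").lower().rstrip(".")
    return ".".join(host.split(".")[-2:])


def is_spammer_destination(target_url, bad_domains):
    """True if the redirect target's base domain is in the given set."""
    return base_domain(target_url) in bad_domains
```

With something like this, a dozen listed domains would cover every individualized shortener that redirects to them, no matter how many unique shorteners the spammer generates.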

Alternatively, you may have found a way to filter out these types of individualized shorteners, to prevent that bloat? But even then, while your new list might be helpful, it would be good for others to know that it isn't applicable to a large percentage of spammy shorteners, since it is still useless against these individualized ones.

NOTE: Google has made some improvements recently, and I haven't yet analyzed how much those improvements have changed any of these things I've mentioned?

PS - the alphanumeric code at the end of these shorteners tends to be case-sensitive, while the rest of the URL is NOT case-sensitive (and they also work with both "https" and "http")... so you might want to standardize this on (1) https and (2) everything lower case up until the code at the end of the shortener - before the MD5 is calculated. Otherwise, it could easily break if the spammer just mixes up the capitalization of the shortener URL up until the code at the end of the shortener.
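
A rough Python sketch of that normalization, under some assumptions: I don't know the exact input format SPFBL's signature algorithm hashes, so the canonical form here (https scheme, lower-cased host and leading path, final path segment left untouched, query string dropped) and both function names are just illustrative.

```python
import hashlib
from urllib.parse import urlsplit


def normalize_shortener_url(url):
    """Canonicalize a shortener URL before hashing (illustrative only).

    - force the scheme to https
    - lower-case the hostname and every path segment except the last
    - keep the final path segment (the case-sensitive code) exactly as-is
    ASSUMPTION: any query string or fragment is dropped.
    """
    parts = urlsplit(url.strip())
    host = (parts.hostname or "").lower()
    # Split the path into everything before the last "/" and the code after it.
    head, sep, code = parts.path.rpartition("/")
    return "https://" + host + head.lower() + sep + code


def url_signature(url):
    """MD5 hex digest of the normalized URL."""
    return hashlib.md5(normalize_shortener_url(url).encode("utf-8")).hexdigest()
```

This way "HTTP://GOO.GL/AbC123" and "https://goo.gl/AbC123" produce the same signature, while "https://goo.gl/abc123" (a different code) does not.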

--
Rob McEwen
https://www.invaluement.com
