On 4/3/2018 9:27 AM, Leandro wrote:
We just created an URL signature algorithm to be able to query an
entire URL at our URIBL:
https://spfbl.net/en/uribl/
Now we are able to blacklist any malicious shortener URL
Leandro,
Thanks for all you do! And good luck with that. But there are a few
potential problems. When I analyzed Google's shorteners about a month
ago, I found that in a VERY large percentage of the most malicious
shortened URLs, the spammers were generating a unique shortener for
each individual message/recipient address. This causes the following
HUGE problems (at least for THESE particular shorteners) when
publishing a full-URL DNSBL:
(1) Much of what you populate your rbldnsd file with is going to be
totally ineffective for anyone, since it ONLY applies to the single
email address where the spam was originally sent (where you had trapped
it) - everyone else is going to get DIFFERENT shorteners for the spam
from these same campaigns that is sent to their users.
(2) Get ready for EXTREME rbldnsd bloat. You're gonna need a LOT of RAM
eventually. And if you ever distribute via rsync, those are going to be
HUGE rsync files (and then THEY will need a lot of RAM). Sadly, most of
that bloat is going to come from entries that are doing absolutely
nothing for anyone.
(3) You might be revealing your spam traps to the spammers. In cases
where the spammers are sending that 1-to-1 spam with single-recipient
shorteners, all they have to do is enumerate through their list of
shorteners, checking them against your list - and they INSTANTLY get a
list of every recipient address that triggers a listing on your DNSBL.
If you want to destroy the effectiveness of your own DNSBL's spam traps
- be my guest. But if you're getting 3rd party spam feeds (paid or free)
- then know that you're also screwing over your 3rd party spam feed's
spam traps - and those OTHER anti-spam systems that rely on such feeds,
which will then diminish in quality (unless you are filtering OUT these
MANY 1-to-1 shortener spams).
Maybe there are enough OTHER shorteners (ones where the same shortener
is sent to multiple recipients) to make this worthwhile? But the bloat
from the ones that are uniquely generated could be a challenge, and
could potentially cause a MASSIVE amount of useless queries. I'd be very
interested to see what PERCENTAGE of such queries generated a hit!
Meanwhile, in the analysis I did about a month ago, about 80% of
Google's shorteners found in egregious spams (that used this one-to-one
shortener-to-recipient tactic)... were all pointing at one of ONLY a
dozen different spammers' domains. Therefore, doing a lookup on these
and then checking the domain found at the base of the link it redirects
to... is a more effective strategy for these - whereas, for THESE 80%
of egregious Google shorteners, a full URL lookup is worthless,
consuming resources without a single hit.
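
To illustrate the "follow the redirect, then check the destination
domain" idea, here is a minimal Python sketch. All names in it
(fetch_location, destination_domain, is_listed, the listed_domains set)
are made up for illustration and are not part of any existing tool. It
assumes the shortener answers a HEAD request with a Location header,
and it naively takes the last two labels of the hostname as the "base"
domain; a real implementation would probably want a public-suffix list
and a DNSBL query instead of a local set.

```python
import http.client
from urllib.parse import urlsplit


def fetch_location(short_url, timeout=5.0):
    """Return the Location header of the shortener's response, or None.

    http.client does not follow redirects, so only the first hop is read.
    """
    parts = urlsplit(short_url)
    conn_cls = (http.client.HTTPSConnection if parts.scheme == "https"
                else http.client.HTTPConnection)
    conn = conn_cls(parts.netloc, timeout=timeout)
    try:
        conn.request("HEAD", parts.path or "/")
        return conn.getresponse().getheader("Location")
    finally:
        conn.close()


def destination_domain(url):
    """Naive base domain: the last two labels of the hostname (assumption)."""
    host = (urlsplit(url).hostname or "").lower().rstrip(".")
    return ".".join(host.split(".")[-2:])


def is_listed(short_url, listed_domains, resolve=fetch_location):
    """True if the shortener redirects to a domain in listed_domains.

    `resolve` is injectable so the logic can be exercised without network.
    """
    target = resolve(short_url)
    if not target:
        return False
    return destination_domain(target) in listed_domains
```

With this approach, a dozen domain entries would cover every
individualized shortener that redirects to those domains, at the cost
of one HTTP request per shortened URL.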
Alternatively, you may have found a way to filter out these types of
individualized shorteners, to prevent that bloat? But even then, while
your new list might be helpful, it would be good for others to know
that it isn't applicable to a large percentage of spammy shorteners,
since it is still useless against these individualized shorteners.
NOTE: Google has made some improvements recently, and I haven't yet
analyzed how much those improvements have changed any of these things
I've mentioned.
PS - the alphanumeric code at the end of these shorteners tends to be
case-sensitive, while the rest of the URL is NOT case-sensitive (and
they also work with both "https" and "http")... so you might want to
standardize this on (1) https and (2) everything lower case up until the
code at the end of the shortener - before the MD5 is calculated.
Otherwise, it could easily break if the spammer just mixes up the
capitalization of the shortener URL up until the code at the end of the
shortener.
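
A rough Python sketch of that normalization, in case it helps. The
function names are hypothetical, and I don't know the exact input
format your signature algorithm hashes, so this simply takes the MD5 of
the normalized URL string. It assumes the URL includes a scheme, treats
the last path segment as the case-sensitive code, and drops any port,
query string, and fragment (that last part is my assumption, not
something the shorteners require).

```python
import hashlib
from urllib.parse import urlsplit


def normalize_short_url(url):
    """Force https and lowercase everything except the trailing code."""
    parts = urlsplit(url.strip())
    host = (parts.hostname or "").lower()
    # Split the path into the leading part and the final segment (the code).
    head, sep, code = parts.path.rpartition("/")
    # Assumption: port, query string, and fragment are discarded.
    return f"https://{host}{head.lower()}{sep}{code}"


def url_signature(url):
    """MD5 hex digest of the normalized URL (illustrative format only)."""
    return hashlib.md5(normalize_short_url(url).encode("utf-8")).hexdigest()
```

This way "HTTP://GOO.GL/AbC123" and "https://goo.gl/AbC123" hash to the
same value, while "https://goo.gl/abc123" (a different code) does not.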
--
Rob McEwen
https://www.invaluement.com