Hi,
I am looking for feedback and comments on the following ideas, as
well as possible additions to them. Please read on, as I would very
much appreciate input. Be warned that it is long, sorry, but a lot of
ideas are included here.
I am working on this idea and have put in place a series of defenses
that have proved effective so far, but they are obviously not as
practical and speedy as spamd is at the moment. It is a collection of
scripts here and there, based on multiple aspects of standard web
access.
The principle is to minimize the impact on legitimate users of your
site and at the same time reject all the attacks against it, which at
various levels range from DDoS to attempts to make you waste as much
bandwidth as possible and bring your site to a crawl.
Some of the ideas are not new and are based on spamd; they are just
not all in place yet.
1. For Crawlers and Bots
First is the proliferation of mom-and-pop bots and crawlers. After
testing different setups, I realized to my surprise (yes, call me
stupid) that only a handful are actually good citizens! The use and
standard of robots.txt is well known, and all good-citizen robots
should respect it. It is not a means of protection for your site, but
nevertheless they should respect it. So, whatever is inside it, if
you forbid some directories or files, they should respect that, and
for any that do not, well, I guess it is fine to kill them. Why
should they be granted access if they do not respect my wishes as the
owner and/or operator of the site(s)?
1.1 First defense. No crawling of directories forbidden in a preset
robots.txt, with incremental deny access for offenders.
Maybe not the best approach, but it is working: this method caught
381 bad-citizen crawlers in a week's time. The idea is very simple. I
preset my robots.txt file to include a file, or in this case a
directory, that is not to be crawled, and in that directory I put a
file containing a script that will block the source via PF and log
the entry in an SQL database, so that it can be shared between all
servers later on. I also put on the front page of the site a very
simple link to a 1-pixel image at the bottom of the page, which is
simply not visible to users and is not clickable either. So a regular
user will never click on it, nor even see it. But a crawler will
follow all links, which is, obviously, the definition of a crawler.
Now don't forget that the crawler is supposed to respect the
robots.txt directives. This URL is in the forbidden directory, and
many crawlers do respect that very well; live testing proved this
only too well. However, all the bad ones will not, and as such the
URL triggers a script that logs their IP and adds them to PF to block
them right away! BYE BYE!
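For what it is worth, the trap script amounts to something like this
minimal sketch (the PF table name, the doas call and the SQLite
database path/schema are my own placeholders, not an existing setup):

    #!/usr/bin/env python3
    # Minimal sketch of the trap script: block the requesting IP with
    # PF and log it so it can later be shared with the other servers.
    # Assumes it runs as a CGI in the forbidden directory and that
    # the web server user may run pfctl through doas.
    import os
    import sqlite3
    import subprocess
    import time

    PF_TABLE = "bad_crawlers"          # hypothetical PF table name
    DB_PATH = "/var/www/trap/hits.db"  # hypothetical log database

    def block(ip):
        # The table is referenced by a rule in pf.conf, e.g.
        # "block in quick from <bad_crawlers>".
        subprocess.run(["doas", "pfctl", "-t", PF_TABLE, "-T", "add", ip],
                       check=False)

    def log(ip, agent):
        con = sqlite3.connect(DB_PATH)
        con.execute("CREATE TABLE IF NOT EXISTS hits "
                    "(ip TEXT, agent TEXT, ts INTEGER)")
        con.execute("INSERT INTO hits VALUES (?, ?, ?)",
                    (ip, agent, int(time.time())))
        con.commit()
        con.close()

    if __name__ == "__main__":
        ip = os.environ.get("REMOTE_ADDR", "")
        agent = os.environ.get("HTTP_USER_AGENT", "")
        if ip:
            block(ip)
            log(ip, agent)
        print("Content-Type: text/plain\r\n\r\nGoodbye.")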
Now you may ask why I do an incremental deny here. To be nice, I
guess, but also because some connections come from proxies, and not
all proxies add the header identifying themselves as such. You don't
want to lose traffic from legitimate users who sit behind a proxy
like AOL's. This needs a bit more work, but as long as proxies follow
the standard and add that part to their headers, as most do, it
should be possible to make sure that only the offending remote user
behind the proxy gets blocked. You can call this the bypass for
broken proxies for now. Should all proxies behave correctly, then the
block could perhaps be made permanent. This also has the side benefit
of stopping some lowlifes from stealing your content by trying to
import your whole site at once. That is not the goal here, but it is
a side benefit should you want it.
1.2 Forbidden bad bots.
This one is also simple and is based on the idea above, with a
variation. In the same robots.txt file I include yet another
directory, again with a file in it. But there is no reference to it
ANYWHERE on the site, and that is the important point here. The only
way to know about it is to read the robots.txt file and then
knowingly ignore the forbidden directive and go crawl that file.
Should you do this, you are blocked for good. I have caught a handful
so far, and obviously some are after things you don't want public;
everyone knows that. So these are blocked for good, after checks are
done to be sure the request is not coming from a proxy. Again this
goes to an SQL server and is pushed to all the other servers in the
area as well, protecting them all at the same time, again using PF.
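To make the two traps concrete, the robots.txt looks something like
this (the directory names are only placeholders):

    User-agent: *
    # /no-crawl/ is linked from the hidden 1-pixel image (1.1)
    Disallow: /no-crawl/
    # /never-linked/ is referenced nowhere except in this file (1.2)
    Disallow: /never-linked/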
2. DDoS on httpd
Well, for this you can use PF already, and it is well explained in
the FAQ, so I don't think there is any need to cover it here. I
assume PF is in use, as it is one of the main components of
gethttpd.
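Just as a reminder, the kind of rule the FAQ describes looks roughly
like this (the table name and the limits are examples only, and
$ext_if is assumed to be defined elsewhere in pf.conf):

    table <http_abusers> persist
    block in quick from <http_abusers>
    pass in on $ext_if proto tcp to port www flags S/SA keep state \
        (max-src-conn 100, max-src-conn-rate 15/5, \
         overload <http_abusers> flush global)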
3. DDoS GET attacks & bandwidth-sucker defenses. Multiple approaches.
3.1 Sanity checks on user-supplied data.
So far most, if not all, of the variations of attacks on web sites
use scripts trying to inject themselves into your servers. Well, you
need to do sanity checks in your code. Nothing can really protect you
from that if you don't check what you expect to receive as user
input. So I have nothing for that, and no idea how to handle it
anyway, other than maybe limiting the size of the arguments a GET can
send, but even that is a bad idea, I think.
3.2 Graylisting idea via a 302 temporary return code.
Many scripts want to make you waste as much bandwidth as possible:
if they can't inject themselves into your servers, they will in turn
attack a specific page or section of your site and try to make you
waste plenty of bandwidth, or even SQL back-end power as well.
One simple approach to this defense came to me from the idea of
spamd. But to do this, you don't want the users to wait, or they will
go elsewhere and you have just lost them. So the idea is again
simple: just return the user a code telling them to come back, simply
a 302 temporary redirect code.
You might say this will hurt your search engine ranking; well, not
really. There is no impact, as no search engine will save temporary
content on a redirect, and if one does, it is broken. But should you
be concerned about this, also add to the header a do-not-cache
directive, which is likewise defined in the standard.
So, what happens is that GET attacks and the like, if you look at a
few different variations of them, do send GET requests, some HEAD,
etc. They impersonate a browser, an OS, etc. But NONE that I have
seen so far will also process the HTML code itself. This means they
send the GET request but will not process the content of what they
get back, and will not follow the 302 redirect.
So, what you have is obviously a connection established to your web
server, and you can't know whether it comes from a good or a bad
(fake) browser. However, this virus, or attack, will not process the
temporary redirect it received and come back to you on a new URL.
So, to process this quickly, you simply put this IP in your
graylisting process, just like spamd would, and send back a 302
redirect with a new URL (the same one if you want, with some useless
extra bits added to it).
Now the user comes back to you, you see the new request coming in,
you promote the entry from gray to white listing, and the following
requests are served without any delay or extra processing from your
servers. So the impact is minimal and on the first request only.
In effect you have just added a very simple graylist to your server
and protected it from bandwidth hogs! At the same time you protected
your database back end, as no request was sent to it for that
content, nor did you send any images or objects to the requesting
client, and your reply header is very short as well, so you don't
waste processing power, disk access, and bandwidth, and your
server(s) are still replying fully to legitimate users' requests.
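To make the mechanism concrete, here is a rough standalone sketch of
the front-end logic in Python. None of this is the actual gethttpd
code; the in-memory tables, the port and the "canary" query parameter
are made up for illustration, and a real version would feed PF tables
instead:

    #!/usr/bin/env python3
    # Standalone sketch of the 302 graylisting front end described
    # above: unknown IPs get a cheap redirect carrying a canary;
    # clients that follow it are whitelisted.
    import secrets
    from http.server import BaseHTTPRequestHandler, HTTPServer
    from urllib.parse import parse_qs, urlparse

    graylist = {}      # ip -> canary we expect back
    whitelist = set()  # ips that followed the redirect

    class GrayHandler(BaseHTTPRequestHandler):
        def do_GET(self):
            ip = self.client_address[0]
            url = urlparse(self.path)
            canary = parse_qs(url.query).get("canary", [None])[0]

            if ip in whitelist:
                self.serve_page()
                return

            if ip in graylist and canary == graylist[ip]:
                # The client processed the redirect: real browser.
                whitelist.add(ip)
                del graylist[ip]
                self.serve_page()
                return

            # Unknown source (or wrong canary): graylist it and send
            # back a short 302, which costs us almost nothing.
            token = secrets.token_urlsafe(8)
            graylist[ip] = token
            self.send_response(302)
            self.send_header("Location",
                             "%s?canary=%s" % (url.path, token))
            self.send_header("Cache-Control", "no-store")
            self.send_header("Content-Length", "0")
            self.end_headers()

        def serve_page(self):
            body = b"hello, whitelisted client\n"
            self.send_response(200)
            self.send_header("Content-Type", "text/plain")
            self.send_header("Content-Length", str(len(body)))
            self.end_headers()
            self.wfile.write(body)

    if __name__ == "__main__":
        HTTPServer(("", 8080), GrayHandler).serve_forever()

A real browser follows the redirect transparently, so the cost for a
legitimate user is one extra round trip on the first request, while a
fake browser never comes back and only ever gets the short 302 reply.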
3.3 What about proxies?
To address that, we use the same redirect idea, but we also add a
canary. You then get the follow-up request, obviously check for this
canary, and if you do get it back, that IP connection is now white
listed. So this canary principle, combined with the aging process,
also has the benefit of working with proxy servers. But it is not
perfect.
I also experimented with OS fingerprinting (OSFP), but obviously it
doesn't see the remote users behind the proxy, or I didn't see them
anyway, as the fingerprint is part of the TCP connection coming to
you. Many users can be behind the same proxy, and I would always get
the same OSFP signature.
To address this, maybe some changes could be made there, I really
don't know. However, the HTTP headers of the request could definitely
be checked as well, to help gray/white-list temporary connections
from very busy proxy servers and tell a valid redirect follow-up from
a fake one.
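How the canary itself is built is open; one way to do it without
keeping per-request state, assuming a server-side secret, is to
derive it from the client IP and a timestamp with an HMAC, as in this
sketch (the secret, the format and the 5-minute window are
assumptions of mine):

    # Sketch of a stateless canary: an HMAC of client IP + timestamp
    # keyed with a server secret. The names and the 5-minute window
    # are assumptions for illustration only.
    import hashlib
    import hmac
    import time

    SECRET = b"change-me"  # hypothetical server-side secret

    def make_canary(ip, now=None):
        now = int(time.time()) if now is None else int(now)
        mac = hmac.new(SECRET, ("%s:%d" % (ip, now)).encode(),
                       hashlib.sha256)
        return "%d.%s" % (now, mac.hexdigest()[:16])

    def check_canary(ip, canary, max_age=300):
        try:
            ts, digest = canary.split(".", 1)
            ts = int(ts)
        except ValueError:
            return False
        if time.time() - ts > max_age:
            return False
        expected = make_canary(ip, ts).split(".", 1)[1]
        return hmac.compare_digest(digest, expected)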
3.4 What about the compromised user computer itself, or proxy server?
Here again, it is possible to handle this. The headers provide the
difference needed to allow/deny connections from the same user
computer, separating the requests sent to you by the real user's
browser from those of the virus/attack running on the same machine,
should you want to allow the former. You would even have the
possibility of providing feedback and alerts to that user about the
problem, advising them to clean their computer, if you really wanted
to go that far. The signature of the user's browser, OS, etc. in the
header and the fake header from the GET virus will simply not match.
Not until the virus gets smarter, finds that information on the
user's computer and presents itself accordingly. So you can see the
difference between them here, allowing real user connections from a
compromised computer should you want that. Whether you should is
another question; it adds to the complexity of the daemon and I am
not sure it is worth it.
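A crude way to picture it: remember the header signature seen from an
IP at whitelisting time and compare later requests against it. Which
headers go into the signature is a guess on my part, nothing more:

    # Sketch: compare the header "signature" of new requests from an
    # IP against the one recorded when the IP was whitelisted. The
    # header selection is an assumption for illustration.
    import hashlib

    signatures = {}  # ip -> signature recorded at whitelisting time

    def signature(headers):
        parts = [headers.get(h, "") for h in
                 ("User-Agent", "Accept",
                  "Accept-Language", "Accept-Encoding")]
        return hashlib.sha256("|".join(parts).encode()).hexdigest()

    def record(ip, headers):
        signatures[ip] = signature(headers)

    def matches(ip, headers):
        # A GET virus on the same machine will usually present
        # headers that do not match the real browser's.
        return signatures.get(ip) == signature(headers)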
3.5 Graylisting effect.
The idea of the 302 message is that, as opposed to spamd, where you
slow down the remote party, here you can't really do that: if you do,
users will go elsewhere because your site looks unresponsive, and you
don't want that. The 302 message does just that, and the remote user
does not really perceive it as a slow connection or a delay. Even if
they bookmark the changed URL, it is not a problem, as they will
reach that page anyway; only the canary from that earlier visit would
be sent along with it. But since you are no longer waiting for that
canary, the redirect process simply applies again, the redirect gets
started, and all will be fine.
Then you can either keep that IP as an allowed source if your site
is not too busy, or, if it is really busy, you need to flush the IPs
from the PF table in an aging process, obviously, or you will crash
your servers or run out of memory.
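PF already has what is needed for the aging: pfctl can expire table
entries whose statistics were last cleared more than a given number
of seconds ago, so a periodic job along these lines would do (the
table name and the lifetime are examples only):

    # drop whitelist entries older than 24 hours
    pfctl -t gethttpd_white -T expire 86400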
As for the redirect, if it doesn't come back to you, then you really
have nothing to do, as the viruses or attacks seen so far do not
process these redirects, so no connection will come back to you.
4. What about more intelligent attacks?
It is possible that more intelligent attacks will be developed that
read the incoming reply and follow the redirect, in which case most
of the above would be useless. So, what could be done then? Sign-up
only sites? Who wants that, plus it is a deterrent for users. More
elaborate redirects, such as redirecting on every request? I am not
sure that is a good idea either. Putting an image on every requested
page and waiting until the request for that image comes in before
whitelisting the source? That would require a much more complicated
design, and I am not sure of the benefit, though it surely would
improve the results. Cost versus benefit, I am not sure. A more
complex setup and more software mean more bugs, and possibly work
against what this is supposed to fight.
Anyway, these are the ideas I am putting forward, and I have put
most of them to work on live systems with success. I am curious
whether other ideas might add to this and help as well, before I take
a crack at my gethttpd daemon.
Thanks for your time if you read this far; any feedback, good or
bad, would be very much welcome.
Regards,
Daniel