Hi,

I am looking for feedback and comments on the following ideas, as well as possible additions to them. Please read on, as I would very much appreciate input. Fair warning: it is long. Sorry, but a lot of ideas are included here.

I have been working on this idea and have put in place a series of defenses that have proven effective so far, though obviously not as practical and speedy as spamd is at the moment. It is a collection of scripts here and there, based on multiple aspects of standard web access.

The principle is to minimize the impact on legitimate users of your site while rejecting the attacks against it, which range from DDoS to attempts to make you waste as much bandwidth as possible and bring your site to a crawl.

Some of the ideas are not new and are based on spamd; they are just not all in place yet.

1. For crawlers and bots

First is the proliferation of mom-and-pop bots and crawlers. After testing different setups, I realized to my surprise (yes, call me stupid) that a handful are actually good citizens! The robots.txt standard is well known, and all good-citizen robots should respect it. It is not a means of protection for your site, but nevertheless they should respect it. So, whatever is inside it: if you forbid some directories or files, they should respect that, and any that do not, well, I guess it is fine to kill them. Why should they be granted access if they do not respect my wishes as the owner and/or operator of the site(s)?

1.1 First defense: no crawling of directories forbidden in robots.txt, with incremental denial of access.

Maybe not the best approach, but it is working: this method caught 381 bad-citizen crawlers in a week's time. The idea is very simple. I preset my robots.txt file to forbid a file, or in this case a directory, and in that directory I put a file backed by a script that blocks the source via PF and logs the entry in an SQL database (to be shared between all servers later on). I also put on the front page of the site a very simple link to a 1-pixel image at the bottom of the page; it is not visible to users and is not clickable either, so a regular user will never see it or click it. But a crawler will follow all links, that being the definition of a crawler. Now, don't forget that a crawler is supposed to respect the robots.txt directives, and this URL is inside the forbidden directory. Many crawlers do respect that very well; live testing proved it all too well. However, all the bad ones do not, and so the URL triggers a script that logs their IP and adds them to PF to block them right away. BYE BYE!
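
To make the trap concrete, here is a rough sketch of the kind of script I mean (the paths, PF table name and database layout below are examples, not what I actually run): robots.txt forbids the directory, the hidden 1-pixel link points into it, and a small CGI script in that directory adds the caller to a PF table and logs it. PF is assumed to carry something like "table <badbots> persist" and "block quick from <badbots>".

    #!/usr/bin/env python3
    # trap.py -- CGI honeypot that only a crawler ignoring robots.txt will hit.
    # robots.txt is assumed to contain something like:
    #   User-agent: *
    #   Disallow: /no-crawl/
    # and the hidden 1-pixel link on the front page points to /no-crawl/trap.py.
    import os
    import sqlite3
    import subprocess

    PF_TABLE = "badbots"                 # example PF table name
    DB = "/var/www/logs/badbots.db"      # example SQLite log, to be shared later

    def block(ip, ua):
        # Add the offender to the PF table (needs the privileges to run pfctl).
        subprocess.run(["pfctl", "-t", PF_TABLE, "-T", "add", ip], check=False)
        con = sqlite3.connect(DB)
        con.execute("CREATE TABLE IF NOT EXISTS hits "
                    "(ip TEXT, ua TEXT, ts DATETIME DEFAULT CURRENT_TIMESTAMP)")
        con.execute("INSERT INTO hits (ip, ua) VALUES (?, ?)", (ip, ua))
        con.commit()
        con.close()

    if __name__ == "__main__":
        ip = os.environ.get("REMOTE_ADDR", "")
        if ip:
            block(ip, os.environ.get("HTTP_USER_AGENT", ""))
        # Give the bot nothing useful back.
        print("Status: 403 Forbidden\r\nContent-Type: text/plain\r\n\r\nBye bye.")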

Now you may ask why I do incremental denial here. To be nice, I guess, but also because some connections come from proxies, and not all proxies add the identifying header. You don't want to lose traffic from legitimate users who sit behind a proxy like AOL's. This needs a bit more work, but as long as proxies follow the standard and add that header, as most do, it should be possible to make sure only the specific remote user behind the proxy gets blocked. Call it a bypass for broken proxies for now; should all proxies behave correctly, the block could perhaps be made permanent. This also has the side benefit of stopping some low-lifes from stealing your content by trying to mirror your entire site at once. Not the goal here, but it is a nice side benefit should you want it.
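
As for the incremental part, this is only a sketch of how I think about it (the durations, table layout and helper names are invented for illustration): keep a strike count per IP and let the deny period grow with each offense, so a legitimate user stuck behind a misbehaving proxy only loses access briefly the first time.

    # incremental_deny.py -- sketch of escalating block durations per offender.
    # Assumes the same SQLite log as above; the durations are arbitrary examples.
    import sqlite3
    import subprocess
    import time

    DB = "/var/www/logs/badbots.db"
    PF_TABLE = "badbots"
    STEPS = [600, 3600, 86400]        # 10 min, 1 h, 1 day, then permanent

    def punish(ip):
        con = sqlite3.connect(DB)
        con.execute("CREATE TABLE IF NOT EXISTS strikes "
                    "(ip TEXT PRIMARY KEY, n INTEGER, until INTEGER)")
        row = con.execute("SELECT n FROM strikes WHERE ip = ?", (ip,)).fetchone()
        n = (row[0] if row else 0) + 1
        # until = 0 means a permanent block after the last step.
        until = 0 if n > len(STEPS) else int(time.time()) + STEPS[n - 1]
        con.execute("INSERT OR REPLACE INTO strikes (ip, n, until) VALUES (?, ?, ?)",
                    (ip, n, until))
        con.commit()
        con.close()
        subprocess.run(["pfctl", "-t", PF_TABLE, "-T", "add", ip], check=False)

    def unblock_expired():
        # Run from cron: remove entries whose deny period has elapsed.
        now = int(time.time())
        con = sqlite3.connect(DB)
        for (ip,) in con.execute("SELECT ip FROM strikes WHERE until > 0 AND until < ?",
                                 (now,)):
            subprocess.run(["pfctl", "-t", PF_TABLE, "-T", "delete", ip], check=False)
        con.close()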

1.2 Forbidding bad bots.

This one is also simple and based on the idea above, with a variation. In the same robots.txt file I include yet another directory, with a file in it, but there is no reference to it ANYWHERE on the site, and that is the important point. The only way to know about it is to read the robots.txt file, knowingly ignore the forbidden directive, and go crawl that file. Should you do this, you are blocked for good. I have caught only a handful so far, but obviously some are after things you don't want public; everyone knows that. So these are blocked for good, after verification is done to be sure the request is not coming from a proxy. Again the entry goes to an SQL server and is pushed to all the other servers in the area as well, protecting them all at the same time, and again PF does the blocking.
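
The only extra work before a permanent block is the proxy verification mentioned above. A simple, admittedly crude, approximation is to look for the forwarding headers that well-behaved proxies add (the header names below are the conventional ones; how much you trust them is another matter, since they can be forged):

    # Rough proxy check used before a permanent block.
    import os

    def looks_like_proxy(environ=os.environ):
        # Well-behaved proxies usually add one of these to the request.
        for var in ("HTTP_X_FORWARDED_FOR", "HTTP_VIA", "HTTP_FORWARDED"):
            if environ.get(var):
                return True
        return False

    # In the hidden-trap CGI: block permanently only when it is clearly not a
    # proxy, otherwise fall back to the incremental denial from 1.1.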

2. DDoS on httpd
For this you can already use PF, and it is well explained in the FAQ, so I don't think there is any need to cover it here. I assume PF is in use, as it is one of the main components of the gethttpd.

3. DDoS GET attacks & bandwidth-sucker defenses. Multiple approaches.

3.1 Sanity checks on user-supplied data.

So far, most/all of the variations of attacks on web sites are scripts trying to inject themselves into your servers. Well, you need to do sanity checks in your code. Nothing can really protect you if you don't check what you expect to receive from user input. So I have nothing for that, and no idea how to handle it generically anyway, other than maybe limiting the size of the argument a GET can send, and even that is a bad idea, I think.
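
The only generic thing I can think of is that size limit, something as blunt as the sketch below (the 512-byte limit and the allowed characters are arbitrary examples), which is exactly why I don't like it much:

    # Crude sanity check: reject oversized or odd-looking query strings up front.
    # The limit and character set are examples and will break some legitimate apps.
    import os

    MAX_QUERY = 512

    def query_ok(environ=os.environ):
        qs = environ.get("QUERY_STRING", "")
        if len(qs) > MAX_QUERY:
            return False
        return all(c.isalnum() or c in "-_.=&%+" for c in qs)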

3.2 Greylisting idea via the 302 temporary redirect code.

Many scripts want to make you waste as much bandwidth as possible: if they can't inject themselves into your servers, they will instead attack a specific page or section of your site and try to make you waste plenty of bandwidth, or even SQL back-end power.

One simple approach to this defense came to me from the idea behind spamd. But here you don't want users to wait, or they will go elsewhere and you have just lost them. So the idea is again simple: return a code that tells the client to come back, simply a 302 temporary redirect.

You might say this will affect your search engine ranking. Well, not really: there is no impact, as a search engine will not index the content of a temporary redirect, and if it does, it is wrong. But should you be concerned about this, also add a do-not-cache directive in the header, as defined in the standard.

What is happening is that GET attacks and the like, if you look at a few different variations of them, send GET requests, some HEAD, etc. They impersonate a browser, an OS, and so on. But NONE that I have seen so far will also process the response itself. This means they send the GET request but will not process the content of what they get back, and will not follow the 302 redirect.

So what you have is a connection established to your web server, and you can't know whether it is from a good or a bad (fake) browser. However, the virus or attack will not process the temporary redirect it receives and come back to you at the new URL.

So, to process this quickly, you simply put the IP into your greylisting process, just like spamd would, and send back a 302 redirect to a new URL, the same one if you want, with some useless extra bits added to it.

Now real users come back to you; you see the new request arrive, move them from grey to white listing, and subsequent requests go through without any delay or processing from your servers. So the impact is minimal and on the first request only.
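
Put together, the greylisting loop looks roughly like the sketch below. Everything in it is a placeholder (the table name, the "grey" token parameter, the in-memory set), and it assumes PF only sends not-yet-whitelisted sources to this handler, spamd style, so whitelisted clients never hit it again.

    # greylist302.py -- sketch of the 302 greylisting loop described above.
    # First contact from an unknown IP gets a 302 back to the same URL with a
    # token added; only clients that actually follow it get whitelisted.
    import subprocess
    import urllib.parse

    GREY = set()                  # sources we have already redirected once
    WHITE_TABLE = "goodclients"   # example PF table of whitelisted sources

    def handle(ip, path, query):
        """Return None to serve the page normally, or a (status, headers) reply."""
        params = urllib.parse.parse_qs(query)
        if ip in GREY and "grey" in params:
            # The client processed the reply and followed the redirect: whitelist.
            GREY.discard(ip)
            subprocess.run(["pfctl", "-t", WHITE_TABLE, "-T", "add", ip], check=False)
            return None
        # Unknown source: remember it and send back a tiny, cache-free 302
        # to the same URL with a useless extra bit added.
        GREY.add(ip)
        location = path + ("?" + query + "&" if query else "?") + "grey=1"
        return ("302 Found", [("Location", location),
                              ("Cache-Control", "no-cache, no-store")])

A real daemon would of course persist and age those entries rather than keep them in memory, which is what 3.5 is about.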

In effect you have just added very simple greylisting to your server and protected it from bandwidth hogs! At the same time you have protected your database back end, since no request was sent to it for that content, nor did you send any images or objects to the requesting client, and your reply header is very short, so you don't waste processing power, disk access or bandwidth, and your server(s) still fully reply to user requests.

3.3 What about proxies?

To address this, we use the same redirect idea, but we also add a canary. When the follow-up request comes in, you check for this canary, and if you do get it, that IP is now whitelisted. This canary principle, combined with the aging process, also has the benefit of working with proxy servers. But it's not perfect.
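
The canary itself just needs to be something the daemon can verify on the way back; as an illustration only, an HMAC over the source IP and a timestamp (the secret and the five-minute lifetime below are made-up values). Tying it to the source IP still distinguishes the proxy as a whole rather than the individual users behind it, which is exactly the limitation I mean.

    # canary.py -- sketch of a verifiable canary to embed in the redirect URL.
    import hashlib
    import hmac
    import time

    SECRET = b"change-me"       # illustrative secret
    LIFETIME = 300              # seconds a canary stays valid

    def make_canary(ip):
        ts = str(int(time.time()))
        mac = hmac.new(SECRET, (ip + ":" + ts).encode(), hashlib.sha256).hexdigest()
        return ts + "." + mac

    def check_canary(ip, canary):
        try:
            ts, mac = canary.split(".", 1)
            age = int(time.time()) - int(ts)
        except ValueError:
            return False
        if age > LIFETIME:
            return False        # too old; the aging process applies
        good = hmac.new(SECRET, (ip + ":" + ts).encode(), hashlib.sha256).hexdigest()
        return hmac.compare_digest(mac, good)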

I also experimented with OS fingerprinting (OSFP), but obviously it doesn't see the remote users behind the proxy, or at least I couldn't, as it works on the TCP connection coming to you. Many users could be behind the same proxy and I would always get the same OSFP signature.

Maybe some changes could be made there to address this, I really don't know. However, the HTTP headers of the request could definitely be checked as well, to help grey/whitelist temporary connections from a very busy proxy server and tell a valid redirect follow-up from a fake one.

3.4 What about the compromised user computer itself, or proxy server?

Here again, something can be done. The headers do provide what is needed to distinguish, from the same user computer, connections sent by the real user's browser from those sent by the virus/attack running on that same computer, should you want to allow the former.

You would even have the possibility of providing feedback and alerts to that user about the problem, advising them to clean their computer, if you really wanted to go that far. The signature of the user's browser, OS, etc., in the headers and the fake headers from the GET virus will simply not match, at least not until the virus gets smart enough to find that information on the user's computer and present itself accordingly. So you can tell the two apart, allowing real user connections from a compromised computer should you want that. Whether you should is another question; it adds to the complexity of the daemon and I am not sure it should.
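
As a rough illustration of that comparison (a sketch only, with the header choice and storage picked arbitrarily): record the header signature seen when an IP is whitelisted, and treat a request whose signature suddenly doesn't match as coming from something else on that machine.

    # Compare the request-header "signature" of a source against what was seen
    # when it was whitelisted.  Header choice and storage are examples only.
    SIGNATURE_HEADERS = ("HTTP_USER_AGENT", "HTTP_ACCEPT", "HTTP_ACCEPT_LANGUAGE")
    KNOWN = {}   # ip -> signature recorded at whitelisting time

    def signature(environ):
        return tuple(environ.get(h, "") for h in SIGNATURE_HEADERS)

    def same_client(ip, environ):
        sig = signature(environ)
        if ip not in KNOWN:
            KNOWN[ip] = sig      # first sighting: record it
            return True
        # A bare-bones GET flooder rarely reproduces the real browser's headers.
        return KNOWN[ip] == sig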

3.5 Greylisting effect.

The idea of the 302 message is that, as opposed to spamd where you slow down the remote party, here you can't really do that: if you do, users will go elsewhere because your site is not responsive, and you don't want that.

The 302 message does just that, and the remote user doesn't really perceive it as a slow connection or a delay. Even if they bookmark the changed URL, it's not a problem: they will still get to that page, just with the canary of that time attached. Since you are no longer waiting for that canary, the redirect process simply applies again, the redirect happens, and all is fine.

Then you can either keep that IP as an allowed source if your site is not too busy, or, if it is really busy, you need to flush IPs from the PF table with an aging process, or you will obviously crash your servers or run out of memory.
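
For the aging, something as small as a periodic pfctl run from cron is probably enough; the table name and the one-day lifetime below are examples.

    # age_whitelist.py -- run from cron to keep the PF whitelist table from
    # growing without bound.
    import subprocess

    WHITE_TABLE = "goodclients"
    MAX_AGE = 86400   # drop entries older than a day

    def age():
        # "expire" deletes addresses whose statistics were last cleared
        # (normally at add time) more than MAX_AGE seconds ago.
        subprocess.run(["pfctl", "-t", WHITE_TABLE, "-T", "expire", str(MAX_AGE)],
                       check=False)

    if __name__ == "__main__":
        age()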

As for the redirect, if it never comes back to you, then you really have nothing to do: so far the viruses and attacks do not process these redirects, so no connection will come back.

4. What about more intelligent attacks?

It's possible that more intelligent attacks will be developed that read the response and follow the redirect, in which case most of the above would be useless. So what could be done then? Sign-up-only sites? Who wants that, plus it's a deterrent for users. More elaborate redirects, such as redirecting all the time? Not sure that's a good idea. Putting an image on every requested page and waiting until the request for that image comes in before whitelisting the client? That would require a much more complicated design, and I am not sure of the benefit; it would certainly improve the results, but on cost/benefit I am not sure. More complex setups and software mean more bugs, and possibly work against what this is supposed to fight.

Anyway, these are the ideas I am putting forward, and most of them I have put to work on live systems with success. I am curious whether other ideas might add to this and help as well before I take a crack at my gethttpd daemon.

Thanks for your time if you read this far; any feedback, good or bad, would be very much welcome.

Regards,

Daniel
