Hi,
I am looking for feedback and comments on the following ideas, as
well as possible additions to them. Please read on, as I would very
much appreciate input. Be warned that it is long, sorry, but a lot of
ideas are included here.
I am working on this idea and have put in place a series of defenses
that have proved effective so far, but they are obviously not as
practical and speedy as spamd is at the moment. It is a collection of
scripts here and there, based on multiple aspects of standard web
access.
The principle is to minimize the impact on legitimate users of your
site and at the same time reject all the attacks against it, which at
various levels range from DDoS to attempts to make you waste as much
bandwidth as possible and bring your site to a crawl.
Some of the ideas are not new and are based on spamd; they are just
not all in place yet.
1. For Crawlers and Bots
First is the proliferation of mom-and-pop bots and crawlers. After
testing different setups, I realized to my surprise (yes, call me
stupid) that only a handful are actually good citizens! The use and
standard of robots.txt is well known, and all good-citizen robots
should respect it. It is not a means of protection for your site, but
nevertheless they should respect it. So, whatever is inside it, if
you forbid some directories or files, they should respect that, and
for any that do not, well, I guess it is fine to kill them. Why
should they be granted access if they do not respect my wishes as the
owner and/or operator of the site(s)?
1.1 First defense. No crawling of directories forbidden in a preset
robots.txt, with incremental deny access for offenders.
Maybe not the best approach, but it is working: this method caught
381 bad-citizen crawlers in a week's time. The idea is very simple. I
preset my robots.txt file to include a file, or in this case a
directory, that is not to be crawled, and in that directory I put a
file containing a script that will block the source via PF and log
the entry in an SQL database, so that it can be shared between all
servers later on. I also put on the front page of the site a very
simple link to a 1-pixel image at the bottom of the page, which is
simply not visible to users and is not clickable either. So a regular
user will never click on it, nor even see it. But a crawler will
follow all links, which is, obviously, the definition of a crawler.
Now don't forget that the crawler is supposed to respect the
robots.txt directives. This URL is in the forbidden directory, and
many crawlers do respect that very well; live testing proved this
only too well. However, all the bad ones will not, and as such the
URL triggers a script that logs their IP and adds them to PF to block
them right away! BYE BYE!
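For what it is worth, the trap script amounts to something like this
minimal sketch (the PF table name, the doas call and the SQLite
database path/schema are my own placeholders, not an existing setup):

    #!/usr/bin/env python3
    # Minimal sketch of the trap script: block the requesting IP with
    # PF and log it so it can later be shared with the other servers.
    # Assumes it runs as a CGI in the forbidden directory and that
    # the web server user may run pfctl through doas.
    import os
    import sqlite3
    import subprocess
    import time

    PF_TABLE = "bad_crawlers"          # hypothetical PF table name
    DB_PATH = "/var/www/trap/hits.db"  # hypothetical log database

    def block(ip):
        # The table is referenced by a rule in pf.conf, e.g.
        # "block in quick from <bad_crawlers>".
        subprocess.run(["doas", "pfctl", "-t", PF_TABLE, "-T", "add", ip],
                       check=False)

    def log(ip, agent):
        con = sqlite3.connect(DB_PATH)
        con.execute("CREATE TABLE IF NOT EXISTS hits "
                    "(ip TEXT, agent TEXT, ts INTEGER)")
        con.execute("INSERT INTO hits VALUES (?, ?, ?)",
                    (ip, agent, int(time.time())))
        con.commit()
        con.close()

    if __name__ == "__main__":
        ip = os.environ.get("REMOTE_ADDR", "")
        agent = os.environ.get("HTTP_USER_AGENT", "")
        if ip:
            block(ip)
            log(ip, agent)
        print("Content-Type: text/plain\r\n\r\nGoodbye.")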
Now you may ask why I do an incremental deny here. To be nice, I
guess, but also because some connections come from proxies, and not
all proxies add the header identifying themselves as such. You don't
want to lose traffic from legitimate users who sit behind a proxy
like AOL's. This needs a bit more work, but as long as proxies follow
the standard and add that part to their headers, as most do, it
should be possible to make sure that only the offending remote user
behind the proxy gets blocked. You can call this the bypass for
broken proxies for now. Should all proxies behave correctly, then the
block could perhaps be made permanent. This also has the side benefit
of stopping some lowlifes from stealing your content by trying to
import your whole site at once. That is not the goal here, but it is
a side benefit should you want it.
1.2 Forbidden bad bots.
This one is also simple and is based on the idea above, with a
variation. In the same robots.txt file I include yet another
directory, again with a file in it. But there is no reference to it
ANYWHERE on the site, and that is the important point here. The only
way to know about it is to read the robots.txt file and then
knowingly ignore the forbidden directive and go crawl that file.
Should you do this, you are blocked for good. I have caught a handful
so far, and obviously some are after things you don't want public;
everyone knows that. So these are blocked for good, after checks are
done to be sure the request is not coming from a proxy. Again this
goes to an SQL server and is pushed to all the other servers in the
area as well, protecting them all at the same time, again using PF.
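To make the two traps concrete, the robots.txt looks something like
this (the directory names are only placeholders):

    User-agent: *
    # /no-crawl/ is linked from the hidden 1-pixel image (1.1)
    Disallow: /no-crawl/
    # /never-linked/ is referenced nowhere except in this file (1.2)
    Disallow: /never-linked/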
2. DDoS on httpd
Well, for this you can use PF already, and it is well explained in
the FAQ, so I don't think there is any need to cover it here. I
assume PF is in use, as it is one of the main components of
gethttpd.
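Just as a reminder, the kind of rule the FAQ describes looks roughly
like this (the table name and the limits are examples only, and
$ext_if is assumed to be defined elsewhere in pf.conf):

    table <http_abusers> persist
    block in quick from <http_abusers>
    pass in on $ext_if proto tcp to port www flags S/SA keep state \
        (max-src-conn 100, max-src-conn-rate 15/5, \
         overload <http_abusers> flush global)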
3. DDoS GET attacks & bandwidth-sucker defenses. Multiple approaches.
3.1 Sanity checks on user-supplied data.
So far most, if not all, of the variations of attacks on web sites
use scripts trying to inject themselves into your servers. Well, you
need to do sanity checks in your code. Nothing can really protect you
from that if you don't check what you expect to receive as user
input. So I have nothing for that, and no idea how to handle it
anyway, other than maybe limiting the size of the arguments a GET can
send, but even that is a bad idea, I think.
3.2 Graylisting idea via a 302 temporary return code.
Many scripts want to make you waste as much bandwidth as possible:
if they can't inject themselves into your servers, they will in turn
attack a specific page or section of your site and try to make you
waste plenty of bandwidth, or even SQL back-end power as well.
One simple approach to this defense came to me from the idea of
spamd. But to do this, you don't want the users to wait, or they will
go elsewhere and you have just lost them. So the idea is again
simple: just return the user a code telling them to come back, simply
a 302 temporary redirect code.
You might say this will hurt your search engine ranking; well, not
really. There is no impact, as no search engine will save temporary
content on a redirect, and if one does, it is broken. But should you
be concerned about this, also add to the header a do-not-cache
directive, which is likewise defined in the standard.
So, what happens is that GET attacks and the like, if you look at a
few different variations of them, do send GET requests, some HEAD,
etc. They impersonate a browser, an OS, etc. But NONE that I have
seen so far will also process the HTML code itself. This means they
send the GET request but will not process the content of what they
get back, and will not follow the 302 redirect.
So, what you have is obviously a connection established to your web
server, and you can't know whether it comes from a good or a bad
(fake) browser. However, this virus, or attack, will not process the
temporary redirect it received and come back to you on a new URL.
So, to process this quickly, you simply put this IP in your
graylisting process, just like spamd would, and send back a 302
redirect with a new URL (the same one if you want, with some useless
extra bits added to it).
Now the user comes back to you, you see the new request coming in,
you promote the entry from gray to white listing, and the following
requests are served without any delay or extra processing from your
servers. So the impact is minimal and on the first request only.
In effect you have just added a very simple graylist to your server
and protected it from bandwidth hogs! At the same time you protected
your database back end, as no request was sent to it for that
content, nor did you send any images or objects to the requesting
client, and your reply header is very short as well, so you don't
waste processing power, disk access, and bandwidth, and your
server(s) are still replying fully to legitimate users' requests.
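To make the mechanism concrete, here is a rough standalone sketch of
the front-end logic in Python. None of this is the actual gethttpd
code; the in-memory tables, the port and the "canary" query parameter
are made up for illustration, and a real version would feed PF tables
instead:

    #!/usr/bin/env python3
    # Standalone sketch of the 302 graylisting front end described
    # above: unknown IPs get a cheap redirect carrying a canary;
    # clients that follow it are whitelisted.
    import secrets
    from http.server import BaseHTTPRequestHandler, HTTPServer
    from urllib.parse import parse_qs, urlparse

    graylist = {}      # ip -> canary we expect back
    whitelist = set()  # ips that followed the redirect

    class GrayHandler(BaseHTTPRequestHandler):
        def do_GET(self):
            ip = self.client_address[0]
            url = urlparse(self.path)
            canary = parse_qs(url.query).get("canary", [None])[0]

            if ip in whitelist:
                self.serve_page()
                return

            if ip in graylist and canary == graylist[ip]:
                # The client processed the redirect: real browser.
                whitelist.add(ip)
                del graylist[ip]
                self.serve_page()
                return

            # Unknown source (or wrong canary): graylist it and send
            # back a short 302, which costs us almost nothing.
            token = secrets.token_urlsafe(8)
            graylist[ip] = token
            self.send_response(302)
            self.send_header("Location",
                             "%s?canary=%s" % (url.path, token))
            self.send_header("Cache-Control", "no-store")
            self.send_header("Content-Length", "0")
            self.end_headers()

        def serve_page(self):
            body = b"hello, whitelisted client\n"
            self.send_response(200)
            self.send_header("Content-Type", "text/plain")
            self.send_header("Content-Length", str(len(body)))
            self.end_headers()
            self.wfile.write(body)

    if __name__ == "__main__":
        HTTPServer(("", 8080), GrayHandler).serve_forever()

A real browser follows the redirect transparently, so the cost for a
legitimate user is one extra round trip on the first request, while a
fake browser never comes back and only ever gets the short 302 reply.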
3.3 What about proxies?
To address that, we use the same redirect idea, but we also add a
canary. You then get the follow-up request, obviously check for this
canary, and if you do get it back, that IP connection is now white
listed. So this canary principle, combined with the aging process,
also has the benefit of working with proxy servers. But it is not
perfect.
I also experimented with OS fingerprinting (OSFP), but obviously it
doesn't see the remote users behind the proxy, or I didn't see them
anyway, as the fingerprint is part of the TCP connection coming to
you. Many users can be behind the same proxy, and I would always get
the same OSFP signature.
To address this, maybe some changes could be made there, I really
don't know. However, the HTTP headers of the request could definitely
be checked as well, to help gray/white-list temporary connections
from very busy proxy servers and tell a valid redirect follow-up from
a fake one.
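How the canary itself is built is open; one way to do it without
keeping per-request state, assuming a server-side secret, is to
derive it from the client IP and a timestamp with an HMAC, as in this
sketch (the secret, the format and the 5-minute window are
assumptions of mine):

    # Sketch of a stateless canary: an HMAC of client IP + timestamp
    # keyed with a server secret. The names and the 5-minute window
    # are assumptions for illustration only.
    import hashlib
    import hmac
    import time

    SECRET = b"change-me"  # hypothetical server-side secret

    def make_canary(ip, now=None):
        now = int(time.time()) if now is None else int(now)
        mac = hmac.new(SECRET, ("%s:%d" % (ip, now)).encode(),
                       hashlib.sha256)
        return "%d.%s" % (now, mac.hexdigest()[:16])

    def check_canary(ip, canary, max_age=300):
        try:
            ts, digest = canary.split(".", 1)
            ts = int(ts)
        except ValueError:
            return False
        if time.time() - ts > max_age:
            return False
        expected = make_canary(ip, ts).split(".", 1)[1]
        return hmac.compare_digest(digest, expected)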
3.4 What about the compromised user computer itself, or proxy server?
Here again, it is possible to handle this. The headers provide the
difference needed to allow/deny connections from the same user
computer, separating the requests sent to you by the real user's
browser from those of the virus/attack running on the same machine,
should you want to allow the former. You would even have the
possibility of providing feedback and alerts to that user about the
problem, advising them to clean their computer, if you really wanted
to go that far. The signature of the user's browser, OS, etc. in the
header and the fake header from the GET virus will simply not match.
Not until the virus gets smarter, finds that information on the
user's computer and presents itself accordingly. So you can see the
difference between them here, allowing real user connections from a
compromised computer should you want that. Whether you should is
another question; it adds to the complexity of the daemon and I am
not sure it is worth it.
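A crude way to picture it: remember the header signature seen from an
IP at whitelisting time and compare later requests against it. Which
headers go into the signature is a guess on my part, nothing more:

    # Sketch: compare the header "signature" of new requests from an
    # IP against the one recorded when the IP was whitelisted. The
    # header selection is an assumption for illustration.
    import hashlib

    signatures = {}  # ip -> signature recorded at whitelisting time

    def signature(headers):
        parts = [headers.get(h, "") for h in
                 ("User-Agent", "Accept",
                  "Accept-Language", "Accept-Encoding")]
        return hashlib.sha256("|".join(parts).encode()).hexdigest()

    def record(ip, headers):
        signatures[ip] = signature(headers)

    def matches(ip, headers):
        # A GET virus on the same machine will usually present
        # headers that do not match the real browser's.
        return signatures.get(ip) == signature(headers)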
3.5 Graylisting effect.
The idea of the 302 message is that, as opposed to spamd, where you
slow down the remote party, here you can't really do that: if you do,
users will go elsewhere because your site looks unresponsive, and you
don't want that. The 302 message does just that, and the remote user
does not really perceive it as a slow connection or a delay. Even if
they bookmark the changed URL, it is not a problem, as they will
reach that page anyway; only the canary from that earlier visit would
be sent along with it. But since you are no longer waiting for that
canary, the redirect process simply applies again, the redirect gets
started, and all will be fine.
Then you can either keep that IP as an allowed source if your site
is not too busy, or, if it is really busy, you need to flush the IPs
from the PF table in an aging process, obviously, or you will crash
your servers or run out of memory.
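PF already has what is needed for the aging: pfctl can expire table
entries whose statistics were last cleared more than a given number
of seconds ago, so a periodic job along these lines would do (the
table name and the lifetime are examples only):

    # drop whitelist entries older than 24 hours
    pfctl -t gethttpd_white -T expire 86400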
As for the redirect, if it doesn't come back to you, then you really
have nothing to do, as the viruses or attacks seen so far do not
process these redirects, so no connection will come back to you.
4. What about more intelligent attacks?
It is possible that more intelligent attacks will be developed that
read the incoming reply and follow the redirect, in which case most
of the above would be useless. So, what could be done then? Sign-up
only sites? Who wants that, plus it is a deterrent for users. More
elaborate redirects, such as redirecting on every request? I am not
sure that is a good idea either. Putting an image on every requested
page and waiting until the request for that image comes in before
whitelisting the source? That would require a much more complicated
design, and I am not sure of the benefit, though it surely would
improve the results. Cost versus benefit, I am not sure. A more
complex setup and more software mean more bugs, and possibly work
against what this is supposed to fight.
Anyway, these are the ideas I am putting forward, and I have put
most of them to work on live systems with success. I am curious
whether other ideas might add to this and help as well, before I take
a crack at my gethttpd daemon.
Thanks for your time if you read this far; any feedback, good or
bad, would be very much welcome.
Regards,
Daniel