chris derham wrote:
Let me just summarise my arguments then:
1) These scans are a burden for all webservers, not just for the vulnerable
ones.  Whether we want to or not, we currently all have to invest resources
into countering (or simply responding to) these scans.  Obviously, just
ignoring them doesn't stop them, and just protecting one's own servers
against them doesn't stop them in a general sense.
2) there is a fundamental asymmetry between how bots access a server (and
most of the responses that they get) and how "normal" clients access a
server: "normal" clients receive mostly non-404 responses, while bots - by
the very nature of what they are looking for - receive many 404 responses.
So anything that would in some way "penalise" 404 responses relative to
other responses should impact bots much more than normal clients.
3) setting up a bot to perform such a scanning operation has a cost; if the
expected benefit does not cover the cost, it makes no sense to do it.
Assuming that botmasters are rational, they should stop doing it then. It is
debatable what proportion of servers would need to implement this proposal
in order for this kind of bot-scanning to become uneconomical in a general
sense.  What is certain is that, if none do and no better general scheme is
found, the scans will continue.  It is also fairly certain that if all
servers did, this particular type of scan would stop.
4) it is not obvious right now which method bots could use to circumvent
this in order to continue scanning HTTP servers for these known potentially
vulnerable URLs. I do not discount that these people are smart, and that
they could find a way.
But so far it would seem that every scheme thought of by people commenting
on this idea has its own costs in some way and does not invalidate the
basic idea.
5) if the scheme works, and it has the effect of making this type of
server-scanning uneconomical, bot developers will look for other ways to
find vulnerable targets.
It is just not obvious to me where they would move their focus, HTTP-wise.
If their aim is to find vulnerable URLs on webservers, what else can they do
but try them?
6) intuitively, it seems that implementing this would not be very
complicated, and that the foreseeable cost per server, in terms of
complexity and performance, would be quite low.  The burden imposed on
normal clients would also seem to be small.
Maybe this should be evaluated by comparison with any other method that
could provide a similar benefit at lower cost.
7) once implemented, it would be something which does not require any
special skills or any special effort on the part of the vast majority of
people who download and install Tomcat, which means that it has a real
chance to automatically spread over time to a large proportion of servers.
This is quite unlike any other bot-fighting measure that I have seen
mentioned so far in this thread.
8) an obvious drawback to this scheme is that, if it works, it would take a
long time to show its effects, because
a) it would take a long time before a significant proportion of active
servers implement the scheme
b) even then, it would probably take an even longer time for the bots to
adapt their behaviour (the time for the current generation to die out)
So in politics this would be a no-no, and I will probably never get a Nobel
prize for it either.  Damn.  I would welcome any idea to spread this faster
and allow me to gain just recognition for my insights, however.

So a miscreant decides that they want to hack into a computer. Like
most things in computing, they break the task down into smaller, more
manageable tasks. Step 1 is to find targets. The easiest approach would
seem to be to enumerate every possible IPv4 address and send a TCP/IP
packet to some known ports. If you get a response, it's a live IP address.
You don't need to map every port, just establish whether the host is
listening to the internet. This allows you to build up a list of live IP
addresses to feed into step 2.
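
Just to make concrete how cheap that first check is, here is a minimal
sketch in Java (purely illustrative: the target address, port and timeout
are placeholders of my own, and real scanners work with raw packets and
massive parallelism rather than plain sockets):

// Minimal sketch of "step 1" host discovery: a plain TCP connect probe.
// Real scanners use raw packets and huge parallelism; this only shows
// how little it takes to decide whether an address deserves a closer look.
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.Socket;

public class ConnectProbe {

    public static boolean isListening(String host, int port, int timeoutMs) {
        try (Socket s = new Socket()) {
            s.connect(new InetSocketAddress(host, port), timeoutMs);
            return true;   // something answered: live IP, open port
        } catch (IOException e) {
            return false;  // closed, filtered, or nothing there at all
        }
    }

    public static void main(String[] args) {
        // Hypothetical example target; a bot would iterate whole address ranges.
        System.out.println(isListening("192.0.2.1", 80, 500));
    }
}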

Step 2 is to fingerprint those IP addresses. To do this, use a scanning
tool. These send packets to the ports of a given IP address and look at
the responses. They don't just look for positive responses; they also
send badly formed/invalid packets. They use many techniques to do
this. My favorite is the xmas tree packet. The low-level TCP protocol
defines several control flags - the xmas tree packet sets them all to
true. The packet is completely invalid at the TCP level, but different
OSes will respond differently. The results of all of these responses
provide a fingerprint, which should identify which OS the server is
running. Using similar techniques it is generally possible to identify
the software stack running on each port. Sometimes there will be 100%
confidence in the results, sometimes less. Sometimes the tool can't tell
what the software stack on the server is. However, the aim of the game is
to work out which OS and which software is running on each port. The
miscreants are after the low-hanging fruit anyway, right? So they build
up a list of IP addresses with the software running on each port, and
feed it into step 3.
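
The raw-packet tricks such as the xmas tree scan need access below the
normal socket API, but the mildest form of fingerprinting - simply reading
whatever the service announces about itself - is trivial. A rough sketch,
again purely illustrative and with a placeholder target:

// Sketch of the most basic application-level fingerprinting: reading the
// HTTP "Server" header (Tomcat historically announced itself as
// "Apache-Coyote/1.1"). Raw-packet OS fingerprinting is not possible from
// plain Java and is not shown here.
import java.net.HttpURLConnection;
import java.net.URL;

public class BannerGrab {

    public static String serverHeader(String url) throws Exception {
        HttpURLConnection con = (HttpURLConnection) new URL(url).openConnection();
        con.setRequestMethod("HEAD");
        con.setConnectTimeout(2000);
        con.setReadTimeout(2000);
        con.connect();
        String banner = con.getHeaderField("Server");
        con.disconnect();
        return banner;
    }

    public static void main(String[] args) throws Exception {
        // Hypothetical target URL.
        System.out.println(serverHeader("http://192.0.2.1/"));
    }
}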

Step 3: if all has gone well in steps 1 and 2, you now have a list of IP
addresses with names and versions of the OS and the server-side software
running on them, and in some cases the patch level. Combine this with any
of the publicly available exploit databases, and you can cherry-pick which
of the low-hanging fruit you wish to attack using known exploits that
haven't been patched yet.

Step 4: if you don't have any targets with known exploits, then you have
to start looking for holes manually. The figure varies, but it is often
said that there is one exploitable defect per thousand lines of code. With
this in mind, and with an OS/server stack/app stack combining to contain
many millions of lines of code, there should be ample scope for finding a
hole. Most OSes and app servers are reviewed by security experts and have
been battle-hardened. Most apps have not. Apps seem to be the common weak
point, second only to users and weak, reused passwords. The scanners are
getting better and better each day. Some are now capable of detecting SQL
injection defects in forms and flagging a site as ripe for targeting.
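
To make that last point concrete, this is the textbook shape of the defect
such scanners look for, and its fix; the class, table and column names are
invented purely for illustration:

// Illustration of the kind of defect an automated scanner flags:
// user input concatenated straight into SQL versus a parameterised query.
// All names are made up for the example.
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.sql.Statement;

public class LoginDao {

    // Vulnerable: input like  ' OR '1'='1  changes the meaning of the query.
    public ResultSet findUserUnsafe(Connection con, String name) throws SQLException {
        Statement st = con.createStatement();
        return st.executeQuery("SELECT * FROM users WHERE name = '" + name + "'");
    }

    // Safe: the driver treats the value strictly as data, never as SQL.
    public ResultSet findUserSafe(Connection con, String name) throws SQLException {
        PreparedStatement ps = con.prepareStatement("SELECT * FROM users WHERE name = ?");
        ps.setString(1, name);
        return ps.executeQuery();
    }
}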

So coming back to your proposal. My first point is that step 1 uses
TCP/IP connections, so the probing occurs lower down the stack; delaying
404 responses will not stop or affect it. Some of step 2 can be done just
by looking at normal headers, or by using malformed packets against pages
that do exist, i.e. requests that will not result in 404 responses.
My second point is that once they have fingerprinted your server,
they may be able to launch an exploit. For badly unpatched systems,
they may never even see a 404 in their logs, as the miscreants may
break in without triggering one. In short, I believe that requests
resulting in a 404 are not the thing admins should be worried about.
There may be some script kiddies out there, probing all the web sites
they find on Google for Tomcat manager apps. If that is their skill
level, then I would suggest that you shouldn't worry too much about them.

If your approach were successfully implemented across all
patched/updated web servers, the miscreants would still carry on
probing, as there would still be many thousands or millions of servers
out there that are not patched, and hence not running the 404-delay
software. I know that your argument is that over time this would
reduce. However, there are still millions of users out there running
Windows XP (20% according to
http://www.w3schools.com/browsers/browsers_os.asp). Whilst I know that
this shouldn't reflect the OS used server-side, my point is that for
~10 years there will still be badly patched app servers out there not
running the 404-delay patch. So for the next 5 years (at a conservative
estimate), it will still be worth searching out these old servers. I
know your argument is that after a percentage of web servers have the
404-delay software in place, scanners will slow down. My points are a) the
scanners will fingerprint the newer releases and not scan them, and b) most
scans from real hackers will not result in 404s. There may be some,
but most of their probing will not return these responses.

I think that you have articulated your suggestion very well. I think
you have weighed the pros and cons well and been open to debate. Personally
I just don't think what you propose will have the effect that you
desire. However, since I seem to be the only voice of dissent, I will
stop now. I would like some other list members to weigh in with their
thoughts - PID, it is not like you to be shy of coming forward. What are
your thoughts?

Personally, end-user/developer/administrator education would seem a
prudent avenue for reducing the problems on the modern internet.


Thank you for your thoughtful responses and comments.
The above should be required reading for would-be botmasters.

I feel that I have to add a couple of comments still.

I am totally aware of the fact that bots nowadays are sophisticated beasts, and that they use a lot of ways to spread and infect new hosts, sometimes astonishingly broadly and quickly. And I know that finding and breaking into webservers that have "vulnerable" URLs is only one tiny facet of how they operate, and probably far from the main one (which seems to remain email attachments opened carelessly by users themselves).

If I led anyone to think that I thought that implementing a delay in webservers' 404 responses would kill bots in general, I apologise for such misrepresentation. The proposed 404 delay is meant only to discourage bots from scanning for vulnerable URLs in the way in which they (or some of them) seem to be doing it currently. The origin of my proposal is in fact my personal annoyance at seeing these scans appear relentlessly, for years now, in the logs of all of my servers which are internet-facing. I had been thinking for a long time (and probably not alone) of some way to get rid of them which would not require additional resources to be spent by my own infrastructure, nor require a major investment in terms of configuration and setup, nor make life more unpleasant for legitimate users, and which would thus be potentially usable on a majority of webservers on the Internet. Like squaring the circle. Then I hit on this idea, which seems to have at least the potential to annoy bots in this particular case.
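
For concreteness, here is a minimal sketch of what such a delay could look like as a servlet Filter (the class name and the fixed one-second delay are my own illustrative choices, not a finished or tested design):

// Rough sketch of the proposed 404 delay as a servlet Filter.
// The name Delay404Filter and the 1 s figure are illustrative assumptions.
import java.io.IOException;
import javax.servlet.*;
import javax.servlet.http.*;

public class Delay404Filter implements Filter {

    private static final long DELAY_MS = 1000; // assumed one-second penalty

    @Override
    public void init(FilterConfig cfg) { }

    @Override
    public void doFilter(ServletRequest req, ServletResponse res, FilterChain chain)
            throws IOException, ServletException {

        // Wrap the response so we can pause before a 404 is actually sent.
        HttpServletResponseWrapper wrapper =
                new HttpServletResponseWrapper((HttpServletResponse) res) {
            @Override
            public void sendError(int sc, String msg) throws IOException {
                pauseIfNotFound(sc);
                super.sendError(sc, msg);
            }
            @Override
            public void sendError(int sc) throws IOException {
                pauseIfNotFound(sc);
                super.sendError(sc);
            }
        };
        chain.doFilter(req, wrapper);
    }

    private void pauseIfNotFound(int status) {
        if (status == HttpServletResponse.SC_NOT_FOUND) {
            try {
                Thread.sleep(DELAY_MS); // ties up a request thread for the duration
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }
    }

    @Override
    public void destroy() { }
}

It would still need to be wired in (for instance in conf/web.xml to cover all webapps), and it keeps a request thread busy for the length of the delay, which is exactly the kind of per-server cost that would have to be weighed against the benefit.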

Bots are sophisticated and multi-faceted. So it is futile to look for one definitive weapon to get rid of them, and the approach has to be multi-faceted as well. This could be one of these facets, no more and no less. Following your comments above, I have done some additional back-of-the-envelope calculations, which seem to show that, even applying your bot-efficiency principles above, adding a 404 delay of 1 s on less than 20% of Internet webservers would already slow down this particular line of enquiry/attack by this particular kind of bot by at least 50% (details on request). Independently of everything else, I believe that this is already something worth having, and unless someone can prove that the approach is wrong-headed, I will continue to advocate it.
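
(For what it is worth, one way such a figure can come about, under purely illustrative assumptions of my own: if an undelayed 404 probe takes roughly 0.2 s of round-trip time and 20% of the probed servers add a 1 s delay, the average cost per probe becomes 0.8 × 0.2 s + 0.2 × 1.2 s = 0.4 s, so the same sequential scan takes twice as long; the faster the bot normally is, the bigger the relative penalty.)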

But honestly, I am also a bit at a loss now as to how to continue. There is of course no way for me to prove the validity of the scheme by installing it on 31 million webservers (20% of those on the Internet) and looking at the resulting bot activity patterns to confirm my suspicions.

The Wikipedia article on botnets mentions the following :
"Researchers at Sandia National Laboratories are analyzing botnets behavior by simultaneously running one million Linux kernels as virtual machines on a 4,480-node high-performance computer cluster.[12]"
Maybe I should get in touch with them?


