Most web crawlers won't download the content of a 404 page, because of the way servers send HTTP responses.
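To make the header-first behavior concrete, here is a minimal Python sketch. It is not how any particular crawler is implemented. It assumes a hypothetical local test site (`DemoSite`) with one article, a redirect at `/old`, and a large 404 page for everything else, and `check_status`, `parent_urls` and `demo` are illustrative helper names. `check_status` uses `http.client` to read only the status line and headers, then closes the connection without following redirects, so a 301 just reports its `Location`. `parent_urls` steps up a URL path one segment at a time, like the CNN example in this post. Some body bytes may already be in flight when the connection is closed, so the saving is real but not exactly zero.

```python
import http.client
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer
from urllib.parse import urlsplit


def parent_urls(url):
    """Yield url, then each parent URL with one trailing path segment removed."""
    parts = urlsplit(url)
    segments = [s for s in parts.path.split("/") if s]
    for n in range(len(segments), 0, -1):
        yield f"{parts.scheme}://{parts.netloc}/" + "/".join(segments[:n])


def check_status(url, timeout=5.0):
    """Return (status, Location header) for url without reading the body.

    getresponse() parses only the status line and headers; the body is
    never read here, and redirects are not followed. Query strings are
    ignored in this sketch.
    """
    parts = urlsplit(url)
    conn = http.client.HTTPConnection(parts.hostname, parts.port or 80, timeout=timeout)
    try:
        conn.request("GET", parts.path or "/")
        resp = conn.getresponse()
        return resp.status, resp.getheader("Location")
    finally:
        # Closing drops the connection; body bytes already sent are discarded.
        conn.close()


class DemoSite(BaseHTTPRequestHandler):
    """Hypothetical local site: one article, one redirect, everything else 404."""

    ARTICLE = "/2008/POLITICS/11/06/middle.east.peace.deal/index.html"

    def do_GET(self):
        if self.path == self.ARTICLE:
            self._send(200, b"<html>article</html>")
        elif self.path == "/old":
            self.send_response(301)
            self.send_header("Location", "/")
            self.send_header("Content-Length", "0")
            self.end_headers()
        else:
            # A large HTML error page that check_status never reads.
            self._send(404, b"<html>" + b"x" * 100_000 + b"</html>")

    def _send(self, code, body):
        self.send_response(code)
        self.send_header("Content-Type", "text/html")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        try:
            self.wfile.write(body)
        except ConnectionError:
            pass  # the client hung up after reading the headers

    def log_message(self, *args):
        pass  # keep the demo quiet


def demo():
    """Check the article, each parent path, and a redirect; return {path: (status, location)}."""
    server = HTTPServer(("127.0.0.1", 0), DemoSite)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    base = f"http://127.0.0.1:{server.server_address[1]}"
    try:
        urls = list(parent_urls(base + DemoSite.ARTICLE)) + [base + "/old"]
        return {urlsplit(u).path: check_status(u) for u in urls}
    finally:
        server.shutdown()
        server.server_close()


if __name__ == "__main__":
    for path, (status, location) in demo().items():
        print(status, path, location or "")
```

Run as a script, it prints 200 for the article, 404 for each parent path, and 301 (with its `Location`) for `/old`, without ever reading the 404 page bodies.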
When a crawler requests a missing page, it first receives the headers of the response, from which it can read the response code, content type, and other information. After checking the response code, the crawler can stop downloading the content, which reduces the bandwidth load on the server and the time the crawler spends on missing content.

If a redirect response is sent instead, the crawler must make another request to the server, and it will download the entire content of a page that does not correspond to the source URL. The crawler will see a 200 response code on the new URI, download all of the content, and increase the time and bandwidth spent crawling that domain.

But I understand what you're saying, Brendon, about it being a design choice. I'm just not sure that traversing the URL path improves usability for visitors to the website. Once they step up to an invalid URI, they will be redirected somewhere else, which stops the traversal. Here's CNN as an example, stepping up one path segment at a time:

http://edition.cnn.com/2008/POLITICS/11/06/middle.east.peace.deal/index.html
http://edition.cnn.com/2008/POLITICS/11/06/middle.east.peace.deal
http://edition.cnn.com/2008/POLITICS/11/06
http://edition.cnn.com/2008/POLITICS/11
http://edition.cnn.com/2008/POLITICS
http://edition.cnn.com/2008

Although these links produce a 404 response and display an HTML page, a web crawler will not download that content once it has rejected the response code in the header of the HTTP response. So the bandwidth load placed on the server is little more than the headers for each bad URI. This makes your domain crawler-friendly, although a truly friendly crawler would not request phantom URIs in the first place.

You received this message because you are subscribed to the Google Groups "CakePHP" group.
