Most web crawlers won't download the body of a 404 response, because
of the way servers send HTTP responses.

When a crawler requests a page that is missing, it first receives the
response headers, from which it can read the status code,
content type, and other information. The crawler can then stop
downloading the content as soon as it has checked the status code,
which reduces the bandwidth load on the server and the time the
crawler spends on missing content. If a redirect response is sent
instead, the crawler must make another request to the server and
will download the entire content of a page that does not reflect the
source URL. The crawler will see a 200 response code on the new
URI, download all the content, and increase the time and bandwidth
spent crawling that domain.
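
To make the header check concrete, here is a minimal sketch in Python
of how a crawler might do it. The function name fetch_if_ok is made up
for illustration and is not taken from any real crawler. It assumes the
crawler only wants pages that answer with a 200 and treats every other
status as a rejection. Python's http.client does not follow redirects
on its own, so a 301 or 302 is rejected here as well.

```python
import http.client
from urllib.parse import urlsplit


def fetch_if_ok(url, timeout=10):
    """Return the body for a 200 response, or None for any other status.

    Hypothetical helper for illustration only.
    """
    parts = urlsplit(url)
    conn_cls = (http.client.HTTPSConnection if parts.scheme == "https"
                else http.client.HTTPConnection)
    conn = conn_cls(parts.netloc, timeout=timeout)

    # Rebuild the request target from the path and optional query string.
    target = parts.path or "/"
    if parts.query:
        target += "?" + parts.query

    try:
        conn.request("GET", target)
        # getresponse() returns once the status line and headers have been
        # parsed; the application has not read the body at this point.
        response = conn.getresponse()
        if response.status != 200:
            # Reject on the status code alone. The body is never read, and
            # the connection is closed in the finally block below.
            return None
        return response.read()
    finally:
        conn.close()
```

A call such as fetch_if_ok("http://example.com/missing-page") would
return None without ever reading the error page. How much of the body
the server has already put on the wire by then depends on the server
and the network, so treat the saving as approximate.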

But I understand what you're saying, Brendon, about it being a design
choice. I'm just not sure that traversing the URL path improves the
usability of the website for the visitor. Once they step up to
an invalid URI they will be redirected somewhere else, which
stops the traversal of the URL.

Here's CNN as an example.

http://edition.cnn.com/2008/POLITICS/11/06/middle.east.peace.deal/index.html
http://edition.cnn.com/2008/POLITICS/11/06/middle.east.peace.deal
http://edition.cnn.com/2008/POLITICS/11/06
http://edition.cnn.com/2008/POLITICS/11
http://edition.cnn.com/2008/POLITICS
http://edition.cnn.com/2008
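
The list above is what you get by dropping one path segment at a time.
As a rough sketch of that traversal in Python (parent_urls is a
hypothetical name, and stopping before the site root is my assumption,
made to match the list):

```python
from urllib.parse import urlsplit, urlunsplit


def parent_urls(url):
    """Return the URLs obtained by removing trailing path segments.

    Hypothetical helper; stops before reaching the site root.
    """
    parts = urlsplit(url)
    segments = [s for s in parts.path.split("/") if s]
    parents = []
    while len(segments) > 1:
        segments.pop()  # step up one level in the URL path
        path = "/" + "/".join(segments)
        # Drop any query string and fragment from the parent URLs.
        parents.append(urlunsplit((parts.scheme, parts.netloc, path, "", "")))
    return parents
```

Calling it on the first CNN URL returns the other five URLs in the
same order.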

These links will produce a 404 response and display HTML, but a web
crawler will not download the content once it has rejected the
status code in the HTTP response headers. So the most bandwidth
load placed on the server is little more than the response headers
for each bad URI.

This makes your domain crawler-friendly, but a friendly crawler would
not request phantom URIs in the first place.