Hi,

I have a simple system that runs fine except when Google's web crawlers 
request some old, long URLs which used to be on the site. I have 
just migrated an old, basically static website to web2py, to provide a base 
for some more interesting features in the future. However, Google still knows 
the old URLs and tries to crawl them, at which point the system dies by going 
into a tight loop.

It is quite repeatable on my development machine, now that I know which URLs 
trigger it. An example is:

http://localhost:63123/aaaaaaaaaa/Abbbbbbbb%20Lccc%20-%20Pddddddd%20GA%20Deeeeee%20(ffff%20ffff%20A).pdf

If I remove the two brackets in the final part of the URL (the PDF file 
name), so that the URL becomes

http://localhost:63123/aaaaaaaaaa/Abbbbbbbb%20Lccc%20-%20Pddddddd%20GA%20Deeeeee%20ffff%20ffff%20A.pdf

then I get "invalid function (default/aaaaaaaaaa)" as I would expect.
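
To make the difference between the two requests explicit, here is a minimal sketch that percent-decodes the last path segment of each URL and compares them. It uses only the Python standard library; the helper name `decoded_filename` is made up for illustration and is not part of web2py.

```python
from urllib.parse import urlsplit, unquote

# The two URLs from this report: the first hangs the server, the second does not.
FAILING = ("http://localhost:63123/aaaaaaaaaa/"
           "Abbbbbbbb%20Lccc%20-%20Pddddddd%20GA%20Deeeeee%20(ffff%20ffff%20A).pdf")
WORKING = ("http://localhost:63123/aaaaaaaaaa/"
           "Abbbbbbbb%20Lccc%20-%20Pddddddd%20GA%20Deeeeee%20ffff%20ffff%20A.pdf")


def decoded_filename(url):
    """Return the percent-decoded last path segment of url (hypothetical helper)."""
    path = urlsplit(url).path
    return unquote(path.rsplit("/", 1)[-1])


failing_name = decoded_filename(FAILING)
working_name = decoded_filename(WORKING)

# Characters that occur in the failing name but not in the working one.
extra_chars = set(failing_name) - set(working_name)

print(failing_name)
print(working_name)
print(sorted(extra_chars))
```

The only characters present in the failing name and absent from the working one are the two brackets, which is why I suspect them.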

I believe the brackets are invalid characters and should not be in the URI 
(or should be percent-encoded), but the system should be robust against 
invalid characters being sent to the server.

I am running web2py 2.2.1 and am wondering how to debug this further. If 
I turn on DEBUG logging for root, rewrite, web2py and rocket, I get this 
output:

....
2012-11-13 13:05:57,934 - Rocket.Errors.ThreadPool - DEBUG - Examining ThreadPool. 10 threads and 0 Q'd conxions
2012-11-13 13:05:58,936 - Rocket.Errors.ThreadPool - DEBUG - Examining ThreadPool. 10 threads and 0 Q'd conxions
2012-11-13 13:05:59,600 - Rocket.Errors.Thread-3 - DEBUG - Received a connection.
2012-11-13 13:05:59,600 - Rocket.Errors.Thread-3 - DEBUG - Serving a request
2012-11-13 13:05:59,601 - Rocket.Errors.Thread-3 - DEBUG - Getting sock_file
select application=tgaa
route: controller=default
route: function.ext=aaaaaaaaaa.html

At this point the system is in its tight loop and produces no more logging output.
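
For anyone trying to reproduce the logging setup, here is a rough sketch of turning on DEBUG for those loggers with Python's standard `logging` module. The logger names `web2py`, `web2py.rewrite` and `Rocket` are assumptions (only the `Rocket.Errors` prefix is visible in the output above), and `enable_debug` is a hypothetical helper, not a web2py function. In a real install the same thing would normally be done in the logging configuration file instead.

```python
import logging

# Assumed logger names; "" is the root logger. Only "Rocket.Errors.*" is
# confirmed by the log output in this report.
LOGGER_NAMES = ["", "web2py", "web2py.rewrite", "Rocket"]

# Format chosen to match the lines shown in the log output above.
LOG_FORMAT = "%(asctime)s - %(name)s - %(levelname)s - %(message)s"


def enable_debug(names=LOGGER_NAMES):
    """Attach a stream handler to the root logger and set DEBUG on each name."""
    handler = logging.StreamHandler()
    handler.setFormatter(logging.Formatter(LOG_FORMAT))
    logging.getLogger().addHandler(handler)
    for name in names:
        logging.getLogger(name).setLevel(logging.DEBUG)


enable_debug()
# Child loggers such as "Rocket.Errors.Thread-3" inherit the DEBUG level.
logging.getLogger("Rocket.Errors.Thread-3").debug("Serving a request")
```

Because levels are inherited, setting DEBUG on the parent `Rocket` logger is enough to see messages from the per-thread child loggers.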

routes.py contains just:

logging = 'print'
routers = dict(
    BASE  = dict(
        default_application = 'tgaa',
        map_hyphen = 'True',
    ),
)

Thanks for any assistance.
