On 21-08-2010 14:46, mdipierro wrote:

> what do you find that is strange?
This is the result with the last letter of each link removed, so every link
should give an error. Instead the 2 methods disagree with each other,
and some links produce 200 even though they are definitely wrong:
404 500 http://127.0.0.1:8000/welcome/default/user/logi
404 500 http://127.0.0.1:8000/welcome/default/user/registe
404 500 http://127.0.0.1:8000/welcome/default/user/request_reset_passwor
200 500 http://127.0.0.1:8000/welcome/default
400 500 http://127.0.0.1:8000/welcome/default/inde
200 500 http://127.0.0.1:8000/admin/default/design/welcom
200 500 http://127.0.0.1:8000/admin/default/edit/welcome/controllers/default.p
200 500 http://127.0.0.1:8000/admin/default/edit/welcome/views/default/index.htm
200 500 http://127.0.0.1:8000/admin/default/edit/welcome/views/layout.htm
200 500 http://127.0.0.1:8000/admin/default/edit/welcome/static/base.cs
200 500 http://127.0.0.1:8000/admin/default/edit/welcome/models/db.p
200 500 http://127.0.0.1:8000/admin/default/edit/welcome/models/menu.p
400 500 http://127.0.0.1:8000/welcome/appadmin/inde
200 500 http://127.0.0.1:8000/admin/default/inde
400 400 http://127.0.0.1:8000/examples/default/inde
200 -1 http://web2py.co
400 400 http://web2py.com/boo
400 500 http://127.0.0.1:8000/welcome/default/inde
200 500 http://127.0.0.1:8000/welcome/default
200 500 http://127.0.0.1:8000/admin/default/peek/welcome/controllers/default.p
200 500 http://127.0.0.1:8000/admin/default/peek/welcome/views/default/index.htm
200 -1 http://www.web2py.co

This is the normal result
200 500 http://127.0.0.1:8000/welcome/default/user/login
200 500 http://127.0.0.1:8000/welcome/default/user/register
200 500 http://127.0.0.1:8000/welcome/default/user/request_reset_password
200 500 http://127.0.0.1:8000/welcome/default
200 500 http://127.0.0.1:8000/welcome/default/index
200 500 http://127.0.0.1:8000/admin/default/design/welcome
200 500 http://127.0.0.1:8000/admin/default/edit/welcome/controllers/default.py
200 500 http://127.0.0.1:8000/admin/default/edit/welcome/views/default/index.html
200 500 http://127.0.0.1:8000/admin/default/edit/welcome/views/layout.html
200 500 http://127.0.0.1:8000/admin/default/edit/welcome/static/base.css
200 500 http://127.0.0.1:8000/admin/default/edit/welcome/models/db.py
200 500 http://127.0.0.1:8000/admin/default/edit/welcome/models/menu.py
200 500 http://127.0.0.1:8000/welcome/appadmin/index
200 500 http://127.0.0.1:8000/admin/default/index
200 200 http://127.0.0.1:8000/examples/default/index
200 200 http://web2py.com
200 500 http://web2py.com/book
200 500 http://127.0.0.1:8000/welcome/default/index
400 500 http://127.0.0.1:8000/welcome/default/index#
200 500 http://127.0.0.1:8000/admin/default/peek/welcome/controllers/default.py
200 500 http://127.0.0.1:8000/admin/default/peek/welcome/views/default/index.html
200 200 http://www.web2py.com

So when is a URL valid?

thanks,
Stef
> On Aug 21, 7:32 am, Stef Mientki <stef.mien...@gmail.com> wrote:
>>> Graphical representation of links or pages that don't get linked to.
>> I tried to test the links (with 2 algorithms, code below) in a generated
>> webpage, but the results I get are very weird.
>> Perhaps one of you knows a better way?
>>
>> cheers,
>> Stef
>>
>> from BeautifulSoup import BeautifulSoup
>> from urllib        import urlopen
>> from httplib       import HTTP
>> from urlparse      import urlparse
>>
>> def Check_URL_1 ( URL ) :
>>   try:
>>     fh = urlopen ( URL )
>>     return fh.code == 200
>>   except :
>>     return False
>>
>> def Check_URL_2 ( URL ) :
>>   p = urlparse ( URL )
>>   h = HTTP ( p[1] )
>>   h.putrequest ( 'HEAD', p[2] or '/' )   # a bare host has an empty path
>>   h.endheaders()
>>   return h.getreply()[0] == 200
>>
>> def Verify_Links ( URL ) :
>>   Parts   = URL.split('/')
>>   Site    = '/'.join ( Parts [:3] )
>>   Current = '/'.join ( Parts [:-1] )
>>
>>   fh = urlopen ( URL )
>>   lines = fh.read ()
>>   fh.close()
>>
>>   Soup = BeautifulSoup ( lines )
>>   hrefs = Soup.findAll ( 'a', href=True )   # skip anchors without an href
>>
>>   for href in hrefs :
>>     href = href [ 'href' ] #[:-1]     ## <== remove "#" to generate all errors
>>
>>     if href.startswith ( '/' ) :
>>       href = Site + href
>>     elif href.startswith ('#' ) :
>>       href = URL + href
>>     elif href.startswith ( 'http' ) :
>>       pass
>>     else :
>>       href = Current + '/' + href   # Current has no trailing slash
>>
>>     try:
>>       fh = urlopen ( href )   # only urlopen is imported, not the urllib module
>>     except :
>>       pass
>>     print Check_URL_1 ( href ), Check_URL_2 ( href ), href
>>
>> URL = 'http://127.0.0.1:8000/welcome/default/index'
>> fh = Verify_Links ( URL )
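
One possible way to make the two checks easier to compare is to resolve each
href with the standard library instead of by string prefix, and to report the
raw status number from a HEAD request. This is only a sketch under some
assumptions: it is written for Python 3 (http.client and urllib.parse, where
the code above uses the Python 2 modules), the helper names resolve_link,
request_target and head_status are made up for illustration, and returning -1
on a connection error is a convention chosen here.

```python
# Python 3 sketch; helper names are hypothetical, not part of web2py.
import http.client
from urllib.parse import urljoin, urlsplit


def resolve_link(page_url, href):
    """Turn an href found on page_url into an absolute URL."""
    # urljoin handles '/absolute', 'relative', '#fragment' and full URLs.
    return urljoin(page_url, href)


def request_target(url):
    """Return the path (plus query string) to send in the request line."""
    parts = urlsplit(url)
    target = parts.path or '/'      # a bare host has an empty path
    if parts.query:
        target += '?' + parts.query
    return target                   # the '#fragment' is never sent to the server


def head_status(url, timeout=5):
    """Send a HEAD request; return the status number, or -1 on failure."""
    parts = urlsplit(url)
    if parts.scheme == 'https':
        conn = http.client.HTTPSConnection(parts.netloc, timeout=timeout)
    else:
        conn = http.client.HTTPConnection(parts.netloc, timeout=timeout)
    try:
        conn.request('HEAD', request_target(url))
        return conn.getresponse().status
    except (OSError, http.client.HTTPException):
        return -1                   # assumption: -1 marks "could not connect"
    finally:
        conn.close()
```

Printing head_status ( resolve_link ( URL, href ) ) for every link would then
show the number the server actually sent, rather than True or False.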
