Re: Problem with Python's "robots.txt" file parser in module robotparser

John Nagle Thu, 12 Jul 2007 21:06:15 -0700

Nikita the Spider wrote:
> In article <[EMAIL PROTECTED]>,
>  John Nagle <[EMAIL PROTECTED]> wrote:
> 
> 
>>Nikita the Spider wrote:
>>
>>
>>>Hi John,
>>>Are you sure you're not confusing your sites? The robots.txt file at 
>>>www.ibm.com contains the double slashed path. The robots.txt file at 
>>>ibm.com is different and contains this, which would explain why you 
>>>think all URLs are denied:
>>>User-agent: *
>>>Disallow: /
>>>
>>
>>    Ah, that's it.  The problem is that "ibm.com" redirects to
>>"http://www.ibm.com", but "ibm.com/robots.txt" does not
>>redirect.  For comparison, try "microsoft.com/robots.txt",
>>which does redirect.
> 
> 
> Strange thing for them to do, isn't it? Especially with two such 
> different robots.txt files.
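The ibm.com rules quoted above can be checked offline with the parser this thread is about. This is a minimal sketch, assuming Python 3, where the old `robotparser` module lives at `urllib.robotparser`. The helper name `can_fetch_with` and the `PERMISSIVE_RULES` file are hypothetical, added only for contrast; they are not the contents of www.ibm.com/robots.txt, which the thread does not reproduce.

```python
from urllib.robotparser import RobotFileParser

# Rules quoted in the thread from ibm.com/robots.txt (the non-www host).
IBM_COM_RULES = """\
User-agent: *
Disallow: /
"""

# Hypothetical permissive file for contrast: an empty Disallow allows everything.
PERMISSIVE_RULES = """\
User-agent: *
Disallow:
"""


def can_fetch_with(rules: str, url: str, agent: str = "*") -> bool:
    """Parse robots.txt text without any network access and ask
    whether `agent` may fetch `url` under those rules."""
    parser = RobotFileParser()
    parser.parse(rules.splitlines())
    return parser.can_fetch(agent, url)


if __name__ == "__main__":
    for url in ("http://ibm.com/", "http://ibm.com/products/index.html"):
        print(url,
              can_fetch_with(IBM_COM_RULES, url),
              can_fetch_with(PERMISSIVE_RULES, url))
```

Because "Disallow: /" is a prefix of every path, every URL on the host is refused, which matches the "all URLs are denied" symptom. Note that the parser only sees whatever text it is given; which host those rules belong to after a redirect is decided by the code that fetches the file.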


   I asked over at Webmaster World, and the people there recommend against
using redirects on robots.txt files, because they doubt that all of the
major search engines handle them correctly.  Does a redirect for 
"foo.com/robots.txt" mean that the robots.txt file applies to the domain
being redirected from, or to the domain being redirected to?

                                        John Nagle
-- 
http://mail.python.org/mailman/listinfo/python-list