[Python-Dev] 2 modifications in robotparser.py

2008-10-12 Thread Taskinoor Hasan
I'm a novice Python programmer. I have made two changes to robotparser.py. I
apologize if this is the wrong list for this mail.

1. Some sites (notably Wikipedia) return 403 when the default User-Agent
is used, so I have changed the code to use urllib2 and added a
set_user_agent method. This one is simple.

2. This problem is slightly more complicated. Please check the robots.txt
file from MathWorld:

   http://mathworld.wolfram.com/robots.txt

It contains two "User-Agent: *" lines.

From http://www.robotstxt.org/norobots-rfc.txt:

These name tokens are used in User-agent lines in /robots.txt to
identify to which specific robots the record applies. The robot
must obey the first record in /robots.txt that contains a User-
Agent line whose value contains the name token of the robot as a
substring. The name comparisons are case-insensitive. If no such
record exists, it should obey the first record with a User-agent
line with a "*" value, if present. If no record satisfied either
condition, or no records are present at all, access is unlimited.


But it seems that our robotparser is obeying the second one. The problem
occurs because robotparser assumes that no robots.txt will contain two "*"
User-agent records. A file should not have two such records, but in reality
many sites may have them.
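
The selection rule quoted above can be sketched in a few lines. This is only an illustration of the rule, not the robotparser implementation: the function name select_record and the (user-agent values, rules) tuple layout are assumptions made for the example.

```python
def select_record(records, robot_name):
    """Pick the record a robot must obey.

    `records` is a list of (user_agent_values, rules) tuples in file order.
    This layout is hypothetical; it is not how robotparser stores entries.
    """
    name = robot_name.lower()
    default = None
    for agents, rules in records:
        for agent in agents:
            if agent == "*":
                # Remember only the FIRST "*" record; later ones are ignored.
                if default is None:
                    default = (agents, rules)
            elif name in agent.lower():
                # First record whose User-agent value contains the name token
                # as a case-insensitive substring wins immediately.
                return (agents, rules)
    # No specific match: fall back to the first "*" record, or None
    # (meaning access is unlimited).
    return default


records = [
    (["*"], ["Disallow: /first"]),
    (["*"], ["Disallow: /second"]),
    (["FooBot"], ["Disallow: /foo"]),
]
print(select_record(records, "foobot"))
print(select_record(records, "somebot"))
```

With two "*" records in the file, an unnamed robot gets the first one, which is the behaviour the draft asks for.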

So I have changed the code as follows:

def _add_entry(self, entry):
    if "*" in entry.useragents:
        # the default entry is considered last
        if self.default_entry is None:
            # keep only the first "*" record; later ones are ignored
            self.default_entry = entry
    else:
        self.entries.append(entry)

and at the end of the parse(self, lines) method:

    if state == 2:
        # self.entries.append(entry)
        self._add_entry(entry)

The lines marked in red are the ones I added.
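
The first change (a configurable User-Agent) is not shown in the mail. A minimal sketch of the idea follows, assuming Python 3's urllib.request in place of the urllib2 module mentioned above. The helper names build_robots_request and fetch_robots_lines are hypothetical and are not taken from the attached file.

```python
import urllib.request


def build_robots_request(robots_url, user_agent):
    """Build a request for robots.txt that carries a custom User-Agent.

    Hypothetical helper; the attached patch may be structured differently.
    """
    return urllib.request.Request(robots_url, headers={"User-Agent": user_agent})


def fetch_robots_lines(robots_url, user_agent):
    """Download robots.txt and return its lines (performs network I/O)."""
    request = build_robots_request(robots_url, user_agent)
    with urllib.request.urlopen(request) as response:
        text = response.read().decode("utf-8", errors="replace")
    return text.splitlines()
```

The returned lines could then be passed to the parser's parse() method instead of letting the parser fetch the file with the default User-Agent.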

As I'm very new to Python, I would really appreciate comments from the
experts on this matter.

I apologize again if I'm wasting your time.

Thanks in advance,
Taskinoor Hasan Sajid


customrobotparser.py
Description: Binary data
___
Python-Dev mailing list
[email protected]
http://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


Re: [Python-Dev] 2 modifications in robotparser.py

2008-10-12 Thread skip

Taskinoor> im a novice python programmer. i have made two changes to
Taskinoor> robotparser.py. i apologize if this is the wrong list to post
Taskinoor> this mail.

Thanks for the message.  Can you please file a bug report at
http://bugs.python.org though?

Skip


Re: [Python-Dev] 2 modifications in robotparser.py

2008-10-12 Thread Taskinoor Hasan
On Sun, Oct 12, 2008 at 7:26 PM, <[EMAIL PROTECTED]> wrote:

>
>Taskinoor> im a novice python programmer. i have made two changes to
>Taskinoor> robotparser.py. i apologize if this is the wrong list to post
>Taskinoor> this mail.
>
> Thanks for the message.  Can you please file a bug report at
> http://bugs.python.org though?


Thanks. Issue created; here is the link: http://bugs.python.org/issue4108
I have already fixed this problem, but I am not 100% sure about the status.


>
>
> Skip
>


Re: [Python-Dev] for __future__ import planning

2008-10-12 Thread Lennart Regebro
On Sat, Oct 4, 2008 at 00:56, Barry Warsaw <[EMAIL PROTECTED]> wrote:
> I think 2.7 should continue along the path of convergence toward 3.x.  The
> vision some of us talked about at Pycon was that at some point down the
> line, maybe there's no difference between "python2.9 -3" and "python3.3 -2".

I like that. Do we know what the next "hurdle" would be? The testing I
have done seems to indicate that one major area is handling of binary
file data that may or may not be binary. Is a real bytes type (and not
just an alternate spelling for str) a possibility? It may be that this
isn't a problem in practice, I don't know yet. :)

-- 
Lennart Regebro: Zope and Plone consulting.
http://www.colliberty.com/
+33 661 58 14 64


Re: [Python-Dev] [Python-checkins] r66863 - python/trunk/Modules/posixmodule.c

2008-10-12 Thread Martin v. Löwis
> (2.5.3 reminder: there are lists of commits in sandbox/py2.5.3 to be
> considered.  I've seen no reactions on python-dev or modifications to
> those files, so I don't think anyone else is looking at them.  Is
> everyone waiting for the weekend, maybe?)

I may have said that before: I don't have any plans to look through
change lists myself. If people want certain changes considered, they
will tell us; if nobody is interested in a certain backport, there is
no need to backport.

Regards,
Martin