[issue16932] urlparse fails at parsing "www.python.org:80/"

Georg Brandl Fri, 11 Jan 2013 09:54:25 -0800

Georg Brandl added the comment:

Hmm, you're right.  The behavior has been like this at least since Python 2.5:


Python 2.5.4 (r254:67916, Dec 16 2012, 20:33:12) 
[GCC 4.6.3] on linux3
Type "help", "copyright", "credits" or "license" for more information.
>>> from urlparse import urlparse
>>> urlparse('www.cwi.nl:80/%7Eguido/Python.html')
('www.cwi.nl', '', '80/%7Eguido/Python.html', '', '', '')

The docs refer to RFC 1808.  From a quick glance at the BNF in section 2.2, RFC 
1808 allows dots in the scheme, but also allows ":" in the path.  So there 
seems to be a parsing ambiguity, but see section 2.4.2:

   If the parse string contains a colon ":" after the first character
   and before any characters not allowed as part of a scheme name (i.e.,
   any not an alphanumeric, plus "+", period ".", or hyphen "-"), the
   <scheme> of the URL is the substring of characters up to but not
   including the first colon.  These characters and the colon are then
   removed from the parse string before continuing.

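Under that rule, everything before the first colon in the example
('www.cwi.nl') consists only of scheme characters, so it is taken as the
scheme. That would indicate that the implementation is correct and the
documentation should be fixed. Senthil?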
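
To make the section 2.4.2 rule concrete, here is a minimal sketch of just the scheme-splitting step, written for Python 3. It is not the actual urlparse code; the names split_scheme and SCHEME_CHARS are made up for illustration, and the remaining RFC 1808 parsing steps (net_loc, query, params, fragment) are left out.

```python
import string

# Characters RFC 1808 section 2.4.2 allows in a scheme name:
# alphanumerics plus "+", "." and "-".
# (SCHEME_CHARS and split_scheme are illustrative names, not urlparse API.)
SCHEME_CHARS = set(string.ascii_letters + string.digits + "+.-")


def split_scheme(url):
    """Return (scheme, rest) according to the quoted 2.4.2 rule only."""
    colon = url.find(":")
    # The colon must come after the first character, and no character
    # before it may fall outside the allowed scheme characters.
    if colon > 0 and all(c in SCHEME_CHARS for c in url[:colon]):
        return url[:colon], url[colon + 1:]
    # Otherwise there is no scheme and the string is left untouched.
    return "", url


print(split_scheme("www.cwi.nl:80/%7Eguido/Python.html"))
# ('www.cwi.nl', '80/%7Eguido/Python.html')
print(split_scheme("//www.cwi.nl:80/%7Eguido/Python.html"))
# ('', '//www.cwi.nl:80/%7Eguido/Python.html')
```

The second call shows why a leading "//" changes the result: "/" is not a scheme character, so no scheme is split off and the host can be recognized as a network location in the later steps.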

----------
keywords: +buildbot -patch
status: closed -> open

_______________________________________
Python tracker <rep...@bugs.python.org>
<http://bugs.python.org/issue16932>
_______________________________________