[ python-Bugs-1722348 ] urlparse.urlunparse forms file urls incorrectly

SourceForge.net Wed, 23 May 2007 14:27:42 -0700

Bugs item #1722348, was opened at 2007-05-21 04:05
Message generated for change (Comment added) made by orsenthil
You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=105470&aid=1722348&group_id=5470


Category: Python Library
Group: None
Status: Open
Resolution: None
Priority: 5
Private: No
Submitted By: Thomas Folz-Donahue (eigenlambda)
Assigned to: Nobody/Anonymous (nobody)
Summary: urlparse.urlunparse forms file urls incorrectly

Initial Comment:
This is a conversation with the current Python interpreter.

>>> import urlparse
>>> urlparse.urlparse(urlparse.urlunparse(urlparse.urlparse("file:////usr/bin/python")))
('file', 'usr', '/bin/python', '', '', '')

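As you can see, the round trip does not preserve the URL: the leading part of
the path '//usr/bin/python' comes back as the netloc 'usr', and the path
becomes '/bin/python'.  The problem is in the urlunsplit function: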

def urlunsplit((scheme, netloc, url, query, fragment)):
    if netloc or (scheme and scheme in uses_netloc and url[:2] != '//'):
        if url and url[:1] != '/': url = '/' + url
        url = '//' + (netloc or '') + url
    if scheme:
        url = scheme + ':' + url
    if query:
        url = url + '?' + query
    if fragment:
        url = url + '#' + fragment
    return url

RFC 1808 (see http://www.ietf.org/rfc/rfc1808.txt ) specifies that a URL shall 
have the following syntax:
<scheme>://<net_loc>/<path>;<params>?<query>#<fragment>

The problem with the current version of urlunsplit is that it checks whether 
the 'url' section already begins with two slashes before deciding whether to 
prepend '//' and the netloc.  This is incorrect because (1) RFC 1808 clearly 
specifies at least three slashes between the end of the scheme portion and the 
beginning of the path portion and (2) when the netloc is empty, the leading 
slashes of a path portion are read back as the netloc delimiter on the next 
parse, even though the path may require them.  Removing the url[:2] != '//' 
test causes urlunsplit to behave correctly when dealing with URLs like 
file:////usr/bin/python .
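
To make the effect of that test concrete, here is a rough Python 3 port of the
function quoted above, with the test made optional.  This is only a sketch: the
names unsplit and keep_slash_test are made up for illustration, and USES_NETLOC
is a shortened stand-in for the module's uses_netloc list.

```python
# Shortened stand-in for urlparse.uses_netloc (assumption: the real list is
# longer, but it does include 'file').
USES_NETLOC = frozenset(["file", "ftp", "http", "https"])


def unsplit(parts, keep_slash_test=True):
    """Port of the quoted urlunsplit; keep_slash_test toggles url[:2] != '//'."""
    scheme, netloc, url, query, fragment = parts
    needs_netloc = bool(scheme) and scheme in USES_NETLOC
    if keep_slash_test:
        # Behaviour of the quoted code: skip '//' if url already starts with it.
        needs_netloc = needs_netloc and url[:2] != "//"
    if netloc or needs_netloc:
        if url and url[:1] != "/":
            url = "/" + url
        url = "//" + (netloc or "") + url
    if scheme:
        url = scheme + ":" + url
    if query:
        url = url + "?" + query
    if fragment:
        url = url + "#" + fragment
    return url


parts = ("file", "", "//usr/bin/python", "", "")
print(unsplit(parts, keep_slash_test=True))   # file://usr/bin/python
print(unsplit(parts, keep_slash_test=False))  # file:////usr/bin/python
```

With the test in place, the two leading slashes of the path end up directly
after 'file:', so the next parse treats 'usr' as the netloc.  Without it, the
empty netloc gets its own '//' and the path is kept whole.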


----------------------------------------------------------------------

Comment By: O.R.Senthil Kumaran (orsenthil)
Date: 2007-05-24 02:57

Message:
Logged In: YES 
user_id=942711
Originator: NO

Hi Thomas,
I verified the bug with Python 2.5 and verified the fix as well. It works
fine.

>>> urlparse(urlunparse(urlparse('file:////home/ors')))
('file', '', '//home/ors', '', '', '')
>>> urlparse(urlunparse(urlparse('file://///home/ors')))
('file', '', '///home/ors', '', '', '')
>>> urlparse(urlunparse(urlparse('file://////home/ors')))
('file', '', '////home/ors', '', '', '')
>>> urlparse(urlunparse(urlparse(urlunparse(urlparse('file://////home/ors')))))
('file', '', '////home/ors', '', '', '')
>>>
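
The same kind of round-trip check can be scripted.  The sketch below is not the
session above: it assumes Python 3, where urllib.parse.urlsplit is the successor
of urlparse.urlsplit, and it uses a hand-written unsplit_without_test (a
hypothetical name) that ports the quoted urlunsplit with the url[:2] != '//'
test dropped, rather than the standard library's urlunsplit.  USES_NETLOC is a
shortened stand-in for the module's uses_netloc list.

```python
from urllib.parse import urlsplit

# Shortened stand-in for the module's uses_netloc list (assumption).
USES_NETLOC = frozenset(["file", "ftp", "http", "https"])


def unsplit_without_test(parts):
    """Port of the quoted urlunsplit with the url[:2] != '//' test removed."""
    scheme, netloc, url, query, fragment = parts
    if netloc or (scheme and scheme in USES_NETLOC):
        if url and url[:1] != "/":
            url = "/" + url
        url = "//" + (netloc or "") + url
    if scheme:
        url = scheme + ":" + url
    if query:
        url = url + "?" + query
    if fragment:
        url = url + "#" + fragment
    return url


def round_trips(url):
    """True if splitting and then unsplitting gives back the same string."""
    return unsplit_without_test(tuple(urlsplit(url))) == url


for candidate in ("file:////home/ors",
                  "file://///home/ors",
                  "file://////home/ors"):
    print(candidate, round_trips(candidate))
```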



----------------------------------------------------------------------

Comment By: Thomas Folz-Donahue (eigenlambda)
Date: 2007-05-21 04:42

Message:
Logged In: YES 
user_id=1797315
Originator: YES

There are some other issues with the urlparse module.  Several constant lists
defined at the beginning of the module should be sets, because they are only
used for testing whether certain strings are in them.  Also, urlunsplit() uses
the + operator far too much, creating strings that are immediately thrown
away.  IMO, the alternative is actually more readable.  Attaching a patch
(diff -u urlparse.py urlparse.py.new > urlparse.diff).
File Added: urlparse.diff
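
For readers without the attachment, the two suggestions might look roughly like
the sketch below.  This is not the contents of urlparse.diff: the name
urlunsplit_joined and the shortened USES_NETLOC set are made up for
illustration, the code is written for Python 3, and it assumes the
url[:2] != '//' test is dropped as proposed in the initial comment.

```python
# A frozenset instead of a list: only membership is ever tested.
# Shortened stand-in for the module's uses_netloc (assumption).
USES_NETLOC = frozenset(["file", "ftp", "http", "https"])


def urlunsplit_joined(parts):
    """Collect the pieces in a list and join once instead of repeated '+'."""
    scheme, netloc, url, query, fragment = parts
    pieces = []
    if scheme:
        pieces.extend([scheme, ":"])
    if netloc or (scheme and scheme in USES_NETLOC):
        pieces.extend(["//", netloc])
        # A non-empty path after a netloc must start with a slash.
        if url and not url.startswith("/"):
            pieces.append("/")
    pieces.append(url)
    if query:
        pieces.extend(["?", query])
    if fragment:
        pieces.extend(["#", fragment])
    return "".join(pieces)


print(urlunsplit_joined(("http", "example.org", "a/b", "q=1", "top")))
print(urlunsplit_joined(("file", "", "//usr/bin/python", "", "")))
```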

----------------------------------------------------------------------
