Bugs item #1457264, was opened at 2006-03-23 20:49
Message generated for change (Tracker Item Submitted) made by Item Submitter
You can respond by visiting:
https://sourceforge.net/tracker/?func=detail&atid=105470&aid=1457264&group_id=5470
Category: Python Library
Group: Python 2.3
Status: Open
Resolution: None
Priority: 5
Submitted By: Steve (onlynone)
Assigned to: Nobody/Anonymous (nobody)
Summary: urllib.splithost parses incorrectly

Initial Comment:
urllib.splithost(url) requires the url passed in to be of the form '//host[:port]/path'. However, I've run across some urls of the form '//host[:port]?querystring'. For these, splithost returns everything as the host and nothing as the path.

Section 3.2 of RFC 2396 (Uniform Resource Identifiers: Generic Syntax) states that 'The authority component is preceded by a double slash "//" and is terminated by the next slash "/", question-mark "?", or by the end of the URI.' The RFC also defines a URI as follows:

    absoluteURI = scheme ":" ( hier_part | opaque_part )
    hier_part   = ( net_path | abs_path ) [ "?" query ]
    net_path    = "//" authority [ abs_path ]
    abs_path    = "/" path_segments

Because abs_path is optional in net_path, and the query is attached to hier_part rather than to abs_path, 'http://authority?query' is a valid URL. In Python 2.3 the fix is to change line 939 of urllib.py from:

    _hostprog = re.compile('^//([^/]*)(.*)$')

to:

    _hostprog = re.compile('^//([^/?]*)(.*)$')

This appears to affect all Python versions; I just happened to be using 2.3.
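The difference between the two patterns can be checked with a minimal sketch. It assumes a local helper named splithost (hypothetical, not the library function) that mirrors the matching logic of urllib.splithost in 2.3, and uses www.example.com as a placeholder host; only the standard re module is needed.

```python
import re

# Pattern currently in urllib.py (Python 2.3): the host runs up to the first "/".
OLD_HOSTPROG = re.compile('^//([^/]*)(.*)$')
# Proposed pattern: the host also stops at "?", per RFC 2396 section 3.2.
NEW_HOSTPROG = re.compile('^//([^/?]*)(.*)$')


def splithost(url, hostprog):
    """Hypothetical stand-in for urllib.splithost using a given pattern.

    Returns (host, rest) for '//host[:port]rest', or (None, url) when
    the url does not start with '//'.
    """
    match = hostprog.match(url)
    if match:
        return match.group(1, 2)
    return None, url


for url in ('//www.example.com/path?q=1', '//www.example.com?q=1'):
    print(url)
    print('  old:', splithost(url, OLD_HOSTPROG))
    print('  new:', splithost(url, NEW_HOSTPROG))
```

Both patterns agree when a "/" follows the authority. For '//www.example.com?q=1', the old pattern returns ('www.example.com?q=1', ''), while the proposed one returns ('www.example.com', '?q=1').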