Bugs item #754016, was opened at 2003-06-13 15:15 Message generated for change (Settings changed) made by gbrandl You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=105470&aid=754016&group_id=5470
Please note that this message will contain a full copy of the comment thread, including the initial issue submission, for this request, not just the latest update. Category: Python Library Group: Python 2.3 Status: Open Resolution: None Priority: 5 Submitted By: Thomas Krüger (kruegi) >Assigned to: Georg Brandl (gbrandl) Summary: urlparse goes wrong with IP:port without scheme Initial Comment: urlparse doesnt work if IP and port are given without scheme: >>> urlparse.urlparse('1.2.3.4:80','http') ('1.2.3.4', '', '80', '', '', '') should be: >>> urlparse.urlparse('1.2.3.4:80','http') ('http', '1.2.3.4', '80', '', '', '') ---------------------------------------------------------------------- Comment By: Georg Brandl (birkenfeld) Date: 2005-08-25 13:22 Message: Logged In: YES user_id=1188172 Will look into it. Should be easy to fix. ---------------------------------------------------------------------- Comment By: Facundo Batista (facundobatista) Date: 2004-12-26 21:55 Message: Logged In: YES user_id=752496 The problem is still present in Py2.3.4. IMO, it should support dirs without the "http://" or raise an error if it's not present (never fail silently!). ---------------------------------------------------------------------- Comment By: Shannon Jones (sjones) Date: 2003-06-14 04:18 Message: Logged In: YES user_id=589306 Ok, I researched this a bit, and the situation isn't as simple as it first appears. The RFC that urlparse tries to follow is at http://www.faqs.org/rfcs/rfc1808.html and notice specifically sections 2.1 and 2.2. It seems to me that the source code follows rfc1808 religiously, and in that sense it does the correct thing. According to the RFC, the netloc should begin with a '//', and since your example didn't include one then it technical was an invalid URL. Here is another example where it seems Python fails to do the right thing: >>> urlparse.urlparse('python.org') ('', '', 'python.org', '', '', '') >>> urlparse.urlparse('python.org', 'http') ('http', '', 'python.org', '', '', '') Note that it is putting 'python.org' as the path and not the netloc. So the problem isn't limited to just when you use a scheme parameter and/or a port number. Now if we put '//' at the beginning, we get: >>> urlparse.urlparse('//python.org') ('', 'python.org', '', '', '', '') >>> urlparse.urlparse('//python.org', 'http') ('http', 'python.org', '', '', '', '') So here it does the correct thing. There are two problems though. First, it is common for browsers and other software to just take a URL without a scheme and '://' and assume it is http for the user. While the URL is technically not correct, it is still common usage. Also, urlparse does take a scheme parameter. It seems as though this parameter should be used with a scheme-less URL to give it a default one like web browsers do. So somebody needs to make a decision. Should urlparse follow the RFC religiously and require '//' in front of netlocs? If so, I think the documentation should give an example showing this and showing how to use the 'scheme' parameter. Or should urlparse use the more commonly used form of a URL where '//' is omitted when the scheme is omitted? If so, urlparse.py will need to be changed. Or maybe another fuction should be added to cover whichever behaviour urlparse doesn't cover. In any case, you can temporarily solve your problem by making sure that URL's without a scheme have '//' at the front. So your example becomes: >>> urlparse.urlparse('//1.2.3.4:80', 'http') ('http', '1.2.3.4:80', '', '', '', '') ---------------------------------------------------------------------- Comment By: Shannon Jones (sjones) Date: 2003-06-14 02:39 Message: Logged In: YES user_id=589306 Sorry, previous comment got cut off... urlparse.urlparse takes a url of the format: <scheme>://<netloc>/<path>;<params>?<query>#<fragment> And returns a 6-tuple of the format: (scheme, netloc, path, params, query, fragment). An example from the library refrence takes: urlparse('http://www.cwi.nl:80/%7Eguido/Python.html') And produces: ('http', 'www.cwi.nl:80', '/%7Eguido/Python.html', '', '', '') -------------------------------- Note that there isn't a field for the port number in the 6-tuple. Instead, it is included in the netloc. Urlparse should handle your example as: >>> urlparse.urlparse('1.2.3.4:80','http') ('http', '1.2.3.4:80', '', '', '', '') Instead, it gives the incorrect output as you indicated. ---------------------------------------------------------------------- Comment By: Shannon Jones (sjones) Date: 2003-06-14 02:26 Message: Logged In: YES user_id=589306 urlparse.urlparse takes a url of the format: <scheme>://<netloc>/<path>;<params>?<query>#<fragment> And returns a 6-tuple of the format: (scheme, netloc, path, params, query, fragment). An example from the library refrence takes: ---------------------------------------------------------------------- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=105470&aid=754016&group_id=5470 _______________________________________________ Python-bugs-list mailing list Unsubscribe: http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com