[issue27657] urlparse fails if the path is numeric

Martin Panter Sat, 30 Jul 2016 19:37:32 -0700

Martin Panter added the comment:

The main backward compatibility consideration would be Issue 754016, but I
don’t agree with the changes made there, and would support reverting them. The
original bug 
reporter wanted urlparse("1.2.3.4:80", "http") to be treated as the URL 
http://1.2.3.4:80, but the IP address was being parsed as a scheme, so the 
default “http” scheme was ignored.
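
For comparison, here is a minimal sketch of how the same host and port can be passed so that the default scheme is applied. It relies on the documented behaviour that urlparse only recognises a network location when it is introduced by "//". The import shown is the Python 3 one (on Python 2.7 the function lives in the urlparse module), and the variable name is only illustrative.

```python
from urllib.parse import urlparse

# A leading "//" marks what follows as a network location, so the part
# before the colon is not mistaken for a scheme and the default applies.
parts = urlparse("//1.2.3.4:80", scheme="http")

print(parts.scheme)    # the default scheme passed in
print(parts.netloc)    # host and port together
print(parts.hostname, parts.port)
```

This does not address the inputs discussed in this issue; it only shows that an unambiguous spelling of the reporter's URL already exists.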


The original fix (r83701) affected any URL that had a digit 0–9 immediately 
after the “scheme:” prefix. In such URLs, the scheme component was no longer 
parsed. A test case for “path:80” was added, and a demonstration of not parsing 
any scheme from www.cwi.nl:80/%7Eguido/Python.html was added in the 
documentation.

Later, the logic was altered to test if the URL looked like an integer 
(revision 495d12196487, Issue 11467). This restored proper parsing of 
clsid:85bbd92o-42a0-1o69-a2e4-08002b30309d and mailto:[email protected], 
although another URL given, javascript:123, remains misparsed. The 
documentation was subsequently adjusted in Issue 16932 to just demonstrate 
www.cwi.nl/%7Eguido/Python.html being parsed as a path.

The logic was watered down to its current form by revision 9f6b7576c08c, Issue 
14072. Now it tests for a non-digit anywhere after the scheme, so that 
tel:+31641044153 is again parsed properly. But it was pointed out that tel:1234 
remains misparsed.
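
To make the three stages easier to compare, here is a rough sketch of each heuristic as I have described it above. This is not the actual urllib code: the function names, the SCHEME_CHARS set and the helper are all made up for illustration, and only the scheme-splitting step is modelled.

```python
import string

# Assumption for this sketch: a scheme may contain letters, digits, "+", "-", ".".
SCHEME_CHARS = set(string.ascii_letters + string.digits + "+-.")
DIGITS = set("0123456789")


def _candidate(url):
    """Return (scheme, rest) if the text before the first ':' could be a scheme."""
    head, sep, rest = url.partition(":")
    if sep and head and all(c in SCHEME_CHARS for c in head):
        return head.lower(), rest
    return None


def split_digit_after_colon(url):
    """First stage: no scheme if a digit immediately follows the colon."""
    cand = _candidate(url)
    if cand is None or (cand[1] and cand[1][0] in DIGITS):
        return "", url
    return cand


def split_integer_rest(url):
    """Second stage: no scheme if everything after the colon parses as an integer."""
    cand = _candidate(url)
    if cand is None:
        return "", url
    try:
        int(cand[1])
    except ValueError:
        return cand
    return "", url


def split_all_digits_rest(url):
    """Third stage: no scheme only if the rest is non-empty and all digits."""
    cand = _candidate(url)
    if cand is None:
        return "", url
    if cand[1] and all(c in DIGITS for c in cand[1]):
        return "", url
    return cand


for url in ("path:80", "javascript:123", "tel:+31641044153", "tel:1234"):
    print(url, split_digit_after_colon(url),
          split_integer_rest(url), split_all_digits_rest(url))
```

In this sketch, each stage rescues some of the URLs broken by the previous one, but tel:1234 and javascript:123 lose their scheme under all three.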

What’s the next step in the watering-down process? All the attempts so far 
break valid URLs in favour of special-casing inputs that are not valid URLs.

----------
nosy: +martin.panter, orsenthil
versions: +Python 2.7, Python 3.5, Python 3.6

_______________________________________
Python tracker <[email protected]>
<http://bugs.python.org/issue27657>
_______________________________________