Demian Brecht added the comment:

I think some further consideration around this change is worthwhile:

Currently, urllib.parse.split* methods seem to do inconsistent validation 
around the data they're splitting. For example:

(None for an invalid port)
>>> parse.splitnport('example.com:foo')
('example.com', None)

Whereas other split* methods do not:

(Auth part should be URL-encoded)
>>> parse.splituser('u@ser:p@ssw...@example.com:80')
('u@ser:p@ssword', 'example.com:80')

And others are just plain incorrect:

(:bad should be the port as defined by the ABNF 'authority = [ userinfo "@" ] 
host [ ":" port ]')
>>> parse.splitport('example.com:bad')
('example.com:bad', None)

However, none of these cases (currently) raise exceptions.

Looking at urllib.parse, two large motivations behind it are splitting and 
parsing. In my mind, splitting should solely be responsible for splitting input 
based on RFC-defined delimiters. Parsing on the other hand, should be 
responsible for both splitting as necessary as well as input validation. It may 
also make sense to add simple validation functions to the module to comply with 
the "batteries included" philosophy, but that's a topic for another issue.

My concern with the proposed patch is that it adds further inconsistency to 
split* methods:

Before patch:

>>> parse.urlsplit('http://[::1]spam:80')
SplitResult(scheme='http', netloc='[::1]spam:80', path='', query='', 
fragment='')

After patch:

>>> parse.urlsplit('http://[::1]spam:80')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/Volumes/src/p/cpython/Lib/urllib/parse.py", line 350, in urlsplit
    netloc, url = _splitnetloc(url, 2)
  File "/Volumes/src/p/cpython/Lib/urllib/parse.py", line 324, in _splitnetloc
    raise ValueError('Invalid IPv6 URL')

(While the above examples still yield the same results and don't raise 
exceptions)

I do think that the validation should be done and I agree that an exception 
should be raised, but it should be done at the urlparse level, not on split (in 
this case, due to the change to _splitnecloc).

----------
nosy: +demian.brecht

_______________________________________
Python tracker <rep...@bugs.python.org>
<http://bugs.python.org/issue20271>
_______________________________________
_______________________________________________
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com

Reply via email to