Mike Lissner <mliss...@michaeljaylissner.com> added the comment:

> Instead of the patches as you see them, we could've raised an exception.

In my mind the definition of a valid URL is what browsers recognize. They're 
moving towards the WHATWG definition, and so too must we. 

If we make Python raise an exception when a URL has a newline in the scheme 
(e.g. "htt\np"), we'd be raising exceptions for *valid* URLs as browsers 
define them. That doesn't seem right at all to me. I'd be frustrated to have to 
catch such an exception, and I'd wonder how to pass valid URLs through 
urlparse without it raising something.


> Making the output 'sanitized' means that invalid input is converted into 
> valid output.  This goes against the principle of least surprise.

Well, not quite, right? The URLs this fixes *are* valid according to browsers. 
Browsers say these tabs and newlines are OK. 
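
For reference, my understanding is that the WHATWG URL parser removes ASCII 
tab, LF and CR from the input before it parses anything. Here is a minimal 
sketch of that preprocessing done by hand. The helper name whatwg_strip is 
made up for illustration and is not part of urllib:

```python
from urllib.parse import urlsplit

# Characters the WHATWG spec calls "ASCII tab or newline".
_TAB_OR_NEWLINE = {ord(c): None for c in "\t\n\r"}


def whatwg_strip(url):
    """Hypothetical helper: drop tab, LF and CR anywhere in the string."""
    return url.translate(_TAB_OR_NEWLINE)


# After stripping, the scheme is recognized the way a browser would see it.
parts = urlsplit(whatwg_strip("java\nscript:alert('bad')"))
print(parts.scheme)
print(parts.path)
```

This is the browser view of the URL, which is why I call these inputs valid.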

----

I agree though that there's an issue with the approach of stripping input in a 
way that affects output. That doesn't seem right. 

I think the solution I'd favor (and I imagine what's coming in 43883) is to do 
this properly so that newlines are preserved in the output, but so that the 
scheme is also placed properly in the scheme attribute. 

So instead of this (from the initial report):

> In [9]: from urllib.parse import urlsplit
> In [10]: urlsplit("java\nscript:alert('bad')")
> Out[10]: SplitResult(scheme='', netloc='', path="java\nscript:alert('bad')", 
> query='', fragment='')

We get something like this:

> In [10]: urlsplit("java\nscript:alert('bad')")
> Out[10]: SplitResult(scheme='java\nscript', netloc='', path="alert('bad')", 
> query='', fragment='')

In other words, keep the funky characters and parse properly.
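
To make the idea concrete, here is a rough sketch of what I mean. It is an 
illustration only, not the patch and not how urllib works today. The function 
name split_keeping_whitespace is hypothetical, and the handling of netloc, 
query and fragment is deliberately simplified:

```python
from urllib.parse import SplitResult, urlsplit

# Characters to ignore when deciding whether a prefix is a scheme.
_IGNORED = {ord(c): None for c in "\t\n\r"}


def split_keeping_whitespace(url):
    """Hypothetical sketch: detect the scheme while ignoring tab/LF/CR,
    but keep those characters in the returned components."""
    head, sep, rest = url.partition(":")
    cleaned = head.translate(_IGNORED)
    looks_like_scheme = bool(
        sep
        and cleaned.isascii()
        and cleaned[:1].isalpha()
        and all(ch.isalnum() or ch in "+-." for ch in cleaned)
    )
    if not looks_like_scheme:
        # No scheme found: fall back to the stock parser.
        return urlsplit(url)

    # Simplified splitting of the remainder: fragment, then query, then netloc.
    rest, _, fragment = rest.partition("#")
    rest, _, query = rest.partition("?")
    netloc = ""
    if rest.startswith("//"):
        netloc, slash, tail = rest[2:].partition("/")
        rest = slash + tail
    # The scheme keeps its original characters, only lowercased.
    return SplitResult(head.lower(), netloc, rest, query, fragment)


print(split_keeping_whitespace("java\nscript:alert('bad')"))
```

The point is only that scheme detection and output preservation can be 
separate steps. A caller who wants to block "javascript:" URLs would still 
need to compare against the scheme with those characters removed.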

----------

_______________________________________
Python tracker <rep...@bugs.python.org>
<https://bugs.python.org/issue43882>
_______________________________________