[issue43882] [security] urllib.parse should sanitize urls containing ASCII newline and tabs.

Gregory P. Smith Sat, 01 May 2021 10:26:34 -0700


Gregory P. Smith <g...@krypto.org> added the comment:


I think there's still a flaw in the fixes implemented in 3.10 and 3.9 so far.  
We're closer, but probably not quite good enough yet.

Why?  We aren't stripping the newlines and tabs early enough.

I think we need to do the stripping *right after* the _coerce_args(url, ...) 
call at the start of the function.

Otherwise we
  (1) are storing url variants with the bad characters in _parse_cache [a mere 
slowdown in the worst case, as the cache would just overflow sooner]
  (2) are splitting the scheme off the URL prior to stripping.  In 3.9+ there 
is a check for valid scheme characters, which falls back to the default scheme 
when an invalid character is found.  The WHATWG basic URL parser strips these 
characters before any parts are split off though, so 'ht\rtps' - for example - 
would wind up as 'https' rather than our behavior so far of deferring to the 
default scheme.
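
To illustrate point (2), here is a toy sketch of the ordering difference. It is 
not the actual urllib.parse code: strip_unsafe, split_scheme, split_late and 
split_early are made-up names, and the scheme check is a simplified stand-in 
for the real one.

```python
import re

# ASCII tab and newline characters that should be removed from the URL.
UNSAFE_CHARS = "\t\r\n"
# Simplified stand-in for a "valid scheme characters" check (an assumption).
SCHEME_RE = re.compile(r"[A-Za-z][A-Za-z0-9+.-]*")


def strip_unsafe(url):
    """Remove every tab, CR and LF from the string."""
    return url.translate({ord(c): None for c in UNSAFE_CHARS})


def split_scheme(url, default=""):
    """Toy splitter: fall back to `default` if the scheme has invalid chars."""
    head, sep, rest = url.partition(":")
    if sep and SCHEME_RE.fullmatch(head):
        return head.lower(), rest
    return default, url


def split_late(url):
    """Split the scheme first, strip afterwards (the ordering criticized above)."""
    scheme, rest = split_scheme(url)
    return scheme, strip_unsafe(rest)


def split_early(url):
    """Strip first, then split (the ordering proposed above)."""
    return split_scheme(strip_unsafe(url))


if __name__ == "__main__":
    url = "ht\rtps://example.com/a\nb"
    print(split_late(url))   # scheme check fails on '\r', default scheme used
    print(split_early(url))  # '\r' is gone before the scheme is examined
```

With late stripping the '\r' is still present when the scheme is checked, so 
the toy splitter falls back to the default; with early stripping it sees 
'https'.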

I noticed this when reviewing the pending 3.8 PR, as the structure of the code 
there made it more obvious; that PR would've allowed the characters through 
into query and fragment in some cases.  
https://github.com/python/cpython/pull/25726#pullrequestreview-649803605

----------

_______________________________________
Python tracker <rep...@bugs.python.org>
<https://bugs.python.org/issue43882>
_______________________________________