Gregory P. Smith <g...@krypto.org> added the comment:
I think there's still a flaw in the fixes implemented in 3.10 and 3.9 so far. We're closer, but probably not quite good enough yet. why? We aren't stripping the newlines+tab early enough. I think we need to do the stripping *right after* the _coerce_args(url, ...) call at the start of the function. Otherwise we (1) are storing url variants with the bad characters in _parse_cache [a mere slowdown in the worst case as it'd just overflow the cache sooner] (2) are splitting the scheme off the URL prior to stripping. in 3.9+ there is a check for valid scheme characters, which will defer to the default scheme when found. The WHATWG basic url parsing has these characters stripped before any parts are split off though, so 'ht\rtps' - for example - would wind up as 'https' rather than our behavior so far of deferring to the default scheme. I noticed this when reviewing the pending 3.8 PR as it made it more obvious due to the structure of the code and would've allowed characters through into query and fragment in some cases. https://github.com/python/cpython/pull/25726#pullrequestreview-649803605 ---------- _______________________________________ Python tracker <rep...@bugs.python.org> <https://bugs.python.org/issue43882> _______________________________________ _______________________________________________ Python-bugs-list mailing list Unsubscribe: https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com