New submission from Samani Gikandi <sam...@gojulas.com>:
RFC 3986 (STD 66) says that a URL scheme should begin with a letter; however, urllib.parse.urlsplit (and urlparse) accept strings that don't adhere to this and report them as valid schemes.

Example from Python 3.8 using "+git+ssh://g...@github.com/user/project.git":

>>> from urllib.parse import urlsplit, urlparse
>>> urlparse("+git+ssh://g...@github.com/user/project.git")
ParseResult(scheme='+git+ssh', netloc='g...@github.com', path='/user/project.git', params='', query='', fragment='')
>>> urlsplit("+git+ssh://g...@github.com/user/project.git")
SplitResult(scheme='+git+ssh', netloc='g...@github.com', path='/user/project.git', query='', fragment='')

I double-checked this behavior, and a number of other languages (Rust, Go, JavaScript, Ruby) all complain if you try to parse this URL.

For reference, RFC 3986 section 3.1:

    Scheme names consist of a sequence of characters beginning with a
    letter and followed by any combination of letters, digits, plus
    ("+"), period ("."), or hyphen ("-").

    [...]

    scheme = ALPHA *( ALPHA / DIGIT / "+" / "-" / "." )

----------
components: Library (Lib)
messages: 367452
nosy: sgg
priority: normal
severity: normal
status: open
title: urllib.parse.urlsplit parses schemes that do not begin with letters
type: behavior
versions: Python 3.5, Python 3.6, Python 3.7, Python 3.8, Python 3.9

_______________________________________
Python tracker <rep...@bugs.python.org>
<https://bugs.python.org/issue40409>
_______________________________________
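
As a rough illustration of the grammar quoted from RFC 3986 section 3.1, the scheme rule can be checked with a small regular expression before a string is handed to urlsplit. This is only a sketch and not a proposed patch for urllib.parse: the names SCHEME_RE and has_valid_scheme are hypothetical, and the example URLs use a placeholder host.

```python
import re

# RFC 3986 section 3.1:
#   scheme = ALPHA *( ALPHA / DIGIT / "+" / "-" / "." )
# ALPHA and DIGIT are ASCII-only, so the character classes are spelled out.
SCHEME_RE = re.compile(r"[A-Za-z][A-Za-z0-9+.-]*")


def has_valid_scheme(url):
    """Return True if the text before the first ':' is an RFC 3986 scheme.

    Hypothetical helper for illustration; it does not validate the rest
    of the URL.
    """
    scheme, sep, _rest = url.partition(":")
    # No ':' at all means there is no scheme component to validate.
    return bool(sep) and SCHEME_RE.fullmatch(scheme) is not None


if __name__ == "__main__":
    for candidate in (
        "git+ssh://example.com/user/project.git",
        "+git+ssh://example.com/user/project.git",
    ):
        print(candidate, has_valid_scheme(candidate))
```

A caller that wants strict behavior could run a check like this first and raise ValueError on failure, independent of how the installed Python version treats such input.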