New submission from Samani Gikandi:
RFC 3986 (STD 66) says that a URL scheme must begin with a letter; however,
urllib.parse.urlsplit (and urlparse) accept strings that violate this rule and
return them as if they had valid schemes.
Example from Python 3.8 using "+git+ssh://g...@github.com/user/project.git":
>>> from urllib.parse import urlsplit, urlparse
>>> urlparse("+git+ssh://g...@github.com/user/project.git")
ParseResult(scheme='+git+ssh', netloc='g...@github.com',
path='/user/project.git', params='', query='', fragment='')
>>> urlsplit("+git+ssh://g...@github.com/user/project.git")
SplitResult(scheme='+git+ssh', netloc='g...@github.com',
path='/user/project.git', query='', fragment='')
I double-checked this behavior, and a number of other languages (Rust, Go,
JavaScript, Ruby) all complain if you try to parse this URL.
For reference, RFC 3986 section 3.1 --
Scheme names consist of a sequence of characters beginning with a
letter and followed by any combination of letters, digits, plus
("+"), period ("."), or hyphen ("-").
[...]
scheme = ALPHA *( ALPHA / DIGIT / "+" / "-" / "." )
--
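To make the expected behavior concrete, here is a minimal sketch of a check
based on the grammar quoted above. The names has_rfc3986_scheme and
strict_urlsplit are hypothetical helpers written for this report, not part of
urllib.parse, and the example URLs use a placeholder host. Note that the
wrapper rejects anything without a scheme, so it is not suitable for relative
references.

```python
import re
from urllib.parse import urlsplit

# RFC 3986 section 3.1:
#   scheme = ALPHA *( ALPHA / DIGIT / "+" / "-" / "." )
_SCHEME_RE = re.compile(r"[A-Za-z][A-Za-z0-9+.-]*")


def has_rfc3986_scheme(url):
    """Return True if the text before the first ':' matches the RFC grammar.

    Hypothetical helper for illustration; not part of the standard library.
    """
    scheme, sep, _rest = url.partition(":")
    return bool(sep) and _SCHEME_RE.fullmatch(scheme) is not None


def strict_urlsplit(url):
    """Call urlsplit only after the scheme has passed the RFC 3986 check.

    Hypothetical wrapper; it also rejects URLs that have no scheme at all.
    """
    if not has_rfc3986_scheme(url):
        raise ValueError("missing or invalid URL scheme in %r" % (url,))
    return urlsplit(url)
```

With this sketch, has_rfc3986_scheme("+git+ssh://user@example.com/p.git")
returns False because the scheme starts with "+", while the same URL with the
leading "+" removed passes the check and is handed to urlsplit unchanged.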
components: Library (Lib)
messages: 367452
nosy: sgg
priority: normal
severity: normal
status: open
title: urllib.parse.urlsplit parses schemes that do not begin with letters
type: behavior
versions: Python 3.5, Python 3.6, Python 3.7, Python 3.8, Python 3.9
___
Python tracker
<https://bugs.python.org/issue40409>
___