New submission from Samani Gikandi <sam...@gojulas.com>:

RFC 3986 (STD 66) says that a URL scheme must begin with a letter; however,
urllib.parse.urlsplit (and urlparse) still accept strings whose scheme does not
begin with a letter and report that text as a valid scheme.

Example from Python 3.8, using "+git+ssh://g...@github.com/user/project.git":

>>> from urllib.parse import urlsplit, urlparse
>>> urlparse("+git+ssh://g...@github.com/user/project.git")
ParseResult(scheme='+git+ssh', netloc='g...@github.com', path='/user/project.git', params='', query='', fragment='')
>>> urlsplit("+git+ssh://g...@github.com/user/project.git")
SplitResult(scheme='+git+ssh', netloc='g...@github.com', path='/user/project.git', query='', fragment='')
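Until the parser itself enforces the rule, a caller could guard against this
case. The following is only a sketch: `strict_urlsplit` is a hypothetical
wrapper, not part of urllib.parse. It relies on the fact that urlsplit()
already restricts scheme characters to letters, digits, "+", "-" and ".", so
only the first character needs an extra check. On interpreters whose urlsplit
already applies the letter rule, the input above simply comes back with an
empty scheme and the check never fires.

```python
import string
from urllib.parse import SplitResult, urlsplit


def strict_urlsplit(url: str) -> SplitResult:
    """Hypothetical wrapper: urlsplit() plus the RFC 3986 first-letter rule.

    Raises ValueError when urlsplit() reports a non-empty scheme whose first
    character is not an ASCII letter (e.g. "+git+ssh").
    """
    result = urlsplit(url)
    # urlsplit() already limits scheme characters to letters, digits, "+",
    # "-" and ".", so only the first character needs checking here.
    if result.scheme and result.scheme[0] not in string.ascii_letters:
        raise ValueError(f"invalid URL scheme: {result.scheme!r}")
    return result


if __name__ == "__main__":
    print(strict_urlsplit("git+ssh://g...@github.com/user/project.git").scheme)
    try:
        strict_urlsplit("+git+ssh://g...@github.com/user/project.git")
    except ValueError as exc:
        print(exc)
```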

I double-checked this behavior in a number of other languages (Rust, Go,
JavaScript, Ruby); they all report an error if you try to parse this URL.

For reference, RFC 3986, section 3.1:

   Scheme names consist of a sequence of characters beginning with a
   letter and followed by any combination of letters, digits, plus
   ("+"), period ("."), or hyphen ("-").

   [...]

   scheme      = ALPHA *( ALPHA / DIGIT / "+" / "-" / "." )
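The ABNF rule above translates directly into a regular expression. This is a
sketch for illustration: `is_valid_scheme` is a hypothetical helper name, not
an existing urllib.parse function. ALPHA in ABNF (RFC 5234) covers only ASCII
letters, so the character classes are spelled out explicitly instead of using
\w or str.isalpha(), both of which also accept non-ASCII letters.

```python
import re

# RFC 3986, section 3.1:
#   scheme = ALPHA *( ALPHA / DIGIT / "+" / "-" / "." )
# ALPHA is ASCII-only, hence [A-Za-z] rather than \w or str.isalpha().
_SCHEME_RE = re.compile(r"[A-Za-z][A-Za-z0-9+.-]*")


def is_valid_scheme(candidate: str) -> bool:
    """Return True if `candidate` matches the RFC 3986 `scheme` production."""
    # fullmatch() anchors both ends, so "+git+ssh" fails on its first character
    # and trailing junk such as "git ssh" is rejected as well.
    return _SCHEME_RE.fullmatch(candidate) is not None


if __name__ == "__main__":
    for s in ("git+ssh", "+git+ssh", "svn+ssh", "1http", ""):
        print(repr(s), is_valid_scheme(s))
```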

----------
components: Library (Lib)
messages: 367452
nosy: sgg
priority: normal
severity: normal
status: open
title: urllib.parse.urlsplit parses schemes that do not begin with letters
type: behavior
versions: Python 3.5, Python 3.6, Python 3.7, Python 3.8, Python 3.9

_______________________________________
Python tracker <rep...@bugs.python.org>
<https://bugs.python.org/issue40409>
_______________________________________
_______________________________________________
Python-bugs-list mailing list