New submission from benrg <benrud...@gmail.com>:
If `HKCU\Software\Microsoft\Windows\CurrentVersion\Internet Settings\ProxyServer` contains the string `http=host:123;https=host:456;ftp=host:789`, then getproxies_registry() should return

    {'http': 'http://host:123', 'https': 'http://host:456', 'ftp': 'http://host:789'}

for consistency with WinInet and Chromium, but it actually returns

    {'http': 'http://host:123', 'https': 'https://host:456', 'ftp': 'ftp://host:789'}

This bug has existed for a very long time (since Python 2.0.1 if not earlier), but it was exposed recently when urllib3 added support for HTTPS-in-HTTPS proxies in version 1.26. Before that, an `https` prefix on the HTTPS proxy url was silently treated as `http`, accidentally resulting in the correct behavior.

There are additional bugs in the treatment of single-proxy strings (the case when the string contains no `=` character).

The Chromium code for parsing the ProxyServer string can be found here:
https://source.chromium.org/chromium/chromium/src/+/refs/tags/89.0.4353.1:net/proxy_resolution/proxy_config.cc;l=86

Below is my attempt at modifying the code from `getproxies_registry` to approximately match Chromium's behavior. I could turn this into a patch, but I'd like feedback on the corner cases first.

    if '=' not in proxyServer and ';' not in proxyServer:
        # Use one setting for all protocols.
        # Chromium treats this as a separate category, and some software
        # uses the ALL_PROXY environment variable for a similar purpose,
        # so arguably this should be 'all={}'.format(proxyServer),
        # but this is more backward compatible.
        proxyServer = 'http={0};https={0};ftp={0}'.format(proxyServer)
    for p in proxyServer.split(';'):
        # Chromium and WinInet are inconsistent in their treatment of
        # invalid strings with the wrong number of = characters. It
        # probably doesn't matter.
        protocol, addresses = p.split('=', 1)
        protocol = protocol.strip()
        # Chromium supports more than one proxy per protocol. I don't
        # know how many clients support the same, but handling it is at
        # least no worse than leaving the commas uninterpreted.
        for address in addresses.split(','):
            if protocol in {'http', 'https', 'ftp', 'socks'}:
                # See if address has a type:// prefix
                if not re.match('(?:[^/:]+)://', address):
                    if protocol == 'socks':
                        # Chromium notes that the correct protocol here
                        # is SOCKS4, but "socks://" is interpreted
                        # as SOCKS5 elsewhere. I don't know whether
                        # prepending socks4:// here would break code.
                        address = 'socks://' + address
                    else:
                        address = 'http://' + address
            # A string like 'http=foo;http=bar' will produce a
            # comma-separated list, while previously 'bar' would
            # override 'foo'. That could potentially break something.
            if protocol not in proxies:
                proxies[protocol] = address
            else:
                proxies[protocol] += ',' + address

----------
components: Library (Lib), Windows
messages: 382921
nosy: benrg, paul.moore, steve.dower, tim.golden, zach.ware
priority: normal
severity: normal
status: open
title: urllib.request.getproxies() misparses Windows registry proxy settings
type: behavior
versions: Python 3.10, Python 3.6, Python 3.7, Python 3.8, Python 3.9

_______________________________________
Python tracker <rep...@bugs.python.org>
<https://bugs.python.org/issue42627>
_______________________________________
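
For anyone who wants to try the corner cases outside of `getproxies_registry`, the parsing rules described in this report can be approximated by a standalone function. This is only a sketch: the name `parse_proxy_server` is hypothetical (it is not part of urllib), and skipping entries with no `=` or with empty addresses is an assumption made here so the function does not raise on malformed input.

```python
import re


def parse_proxy_server(proxy_server):
    """Parse a WinInet-style ProxyServer string into a {protocol: url} dict.

    Hypothetical helper for experimenting with the rules in this report;
    it is not an urllib API.
    """
    proxies = {}
    if '=' not in proxy_server and ';' not in proxy_server:
        # Single-proxy string: use the same proxy for all protocols.
        proxy_server = 'http={0};https={0};ftp={0}'.format(proxy_server)
    for entry in proxy_server.split(';'):
        if '=' not in entry:
            # Assumption: silently skip malformed entries.
            continue
        protocol, addresses = entry.split('=', 1)
        protocol = protocol.strip()
        for address in addresses.split(','):
            address = address.strip()
            if not address:
                # Assumption: ignore empty addresses.
                continue
            if protocol in {'http', 'https', 'ftp', 'socks'}:
                # Only add a scheme when the address has no type:// prefix.
                if not re.match(r'[^/:]+://', address):
                    scheme = 'socks' if protocol == 'socks' else 'http'
                    address = scheme + '://' + address
            if protocol in proxies:
                # Repeated protocols accumulate as a comma-separated list.
                proxies[protocol] += ',' + address
            else:
                proxies[protocol] = address
    return proxies


if __name__ == '__main__':
    print(parse_proxy_server('http=host:123;https=host:456;ftp=host:789'))
    print(parse_proxy_server('host:8080'))
    print(parse_proxy_server('http=foo;http=bar'))
```

With the string from the top of this report, the `https` and `ftp` entries come back with an `http://` prefix, which is the behavior argued for above. An address that already carries an explicit `type://` prefix is left unchanged.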