Abhilash Raj <raj.abhila...@gmail.com> added the comment:

The bug is interesting due to some of the implementation details of 
"guess_type". The documentation says that it can parse either a URL or a 
filename.

Switching from urllib.parse._splittype to urllib.parse.urlparse changed what a 
valid "path" is. _splittype doesn't care about the rest of the URL except the 
scheme, but urlparse does. Previously, we used to split things like:

    >>> print(urllib.parse._splittype(';1.tar.gz'))
   (None, ';1.tar.gz')

We then treated the second element as a filesystem path, from which the
extension was correctly guessed as .tar.gz.

However, after switching to urllib.parse.urlparse, we get:

    >>> print(urllib.parse.urlparse(';1.tar.gz'))
    ParseResult(scheme='', netloc='', path='', params='1.tar.gz', query='', fragment='')

guess_type then takes the ".path" attribute for further processing. Since that
attribute is empty here, the result is (None, None).
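
To see where the file name ends up, the fields of the parse result can be
inspected directly. This is only a small sketch, and show_parts is a name made
up for illustration:

```python
from urllib.parse import urlparse

def show_parts(value):
    """Return the (scheme, path, params) fields urlparse assigns to value."""
    parsed = urlparse(value)
    return parsed.scheme, parsed.path, parsed.params

# No scheme: everything after the ';' lands in params, so path is empty.
print(show_parts(';1.tar.gz'))
# Full URL: path also stops at the ';'.
print(show_parts('http://example.com/index.html;1.tar.gz'))
```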

The format of all these parts is:

    scheme://netloc/path;parameters?query#fragment

A simple fix would be to just merge path, parameters, query and fragment
together (with appropriate delimiters) and then proceed with further
processing. That would fix parsing of filesystem paths but would break (again)
parsing of URLs like:

    >>> mimetypes.guess_type('http://example.com/index.html;1.tar.gz')
    ('application/x-tar', 'gzip')

It should return 'text/html' as the type, since this is a URL and everything 
after the ';' should not be used to determine the mimetype. But, if there is no 
scheme provided, we should treat it as a filesystem path and in that case 
'application/x-tar' is the right type.
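
The trade-off can be seen in a rough sketch of the merge approach. It assumes
that only path and params are re-joined (query and fragment are left out for
brevity), and merged_extension is a hypothetical name, not anything in the
stdlib:

```python
import posixpath
from urllib.parse import urlparse

def merged_extension(value):
    """Re-join path and params with ';' and return the last extension."""
    parsed = urlparse(value)
    merged = parsed.path
    if parsed.params:
        merged += ';' + parsed.params
    return posixpath.splitext(merged)[1]
```

With this, both ';1.tar.gz' and 'http://example.com/index.html;1.tar.gz' end
in '.gz', so the URL would be typed from its parameters instead of from
index.html.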

I hope I am not confusing everyone here. 

The right fix IMO would be to make "guess_type" not treat URLs and filesystem
paths alike.
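
One way this could look, as a sketch and not a proposed patch: decide first
whether the input has a scheme, then look at the URL path only for URLs and at
the whole string for filesystem paths. The names guess_type_sketch and
_guess_from_path are hypothetical, the one-letter-scheme rule is my own
assumption, and special cases such as data: URLs are ignored. The lookup uses
the documented mimetypes.suffix_map, encodings_map and types_map tables
directly so that it does not depend on how guess_type itself parses its
argument.

```python
import mimetypes
import posixpath
from urllib.parse import urlparse

def _guess_from_path(path):
    """Look up (type, encoding) from the extension(s) of a plain path."""
    base, ext = posixpath.splitext(path)
    # Expand shorthand suffixes, e.g. ".tgz" -> ".tar.gz".
    while ext.lower() in mimetypes.suffix_map:
        base, ext = posixpath.splitext(base + mimetypes.suffix_map[ext.lower()])
    # Peel off a compression suffix such as ".gz" before the type lookup.
    encoding = mimetypes.encodings_map.get(ext)
    if encoding is not None:
        base, ext = posixpath.splitext(base)
    return mimetypes.types_map.get(ext.lower()), encoding

def guess_type_sketch(name):
    """Hypothetical helper: URLs and filesystem paths take separate routes."""
    parsed = urlparse(name)
    # Assumption: a scheme longer than one character marks a URL; a single
    # letter is more likely a Windows drive such as "C:".
    if len(parsed.scheme) > 1:
        # URL: only the path component counts, not params/query/fragment.
        return _guess_from_path(parsed.path)
    # No scheme: the whole string is a filesystem path, ';' included.
    return _guess_from_path(name)
```

Under these assumptions ';1.tar.gz' is typed as a gzip-compressed tar archive,
while 'http://example.com/index.html;1.tar.gz' is typed as 'text/html'.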

----------

_______________________________________
Python tracker <rep...@bugs.python.org>
<https://bugs.python.org/issue38449>
_______________________________________