[issue38656] mimetypes for python 3.7.5 fails to detect matroska video

2019-11-18 Thread David K. Hess


David K. Hess  added the comment:

Hi, I'm the author of the commit that's been fingered. Some comments about the 
behavior being reported:

First, as pointed out by @xtreak, indeed the mimetypes module uses mimetypes 
files present on the platform to add to the built-in list of mimetypes. In this 
case, "video/x-matroska" and ".mkv" are not found in the mimetypes module and 
were never there - they are coming from the host OS.

Also, for better or worse, the mimetypes module has a module-level "init" 
function that does more than just instantiate a MimeTypes instance for default use:

https://github.com/python/cpython/blob/5c0c325453a175350e3c18ebb10cc10c37f9595c/Lib/mimetypes.py#L345

It also loads in these system files (and also Windows Registry entries on 
Win32) into a fresh MimeTypes instance. So, addressing what @The Compiler is 
seeing, properly resetting the mimetypes module really involves calling 
mimetypes.init(). By historical design, instantiating a MimeTypes class 
instance directly will not use host OS system mime type files.
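
A minimal sketch of the distinction described above, assuming an interpreter that includes this commit. The helper name compare_guesses is hypothetical. What each lookup returns for a given extension depends on the Python version and on the host's mime type files, so no output is shown.

```python
import mimetypes


def compare_guesses(filename):
    """Return the module-level guess and a bare-instance guess for filename."""
    # init() rebuilds the module-level database: built-in defaults plus
    # any system mime type files (and Windows Registry entries on Win32).
    mimetypes.init()
    module_level = mimetypes.guess_type(filename)

    # A directly instantiated MimeTypes starts from the built-in defaults
    # only; host OS files are not read unless passed in explicitly.
    bare_instance = mimetypes.MimeTypes().guess_type(filename)

    return {"module_level": module_level, "bare_instance": bare_instance}


if __name__ == "__main__":
    print(compare_guesses("movie.mkv"))
```

If the two entries differ for an extension, that extension is most likely coming from the host OS rather than from the module's own tables.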

As to why this commit is causing a change in the observed behavior, the problem 
that was corrected in this commit was that the mimetypes module had 
non-deterministic behavior related to initialization. In the original init 
code, the module level mime types tables are changed (really corrupted) after 
first load and you can never reinitialize the module back to a known good state 
(i.e. to original module defaults without information from the host OS system).

So, realistically, the behavior currently observed is the correct behavior 
given the presence and historical nature of the init function. The fact that a 
fresh MimeTypes instance that had not been init()'d or had no filenames 
provided was returning an OS entry prior to this commit is really part of the 
initialization bug which was fixed.

Regarding the ranger bug, the main thing is you should not use a MimeTypes 
instance directly unless you run it through the same initializations that the 
init code does.
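
One way to follow that advice, as a rough sketch rather than a copy of the standard library's code: build the instance and then load the same sources that init() is described as loading. The helper name make_system_aware_mimetypes is hypothetical; mimetypes.knownfiles, MimeTypes.read() and MimeTypes.read_windows_registry() are existing public names.

```python
import mimetypes
import os
import sys


def make_system_aware_mimetypes():
    """Return a MimeTypes instance that also knows the host OS entries."""
    db = mimetypes.MimeTypes()

    # Registry entries only exist on Windows, so guard the call.
    if sys.platform == "win32":
        db.read_windows_registry()

    # knownfiles lists the usual locations of system mime.types files;
    # most of them will not exist on any given machine.
    for path in mimetypes.knownfiles:
        if os.path.isfile(path):
            db.read(path)

    return db


if __name__ == "__main__":
    print(make_system_aware_mimetypes().guess_type("movie.mkv"))
```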

Anyway, that's my perspective having waded through all of that during the 
original BPO. I don't claim it's the correct one but that's where we are at.

--

___
Python tracker 
<https://bugs.python.org/issue38656>
___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue38656] mimetypes for python 3.7.5 fails to detect matroska video

2019-11-18 Thread David K. Hess

David K. Hess  added the comment:

The documentation you quoted does read to me as compatible? The database it is 
referring to is the one hardcoded in the module – not the one assembled from 
that and the host OS. But, maybe this is just the vagaries of language and 
perspective at play.

Anyway, I do agree it is an unexpected behavior change from the perspective of a 
user of the MimeTypes class directly. To get the best context for this change, 
it's useful to run through the long history of the issue that drove it:

https://bugs.python.org/issue4963

Note, that discussion never touched on the use case of instantiating a 
MimeTypes class directly and there are apparently no test cases covering this 
particular scenario either. With no awareness of this perspective/use case it 
didn't get directly addressed.

Perhaps all MimeTypes instances should auto-load system files unless a new 
__init__ param selects for this new "clean" behavior?

--




[issue40139] mimetypes module racy

2020-04-04 Thread David K. Hess

David K. Hess  added the comment:

I’m not sure I can shed any light on this particular bug, but I would say that 
based on my dealings with this module, it is definitely not thread-safe. That 
means that if you are going to have multiple threads accessing it 
simultaneously, you really should have a mutex around that access ensuring only 
one thread is running through the code in this module at a time. 
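
A minimal sketch of that kind of guard, assuming every caller in the application goes through the wrapper. The names _mimetypes_lock and guess_type_locked are hypothetical.

```python
import mimetypes
import threading

# One process-wide lock shared by all callers of the wrapper below.
_mimetypes_lock = threading.Lock()


def guess_type_locked(url, strict=True):
    """Call mimetypes.guess_type while holding a module-wide lock."""
    with _mimetypes_lock:
        return mimetypes.guess_type(url, strict=strict)


if __name__ == "__main__":
    results = []
    threads = [
        threading.Thread(
            target=lambda: results.append(guess_type_locked("notes.txt"))
        )
        for _ in range(4)
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    print(results)
```

The lock only helps if nothing else in the process calls into the mimetypes module directly.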

Now in reality, asyncio and other cooperatively scheduled concurrency 
packages like gevent are not going to unpredictably yield control to another 
task the way true threads will. So, in this particular case, since the init code 
doesn’t use async or await, I don’t think there is a chance of an 
initialization race bug there. 

As to the bug witnessed, the only thing I can suggest is to add a considerable 
amount of debugging that logs the argument to guess_type and prints out the 
mimetypes module’s internal state if and when this happens again. My best guess, 
based on the amount of work that method does to inspect the passed-in url, is 
that it has something to do with the url itself.
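
A rough sketch of such debugging, assuming the failure shows up as guess_type returning no type. The wrapper name guess_type_logged and the logger name are hypothetical; inited and types_map are public attributes of the mimetypes module.

```python
import logging
import mimetypes

log = logging.getLogger("mimetypes.debug")  # hypothetical logger name


def guess_type_logged(url, strict=True):
    """Call guess_type and log the argument and module state on a miss."""
    result = mimetypes.guess_type(url, strict=strict)
    if result[0] is None:
        # repr() the argument so odd characters in the url stay visible.
        log.warning(
            "guess_type(%r) returned %r; inited=%s, types_map entries=%d",
            url,
            result,
            mimetypes.inited,
            len(mimetypes.types_map),
        )
    return result


if __name__ == "__main__":
    logging.basicConfig(level=logging.WARNING)
    guess_type_logged("no_extension")
```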

--

___
Python tracker 
<https://bugs.python.org/issue40139>
___



[issue38656] mimetypes for python 3.7.5 fails to detect matroska video

2020-07-17 Thread David K. Hess


David K. Hess  added the comment:

@michael-lazar a documentation change seems the path of least resistance given 
the complicated history of this module. +1 from me.

--




[issue4963] mimetypes.guess_extension result changes after mimetypes.init()

2019-06-25 Thread David K. Hess


David K. Hess  added the comment:

Thank you Steve!

Nice to see this one make it across the finish line.

--

___
Python tracker 
<https://bugs.python.org/issue4963>
___



[issue4963] mimetypes.guess_extension result changes after mimetypes.init()

2017-08-10 Thread David K. Hess

Changes by David K. Hess :


--
pull_requests: +3096




[issue4963] mimetypes.guess_extension result changes after mimetypes.init()

2017-08-10 Thread David K. Hess

David K. Hess added the comment:

FYI, PR opened: https://github.com/python/cpython/pull/3062

--




[issue4963] mimetypes.guess_extension result changes after mimetypes.init()

2017-05-13 Thread David K. Hess

David K. Hess added the comment:

Ok, I followed @r.david.murray's advice and decided to take a shot at this.

First, I noticed that I couldn't reproduce the non-deterministic behavior that 
I reported above on the latest code (i.e. pre-3.7). After doing some research 
it appears this was the sequence of events:

1) Pre-3.3, hashing was stable and this wasn't a problem.
2) Hash randomization became the default in version 3.3 and this 
non-determinism showed up.
3) A new dict implementation was introduced in 3.6 and key orders became stable 
between runs and this non-determinism was gone. However, as the notes on the 
new dict implementation indicate, this ordering should not be relied upon.

I also looked at some other issues:

* 6626 - The patch here basically rewrote the module. I agreed with the last 
comment on that issue that it probably doesn't need that.
* 24527 - Related to the .init() problems discussed here in r.david.murray's 
excellent analysis of the init behavior.
* 1043134 - Where the preferred extension issue was addressed via a proposed 
new map.

My approach with this patch is to address the init problem, the non-determinism 
and the preferred extension issue.

For the init, I made two changes:

1) I added new references to the initial values of the maps so they could be 
retained between init() calls. I also modified MimeTypes.__init__ to refer to 
these.

2) I modified the init() function to check the files argument as r.david.murray 
suggested. If it is supplied, then the existing database is used and the files 
are added to it. If it is not supplied, then the module reinitializes from 
scratch. I'll update the documentation to reflect this if the commit passes 
muster.
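
The init() behavior described in point 2 can be exercised with a small sketch like the following, assuming an interpreter that includes this change. The type "application/x-hess-demo", the extension ".hessdemo" and the function name demo_init_reset are made up for illustration.

```python
import mimetypes
import os
import tempfile

DEMO_TYPE = "application/x-hess-demo"  # hypothetical type for illustration
DEMO_EXT = ".hessdemo"                 # hypothetical extension


def demo_init_reset():
    """Add a type through init(files), then reset with a bare init()."""
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "mime.types")
        # mime.types format: the type followed by extensions without dots.
        with open(path, "w", encoding="utf-8") as fh:
            fh.write(f"{DEMO_TYPE} {DEMO_EXT.lstrip('.')}\n")

        # With files supplied, the entries are added to the database.
        mimetypes.init([path])
        after_add = mimetypes.guess_type("sample" + DEMO_EXT)[0]

    # With no files supplied, the module reinitializes from scratch.
    mimetypes.init()
    after_reset = mimetypes.guess_type("sample" + DEMO_EXT)[0]
    return after_add, after_reset


if __name__ == "__main__":
    print(demo_init_reset())
```

With the change in place, the first value should be the demo type and the second should be None, since the reset drops anything that did not come from the defaults or the system files.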

For the non-determinism and preferred extension, I changed the two extension 
type maps to be OrderedDicts. I then sorted the entries to the OrderedDict 
constructor by mime type and then placed the preferred extension as the first 
extension to be processed. This guarantees that it will be the extension 
returned for guess_extension. The OrderedDict also guarantees that 
guess_all_extensions will always build and return the same value.
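
The ordering idea can be illustrated with a standalone sketch. This is not the code from the patch: the sample data and the names SAMPLE_PAIRS, build_inverse and first_extension are hypothetical, and the real module builds its maps differently.

```python
from collections import OrderedDict

# Hypothetical sample data: for each type, the preferred extension is
# listed first.
SAMPLE_PAIRS = [
    ("text/plain", ".txt"),
    ("text/plain", ".text"),
    ("image/jpeg", ".jpg"),
    ("image/jpeg", ".jpeg"),
    ("image/jpeg", ".jpe"),
]


def build_inverse(pairs):
    """Map each type to its extensions in a reproducible order."""
    inverse = OrderedDict()
    # sorted() is stable, so sorting by type keeps each type's extensions
    # in their listed order, with the preferred one first.
    for mime_type, ext in sorted(pairs, key=lambda pair: pair[0]):
        inverse.setdefault(mime_type, []).append(ext)
    return inverse


def first_extension(inverse, mime_type):
    """Return the first (preferred) extension for mime_type, or None."""
    extensions = inverse.get(mime_type)
    return extensions[0] if extensions else None


if __name__ == "__main__":
    inverse = build_inverse(SAMPLE_PAIRS)
    print(first_extension(inverse, "image/jpeg"))
    print(inverse["image/jpeg"])
```

Because neither the iteration order nor the per-type order depends on hashing, repeated runs produce the same first extension and the same full list.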

The commit can be reviewed here:

https://github.com/davidkhess/cpython/commit/ecabb1cb57e7e066a693653f485f2f687dcc7f6b

I'll open a PR if and when this approach gets enough positive feedback.

--




[issue4963] mimetypes.guess_extension result changes after mimetypes.init()

2017-05-13 Thread David K. Hess

David K. Hess added the comment:

Pushed more commits so here's a branch compare:

https://github.com/python/cpython/compare/master...davidkhess:fix-issue-4963

--




[issue4963] mimetypes.guess_extension result changes after mimetypes.init()

2018-04-09 Thread David K. Hess

David K. Hess  added the comment:

Are there any committers watching this issue that are able to review the PR?

https://github.com/python/cpython/pull/3062

It's close to 6 months old now with no action on it. I'm willing to help but 
doing so and then having the PR gather dust is pretty discouraging.

Thanks in advance!

--
