R. David Murray <rdmur...@bitdance.com> added the comment:

It must be that the different key order only happens on the one platform 
because of the quirky nature of dictionary construction.  That is, there is 
*something* on that platform that is changing where things get hashed when the 
dictionary is recreated.

The problem with fixing this is that any fix is going to change the behavior, 
unless we go to the lengths of recording the order of the initializations in 
add_type and replay it when init is called a second time.  That solution is 
pretty much a non-starter :)

The mimetypes docs say that init can be called more than once.  They say that a 
MimeTypes object starts out "with the same database as provided by the rest of 
the module".  The docs explain how the initial database state is created.

What the docs don't do is say what *happens* when you call init more than once.
There are two possibilities: either we (1) restart from the initial state, or 
we (2) start from the current (possibly modified) state of the database and 
then add whatever is specified in the init call.  (Actually, there's a third 
possibility: we could also add back in anything from the default init that was 
deleted; but this halfway version is unlikely to be anyone's intent or 
expectation.)

The actual implementation of the mimetypes module does (2) if and only if you 
pass init a list of files.  If you don't then it does something that isn't even 
the third way above: it reloads *just* the data from the system files it 
managed to find, without reloading the data from the internal tables.
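
One way to observe what a second init call actually does is to snapshot the module database before and after and compare the two.  The following is only a minimal sketch: diff_maps and probe_reinit are hypothetical helper names, not part of mimetypes, and the result will depend on the Python version and on which system files exist on the machine.

```python
import mimetypes


def diff_maps(before, after):
    """Return (added, removed, changed) key sets between two dict snapshots."""
    added = set(after) - set(before)
    removed = set(before) - set(after)
    changed = {k for k in set(before) & set(after) if before[k] != after[k]}
    return added, removed, changed


def probe_reinit(files=None):
    """Snapshot types_map, call init() again, and report what moved.

    Hypothetical helper for experimentation only.
    """
    mimetypes.init()                     # make sure the module db exists
    before = dict(mimetypes.types_map)   # copy: init() may rebind or mutate it
    mimetypes.init(files)
    after = dict(mimetypes.types_map)
    return diff_maps(before, after)
```

Calling probe_reinit() with no arguments exercises the "no files" path; passing a list of file names exercises the additive path.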

Clearly this behavior is....odd.  When no files are passed, init should do one 
of two things: either nothing, or reset the global db state to its initial 
value.

It's not so clear what the behavior should be when you pass init one or more 
files.  It is possible, even highly probable, that there is code out there that 
depends on the fact that doing so is additive.

Given this analysis, I think that the best fix would be to implement (and 
document) the following behavior for init:

  If called with no arguments, it rebuilds the module database from scratch.

  If called with a list of files, it adds the contents of those files to the 
module database.
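
The two proposed behaviors can be illustrated with a toy model.  This is a sketch under stated assumptions, not the mimetypes implementation: _DEFAULTS, _db, current_db, add_type, and this init are hypothetical stand-ins, the "files" are passed as lists of lines rather than file names to keep the example self-contained, and the system files that the real rebuild would also read are left out.

```python
# Hypothetical stand-ins for the module's built-in tables and global database.
_DEFAULTS = {".txt": "text/plain", ".html": "text/html"}
_db = dict(_DEFAULTS)


def parse_mime_lines(lines):
    """Parse mime.types-style lines ("type ext1 ext2 ...") into {ext: type}."""
    result = {}
    for line in lines:
        words = line.split("#", 1)[0].split()   # drop trailing comments
        if len(words) < 2:
            continue
        for suffix in words[1:]:
            result["." + suffix] = words[0]
    return result


def add_type(mime_type, ext):
    """Record one mapping in the current toy database."""
    _db[ext] = mime_type


def current_db():
    """Return a copy of the current toy database."""
    return dict(_db)


def init(sources=None):
    """No argument: rebuild from scratch.  With sources: add to the current db."""
    global _db
    if sources is None:
        _db = dict(_DEFAULTS)        # discard every earlier modification
        return
    for lines in sources:            # additive, backward-compatible path
        _db.update(parse_mime_lines(lines))
```

In this model, init([["image/x-hypo hyp"]]) keeps anything added earlier with add_type, while a bare init() throws it all away and returns to the defaults.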

The second is a backward compatibility hack.  Ideally it would be deprecated in 
favor of some sort of load_mime_files method.

It is possible that the first will also break code, but I think it is less 
likely, and probably an acceptable risk in a new major release.  But I'd be 
prepared to change it to 'init does nothing' if breakage showed up during RC 
testing.

The problem with this "fix" is that it does not, in fact, address the root 
cause of the OP's bug report.  The specific behavior he observes when calling 
init() would be fixed, but the underlying problem remains.  If he were to 
instead instantiate a new MimeTypes db, then when it "copies" the module 
database, it will build its own database by running the old database in key 
order, and once again the results returned by guess_extension might mutate.  
This means that the new db is *not* a copy of the old db when it starts.

That problem could be fixed by having MimeTypes.__init__ do a copy of the 
types_map and types_map_inv data structures instead of rebuilding them from 
scratch.  This would mean shifting the initialization of these structures out 
of MimeTypes and into init (in the 'reinitialize' code path) or perhaps into 
_default_mime_types, but I don't see that as a big problem, once init is doing 
a full reinitialization by default.  (There is also the question of whether it 
should be a 'deep copy', but I don't think that is needed since a user would 
need to be doing something pretty hackish to run afoul of a 
shallow-copy-induced problem.)
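
The difference between rebuilding and copying can be shown with a small model of the inverse map.  This is a sketch that assumes guess_extension returns the first extension recorded for a type; build_inverse, guess_extension, and copy_inverse here are hypothetical toy functions, and the key order is passed in explicitly to simulate a dictionary whose iteration order has changed.

```python
def build_inverse(types_map, order=None):
    """Build {type: [ext, ...]} by replaying types_map in the given key order."""
    inverse = {}
    for ext in (order if order is not None else types_map):
        inverse.setdefault(types_map[ext], []).append(ext)
    return inverse


def guess_extension(inverse, mime_type):
    """Return the first recorded extension for mime_type, or None."""
    exts = inverse.get(mime_type)
    return exts[0] if exts else None


def copy_inverse(inverse):
    """Copy the inverse map, duplicating each list so its order is preserved."""
    return {mime_type: list(exts) for mime_type, exts in inverse.items()}


types_map = {".jpg": "image/jpeg", ".jpe": "image/jpeg", ".jpeg": "image/jpeg"}
original = build_inverse(types_map, [".jpg", ".jpe", ".jpeg"])
replayed = build_inverse(types_map, [".jpe", ".jpeg", ".jpg"])  # same data, new order
copied = copy_inverse(original)
```

Here guess_extension(original, "image/jpeg") and guess_extension(copied, "image/jpeg") agree, while the replayed map can answer differently even though it holds the same mappings.  A plain dict(original) would also keep the order but would share the extension lists with the original, which is the shallow-copy case mentioned above.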

Can anyone see flaws in this analysis and proposed solution?  I've marked the 
fix as easy since a Python hacker should be able to knock out a solution in a 
day, but it isn't trivial.  And I have no clue how to write a unit test for the 
MimeTypes.__init__ order-shifting bug.

I'm also resetting the priority to normal since I consider the ambiguity of 
what calling init twice actually does to be a bigger issue than it sometimes 
changing the results of a function with 'guess' in its name :)

I've attached a patch with a unit test for the 'init doesn't re-init' behavior.

(By the way, it also appears to me from reading the code that read_mime_types 
is buggy in that it actually returns a merge of the loaded file with the 
current module DB state, but I haven't checked that observation.)
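
A quick way to check the read_mime_types observation is to load a file that contains a single mapping and see what else comes back.  This is only a sketch: probe_read_mime_types is a hypothetical helper, the type and extension written to the file are made up, and whether any extra keys appear depends on the Python version in use.

```python
import mimetypes
import os
import tempfile


def probe_read_mime_types():
    """Load a one-line mime.types file and return (mapping, extra_keys)."""
    with tempfile.NamedTemporaryFile("w", suffix=".types", delete=False) as handle:
        handle.write("application/x-hypothetical hypo\n")   # made-up entry
        path = handle.name
    try:
        mapping = mimetypes.read_mime_types(path)
    finally:
        os.remove(path)
    # Anything besides ".hypo" did not come from the file itself.
    extras = sorted(set(mapping) - {".hypo"})
    return mapping, extras
```

If extras comes back non-empty, the returned mapping is a merge of the file with some other database state rather than the file's contents alone.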

----------
keywords: +easy, patch
priority: low -> normal
resolution: works for me -> 
Added file: http://bugs.python.org/file17803/mimetypes-init-test.patch

_______________________________________
Python tracker <rep...@bugs.python.org>
<http://bugs.python.org/issue4963>
_______________________________________