[issue15207] mimetypes.read_windows_registry() uses the wrong regkey, creates wrong mappings
New submission from Dave Chambers:

The current mimetypes.read_windows_registry() enumerates the values under HKCR\MIME\Database\Content Type. However, this is the key for mimetype-to-extension lookups, NOT for extension-to-mimetype lookups. As a result, when more than one MIME type is mapped to a particular extension, the last-found entry is used. For example, both "image/png" and "image/x-png" map to the ".png" file extension. The code finds "image/png", then later finds "image/x-png", and the later entry steals the ".png" extension.

The solution is to use the correct regkey, which is the HKCR root. This is the correct location for extension-to-mimetype lookups. What we should do is enumerate the HKCR root, find all subkeys that start with a dot (i.e. file extensions), then inspect those for a 'Content Type' value.

The attached ZIP contains:
mimetype_flaw_demo.py - demonstrates the error (due to the wrong regkey) and my fix (uses the correct regkey)
mimetypes_fixed.py - my suggested fix to the standard mimetypes.py module

--
components: Windows
files: mimetype_flaw_demo.zip
messages: 164167
nosy: dlchambers
priority: normal
severity: normal
status: open
title: mimetypes.read_windows_registry() uses the wrong regkey, creates wrong mappings
type: behavior
versions: Python 2.7

Added file: http://bugs.python.org/file26180/mimetype_flaw_demo.zip

___
Python tracker <http://bugs.python.org/issue15207>
___
___
Python-bugs-list mailing list
Unsubscribe: http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com
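To make the overwrite concrete, here is a minimal sketch that simulates both lookup directions with plain Python data instead of the real registry. `MIME_DATABASE`, `EXTENSION_KEYS` and both function names are hypothetical stand-ins chosen for illustration; they are not the contents of any actual Windows installation, nor the code in mimetypes.py or in the attached files.

```python
# Hypothetical stand-in for HKCR\MIME\Database\Content Type:
# each entry is (MIME type, extension), in enumeration order.
MIME_DATABASE = [
    ("image/png", ".png"),
    ("image/x-png", ".png"),
    ("text/plain", ".txt"),
]

# Hypothetical stand-in for the dot-prefixed keys under the HKCR root:
# each key holds its own values, possibly including 'Content Type'.
EXTENSION_KEYS = {
    ".png": {"Content Type": "image/png"},
    ".txt": {"Content Type": "text/plain"},
    ".foo": {},                      # extension with no Content Type value
    "pngfile": {"Content Type": "ignored"},  # not an extension key
}


def build_map_from_mime_database(entries):
    """Invert a mimetype-to-extension listing; later entries overwrite earlier ones."""
    types_map = {}
    for mimetype, ext in entries:
        types_map[ext] = mimetype
    return types_map


def build_map_from_extension_keys(keys):
    """Read the Content Type directly from each dot-prefixed key."""
    return {
        name: values["Content Type"]
        for name, values in keys.items()
        if name.startswith(".") and "Content Type" in values
    }


inverted = build_map_from_mime_database(MIME_DATABASE)
direct = build_map_from_extension_keys(EXTENSION_KEYS)
print(inverted[".png"])
print(direct[".png"])
```

With this sample data the inverted map ends up with whichever MIME type was enumerated last for ".png", while the direct map has exactly one answer per extension.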
Dave Chambers added the comment:

My first diff file... I hope I did it right :)

--
keywords: +patch
Added file: http://bugs.python.org/file26181/mimetypes.py.diff
Dave Chambers added the comment:

I added a diff file to the bug. Dunno if that's the same as a patch file, or how to create a patch file if it's not.

> Do you know if image/x-png and image/png are included in the registry on all
> windows versions?

I think your question is reversed, in the same way that the code was reversed. You're not looking for image/png and/or image/x-png. You're looking for .png in order to retrieve its mimetype (aka Content Type). While nothing is 100% certain on Windows :), I'm quite confident that every copy will have an HKCR\.png regkey, and that regkey will have a Content Type value, and that value's setting will be the appropriate mimetype, which I'd expect to be image/png.

I was kinda surprised to find this bug, as it's so obvious. I started chasing it because Chrome kept complaining that PNGs were being served as image/x-png (by CherryPy).

There are other bugs (e.g. 15199, 10551) that my patch should fix.

-Dave
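The extension-first lookup described in this comment could look roughly like the sketch below. It is an illustration only, not the attached patch: the function name `read_extension_types` is hypothetical, the module is `winreg` on Python 3 (`_winreg` on the Python 2.7 this issue targets), and on non-Windows systems the function simply returns an empty dict.

```python
try:
    import winreg  # Python 3 name; the module is _winreg on Python 2.7
except ImportError:
    winreg = None  # not running on Windows


def read_extension_types():
    """Return {extension: content type} read from the dot-prefixed keys
    under HKEY_CLASSES_ROOT, or an empty dict when winreg is unavailable."""
    types_map = {}
    if winreg is None:
        return types_map
    with winreg.OpenKey(winreg.HKEY_CLASSES_ROOT, "") as hkcr:
        subkey_count = winreg.QueryInfoKey(hkcr)[0]
        for index in range(subkey_count):
            try:
                name = winreg.EnumKey(hkcr, index)
            except OSError:
                break  # no more subkeys
            if not name.startswith("."):
                continue  # only file-extension keys are of interest
            try:
                with winreg.OpenKey(hkcr, name) as ext_key:
                    value, value_type = winreg.QueryValueEx(ext_key, "Content Type")
            except OSError:
                continue  # key unreadable or has no 'Content Type' value
            if value_type == winreg.REG_SZ and value:
                types_map[name.lower()] = value
    return types_map


extension_types = read_extension_types()
print(extension_types.get(".png"))
```

Because each extension key carries at most one 'Content Type' value, there is no ordering dependence between competing MIME types in this direction.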
Changes by Dave Chambers:

Added file: http://bugs.python.org/file26185/mimetypes.py.diff.u
Dave Chambers added the comment:

Disappointing that "faster but broken" is preferable to "slower but fixed"
Dave Chambers added the comment:

Seems to me that some hybrid would be a good solution: hardcode the known types (which solves the "Windows is just wrong" case), then as a default look in the registry for those that aren't hardcoded. The hit of additional time would then apply only to lesser-known types.

In any case, it's pretty bad that Python allows the wrong mimetype for PNG, even if it is a Windows registry issue.
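A minimal sketch of that hybrid idea follows. Everything named here is hypothetical: `KNOWN_TYPES` is a tiny illustrative table rather than the real list in mimetypes.py, and `registry_lookup` is any callable that maps an extension to a MIME type or `None`, so the sketch runs without touching a real registry.

```python
# Illustrative hardcoded table (an assumption, not the stdlib's actual list).
KNOWN_TYPES = {
    ".png": "image/png",
    ".txt": "text/plain",
    ".html": "text/html",
}


def lookup_type(ext, registry_lookup=None):
    """Prefer the hardcoded table; consult the registry only for unknown types."""
    ext = ext.lower()
    if ext in KNOWN_TYPES:
        return KNOWN_TYPES[ext]
    if registry_lookup is not None:
        return registry_lookup(ext) or None
    return None


# A fake registry that disagrees about .png and knows one extra type.
fake_registry = {".png": "image/x-png", ".xyz": "chemical/x-xyz"}

print(lookup_type(".PNG", fake_registry.get))
print(lookup_type(".xyz", fake_registry.get))
print(lookup_type(".nope", fake_registry.get))
```

The hardcoded entry wins for ".png" regardless of what the registry says, and the registry is consulted only for extensions the table does not cover.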
Dave Chambers added the comment:

> removing read_windows_registry()

If you're suggesting hardcoding *ALL* the mimetypes for *ALL* OSes, I think that's probably the best overall solution. No variability, as fast as can be. The downside is that there would occasionally be an unrecognized type, thus there'd need to be diligence to keep the hardcoded list up to date, but overall I think Ben Hoyt's suggestion is best.
Dave Chambers added the comment:

(I'm a Windows dev type)

I would say that there are 2 issues with relying on the registry:
1) Default values (i.e. set by Windows upon OS install) are broken and MS never fixes them.
2) The values can be changed at any time, by any app.
Thus the values are unreliable.

If I were to code it from scratch today, I'd create a three-pronged approach:
a) Hardcode a list of known types (fast & reliable).
b) Have a default case where unknown types are pulled from the registry. Whatever value is retrieved is likely better than returning e.g. "application/octet-stream".
c) When we find it neither in the hardcoded list nor in the registry, return a default value (e.g. "application/octet-stream").

For what it's worth, my workaround will be to have my app delete the HKCR\MIME\Database\Content Type\image/x-png regkey, thus forcing the original braindead mimetypes.py code to use HKCR\MIME\Database\Content Type\image/png.

And, for what it's worth, my patch is actually faster than the current mimetypes.py code because I'm not doing reverse lookups. Thus any argument about a difference in speed between the two is moot. Arguments about the speed of pulling mimetypes from the registry at all are valid.

Another registry-based approach would be to build a dictionary of mimetypes on demand. In this scenario, at startup, the dictionary would be empty. When Python needs the mimetype for ".png", the 1st request would cause a "slow" registry lookup for only that type, but all subsequent requests for the type would use the "fast" value from the dictionary. Given that an app will probably use only a handful of mimetypes but will use that same handful over and over, such a solution would have the benefits of (a) not using hardcoded values (thus no ongoing maintenance), (b) performing slow stuff only on demand, (c) optimizing repeat calls, and (d) consuming zero startup time.

I'll code this up & run some timing tests if anyone thinks it's worthwhile.

BTW, who makes the final determination as to if/when any such changes would be incorporated?
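The on-demand dictionary described above might be sketched as follows. This is an illustration under stated assumptions, not code from the patch: `LazyTypeCache` and `slow_lookup` are hypothetical names, and the slow registry query is replaced by a fake function that counts its calls so the caching behavior is visible.

```python
class LazyTypeCache:
    """Cache MIME types per extension, calling the slow lookup at most once each."""

    def __init__(self, slow_lookup):
        self._slow_lookup = slow_lookup  # e.g. a per-extension registry query
        self._cache = {}                 # empty at startup: zero startup cost

    def get(self, ext):
        ext = ext.lower()
        if ext not in self._cache:
            # First request for this extension: pay the slow lookup once.
            # Misses (None) are cached too, so they are not retried.
            self._cache[ext] = self._slow_lookup(ext)
        return self._cache[ext]


calls = []


def fake_registry_query(ext):
    """Stand-in for a slow registry read; records every call it receives."""
    calls.append(ext)
    return {".png": "image/png"}.get(ext)


cache = LazyTypeCache(fake_registry_query)
first = cache.get(".png")
second = cache.get(".PNG")
missing = cache.get(".nope")
print(first, second, missing, len(calls))
```

One design consequence worth noting: because results are cached for the life of the object, a registry change made after the first lookup of an extension is not picked up by this sketch.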
Dave Chambers added the comment:

Enough with the bikeshedding... it's been 10 months... fix the bug.