#INTRO After digging for a while, I've found where the issue comes from for both `.html` and `.py` (bug #1857824) files.
#SHORT The culprit responsible for the misidentification resides in the `.xml` database that specifies how to match a MIME type against input data. It can be found here [2].

#LONG `kmimetypefinder.cpp` calls [0] the `QMimeDatabase db` API via `db.mimeTypeForFile(...)`, which in turn bootstraps the `QMimeDatabasePrivate ...` database from the .xml file [1]. If we look carefully at the content of the `"text/x-perl"` entry, we see the following:

```
<alias type="text/x-perl"/>
<magic priority="50">
  ...
  <match value="use strict" type="string" offset="0:256"/>
  ...
</magic>
```

Did you notice the offset attribute `"0:256"`? If we run the following two cases, we see that a file is identified as a Perl script (`application/x-perl`, alias `text/x-perl`) when `use strict` starts at an offset within the range 0..256, and as `text/html` when `use strict` starts outside that range. Check it out:

💲 tee "index.html" <<eol ; echo -e "\n"; kmimetypefinder5 index.html
`printf "_"%.0s {1..256}`use strict
eol
application/x-perl # <- OUTPUT IS WRONG ⚠️

💲 tee "index.html" <<eol ; echo -e "\n"; kmimetypefinder5 index.html
`printf "_"%.0s {1..257}`use strict
eol
text/html # <- OUTPUT IS CORRECT!!! ✅

- Surprising, huh? 😏

#CONCLUSION This shows that the bug comes from the MIME database shipped in QtBase, which matches the Perl magic keyword `use strict` in JS scripts. JavaScript has a `'use strict'` directive that must be placed at the top of the script, so the keyword overlaps between the two languages. I think an appropriate bug should be opened in the QtBase bug tracker.
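To make the offset behaviour concrete, here is a minimal Python sketch of how a `type="string"` rule with `offset="0:256"` appears to behave. It is not the Qt or shared-mime-info implementation; the helper name `magic_matches` is hypothetical, and treating the range end as inclusive is an assumption that agrees with the two runs above.

```python
def magic_matches(data: bytes, value: bytes, start: int, end: int) -> bool:
    """Return True if `value` begins at some offset in [start, end] (inclusive).

    Hypothetical helper that imitates a <match type="string" offset="start:end"/>
    rule; the inclusive upper bound is an assumption based on the experiment.
    """
    # A match that starts at offset `end` occupies bytes up to end + len(value),
    # so only that window of the file needs to be searched.
    window = data[start:end + len(value)]
    return value in window


keyword = b"use strict"

# 256 padding bytes: the keyword starts at offset 256, still inside 0:256.
inside = b"_" * 256 + keyword + b"\n"
# 257 padding bytes: the keyword starts at offset 257, just outside the range.
outside = b"_" * 257 + keyword + b"\n"

print(magic_matches(inside, keyword, 0, 256))   # Perl rule would fire
print(magic_matches(outside, keyword, 0, 256))  # Perl rule would not fire
```

Under these assumptions the first file triggers the Perl rule and the second does not, which mirrors the `application/x-perl` and `text/html` results from `kmimetypefinder5`.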
[0]: https://github.com/KDE/kde-cli-tools/blob/master/kmimetypefinder/kmimetypefinder.cpp
[1]: https://github.com/qt/qtbase/blob/03dfd4199deb4a0f5123fb1eead42f7e1f85e9e3/src/corelib/mimetypes/qmimedatabase.cpp#L102
[2]: https://github.com/qt/qtbase/tree/03dfd4199deb4a0f5123fb1eead42f7e1f85e9e3/src/corelib/mimetypes/mime/packages

--
https://bugs.launchpad.net/bugs/1890716

Title:
  misidentifies .html file as Perl script when it contains JavaScript "use strict"

Status in shared-mime-info: Unknown
Status in kde-cli-tools package in Ubuntu: New
Status in shared-mime-info package in Ubuntu: Fix Committed
Status in shared-mime-info package in Debian: Confirmed

Bug description:
  For .html files, `xdg-mime` reports the wrong type. The culprit is the `"use strict"` phrase, which is used in JavaScript. It should not mistake .html files for anything other than text/html!

  STEPS TO REPRODUCE:
  Run the following step by step in any folder:
  1. $ echo "\"use strict\"" > index.html
  2. $ xdg-mime query filetype index.html # -> application/x-perl - this should be text/html!

  Platform:
  Ubuntu 20.04.1 LTS (Focal Fossa)
  Linux version 5.4.0-42-generic (buildd@lgw01-amd64-038) (gcc version 9.3.0 (Ubuntu 9.3.0-10ubuntu2))
  xdg-utils: 1.1.3-2ubuntu1