With its undocumented -O/-I options, unzip can handle different encodings, at least in my tests, but it doesn't detect them automatically. p7zip doesn't work because, as noted in Bug #269482, it only handles UTF-8 and ASCII (or maybe ISO-8859).
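To make the manual approach concrete, here is a minimal Python sketch: you pick the encoding yourself and re-decode only the names the archive does not flag as UTF-8. It assumes Python's standard `zipfile` module, which decodes unflagged names as code page 437; the helper names `raw_name` and `list_names` are hypothetical and are not part of unzip or file-roller.

```python
import zipfile

# General purpose bit 11 (the "language encoding flag") marks a name stored
# as UTF-8. Without it, the zip spec's default is IBM code page 437.
UTF8_FLAG = 0x800


def raw_name(info: zipfile.ZipInfo) -> bytes:
    """Recover a member's filename bytes as stored in the archive.

    Assumes a POSIX system: on Windows, zipfile rewrites backslashes in
    names to '/', so byte 0x5C would not round-trip.
    """
    if info.flag_bits & UTF8_FLAG:
        return info.filename.encode("utf-8")
    # zipfile decodes unflagged names as cp437, which assigns a character
    # to every byte value, so encoding back returns the original bytes.
    return info.filename.encode("cp437")


def list_names(archive, encoding: str) -> list[str]:
    """List member names, decoding unflagged names with `encoding`.

    Similar in spirit to unzip's -O option; `archive` is anything
    zipfile.ZipFile accepts (a path or a binary file object).
    """
    names = []
    with zipfile.ZipFile(archive) as zf:
        for info in zf.infolist():
            if info.flag_bits & UTF8_FLAG:
                names.append(info.filename)  # already decoded correctly
            else:
                names.append(raw_name(info).decode(encoding, errors="replace"))
    return names
```

On Python 3.11 and later, `zipfile.ZipFile(..., metadata_encoding=...)` performs the same re-decoding when reading. Either way the caller has to know the right encoding in advance, which is the limitation described above.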
When I uninstalled p7zip and p7zip-full and looked at archives, some showed the pattern of mojibake I'm used to, and some showed a new one. None of them were correct. So there may be some automatic correction, which may be why Bug #580961 for unzip has been marked as fixed. Automatic detection is probably right most of the time, but it can sometimes be wrong. Either file-roller or unzip could improve its automatic detection and, when certainty is low, tell users so they know there might be a problem.
Two other bugs, #592109 and #1199239, refer to non-ASCII filenames encoded in UTF-8, which is likely what Unix-like systems produce. So the non-UTF-8 names probably come from computers running Windows (?).
Some open-source programs that detect encodings automatically include Firefox and, I think, gedit. The idea is also a bit like the magic numbers used to identify file types. Those programs may already work this way, but I think the way to detect the right encoding is to decode the filenames as a given encoding and then look for characters, or character combinations, that are unlikely to appear in a normal filename or that can't even be stored on the target file system. For example, the character U+0082 <control> BREAK PERMITTED HERE often appears in some languages when text in a common encoding is interpreted as ISO-8859 (one of the ISO-8859-n variants?). Either file-roller or unzip could try various encodings to see which one looks valid. This is worse than an archive that states which encoding it uses, but it seems that some regions (like Japan) still prefer not to use UTF-8.
The zip format, which is probably worse than other formats at handling filenames, is probably still used because it compresses each file separately, which makes it faster but gives worse compression than other archive formats. There may be other reasons it's faster too.
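The detection approach described above (decode the names under several candidate encodings and penalize characters unlikely to appear in a real filename) can be sketched in Python. This is only an illustration of that heuristic, not how Firefox, gedit, unzip or file-roller actually work; the candidate list, the penalized Unicode categories (control, unassigned and private-use characters, plus U+FFFD), and the names `suspicion` and `guess_encoding` are assumptions. It also reports a tie for the best score, so a program could warn the user that the guess is uncertain.

```python
import unicodedata
from typing import Optional

# Hypothetical candidate list. Order breaks ties, so the most selective
# encoding (strict UTF-8 rejects most non-UTF-8 byte sequences) goes first.
CANDIDATES = ("utf-8", "cp932", "cp1252", "latin-1")


def suspicion(name: str) -> int:
    """Count characters that are unlikely to appear in a real filename."""
    score = 0
    for ch in name:
        # Cc = control (e.g. U+0082), Cn = unassigned, Co = private use.
        if unicodedata.category(ch) in ("Cc", "Cn", "Co") or ch == "\ufffd":
            score += 1
    return score


def guess_encoding(
    raw_names: list[bytes], candidates: tuple[str, ...] = CANDIDATES
) -> tuple[Optional[str], bool]:
    """Return (best_encoding, ambiguous) for all filenames in one archive."""
    scores = {}
    for enc in candidates:
        try:
            decoded = [raw.decode(enc) for raw in raw_names]
        except UnicodeDecodeError:
            continue  # this encoding cannot even represent the bytes
        scores[enc] = sum(suspicion(n) for n in decoded)
    if not scores:
        return None, True
    best = min(scores, key=scores.get)  # first minimum, in candidate order
    ambiguous = sum(1 for s in scores.values() if s == scores[best]) > 1
    return best, ambiguous


if __name__ == "__main__":
    raw = "ひらがな.txt".encode("cp932")  # Shift-JIS hiragana start with byte 0x82
    print(guess_encoding([raw], ("utf-8", "latin-1", "cp932")))  # prints ('cp932', False)
```

In the demo, ISO-8859-1 (`latin-1`) turns every 0x82 lead byte into U+0082 and loses. Under cp1252 the same byte is a printable quotation mark, so with the default candidates cp932 and cp1252 tie and the result is flagged as ambiguous. Pure-ASCII names are always reported as ambiguous, which is harmless because every candidate then decodes them identically.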
Having a unique file suffix for a particular set of compression options or quality level means people don't have to worry about which options to choose, and can't argue about which ones others should use for that suffix. There is probably also a sort of stigma attached to zip's poor compression ratio for many types of files compared to other formats. (For example, some non-zip archive formats can compress a Windows BMP file so that it is smaller than a PNG of the same image, especially if the image has repeating portions.) So either people agree on an operating-system-independent 'low-quality' archive suffix for cases where the compression ratio isn't important, or we will keep encountering zips from different languages that don't say how to decode their filenames.
--
You received this bug notification because you are a member of Desktop Packages, which is subscribed to file-roller in Ubuntu.
https://bugs.launchpad.net/bugs/495880
Title:
 File Roller cannot handle archive that doesn't encode filenames in UTF-8
Status in file-roller package in Ubuntu:
 Confirmed
Bug description:
 Binary package hint: file-roller
 I have received a zip containing a file with a German "Umlaut" in the filename. I cannot extract the file because I get the following error message:
 caution: filename not matched: Liste Verwaltung und Verk\?ndigung Dezember 2009.xls
 I have no way to change the filename and remove the "Umlaut"...
 ProblemType: Bug
 Architecture: i386
 CheckboxSubmission: e27141b8feed9a0134eefdd87f008818
 CheckboxSystem: 558fbfb2a1258711a37bb7e23c5d4e6e
 Date: Sat Dec 12 11:48:49 2009
 DistroRelease: Ubuntu 9.10
 ExecutablePath: /usr/bin/file-roller
 NonfreeKernelModules: nvidia
 Package: file-roller 2.28.1-0ubuntu1
 ProcEnviron:
  LANGUAGE=de_DE.UTF-8
  PATH=(custom, no user)
  LANG=de_DE.UTF-8
  SHELL=/bin/bash
 ProcVersionSignature: Ubuntu 2.6.31-16.53-386
 SourcePackage: file-roller
 Uname: Linux 2.6.31-16-386 i686
 XsessionErrors:
  (gnome-settings-daemon:3121): GLib-CRITICAL **: g_propagate_error: assertion `src != NULL' failed
  (gnome-settings-daemon:3121): GLib-CRITICAL **: g_propagate_error: assertion `src != NULL' failed
  (polkit-gnome-authentication-agent-1:3161): GLib-CRITICAL **: g_once_init_leave: assertion `initialization_value != 0' failed
  (nautilus:3155): Eel-CRITICAL **: eel_preferences_get_boolean: assertion `preferences_is_initialized ()' failed
To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/file-roller/+bug/495880/+subscriptions
--
Mailing list: https://launchpad.net/~desktop-packages
Post to     : desktop-packages@lists.launchpad.net
Unsubscribe : https://launchpad.net/~desktop-packages
More help   : https://help.launchpad.net/ListHelp