Randy McMurchy wrote:
"This package doesn't work correctly in UTF-8 based locales".
This can never be placed on a BLFS page. Something needs to be
placed on the page that accurately describes the breakage, or
limited functionality you *may* see. "Doesn't work correctly"
is simply too vague.
[snip all others]
Overall, very good reports, Alex. However, I'm not sure we need to
dramatize this so much in the book. For packages that have some issues,
we identify the issues and let the readers decide what is best for
them.
OK, I will make this a per-package note. Let me start with two packages;
please tell me if the new proposed form is acceptable. If there are no
objections for UnZip, I will continue with other packages. The style of
the two packages below is intentionally different, only UnZip is meant
to be acceptable. I want your version of MC breakage description,
because it is hard for me to describe such obvious things better.
1. UnZip-5.52
This package assumes that filenames in the ZIP archives created under
non-UNIX systems are encoded in CP850, and that they should be converted
to ISO-8859-1 when writing files onto the filesystem. Such assumptions
are not always valid. In fact, inside the ZIP archive, filenames are
encoded in the DOS codepage that is in use in the relevant country, and
the filenames on disk should be in the locale encoding. In MS Windows,
the OemToChar() C function (from User32.DLL) does the correct conversion
(which is indeed the conversion from CP850 to a superset of ISO-8859-1
if MS Windows is set up to use the US English language), but there is no
equivalent in Linux.
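
To make the wrong assumption concrete, here is a minimal Python sketch. It is
not unzip's actual code: it uses Python's built-in codecs rather than unzip's
own translation tables, and the function names and the sample filename are
made up for illustration. It treats the archive bytes as CP850 and writes
them out as ISO-8859-1, as described above, and compares the result with the
conversion that a user in a KOI8-R locale would actually need:

```python
# Illustrative sketch only: mimics the conversion described above with
# Python's codecs; it does not reproduce unzip's internal tables.

def assumed_conversion(raw: bytes) -> bytes:
    """Treat archive bytes as CP850 and re-encode them as ISO-8859-1."""
    return raw.decode("cp850").encode("iso-8859-1")


def needed_conversion(raw: bytes, archive_enc: str, locale_enc: str) -> bytes:
    """Convert from the real DOS codepage to the locale encoding."""
    return raw.decode(archive_enc).encode(locale_enc)


# Hypothetical filename as a Russian DOS/Windows archiver would store it.
in_archive = "привет.txt".encode("cp866")

on_disk_wrong = assumed_conversion(in_archive)
on_disk_right = needed_conversion(in_archive, "cp866", "koi8-r")

print(on_disk_wrong == on_disk_right)  # False
print(on_disk_wrong)                   # bytes written under the wrong assumption
print(on_disk_right)                   # bytes a KOI8-R locale expects
```

The ASCII part of the name (".txt") survives either way; only the bytes
above 0x7F differ, which is why the problem goes unnoticed with ASCII-only
archives.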
When unzip is used on a ZIP archive containing non-ASCII filenames, it
damages them by applying an improper conversion whenever any of these
assumptions does not hold. E.g., in the ru_RU.KOI8-R locale, conversion of filenames from CP866 to KOI8-R is
required, but conversion from CP850 to ISO-8859-1 is done, which
produces filenames consisting of garbage instead of words (closest
equivalent understandable for English-only users: rot13). There are
several ways around this limitation:
1) For uncompressing ZIP archives with filenames containing non-ASCII
characters, use WinZIP under WINE.
2) After unzipping, fix the damage made to filenames using the convmv
tool (http://j3e.de/linux/convmv/). The following is an example for the
ru_RU.KOI8-R locale:
Step 1. Undo the conversion done by unzip:
convmv -f iso-8859-1 -t cp850 -r --nosmart --notest /path/to/unzipped/files
Step 2. Do the correct conversion instead:
convmv -f cp866 -t koi8-r -r --nosmart --notest /path/to/unzipped/files
3) Apply this patch to unzip:
https://bugzilla.altlinux.ru/attachment.cgi?id=532
It allows you to specify the assumed filename encoding in the ZIP archive
using the "-O charset_name" option and the on-disk filename encoding
using the "-I charset_name" option. Defaults: the on-disk filename
encoding is the locale encoding, the encoding inside the ZIP archive is
guessed according to the builtin table based on the locale encoding. For
US English users, this still means that unzip converts from CP850 to
ISO-8859-1 by default.
Caveat: this method works only with 8-bit locale encodings, not with
UTF-8. Attempting to use the patched unzip in a UTF-8 locale may result in
a segmentation fault and is probably a security risk.
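
The logic behind the two convmv steps in workaround 2 can be sketched in
Python as two byte-level re-encodings. This is only an illustration of why
the order of the steps matters, not a replacement for convmv; the helper
names and the sample bytes are hypothetical:

```python
# Sketch of workaround 2 for one filename, assuming the ru_RU.KOI8-R case:
# the archive really used CP866 and unzip applied CP850 -> ISO-8859-1.

def reencode(name: bytes, src: str, dst: str) -> bytes:
    """Reinterpret name as src and write it back out as dst."""
    return name.decode(src).encode(dst)


def repair_name(damaged: bytes) -> bytes:
    # Step 1: undo the conversion done by unzip (ISO-8859-1 back to CP850).
    original = reencode(damaged, "iso-8859-1", "cp850")
    # Step 2: do the correct conversion instead (CP866 to KOI8-R).
    return reencode(original, "cp866", "koi8-r")


# Hypothetical damaged name, as it would sit on disk after unzipping.
damaged = b"\xbb\xd3\xbf\xf3\xd1\xd4.txt"
fixed = repair_name(damaged)

print(fixed.decode("koi8-r") == "привет.txt")  # True
```

In this sketch a byte that has no CP850 equivalent makes step 1 raise
UnicodeEncodeError; it makes no claim about how convmv itself handles such
names.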
Use of UnZip in BLFS installation instructions of Mozilla, Docbook and
[insert other packages here] is not a problem, because in this book
UnZip is never told to extract a file with non-ASCII characters in its name.
2. MC-4.6.1 (without the Ubuntu patch)
This package makes the assumption that "characters" and "bytes" are the
same thing. This is not true in UTF-8 based locales. Failure of this
assumption means that MC will incorrectly position characters on the
screen, and after moving the cursor a bit, the screen becomes a total
mess, as illustrated in the screenshot (taken in the ru_RU.UTF-8 locale).
Input of non-ASCII characters in the editor is impossible, even after
selecting "Other 8-bit" encoding in the menu.
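
A small Python sketch shows how the "characters equal bytes" assumption goes
wrong. It is not taken from MC's source; the function and the sample word are
hypothetical, and the "correct" branch is itself simplified (it ignores
double-width and combining characters):

```python
# Illustration only: how far the cursor is believed to have moved after
# printing a word, depending on whether bytes are counted as characters.

def cursor_column(text: str, encoding: str, bytes_are_chars: bool) -> int:
    """Column where the cursor is assumed to be after printing text."""
    if bytes_are_chars:
        # The broken assumption: one screen column per byte.
        return len(text.encode(encoding))
    # One screen column per character (simplified).
    return len(text)


word = "привет"

print(cursor_column(word, "koi8-r", bytes_are_chars=True))   # 6
print(cursor_column(word, "utf-8", bytes_are_chars=True))    # 12
print(cursor_column(word, "utf-8", bytes_are_chars=False))   # 6
```

In an 8-bit encoding such as KOI8-R both counts agree, so the assumption is
harmless there; in UTF-8 each Cyrillic letter takes two bytes, so a
byte-based count puts the cursor six columns too far to the right for this
word.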
--
Alexander E. Patrakov
