Hello,
There are claims that the current instructions are overly complicated.
Below is the first part of the proof that they are in fact needed, yet
still insufficient for producing a system that works with both
UTF-8-based and traditional locales with zero regressions. Anything
simpler would be an oversimplification. All testcases assume that LC_ALL
is not set.
1) On the Glibc-2.3.6 page: sed -i '/vi_VN.TCVN/d' localedata/SUPPORTED
Testcase: LANG=vi_VN.TCVN bash
Expected result: bash command prompt, no CPU usage.
Actual result: bash eats 100% of CPU, doesn't show the prompt.
2) coreutils-5.93-i18n-1.patch, required by LSB. Upstream is RedHat CVS.
Only tests that fail without the patch are included below. All other
tests will be provided upon request.
Question: how can this be explained more briefly in the book, preferably
without producing a damaged PDF file?
Testcases, all adapted to the en_US.UTF-8 locale (the origin is LSB):
(Please copy and paste from Mozilla or KMail into 'LANG=en_US.UTF-8
xterm'. Characters like '5' are not the usual ASCII digits: they take
three bytes each and are two columns wide, and you can't enter them
directly without special software. Likewise, ' ' is a Chinese blank
character, not two spaces.)
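If copy-and-paste is not an option, such characters can also be produced
from their UTF-8 bytes with printf octal escapes. This is only a sketch:
the two code points chosen here (U+FF15 FULLWIDTH DIGIT FIVE and U+3000
IDEOGRAPHIC SPACE) are assumptions used for illustration, not values taken
from the LSB tests, and the variable names are made up.

```shell
# Assumption: U+FF15 FULLWIDTH DIGIT FIVE, UTF-8 bytes EF BC 95 (octal 357 274 225).
wide5=$(printf '\357\274\225')
# Assumption: U+3000 IDEOGRAPHIC SPACE, UTF-8 bytes E3 80 80 (octal 343 200 200).
ideo_space=$(printf '\343\200\200')

# wc -c counts bytes regardless of locale; $(( )) strips any padding wc adds.
wide5_bytes=$(( $(printf '%s' "$wide5" | wc -c) ))
space_bytes=$(( $(printf '%s' "$ideo_space" | wc -c) ))

echo "wide digit: $wide5_bytes bytes, wide blank: $space_bytes bytes"
```

Each character is one cell-pair on screen but three bytes on disk, which
is exactly the difference the tests below rely on.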
A) echo '1234567123572468' | fold -w 7
(the intention is to verify that the "fold" utility is able to fold
strings based on their width, not number of bytes)
Expected result:
1234567
12357
246
8
Wrong result (here "invalid character" boxes are replaced with asterisks):
1234567
1235*
**2*
*68
(i.e., the lines are folded after 7 bytes, not 7 cells)
B) echo '123456712347369' | fold -w 7
(this is just a variation of the above test)
Expected result:
1234567
12347
369
Wrong result:
1234567
12347
36*
**
(i.e., the lines are folded after 7 bytes, not 7 cells)
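The bytes-versus-cells distinction in tests A and B can also be checked
without looking at the terminal. The sketch below uses a made-up input
(four copies of U+FF15, i.e. 12 bytes and 8 cells), not the LSB string,
and assumes the en_US.UTF-8 locale is installed. A "fold" that counts
cells puts three characters (9 bytes) on the first line; one that counts
bytes cuts after 7 bytes, in the middle of a character.

```shell
# Hypothetical input: four fullwidth digits, 3 bytes and 2 cells each.
input=$(printf '\357\274\225\357\274\225\357\274\225\357\274\225')

# Length in bytes of the first folded line (minus the trailing newline).
# LC_ALL is cleared so that LANG takes effect, as the testcases assume.
first_len=$(( $(printf '%s\n' "$input" | LC_ALL= LANG=en_US.UTF-8 fold -w 7 \
                | head -n 1 | wc -c) - 1 ))

case $first_len in
  9) fold_mode=cells ;;    # three whole characters fit in 7 cells
  7) fold_mode=bytes ;;    # cut after 7 bytes, splitting a character
  *) fold_mode=unknown ;;  # some other behaviour, or the locale is missing
esac
echo "fold counts: $fold_mode (first line is $first_len bytes)"
```

If the locale is not available, "fold" falls back to the C locale and the
sketch reports bytes, so the result is only meaningful on a system where
en_US.UTF-8 has been generated.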
C) echo 'blank wblank EOF' | fold -w 10 -s
This tests the ability of the "fold" program to recognize the whitespace
according to the current locale.
Expected result:
blank
wblank
EOF
Wrong result:
blank
wblank E
OF
(i.e., the ' ' character is not recognized as whitespace).
D)
cat >tp3_a.input <<"EOF"
sss_John
SSS_Paul
ttt_Ringo
EOF
cat >tp3_b.input <<"EOF"
sss_Lennon
ttt_Starr
TTT_Harison
EOF
join -t '_' -a 1 -a 2 -e '(null)' -o 0,1.2,2.2 tp3_a.input tp3_b.input
This tests whether the "join" program accepts a wide character as a
field separator.
Expected results:
sss John Lennon
SSS Paul (null)
ttt Ringo Starr
TTT (null) Harison
Wrong results:
join: multi-character tab `_'
(this is wrong because _ is a single character that takes multiple bytes)
E) [my own test, fails with the patch and demonstrates that the patch is
buggy and the testsuite is skewed]
join -t 'a!' -a 1 -a 2 -e '(null)' -o 0,1.2,2.2 tp3_a.input tp3_b.input
Expected result:
join: multi-character tab `a!'
F)
cat >tp4_a.input <<"EOF"
sss John
SSS Paul
ttt Ringo
EOF
cat >tp4_b.input <<"EOF"
sss Lennon
ttt Starr
TTT Harison
EOF
join -a 1 -a 2 -e '(null)' -o 0,1.2,2.2 tp4_a.input tp4_b.input
The intention is to verify that the "join" command by default treats all
whitespace characters from the current locale as field separators.
Expected results:
sss John Lennon
SSS Paul (null)
ttt Ringo Starr
TTT (null) Harison
Wrong results:
sss John (null) (null)
sss Lennon (null) (null)
SSS Paul (null) (null)
ttt Ringo (null) (null)
ttt Starr (null) (null)
TTT Harison (null) (null)
G) (The "pr" and "tr" tests fail even with the patch, so it may be a good
idea to update it from RedHat. This didn't happen with patched
coreutils-5.2.1.)
H) echo '1日 国際化 きほん file' | unexpand -a | wc -c
The intention is to test that the "unexpand" program counts cells, not
bytes.
Expected result: 30
I) [my own test, fails even with patched coreutils-5.2.1, demonstrates
that the LSB sample implementation implements LSB to the letter, but
doesn't catch the spirit of the i18n requirements]
echo ä | tr '[:lower:]' '[:upper:]'
Expected result: Ä
Wrong result: ä
More later.
--
Alexander E. Patrakov