Hello,

There are claims that the current instructions are overly complicated. Below is the first part of the proof that they are in fact needed, and yet still insufficient, for producing a system that works with both UTF-8 based and traditional locales with zero regressions; anything simpler is an oversimplification. All testcases assume that LC_ALL is not set.
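
To make sure of that precondition, a minimal precaution (my own addition, not part of the testcases) is to unset the variable in the shell that will run them:

unset LC_ALL
echo "${LC_ALL:-<unset>}"   # should print <unset>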

1) On the Glibc-2.3.6 page: sed -i '/vi_VN.TCVN/d' localedata/SUPPORTED

Testcase: LANG=vi_VN.TCVN bash

Expected result: bash command prompt, no CPU usage.
Actual result: bash eats 100% of CPU, doesn't show the prompt.
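
A quick way to confirm that the sed command above actually removed the offending line (a trivial check, added here only for convenience):

grep vi_VN.TCVN localedata/SUPPORTED
# no output and a non-zero exit status mean the line is gone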

2) coreutils-5.93-i18n-1.patch, required by LSB. Upstream is Red Hat CVS. Only the tests that fail without the patch are included below; all other tests will be provided upon request.

Question: how can this be explained more concisely in the book, preferably without producing a damaged PDF file?
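
For reference, assuming the patch is applied in the usual LFS way (from the unpacked coreutils-5.93 source directory, with the patch file one level up), the command would be:

patch -Np1 -i ../coreutils-5.93-i18n-1.patch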

Testcases, all adapted to the en_US.UTF-8 locale; the origin is LSB:

(Please copy-and-paste from Mozilla or KMail into 'LANG=en_US.UTF-8 xterm'. Characters like '5' below are not the usual ASCII digits: they take three bytes each, are two columns wide, and cannot be entered directly without special software. Likewise, ' ' is a Chinese blank character, not two spaces.)
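
(If copy-and-paste is not an option, printf can emit the raw UTF-8 bytes instead. I am assuming here that the wide digit is U+FF15 FULLWIDTH DIGIT FIVE and the blank is U+3000 IDEOGRAPHIC SPACE; adjust the byte sequences if the LSB originals use different code points:)

printf '\xef\xbc\x95\n'   # FULLWIDTH DIGIT FIVE: 3 bytes, 2 columns wide
printf '\xe3\x80\x80\n'   # IDEOGRAPHIC SPACE: 3 bytes, a wide blank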

A) echo '1234567123572468' | fold -w 7

(the intention is to verify that the "fold" utility folds strings based on their display width, not on the number of bytes)

Expected result:
1234567
12357
246
8

Wrong result (here "invalid character" boxes are replaced with asterisks):
1234567
1235*
**2*
*68
(i.e., the lines are folded after 7 bytes, not 7 cells)
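
(The root cause is easy to demonstrate. The following sketch, using the assumed U+FF15 FULLWIDTH DIGIT FIVE from above, shows that one such character is three bytes but only one character and two display cells:)

printf '\xef\xbc\x95' | wc -c   # prints 3 (bytes)
printf '\xef\xbc\x95' | wc -m   # should print 1 (characters) in a UTF-8 locale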

B) echo '123456712347369' | fold -w 7

(this is just a variation of the above test)

Expected result:
1234567
12347
369

Wrong result:
1234567
12347
36*
**
(i.e., the lines are folded after 7 bytes, not 7 cells)

C) echo 'blank wblank EOF' | fold -w 10 -s

This tests the ability of the "fold" program to recognize whitespace according to the current locale.

Expected result:
blank
wblank 
EOF

Wrong result:
blank
wblank E
OF
(i.e., the ' ' character is not recognized as whitespace).
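
(If the blank gets mangled in transit, the same input can be rebuilt with printf, again assuming the blank is U+3000 IDEOGRAPHIC SPACE:)

printf 'blank\xe3\x80\x80wblank\xe3\x80\x80EOF\n' | fold -w 10 -s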

D)

cat >tp3_a.input <<"EOF"
sss_John
SSS_Paul
ttt_Ringo
EOF
cat >tp3_b.input <<"EOF"
sss_Lennon
ttt_Starr
TTT_Harison
EOF
join -t '_' -a 1 -a 2 -e '(null)' -o 0,1.2,2.2 tp3_a.input tp3_b.input

This tests whether the "join" program accepts a wide character as a field separator.

Expected results:
sss John Lennon
SSS Paul (null)
ttt Ringo Starr
TTT (null) Harison

Wrong results:
join: multi-character tab `_'
(this is wrong because _ is a single character that takes multiple bytes)
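
(For completeness, the wide separator can also be passed to join without copy-and-paste. This sketch assumes it is U+FF3F FULLWIDTH LOW LINE; the input files must of course contain the same character:)

join -t "$(printf '\xef\xbc\xbf')" -a 1 -a 2 -e '(null)' -o 0,1.2,2.2 tp3_a.input tp3_b.input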

E) [my own test, fails with the patch and demonstrates that the patch is buggy and the testsuite is skewed]

join -t 'a!' -a 1 -a 2 -e '(null)' -o 0,1.2,2.2 tp3_a.input tp3_b.input

Expected result:
join: multi-character tab `a!'

F)

cat >tp4_a.input <<"EOF"
sss John
SSS Paul
ttt Ringo
EOF
cat >tp4_b.input <<"EOF"
sss Lennon
ttt Starr
TTT Harison
EOF
join -a 1 -a 2 -e '(null)' -o 0,1.2,2.2 tp4_a.input tp4_b.input

The intention is to verify that the "join" command by default treats all whitespace characters from the current locale as field separators.

Expected results:
sss John Lennon
SSS Paul (null)
ttt Ringo Starr
TTT (null) Harison

Wrong results:
sss John (null) (null)
sss Lennon (null) (null)
SSS Paul (null) (null)
ttt Ringo (null) (null)
ttt Starr (null) (null)
TTT Harison (null) (null)

G) ("pr" and "tr" tests fail even with the patch, maybe it is a good idea to update it from RedHat. This didn't happen with patched coreutils-5.2.1)

H) echo '1日     国際化  きほん  file' | unexpand -a | wc -c

The intention is to test that the "unexpand" program counts cells, not bytes.

Expected result: 30

I) [my own test, fails even with patched coreutils-5.2.1; demonstrates that the LSB sample implementation follows LSB to the letter but misses the spirit of the i18n requirements]

echo ä | tr '[:lower:]' '[:upper:]'

Expected result: Ä
Wrong result: ä

More later.

--
Alexander E. Patrakov