Overcomplicated? [part 1: glibc, coreutils]

Alexander E. Patrakov Thu, 19 Jan 2006 07:35:23 -0800

Hello,

there are claims that the current instructions are overly complicated. Below is the first part of the proof that they are in fact needed, but still insufficient for producing a system that works with both UTF-8-based and traditional locales with zero regressions, and anything else is oversimplification. All testcases assume that LC_ALL is not set.


1) On Glibc-2.3.6 page: sed -i '/vi_VN.TCVN/d' localedata/SUPPORTED

Testcase: LANG=vi_VN.TCVN bash

Expected result: bash command prompt, no CPU usage.
Actual result: bash eats 100% of CPU, doesn't show the prompt.

2) coreutils-5.93-i18n-1.patch, required by LSB. Upstream is RedHat CVS. Only tests that fail without the patch are included below. All other tests will be provided upon request.

Question: how can this be explained more briefly in the book, preferably without producing a damaged PDF file?


Testcases, all adapted to the en_US.UTF-8 locale, the origin is LSB:

(please copy-and-paste from Mozilla or KMail into 'LANG=en_US.UTF-8 xterm'; characters like '５' are not the usual ASCII digits: they take three bytes and are two columns wide, and you can't enter them directly without special software. Also, '　' is a Chinese blank character, not two spaces).
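
A quick way to check that the pasted characters survived intact might look like the sketch below. It is not part of the LSB tests; it assumes only that the script itself is stored as UTF-8, and the variable names are made up for illustration.

```shell
# wc -c counts bytes regardless of the locale, so a fullwidth digit
# that was pasted correctly should come out as three bytes.
digit_bytes=$(printf '%s' '５' | wc -c | tr -d ' ')
echo "fullwidth digit: $digit_bytes bytes"

# Dump the bytes of the wide blank; in UTF-8 it is e3 80 80 (U+3000).
blank_hex=$(printf '%s' '　' | od -An -tx1 | tr -d ' \n')
echo "wide blank: $blank_hex"
```

If either value differs, the characters were probably converted on the way through the clipboard and the testcases below will not test what they are meant to.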


A) echo '1234567123５７２４６８' | fold -w 7

(the intention is to verify that the "fold" utility is able to fold strings based on their width, not number of bytes)


Expected result:
1234567
123５７
２４６
８

Wrong result (here "invalid character" boxes are replaced with asterisks):
1234567
123５*
**２*
*６８
(i.e., the lines are folded after 7 bytes, not 7 cells)
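
The byte-oriented behaviour can probably be reproduced on any system, patched or not, by asking "fold" to count bytes explicitly. The following is only a sketch: unlike the testcases in this mail it sets LC_ALL on purpose, it relies on the standard -b option, and the variable name is an assumption.

```shell
# -b makes fold count bytes instead of columns; LC_ALL=C is set
# deliberately here so that the result does not depend on the patch.
second_line_bytes=$(printf '%s\n' '1234567123５７２４６８' \
    | LC_ALL=C fold -b -w 7 \
    | head -n 2 | tail -n 1 | wc -c | tr -d ' ')

# 7 bytes plus the newline: "123", all three bytes of '５' and only the
# first byte of '７'. A width-aware fold would keep "123５７" together,
# which is 9 bytes plus the newline.
echo "second line: $second_line_bytes bytes"
```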

B) echo '12345671234７３６９' | fold -w 7

(this is just a variation of the above test)

Expected result:
1234567
1234７
３６９

Wrong result:
1234567
1234７
３６*
**
(i.e., the lines are folded after 7 bytes, not 7 cells)

C) echo 'blank wblank　EOF' | fold -w 10 -s

This tests the ability of the "fold" program to recognize the whitespace according to the current locale.


Expected result:
blank
wblank　
EOF

Wrong result:
blank
wblank　E
OF
(i.e., the '　' character is not recognized as whitespace).

D)

cat >tp3_a.input <<"EOF"
sss＿John
SSS＿Paul
ttt＿Ringo
EOF
cat >tp3_b.input <<"EOF"
sss＿Lennon
ttt＿Starr
TTT＿Harison
EOF
join -t '＿' -a 1 -a 2 -e '(null)' -o 0,1.2,2.2 tp3_a.input tp3_b.input

This tests whether the "join" program accepts a wide character as a field separator.


Expected results:
sss John Lennon
SSS Paul (null)
ttt Ringo Starr
TTT (null) Harison

Wrong results:
join: multi-character tab `＿'
(this is wrong because ＿ is a single character that takes multiple bytes)
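
As a control, the same options can be exercised with a single-byte separator, which should work with or without the patch. This is a sketch under assumptions, not an LSB test: the ':' separator, the temporary directory and the file names are invented here, and the input is re-sorted for the C locale because LC_ALL=C is set on purpose.

```shell
workdir=$(mktemp -d)

# join wants sorted input; in the C locale uppercase sorts before lowercase.
printf '%s\n' 'SSS:Paul' 'sss:John' 'ttt:Ringo' > "$workdir/a.input"
printf '%s\n' 'TTT:Harison' 'sss:Lennon' 'ttt:Starr' > "$workdir/b.input"

# Same options as in the testcase above, only the separator is one byte.
joined=$(LC_ALL=C join -t ':' -a 1 -a 2 -e '(null)' -o 0,1.2,2.2 \
    "$workdir/a.input" "$workdir/b.input")
printf '%s\n' "$joined"

rm -rf "$workdir"
```

If this works while the '＿' variant is rejected, the failure is down to the separator being counted in bytes and not to the other options.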

E) [my own test, fails with the patch and demonstrates that the patch is buggy and the testsuite is skewed]


join -t 'a!' -a 1 -a 2 -e '(null)' -o 0,1.2,2.2 tp3_a.input tp3_b.input

Expected result:
join: multi-character tab `a!'

F)

cat >tp4_a.input <<"EOF"
sss　John
SSS　Paul
ttt　Ringo
EOF
cat >tp4_b.input <<"EOF"
sss　Lennon
ttt　Starr
TTT　Harison
EOF
join -a 1 -a 2 -e '(null)' -o 0,1.2,2.2 tp4_a.input tp4_b.input

The intention is to verify that the "join" command by default treats all whitespace characters from the current locale as field separators.


Expected results:
sss John Lennon
SSS Paul (null)
ttt Ringo Starr
TTT (null) Harison

Wrong results:
sss　John (null) (null)
sss　Lennon (null) (null)
SSS　Paul (null) (null)
ttt　Ringo (null) (null)
ttt　Starr (null) (null)
TTT　Harison (null) (null)

G) ("pr" and "tr" tests fail even with the patch, maybe it is a good idea to update it from RedHat. This didn't happen with patched coreutils-5.2.1)


H) echo '1日     国際化  きほん  file' | unexpand -a | wc -c

The intention is to test that the "unexpand" program counts cells, not bytes.


Expected result: 30

I) [my own test, fails even with patched coreutils-5.2.1, demonstrates that the LSB sample implementation implements LSB to the letter, but doesn't catch the spirit of i18n requirements]


echo ä | tr '[:lower:]' '[:upper:]'

Expected result: Ä
Wrong result: ä
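
The wrong result is what a purely byte-oriented "tr" would be expected to produce, and that can be sketched by forcing the C locale. This is an illustration only: LC_ALL is set on purpose here, the variable names are made up, and the character is written as octal escapes so the example does not depend on how this text is stored.

```shell
# \303\244 is 'ä' (U+00E4) encoded as UTF-8: two bytes, c3 a4.
a_umlaut_hex=$(printf '\303\244' | od -An -tx1 | tr -d ' \n')
echo "bytes of the character: $a_umlaut_hex"

# In the C locale only ASCII letters are lowercase, so 'a' is converted
# while the two bytes of 'ä' pass through untouched.
c_locale_result=$(printf 'a\303\244\n' | LC_ALL=C tr '[:lower:]' '[:upper:]')
printf '%s\n' "$c_locale_result"
```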

More later.

--
Alexander E. Patrakov