Package: libtext-unidecode-perl
Please, please, please add tones to the Chinese transliterations.
In Chinese, tones are more important than vowels and consonants.
And they are merely the digits 1, 2, 3, 4, and 0 or 5 (for the neutral
tone), and they are certainly in the Unicode databases.
The Text::Unidecode documentation says:
> Very many Unicode characters transliterate to multi-character
> sequences. E.g., Han character 0x5317 transliterates as the
> four-character string "Bei ".
That should be "Bei3 ".
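The tone is already in the data. As far as I know, recent versions of the Unihan database give the kMandarin reading of U+5317 with a tone diacritic, as "běi". Here is a minimal sketch, assuming such a tone-marked reading as input, of turning it into the numbered form "Bei3 ". The function names pinyin_to_numbered and han_style are hypothetical, not part of Text::Unidecode, and the sketch writes 5 for the neutral tone, though 0 would work just as well:

```perl
use strict;
use warnings;
use utf8;                        # the examples below contain literal pinyin
use Unicode::Normalize qw(NFD NFC);

# Combining tone marks used in Hanyu Pinyin, mapped to tone numbers.
my %TONE_OF_MARK = (
    "\x{0304}" => 1,    # macron, as in ā
    "\x{0301}" => 2,    # acute,  as in á
    "\x{030C}" => 3,    # caron,  as in ǎ
    "\x{0300}" => 4,    # grave,  as in à
);

# Hypothetical helper: convert one tone-marked pinyin syllable ("běi")
# to numbered pinyin ("bei3"). No tone mark means neutral tone, written 5.
sub pinyin_to_numbered {
    my ($syllable) = @_;
    my $s    = NFD($syllable);       # "ě" becomes "e" + U+030C
    my $tone = 5;
    # A pinyin syllable carries at most one tone mark: strip it and
    # remember which tone it stood for.
    if ($s =~ s/([\x{0304}\x{0301}\x{030C}\x{0300}])//) {
        $tone = $TONE_OF_MARK{$1};
    }
    return NFC($s) . $tone;          # NFC restores e.g. the "ü" in "lǜ"
}

# Hypothetical helper: Text::Unidecode-style output for a Han character,
# capitalized and followed by one space.
sub han_style {
    my ($syllable) = @_;
    return ucfirst(pinyin_to_numbered($syllable)) . ' ';
}

print '[', han_style('běi'), "]\n";    # prints [Bei3 ]
print '[', han_style('ma'),  "]\n";    # prints [Ma5 ]
```

Working from the decomposed (NFD) form means only four combining marks need handling, instead of a table of every precomposed vowel with a tone mark.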
I have not explored other tonal languages. Anyway, Chinese is the
language with the most speakers in the world.
You could add a switch to turn the tones back off for anyone who needs
backwards compatibility.
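Such a switch could be as simple as dropping the trailing tone digit. Here is a sketch, assuming a reading table that stores numbered pinyin. The table %NUMBERED_READING, the function transliterate and its tones option are all hypothetical (this is not Text::Unidecode's interface), and the table holds only two characters for illustration:

```perl
use strict;
use warnings;
use utf8;

# Hypothetical reading table; a real patch would generate it from Unihan.
my %NUMBERED_READING = (
    "\x{5317}" => 'bei3',    # 北
    "\x{4EAC}" => 'jing1',   # 京
);

# Transliterate characters found in the table in the "Bei " style.
# Tones are on by default; tones => 0 drops the digit, reproducing
# the current, tone-less output.
sub transliterate {
    my ($text, %opt) = @_;
    my $tones = exists $opt{tones} ? $opt{tones} : 1;
    my $out   = '';
    for my $char (split //, $text) {
        my $reading = $NUMBERED_READING{$char};
        if (!defined $reading) {
            $out .= $char;                # not in the table: keep as is
            next;
        }
        $reading =~ s/[0-5]\z// unless $tones;
        $out .= ucfirst($reading) . ' ';
    }
    return $out;
}

print '[', transliterate("\x{5317}\x{4EAC}"), "]\n";              # prints [Bei3 Jing1 ]
print '[', transliterate("\x{5317}\x{4EAC}", tones => 0), "]\n";  # prints [Bei Jing ]
```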
P.S. Recently I made good use of Text::Unidecode on an ASCII console to
find and read the file I wanted:
$ cat Makefile
export LC_ALL=zh_TW.UTF-8
P=perl -C -Mutf8 -MText::Unidecode -wnle 'print unidecode($$_);'
l:;ls -i|$P|sed -n 's/Tai Zhong Xian Jing Cha Ju He Ping Fen Ju//p'|\
xargs find -inum|xargs $P