There are 25496 Chinese characters in iso14651_t1_pinyin, most of them distributed over the CJK Unified Ideographs and CJK Unified Ideographs Extension A blocks.
But those two blocks contain 27552 Chinese characters, so at least 2056 of them (27552 - 25496) have no pinyin entry and are missing from the table. My suggestion is simply to append the missing characters at the end of iso14651_t1_pinyin, in Unicode code point order. Could you give me any feedback?

Bug #821951: sort -u erase some utf8 characters
https://bugs.launchpad.net/bugs/821951
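A minimal sketch of how the missing characters could be listed in Unicode order, assuming that each data line of iso14651_t1_pinyin starts with a `<UXXXX>` symbol for the character it orders (that line format, and the helper names `covered_from_lines`, `missing_entries` and `as_symbols`, are assumptions for illustration). Which code points count as assigned comes from Python's `unicodedata`, so the exact total depends on the Unicode version of the running interpreter, not on the version glibc uses.

```python
import re
import unicodedata

# Block ranges (inclusive): Extension A, then the main CJK Unified Ideographs block.
CJK_RANGES = [(0x3400, 0x4DBF), (0x4E00, 0x9FFF)]

# Assumed line format: a leading <UXXXX> symbol names the character being ordered.
LEADING_SYMBOL = re.compile(r"^<U([0-9A-Fa-f]{4,6})>")


def covered_from_lines(lines):
    """Collect code points that already have an entry (hypothetical parser)."""
    covered = set()
    for line in lines:
        match = LEADING_SYMBOL.match(line.strip())
        if match:
            covered.add(int(match.group(1), 16))
    return covered


def assigned_cjk():
    """Yield assigned code points in both blocks, in Unicode order."""
    for lo, hi in CJK_RANGES:
        for cp in range(lo, hi + 1):
            # unicodedata.name returns the default for unassigned code points.
            if unicodedata.name(chr(cp), None) is not None:
                yield cp


def missing_entries(covered):
    """Return assigned CJK code points not yet covered, in Unicode order."""
    return [cp for cp in assigned_cjk() if cp not in covered]


def as_symbols(cps):
    """Format code points as <UXXXX> symbolic names for appending to the table."""
    return ["<U%04X>" % cp for cp in cps]


if __name__ == "__main__":
    sample = ["<U4E00> <U4E00>", "% a comment line", "<U4E01> <U4E01>"]
    missing = missing_entries(covered_from_lines(sample))
    print(len(missing), as_symbols(missing[:3]))
```

Appending in plain code point order keeps the change mechanical and easy to review; the characters simply sort after all characters that do have pinyin.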