On 01/08/19 12:02, Ulrich Mueller wrote:
> [Forwarding bug https://bugs.gentoo.org/680244 as requested by the
> Gentoo package maintainer.]
>
> According to printf(1):
>
>        Interpreted sequences are:
>        [...]
>
>        \uHHHH Unicode (ISO/IEC 10646) character with hex value HHHH (4 digits)
>
>        \UHHHHHHHH
>               Unicode character with hex value HHHHHHHH (8 digits)
>
> It does not work, though:
>
> $ /usr/bin/printf '\u0041\n'
> /usr/bin/printf: invalid universal character name \u0041
> $ /usr/bin/printf '\U00000041\n'
> /usr/bin/printf: invalid universal character name \U00000041
>
> Other tools interpret the sequence correctly:
>
> $ printf '\u0041\n'   # bash
> A
> $ echo -e '\u0041'   # bash
> A
> $ zsh -c "echo -e '\u0041'"
> A
> $ emacs -Q --batch --eval '(princ "\u0041\n")'
> A
> $ python -c "print ('\u0041')"
> A
> $ ruby -e 'print("\u0041\n")'
> A
I agree this is a bit surprising. The full manual states:

  "Unicode characters in the ranges U+0000...U+009F, U+D800...U+DFFF
  cannot be specified by this syntax, except for U+0024 ($), U+0040 (@),
  and U+0060 (`)."

This was previously discussed at:
https://lists.gnu.org/archive/html/bug-coreutils/2008-05/threads.html#00067
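To make the manual's rule concrete, here is a minimal POSIX sh sketch. `ucn_allowed` is a hypothetical helper and is not part of coreutils. It takes the hex digits of a \u or \U escape and succeeds only if the code point lies outside the excluded ranges quoted above. It encodes the rule only as the manual states it, not the actual coreutils source. These are the same exclusions that C99 places on universal character names, which is why U+0041 ('A') is rejected even though other tools accept it.

```shell
#!/bin/sh
# Hypothetical helper (not part of coreutils): succeeds if the code point,
# given as hex digits (e.g. 0041 or 00000041), lies outside the ranges the
# manual says cannot be written with \uHHHH / \UHHHHHHHH.
ucn_allowed() {
  cp=$((0x$1))                      # hex digits -> integer (POSIX arithmetic)
  case $cp in
    36|64|96) return 0 ;;           # U+0024 ($), U+0040 (@), U+0060 (`) are exempt
  esac
  if [ "$cp" -le $((0x9F)) ]; then  # U+0000...U+009F
    return 1
  fi
  if [ "$cp" -ge $((0xD800)) ] && [ "$cp" -le $((0xDFFF)) ]; then
    return 1                        # U+D800...U+DFFF (surrogates)
  fi
  return 0
}

for hex in 0041 0024 00E9 D800; do
  if ucn_allowed "$hex"; then
    echo "U+$hex: accepted"
  else
    echo "U+$hex: rejected"
  fi
done
# U+0041: rejected
# U+0024: accepted
# U+00E9: accepted
# U+D800: rejected
```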