bug#36887: coreutils-8.31: printf chokes on \u0041

Pádraig Brady Thu, 01 Aug 2019 06:10:18 -0700

On 01/08/19 12:02, Ulrich Mueller wrote:
> [Forwarding bug https://bugs.gentoo.org/680244 as requested by the
> Gentoo package maintainer.]
> 
> According to printf(1):
> 
>    Interpreted sequences are:
>    [...]
> 
>    \uHHHH Unicode (ISO/IEC 10646) character with hex value HHHH (4 digits)
> 
>    \UHHHHHHHH
>           Unicode character with hex value HHHHHHHH (8 digits)
> 
> It does not work, though:
> 
> $ /usr/bin/printf '\u0041\n'
> /usr/bin/printf: invalid universal character name \u0041
> $ /usr/bin/printf '\U00000041\n'
> /usr/bin/printf: invalid universal character name \U00000041
> 
> Other tools interpret the sequence correctly:
> 
> $ printf '\u0041\n'   # bash
> A
> $ echo -e '\u0041'    # bash
> A
> $ zsh -c "echo -e '\u0041'"
> A
> $ emacs -Q --batch --eval '(princ "\u0041\n")'
> A
> $ python -c "print ('\u0041')"
> A
> $ ruby -e 'print("\u0041\n")'
> A


I agree this is a bit surprising, but it is the documented behavior:
U+0041 falls in a range that the \u and \U syntax excludes.
The full manual states:

  "Unicode characters in the ranges
  U+0000...U+009F, U+D800...U+DFFF cannot be specified by this syntax,
  except for U+0024 ($), U+0040 (@), and U+0060 (`)."
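To make the quoted rule concrete, here is a minimal shell sketch of the range check it describes. The function name ucn_allowed is hypothetical and is not part of coreutils; it only restates the manual's wording and is not taken from the printf source.

```shell
# Hypothetical helper, not part of coreutils: succeed (status 0) if the
# code point, given as hex digits, may be written with \u or \U under
# the rule quoted from the manual; fail (status 1) otherwise.
ucn_allowed() {
    cp=$((0x$1))
    case $cp in
        36|64|96) return 0 ;;   # U+0024 ($), U+0040 (@), U+0060 (`)
    esac
    # U+0000...U+009F is excluded
    [ "$cp" -le 159 ] && return 1
    # U+D800...U+DFFF (surrogates) is excluded
    [ "$cp" -ge 55296 ] && [ "$cp" -le 57343 ] && return 1
    return 0
}

# Classify a few sample code points, including the one from the report.
for cp in 0024 0040 0041 00e9 d800; do
    if ucn_allowed "$cp"; then
        printf 'U+%s: allowed by the quoted rule\n' "$cp"
    else
        printf 'U+%s: excluded by the quoted rule\n' "$cp"
    fi
done
```

Under this reading, U+0041 is excluded because it lies below U+00A0 and is not one of the three listed exceptions, which matches the error in the report.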

This was previously discussed at:
https://lists.gnu.org/archive/html/bug-coreutils/2008-05/threads.html#00067
