Package: gawk
Version: 1:3.1.4-2
Severity: important

gawk does not handle UTF-8 multibyte characters properly when padding
strings to a printf field width. Here's an example:


$ cat example.txt

A Only_a_singlebyte_character_here_(UTF-8:_41)
Ö A_letter_which_takes_two_bytes_(UTF-8:_c3_96)
€ A_currency_symbol_which_takes_three_bytes_(UTF-8:_e2_82_ac)


$ cat example.txt | awk '{ printf "%-5s%s\n",$1, $2 }'

A    Only_a_singlebyte_character_here_(UTF-8:_41)
Ö   A_letter_which_takes_two_bytes_(UTF-8:_c3_96)
€  A_currency_symbol_which_takes_three_bytes_(UTF-8:_e2_82_ac)


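As we can see, the format specifier %-5s does not calculate field widths
correctly when the string contains multibyte characters. The padding
shrinks by one space for each extra byte (4 spaces after "A", 3 after
"Ö", 2 after "€"), so the width appears to be counted in bytes rather
than in characters. Unfortunately, this makes gawk's field widths mostly
unusable in a UTF-8 locale.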
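
To make the arithmetic behind the output above explicit, here is a small
sketch in Python (not gawk code). The helper names pad_by_bytes and
pad_by_chars are hypothetical and exist only for this illustration. It
assumes the observed padding comes from measuring the string in UTF-8
bytes; this is inferred from the output, not confirmed from gawk's
source.

```python
# Illustration only: models the padding seen in the report, assuming the
# field width is measured in UTF-8 bytes instead of characters.

def pad_by_bytes(s: str, width: int) -> str:
    """Left-justify s, counting its length in UTF-8 bytes (observed behaviour)."""
    n = len(s.encode("utf-8"))
    return s + " " * max(0, width - n)

def pad_by_chars(s: str, width: int) -> str:
    """Left-justify s, counting its length in characters (expected behaviour)."""
    return s + " " * max(0, width - len(s))

if __name__ == "__main__":
    # "A" is 1 byte, "\u00d6" (Ö) is 2 bytes, "\u20ac" (€) is 3 bytes in UTF-8.
    for ch in ("A", "\u00d6", "\u20ac"):
        observed = pad_by_bytes(ch, 5)
        expected = pad_by_chars(ch, 5)
        # Show how many characters each padded field really occupies.
        print(f"{ch!r}: byte-padded={len(observed)} chars, "
              f"char-padded={len(expected)} chars")
```

With byte-based padding the three fields come out 5, 4 and 3 characters
wide, which matches the misaligned columns above. Character-based padding
gives 5 characters in every case.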


-- System Information:
Debian Release: 3.1
  APT prefers testing
  APT policy: (850, 'testing'), (800, 'unstable')
Architecture: i386 (i686)
Kernel: Linux 2.6.8-2-k7
Locale: LANG=fi_FI.UTF-8, LC_CTYPE=fi_FI.UTF-8 (charmap=UTF-8)

Versions of packages gawk depends on:
ii  libc6                       2.3.2.ds1-22 GNU C Library: Shared libraries an

-- no debconf information

