Package: gawk
Version: 1:3.1.4-2
Severity: important

gawk does not handle UTF-8 multibyte characters properly. Here's an example:

$ cat example.txt
A Only_a_singlebyte_character_here_(UTF-8:_41)
Ö A_letter_which_takes_two_bytes_(UTF-8:_c3_96)
€ A_currency_symbol_which_takes_three_bytes_(UTF-8:_e2_82_ac)

$ cat example.txt | awk '{ printf "%-5s%s\n",$1, $2 }'
A    Only_a_singlebyte_character_here_(UTF-8:_41)
Ö   A_letter_which_takes_two_bytes_(UTF-8:_c3_96)
€  A_currency_symbol_which_takes_three_bytes_(UTF-8:_e2_82_ac)

As we can see, the format specifier %-5s does not calculate field widths
correctly when the string contains multibyte characters. Unfortunately this
makes gawk's field widths mostly unusable in a UTF-8 locale.

-- System Information:
Debian Release: 3.1
  APT prefers testing
  APT policy: (850, 'testing'), (800, 'unstable')
Architecture: i386 (i686)
Kernel: Linux 2.6.8-2-k7
Locale: LANG=fi_FI.UTF-8, LC_CTYPE=fi_FI.UTF-8 (charmap=UTF-8)

Versions of packages gawk depends on:
ii  libc6                  2.3.2.ds1-22      GNU C Library: Shared libraries an

-- no debconf information
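
To make the miscount concrete, here is a minimal sketch of the padding arithmetic. It assumes that %-5s measures the field in UTF-8 bytes rather than in characters, which is what the symptoms in this report suggest. It is written in Python purely as an illustration and is not gawk's code; pad_left_justified is a hypothetical helper, and each character is assumed to occupy one terminal column.

```python
def pad_left_justified(text: str, width: int, count_bytes: bool) -> str:
    """Left-justify text in a field of `width`, in the manner of printf's %-5s.

    count_bytes=True  mimics the suspected bug: the width already used is
                      measured in UTF-8 bytes (an assumption, see above).
    count_bytes=False measures the width already used in characters.
    """
    used = len(text.encode("utf-8")) if count_bytes else len(text)
    # Never pad by a negative amount when the text is wider than the field.
    return text + " " * max(0, width - used)


# The three first fields from example.txt: 1, 2 and 3 bytes in UTF-8.
for sample in ("A", "\u00d6", "\u20ac"):
    by_bytes = pad_left_justified(sample, 5, count_bytes=True)
    by_chars = pad_left_justified(sample, 5, count_bytes=False)
    # Brackets make the trailing padding visible.
    print(f"[{by_bytes}] [{by_chars}]")
```

With byte counting, the two-byte and three-byte characters receive one and two fewer padding spaces than the single-byte one, so the second column no longer lines up. With character counting, every field is five columns wide.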