On 01/03/2024 15:33, lacsaP Patatetom wrote:
hi,
I did a few tests with tr and I'm surprised by the results...
$ echo éèçà
éèçà
each of these characters is encoded in UTF-8 as 2 bytes :
$ echo éèçà | xxd
00000000: c3a9 c3a8 c3a7 c3a0 0a .........
now I use tr to remove non-printable characters :
$ echo éèçà | tr -cd '[:print:]'
$ echo éèçà | tr -cd '[:print:]' | wc
0 0 0
all characters are deleted by tr
now I want to keep the "é" character :
$ echo éèçà | tr -cd '[:print:]é'
��
why do the "�" characters appear ?
regards, lacsaP.
It's a known issue that tr is currently not multi-byte aware: it processes its input and its sets one byte at a time.
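To illustrate where the stray bytes come from, here is a minimal sketch, assuming a byte-oriented tr such as the one discussed in this thread. The helper name hexbytes is hypothetical, the characters are written as octal escapes so the example does not depend on the terminal encoding, and LC_ALL=C is only there to make the byte-wise behaviour reproducible. Since "é" is the two bytes c3 a9, adding it to the set keeps every c3 and every a9 byte, so the lead bytes of "è", "ç" and "à" survive without their second byte and the terminal shows them as "�".

```shell
# Hypothetical helper: dump stdin as one unbroken string of hex byte values.
hexbytes() {
    od -An -tx1 | tr -d ' \n'
}

# The input from the report (é è ç à plus newline), spelled as octal escapes.
input='\303\251\303\250\303\247\303\240\n'

# '[:print:]é' seen byte-wise: printable characters plus the bytes 0xc3 and 0xa9.
# Assumption: LC_ALL=C, so that tr treats every byte as a separate character.
kept=$(printf "$input" | LC_ALL=C tr -cd '[:print:]\303\251' | hexbytes)

# Shows which bytes survived the deletion.
echo "$kept"
```

Under those assumptions the surviving bytes are c3 a9 c3 c3 c3: one complete "é" followed by three orphaned c3 lead bytes, which are not valid UTF-8 on their own.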
thanks,
Pádraig