On 01/03/2024 15:33, lacsaP Patatetom wrote:
hi,

I did a few tests with tr and I'm surprised by the results...

$ echo éèçà
éèçà

these characters are each encoded in UTF-8 as 2 bytes :

$ echo éèçà | xxd
00000000: c3a9 c3a8 c3a7 c3a0 0a                   .........

now I use tr to remove non-printable characters :

$ echo éèçà | tr -cd '[:print:]'
$ echo éèçà | tr -cd '[:print:]' | wc
       0       0       0

all characters are deleted by tr
now I want to keep the "é" character :

$ echo éèçà | tr -cd '[:print:]é'
��

why do the "�" characters appear ?

regards, lacsaP.


It's a known issue that tr is currently not multi-byte aware:
it processes its input one byte at a time.

thanks,
Pádraig
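
A minimal sketch of what byte-wise processing means for the command
above, assuming GNU tr and a POSIX shell. The input and the "é" in the
set are written as octal escapes so the result does not depend on the
file or terminal encoding; LC_ALL=C is set here only to make the
byte-wise behaviour explicit, and hex_bytes is a hypothetical helper
introduced for this illustration, not part of coreutils.

```shell
# "éèçà\n" as octal escapes: é = c3 a9, è = c3 a8, ç = c3 a7, à = c3 a0.
input='\303\251\303\250\303\247\303\240\n'

# Hypothetical helper: print stdin as space-separated hex bytes on one line.
hex_bytes() {
    # Unquoted substitution splits od's output on whitespace.
    set -- $(od -An -v -tx1)
    echo "$*"
}

# The set given to tr is [:print:] plus the two bytes of "é" (c3 and a9).
# Working byte by byte, tr keeps every c3 and a9 byte and deletes the rest,
# including the second byte of è, ç and à.
kept=$(printf "$input" | LC_ALL=C tr -cd '[:print:]\303\251' | hex_bytes)
echo "$kept"
# expected with a byte-wise tr: c3 a9 c3 c3 c3
```

The leading c3 a9 is still a valid "é", but each remaining lone c3 is
an incomplete UTF-8 sequence, which a UTF-8 terminal typically renders
as "�".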


