On Fri, 1 Mar 2024 at 20:30, Pádraig Brady <p...@draigbrady.com> wrote:
> On 01/03/2024 15:33, lacsaP Patatetom wrote:
> > hi,
> >
> > I did a few tests with tr and I'm surprised by the results...
> >
> > $ echo éèçà
> > éèçà
> >
> > these characters are encoded in utf-8 on 2 bytes :
> >
> > $ echo éèçà | xxd
> > 00000000: c3a9 c3a8 c3a7 c3a0 0a  .........
> >
> > now I use tr to remove non-printable characters :
> >
> > $ echo éèçà | tr -cd '[:print:]'
> > $ echo éèçà | tr -cd '[:print:]' | wc
> >       0       0       0
> >
> > all characters are deleted by tr
> > now I want to keep the "é" character :
> >
> > $ echo éèçà | tr -cd '[:print:]é'
> > é���
> >
> > why do the "�" characters appear ?
> >
> > regards, lacsaP.
>
> It's a known issue that tr is currently non multi-byte aware.
>
> thanks,
> Pádraig

hi,

thank you for this clarification.
what alternative to `tr` would you recommend for this type of treatment ?

regards, lacsaP.
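
for reference, the byte-wise behaviour described above can be reproduced with a short Python sketch. it is only an illustration, not how tr is implemented and not a recommendation from this thread: the helper names `bytewise_keep` and `charwise_keep` are made up, and `[:print:]` is assumed to match only the ASCII printable bytes 0x20..0x7e. the last part shows what a character-aware filter would return for the same input.

```python
def bytewise_keep(data: bytes, keep: bytes) -> bytes:
    """Drop every byte of `data` that is not in `keep` (hypothetical helper,
    mimicking a byte-oriented `tr -cd`)."""
    allowed = set(keep)
    return bytes(b for b in data if b in allowed)


def charwise_keep(text: str) -> str:
    """Drop every non-printable character, working on whole characters
    (hypothetical helper, uses Python's own notion of 'printable')."""
    return "".join(ch for ch in text if ch.isprintable())


# Assumption: [:print:] only covers the ASCII printable bytes 0x20..0x7e.
ASCII_PRINT = bytes(range(0x20, 0x7F))

raw = "éèçà\n".encode("utf-8")  # c3 a9 c3 a8 c3 a7 c3 a0 0a

# Like: tr -cd '[:print:]' -> every byte is >= 0x80 or a newline, all removed.
only_print = bytewise_keep(raw, ASCII_PRINT)

# Like: tr -cd '[:print:]é' -> "é" adds the two bytes c3 and a9 to the set,
# so the leading c3 of è, ç and à survives without its second byte.
with_e = bytewise_keep(raw, ASCII_PRINT + "é".encode("utf-8"))

# Character-aware filtering of the same text, for comparison.
charwise = charwise_keep("éèçà\n")

print(only_print)                                # b''
print(with_e.hex(" "))                           # c3 a9 c3 c3 c3
print(with_e.decode("utf-8", errors="replace"))  # é followed by three U+FFFD
print(charwise)                                  # éèçà
```

each lone c3 byte is an incomplete utf-8 sequence, which would explain why the terminal shows a "�" for each of them.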