On Mon, Dec 11, 2023 at 07:42:10AM -0500, Greg Wooledge wrote:
> On Mon, Dec 11, 2023 at 09:37:42AM +0100, to...@tuxteam.de wrote:
> > 2. This is tr, not regexp, so '[A-Za-z0-9.]' isn't doing what you
> > think it does. It will match '[', 'A' to 'Z', 'a' to 'z', '0' to '9',
> > '.' and ']'. I guess you want to say 'A-Za-z0-9.'
>
> Well spotted.
>
> > 3. As a convenience, tr has char classes. Perhaps [:alnum:] is for
> > you. No idea whether this is a GNU extension.
>
> It's POSIX. 100% portable, as long as you ignore any bugs in GNU tr.
>
> Looks like GNU tr in Debian 12 still doesn't handle multibyte characters
> correctly:
>
> unicorn:~$ echo 'mañana' | tr ñ X
> maXXana
>
> So... as long as you're working in the C locale, where [:alnum:] is
> just the ASCII capital and lowercase letters and digits, you should be
> fine.
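A minimal sketch of points 2 and 3 side by side, assuming a POSIX shell and forcing the C locale so that [:alnum:] means only the ASCII letters and digits. The sample string 'a[b].c-d' is made up for illustration; each command deletes (-d) everything outside (-c) the given set.

```shell
#!/bin/sh
# Force the C locale so [:alnum:] covers only ASCII letters and digits.
export LC_ALL=C

# tr sets are not regex bracket expressions: '[A-Za-z0-9.]' also keeps '[' and ']'.
printf '%s\n' 'a[b].c-d' | tr -cd '[A-Za-z0-9.]'; echo
# -> a[b].cd

# Without the brackets, only letters, digits and '.' survive.
printf '%s\n' 'a[b].c-d' | tr -cd 'A-Za-z0-9.'; echo
# -> ab.cd

# The POSIX class, plus '.', gives the same result in the C locale.
printf '%s\n' 'a[b].c-d' | tr -cd '[:alnum:].'; echo
# -> ab.cd
```

The trailing `echo` is only there because the newline from `printf` is outside the kept set and gets deleted too.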
Hey, you just gave us a handy way to count how many bytes (UTF-8 code
units) a character takes:

  tomas@trotzki:~$ echo 'birdie🐦here' | tr -c 'a-z' X
  birdieXXXXhereX

;-)

Cheers
-- 
t
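The trick works because GNU tr operates byte by byte: in a UTF-8 locale the bird emoji (U+1F426) is four bytes, so it becomes four X's, and the final X is the newline from echo, which is also outside 'a-z'. A more direct sketch, assuming a UTF-8 terminal and script encoding, lets wc -c count the bytes and od show them:

```shell
#!/bin/sh
# printf '%s' avoids the trailing newline that echo would add.
printf '%s' '🐦' | wc -c
# -> 4 (GNU wc; some other implementations pad the number with spaces)

printf '%s' 'ñ' | wc -c
# -> 2, which is why 'mañana' came out as 'maXXana'

# Show the individual bytes of the emoji in hex.
printf '%s' '🐦' | od -An -tx1
# -> f0 9f 90 a6
```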