On Mon, Dec 11, 2023 at 11:25:13AM +0000, Albretch Mueller wrote:
> "tr --complement --squeeze-repeats ..." makes sure that the replaced
> characters only appear once (that it doesn't immediately repeat). Say
> you have something like "  " (two spaces) or "?$|" (three characters)
> which will be replaced by just an underscore.

...which would change the length, as I wrote.

> In the case of: "ASCII text"
> what should come out of it is: "ASCII_text"
> not: "ASCII_text_"
> no underscore at the end. That is the question I have.

That depends on whether your "ASCII text" has some thingy at the end
which you don't see. A newline, perchance?

> I use such constructs as: "[A-Za-z0-9.]" to make explicit to myself
> and other people what I mean. I work in corpora research dealing with
> text based various alphabets not just in ASCII so I avoid any kinds of
> linguistic/cultural shortcuts and abbreviations.

What has this to do with how tr works? It will treat [ and ] as
characters not to substitute. I pointed that out, because it might
have been unintended:

  echo -n 'This is a text with [some brackets] in it' | tr -cs "[A-Za-z0-9.]" "_"
  This_is_a_text_with_[some_brackets]_in_it

(Note this "-n" on the echo, btw? Without it, I'd be getting a "_"
at the end, the transliterated newline).

Do whatever you want :-)

Cheers
-- t
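
P.S. To make the two points above concrete (the invisible trailing newline, and the brackets being part of the set), here is a minimal sketch. The function names are made up for illustration, and printf is used instead of "echo -n" only because "-n" is not portable across shells; treat it as one way to do it, not the way.

```shell
# Hypothetical helper: replace every run of characters outside
# A-Z, a-z, 0-9 and "." with a single underscore.
# printf '%s' emits no trailing newline, so nothing gets appended.
to_underscores() {
    printf '%s' "$1" | tr -cs 'A-Za-z0-9.' '_'
}

# Same, but with "[" and "]" inside the set, as in the original
# command: the brackets are then kept, not substituted.
to_underscores_keep_brackets() {
    printf '%s' "$1" | tr -cs '[A-Za-z0-9.]' '_'
}

to_underscores 'ASCII text'; echo
# ASCII_text

# A plain echo adds a newline, which is outside the set and
# therefore becomes a trailing underscore.
echo 'ASCII text' | tr -cs 'A-Za-z0-9.' '_'; echo
# ASCII_text_

to_underscores_keep_brackets 'This is a text with [some brackets] in it'; echo
# This_is_a_text_with_[some_brackets]_in_it
```

The extra "; echo" after each call is only there to end the output line on the terminal, since tr has either received no newline or has turned it into "_".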