On 4/9/26 06:03, Pádraig Brady wrote:
I got an awk implementation contributed to toybox (which can't use
busybox's because licensing) which is twice the size of sed+tar+grep
_combined_ (or at least twice the line count).
cut(1) tries to be an efficient streaming filter,
and regexes don't really fit that mold.
It's been in libc since 1997 at least, à la
https://pubs.opengroup.org/onlinepubs/7990989775/xsh/regex.h.html
Given there are existing solutions for somewhat edge-case functionality,
it doesn't seem appropriate to add, IMHO.
Use awk or a shell function for efficiency, in a discussion about adding
Unicode support.
Oh goddess, what did _they_ do about combining characters...
GNU treats combining characters as separate characters, as per:
https://github.com/coreutils/coreutils/commit/fe0082333
It's policy that combining characters don't combine.
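
For anyone following along, here is a rough Python sketch of what "combining
characters don't combine" means in practice. It is not GNU cut's code; the
helper name cut_chars is made up for illustration, and it assumes selection
counts one position per code point, which is how I read the commit above.

```python
import unicodedata

def cut_chars(text, start, end):
    """Hypothetical helper: keep 1-based positions start..end,
    counting every code point as its own character."""
    return text[start - 1:end]

# Decomposed spelling: 'e' followed by U+0301 COMBINING ACUTE ACCENT.
decomposed = "e\u0301tude"

first = cut_chars(decomposed, 1, 1)   # the bare base letter, accent dropped
second = cut_chars(decomposed, 2, 2)  # the combining accent on its own

# The precomposed (NFC) spelling keeps the accent in position 1.
composed = unicodedata.normalize("NFC", decomposed)
first_composed = cut_chars(composed, 1, 1)
```

So whether position 1 keeps its accent depends on how the input happens to
be normalized, not on what the reader sees on screen.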
P.S. is cut -d $'\n' actually documented in the man page?
It will soon be documented in the info manual
(which is linked from the man page):
https://github.com/coreutils/coreutils/commit/c3e819fad
So the answer to my question is "no", but soon it will be "no".
Rob