On 2023-08-01 16:42, Pádraig Brady wrote:
On 01/08/2023 10:07, Dragan Simic wrote:
Add a new command-line option and the required logic that allow multiple
consecutive delimiters to be treated as a single delimiter. Of course,
this option is valid only in cut's field mode.
This new feature should make cut much more usable in various real-world
applications, some of which are already mentioned in the gotchas. For
example, merging the consecutive delimiters is very useful when cut is
used to process the outputs of various commands.
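
To illustrate the difference, here is a minimal Python sketch of field
selection with and without merging. It is not the patch's C code: the
function name cut_fields and the merge flag are hypothetical, and keeping
an empty first field after a leading run of delimiters is an assumption,
not something the patch description states.

```python
import re


def cut_fields(line, delim, fields, merge=False):
    """Select 1-based fields from a line, roughly like `cut -d DELIM -f LIST`.

    merge=True is a hypothetical stand-in for the proposed option: every run
    of consecutive delimiters is collapsed into one before splitting.
    """
    if delim not in line:
        # Without -s, cut prints lines that contain no delimiter unchanged.
        return line
    if merge:
        line = re.sub(re.escape(delim) + "+", delim, line)
    parts = line.split(delim)
    # cut emits the selected fields in input order, whatever the list order.
    picked = [parts[i - 1] for i in sorted(set(fields)) if 0 < i <= len(parts)]
    return delim.join(picked)


if __name__ == "__main__":
    # Column-aligned output, as many commands produce it.
    line = "root      1234  0.0 bash"
    print(repr(cut_fields(line, " ", [2])))              # empty field
    print(repr(cut_fields(line, " ", [2], merge=True)))  # the second column
```

Without merging, each extra space in the padding starts a new empty
field, so field 2 is empty; with merging, field 2 is the second column.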
Add a whole battery of new cut tests, which cover this new feature, and
add more tests for the related already existing features, to make sure
no regressions are introduced.
While there, clean up the comments and the whitespace in the cut tests
a bit, to make them slightly more readable.
Thanks for the patch.
I wonder whether a --empty-fields={ignore,suppress} is a more general
interface.
Wouldn't that be a more complex approach and, more importantly, a less
intuitive one? Quite frankly, I think it's easier to visualize the empty
space, or more generally the delimiters, becoming "squeezed". I think
that visualizing the empty fields is harder, especially when the
delimiter is a whitespace character.
This overlaps somewhat with the -w option in FreeBSD's cut,
which merges runs of whitespace, and which I was also considering
adding.
After thinking a bit about it, how about having both "-m", from the
patch I submitted, and "-w", which would behave differently from
FreeBSD's "-w"? Please allow me to explain.
More specifically, our "-w" would simply "squeeze" all the whitespace in
the input without forcing the delimiter to be whitespace. The
"squeezing" would put a single whitespace character into the input, in
place of whatever got "squeezed" there. That would be either the
whitespace character specified as an optional value for the "-w" option
or, by default, a space wherever only spaces were "squeezed" and a tab
wherever the "squeezed" whitespace contained at least one tab.
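
The rule described above could be sketched in Python roughly as follows.
This only illustrates the proposed behavior and is not an
implementation: the function name squeeze_whitespace is hypothetical, and
treating only spaces and tabs as whitespace is an assumption.

```python
import re


def squeeze_whitespace(line, replacement=None):
    """Collapse each run of spaces and tabs into one whitespace character.

    replacement models the optional value of the proposed "-w" option.
    When it is None, a run made only of spaces becomes a single space and
    a run containing at least one tab becomes a single tab.
    """
    def pick(match):
        if replacement is not None:
            return replacement
        return "\t" if "\t" in match.group() else " "

    # Assumption: "whitespace" means spaces and tabs only.
    return re.sub(r"[ \t]+", pick, line)


if __name__ == "__main__":
    print(repr(squeeze_whitespace("a   b \t c")))
    print(repr(squeeze_whitespace("a   b", "\t")))
```

The result could then be split on any delimiter, whitespace or not,
which is what would keep such a "-w" independent of the delimiter choice.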
With both "-m" and "-w" options in place we'd end up with a quite
versatile cut, which would cover what FreeBSD's cut does, and be able to
do more. I'd be willing to implement the "-w" option as well.