Just my $0.02 on this:
On 2020-11-01 20:02:24 +0000, David Čepelík wrote:
> I've noticed an interesting discrepancy between gawk and BusyBox awk:
> when the FS is set to e.g. ` *` (space asterisk), gawk will not consider
> empty matches of the regular expression (see e.g. [1]) while BusyBox
> will. This example demonstrates it:
>
> ~% gawk --version
> GNU Awk 5.1.0, API: 3.0 (GNU MPFR 4.1.0, GNU MP 6.2.0)
> [...]
> ~% echo 'foo bar' | gawk -F' *' '{print $1}'
> foo
>
> While BusyBox (a7c065354) will produce:
>
> 1! ~/sw/3rd/busybox:master% echo 'foo bar' | ./busybox awk -F' *' '{print $1}'
> f
>
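
A side note on the two runs above: as far as I can tell, the two awks can only disagree when FS is able to match the empty string. Here is a minimal sketch of how one might check that and sidestep it, assuming nothing beyond standard awk features. The `first_field` name is made up for illustration and is not part of either implementation.

```shell
# Hypothetical helper (name invented for this example): print the first field,
# splitting on one or more spaces. ' +' needs at least one space to match, so
# the question of empty matches never comes up.
first_field() {
    awk -F' +' '{ print $1 }'
}

echo 'foo bar' | first_field

# Ask awk itself where / */ first matches in "foo". An RLENGTH of 0 would mean
# the ERE matched the empty string at the start of the input.
awk 'BEGIN { match("foo", / */); print RSTART, RLENGTH }'
```

With ` +` as the separator I would expect `foo` from both gawk and BusyBox, so that is probably the safer spelling for scripts that have to run on both.
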
> Is this desired behavior? To my best knowledge, this isn't standardized.

I would argue that it is standardized, and that gawk is in violation of the
standard. The sentence I'm basing this claim on is (from [0]):

> Otherwise, the string value of FS shall be considered to be an
> extended regular expression. Each occurrence of a sequence matching
> the extended regular expression shall delimit fields.

The empty string is, in my opinion, a match of the ERE provided, so
BusyBox's behavior seems to be the correct one.

> Would it make sense to harmonize BusyBox's implementation with GNU Awk?

That is a separate question. I do not know whether diverging from the
standard just to make this behave the same way as gawk is a good idea. In my
opinion, fixing gawk to comply is the correct choice (so consider filing
this as a bug report with them).

W.

[0] https://pubs.opengroup.org/onlinepubs/9699919799/utilities/awk.html

-- 
There are only two hard things in Computer Science:
cache invalidation, naming things and off-by-one errors.
