The following reply was made to PR bin/162468; it has been noted by GNATS.

From: Jilles Tjoelker <jil...@stack.nl>
To: Eugene Grosbein <egrosb...@rdtc.ru>
Cc: bug-follo...@freebsd.org
Subject: Re: bin/162468: expr(1) false syntax errors
Date: Sat, 12 Nov 2011 00:52:59 +0100

On Sat, Nov 12, 2011 at 01:58:55AM +0700, Eugene Grosbein wrote:
> 11.11.2011 22:44, Jilles Tjoelker wrote:

> >> [expr treats any string that looks like an operator as an operator,
> >> for example, expr '>' : '.*' fails]

> > The current behaviour of expr is allowed by POSIX (SUSv4, XCU 4
> > Utilities, expr). If the application passes '>', this is not a string
> > operand but an operator, even if that results in an invalid expression.
> > This is also documented in the man page.

> Yes. But I have reports that that NetBSD's and Linux's expr(1)
> both work as expected.

> > It would be a valid extension to allow such expressions but it is not
> > immediately clear how it would work. For example, should
> > expr \( = \)
> > compare two strings ("0") or return a single string ("=")? And should
> > expr \( + \)
> > return "+" or raise an error?

> It would be wise to take a look at more robust expr(1) implementations
> and try to keep compatibility.

For '<', your example may work. The expr from GNU coreutils 7.4
definitely fails your example for '(', ')' and '+'. In the case of '+',
they added a unary plus operator that takes the next argument as a
literal even if it looks like an operator, so "fixing" it would be ugly.
GNU expr also has "match", "substr", "index" and "length" operators.

Trying some more, GNU expr appears inconsistent and unpredictable: it
will accept strings that have the form of an operator as strings in some
cases but not all, and it is unclear why.

NetBSD's expr supports the "length" operator that we do not, but not
"match", "substr" or "index". It appears to try fairly hard to make wrong
input work anyway. For example, it will treat an initial "--" as a string
(rather than an end-of-options marker) if the next argument is not an
operator. It also gives yacc the alternative to treat any operator except
parentheses as a string instead.
Because of the one-token lookahead of a yacc parser, this does not,
however, allow it to recognize all possible expressions with such
operators as strings. For example, if the first two tokens are "length"
"<", it may be necessary to read all input to decide which of the two is
an operator (consider the case where the subsequent tokens are zero or
more colons).

NetBSD's approach will lead to inconsistent results if we ever need to
extend expr (such as with GNU's named operators). The extension will
change the meaning of some expressions in an unpredictable way.

One way to handle this is to add the GNU cruft; it is unlikely that
expr's syntax will be extended ever again given that it is mostly a
legacy tool. The GNU extensions are ugly, though.

If it is accepted that parentheses are always special (which GNU and
NetBSD expr appear to do, and which is one way to resolve expr \( = \)
ambiguity) and that there are no named operators or GNU unary "+", then
there are only binary operators and the first, third, fifth, ...
arguments excluding parentheses must be operands while the second,
fourth, sixth, ... must be operators.

> > The test utility is different in that POSIX specifies how a similar
> > ambiguity shall be resolved (for a limited set of cases).

A similar approach could be applied to expr (e.g. if there are three
arguments and the second is ":" then it is defined to be a matching
expression without going into the grammar). The assumption is that
expressions written without care for strings that look like operators
will be very simple (one operator only).

> > Oh, and if you want to find a string length in a shell script, why don't
> > you just use
> > ${#VAR}
> > (given that the string is in $VAR)? If you must use expr(1), do
> > expr \( "x$VAR" : '.*' \) - 1
> > as described in the man page.

> That's just a simple test case.
> In fact, I need not string length
> but evaluate regexp that has ()'s:

> read string < file
> expr -- "$string" : 'Key: \(.*\)'

read string < file
case $string in
"Key: "*) printf '%s\n' "${string#Key: }" ;;
*) echo false ;;
esac

(Of course, all the printf and false mess is likely unnecessary in a real
script, but this matches your command very closely.)

A limitation is that the case command and the #/##/%/%% substitutions
work with shell patterns which are weaker than even basic regular
expressions.

> Then $string starts with '>' this fails (and $string may start with '>').

It should only fail if $string is exactly '>' or '>='.

> I've found a workaround: expr -- "x$string" : 'xKey: \(.*\)'
> But that's only workaround, not good solution.

This is not really a workaround, it is the proper way to use expr. So
poor is the design of expr.

--
Jilles Tjoelker
_______________________________________________
freebsd-bugs@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-bugs
To unsubscribe, send any mail to "freebsd-bugs-unsubscr...@freebsd.org"
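
For illustration, the "x" prefix idiom discussed in the message can be wrapped in two small shell functions. This is only a sketch and not part of the original message. The function names key_value and str_length are hypothetical, and it assumes a POSIX sh together with an expr(1) that follows the POSIX grammar. The leading "--" is left out here on the assumption that it is unnecessary once the first operand starts with "x".

```shell
#!/bin/sh
# Hypothetical helper (name is an assumption, not from the PR):
# print whatever follows "Key: " in $1.
# Prefixing both the string and the pattern with "x" guarantees that the
# first operand can never be mistaken for an operator such as ">" or "(".
key_value() {
    expr "x$1" : 'xKey: \(.*\)'
}

# Hypothetical helper: string length via expr, using the form quoted from
# the man page. The "x" adds one character, so 1 is subtracted afterwards.
str_length() {
    expr \( "x$1" : '.*' \) - 1
}

# Example calls:
#   key_value 'Key: >'    prints ">"
#   str_length '>'        prints "1"
```

Note that expr exits with status 1 whenever the result is an empty string or 0, so callers of either function would probably want to test the printed output, not the exit status.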