Date: Sun, 24 Jun 2018 19:09:58 +0200
From: Rhialto <rhia...@falu.nl>
Message-ID: <20180624170958.gj8...@falu.nl>

| Are we to assume that NetBSD's sh(1) manual page is correct?

Well, yes and no...

| Since that clearly says that your example above should not match.

Actually, it doesn't - it just kind of slides by this case... That is, it
makes no mention of what happens if characters inside [ ] are quoted
(partly because I don't much like the quoting solution, and never thought
the ordering method was hard to get right...)

The man page should probably be fixed to be more precise - there are all
kinds of details it omits. But there are also limits on how much people
are willing to read!

| Pretty much ~always, descriptions of character classes (including
| re_format(7)) have included words to the effect of
|
|   To include a ``]'' in a character class, make it the first character
|   listed (after the ``!'', if any). To include a ``-'', make it the first
|   or last character listed.

Yes, that's how this was originally designed in the very earliest versions
of re's and glob matching - from 5th edition or earlier (that's as far back
as I go.)

| so the example should always have been
|   case - in [az-]) ...

Yes, of course, that should work, and there are tests for that kind of case
as well (and [-az] of course). Those ones work. So does the simple [a\-z]
case, if you accept that it is supposed to be a match of the 3 chars listed:
a, - and z. This isn't anything new, it has been like that in the NetBSD sh
for a long time (probably goes back to the original ash) and works the same
way in every other sh I can find to test.

| if you want this to match. If your version matches, I'd call that a
| long-standing bug.

That may be. I suspect this happened because the original Bourne sh (which
had to run on non split I-D pdp-11's) handled parsing by reading the input
and, for any quoted (ascii only of course) char, simply set the top bit.
Then it would compare against the operator chars ('<' etc) or the pattern
magic ('*' etc) and with the top bit set, the chars were not equal, so not
magic, just normal chars. About the last thing it did was clear the top
bits before handing off to wherever the data was to go next (it would also
ignore that bit in cases where it was doing a comparison where no magic was
expected or possible.)

Whether the quoting was intended to affect things the way it did, or
whether that was just an accident, is immaterial now. There's no question
that it has come to be a relied upon feature, and is not going to go away.

| Strangely, the Ex Reference Manual
| /usr/share/doc/reference/ref1/ex/reference.ps.gz on page 13 claims that
| a backslash SHOULD be used within [] to escape characters,

Yes, I remember that, and never understood why. I doubt even Bill Joy
would remember now.

| I found a V7 system (here: [...]
| I think that this shows that the trailing - isn't quite managed yet;

Yes, the code that does that from the original Bourne shell was quoted
on the austin-group list (posix list) and it was obvious what the bug
was there...

| manpage indeed doesn't include the claim above phrasing about - and ],

It was intended though, I suspect. It also did not properly handle
[]xyz] to match a ']' in a [ ] expression. Maybe those were bugs, or
maybe the code to handle it was omitted as a space saving measure (since
the quoting method "just worked".)

| You would think that at the time the "- ] blurb" was added,

That's actually much older, older than Bourne shells, it is just that
the original Bourne sh had LOTS of bugs (or perhaps this was omitted,
someone could ask Steve Bourne if discovering the answer to this
is important.)

kre
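
For anyone who wants to try the bracket cases discussed above, here is a
minimal sketch that should run in any POSIX-style sh. The variable names are
invented for this illustration, and the results noted in the comments are
what the rules described in this thread lead one to expect, not output
captured from any particular shell.

```sh
# Each case statement tests one string against one bracket expression
# and records "match" or "no match" in a (hypothetical) variable.

# '-' listed last is a literal member of the class: expected to match.
case - in [az-]) dash_last=match ;; *) dash_last='no match' ;; esac

# '-' listed first is literal too: expected to match.
case - in [-az]) dash_first=match ;; *) dash_first='no match' ;; esac

# A quoted '-' in the middle: the class is the three chars a, - and z.
case - in [a\-z]) dash_quoted=match ;; *) dash_quoted='no match' ;; esac

# With the '-' quoted there is no range, so 'b' should not be a member.
case b in [a\-z]) b_quoted=match ;; *) b_quoted='no match' ;; esac

# An unquoted '-' in the middle forms the range a..z, so 'b' is a member.
case b in [a-z]) b_range=match ;; *) b_range='no match' ;; esac

# ']' listed first is a literal member of the class: expected to match.
case ']' in []xyz]) bracket_first=match ;; *) bracket_first='no match' ;; esac

printf '%s\n' \
    "- against [az-]:  $dash_last" \
    "- against [-az]:  $dash_first" \
    "- against [a\\-z]: $dash_quoted" \
    "b against [a\\-z]: $b_quoted" \
    "b against [a-z]:  $b_range" \
    "] against []xyz]: $bracket_first"
```

A shell that reports something different for the [az-], [-az] or []xyz]
lines would be showing the kind of bug attributed above to the original
Bourne sh.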