Date: Sun, 24 Jun 2018 13:19:00 +0200
From: tlaro...@polynum.com
Message-ID: <20180624111859.ga...@polynum.com>

First, thanks for reading the message, looking at the tests, and sending the comments/question - this is exactly the kind of response I was hoping for.

Aside from fixing the NetBSD sh (correctly) this can also assist in getting the POSIX spec done properly. But in that regard, do note that "properly" in this context is not necessarily "rational": it has to take into account bugs (or design issues) in ancient Bourne shells that have faithfully been copied into more modern shells (because no-one can know what scripts might be depending upon the behaviour that has been implemented).

| > [97] var="[:alpha:]"; case "[" in (["$var"]) printf M;; (*) printf X;; esac
| > [97] Expected output 'M', received 'X'
| >
|
| Can you explain why you expect success ("M") in this case?

I can try...

| I expect:
|
| - Substitution of the value of $var in (["$var"]) resulting in:
| (["[:alpha:]"]);

Yes.

| - [Suppression of the double quotes?

This is, of course, the heart of the matter...

In POSIX, quote removal is explicitly not done on case patterns. That is, the expansions that are done are listed, and quote removal is not one of them. So...

| But this doesn't change anything in
| the bracket expression];

It would: taking the current text literally, an input string which was a double quote (as in '"' or \") would match, as the double quote character would appear in the [ ] expression in the pattern.

Of course that is clearly absurd, and a bug report on the POSIX text was submitted a while ago to include quote removal in the list of operations to perform on case patterns.

Unfortunately, it isn't that simple, as just doing quote removal on patterns would cause

	case x in ("*") echo match;; esac

to match, as the quote removal would leave the pattern being just an asterisk, which matches anything, which is not what is supposed to happen.

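A minimal sketch of that last point, for trying in a POSIX-style sh such as bash or dash. The helper name quoted_star is made up for illustration and is not part of the test suite being discussed. It shows the behaviour that the specification has to preserve: a quoted asterisk in a case pattern matches only a literal asterisk.

```sh
# Hypothetical helper: report whether word $1 matches the quoted pattern "*".
quoted_star() {
    case "$1" in
        ("*") echo match ;;     # quoted, so only a literal asterisk matches
        (*)   echo nomatch ;;   # unquoted, so this arm matches anything else
    esac
}

quoted_star x      # prints "nomatch"
quoted_star '*'    # prints "match"
```
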
So the current proposed new text (which had been accepted, but is now being discussed again, and will be changed) also specified that, along with quote removal, any "pattern magic" characters in the quoted part of the pattern would be \ escaped so they remained literal. So "quote removal" of the "*" would produce \* not *, and the pattern matching would look for a literal asterisk rather than anything - which is what is wanted.

But it turns out that this gets really messy, and makes case pattern processing different from filename expansion (glob) and variable expansion. While those cases do specify quote removal, it doesn't happen until after the pattern is used; that is, in

	ls x"*"y

we want a listing of the file named x-asterisk-y, not all files with names starting x and ending y.

That means that one way or another, quoting needs to be considered when matching patterns, and handled properly rather than the quotes just removed. How a shell chooses to implement that is up to the shell of course.

Beyond that, when we get inside [ ] expressions, things get even messier, as POSIX (mostly to save paper, I think...) simply refers to the regular expression definition of how those are processed (with the exception of substituting ! in sh for the ^ in REs as the character to invert the match - because ^ was the "pipe" symbol in early shells - but that's not relevant here).

But the effect of that way of doing the specification is that the \ which escapes magic characters in regular expressions does not work inside [ ] (and the text is explicit about that - and correct), which means the technique in the proposed revised POSIX text of replacing " and ' quoting with \ doesn't work at all as intended inside [ ], which is the case in test 97.

But that can't be right either, as then

	case - in [a\-z]) ...

would not match, and whether you believe it should or not, matching there is what all shells have done forever (that is, the quoted - is a literal minus/hyphen/dash (whatever you prefer to call it) and not the range indicator, whereas in a regular expression that would be an 'a' and a range with all chars from '\' to 'z').

| But then "[" is not an alpha, so it correctly fails...
|
| Could you explain why you think otherwise?

The simple explanation of this is that because the '[' in the '[:alpha:]' is quoted, it is not a '[:' character class opening sequence, but a literal opening square bracket followed by a colon, an 'a', an 'l' ... which means that we have a bracket expression which includes a '[' character as one of its members, and so the test '[' matches.

But as the long explanation above indicates, this is by no means a clear cut case, and more discussion is a good thing. Given that we need to retain some compatibility with shells of the past (and POSIX definitely wants that, and except where POSIX is stupid, we want POSIX compat), this is one of the issues where we want to work out what we should do.

kre
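
For anyone wanting to compare shells on the bracket-expression cases discussed in this message, here is a small sketch. The helper names dash_in_brackets and test97 are made up for illustration. The first helper exercises the long-standing behaviour described above, where a backslash-quoted - inside [ ] is a literal character rather than a range indicator. The second simply reruns test 97 and prints whichever result the shell gives; no expected result is stated for it, since that is exactly the point under discussion.

```sh
# Hypothetical helper: does word $1 match the bracket expression [a\-z]?
# With the - quoted, the set is the three characters 'a', '-' and 'z'.
dash_in_brackets() {
    case "$1" in
        ([a\-z]) echo match ;;
        (*)      echo nomatch ;;
    esac
}

# Hypothetical helper: rerun test 97 and print M or X as the shell decides.
test97() {
    var="[:alpha:]"
    case "[" in
        (["$var"]) echo M ;;
        (*)        echo X ;;
    esac
}

dash_in_brackets -    # prints "match": the quoted - is a member of the set
dash_in_brackets m    # prints "nomatch": no a-z range was formed
test97                # result depends on how the shell treats the quoted [:
```
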