Date: Mon, 25 Jun 2018 17:30:25 +0200
From: tlaro...@polynum.com
Message-ID: <20180625153025.ga2...@polynum.com>

  | About the POSIX description "2.13.1 Patterns Matching a Single
  | Character", have the draft assigned a precedence between "XBD RE Bracket
  | Expression" and the shell behavior, i.e. "XBD RE Bracket"
  | taking precedence over the shell behavior?

There is some language, and more proposed to be added, which is designed
to try to make the 3 users (regular expressions, sh glob, and fnmatch(3)),
which are all different, able to share the same specification and still
be correct. As it is currently, it fails. Fixing it is still a work
in progress. Unless it becomes possible to convince the people who really
count to simply split the specification and describe each of them
separately, it is going to be difficult to get it right, IMO.

Do remember that the aim of POSIX is to specify what exists, and works,
at least in some systems (what works in NetBSD does not have all that
much influence...). That is, POSIX is not a legislature; their job is
not to tell us what we must do, but to document what works so that users
can write portable code. Of course, they don't cater to every weirdness
in every system, so sometimes something is specified which is simply
different from what we're used to. Usually when that happens we just
adapt - other systems do it the other way, and we want to be as compatible
as possible (fewer local patches) - and sometimes we can convince them
that they are specifying sub-optimal behaviour, and even if they can't
simply specify the better way, they can at least allow it to co-exist.

  | Because one might interpret the reference to "XBD RE Bracket" as voiding
  | the quoting dance inside a bracket expression,

It isn't intended to do that. The current proposed new text tries to
fix it, but it is not right yet.

  | since in a RE the special characters loose their special meaning,

They do in glob and fnmatch too - though different special characters are
involved - and there is no question but that [?]
is a glob expression (RE too, but there it is not a surprise) that matches
a '?' and nothing else, just as [.] is an RE (and glob) expression that
matches a period. Neither ever matches an arbitrary character.

  | and one could argue, it seems,
  | that this is the case for the double quotes too?

Yes, it would be, if you got the double quotes that far.

It all gets very messy, as in something like

    ["${var}"]

the quotes are not really literal characters, they're recognised by the
lexer, and used to quote the characters in the expansion of ${var} ...

POSIX actually specifies that the quotes remain as is, until they are
removed later, but I think that's a bug - it is a way that appears nice
to specify how quoting works, but no shell I'm aware of actually implements
it like that, and it causes all kinds of weird (and incorrect) corner cases,
like the one you mentioned.

But if that were true, it would mean that

    var='[:alpha:]'
    case '"' in
    ["${var}"]) echo match;;
    esac

would echo "match" and I doubt anyone expects (and nothing implements) that.

Handling that kind of thing is why the first attempted fix simply specified
that quote removal happen on case patterns. But that broke all kinds of
other things, so it became more complex to try to fix them ...

But then that ignored dealing with things like

    ls '*'*

to list all files with names starting with an asterisk. The spec is quite
clear that filename expansion happens before quote removal, so the quotes
are still there when the pattern is matched against the filenames, just
as they are in the case statement above. For ls it is easy to specify, as
the glob stuff that is not [ ] is quite different from REs, so it is all
specified separately, and it can just say "unquoted" and so on, and make
it clear that the '*' just means a literal *.

But with the way it is specified for [] using the RE spec for glob
expressions, it all gets ugly.

  | "When pattern matching is
  | used where shell quote removal is not performed..."

Yes, that is part of the attempt to make the same text work for all uses,
it is not great...

I really would not waste too much time on this - everyone agrees that
what is in the currently published POSIX spec in this area is incorrect.
The only debate relates to just what is the right way to fix it, so that
it actually describes the way things work (which, for something so badly
specified, is fairly consistent amongst shells.)

  | the POSIX wording,
  | as you have already explained, should be adjusted to the de facto
  | uses and not the reverse,

Yes, it should, and if we can work out how to do that in a way everyone
is happy with, we will.

Note that most of this is actually not all that hard (other than trying
to merge it all into the RE description) - but when you get to things like

    var='\??'
    case "$x" in
    ${var}) ...

it starts getting very messy indeed. If the ${var} there had been quoted
(as it was in the example that started all this) it is all easy - the chars
in the expansion are just literal chars, and that expression would (with
the quotes which are not there above) match if x is equal to var (that is
x='\??').

But as it is written, what it means is less clear, though the most common
view is that the pattern ($var) matches a question mark followed by any
other character (that is, the \ quotes the first ? and the second one is
the meta-char).

However, it is also clear that only \ works that way. If we had

    var='"?"?'

then it would match a string starting with a double quote, then any single
char, then another double quote, then any char. So something like

    "a"b

On the other hand, if given literally

    case "$x" in
    "?"?) ...

then it matches a 2 char sequence starting with a question mark.

Sh syntax is just wonderful... (in the original sense, full of wonder,
in that you stare at it, and wonder how did it get like that?).

kre
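
For anyone who wants to try the patterns discussed above in their own
shell, here is a minimal sketch. The function names are made up for
illustration and are not from any spec or shell. The comments give the
results described in the message above; the last call is deliberately left
without an expected result, since that is the case whose meaning is
"less clear" and may vary between shells.

```shell
#!/bin/sh
# Each helper prints "match" or "no match" for one pattern form.
# All function names here are hypothetical, chosen only for this sketch.

# [?] is a bracket expression containing just a literal question mark.
bracket_question() {
    case "$1" in
        [?]) echo match ;;
        *)   echo "no match" ;;
    esac
}

# Quoted expansion inside brackets: the quotes are consumed by the lexer,
# they are not members of the bracket expression.
bracket_quoted_value() {
    var='[:alpha:]'
    case "$1" in
        ["${var}"]) echo match ;;
        *)          echo "no match" ;;
    esac
}

# Literal pattern: the quoted ? is ordinary, the unquoted ? is a meta-char.
literal_pattern() {
    case "$1" in
        "?"?) echo match ;;
        *)    echo "no match" ;;
    esac
}

# Quoted expansion: every character of the value is taken literally.
quoted_expansion() {
    var='\??'
    case "$1" in
        "${var}") echo match ;;
        *)        echo "no match" ;;
    esac
}

# Unquoted expansion whose value contains double quotes: the quotes come
# from the expansion, so they are ordinary characters in the pattern.
quotes_in_value() {
    var='"?"?'
    case "$1" in
        ${var}) echo match ;;
        *)      echo "no match" ;;
    esac
}

# Unquoted expansion whose value contains a backslash: shell dependent.
backslash_in_value() {
    var='\??'
    case "$1" in
        ${var}) echo match ;;
        *)      echo "no match" ;;
    esac
}

bracket_question '?'       # match
bracket_question 'a'       # no match
bracket_quoted_value '"'   # no match
literal_pattern '?x'       # match
literal_pattern 'ab'       # no match
quoted_expansion '\??'     # match
quoted_expansion '?a'      # no match
quotes_in_value '"a"b'     # match
backslash_in_value '?a'    # check what your own shell prints here
```

Running the same script under several shells (for example sh, bash, and
ksh, where available) is a quick way to see how consistent the
implementations are on the first eight calls, and whether they agree on
the last one.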