Date: Tue, 31 Jan 2023 02:29:02 +0000
From: Taylor R Campbell <campbell+netbsd-tech-userle...@netbsd.org>
Message-ID: <20230131022904.a3a4b60...@jupiter.mumble.net>

| In sh(1) pathname expansion, the pattern doesn't constrain ordering,
| only inclusion criteria. It appears that NetBSD's sh(1) always sorts
| lexicographically with strcmp(3), however.
|
| 1. Does POSIX sh guarantee strcmp(3) lexicographic ordering?

What POSIX says is (from XCU 2.13.3):

    those filenames and pathnames, sorted according to the collating
    sequence in effect in the current locale. If this collating
    sequence does not have a total ordering of all characters (see
    XBD Section 7.3.2, on page 127), any filenames or pathnames that
    collate equally shall be further compared byte-by-byte using the
    collating sequence for the POSIX locale.

So, "no".

That quote is from a not publicly released draft of the next version
(now expected to be 2024 I believe, by the time everything is done),
so the section number (perhaps) and page number (certainly) will not
match anything. The reference in the quote is to the section which
defines LC_COLLATE. "XBD" is the "Basic Definitions" section of the
standard (which contains definitions of all kinds of things, plus
specs of all required header files). I think the actual section which
contains that quote is new in the draft; lots of text relating to
pattern matching has been revised. XCU is "Commands and Utilities",
XCU 2 is the shell.

| 2. Does NetBSD sh(1) guarantee strcmp(3) lexicographic ordering?

That's what it does, because sh(1) (along with many NetBSD utilities)
really knows nothing about locales. If anyone would like to work on
that, feel free - but note that it is a minefield of contradictions.
Almost nothing about the way that the charset parts of locales are
defined makes much sense at all, other than as a way to allow users to
select a (single) character encoding and use that for everything. The
Plan 9 solution was much better.

| 3. Should the sh(1) man page be amended to specify the order?

Probably not, because it isn't guaranteed not to change. That is,
unless you're using the C (aka POSIX) locale (which is what you get
when you're not explicitly using anything) - for that locale, we
are doing what is required.

kre
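
For anyone who wants to see the C locale case for themselves, here is a
minimal sketch, not a statement of how any particular sh is implemented.
It assumes mktemp(1) supports -d and that whatever sh is first in PATH
picks up LC_ALL from its environment at startup. The function name
glob_order_c and the three file names are made up for illustration.

```sh
#!/bin/sh
# glob_order_c DIR (hypothetical helper): print the names matched by "*"
# in DIR, one per line, in the order a child sh expands them when it is
# started with LC_ALL=C.
glob_order_c() {
    LC_ALL=C sh -c 'cd "$1" && for f in *; do printf "%s\n" "$f"; done' sh "$1"
}

# Scratch directory and file names are illustrative only.
demo_dir=$(mktemp -d) || exit 1
touch "$demo_dir/Zebra" "$demo_dir/apple" "$demo_dir/Mango"

glob_order_c "$demo_dir"

rm -rf "$demo_dir"
```

In the C locale the names come out in byte order: Mango, Zebra, apple,
since 'M' (0x4D) and 'Z' (0x5A) sort before 'a' (0x61). A shell that
honours LC_COLLATE may order them differently in some other locale,
which is the reason the man page should not promise more than this.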