Hi Brian,

On Feb 13 20:37, Corinna Vinschen via Cygwin wrote:
> On Feb 13 12:03, Brian Inglis via Cygwin wrote:
> > On 2023-02-13 10:43, ASSI via Cygwin wrote:
> > > Corinna Vinschen via Cygwin writes:
> > > > Can you give me an example?  I'm a bit puzzled because fnmatch as well
> > > > as glob in Cygwin support native characters.
> >
> > But not locale dependent named character classes like regexp in paths.
>
> I checked the dash code of current dash git, and while its internal glob
> implementation supports character classes, they are not localized, using
> standard singlebyte functions isalnum, isalpha, etc. under the hood.
>
> So, yeah, what you say further down this mail... looks like dash
> supports locale dependent character classes only with glibc.
> [...]
> Either way, I don't care much for what a certain application provides by
> itself.  I'm talking about our libc, that is Cygwin, and what it
> provides to processes calling its implementations of regcomp/regexec,
> glob and fnmatch.
>
> All these functions have been taken from FreeBSD and all three suffer
> shortcomings:
>
> - regcomp/regexec supports POSIX named character classes, collating
>   symbols, and equivalence class expressions, but all of them only work
>   for ASCII chars.
>
> - fnmatch and glob support neither of named character classes,
>   collating symbols, and equivalence class expressions.
>
> I checked the upstream code in FreeBSD, OpenBSD and NetBSD and none of
> these functions are improved to support locales (regcomp) or any of
> the character classes stuff (fnmatch/glob).
>
> So, if we want to add this support to Cygwin (and thus, to all
> applications calling the libc implementation of these functions),
> quite a bit of work is required.
>
> Being able to fetch the implementation from some other source
> would reduce the effort enormously :}

I took the liberty of adding [:<class>:] support to Cygwin's fnmatch(3)
and glob(3) functions.  They also recognize collating symbols [.<coll>.]
and equivalence class expressions [=<equiv>=], but the latter two are
not implemented yet; fnmatch and glob simply skip them in the pattern.

Given that glob and fnmatch use wide characters internally, the support
for character classes is internationalized by default, albeit in a
slightly different way than in glibc.  The classes a Unicode character
belongs to are not locale dependent in Cygwin/newlib.  All characters
have their classes assigned all the time, so, for instance, the German
character 'ä' is lower and alpha even in the en_US.utf8 locale.

The Cygwin test release 3.5.0-0.174.gd6d4436145b8, which is currently
building, contains the new code.  Would you mind building a dash for
testing so we can see if and how it works?

Thanks,
Corinna
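
P.S.  In case it helps with testing, here is a minimal sketch of how the
named character classes could be exercised through fnmatch(3) directly.
The helper names class_match() and demo_classes() are made up for this
mail and are not part of Cygwin.  The ASCII cases should behave the same
on any libc that implements POSIX character classes in fnmatch; what the
'ä' case returns depends on the libc and the locale in effect, so I give
no expected result for it.

```c
#include <fnmatch.h>
#include <locale.h>
#include <stdio.h>

/* Hypothetical helper: 1 if name matches pattern, 0 if it does not,
   -1 if fnmatch reports an error. */
int class_match(const char *pattern, const char *name)
{
    int rc = fnmatch(pattern, name, 0);

    if (rc == 0)
        return 1;
    if (rc == FNM_NOMATCH)
        return 0;
    return -1;
}

/* Hypothetical driver: prints the result of a few class patterns. */
void demo_classes(void)
{
    /* Pick up the locale from the environment, e.g. en_US.utf8. */
    if (setlocale(LC_ALL, "") == NULL)
        fprintf(stderr, "setlocale failed, staying in the C locale\n");

    /* Plain ASCII input. */
    printf("[[:alpha:]]* vs abc: %d\n",
           class_match("[[:alpha:]]*", "abc"));
    printf("[[:digit:]]* vs abc: %d\n",
           class_match("[[:digit:]]*", "abc"));

    /* UTF-8 encoding of 'ä' (U+00E4); the result depends on the libc
       and the active locale. */
    printf("[[:lower:]] vs a-umlaut: %d\n",
           class_match("[[:lower:]]", "\303\244"));
}
```

Calling demo_classes() from a small main() and running it once under
LANG=C and once under LANG=en_US.utf8 should show whether the class
lookup changes with the locale.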