Chet Ramey (<chet.ra...@case.edu>) wrote:

> On 11/23/18 6:09 PM, Bize Ma wrote:
>
> > Bash Version: 4.4
> > Patch Level: 12
> > Release Status: release
> >
> > Description:
> >
> > Bash is removing characters not explicitly listed in a bracket
> > expression (character range).
> > In this example, it is removing digits from other languages.
>
> What is your locale?

The locale used was en_US.utf-8, but the same thing happens in 459 of the 868 locales available under Debian (not in C, for example). In all of the affected locales except one, setting either LC_ALL=$loc or LC_COLLATE=$loc did the same. The exception is zh_CN.gb18030.

But IMO locale collation should not be used for an explicit list.

I have been made aware that there is a `cstart = cend = FOLD (cstart);` statement inside the `sm_loop.c` file that converts many individual characters into ranges. If that understanding is correct, that is the source of the difference from other shells.

My understanding is that a collation table *must have a "total order"*, in fact a strict total order. If two characters `a` and `b` could sort as equal, the order would fail to confirm that a character is absent from the list. Consider the characters `a`, `b` and `c`: if `a` and `b` sort as equal, a sorted list in which we find `a` followed by `c` doesn't confirm that `b` is absent, as the order could just as well be `b a c`.

Under a strict total order, there cannot be any character other than `a` in the range `a-a`, and using the range `a-a` is equivalent to (just slower and more complex than) using the single character `a`.

If this is not the case, the error is in the collation table, not in using single (faster) characters. And IMO it is the collation table that should be updated.
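
To make the argument about `a-a` concrete, here is a minimal sketch in Python. It is not Bash's code and does not use any real locale: the two weight tables and the helper names `in_range` and `matches_single` are made up for illustration. It only shows that a range test based on collation weights agrees with a plain equality test when the order is strict, and matches an extra character as soon as two characters share a weight.

```python
# Hypothetical collation weights, invented for this example.
# They are not taken from any real locale definition.
WEIGHTS_STRICT = {"a": 1, "b": 2, "c": 3}  # strict total order
WEIGHTS_TIED = {"a": 1, "b": 1, "c": 2}    # `a` and `b` sort as equal


def in_range(ch, start, end, weights):
    """Return True if ch collates between start and end (inclusive)."""
    return weights[start] <= weights[ch] <= weights[end]


def matches_single(ch, listed):
    """Return True if ch is exactly the listed character."""
    return ch == listed


def extra_matches(weights):
    """Characters matched by the range a-a that are not `a` itself."""
    return [
        ch
        for ch in weights
        if in_range(ch, "a", "a", weights) and not matches_single(ch, "a")
    ]


if __name__ == "__main__":
    print("strict:", extra_matches(WEIGHTS_STRICT))  # strict: []
    print("tied:  ", extra_matches(WEIGHTS_TIED))    # tied:   ['b']
```

With the strict table, the range `a-a` matches only `a`, so treating a single character as a range changes nothing but the cost. With the tied table, the same range also matches `b`, which is the kind of unlisted match described in the report.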