On Wed, Jul 26, 2023 at 07:17:17PM +0200, tlaro...@polynum.com wrote:
> On Wed, Jul 26, 2023 at 06:32:15PM +0200, Martin Husemann wrote:
> > On Wed, Jul 26, 2023 at 12:19:39PM -0400, Mouse wrote:
> > > > $ export LC_CTYPE=fr_FR.ISO8859-15
> > >
> > > > $ echo "éé" | sed 's/é/\é/g'
> > > > sed: 1: "s/é/\é/g": RE error: trailing backslash (\)
> > >
> > > I agree that's broken.
> > >
> > > > Since, to my knowledge, we do not support anything via iconv or
> > > > whatever, shouldn't we assume simply a string of bytes \`a la C, that
> > > > is:
> > >
> > > Seems to me there's a deeper problem. Even if something like iconv
> > > _were_ available, fr_FR.ISO8859-15 is a single-octet character set, so
> > >
> > > > - (void) setlocale(LC_ALL, "");
> > > > + (void) setlocale(LC_ALL, "POSIX");
> > >
> > > should, it seems to me, make no difference. Am I misunderstanding?
> >
> > Indeed - and it only does on architectures where char == signed char:
>
> Very good catch, indeed.
>
> And this is a regression vs 9.3 and I suspect the main difference is the
> setlocale(3)---that allows not to solve, but to circumvent a more deeper
> problem.
>
> PR sent as bin/57544
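
For anyone following the "char == signed char" remark above, here is a
minimal sketch of that general class of problem. It is not the diff
attached to bin/57544, and the helper names (byte_plain, byte_safe,
is_high_byte) are made up for illustration. In ISO8859-15, é is the
single byte 0xE9; read through a plain char where char is signed, it
becomes a negative int, which is unsafe as a table index or as an
argument to the <ctype.h> functions.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>

/* Hypothetical helper: the byte as seen through plain char.
 * Negative for bytes >= 0x80 on platforms where char is signed. */
int byte_plain(const char *s, size_t i)
{
    return s[i];
}

/* Hypothetical helper: the same byte normalised to 0..UCHAR_MAX,
 * whatever the signedness of plain char. */
int byte_safe(const char *s, size_t i)
{
    return (unsigned char)s[i];
}

/* Hypothetical classifier: only the normalised value is a valid
 * index into a per-byte table or a valid <ctype.h> argument. */
int is_high_byte(const char *s, size_t i)
{
    int c = byte_safe(s, i);

    assert(c >= 0 && c <= UCHAR_MAX);
    return c >= 0x80;
}
```

With s pointing at the byte 0xE9, byte_safe(s, 0) gives 233 on every
platform, while byte_plain(s, 0) gives 233 only where char is unsigned
and a negative value where it is signed (assuming the usual 8-bit char).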

RVP has spotted the culprit (for this one; the whole code would need a
review for similar problems in other uses and in the interaction with
the locales).

The amended diff, along with more explanations (and caveats), has been
put in bin/57544. The correct behavior was verified by compiling libc
with this diff and statically linking sed(1) against the amended libc.
-- 
Thierry Laronde <tlaronde +AT+ polynum +dot+ com>
http://www.kergis.com/
http://kertex.kergis.com/
Key fingerprint = 0FF7 E906 FBAF FE95 FD89 250D 52B1 AE95 6006 F40C