Re: From wchar_t to char32_t

2023-07-11 Thread Paul Eggert
On 7/11/23 15:32, Bruno Haible wrote: You are looking at GB18030. GB18030 and BIG5-HKSCS are completely unrelated. Ouch! Thanks for explaining.

mbsrtoc32s, mbsnrtoc32s: Small optimization

2023-07-11 Thread Bruno Haible
Another small optimization of the same kind: 2023-07-11 Bruno Haible mbsrtoc32s, mbsnrtoc32s: Small optimization. * lib/mbsrtoc32s.c (USES_C32): Set to 0 when the module 'mbrtoc32-regular' is in use. * lib/mbsnrtoc32s.c (USES_C32): Likewise. diff --git a/lib/m

Re: From wchar_t to char32_t

2023-07-11 Thread Bruno Haible
Paul Eggert wrote: > >* The locale encoding is BIG5-HKSCS, e.g. on a glibc system the > > zh_HK.BIG5-HKSCS locale. > > > >* The input is one of the 4 characters in that encoding that map to > > a sequence of two Unicode characters: > > > > input maps to > >

Re: mbcel module for Gnulib?

2023-07-11 Thread Bruno Haible
[Removing diffutils-devel from CC.] Paul Eggert wrote: > However, mbiter's generality had a performance penalty. > > Some of the performance penalty is due to Gnulib's mbrtoc32 module > replacing mbrtoc32 on glibc. As I understand it, this is due to glibc's > mishandling of the C locale (it tre

Re: From wchar_t to char32_t

2023-07-11 Thread Paul Eggert
On 7/2/23 13:18, Bruno Haible wrote: Paul Eggert wrote: When can we get (size_t) -3 in a real-world system? It can/could occur if all of the following conditions are met: * The locale encoding is BIG5-HKSCS, e.g. on a glibc system the zh_HK.BIG5-HKSCS locale. * The input is on

mbiter, mbuiter, mbswidth: Add benchmarks

2023-07-11 Thread Bruno Haible
These patches add benchmarks for the mbiter, mbuiter, mbswidth modules. To use them: $ ./gnulib-tool --create-testdir --dir=../testdir1 --single-configure --symlink \ mbswidth-bench-tests mbiter-bench-tests mbuiter-bench-tests $ cd ../testdir1 $ ./configure $ make $ valgrind --too

tests: Create new file bench.h

2023-07-11 Thread Bruno Haible
This patch moves generic benchmarking code into a separate file, to make it easy to reuse it in new benchmarks. 2023-07-11 Bruno Haible tests: Create new file bench.h. * tests/bench.h: New file, extracted from tests/bench-digest.h. * tests/bench-digest.h: Include it.

Re: argp test failure: test-argp-2.sh

2023-07-11 Thread Bruno Haible
Andrew Schulman wrote: > So hooray for automated testing! It found a problem. And that wasn't the only bug that this unit test found. A bigger one was reported in [1] and fixed in [2]. Bruno [1] https://lists.gnu.org/archive/html/bug-gnulib/2020-03/msg00085.html [2] https://lists.gnu.org/archive

Re: new module mbrtoc32-regular

2023-07-11 Thread Bruno Haible
Paul Eggert wrote: > > DEFINITION: We call an mbrtoc32 function _regular_ if > >- It never returns (size_t)-3. > >- When it returns < (size_t)-2, the mbstate_t is in the initial state. > > "the initial state" -> "an initial state". But even with that change > isn't the second part of this

Fix build errors on Linux/hppa

2023-07-11 Thread Bruno Haible
On a Linux/hppa machine (with Debian 12), I get build errors due to make: /bin/bash: Argument list too long errors; see the attached log file. On this platform, the maximum command line length appears to be around 3056 bytes. With the recipes from [1], I get: $ getconf ARG_MAX 2097152 $ /bin/ec

[PATCH] quotearg: update Solaris-related comments

2023-07-11 Thread Paul Eggert
* doc/solaris-versions: Modernize. * lib/quotearg.c: Update comments. --- ChangeLog| 6 ++ doc/solaris-versions | 24 +--- lib/quotearg.c | 7 ++- 3 files changed, 21 insertions(+), 16 deletions(-) diff --git a/ChangeLog b/ChangeLog index 41fa1a19b0

Re: argp test failure: test-argp-2.sh

2023-07-11 Thread Andrew Schulman
> Hi, > > Andrew Schulman wrote: > > I'm building argp for Cygwin. The build succeeds, and all of the tests > > pass except for one, test-argp-2.sh. > > For me, on Cygwin, this test passes. > > > The contents of test-suite.log are below. > > ... > > The error is pretty trivial, and I think it's

Re: new module mbrtoc32-regular

2023-07-11 Thread Paul Eggert
On 2023-07-10 15:45, Bruno Haible wrote: DEFINITION: We call an mbrtoc32 function _regular_ if - It never returns (size_t)-3. - When it returns < (size_t)-2, the mbstate_t is in the initial state. "the initial state" -> "an initial state". But even with that change isn't the second part
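
To make the return values in this definition concrete, here is a minimal sketch of a decoding loop over the standard C11 mbrtoc32. The helper name count_c32 is hypothetical and is not part of Gnulib; the sketch only illustrates which branch a regular mbrtoc32 would make unnecessary.

```c
#include <stddef.h>
#include <string.h>
#include <uchar.h>
#include <wchar.h>

/* Hypothetical helper, not a Gnulib function: count the char32_t values
   obtained by decoding the NUL-terminated multibyte string S in the
   current locale.  Return (size_t) -1 if S contains an invalid or
   incomplete multibyte sequence.  */
size_t
count_c32 (const char *s)
{
  mbstate_t state;
  memset (&state, 0, sizeof state);   /* an initial conversion state */

  size_t remaining = strlen (s);
  size_t count = 0;

  while (remaining > 0)
    {
      char32_t c;
      size_t ret = mbrtoc32 (&c, s, remaining, &state);

      if (ret == (size_t) -1 || ret == (size_t) -2)
        return (size_t) -1;           /* invalid or incomplete input */

      if (ret == (size_t) -3)
        {
          /* Another char32_t was produced from bytes consumed earlier;
             no input is consumed now.  With a regular mbrtoc32 this
             branch is never taken.  */
          count++;
          continue;
        }

      if (ret == 0)
        break;                        /* decoded a null character */

      count++;
      s += ret;
      remaining -= ret;
    }
  return count;
}
```

In the default "C" locale, count_c32 ("abc") counts one char32_t per ASCII byte. A caller that can rely on a regular mbrtoc32 could drop the (size_t) -3 branch entirely.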

[PATCH] More than one initial mbstate_t

2023-07-11 Thread Paul Eggert
In commentary, say "an initial state" rather than "the initial state" for mbstate_t, as it is possible and indeed common for there to be more than one initial state. POSIX routinely says "an initial state". --- lib/c32srtombs-state.c| 2 +- lib/mbfile.h | 2 +- lib/mbiter
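
As a small illustration of the wording, the C standard guarantees that a zero-valued mbstate_t describes an initial conversion state, without requiring it to be the only one. The sketch below checks this with mbsinit; the function name zeroed_state_is_initial is hypothetical and does not come from the patch.

```c
#include <string.h>
#include <wchar.h>

/* Hypothetical check, not taken from the patch: return nonzero if a
   zero-filled mbstate_t is reported as an initial conversion state.  */
int
zeroed_state_is_initial (void)
{
  mbstate_t state;
  memset (&state, 0, sizeof state);   /* one way to get an initial state */
  return mbsinit (&state) != 0;
}
```

Other bit patterns may also satisfy mbsinit on a given platform, which is why "an initial state" is the more accurate phrase.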

Re: From wchar_t to char32_t

2023-07-11 Thread Paul Eggert
On 2023-07-11 01:24, Bruno Haible wrote: Based on the comments in gnulib/lib/mbrtoc16.c, I think it should better clear the first 24, not 12, bytes of the struct. Otherwise it can be in a state where mbsinit() returns true but the mbrto* functions have undefined behaviour. For mbcel all all tha

Re: From wchar_t to char32_t

2023-07-11 Thread Bruno Haible
Paul Eggert wrote: > We can improve on that. I installed the attached two performance tweaks; > the second tweak cuts that initialization from 128 down to at most 12 > bytes on those platforms. Based on the comments in gnulib/lib/mbrtoc16.c, I think it should better clear the first 24, not 12, b

Re: From wchar_t to char32_t

2023-07-11 Thread Paul Eggert
On 2023-07-10 07:58, Bruno Haible wrote: - The rationale for defining and initializing the mbstate_t at the function scope was that on BSD and macOS systems, an mbstate_t is 128 bytes large, We can improve on that. I installed the attached two performance tweaks; the second tweak cuts