
Taylor R Campbell Tue, 20 Aug 2024 13:04:52 -0700

Module Name:    src
Committed By:   riastradh
Date:           Tue Aug 20 20:04:45 UTC 2024


Modified Files:
        src/lib/libc/locale: c16rtomb.3 c32rtomb.3 c8rtomb.3

Log Message:
c8rtomb(3), c16rtomb(3), c32rtomb(3): Attempt to simplify language.


To generate a diff of this commit:
cvs rdiff -u -r1.9 -r1.10 src/lib/libc/locale/c16rtomb.3 \
    src/lib/libc/locale/c32rtomb.3
cvs rdiff -u -r1.7 -r1.8 src/lib/libc/locale/c8rtomb.3

Please note that diffs are not public domain; they are subject to the
copyright notices on the relevant files.

Modified files:

Index: src/lib/libc/locale/c16rtomb.3
diff -u src/lib/libc/locale/c16rtomb.3:1.9 src/lib/libc/locale/c16rtomb.3:1.10
--- src/lib/libc/locale/c16rtomb.3:1.9	Tue Aug 20 17:14:05 2024
+++ src/lib/libc/locale/c16rtomb.3	Tue Aug 20 20:04:45 2024
@@ -1,4 +1,4 @@
-.\"	$NetBSD: c16rtomb.3,v 1.9 2024/08/20 17:14:05 riastradh Exp $
+.\"	$NetBSD: c16rtomb.3,v 1.10 2024/08/20 20:04:45 riastradh Exp $
 .\"
 .\" Copyright (c) 2024 The NetBSD Foundation, Inc.
 .\" All rights reserved.
@@ -50,8 +50,8 @@
 The
 .Nm
 function decodes UTF-16 and converts it to multibyte characters in the
-current locale, keeping state so it can restart after incremental
-progress.
+current locale, keeping state to remember incremental progress if
+restarted.
 .Pp
 Each call to
 .Nm
@@ -69,27 +69,6 @@ or
 .Li (size_t)-1
 to denote error.
 .Pp
-Over successive calls to
-.Nm
-with the same state
-.Fa ps ,
-the sequence of
-.Fa c16
-values must be a well-formed UTF-16 code unit sequence, or an
-incomplete UTF-16 code unit sequence followed by null.
-If
-.Fa c16 ,
-when appended to the sequence of code units passed in previous calls,
-is not null and does not form a well-formed UTF-16 code unit sequence,
-then
-.Nm
-returns
-.Li (size_t)-1
-with
-.Xr errno 2
-set to
-.Er EILSEQ .
-.Pp
 If
 .Fa s
 is a null pointer, no output is stored, but the effects on
@@ -98,12 +77,12 @@ and the return value are unchanged.
 .Pp
 If
 .Fa c16
-is null,
+is zero,
 .Nm
 discards any pending incomplete UTF-16 code unit sequence in
 .Fa ps ,
 outputs a (possibly empty) shift sequence to restore the initial state
-followed by a null byte, and resets
+followed by a NUL byte, and resets
 .Fa ps
 to the initial conversion state.
 .Pp
@@ -117,13 +96,8 @@ object with static storage duration, dis
 .Vt mbstate_t
 objects
 .Po
-including those used by
-.Xr mbrtoc8 3 ,
-.Xr mbrtoc16 3 ,
-.Xr mbrtoc32 3 ,
-.Xr c8rtomb 3 ,
-and
-.Xr c32rtomb 3
+including those used by other functions such as
+.Xr mbrtoc16 3
 .Pc ,
 which is initialized at program startup to the initial conversion
 state.
@@ -173,12 +147,12 @@ which is a constant upper bound on the l
 .Sh ERRORS
 .Bl -tag -width Bq
 .It Bq Er EILSEQ
-The
 .Fa c16
-input sequence does not encode a Unicode scalar value in UTF-16.
+is invalid as the next code unit in the conversion state
+.Fa ps .
 .It Bq Er EILSEQ
-The Unicode scalar value requested cannot be encoded as a multibyte
-sequence in the current locale.
+The input cannot be encoded as a multibyte sequence in the current
+locale.
 .It Bq Er EIO
 An error occurred in loading the locale's character conversions.
 .El
@@ -220,12 +194,13 @@ function first appeared in
 .Nx 11.0 .
 .\"""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""
 .Sh BUGS
-The standard requires that a null code unit unconditionally reset the
-conversion state and output null:
+The standard requires that passing zero as
+.Fa c16
+unconditionally reset the conversion state and output a NUL byte:
 .Bd -filled -offset indent
 If
-.Fa c8
-is a null character, a null byte is stored, preceded by any shift
+.Fa c16
+is a null wide character, a null byte is stored, preceded by any shift
 sequence needed to restore the initial shift state; the resulting state
 described is the initial conversion state.
 .Ed
@@ -233,7 +208,7 @@ described is the initial conversion stat
 However, some implementations such as
 .Fx 14.0 ,
 .Ox 7.4 ,
-and glibc 2.36 ignore this clause and, if the null was preceded by an
+and glibc 2.36 ignore this clause and, if the zero was preceded by an
 incomplete UTF-16 code unit sequence, fail with
 .Er EILSEQ
 instead.
Index: src/lib/libc/locale/c32rtomb.3
diff -u src/lib/libc/locale/c32rtomb.3:1.9 src/lib/libc/locale/c32rtomb.3:1.10
--- src/lib/libc/locale/c32rtomb.3:1.9	Tue Aug 20 17:14:05 2024
+++ src/lib/libc/locale/c32rtomb.3	Tue Aug 20 20:04:45 2024
@@ -1,4 +1,4 @@
-.\"	$NetBSD: c32rtomb.3,v 1.9 2024/08/20 17:14:05 riastradh Exp $
+.\"	$NetBSD: c32rtomb.3,v 1.10 2024/08/20 20:04:45 riastradh Exp $
 .\"
 .\" Copyright (c) 2024 The NetBSD Foundation, Inc.
 .\" All rights reserved.
@@ -50,8 +50,8 @@
 The
 .Nm
 function converts Unicode scalar values to multibyte characters in the
-current locale, keeping state so it can restart after incremental
-progress.
+current locale, keeping state to remember incremental progress if
+restarted.
 .Pp
 Each call to
 .Nm
@@ -71,9 +71,11 @@ to denote error.
 .Pp
 The input
 .Fa c32
-is a UTF-32 code unit, representing represents a Unicode scalar value,
-i.e., a Unicode code point that is not a surrogate code point \(em in
-other words, an integer either in [0,0xd7ff] or in [0xe000,0x10ffff].
+is a UTF-32 code unit, representing a Unicode scalar value, i.e., a
+Unicode code point that is not a surrogate code point \(em in other
+words,
+.Fa c32
+is an integer either in [0,0xd7ff] or in [0xe000,0x10ffff].
 .Pp
 If
 .Fa s
@@ -83,10 +85,10 @@ and the return value are unchanged.
 .Pp
 If
 .Fa c32
-is null,
+is zero,
 .Nm
 outputs a (possibly empty) shift sequence to restore the initial state
-followed by a null byte and resets
+followed by a NUL byte and resets
 .Fa ps
 to the initial conversion state.
 .Pp
@@ -100,13 +102,8 @@ object with static storage duration, dis
 .Vt mbstate_t
 objects
 .Po
-including those used by
-.Xr mbrtoc8 3 ,
-.Xr mbrtoc16 3 ,
-.Xr mbrtoc32 3 ,
-.Xr c8rtomb 3 ,
-and
-.Xr c16rtomb 3
+including those used by other functions such as
+.Xr mbrtoc32 3
 .Pc ,
 which is initialized at program startup to the initial conversion
 state.
@@ -147,6 +144,7 @@ if (len == (size_t)-1)
 assert(len <= sizeof(buf) - (s - buf));
 printf("%s\en", buf);
 .Ed
+.Pp
 To avoid a variable-length array, this code uses
 .Dv MB_LEN_MAX ,
 which is a constant upper bound on the locale-dependent
@@ -160,8 +158,8 @@ is not a Unicode scalar value, i.e., it 
 the interval [0xd800,0xdfff] or it lies outside the Unicode codespace
 [0,0x10ffff] altogether.
 .It Bq Er EILSEQ
-The Unicode scalar value requested cannot be encoded as a multibyte
-sequence in the current locale.
+The input cannot be encoded as a multibyte sequence in the current
+locale.
 .It Bq Er EIO
 An error occurred in loading the locale's character conversions.
 .El

Index: src/lib/libc/locale/c8rtomb.3
diff -u src/lib/libc/locale/c8rtomb.3:1.7 src/lib/libc/locale/c8rtomb.3:1.8
--- src/lib/libc/locale/c8rtomb.3:1.7	Tue Aug 20 17:14:05 2024
+++ src/lib/libc/locale/c8rtomb.3	Tue Aug 20 20:04:45 2024
@@ -1,4 +1,4 @@
-.\"	$NetBSD: c8rtomb.3,v 1.7 2024/08/20 17:14:05 riastradh Exp $
+.\"	$NetBSD: c8rtomb.3,v 1.8 2024/08/20 20:04:45 riastradh Exp $
 .\"
 .\" Copyright (c) 2024 The NetBSD Foundation, Inc.
 .\" All rights reserved.
@@ -50,8 +50,8 @@
 The
 .Nm
 function decodes UTF-8 and converts it to multibyte characters in the
-current locale, keeping state so it can restart after incremental
-progress.
+current locale, keeping state to remember incremental progress if
+restarted.
 .Pp
 Each call to
 .Nm
@@ -61,35 +61,14 @@ with a UTF-8 code unit
 .Fa c8 ,
 writes up to
 .Dv MB_CUR_MAX
-bytes to
-.Fa s
-(possibly none), and returns either the number of bytes written to
+bytes (possibly none) to
+.Fa s ,
+and returns either the number of bytes written to
 .Fa s
 or
 .Li (size_t)-1
 to denote error.
 .Pp
-Over successive calls to
-.Nm
-with the same state
-.Fa ps ,
-the sequence of
-.Fa c8
-values must be a well-formed UTF-8 code unit sequence, or an
-incomplete UTF-8 code unit sequence followed by null.
-If
-.Fa c8 ,
-when appended to the sequence of code units passed in previous calls,
-is not null and does not form a well-formed UTF-8 code unit sequence,
-then
-.Nm
-returns
-.Li (size_t)-1
-with
-.Xr errno 2
-set to
-.Er EILSEQ .
-.Pp
 If
 .Fa s
 is a null pointer, no output is stored, but the effects on
@@ -98,12 +77,12 @@ and the return value are unchanged.
 .Pp
 If
 .Fa c8
-is null,
+is zero,
 .Nm
 discards any pending incomplete UTF-8 code unit sequence in
 .Fa ps ,
 outputs a (possibly empty) shift sequence to restore the initial state
-followed by a null byte, and resets
+followed by a NUL byte, and resets
 .Fa ps
 to the initial conversion state.
 .Pp
@@ -117,13 +96,8 @@ object with static storage duration, dis
 .Vt mbstate_t
 objects
 .Po
-including those used by
-.Xr mbrtoc8 3 ,
-.Xr mbrtoc16 3 ,
-.Xr mbrtoc32 3 ,
-.Xr c16rtomb 3 ,
-and
-.Xr c32rtomb 3
+including those used by other functions such as
+.Xr mbrtoc8 3
 .Pc ,
 which is initialized at program startup to the initial conversion
 state.
@@ -173,12 +147,12 @@ which is a constant upper bound on the l
 .Sh ERRORS
 .Bl -tag -width Bq
 .It Bq Er EILSEQ
-The
 .Fa c8
-input sequence does not encode a Unicode scalar value in UTF-8.
+is invalid as the next code unit in the conversion state
+.Fa ps .
 .It Bq Er EILSEQ
-The Unicode scalar value cannot be encoded as a multibyte sequence in
-the current locale.
+The input cannot be encoded as a multibyte sequence in the current
+locale.
 .It Bq Er EIO
 An error occurred in loading the locale's character conversions.
 .El
@@ -220,8 +194,9 @@ function first appeared in
 .Nx 11.0 .
 .\"""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""
 .Sh CAVEATS
-The standard requires that a null code unit unconditionally reset the
-conversion state and output null:
+The standard requires that passing zero as
+.Fa c8
+unconditionally reset the conversion state and output a NUL byte:
 .Bd -filled -offset indent
 If
 .Fa c8
@@ -231,7 +206,7 @@ described is the initial conversion stat
 .Ed
 .Pp
 However, some implementations such as glibc 2.36 ignore this clause
-and, if the null was preceded by an incomplete UTF-8 code unit
+and, if the zero was preceded by a nonempty incomplete UTF-8 code unit
 sequence, fail with
 .Er EILSEQ
 instead.

CVS commit: src/lib/libc/locale
