Module Name:	src
Committed By:	riastradh
Date:		Tue Aug 20 20:04:45 UTC 2024

Modified Files:
	src/lib/libc/locale: c16rtomb.3 c32rtomb.3 c8rtomb.3

Log Message:
c8rtomb(3), c16rtomb(3), c32rtomb(3): Attempt to simplify language.

To generate a diff of this commit:
cvs rdiff -u -r1.9 -r1.10 src/lib/libc/locale/c16rtomb.3 \
    src/lib/libc/locale/c32rtomb.3
cvs rdiff -u -r1.7 -r1.8 src/lib/libc/locale/c8rtomb.3

Please note that diffs are not public domain; they are subject to the
copyright notices on the relevant files.
Modified files: Index: src/lib/libc/locale/c16rtomb.3 diff -u src/lib/libc/locale/c16rtomb.3:1.9 src/lib/libc/locale/c16rtomb.3:1.10 --- src/lib/libc/locale/c16rtomb.3:1.9 Tue Aug 20 17:14:05 2024 +++ src/lib/libc/locale/c16rtomb.3 Tue Aug 20 20:04:45 2024 @@ -1,4 +1,4 @@ -.\" $NetBSD: c16rtomb.3,v 1.9 2024/08/20 17:14:05 riastradh Exp $ +.\" $NetBSD: c16rtomb.3,v 1.10 2024/08/20 20:04:45 riastradh Exp $ .\" .\" Copyright (c) 2024 The NetBSD Foundation, Inc. .\" All rights reserved. @@ -50,8 +50,8 @@ The .Nm function decodes UTF-16 and converts it to multibyte characters in the -current locale, keeping state so it can restart after incremental -progress. +current locale, keeping state to remember incremental progress if +restarted. .Pp Each call to .Nm @@ -69,27 +69,6 @@ or .Li (size_t)-1 to denote error. .Pp -Over successive calls to -.Nm -with the same state -.Fa ps , -the sequence of -.Fa c16 -values must be a well-formed UTF-16 code unit sequence, or an -incomplete UTF-16 code unit sequence followed by null. -If -.Fa c16 , -when appended to the sequence of code units passed in previous calls, -is not null and does not form a well-formed UTF-16 code unit sequence, -then -.Nm -returns -.Li (size_t)-1 -with -.Xr errno 2 -set to -.Er EILSEQ . -.Pp If .Fa s is a null pointer, no output is stored, but the effects on @@ -98,12 +77,12 @@ and the return value are unchanged. .Pp If .Fa c16 -is null, +is zero, .Nm discards any pending incomplete UTF-16 code unit sequence in .Fa ps , outputs a (possibly empty) shift sequence to restore the initial state -followed by a null byte, and resets +followed by a NUL byte, and resets .Fa ps to the initial conversion state. 
.Pp @@ -117,13 +96,8 @@ object with static storage duration, dis .Vt mbstate_t objects .Po -including those used by -.Xr mbrtoc8 3 , -.Xr mbrtoc16 3 , -.Xr mbrtoc32 3 , -.Xr c8rtomb 3 , -and -.Xr c32rtomb 3 +including those used by other functions such as +.Xr mbrtoc16 3 .Pc , which is initialized at program startup to the initial conversion state. @@ -173,12 +147,12 @@ which is a constant upper bound on the l .Sh ERRORS .Bl -tag -width Bq .It Bq Er EILSEQ -The .Fa c16 -input sequence does not encode a Unicode scalar value in UTF-16. +is invalid as the next code unit in the conversion state +.Fa ps . .It Bq Er EILSEQ -The Unicode scalar value requested cannot be encoded as a multibyte -sequence in the current locale. +The input cannot be encoded as a multibyte sequence in the current +locale. .It Bq Er EIO An error occurred in loading the locale's character conversions. .El @@ -220,12 +194,13 @@ function first appeared in .Nx 11.0 . .\""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""" .Sh BUGS -The standard requires that a null code unit unconditionally reset the -conversion state and output null: +The standard requires that passing zero as +.Fa c16 +unconditionally reset the conversion state and output a NUL byte: .Bd -filled -offset indent If -.Fa c8 -is a null character, a null byte is stored, preceded by any shift +.Fa c16 +is a null wide character, a null byte is stored, preceded by any shift sequence needed to restore the initial shift state; the resulting state described is the initial conversion state. .Ed @@ -233,7 +208,7 @@ described is the initial conversion stat However, some implementations such as .Fx 14.0 , .Ox 7.4 , -and glibc 2.36 ignore this clause and, if the null was preceded by an +and glibc 2.36 ignore this clause and, if the zero was preceded by an incomplete UTF-16 code unit sequence, fail with .Er EILSEQ instead. 
Index: src/lib/libc/locale/c32rtomb.3 diff -u src/lib/libc/locale/c32rtomb.3:1.9 src/lib/libc/locale/c32rtomb.3:1.10 --- src/lib/libc/locale/c32rtomb.3:1.9 Tue Aug 20 17:14:05 2024 +++ src/lib/libc/locale/c32rtomb.3 Tue Aug 20 20:04:45 2024 @@ -1,4 +1,4 @@ -.\" $NetBSD: c32rtomb.3,v 1.9 2024/08/20 17:14:05 riastradh Exp $ +.\" $NetBSD: c32rtomb.3,v 1.10 2024/08/20 20:04:45 riastradh Exp $ .\" .\" Copyright (c) 2024 The NetBSD Foundation, Inc. .\" All rights reserved. @@ -50,8 +50,8 @@ The .Nm function converts Unicode scalar values to multibyte characters in the -current locale, keeping state so it can restart after incremental -progress. +current locale, keeping state to remember incremental progress if +restarted. .Pp Each call to .Nm @@ -71,9 +71,11 @@ to denote error. .Pp The input .Fa c32 -is a UTF-32 code unit, representing represents a Unicode scalar value, -i.e., a Unicode code point that is not a surrogate code point \(em in -other words, an integer either in [0,0xd7ff] or in [0xe000,0x10ffff]. +is a UTF-32 code unit, representing a Unicode scalar value, i.e., a +Unicode code point that is not a surrogate code point \(em in other +words, +.Fa c32 +is an integer either in [0,0xd7ff] or in [0xe000,0x10ffff]. .Pp If .Fa s @@ -83,10 +85,10 @@ and the return value are unchanged. .Pp If .Fa c32 -is null, +is zero, .Nm outputs a (possibly empty) shift sequence to restore the initial state -followed by a null byte and resets +followed by a NUL byte and resets .Fa ps to the initial conversion state. .Pp @@ -100,13 +102,8 @@ object with static storage duration, dis .Vt mbstate_t objects .Po -including those used by -.Xr mbrtoc8 3 , -.Xr mbrtoc16 3 , -.Xr mbrtoc32 3 , -.Xr c8rtomb 3 , -and -.Xr c16rtomb 3 +including those used by other functions such as +.Xr mbrtoc32 3 .Pc , which is initialized at program startup to the initial conversion state. 
@@ -147,6 +144,7 @@ if (len == (size_t)-1) assert(len <= sizeof(buf) - (s - buf)); printf("%s\en", buf); .Ed +.Pp To avoid a variable-length array, this code uses .Dv MB_LEN_MAX , which is a constant upper bound on the locale-dependent @@ -160,8 +158,8 @@ is not a Unicode scalar value, i.e., it the interval [0xd800,0xdfff] or it lies outside the Unicode codespace [0,0x10ffff] altogether. .It Bq Er EILSEQ -The Unicode scalar value requested cannot be encoded as a multibyte -sequence in the current locale. +The input cannot be encoded as a multibyte sequence in the current +locale. .It Bq Er EIO An error occurred in loading the locale's character conversions. .El Index: src/lib/libc/locale/c8rtomb.3 diff -u src/lib/libc/locale/c8rtomb.3:1.7 src/lib/libc/locale/c8rtomb.3:1.8 --- src/lib/libc/locale/c8rtomb.3:1.7 Tue Aug 20 17:14:05 2024 +++ src/lib/libc/locale/c8rtomb.3 Tue Aug 20 20:04:45 2024 @@ -1,4 +1,4 @@ -.\" $NetBSD: c8rtomb.3,v 1.7 2024/08/20 17:14:05 riastradh Exp $ +.\" $NetBSD: c8rtomb.3,v 1.8 2024/08/20 20:04:45 riastradh Exp $ .\" .\" Copyright (c) 2024 The NetBSD Foundation, Inc. .\" All rights reserved. @@ -50,8 +50,8 @@ The .Nm function decodes UTF-8 and converts it to multibyte characters in the -current locale, keeping state so it can restart after incremental -progress. +current locale, keeping state to remember incremental progress if +restarted. .Pp Each call to .Nm @@ -61,35 +61,14 @@ with a UTF-8 code unit .Fa c8 , writes up to .Dv MB_CUR_MAX -bytes to -.Fa s -(possibly none), and returns either the number of bytes written to +bytes (possibly none) to +.Fa s , +and returns either the number of bytes written to .Fa s or .Li (size_t)-1 to denote error. .Pp -Over successive calls to -.Nm -with the same state -.Fa ps , -the sequence of -.Fa c8 -values must be a well-formed UTF-8 code unit sequence, or an -incomplete UTF-8 code unit sequence followed by null. 
-If -.Fa c8 , -when appended to the sequence of code units passed in previous calls, -is not null and does not form a well-formed UTF-8 code unit sequence, -then -.Nm -returns -.Li (size_t)-1 -with -.Xr errno 2 -set to -.Er EILSEQ . -.Pp If .Fa s is a null pointer, no output is stored, but the effects on @@ -98,12 +77,12 @@ and the return value are unchanged. .Pp If .Fa c8 -is null, +is zero, .Nm discards any pending incomplete UTF-8 code unit sequence in .Fa ps , outputs a (possibly empty) shift sequence to restore the initial state -followed by a null byte, and resets +followed by a NUL byte, and resets .Fa ps to the initial conversion state. .Pp @@ -117,13 +96,8 @@ object with static storage duration, dis .Vt mbstate_t objects .Po -including those used by -.Xr mbrtoc8 3 , -.Xr mbrtoc16 3 , -.Xr mbrtoc32 3 , -.Xr c16rtomb 3 , -and -.Xr c32rtomb 3 +including those used by other functions such as +.Xr mbrtoc8 3 .Pc , which is initialized at program startup to the initial conversion state. @@ -173,12 +147,12 @@ which is a constant upper bound on the l .Sh ERRORS .Bl -tag -width Bq .It Bq Er EILSEQ -The .Fa c8 -input sequence does not encode a Unicode scalar value in UTF-8. +is invalid as the next code unit in the conversion state +.Fa ps . .It Bq Er EILSEQ -The Unicode scalar value cannot be encoded as a multibyte sequence in -the current locale. +The input cannot be encoded as a multibyte sequence in the current +locale. .It Bq Er EIO An error occurred in loading the locale's character conversions. .El @@ -220,8 +194,9 @@ function first appeared in .Nx 11.0 . 
.\""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""" .Sh CAVEATS -The standard requires that a null code unit unconditionally reset the -conversion state and output null: +The standard requires that passing zero as +.Fa c8 +unconditionally reset the conversion state and output a NUL byte: .Bd -filled -offset indent If .Fa c8 @@ -231,7 +206,7 @@ described is the initial conversion stat .Ed .Pp However, some implementations such as glibc 2.36 ignore this clause -and, if the null was preceded by an incomplete UTF-8 code unit +and, if the zero was preceded by a nonempty incomplete UTF-8 code unit sequence, fail with .Er EILSEQ instead.
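
For readers who want to see the documented c32rtomb(3) behaviour in use,
here is a minimal sketch. It is not part of this commit. The helper name
c32s_to_mbs and its return convention are hypothetical and chosen only
for illustration. It relies on two points from the revised text: passing
zero as c32 stores a (possibly empty) shift sequence followed by a NUL
byte, and an input that is not a Unicode scalar value, or that cannot be
encoded in the current locale, yields (size_t)-1.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>
#include <string.h>
#include <uchar.h>

/*
 * Hypothetical helper (not from the commit): convert a zero-terminated
 * char32_t string to a multibyte string in the current locale.
 * Returns the number of bytes stored in dst, not counting the final
 * NUL byte, or (size_t)-1 if conversion fails or dst is too small.
 */
size_t
c32s_to_mbs(char *dst, size_t dstsize, const char32_t *src)
{
	mbstate_t ps;
	char buf[MB_LEN_MAX];	/* constant bound on MB_CUR_MAX */
	size_t used = 0;

	memset(&ps, 0, sizeof(ps));	/* initial conversion state */
	for (;;) {
		size_t len = c32rtomb(buf, *src, &ps);

		if (len == (size_t)-1)
			return (size_t)-1;	/* e.g. errno is EILSEQ */
		assert(len <= sizeof(buf));
		if (len > dstsize - used)
			return (size_t)-1;	/* output would not fit */
		memcpy(dst + used, buf, len);
		used += len;
		if (*src++ == 0)
			return used - 1;	/* zero input stored the NUL byte */
	}
}
```

A caller would typically invoke setlocale(3) first and then pass a
buffer of at least MB_LEN_MAX bytes per input code unit, including the
terminating zero. Using a local mbstate_t rather than passing a null ps
avoids sharing the function's internal static state.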