Module Name:    src
Committed By:   riastradh
Date:           Fri Aug 16 23:12:17 UTC 2024

Modified Files:
        src/lib/libc/locale: mbrtoc8.3

Log Message:
mbrtoc8(3): Work on deturgidifying prose.

PR standards/58601: uchar.h C23 compliance: char8_t, mbrtoc8, c8rtomb


To generate a diff of this commit:
cvs rdiff -u -r1.3 -r1.4 src/lib/libc/locale/mbrtoc8.3

Please note that diffs are not public domain; they are subject to the
copyright notices on the relevant files.

Modified files:

Index: src/lib/libc/locale/mbrtoc8.3
diff -u src/lib/libc/locale/mbrtoc8.3:1.3 src/lib/libc/locale/mbrtoc8.3:1.4
--- src/lib/libc/locale/mbrtoc8.3:1.3	Fri Aug 16 19:31:48 2024
+++ src/lib/libc/locale/mbrtoc8.3	Fri Aug 16 23:12:17 2024
@@ -1,4 +1,4 @@
-.\"	$NetBSD: mbrtoc8.3,v 1.3 2024/08/16 19:31:48 riastradh Exp $
+.\"	$NetBSD: mbrtoc8.3,v 1.4 2024/08/16 23:12:17 riastradh Exp $
 .\"
 .\" Copyright (c) 2024 The NetBSD Foundation, Inc.
 .\" All rights reserved.
@@ -30,7 +30,7 @@
 .\"""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""
 .Sh NAME
 .Nm mbrtoc8
-.Nd Restartable multibyte to UTF-8 code unit conversion
+.Nd Restartable multibyte to UTF-8 conversion
 .\"""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""
 .Sh LIBRARY
 .Lb libc
@@ -50,20 +50,37 @@
 .Sh DESCRIPTION
 The
 .Nm
-function attempts to decode a multibyte character sequence at
-.Fa s
-of up to
+decodes multibyte characters in the current locale and converts them to
+UTF-8, keeping state so it can restart after incremental progress.
+.Pp
+Each call to
+.Nm :
+.Bl -enum -compact
+.It
+examines up to
 .Fa n
-bytes in the current locale, and yield the content as UTF-8 code
-units via the output parameter
-.Fa pc8 .
-.Fa pc8
-may be null, in which case no output is stored.
+bytes starting at
+.Fa s ,
+.It
+yields a UTF-8 code unit if available by storing it at
+.Li * Ns Fa pc8 ,
+.It
+saves state at
+.Fa ps ,
+and
+.It
+returns either the number of bytes consumed if any or a special return
+value.
+.El
+.Pp
+Specifically:
 .Bl -bullet
 .It
 If the multibyte sequence at
 .Fa s
-is invalid or an error occurs in decoding,
+is invalid after any previous input saved at
+.Fa ps ,
+or if an error occurs in decoding,
 .Nm
 returns
 .Li (size_t)-1
@@ -75,7 +92,7 @@ If the multibyte sequence at
 .Fa s
 is still incomplete after
 .Fa n
-bytes, including any previously processed input saved in
+bytes, including any previous input saved in
 .Fa ps ,
 .Nm
 saves its state in
@@ -85,53 +102,33 @@ after all the input so far and returns
 .It
 If
 .Nm
-finds the null scalar value at
-.Fa s ,
-then it stores zero at
+had previously decoded a multibyte character but has not yet yielded
+all the code units of its UTF-8 encoding, it stores the next UTF-8 code
+unit at
 .Li * Ns Fa pc8
-and returns zero.
+and returns
+.Li "(size_t)-3" .
 .It
 If
 .Nm
-finds a nonnull scalar value in the US-ASCII range, i.e., a 7-bit
-scalar value, then it stores the scalar value at
-.Li * Ns Fa pc8 ,
-and returns the number of bytes it read from the input.
+decodes the null multibyte character, then it stores zero at
+.Li * Ns Fa pc8
+and returns zero.
 .It
-If
+Otherwise,
 .Nm
-finds a scalar value outside the US-ASCII range, it:
-.Bl -dash -compact
-.It
-stores the leading byte in the scalar value's UTF-8 encoding at
-.Li * Ns Fa pc8 ;
-.It
-stores conversion state in
-.Fa ps
-to remember the rest of the pending scalar value; and
-.It
-returns the number of bytes it read from the input.
+decodes a single multibyte character, stores the first (and possibly
+only) code unit in its UTF-8 encoding at
+.Li * Ns Fa pc8 ,
+and returns the number of bytes consumed to decode the first multibyte
+character.
 .El
-.It
+.Pp
 If
-.Nm
-had previously found a scalar value outside the US-ASCII range, then,
-instead of any of the above options, it:
-.Bl -dash -compact
-.It
-stores the next byte in the scalar value's UTF-8 encoding at
-.Li * Ns Fa pc8 ;
-.It
-updates the conversion state in
+.Fa pc8
+is a null pointer, nothing is stored, but the effects on
 .Fa ps
-to consume this byte; and
-.It
-returns
-.Li (size_t)-3
-to indicate that no bytes were consumed but a code unit was yielded
-nevertheless.
-.El
-.El
+and the return value are unchanged.
 .Pp
 If
 .Fa s
@@ -174,6 +171,14 @@ and
 which is initialized at program startup to the initial conversion
 state.
 .\"""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""
+.Sh IMPLEMENTATION NOTES
+On well-formed input, the
+.Nm
+function yields either a Unicode scalar value in US-ASCII range, i.e.,
+a 7-bit Unicode code point, or, over two to four successive calls, the
+leading and trailing code units in order of the UTF-8 encoding of a
+Unicode scalar value outside the US-ASCII range.
+.\"""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""
 .Sh RETURN VALUES
 The
 .Nm
@@ -197,26 +202,21 @@ if
 consumed
 .Ar i
 bytes of input to decode the next multibyte character, yielding a
-(nonnull) UTF-8 code unit, either a Unicode scalar value in the
-US-ASCII range or a leading byte in the UTF-8 encoding of a scalar
-value.
+UTF-8 code unit.
 .It Li (size_t)-3
 .Bq continuation
 if
 .Nm
-consumed no bytes of input but yielded a (nonnull) UTF-8 code unit, the
-next trailing byte in the UTF-8 encoding of a Unicode scalar value
-previously decoded by
-.Nm
-with
-.Fa ps .
+consumed no new bytes of input but yielded a UTF-8 code unit that was
+pending from previous input.
 .It Li (size_t)-2
 .Bq incomplete
 if
 .Nm
-found an incomplete multibyte character after all
+found only an incomplete multibyte sequence after all
 .Fa n
-bytes of input, and saved its state to restart in the next call with
+bytes of input and any previous input, and saved its state to restart
+in the next call with
 .Fa ps .
 .It Li (size_t)-1
 .Bq error
@@ -262,7 +262,8 @@ while (n) {
 .Sh ERRORS
 .Bl -tag -width Bq
 .It Bq Er EILSEQ
-The multibyte sequence cannot be decoded as a Unicode scalar value.
+The multibyte sequence cannot be decoded in the current locale as a
+Unicode scalar value.
 .It Bq Er EIO
 An error occurred in loading the locale's character conversions.
 .El
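
The return-value protocol described in the revised DESCRIPTION can be
sketched with a toy model.  This is not the libc implementation: the
names toy_state, toy_mbrtoc8 and toy_convert are hypothetical, the
state holds at most one pending code unit, and the model assumes a
single-byte ISO-8859-1 locale so that every input byte is one
multibyte character.  It only illustrates the "bytes consumed",
(size_t)-3, (size_t)-2 and zero cases; the (size_t)-1 error case
cannot arise under these assumptions.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for mbstate_t: one pending UTF-8 code unit. */
struct toy_state {
	unsigned char pending;	/* trailing code unit not yet yielded */
	int has_pending;	/* nonzero if pending is valid */
};

/*
 * Toy model of the mbrtoc8(3) return protocol, ASSUMING a single-byte
 * ISO-8859-1 locale.  Not the NetBSD implementation.
 */
static size_t
toy_mbrtoc8(unsigned char *pc8, const char *s, size_t n,
    struct toy_state *ps)
{
	unsigned char b;

	if (ps->has_pending) {
		/* Continuation: yield a code unit, consume no input. */
		if (pc8 != NULL)
			*pc8 = ps->pending;
		ps->has_pending = 0;
		return (size_t)-3;
	}
	if (n == 0)
		return (size_t)-2;	/* incomplete: nothing to examine */
	b = (unsigned char)s[0];
	if (b < 0x80) {
		/* US-ASCII: a single code unit; zero for the null character. */
		if (pc8 != NULL)
			*pc8 = b;
		return b == 0 ? 0 : 1;
	}
	/* U+0080..U+00FF: yield the leading code unit, save the trailing one. */
	if (pc8 != NULL)
		*pc8 = (unsigned char)(0xc0 | (b >> 6));
	ps->pending = (unsigned char)(0x80 | (b & 0x3f));
	ps->has_pending = 1;
	return 1;
}

/* Convert n bytes at s into out; returns the number of code units stored. */
static size_t
toy_convert(unsigned char *out, size_t outcap, const char *s, size_t n)
{
	struct toy_state st;
	size_t len, k = 0;

	memset(&st, 0, sizeof(st));	/* initial conversion state */
	for (;;) {
		unsigned char c8;

		len = toy_mbrtoc8(&c8, s, n, &st);
		if (len == (size_t)-2)
			break;		/* input exhausted, nothing pending */
		assert(k < outcap);
		out[k++] = c8;
		if (len == (size_t)-3)
			continue;	/* no input consumed; do not advance s */
		if (len == 0)
			len = 1;	/* the null character occupied one byte */
		s += len;
		n -= len;
	}
	return k;
}
```

The caller advances s only when the return value is a byte count (or
zero, for the null character), and calls again without advancing when
it gets (size_t)-3, so that a pending trailing code unit is drained
before any new input is examined.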
