rewrite printf(3) manual page

Ingo Schwarze Wed, 08 Jul 2020 19:38:17 -0700

Hi,

the following is the execution of an idea brought up by deraadt@.


There are four problems with the printf(3) manual page:

 1. While some modifiers (like "argno$" and "width") do the same
    for all conversion specifiers, the effects of some other
    modifiers varies wildly.  For example, "precision" does rather
    different things for %d, %s, %e, %f, and %g.  Some modifiers
    do not apply to all specifiers in the first place.  Yet, all
    modifiers are explained in one place, as if each one did only
    one consistent job.  This leads to a jungle of confusing
    forward references.

 2. All conversion specifiers have their own syntax, but it is
    not clearly presented for any of them.

 3. It isn't properly explained what %lc and %ls do, and in particular
    the fact that their operation is locale-dependent isn't made
    explicit.

 4. Part of the RETURN VALUES section is outright wrong.  These
    functions do not return numbers of characters printed but numbers
    of *bytes* printed.

The following patch attempts to solve these issues by rewriting
most of the page in a substantially different style.  deraadt@ and
millert@ have already said that they like the general direction and
jmc@ is also fine with it.  But given that this API probably is
among the most widely used ones and that my changes are quite large
and radical, i consider it fair to show it to a wider audience
before commit.  Both general feedback and scrutiny of the details
are now welcome, to make sure that i didn't introduce any errors.

During my own checks of the accuracy of these descriptions, i already
committed a collection of new unit tests to
/usr/src/regress/lib/libc/printf/.

Note that our current version of this manual page is very close to
what is in the POSIX standard, and this patch moves the wording and
structure quite far away from it (but not the content, of course).
I think that is OK.  The audience of a standard is language lawyers,
it is optimized to support making money from disputes whether some
implementation is right or wrong.  It is not necessarily optimized
to be easy to understand, nor to avoid users getting confused and
consequently misusing the API.  For that reason, most of our manual
pages are worded quite differently than the standard.

Also note that there are a few other pages with similar content,
for example wprintf(3) and printf(1).  But this patch is large
enough as it stands, so i'm not going to change other pages in the
same commit.  Once this is in and the dust has settled, it becomes
much easier to decide what to do with the others.

Finally, note that the first few paragraphs probably also allow
improvements.  But this patch deliberately focusses on one topic,
the description of the syntax and semantics of conversion specifications.
Other polishing can be done another day.

OK?
  Ingo


Index: printf.3
===================================================================
RCS file: /cvs/src/lib/libc/stdio/printf.3,v
retrieving revision 1.85
diff -u -r1.85 printf.3
--- printf.3    6 Jul 2020 17:24:59 -0000       1.85
+++ printf.3    8 Jul 2020 18:49:15 -0000
@@ -159,487 +159,580 @@
 which are copied unchanged to the output stream,
 and conversion specifications, each of which results
 in fetching zero or more subsequent arguments.
-Each conversion specification is introduced by the character
-.Cm % .
 The arguments must correspond properly (after type promotion)
-with the conversion specifier.
-After the
-.Cm % ,
-the following appear in sequence:
-.Bl -bullet
-.It
-An optional field, consisting of a decimal digit string followed by a
-.Cm $
-specifying the next argument to access.
-If this field is not provided, the argument following the last
-argument accessed will be used.
-Arguments are numbered starting at
-.Cm 1 .
-.It
-Zero or more of the following flags:
-.Bl -hyphen
-.It
-A hash
-.Sq Cm #
-character
-specifying that the value should be converted to an
-.Dq alternate form .
-For
-.Cm o
-conversions, the precision of the number is increased to force the first
-character of the output string to a zero (except if a zero value is printed
-with an explicit precision of zero).
-For
-.Cm x
-and
-.Cm X
-conversions, a non-zero result has the string
-.Ql 0x
-(or
-.Ql 0X
-for
-.Cm X
-conversions) prepended to it.
-For
-.Cm a ,
-.Cm A ,
-.Cm e ,
-.Cm E ,
-.Cm f ,
-.Cm F ,
-.Cm g ,
-and
-.Cm G
-conversions, the result will always contain a decimal point, even if no
-digits follow it (normally, a decimal point appears in the results of
-those conversions only if a digit follows).
-For
-.Cm g
-and
-.Cm G
-conversions, trailing zeros are not removed from the result as they
-would otherwise be.
-For all other formats, behaviour is undefined.
-.It
-A zero
-.Sq Cm \&0
-character specifying zero padding.
-For all conversions except
-.Cm n ,
-the converted value is padded on the left with zeros rather than blanks.
-If a precision is given with a numeric conversion
-.Pf ( Cm d ,
-.Cm i ,
-.Cm o ,
-.Cm u ,
-.Cm x ,
-and
-.Cm X ) ,
-the
-.Sq Cm \&0
-flag is ignored.
-.It
-A negative field width flag
-.Sq Cm \-
-indicates the converted value is to be left adjusted on the field boundary.
-Except for
-.Cm n
-conversions, the converted value is padded on the right with blanks,
-rather than on the left with blanks or zeros.
-A
-.Sq Cm \-
-overrides a
-.Sq Cm \&0
-if both are given.
-.It
-A space, specifying that a blank should be left before a positive number
-produced by a signed conversion
-.Pf ( Cm d ,
-.Cm a ,
-.Cm A ,
-.Cm e ,
-.Cm E ,
-.Cm f ,
-.Cm F ,
-.Cm g ,
-.Cm G ,
+with the conversion specifiers.
+.Pp
+The overall syntax of a conversion specification is:
+.Bd -filled -offset indent
+.Sm off
+.Cm %
+.Op Ar argno Cm $
+.Op Ar flags
+.Op Ar width
+.Op . Ar precision
+.Op Ar size
+.Ar conversion
+.Sm on
+.Ed
+.Pp
+Not all combinations of these parts are meaningful;
+see the description of the individual
+.Ar conversion
+specifiers for details.
+.Pp
+The parts of a conversion specification are as follows:
+.Bl -tag -width Ds
+.It Cm %
+A literal percent character begins a conversion specification.
+.It Ar argno Ns Cm $
+An unsigned decimal digit string followed by a dollar character
+specifies the index of the next argument to access.
+By default, the argument following the last argument accessed is used.
+Arguments are numbered starting at 1.
+.It Ar flags
+Zero or more of the following flag characters can be given:
+.Bl -tag -width 11n
+.It Cm # Pq hash
+Use an alternate form for the output.
+The effect differs depending on the conversion specifier.
+.It So \~ Sc Pq space
+For signed conversions, print a space character before a positive number.
+.It Cm + Pq plus
+For signed conversions, always print a sign before the number,
+even if it is positive.
+This overrides the space flag if both are specified.
+.It Cm 0 Pq zero
+Pad numbers with leading zeros instead of space characters
+to fill the field
+.Ar width .
+This flag is ignored if
+.Ar mindigits
+is also given.
+.It Cm \- Pq minus
+Left adjust: pad to the field
+.Ar width
+with space characters on the right rather than on the left.
+This overrides the
+.Sq Cm 0
+flag if both are specified.
+.El
+.It Ar width
+An unsigned decimal digit string specifies a minimum field width in bytes.
+Unless the
+.Sq Cm 0
 or
-.Cm i ) .
-.It
-A
-.Sq Cm +
-character specifying that a sign always be placed before a
-number produced by a signed conversion.
-A
-.Sq Cm +
-overrides a space if both are used.
+.Sq Cm \-
+flag is given, the value is right adjusted in the field and
+padded with space characters on the left.
+By default, no padding is added.
+In no case does a non-existent or small field
+.Ar width
+cause truncation of a field; if the result of a conversion is wider
+than the field width, the field is expanded to contain the conversion
+result.
+.It Pf . Ar precision
+The meaning of an unsigned decimal digit string prefixed with a
+period character depends on the conversion specifier:
+it provides the minimum number of digits for integer conversions,
+of decimals for some floating point conversions and of significant
+digits for others, or the maximum number of bytes to print for
+string conversions.
+.Pp
+A field
+.Ar width
+or
+.Ar precision ,
+or both, may alternatively be indicated as
+.Cm * Ns Op Ar argno Ns Cm $ ,
+i.e. as an asterisk optionally followed
+by an unsigned decimal digit string and a dollar sign.
+In this case, an additional
+.Vt int
+argument supplies the field width or precision.
+If a single conversion specification tries to use arguments
+both with and without
+.Ar argno Ns Cm $
+modifiers, the result is undefined.
+.It Ar size
+An argument size modifier.
+The syntax, the precise meaning, and the default size of the argument
+depend on the following
+.Ar conversion
+character.
+.It Ar conversion
+Each conversion specification ends with a conversion specifier,
+which is a single letter determining which argument type is expected
+and how it is formatted.
 .El
-.It
-An optional decimal digit string specifying a minimum field width.
-If the converted value has fewer characters than the field width, it will
-be padded with spaces on the left (or right, if the left-adjustment
-flag has been given) to fill out
-the field width.
-.It
-An optional precision, in the form of a period
-.Sq Cm \&.
-followed by an
-optional digit string.
-If the digit string is omitted, the precision is taken as zero.
-This gives the minimum number of digits to appear for
-.Cm d ,
-.Cm i ,
-.Cm o ,
-.Cm u ,
-.Cm x ,
-and
-.Cm X
-conversions, the number of digits to appear after the decimal-point for
-.Cm a ,
-.Cm A ,
-.Cm e ,
-.Cm E ,
-.Cm f ,
+.Pp
+The conversion specifiers are:
+.Bl -tag -width Ds
+.It Cm %a
+.Sm off
+.Cm %
+.Op Ar argno Cm $
+.Op Cm #
+.Op Cm \~ | +
+.Op Cm \- | 0
+.Op Ar width
+.Op . Ar hexadecimals
+.Op Cm L | l
+.Cm a
+.Sm on
+.Pp
+The
+.Vt double
+argument is converted to the hexadecimal notation
+.Sm off
+.Oo \- Oc Sy 0x No h.hhh Sy p No \(+-d
+.Sm on
+with one digit before the hexadecimal point.
+If specified, the number is rounded to
+.Ar hexadecimals
+after the hexadecimal point; otherwise,
+enough digits are printed to represent it exactly.
+The hexadecimal point is only printed if at least one digit follows it
+or if the
+.Sq Cm #
+flag is given.
+.Pp
+The exponent is expressed in base 2, not in base 16.
+Consequently, there are multiple ways to represent a number in this format.
+For example, 0x3.24p+0, 0x6.48p-1, and 0xc.9p-2 are all equivalent.
+The format chosen depends on the internal representation of the
+number, but the implementation guarantees that the length of the
+mantissa is minimized.
+Zeroes are always represented with a mantissa of
+.Ql 0
+(preceded by a sign if appropriate) and an exponent of
+.Ql +0 .
+.Pp
+If the argument is infinity, it is converted to
+.Ql [-]inf .
+If the argument is not-a-number (NaN), it is converted to
+.Ql [-]nan .
+.It Cm \&%A
+Identical to
+.Cm %a
+except that upper case is used, i.e.\&
+.Ql 0X
+for the prefix,
+.Ql 0123456789ABCDEF
+for the digits,
+.Ql P
+to introduce the exponent,
 and
-.Cm F
-conversions, the maximum number of significant digits for
-.Cm g
+.Ql [-]INF
 and
-.Cm G
-conversions, or the maximum number of characters to be printed from a
-string for
-.Cm s
-conversions.
-.It
-An optional length modifier, that specifies the size of the argument.
-The following length modifiers are valid for the
-.Cm d , i , n ,
-.Cm o , u , x ,
-or
-.Cm X
-conversions:
-.Bl -column "(deprecated)" "signed char" "unsigned long long" "long long *"
-.It Sy Modifier Ta Sy "d, i" Ta Sy "o, u, x, X" Ta Sy n
-.It hh Ta "signed char" Ta "unsigned char" Ta "signed char *"
-.It h Ta short Ta "unsigned short" Ta "short *"
-.It "l (ell)" Ta long Ta "unsigned long" Ta "long *"
-.It "ll (ell ell)" Ta "long long" Ta "unsigned long long" Ta "long long *"
-.It j Ta intmax_t Ta uintmax_t Ta "intmax_t *"
-.It t Ta ptrdiff_t Ta (see note) Ta "ptrdiff_t *"
-.It z Ta "(see note)" Ta size_t Ta "(see note)"
-.It "q (deprecated)" Ta quad_t Ta u_quad_t Ta "quad_t *"
-.El
+.Ql [-]NAN
+for infinity and not-a-number, respectively.
+.It Cm %c
+.Sm off
+.Cm %
+.Op Ar argno Cm $
+.Op Cm \-
+.Op Ar width
+.Cm c
+.Sm on
 .Pp
-Note:
-the
-.Cm t
-modifier, when applied to an
-.Cm o , u , x ,
-or
-.Cm X
-conversion, indicates that the argument is of an unsigned type
-equivalent in size to a
-.Vt ptrdiff_t .
 The
-.Cm z
-modifier, when applied to a
+.Vt int
+argument is converted to an
+.Vt unsigned char ,
+and the resulting single-byte character is written, with optional padding.
+.It Cm %lc
+.Sm off
+.Cm %
+.Op Ar argno Cm $
+.Op Cm \-
+.Op Ar width
+.Cm lc
+.Sm on
+.Pp
+The
+.Vt wint_t
+argument is converted to a multibyte character according to the current
+.Dv LC_CTYPE
+.Xr locale 1 ,
+and that character is written.
+For example, under a UTF-8 locale on
+.Ox ,
+.Ql printf("%lc", 0x03c0)
+writes the greek letter pi, whereas the same call fails
+under the default POSIX locale.
+Padding assures at least
+.Ar width
+bytes are printed; the number of characters printed may be smaller,
+and the number of display columns occupied may be smaller or larger.
+.It Cm %d
+.Sm off
+.Cm %
+.Op Ar argno Cm $
+.Op Cm \~ | +
+.Op Cm \- | 0
+.Op Ar width
+.Op . Ar mindigits
+.Op Ar size
 .Cm d
-or
-.Cm i
-conversion, indicates that the argument is of a signed type equivalent in
-size to a
-.Vt size_t .
-Similarly, when applied to an
-.Cm n
-conversion, it indicates that the argument is a pointer to a signed type
-equivalent in size to a
-.Vt size_t .
-.Pp
-The following length modifiers are valid for the
-.Cm a ,
-.Cm A ,
-.Cm e ,
-.Cm E ,
-.Cm f ,
-.Cm F ,
-.Cm g ,
-or
-.Cm G
-conversions:
-.Bl -column "Modifier" "e, E, f, F, g, G"
-.It Sy Modifier Ta Sy "e, E, f, F, g, G"
-.It "l (ell)" Ta double (ignored: same behavior as without it)
-.It L Ta "long double"
-.El
+.Sm on
 .Pp
-The following length modifier is valid for the
-.Cm c
-or
-.Cm s
-conversions:
-.Bl -column "Modifier" "wint_t" "wchar_t *"
-.It Sy Modifier Ta Sy c Ta Sy s
-.It "l (ell)" Ta wint_t Ta "wchar_t *"
-.El
-.It
-A character that specifies the type of conversion to be applied.
+The
+.Vt int
+argument is converted to signed decimal notation.
+If specified, at least
+.Ar mindigits
+are printed, padding with leading zeros if needed.
+The following are similar to
+.Cm %d
+except that they take an argument of a different size:
+.Bl -column %hhd
+.It Cm %hhd Ta Vt signed char
+.It Cm %hd  Ta Vt signed short
+.It Cm %d   Ta Vt signed int
+.It Cm %ld  Ta Vt signed long Pq percent ell dee
+.It Cm %lld Ta Vt signed long long Pq percent ell ell dee
+.It Cm %jd  Ta Vt intmax_t
+.It Cm %td  Ta Vt ptrdiff_t
+.It Cm %zd  Ta Vt ssize_t
+.It Cm %qd  Ta Vt quad_t Pq deprecated
 .El
+.It Cm \&%D
+A deprecated alias for
+.Cm %ld .
+.It Cm %e
+.Sm off
+.Cm %
+.Op Ar argno Cm $
+.Op Cm #
+.Op Cm \~ | +
+.Op Cm \- | 0
+.Op Ar width
+.Op . Ar decimals
+.Op Cm L | l
+.Cm e
+.Sm on
 .Pp
-A field width or precision, or both, may be indicated by
-an asterisk
-.Ql *
-or an asterisk followed by one or more decimal digits and a
-.Ql $
-instead of a
-digit string.
-In this case, an
-.Li int
-argument supplies the field width or precision.
-A negative field width is treated as a left adjustment flag followed by a
-positive field width; a negative precision is treated as though it were
-missing.
-If a single format directive mixes positional (nn$) and
-non-positional arguments, the results are undefined.
-.Pp
-The conversion specifiers and their meanings are:
-.Bl -tag -width "diouxX"
-.It Cm diouxX
-The
-.Li int
-(or appropriate variant) argument is converted to signed decimal
-.Pf ( Cm d
-and
-.Cm i ) ,
-unsigned octal
-.Pq Cm o ,
-unsigned decimal
-.Pq Cm u ,
-or unsigned hexadecimal
-.Pf ( Cm x
-and
-.Cm X )
-notation.
-The letters
-.Cm abcdef
-are used for
-.Cm x
-conversions; the letters
-.Cm ABCDEF
-are used for
-.Cm X
-conversions.
-The precision, if any, gives the minimum number of digits that must
-appear; if the converted value requires fewer digits, it is padded on
-the left with zeros.
-.It Cm DOU
-The
-.Li long int
-argument is converted to signed decimal, unsigned octal, or unsigned
-decimal, as if the format had been
-.Cm ld ,
-.Cm lo ,
-or
-.Cm lu
-respectively.
-These conversion characters are deprecated, and will eventually disappear.
-.It Cm eE
-The
-.Li double
-argument is rounded and converted in the style
-.Sm off
-.Pf [\-]d Cm \&. No ddd Cm e No \(+-dd
-.Sm on
-where there is one digit before the
-decimal-point character
-and the number of digits after it is equal to the precision;
-if the precision is missing,
-it is taken as 6; if the precision is
-zero, no decimal-point character appears.
-An
-.Cm E
-conversion uses the letter
-.Cm E
-(rather than
-.Cm e )
-to introduce the exponent.
+The
+.Vt double
+argument is rounded and converted to the scientific notation
+.Pf [\-]d.dddddd Sy e Ns \(+-dd
+with one digit before the decimal point and
+.Ar decimals ,
+or six digits by default, after it.
+If
+.Ar decimals
+is zero and the
+.Sq Cm #
+flag is not given, the decimal point is omitted.
 The exponent always contains at least two digits; if the value is zero,
-the exponent is 00.
+the exponent is
+.Ql +00 .
+If the argument is infinity, it is converted to
+.Ql [-]inf .
+If the argument is not-a-number (NaN), it is converted to
+.Ql [-]nan .
+.Pp
+.Cm %Le
+is similar to
+.Cm %e
+except that it takes an argument of
+.Vt long double .
+.Cm %le Pq ell e
+is an alias for
+.Cm %e .
+.It Cm \&%E
+Identical to
+.Cm %e
+except that upper case is used, i.e.\&
+.Ql E
+instead of
+.Ql e
+to introduce the exponent and
+.Ql [-]INF
+and
+.Ql [-]NAN
+for infinity and not-a-number, respectively.
+.It Cm %f
+.Sm off
+.Cm %
+.Op Ar argno Cm $
+.Op Cm #
+.Op Cm \~ | +
+.Op Cm \- | 0
+.Op Ar width
+.Op . Ar decimals
+.Op Cm L | l
+.Cm f
+.Sm on
 .Pp
-If the argument is infinity, it will be converted to [-]inf
-.Pq Cm e
-or [-]INF
-.Pq Cm E ,
-respectively.
-If the argument is not-a-number (NaN), it will be converted to
-[-]nan
-.Pq Cm e
-or [-]NAN
-.Pq Cm E ,
-respectively.
-.It Cm fF
-The
-.Li double
-argument is rounded and converted to decimal notation in the style
-.Sm off
-.Pf [-]ddd Cm \&. No ddd ,
-.Sm on
-where the number of digits after the decimal-point character
-is equal to the precision specification.
-If the precision is missing, it is taken as 6; if the precision is
-explicitly zero, no decimal-point character appears.
+The
+.Vt double
+argument is rounded and converted to the decimal notation [\-]ddd.dddddd with
+.Ar decimals ,
+or six digits by default, after the decimal point.
+If
+.Ar decimals
+is zero and the
+.Sq Cm #
+flag is not given, the decimal point is omitted.
 If a decimal point appears, at least one digit appears before it.
+If the argument is infinity, it is converted to
+.Ql [-]inf .
+If the argument is not-a-number (NaN), it is converted to
+.Ql [-]nan .
+.Pp
+.Cm %Lf
+is similar to
+.Cm %f
+except that it takes an argument of
+.Vt long double .
+.Cm %lf Pq ell eff
+is an alias for
+.Cm %f .
+.It Cm \&%F
+Identical to
+.Cm %f
+except that upper case is used, i.e.\&
+.Ql [-]INF
+and
+.Ql [-]NAN
+for infinity and not-a-number, respectively.
+.It Cm %g
+.Sm off
+.Cm %
+.Op Ar argno Cm $
+.Op Cm #
+.Op Cm \~ | +
+.Op Cm \- | 0
+.Op Ar width
+.Op . Ar significant
+.Op Cm L | l
+.Cm g
+.Sm on
 .Pp
-If the argument is infinity, it will be converted to [-]inf
-.Pq Cm f
-or [-]INF
-.Pq Cm F ,
-respectively.
-If the argument is not-a-number (NaN), it will be converted to
-[-]nan
-.Pq Cm f
-or [-]NAN
-.Pq Cm F ,
-respectively.
-.It Cm gG
 The
-.Li double
+.Vt double
 argument is converted in style
-.Cm f
-or
-.Cm e
-for
-.Cm g ,
+.Cm %f
 or
-.Cm F
-or
-.Cm E
-for
-.Cm G
-.Pq general floating point notation .
-The precision specifies the number of significant digits.
-If the precision is missing, 6 digits are given; if the precision is zero,
-it is treated as 1.
+.Cm %e
+.Pq general floating point notation
+with
+.Ar significant
+digits, or six significant digits by default.
+If
+.Ar significant
+is zero, one is used instead.
 Style
-.Cm e
-is used if the exponent from its conversion is less than -4 or greater than
-or equal to the precision.
-Trailing zeros are removed from the fractional part of the result; a
-decimal point appears only if it is followed by at least one digit.
-.Pp
-If the argument is infinity, it will be converted to [-]inf
-.Pq Cm g
-or [-]INF
-.Pq Cm G ,
-respectively.
-If the argument is not-a-number (NaN), it will be converted to
-[-]nan
-.Pq Cm g
-or [-]NAN
-.Pq Cm G ,
-respectively.
-.It Cm aA
-The
-.Li double
-argument is rounded and converted to hexadecimal notation in the style
-.Sm off
-.Pf [\-]0xh Cm \&. No hhh Cm p No [\(+-]d
-.Sm on
-where the number of digits after the hexadecimal-point character
-is equal to the precision specification.
-If the precision is missing, it is taken as enough to represent
-the floating-point number exactly, and no rounding occurs.
-If the precision is zero, no hexadecimal-point character appears.
-The
-.Cm p
-is a literal character
-.Ql p ,
-and the exponent consists of a positive or negative sign
-followed by a decimal number representing an exponent of 2.
-The
-.Cm A
-conversion uses the prefix
-.Dq Li 0X
-(rather than
-.Dq Li 0x ) ,
-the letters
-.Dq Li ABCDEF
-(rather than
-.Dq Li abcdef )
-to represent the hex digits, and the letter
-.Ql P
-(rather than
-.Ql p )
-to separate the mantissa and exponent.
-.Pp
-Note that there may be multiple valid ways to represent floating-point
-numbers in this hexadecimal format.
-For example,
-.Li 0x3.24p+0 , 0x6.48p-1
-and
-.Li 0xc.9p-2
-are all equivalent.
-The format chosen depends on the internal representation of the
-number, but the implementation guarantees that the length of the
-mantissa will be minimized.
-Zeroes are always represented with a mantissa of 0 (preceded by a
-.Ql -
-if appropriate) and an exponent of
-.Li +0 .
-.Pp
-If the argument is infinity, it will be converted to [-]inf
-.Pq Cm a
-or [-]INF
-.Pq Cm A ,
-respectively.
-If the argument is not-a-number (NaN), it will be converted to
-[-]nan
-.Pq Cm a
-or [-]NAN
-.Pq Cm A ,
-respectively.
-.It Cm c
-The
-.Li int
-argument is converted to an
-.Li unsigned char ,
-and the resulting character is written.
-.It Cm s
-The
-.Li char *
-argument is expected to be a pointer to an array of character type (pointer
-to a string).
-Characters from the array are written up to (but not including)
-a terminating NUL character;
-if a precision is specified, no more than the number specified are
-written.
-If a precision is given, no NUL character need be present;
-if the precision is not specified, or is greater than the size
-of the array, the array must contain a terminating NUL character.
-.It Cm p
-The
-.Li void *
-pointer argument is printed in hexadecimal (as if by
-.Ql %#x
-or
-.Ql %#lx ) .
-.It Cm n
-The number of characters written so far is stored into the
-integer indicated by the
-.Li int *
-(or variant) pointer argument.
+.Cm %e
+is used if the exponent from its conversion is less than \-4
+or greater than or equal to
+.Ar significant .
+Unless the
+.Sq Cm #
+flag is given, trailing zeros are removed from the fractional
+part of the result, and the decimal point only appears if it is
+followed by at least one digit.
+.Pp
+.Cm %Lg
+is similar to
+.Cm %g
+except that it takes an argument of
+.Vt long double .
+.Cm %lg Pq ell gee
+is an alias for
+.Cm %g .
+.It Cm \&%G
+Identical to
+.Cm %g
+except that upper case is used, i.e.\&
+.Ql E
+instead of
+.Ql e
+to introduce the exponent and
+.Ql [-]INF
+and
+.Ql [-]NAN
+for infinity and not-a-number, respectively.
+.It Cm %i
+An alias for
+.Cm %d .
+.It Cm %n
+.Sm off
+.Cm %
+.Op Ar argno Cm $
+.Op Ar size
+.Cm n
+.Sm on
+.Pp
+The number of bytes written so far is stored
+into the integer variable indicated by the
+.Vt int *
+pointer argument.
 No argument is converted.
-.It Cm %
-A
-.Ql %
+A pointer to a signed integer type of a different size
+can be passed when specifying the
+.Ar size
+modifier as described for the
+.Cm %d
+conversion specifier.
+.It Cm %o
+.Sm off
+.Cm %
+.Op Ar argno Cm $
+.Op Cm #
+.Op Cm \- | 0
+.Op Ar width
+.Op . Ar mindigits
+.Op Ar size
+.Cm o
+.Sm on
+.Pp
+Similar to
+.Cm %u
+except that the
+.Vt unsigned int
+argument is converted to unsigned octal notation.
+If the
+.Sq Cm #
+flag is given,
+.Ar mindigits
+is increased such that the first digit printed is a zero,
+except if a zero value is printed with an explicit
+.Ar mindigits
+of zero.
+.It Cm \&%O
+A deprecated alias for
+.Cm %lo .
+.It Cm %p
+The
+.Vt void *
+pointer argument is printed in hexadecimal, similar to
+.Cm %#x
+or
+.Cm %#lx
+depending on the size of pointers.
+.It Cm %s
+.Sm off
+.Cm %
+.Op Ar argno Cm $
+.Op Cm \-
+.Op Ar width
+.Op . Ar maxbytes
+.Cm s
+.Sm on
+.Pp
+Characters from the
+.Vt char * Pq string
+argument are written up to (but not including) a terminating NUL character.
+If
+.Ar maxbytes
+is specified, at most
+.Ar maxbytes
+bytes are written; in that case, no NUL character needs to be present.
+.It Cm %ls
+.Sm off
+.Cm %
+.Op Ar argno Cm $
+.Op Cm \-
+.Op Ar width
+.Op . Ar maxbytes
+.Cm ls
+.Sm on
+.Pp
+The
+.Vt wchar_t * Pq wide character string
+argument is converted to a multibyte character string
+according to the current
+.Dv LC_CTYPE
+.Xr locale 1
+up to (but not including) a terminating NUL character,
+and that multibyte character string is written.
+If
+.Ar maxbytes
+is specified, at most
+.Ar maxbytes
+bytes are written; in that case, no NUL character needs to be present.
+If a multibyte character does not fit into the rest of
+.Ar maxbytes ,
+it is omitted together with the rest of the argument string;
+partial characters are not written.
+Locale dependency and padding work in the same way as for
+.Cm %lc .
+.It Cm %u
+.Sm off
+.Cm %
+.Op Ar argno Cm $
+.Op Cm \- | 0
+.Op Ar width
+.Op . Ar mindigits
+.Op Ar size
+.Cm u
+.Sm on
+.Pp
+The
+.Vt unsigned int
+argument is converted to unsigned decimal notation.
+If specified, at least
+.Ar mindigits
+are printed, padding with leading zeros if needed.
+The following are similar to
+.Cm %u
+except that they take an argument of a different size:
+.Bl -column %hhu
+.It Cm %hhu Ta Vt unsigned char
+.It Cm %hu  Ta Vt unsigned short
+.It Cm %u   Ta Vt unsigned int
+.It Cm %lu  Ta Vt unsigned long Pq percent ell u
+.It Cm %llu Ta Vt unsigned long long Pq percent ell ell u
+.It Cm %ju  Ta Vt uintmax_t
+.It Cm %tu  Ta unsigned type of same size as Vt ptrdiff_t
+.It Cm %zu  Ta Vt size_t
+.It Cm %qu  Ta Vt u_quad_t Pq deprecated
+.El
+.It Cm \&%U
+A deprecated alias for
+.Cm %lu .
+.It Cm %x
+.Sm off
+.Cm %
+.Op Ar argno Cm $
+.Op Cm #
+.Op Cm \- | 0
+.Op Ar width
+.Op . Ar mindigits
+.Op Ar size
+.Cm x
+.Sm on
+.Pp
+Similar to
+.Cm %u
+except that the
+.Vt unsigned int
+argument is converted to unsigned hexadecimal notation using the digits
+.Ql 0123456789abcdef .
+If the
+.Sq Cm #
+flag is given, the string
+.Ql 0x
+is prepended unless the value is zero.
+.It Cm \&%X
+Identical to
+.Cm %x
+except that upper case is used, i.e.\&
+.Ql 0X
+for the optional prefix and
+.Ql 0123456789ABCDEF
+for the digits.
+.It Cm %%
+A single percent sign
+.Pq Ql %
 is written.
 No argument is converted.
 The complete conversion specification is
-.Ql %% .
+.Ql %% ;
+no modifiers can be inserted between the two percent signs.
 .El
-.Pp
-In no case does a non-existent or small field width cause truncation of
-a field; if the result of a conversion is wider than the field width, the
-field is expanded to contain the conversion result.
 .Sh RETURN VALUES
 For all these functions if an output or encoding error occurs, a value
 less than 0 is returned.
@@ -657,7 +750,7 @@
 and
 .Fn vasprintf
 functions
-return the number of characters printed
+return the number of bytes printed
 (not including the trailing
 .Ql \e0
 used to end output to strings).
@@ -666,7 +759,7 @@
 .Fn snprintf
 and
 .Fn vsnprintf
-functions return the number of characters that would have
+functions return the number of bytes that would have
 been output if the
 .Fa size
 were unlimited
@@ -683,7 +776,7 @@
 .Fn asprintf
 and
 .Fn vasprintf
-functions return the number of characters that were output
+functions return the number of bytes that were output
 to the newly allocated string
 (excluding the final
 .Ql \e0 ) .
@@ -706,6 +799,28 @@
 pointer, but other implementations may leave
 .Fa ret
 unchanged.
+.Sh ENVIRONMENT
+.Bl -tag -width LC_CTYPE
+.It Ev LC_CTYPE
+The character encoding
+.Xr locale 1 .
+It decides which
+.Vt wchar_t
+values represent valid wide characters for the
+.Cm %lc
+and
+.Cm %ls
+conversion specifiers and how they are encoded into multibyte characters.
+If unset or set to
+.Qq C ,
+.Qq POSIX ,
+or an unsupported value,
+.Cm %lc
+and
+.Cm %ls
+only work correctly for ASCII characters
+and fail for arguments greater than 255.
+.El
 .Sh EXAMPLES
 To print a date and time in the form `Sunday, July 3, 10:02',
 where

rewrite printf(3) manual page

Reply via email to