Date:        Mon, 8 Dec 2025 20:45:00 +0100
    From:        Anders Magnusson <[email protected]>
    Message-ID:  <[email protected]>

  | I just stumbled over something which may be a bug in scanf()...?
  | This example is in C99 7.19.6.2 clause 20.

Yes, but nowhere there does it say what the result should be in the
case that you give, and I think you're arriving at an incorrect
conclusion.

  |      int count; float quant; char units[21], item[21];
  |      count = sscanf("100ergs of energy\n", "%f%20s of %20s", &quant,
  | units, item);
  |
  | Should have a count of 0;
  |      count = 0; // "100e" fails to match "%f"

It does, but "100" does not.   What the standards say about %f conversions
is that the data needs to match what strtod() would accept as its subject
sequence.

The spec of strtod() 7.24.1.6 says, after the part about the input string
being optional whitespace, the subject sequence, and optional other stuff:

        3 The expected form of the subject sequence is an optional plus
          or minus sign, then one of the following:

         -- a nonempty sequence of decimal digits optionally containing
            a decimal-point character, then an optional exponent part as
            defined in 6.4.4.2, excluding any digit separators (6.4.4.1)

The other "one of the following" alternatives (hex floats, Inf/NaN, etc.)
are not relevant here.

6.4.4.2 defines floating constants, and the definition of the exponent-part
is given there as:

        exponent-part:
                e signopt digit-sequence
                E signopt digit-sequence

(the "opt" after sign should be a subscript, my cut&paste lost that markup,
but it just means the sign is optional).   What isn't optional is the 'e'
and the "digit-sequence" (the latter has the obvious definition, which
requires at least one digit).

What this tells us is that 'e' alone is not an exponent-part, but that's
ok: as noted above, the exponent part is optional.
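
To make the grammar concrete, here is a minimal sketch of a matcher for
just the exponent-part rule quoted above.  The name exponent_part_len()
is made up for illustration, it is not part of any libc, and it is not
how any real strtod() is written.

```c
#include <ctype.h>
#include <stddef.h>

/*
 * Hypothetical helper (illustration only): return the number of
 * characters at the start of s that form an exponent-part per the
 * grammar in 6.4.4.2, or 0 if s does not start with one.
 */
size_t exponent_part_len(const char *s)
{
    size_t i = 0;

    if (s[i] != 'e' && s[i] != 'E')
        return 0;                       /* the 'e' / 'E' is mandatory */
    i++;
    if (s[i] == '+' || s[i] == '-')     /* the sign is optional */
        i++;
    if (!isdigit((unsigned char)s[i]))
        return 0;                       /* digit-sequence needs >= 1 digit */
    while (isdigit((unsigned char)s[i]))
        i++;
    return i;
}
```

With this, exponent_part_len("e5") is 2 and exponent_part_len("E-12x")
is 4, while exponent_part_len("ergs") and exponent_part_len("e+") are
both 0, since no digit follows.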

The strtod() spec goes on (a bit later)...

        The subject sequence is defined as the longest initial subsequence
        of the input string, starting with the first non-white-space character,
        that is of the expected form. The subject sequence contains no
        characters if the input string is not of the expected form.

In the example you gave, the longest initial subsequence of "100ergs" that
is of the expected form is "100" - the leading string of decimal digits.
It cannot include the "e", because an "e" with no digits following it
(after the optional sign, which is also absent here) is not an exponent
part.
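
The subject sequence rule can be observed directly with strtod() and its
end pointer.  A small sketch follows; strtod_consumed() is a hypothetical
wrapper invented here, only strtod() itself is a standard function.

```c
#include <stddef.h>
#include <stdlib.h>

/*
 * Hypothetical wrapper (illustration only): report how many characters
 * of s strtod() consumed, i.e. leading white space plus the subject
 * sequence.  If value is not NULL, the converted result is stored there.
 */
size_t strtod_consumed(const char *s, double *value)
{
    char *end;
    double v = strtod(s, &end);

    if (value != NULL)
        *value = v;
    return (size_t)(end - s);
}
```

On a conforming strtod(), strtod_consumed("100ergs of energy\n", NULL)
should be 3 (just "100"), whereas for "100e2rgs" it should be 5, as
there the "e2" is a complete exponent part.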

  | but; our sscanf return 3 and sets the values like this:
  |      count=3 quant=100.000000 units=ergs item=energy

which is exactly what it should do.

  | Huh?
  | And this bug seems to have been in the scanf code ~forever.

Since it is not a bug, that's good.

  | Same problem on MacOS (have the same scanf I suppose).

and since it isn't a problem either, that's also good.

But:
  | Linux (with glibc) also returns 3 but eats the 'e' in ergs.

that would be a bug.
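
For anyone who wants to see what their own libc does, here is a sketch
that wraps the quoted call.  The struct and function names are made up
for this message; the result is deliberately not asserted, since as
noted above it differs between implementations.

```c
#include <stdio.h>

/* Hypothetical container for the outcome of the quoted sscanf() call. */
struct scan_result {
    int   count;
    float quant;
    char  units[21];
    char  item[21];
};

/* Run the example from the quoted message and return what was stored. */
struct scan_result scan_example(void)
{
    /* Zero-initialise so the strings are valid even if count < 3. */
    struct scan_result r = { 0, 0.0f, "", "" };

    r.count = sscanf("100ergs of energy\n", "%f%20s of %20s",
                     &r.quant, r.units, r.item);
    return r;
}
```

Printing count, quant, units and item from the returned struct shows
whether the 'e' of "ergs" ended up in units or was consumed by %f.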

kre

ps: I actually quoted from C23, just in case any of the section numbers
have altered in the meantime, and now I look, I see strtod() is 7.20.1.3
in C99.  No relevant part of the text about it differs however.

