Tao Xu <tao3...@intel.com> writes: > On 12/5/19 11:29 PM, Markus Armbruster wrote: >> Tao Xu <tao3...@intel.com> writes: >> >>> Parse input string both as a double and as a uint64_t, then use the >>> method which consumes more characters. Update the related test cases. >>> >>> Signed-off-by: Tao Xu <tao3...@intel.com> >>> --- >> [...] >>> diff --git a/util/cutils.c b/util/cutils.c >>> index 77acadc70a..b08058c57c 100644 >>> --- a/util/cutils.c >>> +++ b/util/cutils.c >>> @@ -212,24 +212,43 @@ static int do_strtosz(const char *nptr, const char >>> **end, >>> const char default_suffix, int64_t unit, >>> uint64_t *result) >>> { >>> - int retval; >>> - const char *endptr; >>> + int retval, retd, retu; >>> + const char *suffix, *suffixd, *suffixu; >>> unsigned char c; >>> int mul_required = 0; >>> - double val, mul, integral, fraction; >>> + bool use_strtod; >>> + uint64_t valu; >>> + double vald, mul, integral, fraction; >> >> Note for later: @mul is double. >> >>> + >>> + retd = qemu_strtod_finite(nptr, &suffixd, &vald); >>> + retu = qemu_strtou64(nptr, &suffixu, 0, &valu);
Note for later: passing 0 to base accepts octal and hexadecimal integers. >>> + use_strtod = strlen(suffixd) < strlen(suffixu); >>> + >>> + /* >>> + * Parse @nptr both as a double and as a uint64_t, then use the method >>> + * which consumes more characters. >>> + */ >> >> The comment is in a funny place. I'd put it right before the >> qemu_strtod_finite() line. >> >>> + if (use_strtod) { >>> + suffix = suffixd; >>> + retval = retd; >>> + } else { >>> + suffix = suffixu; >>> + retval = retu; >>> + } >>> - retval = qemu_strtod_finite(nptr, &endptr, &val); >>> if (retval) { >>> goto out; >>> } >> >> This is even more subtle than it looks. >> >> A close reading of the function contracts leads to three cases for each >> conversion: >> >> * parse error (including infinity and NaN) >> >> @retu / @retd is -EINVAL >> @valu / @vald is uninitialized >> @suffixu / @suffixd is @nptr >> >> * range error >> >> @retu / @retd is -ERANGE >> @valu / @vald is our best approximation of the conversion result >> @suffixu / @suffixd points to the first character not consumed by the >> conversion. >> >> Sub-cases: >> >> - uint64_t overflow >> >> We know the conversion result exceeds UINT64_MAX. >> >> - double overflow >> >> we know the conversion result's magnitude exceeds the largest >> representable finite double DBL_MAX. >> >> - double underflow >> >> we know the conversion result is close to zero (closer than DBL_MIN, >> the smallest normalized positive double). >> >> * success >> >> @retu / @retd is 0 >> @valu / @vald is the conversion result >> @suffixu / @suffixd points to the first character not consumed by the >> conversion. >> >> This leads to a matrix (parse error, uint64_t overflow, success) x >> (parse error, double overflow, double underflow, success). We need to >> check the code does what we want for each element of this matrix, and >> document any behavior that's not perfectly obvious. >> >> (success, success): we pick uint64_t if qemu_strtou64() consumed more >> characters than qemu_strtod_finite(), else double. "More" is important >> here; when they consume the same characters, we *need* to use the >> uint64_t result. Example: for "18446744073709551615", we need to use >> uint64_t 18446744073709551615, not double 18446744073709551616.0. But >> for "18446744073709551616.", we need to use the double. Good. Also fun: for "0123", we use uint64_t 83, not double 123.0. But for "0123.", we use 123.0, not 83. Do we really want to accept octal and hexadecimal integers? >> (success, parse error) and (parse error, success): we pick the one that >> succeeds, because success consumes characters, and failure to parse does >> not. Good. >> >> (parse error, parse error): neither consumes characters, so we pick >> uint64_t. Good. >> >> (parse error, double overflow), (parse error, double underflow) and >> (uint64_t overflow, parse error): we pick the range error, because it >> consumes characters. Good. >> >> These are the simple combinations. The remainder are hairier: (success, >> double overflow), (success, double underflow), (uint64_t overflow, >> success). I lack the time to analyze them today. Must be done before >> we take this patch. Any takers? > > (success, double overflow), (success, double underflow), pick double > overflow error, return -ERANGE. Because it consumes > characters. Example: for "1.79769e+309", qemu_strtou64 consumes "1", > and prases as uint64_t; but qemu_strtod_finite return -ERANGE and > consumes all characters. It is OK. The only way to have double overflow when uint64_t succeeds is an exponent. Double consumes the characters making up the exponent, uint64_t does not. We use double. The only way to have double underflow is with an exponent or a decimal point. Double consumes their characters, uint64_t does not. We use double. Okay. > (uint64_t overflow, success), consume the same characters, use the > uint64_t return -ERANGE. Note that even if qemu_strtod_finite can > parse these cases such as "18446744073709551617", but the result is > uint64_t so we also need to return -ERANGE. It is OK. That's just one of two cases, I think. The other one is when the overflowing integer is followed by an exponent or decimal point. We use double then. Converting the double to uint64_t overflows, except when a negative exponent brings the number into range. Examples: "18446744073709551617" picks uint64_t overflow, "18446744073709551617.0" picks double success (but converting it to uint64_t below overflows), and "18446744073709551617e-10" picks double success (converted to 1844674407 below). Okay. > Thank you for your analysis and suggestion. I will add more test cases > to cover some of these analysis. Good move.