On Wed, May 17, 2023 at 04:05:23PM -0500, Eric Blake wrote: > On Wed, May 17, 2023 at 11:06:54AM +0100, Richard W.M. Jones wrote: > > In nbdkit-error-filter we need to parse parameters as probabilities. > > This is useful enough to add to nbdkit, since we will use it in > > another filter in future. > > --- > > docs/nbdkit-plugin.pod | 19 +++++++ > > plugins/python/nbdkit-python-plugin.pod | 6 ++ > > include/nbdkit-common.h | 2 + > > server/nbdkit.syms | 1 + > > server/public.c | 37 +++++++++++++ > > server/test-public.c | 73 +++++++++++++++++++++++++ > > plugins/ocaml/NBDKit.mli | 11 ++-- > > plugins/ocaml/NBDKit.ml | 2 + > > plugins/ocaml/bindings.c | 16 ++++++ > > plugins/python/modfunctions.c | 21 +++++++ > > tests/test-python-plugin.py | 12 ++++ > > tests/test_ocaml_plugin.ml | 1 + > > 12 files changed, 196 insertions(+), 5 deletions(-) > > > > diff --git a/docs/nbdkit-plugin.pod b/docs/nbdkit-plugin.pod > > index 860c5cecb..e8d30a98e 100644 > > --- a/docs/nbdkit-plugin.pod > > +++ b/docs/nbdkit-plugin.pod > > @@ -1433,6 +1433,25 @@ C<str> can be a string containing a case-insensitive > > form of various > > common toggle values. The function returns 0 or 1 if the parse was > > successful. If there was an error, it returns C<-1>. > > > > +=head2 Parsing probabilities > > + > > +Use the C<nbdkit_parse_probability> utility function to parse > > +probabilities. Common formats understood include: C<"0.1">, C<"10%"> > > +or C<"1:10">, which all mean a probability of 1 in 10. > > + > > + int nbdkit_parse_probability (const char *what, const char *str, > > + double *ret); > > + > > +The C<what> parameter is printed in error messages to provide context. > > +The C<str> parameter is the probability string. > > + > > +On success the function returns C<0> and sets C<*ret>. B<Note> that > > +the probability returned may be outside the range S<[ 0.0..1.0 ]>, for > > +example if C<str == "200%">. If you want to clamp the result you must > > +check that yourself. > > Should we at least guarantee that the return value is non-negative > (other than potentially -0.0 which compares equal to 0.0) and finite? > I don't see how a negative or infinite probability will ever be > useful (with apologies to Douglas Adams' infinite improbability drive).
Looking at this in the cold light of day I think my confusion is around probabilities versus percentages/fractions. Actual probabilities can't be outside the range [0..1]. But it's usual to have a percentage above 100%, eg. you could have a base rate and then express 200% (2 times) this rate. We can use this function to parse percentages as well as probabilties, eg in an imaginary rate filter: bandwidth=10000 burst=200% or: bandwidth=10000 burst=2:1 But I don't think negative percentages make sense. So yes we can limit this to returning only numbers >= 0 (and also non NaN, not infinite). > > +++ b/server/public.c > > @@ -421,6 +421,43 @@ nbdkit_parse_size (const char *str) > > return size * scale; > > } > > > > +NBDKIT_DLL_PUBLIC int > > +nbdkit_parse_probability (const char *what, const char *str, > > + double *retp) > > +{ > > + double d, d2; > > + char c; > > + int n; > > + > > + if (sscanf (str, "%lg%[:/]%lg%n", &d, &c, &d2, &n) == 3 && > > + strcmp (&str[n], "") == 0) { /* N:M or N/M */ > > + if (d == 0 && d2 == 0) /* 0/0 is OK */ > > I'd write this 'if (d == 0.0 && d2 == 0.0)' to make it obvious that we > know we are doing floating point comparisons here (semantics are the > same either way, though, because of how integer 0 promotes under > standard arithmetic promotion). OK. > > + ; > > + else if (d2 == 0) /* N/0 is bad */ > > + goto bad_parse; > > + else > > + d /= d2; > > + } > > + else if (sscanf (str, "%lg%n", &d, &n) == 1) { > > + if (strcmp (&str[n], "%") == 0) /* percentage */ > > + d /= 100.0; > > + else if (strcmp (&str[n], "") == 0) /* probability */ > > + ; > > + else > > + goto bad_parse; > > + } > > + else > > + goto bad_parse; > > Here's where it might be worth adding: > > if (d < 0.0 || !isfinite (d)) > goto bad_parse; > > to filter out the cases where the user passed in "inf", "nan", or even > some form of large/small that results in overflow. > > You caould also use 'signbit (d)' instead of 'd < 0.0' if you want to > prevent -0.0 from causing surprises (while IEEE 754 and therefore the > compiler treats '0.0 == -0.0' as true despite being different bit > patterns, x/0.0 and x/-0.0 for finite x differ in behavior). OK, but added to nbdkit_parse_probability itself. > > + > > + if (retp) > > + *retp = d; > > + return 0; > > + > > + bad_parse: > > + nbdkit_error ("%s: could not parse '%s'", what, str); > > Should this say "could not parse '%s' as a probability" to help the > user search the documntation for what forms can be parsed? Yes, it would also be good to mention the function name. > > + return -1; > > If you get here, *retp was unchanged. [1] > > > +} > > + > > /* Parse a string as a boolean, or return -1 after reporting the error. > > */ > > NBDKIT_DLL_PUBLIC int > > diff --git a/server/test-public.c b/server/test-public.c > > index 676411290..0d84abdd2 100644 > > --- a/server/test-public.c > > +++ b/server/test-public.c > > @@ -200,6 +200,78 @@ test_nbdkit_parse_size (void) > > return pass; > > } > > > > +static bool > > +test_nbdkit_parse_probability (void) > > +{ > > + size_t i; > > + bool pass = true; > > + struct pair { > > + const char *str; > > + int result; > > + double expected; > > + } tests[] = { > > + /* Bogus strings */ > > + { "", -1 }, > > + { "garbage", -1 }, > > + { "0garbage", -1 }, > > + { "1X", -1 }, > > + { "1%%", -1 }, > > + { "1:", -1 }, > > + { "1:1:1", -1 }, > > + { "1:0", -1 }, /* format is valid but divide by zero is not allowed */ > > + { "1/", -1 }, > > + { "1/2/3", -1 }, > > If we add my proposed check above for filtering out negatives and > non-finite values, we could also test that things like "-1", "inf", > "nan", "1e200/1e-200" are rejected. Likewise worth adding a test for > "-0" (whether you decide to permit or reject it). OK. > > + > > + /* Numbers. */ > > + { "0", 0, 0 }, > > + { "1", 0, 1 }, > > + { "2", 0, 2 }, /* values outside [0..1] range are allowed */ > > + { "0.1", 0, 0.1 }, > > + { "0.5", 0, 0.5 }, > > + { "0.9", 0, 0.9 }, > > + { "1.0000", 0, 1 }, > > + > > + /* Percentages. */ > > + { "0%", 0, 0 }, > > + { "50%", 0, 0.5 }, > > + { "100%", 0, 1 }, > > + { "90.25%", 0, 0.9025 }, > > + > > + /* N in M */ > > + { "1:1000", 0, 0.001 }, > > + { "1/1000", 0, 0.001 }, > > + { "2:99", 0, 2.0/99 }, > > + { "2/99", 0, 2.0/99 }, > > + { "0:1000000", 0, 0 }, > > Since you document in patch 5 that we have a default percentage of > "1e-8", it is worth testing that as an explicitly supported string. OK. > > + }; > > + > > + for (i = 0; i < ARRAY_SIZE (tests); i++) { > > + int r; > > + double d; > > Uninitialized... > > > + > > + error_flagged = false; > > + r = nbdkit_parse_probability ("test", tests[i].str, &d); > > ...and per [1] above, there are code paths where d is not assigned... > > > > + if (r != tests[i].result) { > > + fprintf (stderr, > > + "Wrong return value for %s, got %d, expected %d\n", > > + tests[i].str, r, tests[i].result); > > + pass = false; > > + } > > + if (r == 0 && d != tests[i].expected) { > > + fprintf (stderr, > > + "Wrong result for %s, got %g, expected %g\n", > > + tests[i].str, d, tests[i].expected); > > ...so this can end up printing garbage. You may want to consider > having nbdkit_parse_probability assign into d on all code paths, not > just success. I don't think I understand this. Surely if r == 0 then d has been assigned, and if r == -1 we don't print d? > > + pass = false; > > + } > > + if ((r == -1) != error_flagged) { > > + fprintf (stderr, "Wrong error message handling for %s\n", > > tests[i].str); > > + pass = false; > > + } > > + } > > + > > + return pass; > > +} > > + > > static bool > > test_nbdkit_parse_ints (void) > > { > > @@ -503,6 +575,7 @@ main (int argc, char *argv[]) > > { > > bool pass = true; > > pass &= test_nbdkit_parse_size (); > > + pass &= test_nbdkit_parse_probability (); > > pass &= test_nbdkit_parse_ints (); > > pass &= test_nbdkit_read_password (); > > /* nbdkit_absolute_path and nbdkit_nanosleep not unit-tested here, but > > > +++ b/plugins/ocaml/NBDKit.ml > > @@ -160,6 +160,8 @@ let register_plugin ~name > > (* Bindings to nbdkit server functions. *) > > external set_error : Unix.error -> unit = "ocaml_nbdkit_set_error" > > [@@noalloc] > > external parse_size : string -> int64 = "ocaml_nbdkit_parse_size" > > +external parse_probability : string -> string -> float = > > + "ocaml_nbdkit_parse_probability" > > Is OCaml 'float' required to be any specific floating point (such as > IEEE 754 binary64), or is it some nebulous hardware-specific floating > point (possibly 32-bit instead of 64-bit, or even permitting VAX > instead of IEEE)? But how OCaml maps floating point is not a > show-stopper to this patch, since we already state OCaml bindings > don't have ABI stability like C bindings. (I understand why modern > languages have picked 'float' to mean floating-point while still > favoring the IEEE 754 binary64 representations, where C is the odd one > out that picked 'float' as binary32 and is stuck with the type name > 'double' which has no resemblance to floating-point but rather to its > size difference.) I believe it's the same as a C double. There's no standard for OCaml so it's just whatever the current implementation does. > [Side note: if you really want a trip, read the 2023 SIGBOVIK article > on "GradIEEEnt half decent" about 16-bit floating point values being > exploited for their non-linear rounding properties as a way to create > non-monotonic functions that can in turn form the basis of a Turing > complete system capable of running a 36-second solution of Mario level > 1-1 in 19k minutes of wall time using only half-precision > floating-point operations... https://sigbovik.org/2023/, > http://tom7.org/grad/murphy2023grad.pdf] Will do :-/ > > +++ b/plugins/python/modfunctions.c > > @@ -122,11 +122,32 @@ parse_size (PyObject *self, PyObject *args) > > return PyLong_FromSize_t ((size_t)size); > > } > > > > +/* nbdkit.parse_probability */ > > +static PyObject * > > +parse_probability (PyObject *self, PyObject *args) > > +{ > > + const char *what, *str; > > + double d; > > + > > + if (!PyArg_ParseTuple (args, "ss:parse_probability", &what, &str)) > > + return NULL; > > + > > + if (nbdkit_parse_probability (what, str, &d) == -1) { > > + PyErr_SetString (PyExc_ValueError, > > + "Unable to parse string as probability"); > > + return NULL; > > + } > > + > > + return PyFloat_FromDouble (d); > > Another language binding that does not directly guarantee that 'Float' > is an IEEE 64-bit value, but where we are probably safe: > https://stackoverflow.com/questions/70184494/on-what-systems-does-python-not-use-ieee-754-double-precision-floats Rich. -- Richard Jones, Virtualization Group, Red Hat http://people.redhat.com/~rjones Read my programming and virtualization blog: http://rwmj.wordpress.com virt-top is 'top' for virtual machines. Tiny program with many powerful monitoring features, net stats, disk stats, logging, etc. http://people.redhat.com/~rjones/virt-top _______________________________________________ Libguestfs mailing list Libguestfs@redhat.com https://listman.redhat.com/mailman/listinfo/libguestfs