Emilio G. Cota <c...@braap.org> writes:

> On Tue, Mar 27, 2018 at 11:13:01 +0100, Alex Bennée wrote:
>>
>> Emilio G. Cota <c...@braap.org> writes:
>>
>> > This will allow us to run correctness tests against our
>> > FP implementation. The test can be run in two modes (called
>> > "testers"): host and soft. With the former we check the results
>> > and FP flags on the host machine against the model.
>> > With the latter we check QEMU's fpu primitives against the
>> > model. Note that in soft mode we are not instantiating any
>> > particular CPU (hence the HW_POISON_H hack to avoid macro poisoning);
>> > for that we need to run the test in host mode under QEMU.
>> >
>> > The input files are taken from IBM's FPGen test suite:
>> > https://www.research.ibm.com/haifa/projects/verification/fpgen/
>> >
>> > I see no license file in there so I am just downloading them
>> > with wget. We might want to keep a copy on a qemu server though,
>> > in case IBM takes those files down in the future.
>>
>> Hmm the files themselves have:
>>
>>   Copyright of IBM Corp. 2005
>>
>> So I'm not sure we can take them into any of our source trees.
>
> Yes I don't think committing these would be appropriate. I guess keeping
> our own copy of the tarball somewhere would be OK, though. My worry is
> that at some point IBM's server will die and we'll have no more test
> files.
>
>> However what are we testing here?
>
> fp-test allows you to test against a model. This model can be anything;
> for now, I grabbed model files from IBM's fpgen, which tests for
> IEEE compliance.
>
> We can add more model files; the muladd tests are an example of that.
>
>> Do we just want to test our implementation is IEEE compliant or should
>> we generate our own test cases on validated hardware to check that our
>> emulation of a guest is correct (e.g. correct modulo the valid
>> variations in the standard)?
>
> I think what we want is a large core of tests that test the standard,
> plus a smaller set of tests that cover ISA particularities. I think
> the IBM models are a good starting point for the former -- note
> though that some operations are not covered in the models (despite
> the syntax description specifying an op for them, e.g. int-to-float
> conversions.)
OK that sounds sane.

>> > The "IBM" syntax of those files (for now the only syntax supported
>> > in fp-test) is documented here:
>> > https://www.research.ibm.com/haifa/projects/verification/fpgen/papers/ieee-test-suite-v2.pdf
>> >
>> > Note that the syntax document has some inaccuracies; the appended
>> > parsing code works around some of those.
>> >
>> > The exception flag (-e) is important: many of the optimizations
>> > included in the following commits assume that the inexact flag
>> > is set, so "-e x" is necessary in order to test those code paths.
>> >
>> > The whitelist flag (-w) points to a file with test cases to be ignored.
>> > I have put some whitelist files online, but we should have them
>> > on a QEMU-related server.
>> >
>> > Thus, a typical run of fp-test is as follows:
>> >
>> >   $ cd qemu/build/tests/fp-test
>> >   $ make -j && \
>> >     ./fp-test -t soft ibm/*.fptest \
>> >       -w whitelist.txt \
>> >       -e x
>>
>> So this is a unit test of our code rather than a test program running
>> under QEMU?
>
> Having the -t host/soft flags allows you flexibility in what to test.
>
> With "host" mode, you're generating a binary that knows nothing
> about QEMU, i.e. all its FP operations are native. You can use this
> to (1) figure out whether your host diverges from the model [hopefully
> it doesn't in anything substantial], and (2) test whether QEMU mimics
> the corresponding host by running the binary under *-linux-user.

OK - there is no reason why we can't cross-compile the single source file
for multiple tests/tcg targets. I'll look at that when I have another run
at the cross-compile stuff. I wonder if we should disable the soft mode
for these builds so we don't have to deal with cross-compiling the
softfloat.o's as well?

> With "soft" mode, you're testing QEMU's soft-fp implementation. This
> allows you to check it against the model. Note that it doesn't let
> you check anything specific about a target CPU (hence the HW_POISON_H
> hack); for this you'd have to go to point (2) above. Here instead we're
> checking QEMU's FP implementation directly against the models.
>
>> I noted running under x86-64-linux-user fails pretty quick.
>
> As I wrote below, I think this is due to bugs in the i386 target.
>
> On a host x86_64 machine:
>
>   $ ./fp-test -t host ibm/* -w whitelist.txt -w whitelist-tininess-after.txt
>   All tests OK.
>   Tests passed: 74426. Not handled: 53297, whitelisted: 2748
>
>   $ ../../x86_64-linux-user/qemu-x86_64 ./fp-test -t host \
>       ibm/* -w whitelist.txt -w whitelist-tininess-after.txt -n 2>/dev/null
>   Tests failed: 57479. Parsing: 0, result:14, flags:57465
>   Tests passed: 16947. Not handled: 53297, whitelisted: 2748
>
> The results are different when run under QEMU, which means
> this target is not doing things correctly (despite -t soft
> passing).
>
>> If so we really should be building this automatically in make check.
>
> Yes, passing -soft mode would certainly be valuable and trivial
> to integrate since there is nothing built that is target-dependent.

So the two bugs I'm currently fixing are guest dependent, so they can't
be caught by soft mode:

  - ARM FP16 alternative format behaviour
  - round_to_int_and_pack refactor broke TriCore ftoi insns (1759264)

I assume your bug was?
  - fix {min,max}nummag for same-abs-value inputs (from your series)

>
> (snip)
>> > --- /dev/null
>> > +++ b/tests/fp-test/fp-test.c
> (snip)
>> > +enum precision {
>> > +    PREC_FLOAT,
>> > +    PREC_DOUBLE,
>> > +    PREC_QUAD,
>> > +    PREC_FLOAT_TO_DOUBLE,
>> > +};
>>
>> Again we will want to include half-precision.
>
> Sure. We'll need to come up with model files for that though.
>
> (snip)
>> > +    }
>> > +    if (unlikely(!strcmp("-inf", p) || !strcmp("-Inf", p))) {
>> > +        ret->val = fp_choose(prec, 0xff800000, 0xfff0000000000000);
>> > +        return 0;
>> > +    }
>> > +
>> > +    len = strlen(p);
>> > +
>> > +    if (strchr(p, 'P')) {
>>
>> Given we have already strlen'ed should we use memchr(p, 'P', len) or
>> even strchrnul(p,'P') with a check we haven't run over the end? Or maybe
>> use glib's string stuff instead.
>
> I think in this case strchr is warranted; p points to a valid string
> (otherwise strlen would also be dangerous).

Yeah that's fair. I guess it also helps cross-compilation to avoid too
many dependencies.

>
> (snip)
>> > +/* Syntax of IBM FP test cases:
>> > + * https://www.research.ibm.com/haifa/projects/verification/fpgen/syntax.txt
>> > + */
>> > +static enum error ibm_test_line(const char *line)
>> > +{
>> > +    struct test_op t;
>> > +    /* at most nine fields; this should be more than enough for each field */
>> > +    char s[9][64];
>> > +    char *p;
>> > +    int n, field;
>> > +    int i;
>> > +
>> > +    /* data lines start with either b32 or d(64|128) */
>> > +    if (unlikely(line[0] != 'b' && line[0] != 'd')) {
>> > +        return ERROR_COMMENT;
>> > +    }
>> > +    n = sscanf(line, "%63s %63s %63s %63s %63s %63s %63s %63s %63s",
>> > +               s[0], s[1], s[2], s[3], s[4], s[5], s[6], s[7], s[8]);
>> > +    if (unlikely(n < 5 || n > 9)) {
>> > +        return ERROR_INPUT;
>> > +    }
>>
>> Personally I find g_strsplit(line, " ", 9) and g_strfreev() handy for
>> this sort of parsing.
>
> I figured the above might be simpler since I wouldn't have to worry
> about freeing memory. Also, scanf directly returns n, which is used
> later.

/me nods

--
Alex Bennée
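
For readers unfamiliar with the glib helpers mentioned in the last exchange,
here is a minimal, hypothetical sketch of what the g_strsplit()/g_strfreev()
approach to splitting a test line might look like. It is not code from the
patch; the function name, error values, and output parameter are made up for
illustration. It also shows the trade-off Emilio points out: unlike sscanf(),
the token array has to be counted and freed by the caller, and g_strsplit()
does not collapse repeated whitespace the way "%63s" does.

    #include <glib.h>

    /*
     * Hypothetical alternative to the sscanf() call in ibm_test_line():
     * split a (single-space-separated) test line into fields with
     * g_strsplit(). Returns the field count, or -1 on a malformed line.
     */
    static int count_fields_example(const char *line, gchar ***out)
    {
        /* limit to 10 tokens so an over-long line can be detected */
        gchar **fields = g_strsplit(line, " ", 10);
        int n = 0;

        /* the returned array is NULL-terminated */
        while (fields[n] != NULL) {
            n++;
        }
        if (n < 5 || n > 9) {
            g_strfreev(fields);   /* unlike sscanf(), tokens must be freed */
            return -1;            /* would map to ERROR_INPUT */
        }
        *out = fields;            /* caller must g_strfreev() when done */
        return n;
    }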