Richard Sandiford <rdsandif...@googlemail.com> writes:
> "Maciej W. Rozycki" <ma...@linux-mips.org> writes:
> > On Wed, 14 Jan 2015, Richard Sandiford wrote:
> >> I think we just have to accept that there are so many possible
> >> combinations that we can't test everything that's potentially relevant.
> >> I think it's more useful to be flexible than prescribe a particular list.
> >
> > Of course flexibility is needed, I absolutely agree. I consider the
> > list I quoted the base set, I've used it for all recent submissions.
> > Then for each individual change I've asked myself: does it make sense
> > to run all this testing? If for example a change touched `if
> > (TARGET_MICROMIPS)' code only, then clearly running any non-microMIPS
> > testing adds no value. And then: will this testing provide enough
> > coverage? If not, then what else needs to be covered?
> >
> > As I say, testing is cheap, you can fire a bunch of test suites in
> > parallel under Linux started on QEMU run in the system emulation mode.
> > From my experience on decent x86 hardware whole GCC/G++ testing across
> > the 5 configurations named will complete in just a few hours, that you
> > can spend doing something else. And if any issues are found then the
> > patch submitter, who's often the actual author and knows his code the
> > best, is in the best position to understand what happened.
> >
> > OTOH chasing down a problem later on is expensive (difficult), first
> > it has to be narrowed down, often based on a user bug report rather
> > than the discovery of a test-suite regression. Even making a
> > reproducible test case from such a report may be tough. And then you
> > have the choice of either debugging the problem from scratch, or (if
> > you have an easy way to figure out it is a regression, such as by
> > passing the test case through an older version of the compiler whose
> > binary you already have handy) bisecting the tree to find the
> > offending commit (not readily available with SVN AFAIK, but I had
> > cases I did it manually in the past) and starting from there. Both
> > ways are tedious and time consuming.
> >
> >> Having everyone test the same multilib combinations on the same
> >> target isn't necessarily a good thing anyway. Diversity in testing
> >> (between developers) is useful too.
> >
> > Sure, people will undoubtedly use different default options, I'm sure
> > folks at Cavium will compile for Octeon rather than the base
> > architecture for example. Other people may have DSP enabled. Etc.,
> > etc... That IMHO does not preclude testing across more than just a
> > single configuration.
>
> Yeah, but that's just the way it goes. By trying to get everyone to
> test with the options that matter to you, you're reducing the amount of
> work you have to do when tracking regressions on those targets, but
> you're saying that people who care about Octeon or the opposite
> floatness have to go through the process you describe as "tedious and
> time consuming".
>
> And you don't avoid that process anyway, since people making changes to
> target-independent parts of GCC are just as likely to introduce a
> regression as those making changes to MIPS-only code. If testing is
> cheap and takes only a small number of hours, and if you want to make
> it less tedious to track down a regression, continuous testing would
> give you a narrow window for each regression.
>
> Submitters should be free to test on what matters to them rather than
> have to test a canned set of multilibs on specific configurations.
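
For what it's worth, kicking off several configurations in parallel along
the lines Maciej describes can be done with a small driver script like the
sketch below. This is only a sketch: the build directories, DejaGnu board
name and option sets are placeholders rather than the exact configurations
discussed in this thread, and each option set is assumed to have its own
configured build tree so the runs do not clobber each other's results.

#!/usr/bin/env python3
# Sketch only: run "make check-gcc" (mips.exp) for a few option sets in
# parallel.  Build directories, board name and option sets are examples.
import subprocess
from concurrent.futures import ThreadPoolExecutor

CONFIGS = [
    ("build-o32",       "-mips32r2"),
    ("build-micromips", "-mips32r2/-mmicromips"),
    ("build-n64",       "-mips64r2/-mabi=64"),
]

def run_tests(cfg):
    builddir, opts = cfg
    # DejaGnu takes extra compiler options appended to the board name after '/'.
    flags = "RUNTESTFLAGS=mips.exp --target_board=unix/" + opts
    return subprocess.run(["make", "-C", builddir + "/gcc", "check-gcc", flags],
                          capture_output=True, text=True)

with ThreadPoolExecutor(max_workers=len(CONFIGS)) as pool:
    for (builddir, opts), result in zip(CONFIGS, pool.map(run_tests, CONFIGS)):
        print(builddir, opts, "-> make exit status", result.returncode)

Comparing each resulting gcc.sum against a known-good baseline (for example
with contrib/compare_tests) then makes any new failures per configuration
easy to spot.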
One of my main concerns is enabling contribution from less experienced
developers and those who don't have the infrastructure available to perform
wide regression testing. I would not want to instil in anyone the fear
that, because they haven't tested a specific ISA/revision, they shouldn't
bother submitting their patch. The review process is fairly intense in GNU
projects and the retesting of code can easily stack up with just a few
configurations. Frankly, I dread having to do anything remotely like FPXX
ever again as the testing drove me bonkers.

I believe there is a point where we have to accept that some issues may
have to be fixed after the initial patch is committed. There have been
several configuration-related issues addressed after FPXX was committed,
but having the code in the tree and getting feedback from other people's
favourite configuration testing can actually help speed up development as
well.

The majority of test failures for different MIPS configurations tend to
come from the tests with expected output. Trying to ensure a test builds
correctly for any set of test options and has the correct output is
exceptionally hard, and there is a general theme of not over-specifying
test options so that, where possible, the test takes on the personality of
whatever options the run is using. Personally I am happy to go through at
regular intervals, look at the results for a wide range of configurations
and fix them up. It takes significantly less time to do one pass through
for that kind of issue than to deal with it for every new test.

None of that reduces the need for thorough testing of a change to prove it
is functionally correct though. So in that sense I agree that multiple
configurations have to be tried if there is a risk of breaking code-gen in
different ways for different configs.

On the original issue of micromips... I did manage to get round to a test
run of micromips for mips.exp today and, although I haven't checked back in
history, it looks like we have had micromips expected-output failures for a
significant time. I'll try to address some of the failures but don't want
to destabilise the testsuite for non-micromips at this late stage.

Thanks,
Matthew