On 08/31/2015 04:21 PM, Ilia Mirkin wrote:
> On Mon, Aug 31, 2015 at 7:06 PM, Ian Romanick <i...@freedesktop.org> wrote:
>> ping. :)
>>
>> On 08/10/2015 11:48 AM, Matt Turner wrote:
>>> On Mon, Aug 10, 2015 at 10:12 AM, Ian Romanick <i...@freedesktop.org> wrote:
>>>> From: Ian Romanick <ian.d.roman...@intel.com>
>>>>
>>>> On many CPU-limited applications, this is *the* hot path. The idea is
>>>> to generate per-API versions of brw_draw_prims that elide some checks.
>>>> This patch removes render-mode and "is everything in VBOs" checks from
>>>> core-profile contexts.
>>>>
>>>> On my IVB laptop (which may have experienced thermal throttling):
>>>>
>>>> Gl32Batch7: 3.70955% +/- 1.11344%
>>>
>>> I'm getting 3.18414% +/- 0.587956% (n=113) on my IVB, which probably
>>> matches your numbers depending on your value of n.
>>>
>>>> OglBatch7: 1.04398% +/- 0.772788%
>>>
>>> I'm getting 1.15377% +/- 1.05898% (n=34) on my IVB, which probably
>>> matches your numbers depending on your value of n.
>>
>> This is another thing that makes me feel a little uncomfortable with the
>> way we've done performance measurements in the past. If I run my test
>> before and after this patch for 121 iterations, which I have done, I can
>> cut the data at any point and oscillate between "no difference" and X%
>> +/- some-large-fraction-of-X%. Since the before and after code for the
>> compatibility profile path should be identical, "no difference" is the
>> only believable result.
>>
>> Using a higher confidence threshold (e.g., -c 98) results in "no
>> difference" throughout, as expected. I feel like 90% isn't a tight
>> enough confidence interval for a lot of what we do, but I'm unsure how
>> to determine what confidence level we should use. We could
>> experimentally determine it by running a test some number of times and
>> finding the interval that detects no change in some random partitioning
>> of the test results. Ugh.
>
> (sorry, statistics rant below, can't help myself)
>
> AFAIK the standard in statistics is to use a 95% confidence interval.
I had misremembered the default CI in ministat. It does use 95%. So,
s/90%/95%/g in my previous message. :)

> Unless you have 'statistician' in your job title [or someone with that
> job title has indicated otherwise], that's what you should probably
> use. Using anything lower than that is a way of saying "This
> scientific study isn't turning out the way I wanted, I'm going to have
> to take matters into my own hands".
>
> Of course note that if you do run the same experiment 20 times, you
> should expect one of those 20 times to yield a confidence interval
> that does not include the true mean. And in general I'd be very
> suspicious of results where the change is near the confidence interval
> boundary.
>
> And lastly, all this statistics stuff assumes that you're evaluating
> the same normal distribution repeatedly. This isn't exactly true, but
> it's true enough. However, you can try to get more accurate by
> experimentally determining a fudge factor on the CI width. You could
> run (literal) no-op experiments lots of times and fudge the output of
> the CI width calculation until it matches up with empirical results,
> e.g. if you run the same experiment 100x and use a 95% CI, fudge the
> outcome until you end up with 5 significant results and 95
> insignificant ones. Ideally such a fudge factor should not be too
> different from 1, or else you have a very non-normal distribution, and
> fudge factors ain't gonna help you. Note that any computed fudge
> factors could not be shared among machine groups that weren't used to
> determine them, so I'm not seriously recommending this approach, but
> thought I'd mention it.
>
>   -ilia
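
Ilia's suggestion of running literal no-op experiments is easy to
sanity-check with a quick simulation. Below is a rough, untested sketch
(not part of the patch; the sample size, trial count, the synthetic
"benchmark scores", and the 1.96 normal-approximation critical value in
place of an exact Student-t value are all arbitrary choices) that
repeatedly compares two samples drawn from the same distribution at 95%
confidence and counts how often the comparison gets flagged as
significant. With well-behaved noise that should be roughly 5% of the
trials.

/*
 * Untested sketch: how often does a 95% confidence comparison flag a
 * no-op change as significant?  All constants here are arbitrary.
 */
#include <math.h>
#include <stdio.h>
#include <stdlib.h>

/* One sample from N(mu, sigma), via Box-Muller. */
static double normal(double mu, double sigma)
{
   double u1 = (rand() + 1.0) / ((double) RAND_MAX + 2.0);
   double u2 = (rand() + 1.0) / ((double) RAND_MAX + 2.0);
   return mu + sigma * sqrt(-2.0 * log(u1)) * cos(6.28318530717958648 * u2);
}

static void stats(const double *x, int n, double *mean, double *var)
{
   double sum = 0.0, sq = 0.0;
   for (int i = 0; i < n; i++)
      sum += x[i];
   *mean = sum / n;
   for (int i = 0; i < n; i++)
      sq += (x[i] - *mean) * (x[i] - *mean);
   *var = sq / (n - 1);   /* sample variance */
}

int main(void)
{
   enum { N = 100, TRIALS = 1000 };  /* runs per sample, repeated comparisons */
   const double crit = 1.96;         /* ~95% two-sided, normal approximation */
   double before[N], after[N];
   int significant = 0;

   srand(12345);
   for (int t = 0; t < TRIALS; t++) {
      /* A no-op change: both samples come from the same distribution. */
      for (int i = 0; i < N; i++) {
         before[i] = normal(100.0, 2.0);
         after[i] = normal(100.0, 2.0);
      }

      double m1, v1, m2, v2;
      stats(before, N, &m1, &v1);
      stats(after, N, &m2, &v2);

      /* Welch-style t statistic for the difference of the two means. */
      double t_stat = (m2 - m1) / sqrt(v1 / N + v2 / N);
      if (fabs(t_stat) > crit)
         significant++;
   }

   printf("%d of %d no-op comparisons flagged as significant (~%.1f%%)\n",
          significant, TRIALS, 100.0 * significant / TRIALS);
   return 0;
}

Built with something like "cc -O2 noop.c -lm", that should print a count
near 5% of the trials. Running the same kind of comparison on real
benchmark numbers and seeing a much larger fraction would be the sign of
the very non-normal noise Ilia mentions, which raising the confidence
level alone won't fix.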
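
And for anyone reading the thread without the patch handy, the change
being measured has roughly the following shape. This is an untested
illustration with every name invented (it is not the actual i965
brw_draw_prims code); the idea is simply that the draw function is chosen
once at context creation, so the core-profile hot path never executes the
compatibility-only checks.

/*
 * Untested sketch of the per-API draw-function idea; all names invented.
 * The checks only matter in the compatibility profile, so a core-profile
 * context installs a draw callback that skips them, and the choice is
 * made once instead of on every draw call.
 */
#include <stdbool.h>
#include <stdio.h>

enum gl_api { API_COMPAT, API_CORE };

struct context {
   enum gl_api api;
   bool render_mode_is_normal;  /* false for GL_FEEDBACK / GL_SELECT */
   bool all_arrays_in_vbos;     /* false if any client-side vertex arrays */
   void (*draw_prims)(struct context *ctx, int prim_count);
};

static void emit_hw_prims(struct context *ctx, int prim_count)
{
   (void) ctx;
   printf("  emitting %d primitives\n", prim_count);
}

/* Compatibility profile: feedback/select rendering and client-side vertex
 * arrays are legal, so every draw call has to check for them. */
static void draw_prims_compat(struct context *ctx, int prim_count)
{
   if (!ctx->render_mode_is_normal) {
      printf("  falling back for feedback/select rendering\n");
      return;
   }
   if (!ctx->all_arrays_in_vbos)
      printf("  uploading client-side arrays first\n");
   emit_hw_prims(ctx, prim_count);
}

/* Core profile: GL_RENDER is the only render mode and client-side vertex
 * arrays no longer exist, so the checks above are dead weight. */
static void draw_prims_core(struct context *ctx, int prim_count)
{
   emit_hw_prims(ctx, prim_count);
}

static void init_draw(struct context *ctx)
{
   /* Decide once, at context creation, instead of branching per draw. */
   ctx->draw_prims = (ctx->api == API_CORE) ? draw_prims_core
                                            : draw_prims_compat;
}

int main(void)
{
   struct context core = { API_CORE, true, true, NULL };
   struct context compat = { API_COMPAT, true, false, NULL };

   init_draw(&core);
   init_draw(&compat);

   printf("core profile draw:\n");
   core.draw_prims(&core, 100);
   printf("compatibility profile draw:\n");
   compat.draw_prims(&compat, 100);
   return 0;
}

The measured win comes from moving those per-draw branches out of a path
that runs for every draw call in a CPU-limited application.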