Yuta,

Thanks for the response.
Yuta wrote:
> You've got to state the problem little bit more clear.
>
> What do you mean by "set"? Is it a list of certain possible values,
> available as outcomes of each single measurement (variate)? Or is it
> something else?
> How many variates do you have inside each sample?
> What is it exactly that you want to find?

Sorry, I should have been clearer. My team is working on a software system. This system comes with a set of benchmarks that exercise specific functionality. I am attempting to measure the performance impact of the changes made by my team.

Each of the samples in my previous post represents a particular "build" of this software system, and for each build there are five measurements of a benchmark execution (each benchmark is executed five times per build). Each measurement is a time in seconds, so there isn't a list of all possible values as such. However, for specific benchmarks the execution times seem to differ by at least some minimal amount (4.17e-07 for the samples I've posted), so the distribution of the measurements is effectively discrete.

Yuta wrote:
> Do you want just to compare sample #1 and #2?

I want to be able to compare any pair of samples (that is, "builds").

Yuta wrote:
> There seems to be not enough variates for reliable result.

Yes. Unfortunately, the full set of benchmarks takes a while to run and ties up resources, so the number of variates available for a particular build is limited.

Yuta wrote:
> Still, you may want to look at central tendencies (mean, median), i.e.
> location shift of samples, homogeneity of their variances, or the overall
> shape of empirical distributions.

Yes, I'm basically looking at the difference between the means of the five runs for two samples. But I need an indicator of whether the difference is significant.
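One possible indicator when there are only five runs per build, and the values may be discretized, is an exact permutation test on the difference in means. Below is a minimal sketch in Python (standard library only). The function name `permutation_p_value` and the timings are made up for illustration; they are not the samples from the earlier post, and this is not a claim about what the right test is.

```python
from itertools import combinations
from statistics import mean


def permutation_p_value(a, b):
    """Exact two-sided permutation p-value for the difference in means.

    Enumerates every way of splitting the pooled measurements into groups
    of size len(a) and len(b), and counts the splits whose absolute mean
    difference is at least as large as the observed one.
    """
    pooled = list(a) + list(b)
    n_a, n_b = len(a), len(b)
    total = sum(pooled)
    observed = abs(mean(a) - mean(b))
    hits = 0
    count = 0
    for idx in combinations(range(len(pooled)), n_a):
        sum_a = sum(pooled[i] for i in idx)
        diff = abs(sum_a / n_a - (total - sum_a) / n_b)
        # Relative tolerance so floating-point noise does not drop ties.
        if diff >= observed * (1 - 1e-9):
            hits += 1
        count += 1
    return hits / count


# Hypothetical timings in seconds for two builds, five runs each.
build_a = [1.0, 1.1, 1.2, 1.3, 1.4]
build_b = [2.0, 2.1, 2.2, 2.3, 2.4]

p = permutation_p_value(build_a, build_b)
print(p)  # the two groups do not overlap, so only 2 of the 252 splits count
```

With five runs per build there are only 252 possible splits, so the smallest two-sided p-value this approach can ever report is 2/252 (about 0.008). It makes no normality assumption, which is why it may be worth considering for discretized timings.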
At the moment, I'm doing a t-test, and that sort of works, but from the results I'm getting I'm not sure how accurate it is, so I've started to wonder whether I'm doing something wrong.

Yuta wrote:
> If your data are NOT normally distributed

The way the benchmarks are calculated, each measurement is itself a mean. I believe the mean of the five means should be normally distributed (at least, if the values weren't "discrete-ized", as described above)?

I guess the crux of my question is: does the t-test apply in this case, or should I be doing something else?

Yuta wrote:
> All in all it seems like you need to consult some statistical textbook = )
> Socal and Rolf is a good choice

Yes, it seems so. Thanks for the recommendation. Looks like I'll be stopping by the book shop on the way home this evening :).

Regards,
setro

--
View this message in context: http://r.789695.n4.nabble.com/Significance-test-tp3836155p3836770.html
Sent from the R help mailing list archive at Nabble.com.
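P.S. For reference, the kind of t-test comparison mentioned above can be sketched as follows. This assumes the unequal-variance (Welch) form, which is an assumption and not something stated in this thread; the function name `welch_t` and the numbers are hypothetical. It returns only the statistic and the approximate degrees of freedom; turning those into a p-value needs a t-distribution table or a statistics library.

```python
from math import sqrt
from statistics import mean, variance


def welch_t(a, b):
    """Welch's t statistic and Welch-Satterthwaite degrees of freedom."""
    # Squared standard errors of the two sample means.
    va = variance(a) / len(a)
    vb = variance(b) / len(b)
    if va + vb == 0:
        # Can happen with discretized timings when all runs are identical.
        raise ValueError("both samples have zero variance; t is undefined")
    t = (mean(a) - mean(b)) / sqrt(va + vb)
    df = (va + vb) ** 2 / (va ** 2 / (len(a) - 1) + vb ** 2 / (len(b) - 1))
    return t, df


# Made-up values chosen so the arithmetic is easy to check by hand.
t, df = welch_t([1, 2, 3, 4, 5], [2, 3, 4, 5, 6])
print(t, df)
```

The zero-variance guard matters here because discretized measurements can make all five runs of a build identical, in which case the statistic cannot be computed.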