[R] Significance test

2011-09-23 Thread setrofim
I have a bunch of benchmark measurements that look something like this:

sample.1  0.00      0.062500  0.058330  0.058330  0.058330
sample.2  0.058330  0.058330  0.058330  0.058330  0.058330
sample.3  0.062500  0.062500  0.070830  0.062500  0.00

i.e. each measurement takes on one of a set of values. The set of values
isn't fixed, but they seem to go up in increments; in this case, the
increment appears to be about 4.17e-07 (e.g. it would be impossible for a
measurement to be 0.066440).
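
A quick way to check the "increments" impression is to look at the gaps
between the distinct values. The sketch below is only illustrative: it is
written in Python, the helper name grid_step is made up, and the input is
simply the nonzero values as they are printed in the listing above.

```python
def grid_step(values, tol=1e-9):
    """Smallest gap between distinct values: a rough estimate of the step."""
    distinct = sorted(set(values))
    gaps = [b - a for a, b in zip(distinct, distinct[1:]) if b - a > tol]
    if not gaps:
        raise ValueError("need at least two distinct values")
    return min(gaps)


# Nonzero values as printed in the post (units as shown there).
observed = [
    0.062500, 0.058330, 0.058330, 0.058330,
    0.058330, 0.058330, 0.058330, 0.058330, 0.058330,
    0.062500, 0.062500, 0.070830, 0.062500,
]

step = grid_step(observed)
lowest = min(observed)
# Distance of each value from the lowest one, expressed in steps.
offsets = [round((v - lowest) / step, 3) for v in observed]
print(step, sorted(set(offsets)))
```

If the measurements really sit on a grid, every offset should come out
close to a whole number of steps.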

What is a way to test for significant differences between two samples?

Sorry if this is a noob question, but I'm kinda new to this. The two tests
I'm aware of are the Student's t and Wilcoxon Rank Sum; neither seems to
apply here. I've tried Googling this, but haven't found anything useful
(maybe I'm not using the right terms...).

Any help would be greatly appreciated.

Regards,
setro





--
View this message in context: 
http://r.789695.n4.nabble.com/Significance-test-tp3836155p3836155.html
Sent from the R help mailing list archive at Nabble.com.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Significance test

2011-09-23 Thread setrofim
Yuta,

Thanks for the response.


Yuta wrote:
> 
> You've got to state the problem a little bit more clearly.
> 
> What do you mean by "set"? Is it a list of certain possible values,
> available as outcomes of each single measurement (variate)? Or is it
> something else?
> How many variates do you have inside each sample?
> What is it exactly that you want to find? 

Sorry, I should have been more clear. My team is working on a software
system. This system comes with a set of benchmarks that exercise specific
functionality. I am attempting to measure the performance impact of the
changes made by my team.

Each of the samples in my previous post represents a particular "build" of
this software system and corresponding to it there are five measurements of
a benchmark execution (each benchmark is executed five times for each
build). 

Each measurement is a time in seconds, so there isn't a list of all possible
values as such. However, for specific benchmarks, the execution times seem
to vary only in steps of some minimal amount (4.17e-07 for the samples I've
posted), so the distribution of the measurements is effectively discrete.


Yuta wrote:
> Do you want just to compare sample #1 and #2?
I want to be able to compare any pair of samples (that is, "builds"). 


Yuta wrote:
>  There seems to be not enough variates for reliable result.
Yes, unfortunately, the full set of benchmarks takes a while to run, and
this ties up resources, etc. So the number of variates available for a
particular build is limited. 


Yuta wrote:
>  Still, you may want to look at central tendencies (mean, median), i.e.
> location shift of samples, homogeneity of their variances, or the overall
> shape of empirical distributions.
Yes, I'm basically looking at the difference between two samples in the mean
of their five runs. But I need an indicator of whether the difference is
significant. At the moment, I'm doing a t-test, and that sort-of works, but
from the results I'm getting, I'm not sure how accurate it is, so I've
started to wonder if I'm doing something wrong.
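
For reference, this is roughly what a t-test computes for two builds with
five runs each. It is a sketch of Welch's unequal-variance version (an
assumption; the post does not say which variant was used), written in Python
with a hypothetical helper welch_t. The values are made up to sit on the
same grid as the posted data; they are not the real measurements.

```python
from math import sqrt
from statistics import mean, variance


def welch_t(a, b):
    """Welch's t statistic and Welch-Satterthwaite degrees of freedom."""
    va = variance(a) / len(a)  # squared standard error of mean(a)
    vb = variance(b) / len(b)  # squared standard error of mean(b)
    if va + vb == 0:
        raise ValueError("both samples have zero variance; t is undefined")
    t = (mean(a) - mean(b)) / sqrt(va + vb)
    df = (va + vb) ** 2 / (va ** 2 / (len(a) - 1) + vb ** 2 / (len(b) - 1))
    return t, df


# Illustrative values only (hypothetical, not the posted runs).
build_a = [0.05833, 0.05833, 0.05833, 0.06250, 0.05833]
build_b = [0.06250, 0.06250, 0.07083, 0.06250, 0.06250]

t, df = welch_t(build_a, build_b)
print(t, df)
```

Turning t and df into a p-value needs the t distribution, which is left out
here. Note that a build whose five runs are all identical (like sample.2)
contributes zero variance, and if both builds do so the statistic is
undefined.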


Yuta wrote:
>  If your data are NOT normally distributed
The way the benchmarks are calculated, each measurement itself is a mean. I
believe the mean of the five means should be normally distributed (at least,
if they weren't "discrete-ized", as described above)? I guess the crux of
my question is: does the t-test apply in this case, or should I be doing
something else?
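
One option that is sometimes used when samples are small, heavily tied, and
not obviously normal is an exact permutation test on the difference in
means: pool the ten runs, relabel them in every possible 5/5 split, and count
how often the relabeled difference is at least as large as the observed one.
The following is a minimal sketch under those assumptions, not a
recommendation taken from the thread. It is in Python, the function name is
hypothetical, and the data are made-up values on the same grid as the posted
runs.

```python
from itertools import combinations
from statistics import mean


def permutation_p_value(a, b):
    """Exact two-sided permutation p-value for a difference in means."""
    pooled = list(a) + list(b)
    n, m = len(a), len(b)
    total = sum(pooled)
    observed = abs(mean(a) - mean(b))
    hits = count = 0
    for idx in combinations(range(n + m), n):
        group_sum = sum(pooled[i] for i in idx)
        diff = abs(group_sum / n - (total - group_sum) / m)
        # Small tolerance so splits that tie with the observed one are counted
        # despite floating-point rounding.
        if diff >= observed - 1e-12:
            hits += 1
        count += 1
    return hits / count


# Illustrative values only (hypothetical, not the posted runs).
build_a = [0.05833, 0.05833, 0.05833, 0.06250, 0.05833]
build_b = [0.06250, 0.06250, 0.07083, 0.06250, 0.06250]

p = permutation_p_value(build_a, build_b)
print(p)
```

With five runs per build there are only 252 possible splits, so the smallest
two-sided p-value this approach can ever report is 2/252 (about 0.008).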


Yuta wrote:
> All in all it seems like you need to consult some statistical textbook = )
> Sokal and Rohlf is a good choice 
Yes, it seems so. Thanks for the recommendation. Looks like I'll be stopping
by the book shop on the way home this evening :).

Regards,
setro

--
View this message in context: 
http://r.789695.n4.nabble.com/Significance-test-tp3836155p3836770.html
Sent from the R help mailing list archive at Nabble.com.
