Thank you very much Joshua. I was thinking to use logistic regression with glm(). But this will pool the individual factories which share the same factor levels together. I was puzzled by how to deal with individual factory. Any idea? I will try your method anyway.
Xiang On Mon, Mar 22, 2010 at 12:55 PM, Joshua Wiley <jwiley.ps...@gmail.com>wrote: > Dear Xiang, > > Now I understand what you meant. If you are only interested in > comparing the Good Samples, I think you would have to use the > proportion (Good Sample/Total Sample) or something similar. Another > thought would be to dummy code the data (e.g., Good = +1, Fair = 0, > Bad = -1). Then you could compare the means. Obviously in my example > the mean would be relatively less impacted by Fair than Bad samples. > Another benefit of this approach (over comparing mean number of good > samples from each city) is that you estimate variability within > factories (based of the dummy codes) which controls for many variables > relative to within city. > > Once you dummy code, the test itself is not difficult. Below is a > sample function. You can either include the mean, standard deviation, > and n of each group OR include the raw data with each factory being a > separate object in a list that you use as "x". The lambdas are the > weights to apply to the means. If you want to compare City A to City > B, you could use -1 for every factory in A and +1 for every factory in > B (same idea as a two sample t-test, you just estimate the variability > more finely because it is by factory and then pooled). A cautionary > note, I just wrote this function myself. I tested that it gives the > same result as the t.test() function on two samples (mine uses > one-tailed p-values) for simple numeric vectors; however, I have no > idea what it would do with other types of data and it may even appear > to have worked but returned wrong results. > > > ###################################### > t.contrast.test <- function(x=NA, lambda, m=NULL, s=NULL, n=NULL, > raw=TRUE, na.rm=TRUE) { > ifelse(identical(raw, TRUE), { > for(i in seq_along(x)) {m[i] <- mean(x[[i]], na.rm=na.rm)}; > for(i in seq_along(x)) {s[i] <- sd(x[[i]], na.rm=na.rm)}; > for(i in seq_along(x)) {n[i] <- length(x[[i]])}; > NA}, { > NA}) > df <- sum(n-1) > effect <- sum(m*lambda) > s2.pooled <- weighted.mean(x=s^2, w=n-1) > sample.correction <- sum((lambda^2)/n) > variability <- sqrt(sample.correction*s2.pooled) > t.score <- effect/variability > p.score <- pt(q=t.score, df=df, lower.tail=F) > r.score <- t.score/sqrt((t.score^2)+df) > value <- list(t.score, p.score, r.score, s2.pooled, df) > names(value) <- c("t.contrast", "p.value", "r.contrast", > "pooled.variance", "df") > return(value)} > ###################################### > > > I hope that all made sense. > > Best Regards, > > Joshua > > -- > Joshua Wiley > Senior in Psychology > University of California, Riverside > http://www.joshuawiley.com/ > -- Xiang Gao, Ph.D. Department of Biology University of North Texas [[alternative HTML version deleted]] ______________________________________________ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.