Dear Xiang, Now I understand what you meant. If you are only interested in comparing the Good Samples, I think you would have to use the proportion (Good Sample/Total Sample) or something similar. Another thought would be to dummy code the data (e.g., Good = +1, Fair = 0, Bad = -1). Then you could compare the means. Obviously in my example the mean would be relatively less impacted by Fair than Bad samples. Another benefit of this approach (over comparing mean number of good samples from each city) is that you estimate variability within factories (based of the dummy codes) which controls for many variables relative to within city.
Once you dummy code, the test itself is not difficult. Below is a sample function. You can either include the mean, standard deviation, and n of each group OR include the raw data with each factory being a separate object in a list that you use as "x". The lambdas are the weights to apply to the means. If you want to compare City A to City B, you could use -1 for every factory in A and +1 for every factory in B (same idea as a two sample t-test, you just estimate the variability more finely because it is by factory and then pooled). A cautionary note, I just wrote this function myself. I tested that it gives the same result as the t.test() function on two samples (mine uses one-tailed p-values) for simple numeric vectors; however, I have no idea what it would do with other types of data and it may even appear to have worked but returned wrong results. ###################################### t.contrast.test <- function(x=NA, lambda, m=NULL, s=NULL, n=NULL, raw=TRUE, na.rm=TRUE) { ifelse(identical(raw, TRUE), { for(i in seq_along(x)) {m[i] <- mean(x[[i]], na.rm=na.rm)}; for(i in seq_along(x)) {s[i] <- sd(x[[i]], na.rm=na.rm)}; for(i in seq_along(x)) {n[i] <- length(x[[i]])}; NA}, { NA}) df <- sum(n-1) effect <- sum(m*lambda) s2.pooled <- weighted.mean(x=s^2, w=n-1) sample.correction <- sum((lambda^2)/n) variability <- sqrt(sample.correction*s2.pooled) t.score <- effect/variability p.score <- pt(q=t.score, df=df, lower.tail=F) r.score <- t.score/sqrt((t.score^2)+df) value <- list(t.score, p.score, r.score, s2.pooled, df) names(value) <- c("t.contrast", "p.value", "r.contrast", "pooled.variance", "df") return(value)} ###################################### I hope that all made sense. Best Regards, Joshua -- Joshua Wiley Senior in Psychology University of California, Riverside http://www.joshuawiley.com/ ______________________________________________ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.