Dear Xiang,

Unequal sample sizes are not a problem for t-tests. If I understand correctly, you do not want to pool your data because you believe the variance differs between individual factories (i.e., it is heterogeneous). Are you willing to pool the means? You could calculate the variance for each factory individually and then pool the variances using the weighted.mean() function (weighting each factory's variance by its sample size minus 1). Then you could compare the means between all the factories from City A and City B, or between Big and Small factories.

Another option is an ANOVA (see ?aov), which should let you keep your data broken down into subgroups.
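To make the pooling suggestion concrete, here is a minimal sketch in R. It is only one possible reading of the advice, and several things in it are assumptions rather than part of the original question: the column names, the choice to treat each sample as a 0/1 "good" indicator (so each factory's variance is the sample variance of that indicator), and the use of per-factory proportions as the unit of analysis in the t-test and ANOVA. The counts are copied from the quoted table.

```r
# Data from the quoted table (column names are illustrative)
factories <- data.frame(
  id    = 1:10,
  city  = c(rep("City_A", 4), rep("City_B", 6)),
  size  = c("Big", "Big", "Small", "Small",
            "Big", "Big", "Big", "Big", "Small", "Small"),
  total = c(100, 120, 80, 75, 150, 120, 125, 100, 70, 85),
  good  = c(90, 55, 40, 50, 80, 55, 40, 60, 45, 65)
)

# Proportion of good samples in each factory
factories$prop_good <- factories$good / factories$total

# ASSUMPTION: each sample is a 0/1 indicator of "good", so the sample
# variance within a factory is p * (1 - p) * n / (n - 1)
factories$var_good <- with(factories,
                           prop_good * (1 - prop_good) * total / (total - 1))

# Pool the per-factory variances, weighting each by (n - 1)
pooled_var <- weighted.mean(factories$var_good, w = factories$total - 1)
print(pooled_var)

# Compare cities on the per-factory proportions (Welch t-test by default,
# which does not require equal group sizes or equal variances)
print(t.test(prop_good ~ city, data = factories))

# ANOVA keeping both grouping factors in the model
print(summary(aov(prop_good ~ city + size, data = factories)))
```

With only ten factories, these tests will have little power, so the output is better read as an illustration of the mechanics than as an answer to the two questions.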
If you have specific theories, I would also recommend looking into contrast weights. With contrasts, you end up doing what is basically a one-sample t-test, but it tests whether your theory (expressed by the weights you assign) fits the data well. The nice thing about this approach is that you can include several predictions (e.g., that there will be more good samples than bad samples, that big factories will be better than small factories, and that City A will be better than City B) all in one test.

HTH,

Joshua

On Mon, Mar 22, 2010 at 7:47 AM, Xiang Gao <xianggao2...@gmail.com> wrote:
> Hi, Please suggest a method to answer the questions below:
>
> Factory_ID  Factory_Location  Factory_Size  Total_Sample  Good_Sample  Fair_Sample  Bad_Sample
> -----------------------------------------------------------------------------------------------
>  1          City_A            Big           100           90           10           10
>  2          City_A            Big           120           55           35           30
>  3          City_A            Small          80           40           25           15
>  4          City_A            Small          75           50           15           10
>  5          City_B            Big           150           80           30           40
>  6          City_B            Big           120           55           25           40
>  7          City_B            Big           125           40           80            5
>  8          City_B            Big           100           60           25           15
>  9          City_B            Small          70           45           15           10
> 10          City_B            Small          85           65            5           15
> -----------------------------------------------------------------------------------------------
>
> (1) Is there a statistically significant difference between City_A and City_B
> in the amount of Good_Quality_Sample that they produce?
> (2) Is there a statistically significant difference between Big and Small
> factories in the amount of Good_Quality_Sample that they produce?
>
> I don't think that a t-test works here because the Total_Sample (i.e., the
> total number of samples) from each factory is different.
> I don't like to pool data from individual factories together.
> For example, I don't like to pool Factory 1 and 2 together, because the
> variance among individual factories can be quite big in real data.
>
> Thank you
>
> Xiang
>
> ______________________________________________
> R-help@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.

--
Joshua Wiley
Senior in Psychology
University of California, Riverside
http://www.joshuawiley.com/
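As a postscript to the contrast-weight suggestion in the reply, here is a minimal sketch in R of one common way to run such a test: compute a contrast score for every factory and test those scores against zero with a one-sample t-test. The weights (+1 for good, 0 for fair, -1 for bad) encode only the prediction "more good than bad samples" and are a hypothetical choice, as are the variable names. The counts are copied from the quoted table exactly as posted, even though Good + Fair + Bad exceeds Total_Sample for Factory 1.

```r
# Counts from the quoted table (variable names are illustrative)
counts <- data.frame(
  total = c(100, 120, 80, 75, 150, 120, 125, 100, 70, 85),
  good  = c(90, 55, 40, 50, 80, 55, 40, 60, 45, 65),
  fair  = c(10, 35, 25, 15, 30, 25, 80, 25, 15, 5),
  bad   = c(10, 30, 15, 10, 40, 40, 5, 15, 10, 15)
)

# Convert counts to proportions so factories of different size are comparable
# (each row is divided by that factory's total)
props <- as.matrix(counts[, c("good", "fair", "bad")]) / counts$total

# HYPOTHETICAL contrast weights: predict more good than bad samples.
# Contrast weights must sum to zero.
lambda <- c(good = 1, fair = 0, bad = -1)
stopifnot(sum(lambda) == 0)

# One contrast score per factory: sum of weight * proportion
L_scores <- as.vector(props %*% lambda)

# One-sample t-test: are the contrast scores reliably different from zero?
contrast_test <- t.test(L_scores, mu = 0)
print(contrast_test)
```

This sketch covers only the within-factory prediction. The between-factory predictions mentioned in the reply (big versus small, City A versus City B) would need weights assigned across groups of factories, which is not shown here.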