Dear Joshua,

Thank you so much for such a fast reply. Here are my thoughts:
I am not sure it is fair to compare means, because the total number of
samples from each factory can be very different (for example, Factory_5
has 150 samples in total, while Factory_9 has 70). It may be fairer to
compare the frequency of Good_Sample than to compare means. But a
frequency is bounded by 100%. Is there any way to deal with frequencies?

I appreciate your input!

Xiang

On Mon, Mar 22, 2010 at 10:41 AM, Joshua Wiley <jwiley.ps...@gmail.com> wrote:

> Dear Xiang,
>
> Unequal sample size is not a problem for t-tests. If I understand
> correctly, you do not want to pool your data because you believe the
> variance of individual factories is heterogeneous. Are you willing to
> pool the means? You could calculate the variance for factories
> individually and then pool the variances using the weighted.mean()
> function (variance of each factory weighted by its sample size minus
> 1). Then you could just compare the means between all the factories
> from City A and B or Big and Small factories. Another option could be
> to use an ANOVA (see ?aov). This should let you keep your data broken
> down into subgroups.
>
> If you have specific theories, I would also recommend looking into
> using contrast weights. With contrasts, you would end up basically
> doing a one-sample t-test, but it would be testing whether your theory
> (given by the weights you assigned) fits the data well. The nice thing
> about it is that you can include a lot of predictions (e.g., that there
> will be more good samples than bad samples and that big factories will
> be better than small factories and that City A will be better than
> City B) all in one test.
>
> HTH,
>
> Joshua
>
> On Mon, Mar 22, 2010 at 7:47 AM, Xiang Gao <xianggao2...@gmail.com> wrote:
> > Hi,
> >
> > Please suggest a method to answer the questions below:
> >
> > Factory_ID  Factory_Location  Factory_Size  Total_Sample  Good_Sample  Fair_Sample  Bad_Sample
> > -----------------------------------------------------------------------------------------------
> > 1           City_A            Big           100           90           10           10
> > 2           City_A            Big           120           55           35           30
> > 3           City_A            Small          80           40           25           15
> > 4           City_A            Small          75           50           15           10
> > 5           City_B            Big           150           80           30           40
> > 6           City_B            Big           120           55           25           40
> > 7           City_B            Big           125           40           80            5
> > 8           City_B            Big           100           60           25           15
> > 9           City_B            Small          70           45           15           10
> > 10          City_B            Small          85           65            5           15
> > -----------------------------------------------------------------------------------------------
> >
> > (1) Is there a statistically significant difference between City_A and
> > City_B in the amount of Good_Quality_Sample that they produce?
> > (2) Is there a statistically significant difference between Big and
> > Small factories in the amount of Good_Quality_Sample that they produce?
> >
> > I don't think that a t-test works here because the Total_Sample (i.e.,
> > the total number of samples) from each factory is different.
> > I would rather not pool data from individual factories. For example, I
> > would not pool Factory 1 and 2 together, because the variance among
> > individual factories can be quite big in real data.
> >
> > Thank you
> >
> > Xiang
> >
> > [[alternative HTML version deleted]]
> >
> > ______________________________________________
> > R-help@r-project.org mailing list
> > https://stat.ethz.ch/mailman/listinfo/r-help
> > PLEASE do read the posting guide
> > http://www.R-project.org/posting-guide.html
> > and provide commented, minimal, self-contained, reproducible code.
>
> --
> Joshua Wiley
> Senior in Psychology
> University of California, Riverside
> http://www.joshuawiley.com/

--
Xiang Gao, Ph.D.
Department of Biology
University of North Texas
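
The pooling and ANOVA suggestions in Joshua's quoted message could be sketched in R roughly as follows. This is only an illustration under stated assumptions: each sample is taken to be scored 1 (good) or 0 (not good), so that a factory's mean is its proportion of good samples, and the object and column names (factories, prop_good, var_good, pooled_var) are made up for this sketch and do not come from the thread. Treating the ten per-factory proportions as the observations for aov() is one possible reading of the ANOVA suggestion, not necessarily the intended one.

```r
# Data transcribed from the table in the original post
# (only the columns used in this sketch).
factories <- data.frame(
  id       = 1:10,
  location = c("City_A", "City_A", "City_A", "City_A",
               "City_B", "City_B", "City_B", "City_B", "City_B", "City_B"),
  size     = c("Big", "Big", "Small", "Small",
               "Big", "Big", "Big", "Big", "Small", "Small"),
  total    = c(100, 120, 80, 75, 150, 120, 125, 100, 70, 85),
  good     = c(90, 55, 40, 50, 80, 55, 40, 60, 45, 65)
)

# Assumption: samples are coded 1 = good, 0 = not good, so the factory
# mean is simply the proportion of good samples.
factories$prop_good <- factories$good / factories$total

# Sample variance of a 0/1 variable with n observations and proportion p
# is p * (1 - p) * n / (n - 1).
factories$var_good <- with(
  factories,
  prop_good * (1 - prop_good) * total / (total - 1)
)

# Pool the per-factory variances, weighting each by (sample size - 1),
# as described in the quoted message.
pooled_var <- weighted.mean(factories$var_good, w = factories$total - 1)

# One way to use aov(): per-factory proportions as the response,
# location and size as grouping factors.
fit <- aov(prop_good ~ location + size, data = factories)
summary(fit)
```

With only ten factories, the aov() fit has few residual degrees of freedom, so its output is best read as a rough check rather than a firm answer to questions (1) and (2).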