Dear Joshua,

Thank you so much for such a fast reply. Here are my thoughts:

I am not sure it is fair to compare means, because the total number of
samples from each factory can be very different (for example, Factory_5 has
150 total samples while Factory_9 has 70). Maybe it would be fairer to
compare the frequency (proportion) of Good_Sample rather than the means. But
the frequency is bounded by 100%. Is there any way to deal with frequencies?
I appreciate your input!
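
To make the question concrete, here is a rough sketch in R of what I mean by frequency. The data frame name `factories` and the column names are just my own labels for the table in my original message. The binomial GLM at the end is only one possible way to handle a bounded proportion; I am assuming it applies here and am not sure it is the right model.

```r
# Counts as posted in my original message (Factory 1's three categories
# sum to 110 rather than 100; the numbers are used unchanged here).
factories <- data.frame(
  id       = 1:10,
  location = factor(c(rep("City_A", 4), rep("City_B", 6))),
  size     = factor(c("Big", "Big", "Small", "Small",
                      "Big", "Big", "Big", "Big", "Small", "Small")),
  total    = c(100, 120, 80, 75, 150, 120, 125, 100, 70, 85),
  good     = c(90, 55, 40, 50, 80, 55, 40, 60, 45, 65)
)

# Proportion of good samples per factory (bounded between 0 and 1)
factories$prop_good <- factories$good / factories$total

# Assumed model: a binomial GLM on (good, not good) counts, so factories
# with more samples carry more weight. The quasibinomial family allows
# for extra factory-to-factory variation.
fit <- glm(cbind(good, total - good) ~ location + size,
           family = quasibinomial, data = factories)
summary(fit)
```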

Xiang
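
P.S. To check that I follow your weighted.mean() and aov() suggestions below, here is a rough sketch. It assumes each sample is scored 1 (good) or 0 (not good), so that a factory's mean is its proportion of good samples; that scoring is my own assumption, as are the variable names.

```r
# Total and good counts as posted in my original message
total    <- c(100, 120, 80, 75, 150, 120, 125, 100, 70, 85)
good     <- c(90, 55, 40, 50, 80, 55, 40, 60, 45, 65)
location <- factor(c(rep("City_A", 4), rep("City_B", 6)))
size     <- factor(c("Big", "Big", "Small", "Small",
                     "Big", "Big", "Big", "Big", "Small", "Small"))

# Assumption: each sample is scored 1 (good) or 0 (not good), so the
# per-factory mean is the proportion of good samples.
p <- good / total

# Sample variance of a 0/1 variable with mean p and n observations
s2 <- p * (1 - p) * total / (total - 1)

# Pool the per-factory variances, weighting each by (n - 1)
pooled_var <- weighted.mean(s2, w = total - 1)
pooled_var

# ANOVA on the per-factory proportions, keeping the subgroups separate
fit_aov <- aov(p ~ location + size)
summary(fit_aov)
```

The design is unbalanced (4 factories in City_A, 6 in City_B), so the sums of squares from aov() depend on the order of the terms.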

On Mon, Mar 22, 2010 at 10:41 AM, Joshua Wiley <jwiley.ps...@gmail.com> wrote:

> Dear Xiang,
>
> Unequal sample size is not a problem for t-tests.  If I understand
> correctly, you do not want to pool your data because you believe the
> variance of individual factories is heterogeneous.  Are you willing to
> pool the means?  You could calculate the variance for factories
> individually and then pool the variances using the weighted.mean()
> function (variance of each factory weighted by its sample size minus
> 1).  Then you could just compare the means between all the factories
> from City A and B or Big and Small factories.  Another option could be
> to use an ANOVA (see ?aov).  This should let you keep your data broken
> down into subgroups.
>
> If you have specific theories, I would also recommend looking into
> using contrast weights.  With contrasts, you would end up basically
> doing a one-sample t-test but it would be testing whether your theory
> (given by the weights you assigned) fits the data well.  The nice thing
> about it is that you can include a lot of predictions (e.g., that there
> will be more good samples than bad samples and that big factories will
> be better than small factories and that City A will be better than
> City B) all in one test.
>
> HTH,
>
>
> Joshua
>
>
> On Mon, Mar 22, 2010 at 7:47 AM, Xiang Gao <xianggao2...@gmail.com> wrote:
> > Hi, please suggest a method to answer the questions below:
> >
> >
> > Factory_ID  Factory_Location  Factory_Size  Total_Sample  Good_Sample  Fair_Sample  Bad_Sample
> > ----------------------------------------------------------------------------------------------
> > 1           City_A            Big           100           90           10           10
> > 2           City_A            Big           120           55           35           30
> > 3           City_A            Small         80            40           25           15
> > 4           City_A            Small         75            50           15           10
> > 5           City_B            Big           150           80           30           40
> > 6           City_B            Big           120           55           25           40
> > 7           City_B            Big           125           40           80           5
> > 8           City_B            Big           100           60           25           15
> > 9           City_B            Small         70            45           15           10
> > 10          City_B            Small         85            65           5            15
> > ----------------------------------------------------------------------------------------------
> >
> > (1) Is there a statistically significant difference between City_A and
> > City_B in the amount of Good_Quality_Sample that they produce?
> > (2) Is there a statistically significant difference between Big and Small
> > factories in the amount of Good_Quality_Sample that they produce?
> >
> > I don't think that a t-test works here because the Total_Sample (i.e., the
> > total number of samples) from each factory is different.
> > I would rather not pool data from individual factories together. For
> > example, I would rather not pool Factories 1 and 2 together, because the
> > variance among individual factories can be quite big in real data.
> >
> >
> > Thank you
> >
> > Xiang
> >
> >        [[alternative HTML version deleted]]
> >
> > ______________________________________________
> > R-help@r-project.org mailing list
> > https://stat.ethz.ch/mailman/listinfo/r-help
> > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> > and provide commented, minimal, self-contained, reproducible code.
> >
>
>
>
> --
> Joshua Wiley
> Senior in Psychology
> University of California, Riverside
> http://www.joshuawiley.com/
>



-- 
Xiang Gao, Ph.D.
Department of Biology
University of North Texas
