Re: [R] a "simple" statistic question

2010-03-22 Thread Xiang Gao
Maybe I should simplify the problem with the following smaller table. I just want to ask whether there is any significant difference in the proportion of "Good_Sample" produced by factories located in "City_A" and "City_B".
Factory_ID  Factory_Location  Total_Sample  Good_Sample
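Here is a minimal R sketch of the simplest approach. It assumes that pooling all samples within each city is acceptable: sum the counts per city, then run a two-sample test of equal proportions with prop.test(). The data frame is hypothetical because the real table is cut off in this archive. Pooling also ignores factory-to-factory variation, which is the concern raised later in the thread.

```r
# Hypothetical toy data: the real table is truncated in this archive.
factories <- data.frame(
  Factory_ID       = paste0("F", 1:6),
  Factory_Location = rep(c("City_A", "City_B"), each = 3),
  Total_Sample     = c(150, 90, 120, 70, 110, 100),
  Good_Sample      = c(120, 70, 95, 45, 80, 65)
)

# Pool the counts within each city (this ignores variation between factories).
good  <- sapply(split(factories$Good_Sample,  factories$Factory_Location), sum)
total <- sapply(split(factories$Total_Sample, factories$Factory_Location), sum)

# Two-sample test of equal proportions (chi-squared, Yates' correction by default).
res <- prop.test(x = good, n = total)
print(res)
```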

Re: [R] a "simple" statistic question

2010-03-22 Thread Xiang Gao
Thank you very much, Joshua. I was thinking of using logistic regression with glm(), but this would pool together the individual factories that share the same factor levels. I was puzzled about how to deal with the individual factories. Any ideas? I will try your method anyway. Xiang
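One way to keep the factories as separate rows in glm() is to give each factory its own binomial count, cbind(Good_Sample, Total_Sample - Good_Sample). You can then use family = quasibinomial so that extra variation between factories in the same city inflates the standard errors. This is only a sketch: the data are hypothetical, and quasibinomial is my suggestion rather than something proposed in the thread.

```r
# Hypothetical toy data: one row per factory.
factories <- data.frame(
  Factory_ID       = paste0("F", 1:6),
  Factory_Location = rep(c("City_A", "City_B"), each = 3),
  Total_Sample     = c(150, 90, 120, 70, 110, 100),
  Good_Sample      = c(120, 70, 95, 45, 80, 65)
)

# Binomial GLM: the response is cbind(successes, failures) for each factory.
fit_bin <- glm(cbind(Good_Sample, Total_Sample - Good_Sample) ~ Factory_Location,
               family = binomial, data = factories)

# Quasi-binomial GLM: same coefficients, but the dispersion is estimated from
# the factory-level Pearson residuals, so between-factory variation within a
# city widens the standard errors.
fit_qb <- glm(cbind(Good_Sample, Total_Sample - Good_Sample) ~ Factory_Location,
              family = quasibinomial, data = factories)

print(summary(fit_qb))

# Odds ratio of a good sample, City_B relative to City_A.
print(exp(coef(fit_qb)[["Factory_LocationCity_B"]]))
```

A random-effects alternative is a binomial mixed model with a per-factory intercept, for example lme4::glmer(cbind(Good_Sample, Total_Sample - Good_Sample) ~ Factory_Location + (1 | Factory_ID), family = binomial, data = factories). With only a handful of factories per city, however, the variance component will be poorly estimated.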

Re: [R] a "simple" statistic question

2010-03-22 Thread Joshua Wiley
Dear Xiang, Now I understand what you meant. If you are only interested in comparing the Good Samples, I think you would have to use the proportion (Good Sample/Total Sample) or something similar. Another thought would be to dummy code the data (e.g., Good = +1, Fair = 0, Bad = -1). Then you co
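Both of Joshua's suggestions can be computed per factory in a few lines. This sketch uses hypothetical counts. It reads "dummy code" as assigning the numeric scores Good = +1, Fair = 0, Bad = -1, so each factory's mean score is (Good - Bad) / Total.

```r
# Hypothetical toy data with the Good/Fair/Bad breakdown from the original post.
factories <- data.frame(
  Factory_ID       = paste0("F", 1:6),
  Factory_Location = rep(c("City_A", "City_B"), each = 3),
  Good_Sample      = c(120, 70, 95, 45, 80, 65),
  Fair_Sample      = c(20, 15, 15, 15, 20, 20),
  Bad_Sample       = c(10, 5, 10, 10, 10, 15)
)
factories$Total_Sample <- with(factories, Good_Sample + Fair_Sample + Bad_Sample)

# Option 1: proportion of good samples in each factory.
factories$prop_good <- with(factories, Good_Sample / Total_Sample)

# Option 2: score Good = +1, Fair = 0, Bad = -1 and average within each factory.
factories$mean_score <- with(factories,
  (1 * Good_Sample + 0 * Fair_Sample - 1 * Bad_Sample) / Total_Sample)

print(factories[, c("Factory_ID", "Factory_Location", "prop_good", "mean_score")])

# City-level averages of the per-factory summaries.
print(aggregate(cbind(prop_good, mean_score) ~ Factory_Location,
                data = factories, FUN = mean))
```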

Re: [R] a "simple" statistic question

2010-03-22 Thread Xiang Gao
Dear Joshua, Thank you so much for such a fast reply. Here is my thought: I don't know if it is fair to compare means, because the total number of samples from each factory can be very different (e.g., Factory_5 with 150 total samples vs. Factory_9 with 70 total samples). Maybe it is fairer to compare f
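If the worry is that a small factory counts as much as a large one, one option is to weight each factory's proportion by its number of samples. This is my assumption, not something from the thread. With weights = Total_Sample, the fitted city means equal the pooled proportions exactly, but the model still uses one row per factory. The standard errors then come from the scatter between factories rather than from a binomial model. The data are hypothetical.

```r
# Hypothetical toy data: one row per factory.
factories <- data.frame(
  Factory_ID       = paste0("F", 1:6),
  Factory_Location = rep(c("City_A", "City_B"), each = 3),
  Total_Sample     = c(150, 90, 120, 70, 110, 100),
  Good_Sample      = c(120, 70, 95, 45, 80, 65)
)
factories$prop_good <- with(factories, Good_Sample / Total_Sample)

# Weighted least squares: each factory's proportion is weighted by its
# number of samples, so small factories do not dominate the comparison.
fit_w <- lm(prop_good ~ Factory_Location, data = factories,
            weights = Total_Sample)
print(summary(fit_w))
```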

Re: [R] a "simple" statistic question

2010-03-22 Thread Joshua Wiley
Dear Xiang, Unequal sample sizes are not a problem for t-tests. If I understand correctly, you do not want to pool your data because you believe the variance of the individual factories is heterogeneous. Are you willing to pool the means? You could calculate the variance for factories individually and
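This sketch treats each factory as one observation and uses a Welch two-sample t-test, which is R's t.test() default (var.equal = FALSE). It compares the mean per-factory proportion between cities without assuming equal variances or equal group sizes. The data are hypothetical. With only a few factories per city, the test has very few degrees of freedom.

```r
# Hypothetical toy data: one row per factory.
factories <- data.frame(
  Factory_ID       = paste0("F", 1:6),
  Factory_Location = rep(c("City_A", "City_B"), each = 3),
  Total_Sample     = c(150, 90, 120, 70, 110, 100),
  Good_Sample      = c(120, 70, 95, 45, 80, 65)
)
factories$prop_good <- with(factories, Good_Sample / Total_Sample)

# Spread of the per-factory proportions within each city.
print(tapply(factories$prop_good, factories$Factory_Location, var))

# Welch two-sample t-test on the per-factory proportions.
res <- t.test(prop_good ~ Factory_Location, data = factories)
print(res)
```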

[R] a "simple" statistic question

2010-03-22 Thread Xiang Gao
Hi, please suggest a method to answer the questions below:
Factory_ID  Factory_Location  Factory_Size  Total_Sample  Good_Sample  Fair_Sample  Bad_Sample