Re: [R] Multiple sets of proportion tests

David Winsemius Sat, 25 Nov 2017 09:51:59 -0800

> On Nov 24, 2017, at 3:35 PM, Allaisone 1 <allaiso...@hotmail.com> wrote:
> 
> Thank you for clarifying this point, but my main question was about how to 
> modify my code to do the analysis correctly.


You need to first clarify what your proposed statistical hypothesis might be. 
If you are doing prop.test on 200 columns, you have a serious multiple-comparisons 
issue in your analysis plan that you have not recognized. Removing the columns 
that "fail" a test set at a nominal level of 0.05 is statistical 
malpractice.
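A minimal sketch of what accounting for multiple comparisons can look like in base R, assuming one raw p-value per column has already been computed. The p-values below are invented for illustration. p.adjust() with method = "holm" controls the familywise error rate, and method = "BH" (Benjamini-Hochberg) controls the false discovery rate.

```r
# Hypothetical raw p-values, one per column (made-up numbers for illustration)
p_raw <- c(0.001, 0.012, 0.028, 0.045, 0.20)

# Holm controls the familywise error rate; "BH" controls the false discovery rate
p_holm <- p.adjust(p_raw, method = "holm")
p_bh   <- p.adjust(p_raw, method = "BH")

# How many tests fall below a nominal 0.05 before and after adjustment
sum(p_raw < 0.05)   # 4
sum(p_holm < 0.05)  # 2
sum(p_bh < 0.05)    # 3
```

With these made-up values, four raw p-values fall below 0.05, but only two survive the Holm adjustment and three survive Benjamini-Hochberg. With 200 columns the difference is usually much larger.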


> The code I mentioned :-
> 
> MyResults <- apply(Mydata, 2, function(x)prop.test(Mydata,c(200,100))


The code as written appears to have the obvious error of using `Mydata` as an 
argument inside the prop.test function. It should almost certainly be `x` instead. 
(Since `Mydata` is a data frame, its length as the 'x' argument to prop.test is 
its number of columns, about 200, while the length of n is 2, hence the error.)
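A minimal sketch of the corrected call, assuming a small made-up data frame with one column per variable and the two group counts in rows 1 and 2. The counts are invented so that each is at most its number of trials (200 and 100); the column names v1 to v3 and the name n_trials are hypothetical.

```r
# Hypothetical data: row 1 = successes in group I, row 2 = successes in group II.
# Counts are made up and chosen so that each count is <= its number of trials.
Mydata <- data.frame(v1 = c(150, 60), v2 = c(120, 70), v3 = c(190, 40))
n_trials <- c(200, 100)  # fixed numbers of trials for group I and group II

# apply() hands each column to the function as `x`; pass `x`, not `Mydata`,
# and keep only the p-value element of the "htest" result
pvals <- apply(Mydata, 2, function(x) prop.test(x, n = n_trials)$p.value)
pvals
```

The result is a named numeric vector with one p-value per column.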

It would also be ideal if you could post the output of dput(Mydata[, 1:3]).


> Results in this error: 'x' and 'n' must have the same length in 
> prop.test(x, n).
> 
> 
> How can I modify the "x" or "n" arguments so the analysis gives me the desired 
> output

You desperately need to read the help page for the function you are using. This 
need was pointed out to you, but it appears to me that you have ignored 
Thierry's advice. (Going back to your original example ... The x variable is 
supposed to be the number of successes and the n variable is the number of 
trials. So in all instances n MUST be greater than or equal to x. Your data 
example is going to fail that requirement even after you correct the semantic 
error noted above.)
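A quick way to see the problem with the posted data, using the numbers from the first column of the original post: check the x <= n requirement before calling prop.test(). The tryCatch() wrapper is only there so the sketch returns the error message instead of halting.

```r
# First column of the posted example and the stated numbers of trials
x <- c(6493, 509)  # "Freq. of cases" in group I and group II
n <- c(200, 100)   # fixed numbers of trials in group I and group II

# prop.test() needs every count to be no larger than its number of trials
any(x > n)
#> [1] TRUE

# Calling prop.test() anyway: catch the error and keep its message instead of a result
res <- tryCatch(prop.test(x, n), error = function(e) conditionMessage(e))
res
```

Because 6493 successes cannot come from 200 trials, prop.test() stops with an error here no matter how the apply() call is written.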

(And do learn to post with plain text.)
-- 
David.
> 
> shown in my previous post?
> 
> ________________________________
> From: Thierry Onkelinx <thierry.onkel...@inbo.be>
> Sent: 24 November 2017 21:06:39
> To: Allaisone 1
> Cc: r-help@r-project.org
> Subject: Re: [R] Multiple sets of proportion tests
> 
> Hi anonymous,
> 
> ?prop.test states that it returns a list, and one of its elements is
> 'p.value'. Running str() on the output of prop.test() reveals that too. So
> prop.test(...)$p.value or prop.test(...)[["p.value"]] should work.
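To illustrate the point about the returned list, a small sketch with invented counts: prop.test() returns an object of class "htest", which is a list, so the p-value can be pulled out with $ or [[ ]]. Single brackets return a one-element list rather than a bare number.

```r
# A single test with made-up counts (successes must not exceed trials)
res <- prop.test(c(150, 60), c(200, 100))

class(res)        # "htest", which is a list underneath
res$p.value       # numeric p-value
res[["p.value"]]  # the same numeric value
res["p.value"]    # a one-element list, not a bare number
```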
> 
> Best regards,
> 
> ir. Thierry Onkelinx
> Statisticus / Statistician
> 
> Vlaamse Overheid / Government of Flanders
> INSTITUUT VOOR NATUUR- EN BOSONDERZOEK / RESEARCH INSTITUTE FOR NATURE
> AND FOREST
> Team Biometrie & Kwaliteitszorg / Team Biometrics & Quality Assurance
> thierry.onkel...@inbo.be
> Kliniekstraat 25, B-1070 Brussel
> www.inbo.be
> 
> ///////////////////////////////////////////////////////////////////////////////////////////
> To call in the statistician after the experiment is done may be no
> more than asking him to perform a post-mortem examination: he may be
> able to say what the experiment died of. ~ Sir Ronald Aylmer Fisher
> The plural of anecdote is not data. ~ Roger Brinner
> The combination of some data and an aching desire for an answer does
> not ensure that a reasonable answer can be extracted from a given body
> of data. ~ John Tukey
> ///////////////////////////////////////////////////////////////////////////////////////////
> 
> 
> From 14 to 19 December 2017 we are moving from our office in
> Brussels to the Herman Teirlinck building on the Thurn & Taxis site.
> From then on you are welcome at our new address: Havenlaan 88 bus 73, 1000 
> Brussel.
> 
> ///////////////////////////////////////////////////////////////////////////////////////////
> 
> 
> 
> 2017-11-24 12:09 GMT+01:00 Allaisone 1 <allaiso...@hotmail.com>:
>> 
>> Hi all ,
>> 
>> 
>> I have a data frame of 200 columns and 2 rows. The first row in each column 
>> contains the frequency of cases in group I. The second row in each column 
>> contains the frequency of cases in group II. The number of trials is a 
>> fixed value for group I (e.g. 200) and another fixed value for 
>> group II (e.g. 100). The dataset looks like this:
>> 
>> 
>>> Mydata
>> 
>> 
>>                           variable I  variable II  variable III  ... (200 columns)
>> Freq. of cases (gp I)           6493         9375          5524  ...
>> Freq. of cases (gp II)           509          462            54  ...
>> 
>> 
>> 
>> The result I need for the first column can be given using this code:
>> 
>> MyResultsI <- prop.test(Mydata$`variable I`, c(200, 100))
>> 
>> For the second column:
>> 
>> MyResultsII <- prop.test(Mydata$`variable II`, c(200, 100))
>> 
>> and so on.
>> 
>> 
>> I need to do the analysis for all columns and keep only the columns with 
>> significant p-values, with the p-value written in a third row under each 
>> column, so the final output has to be something like this:
>> 
>> 
>>                           variable I  variable III  ...
>> Freq. of cases (gp I)           6493          5524  ...
>> Freq. of cases (gp II)           509            54  ...
>> p-values                        0.02         0.010  ...
>> 
>> Note, for example, that the 2nd column has been removed because it gave a 
>> non-significant p-value, while columns 1 and 3 were kept because their 
>> p-values are less than 0.05.
>> 
>> I'm not sure how to get only the p-values without the other details. For the 
>> analysis itself, I believe it can be done with the apply() function, but it's not 
>> clear to me how to specify the 2nd argument (n = sample sizes) in prop.test.
>> 
>> MyResults <- apply(Mydata, 2, function(x)prop.test(Mydata,c(200,100))
>> 
>> How can I modify the "n" argument to solve the problem of the unequal 
>> lengths of "x" and "n"? And how can I modify this further to return only 
>> significant p-value results? Any help would be very much appreciated.
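One possible sketch of the requested layout, assuming valid (made-up) counts and the same fixed trial sizes; the row names, column names and the name n_trials are hypothetical. Given the multiple-comparisons caveat in David's reply, it filters on Holm-adjusted p-values rather than raw ones.

```r
# Hypothetical data with made-up counts (each count <= its number of trials)
Mydata <- data.frame(v1 = c(150, 60), v2 = c(120, 62), v3 = c(190, 40),
                     row.names = c("cases_gpI", "cases_gpII"))
n_trials <- c(200, 100)  # fixed numbers of trials for group I and group II

# One p-value per column; vapply() insists on exactly one numeric value each
pvals <- vapply(Mydata, function(x) prop.test(x, n_trials)$p.value, numeric(1))

# Adjust for the number of tests before thresholding (Holm shown as one option)
p_adj <- p.adjust(pvals, method = "holm")

# Append the adjusted p-values as a third row, then keep columns below 0.05
out  <- rbind(Mydata, p_adjusted = p_adj)
kept <- out[, p_adj < 0.05, drop = FALSE]
kept
```

With these invented counts, v2 (0.60 vs 0.62) is dropped while v1 and v3 are kept.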
>> 
>> Regards
>> 
>>        [[alternative HTML version deleted]]
>> 
>> ______________________________________________
>> R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
>> https://stat.ethz.ch/mailman/listinfo/r-help
>> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
>> and provide commented, minimal, self-contained, reproducible code.
> 

David Winsemius
Alameda, CA, USA

'Any technology distinguishable from magic is insufficiently advanced.'   
-Gehm's Corollary to Clarke's Third Law
