On Sat, 14 Aug 2010, ted.hard...@manchester.ac.uk wrote:

> Hi Thomas,
> I'm not too sure about your interpretation. Consider:

It seems hard to interpret "The formula interface is only applicable for the 
2-sample tests." any other way.


> Johannes' original query was about differences when there
> are NAs, corresponding to different settings of "na.action".
> It is perhaps possible that 'na.action="na.pass"' and
> 'na.action="na.exclude"' result in different pairings in the
> case "paired=TRUE". However, it seems to me that the differences
> he observed are, shall we say, obscure!

No, they are perfectly straightforward.  Johannes's data had two missing 
values, one in each group, but not in the same pair.

With na.omit or na.exclude, model.frame() removes the NAs. If there are the 
same number of NAs in each group, this leaves the same number of observations 
in each group. t.test.formula() splits these according to the group variable 
and passes them to t.test.default. Because of the (invalid) paired=TRUE 
argument, t.test.default assumes these are nine pairs and gets bogus answers.

On the other hand with na.pass, model.frame() does not remove NAs. 
t.test.formula() passes two sets of ten observations (including missing 
observations) to t.test.default().  Because of the paired=TRUE argument, 
t.test.default() assumes these are ten pairs, which happens to be true in this 
case, and after deleting the two pairs with missing observations it gives the 
right answer.
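
To make the two cases concrete, here is a minimal sketch that mimics the steps described above by calling model.frame() and the default t.test() method directly. The data are made up for illustration (ten pairs, one NA in each group, in different pairs); they are not Johannes's data, and the variable names are hypothetical.

```r
# Hypothetical data: ten pairs, one NA per group, in different pairs.
x <- c(5.1, NA, 4.8, 6.0, 5.5, 4.9, 5.7, 6.2, 5.0, 5.3)
y <- c(5.6, 5.9, 5.2, 6.4, NA, 5.1, 6.3, 6.5, 5.4, 5.8)
d <- data.frame(value = c(x, y),
                group = gl(2, 10, labels = c("a", "b")))

# Case 1: na.omit removes the two NA rows before the split,
# leaving nine values per group that are then paired by position.
mf_omit <- model.frame(value ~ group, data = d, na.action = na.omit)
g_omit  <- split(mf_omit$value, mf_omit$group)
bogus   <- t.test(g_omit$a, g_omit$b, paired = TRUE)

# Case 2: na.pass keeps the NAs, so the ten positions still line up
# and the default method drops the two incomplete pairs itself.
mf_pass <- model.frame(value ~ group, data = d, na.action = na.pass)
g_pass  <- split(mf_pass$value, mf_pass$group)
right   <- t.test(g_pass$a, g_pass$b, paired = TRUE)

bogus$parameter  # df = 8: nine misaligned "pairs"
right$parameter  # df = 7: eight genuine complete pairs
```

The degrees of freedom differ because the first call treats nine position-matched values as pairs, while the second works from the eight pairs that are actually complete.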

Regardless of the details, however, t.test.formula() can't reliably work with 
paired=TRUE because the user interface provides no way to specify which 
observations are paired. It would be possible (though a bad idea in my opinion) 
to specify that paired=TRUE is allowed and that the pairing is done in the 
order the observations appear in the data. The minimal change would be to stop 
doing missing-value removal in t.test.formula, although that would be 
undesirable if a user wanted to supply some sort of na.impute() option.

I would strongly prefer having an explicit indication of pairing, e.g. 
paired=variable.name, or even better, paired=~variable.name. Relying on data 
frame ordering seems a really bad idea.
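
As a rough illustration of what explicit pairing could look like, the sketch below matches observations on an identifier column before calling the default method. The helper paired_t_by_id() and the column names are hypothetical; this is not an existing t.test() interface, only one way the idea might be expressed with merge().

```r
# Hypothetical helper: pair observations by an explicit id column
# rather than by their order in the data frame.
paired_t_by_id <- function(data, value, group, id) {
  parts <- split(data[, c(id, value)], data[[group]])
  stopifnot(length(parts) == 2)
  # Inner join on id: only ids present in both groups are kept.
  m <- merge(parts[[1]], parts[[2]], by = id, suffixes = c(".1", ".2"))
  t.test(m[[paste0(value, ".1")]], m[[paste0(value, ".2")]],
         paired = TRUE)
}

# Made-up data: ten pairs, one NA per group, in different pairs.
dat <- data.frame(
  id    = rep(1:10, times = 2),
  group = gl(2, 10, labels = c("a", "b")),
  value = c(5.1, NA, 4.8, 6.0, 5.5, 4.9, 5.7, 6.2, 5.0, 5.3,
            5.6, 5.9, 5.2, 6.4, NA, 5.1, 6.3, 6.5, 5.4, 5.8)
)

res_ordered  <- paired_t_by_id(dat, "value", "group", "id")
res_shuffled <- paired_t_by_id(dat[sample(nrow(dat)), ],
                               "value", "group", "id")
res_omitted  <- paired_t_by_id(na.omit(dat), "value", "group", "id")

# Row order and prior NA removal no longer change the pairing.
all.equal(res_ordered$statistic, res_shuffled$statistic)
all.equal(res_ordered$statistic, res_omitted$statistic)
```

Because the pairing comes from the id column, shuffling the rows or removing missing values beforehand leaves the same eight complete pairs.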

   -thomas

Thomas Lumley
Professor of Biostatistics
University of Washington, Seattle

______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
