Hi all,
as John pointed out, there is a way to create settings where the
studentized residuals are undefined. However, after cross-checking it
seems that the residuals are getting calculated without any error. The
problem comes up when I use outlierTest to assign a p,q value,respectively.
Below is the code and some printouts:
non.zero.orga.vector <-
c(1.00,1.00,1.00,1.00,1.00,1.00,1.00,1.00,1.00,1.00,13.98) #targervector
print(meta.Matrix[,3])
x1 x28 x2 x3 x4 x51 x52 x5
x6 x7 x82
4.6 35.8 4.1 5.4 5.2 27.9 48.2 4.5
5.7 4.2 100.5
print(meta.Matrix[,10])
x1 x28 x2 x3
0.2990466 1.0497156 0.2805028 0.3517993
x4 x51 x52 x5
0.3543678 1.0178604 0.4933182 0.2810865
x6 x7 x82
0.4349550 0.3269192 3.0889747
glModel <- glm(non.zero.orga.vector ~ meta.Matrix[,3]+meta.Matrix[,10])
#create glm for testing
print(glModel$residuals) #check if residuals are calculated for every
entry in the target vector
x1 x28 x2 x3
x4 x51 x52
0.6600696 -2.5816334 0.7438969 0.4129873 0.3976498 -2.5350861
0.3128240
x5 x6 x7 x82
0.7465766 0.0102020 0.5181337 1.3143796
test.Res <- outlierTest(glModel,digits=4,cutoff=Inf,n.max=Inf) #test
the glm for outlier, cutoff=Inf/n.max=Inf == report everything
print(test.Res$bonf.p) #check if q-value exists
x28 x51 x52 x5 x2
x1 x7 x3
0.1915995 0.2240759 2.1637438 6.0211575 6.0306110 6.4618792 7.1932613
7.7595619
x4 x6
7.8394477 9.9437848
As you see the glm calculates residuals for x82 (which is in fact 1.314)
but the outlierTest does not assign a p/q value to it. Does anyone know why?
Thanks in advance,
Phil
On 10/17/2014 08:54 PM, John Fox wrote:
Dear Phil,
Yes, that's a bit clearer. One can invent data configurations where certain
studentized residuals are undefined. For example, try the following:
y <- c(0, 0, 0, 0, 0, 1)
x <- 1:6
xx <- (1:6 - 3.5)^2
rstudent(lm(y ~ x))
rstudent(lm(y ~ xx))
plot(x, y)
plot(xx, y)
The plots should clarify what's going on.
I'm copying to r-help since the discussion began there.
I hope this helps,
John
On Fri, 17 Oct 2014 18:35:59 +0200
Philip Stevens <chobop...@gmail.com> wrote:
Dear John,
Thank you for your fast reply. Unfortunately I am out of office right now
but I will get back to you on Monday asap with a toy example and some code.
Meanwhile let me try to explain further.
Basically not the glm but the outlierTest afterwards fails. And it only
fails if all values, used to set up the glm, are exactly 1 except for one
value which can be arbitrary large. I construct the glm from a vector of
doubles (the target values) and a vector of other numeric values(metadata).
(So this should be fit <- glm(targetVector ~ metadata) ) And want to check
if one of those doubles(in the target vector) is an outlier using
outlierTest(fit). The residuals are calculated by the glm but the
outlierTest does not report a p nor a q value for the one value in the
target vector which is not 1. And I can't figure out why...
I hope this makes it a bit clearer.
Anyways I will come back to you on Monday.
Best,
Phil
Am 17.10.2014 17:43 schrieb "John Fox" <j...@mcmaster.ca>:
Dear Phil,
After reading your posting several times, I still don't understand what
you did. As usual, having a reproducible example illustrating the error
would be a great help. I do have a guess about the source of the error:
glm() failed in some way for the problematic case.
Best,
John
------------------------------------------------
John Fox, Professor
McMaster University
Hamilton, Ontario, Canada
http://socserv.mcmaster.ca/jfox/
On Fri, 17 Oct 2014 12:53:30 +0200
Phil <chobop...@gmail.com> wrote:
Hi guys,
I came across a strange phenomena and can't figure out why it happens by
myself so here we go.
I got a dataframe which consists of double numbers which I want to
check, row-wise if there are outliers in the rows.
So I iterate over the rows and create a glm using the numbers of that
particular row. Which might look like this:
case1)
x1 x2 x3 x4 x5 x6 x7
x8 x9 x10 x11
0.00 3.91 0.00 0.00 0.00 68.03 40.39 0.00
0.00 0.00 4.11
or like this:
case2)
x1 x2 x3 x4 x5 x6 x7
x8 x9 x10 x11
1.00 1.00 1.00 1.00 1.00 1.00 1.00 1.00
1.00 1.00 5.34
or any other combination of double numbers...
however, using a glm like this:
glModel <- glm(vector ~ some_other_meta_data_which_is_double_numbers)
and testing it with:
test.Res <- outlierTest(glModel,digits=4,cutoff=Inf,n.max=Inf)
I always get a result consisting of the desired p and q values but not
if the vector I use looks like case2. There is no error message and the
computation does not stop either.
However, all p and q values are produced except for the last value x11.
Any idea why this particular value gets dropped from the output of the
outlierTest Method in the car package.
Here is the sessioninfo:
sessionInfo()
R version 3.1.1 (2014-07-10)
Platform: x86_64-redhat-linux-gnu (64-bit)
locale:
[1] LC_CTYPE=en_US.utf8 LC_NUMERIC=C
[3] LC_TIME=en_US.utf8 LC_COLLATE=en_US.utf8
[5] LC_MONETARY=en_US.utf8 LC_MESSAGES=en_US.utf8
[7] LC_PAPER=en_US.utf8 LC_NAME=C
[9] LC_ADDRESS=C LC_TELEPHONE=C
[11] LC_MEASUREMENT=en_US.utf8 LC_IDENTIFICATION=C
attached base packages:
[1] stats graphics grDevices utils datasets methods base
other attached packages:
[1] ggplot2_1.0.0 car_2.0-21 RColorBrewer_1.0-5 iNEXT_1.0
[5] vegan_2.0-10 lattice_0.20-29 permute_0.8-3
loaded via a namespace (and not attached):
[1] colorspace_1.2-4 compiler_3.1.1 digest_0.6.4 grid_3.1.1
[5] gtable_0.1.2 labeling_0.3 MASS_7.3-33 munsell_0.4.2
[9] nnet_7.3-8 plyr_1.8.1 proto_0.3-10 Rcpp_0.11.2
[13] reshape2_1.4 scales_0.2.4 stringr_0.6.2 tools_3.1.1
Any help is highly appreciated.
Thanks
Phil
______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide
http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
------------------------------------------------
John Fox, Professor
McMaster University
Hamilton, Ontario, Canada
http://socserv.mcmaster.ca/jfox/
______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.