On Thu, 17 Feb 2011, Andrew Ziem wrote:
After ctree builds a tree, how would I determine the direction missing values
follow by examining the BinaryTree-class object? For instance in the example
below Bare.nuclei has 16 missing values and is used for the first split, but
the missing values are not listed in either set of factors. (I have the same
question for missing values among numeric [non-factor] values, but I assume the
answer is similar.)
Hi Andrew,
ctree() doesn't treat missings in factors as a category in its own right.
Instead, it uses surrogate splits to determine which daughter node
observations with missings in the primary split variable are sent to (you
need to specify `maxsurrogate' in ctree_control()).
However, you can recode your factor and add NA to the levels. This will
lead to the intended behaviour.
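
A minimal sketch of that recoding, assuming base R's addNA() is an
acceptable way to add NA to the levels. The toy factor `x' is hypothetical
and only stands in for a column such as Bare.nuclei:

```r
# Hypothetical toy factor with two missing values (not the BreastCancer data)
x <- factor(c("1", "2", NA, "3", NA, "1"))

levels(x)        # NA is not among the levels
sum(is.na(x))    # number of missing values before recoding

# addNA() appends NA as an explicit factor level, so the missing
# values become a category of their own
x_na <- addNA(x)
levels(x_na)     # the original levels followed by NA

# Applied to the data in this thread, the same idea would read
# (assumes mlbench is installed and BreastCancer is loaded):
# BreastCancer$Bare.nuclei <- addNA(BreastCancer$Bare.nuclei)
```

After such a recoding, NA can appear in the level set of a split on that
factor like any other category.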
Best,
Torsten
require(party)
require(mlbench)
data(BreastCancer)
BreastCancer$Id <- NULL
ct <- ctree(Class ~ ., data = BreastCancer,
            controls = ctree_control(maxdepth = 1))
ct
Conditional inference tree with 2 terminal nodes
Response: Class
Inputs: Cl.thickness, Cell.size, Cell.shape, Marg.adhesion, Epith.c.size,
Bare.nuclei, Bl.cromatin, Normal.nucleoli, Mitoses
Number of observations: 699
1) Bare.nuclei == {1, 2}; criterion = 1, statistic = 488.294
2)* weights = 448
1) Bare.nuclei == {3, 4, 5, 6, 7, 8, 9, 10}
3)* weights = 251
sum(is.na(BreastCancer$Bare.nuclei))
[1] 16
nodes(ct, 1)[[1]]$psplit
Bare.nuclei == {1, 2}
nodes(ct, 1)[[1]]$ssplit
list()
Based on the counts below, the missing values go to node 2, but I don't see
that recorded in the object.
sum(BreastCancer$Bare.nuclei %in% c(1,2,NA))
[1] 448
sum(BreastCancer$Bare.nuclei %in% c(1,2))
[1] 432
sum(BreastCancer$Bare.nuclei %in% c(3:10))
[1] 251
Andrew