Hello,

Isn't it counter-intuitive that the tree finds the split only when the error is penalized less?
See:

library(rpart)

experience <- as.factor(c(rep("good", 90), rep("bad", 10)))
cancel <- as.factor(c(rep("no", 85), rep("yes", 5), rep("no", 5), rep("yes", 5)))

# Number of nodes in the tree fitted with loss-matrix entry i
foo <- function(i) {
  tmp <- rpart(cancel ~ experience,
               parms = list(loss = matrix(c(0, i, 1, 0), byrow = TRUE, nrow = 2)))
  nrow(tmp$frame)
}

sapply(1:20, foo)

The output I get is:

[1] 1 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 1 1 1 1

So, something unexpected happens once the penalty exceeds 16... Is that expected?

Best regards,

Carlos J. Gil Bellosta
http://www.datanalytics.com

On Sun, 2009-08-02 at 08:41 +1000, Graham Williams wrote:
> 2009/7/27 Robert Smith <robertpsmith2...@gmail.com>
>
> > Hi,
> >
> > I am using rpart decision trees to analyze customer churn. I am finding that
> > the decision trees created are not effective because they are not able to
> > recognize factors that influence churn. I have created an example situation
> > below. What do I need to do for rpart to build a tree with the variable
> > experience? My guess is that this would happen if rpart used the loss matrix
> > while creating the tree.
> >
> > > experience <- as.factor(c(rep("good",90), rep("bad",10)))
> > > cancel <- as.factor(c(rep("no",85), rep("yes",5), rep("no",5), rep("yes",5)))
> > > table(experience, cancel)
> >           cancel
> > experience no yes
> >       bad   5   5
> >       good 85   5
> > > rpart(cancel ~ experience)
> > n= 100
> >
> > node), split, n, loss, yval, (yprob)
> >       * denotes terminal node
> >
> > 1) root 100 10 no (0.9000000 0.1000000) *
> >
> > I tried the following commands with no success.
> >
> > rpart(cancel ~ experience, control=rpart.control(cp=.0001))
> > rpart(cancel ~ experience, parms=list(split='information'))
> > rpart(cancel ~ experience, parms=list(split='information'),
> >       control=rpart.control(cp=.0001))
> > rpart(cancel ~ experience, parms=list(loss=matrix(c(0,1,10000,0), nrow=2, ncol=2)))
> >
> > Thanks a lot for your help.
> >
> > Best regards,
> > Robert
> >
>
> Hi Robert,
>
> Perhaps try a less extreme loss matrix:
>
> rpart(cancel ~ experience, parms=list(loss=matrix(c(0,5,1,0), byrow=TRUE, nrow=2)))
>
> Output from Rattle:
>
> Summary of the Tree model for Classification (built using rpart):
>
> n= 100
>
> node), split, n, loss, yval, (yprob)
>       * denotes terminal node
>
> 1) root 100 50 no (0.90000000 0.10000000)
>   2) experience=good 90 25 no (0.94444444 0.05555556) *
>   3) experience=bad 10 5 yes (0.50000000 0.50000000) *
>
> Classification tree:
> rpart(formula = cancel ~ ., data = crs$dataset, method = "class",
>     parms = list(loss = matrix(c(0, 5, 1, 0), byrow = TRUE, nrow = 2)),
>     control = rpart.control(cp = 0.0001, usesurrogate = 0, maxsurrogate = 0))
>
> Variables actually used in tree construction:
> [1] experience
>
> Root node error: 50/100 = 0.5
>
> n= 100
>
>       CP nsplit rel error xerror xstd
> 1 0.4000      0       1.0    1.0 0.30
> 2 0.0001      1       0.6    0.6 0.22
>
> TRAINING DATA Error Matrix - Counts
>
>          Actual
> Predicted no yes
>       no  85   5
>       yes  5   5
>
> TRAINING DATA Error Matrix - Percentages
>
>          Actual
> Predicted no yes
>       no  85   5
>       yes  5   5
>
> Time taken: 0.01 secs
>
> Generated by Rattle 2009-08-02 08:24:50 gjw
> ======================================================================
>
> ______________________________________________
> R-help@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
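
One possible reading of the node counts reported at the top of this message, offered as a sketch and not as a description of rpart's internals: the arithmetic below uses only base R and the class counts from table(experience, cancel). It rests on two assumptions. First, that the weight i applies to a "yes" case labelled "no" and the weight 1 to a "no" case labelled "yes", which is what the quoted Rattle output suggests (root loss 50 = 10 "yes" cases x 5). Second, that a split survives only if it lowers the total loss by more than cp times the root loss, with cp = 0.01 as in rpart.control's default. The names node_loss and split_kept are hypothetical helpers, not rpart functions.

```r
# Class counts per node, taken from table(experience, cancel) in the thread.
counts <- list(
  root = c(no = 90, yes = 10),
  good = c(no = 85, yes = 5),
  bad  = c(no = 5,  yes = 5)
)

# Loss of a node under its cheapest label.
# ASSUMPTION: a missed "yes" costs i and a false "yes" costs 1.
node_loss <- function(n, i) {
  min(n[["yes"]] * i,  # label the node "no": every "yes" case costs i
      n[["no"]] * 1)   # label the node "yes": every "no" case costs 1
}

# Hypothetical helper: TRUE if the split on experience lowers the loss by
# more than cp * root loss (a simplification of cost-complexity pruning).
split_kept <- function(i, cp = 0.01) {
  root     <- node_loss(counts$root, i)
  children <- node_loss(counts$good, i) + node_loss(counts$bad, i)
  (root - children) / root > cp
}

kept  <- sapply(1:20, split_kept)
# Node counts implied by this simplified model: 3 if the split is kept, else 1.
nodes <- ifelse(kept, 3, 1)
nodes
# [1] 1 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 1 1 1 1
```

Under these assumptions the pattern is less surprising: at i = 1 both children are no cheaper than the root, and from i = 17 onwards labelling every node "yes" is the cheapest choice everywhere (85 + 5 = 90, the same as the root), so the split buys nothing and is dropped.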