2009/7/27 Robert Smith <robertpsmith2...@gmail.com>

> Hi,
>
> I am using rpart decision trees to analyze customer churn. I am finding that
> the decision trees created are not effective because they are not able to
> recognize factors that influence churn. I have created an example situation
> below. What do I need to do for rpart to build a tree with the variable
> experience? My guess is that this would happen if rpart used the loss matrix
> while creating the tree.
>
> > experience <- as.factor(c(rep("good",90), rep("bad",10)))
> > cancel <- as.factor(c(rep("no",85), rep("yes",5), rep("no",5), rep("yes",5)))
> > table(experience, cancel)
>           cancel
> experience no yes
>       bad   5   5
>       good 85   5
> > rpart(cancel ~ experience)
> n= 100
>
> node), split, n, loss, yval, (yprob)
>       * denotes terminal node
>
> 1) root 100 10 no (0.9000000 0.1000000) *
>
> I tried the following commands with no success.
> rpart(cancel ~ experience, control=rpart.control(cp=.0001))
> rpart(cancel ~ experience, parms=list(split='information'))
> rpart(cancel ~ experience, parms=list(split='information'),
>       control=rpart.control(cp=.0001))
> rpart(cancel ~ experience, parms=list(loss=matrix(c(0,1,10000,0), nrow=2,
>       ncol=2)))
>
> Thanks a lot for your help.
>
> Best regards,
> Robert
Hi Robert,

Perhaps try a less extreme loss matrix:

rpart(cancel ~ experience, parms=list(loss=matrix(c(0,5,1,0), byrow=TRUE, nrow=2)))

Output from Rattle:

Summary of the Tree model for Classification (built using rpart):

n= 100

node), split, n, loss, yval, (yprob)
      * denotes terminal node

1) root 100 50 no (0.90000000 0.10000000)
  2) experience=good 90 25 no (0.94444444 0.05555556) *
  3) experience=bad 10 5 yes (0.50000000 0.50000000) *

Classification tree:
rpart(formula = cancel ~ ., data = crs$dataset, method = "class",
    parms = list(loss = matrix(c(0, 5, 1, 0), byrow = TRUE, nrow = 2)),
    control = rpart.control(cp = 0.0001, usesurrogate = 0, maxsurrogate = 0))

Variables actually used in tree construction:
[1] experience

Root node error: 50/100 = 0.5

n= 100

      CP nsplit rel error xerror xstd
1 0.4000      0       1.0    1.0 0.30
2 0.0001      1       0.6    0.6 0.22

TRAINING DATA Error Matrix - Counts

         Actual
Predicted no yes
      no  85   5
      yes  5   5

TRAINING DATA Error Matrix - Percentages

         Actual
Predicted no yes
      no  85   5
      yes  5   5

Time taken: 0.01 secs

Generated by Rattle 2009-08-02 08:24:50 gjw
======================================================================
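
As a rough check on the numbers in this thread, the node losses can be reproduced by hand in base R, without calling rpart. This is only a sketch under assumptions: the helper names node_loss and tree_loss and the arguments cost_miss_yes and cost_miss_no are made up for illustration, and the costs (5 for predicting "no" on a true "yes", 1 for the reverse) are read off the node losses in the Rattle output above rather than taken from the rpart documentation. With equal costs the split on experience leaves the total loss at 10, which appears consistent with the original tree stopping at the root; with the weighted costs the loss drops from 50 to 30.

```r
# Class counts per node, taken from table(experience, cancel) in the question.
counts <- rbind(root = c(no = 90, yes = 10),
                good = c(no = 85, yes = 5),
                bad  = c(no = 5,  yes = 5))

# Loss of a node = the cheaper of the two possible predictions.
# cost_miss_yes: assumed cost of predicting "no" for a true "yes"
# cost_miss_no:  assumed cost of predicting "yes" for a true "no"
node_loss <- function(n, cost_miss_yes, cost_miss_no) {
  min(n[["yes"]] * cost_miss_yes,   # loss if the node predicts "no"
      n[["no"]]  * cost_miss_no)    # loss if the node predicts "yes"
}

# Loss at the root versus the summed loss of the two children.
tree_loss <- function(cost_miss_yes, cost_miss_no) {
  loss <- sapply(rownames(counts), function(k) {
    node_loss(counts[k, ], cost_miss_yes, cost_miss_no)
  })
  c(root = loss[["root"]], split = loss[["good"]] + loss[["bad"]])
}

equal_costs <- tree_loss(1, 1)   # root 10, split 5 + 5 = 10: no gain
weighted    <- tree_loss(5, 1)   # root 50, split 25 + 5 = 30

# Relative error after the split, and the drop relative to the root loss.
rel_error <- weighted[["split"]] / weighted[["root"]]
cp_root   <- (weighted[["root"]] - weighted[["split"]]) / weighted[["root"]]

print(equal_costs)
print(weighted)
print(c(rel_error = rel_error, cp_root = cp_root))
```

The weighted figures line up with the Rattle output: node losses of 50, 25 and 5, a relative error of 0.6 after one split, and a CP of 0.4 on the first row of the table.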