2009/7/27 Robert Smith <robertpsmith2...@gmail.com>

> Hi,
> I am using rpart decision trees to analyze customer churn. I am finding
> that the decision trees created are not effective because they are not
> able to recognize factors that influence churn. I have created an example
> situation below. What do I need to do for rpart to build a tree with the
> variable experience? My guess is that this would happen if rpart used the
> loss matrix while creating the tree.
>
> > experience <- as.factor(c(rep("good",90), rep("bad",10)))
> > cancel <- as.factor(c(rep("no",85), rep("yes",5), rep("no",5),
> +                       rep("yes",5)))
> > table(experience, cancel)
>          cancel
> experience no yes
>      bad   5   5
>      good 85   5
> > rpart(cancel ~ experience)
> n= 100
> node), split, n, loss, yval, (yprob)
>      * denotes terminal node
> 1) root 100 10 no (0.9000000 0.1000000) *
>
> I tried the following commands with no success:
> rpart(cancel ~ experience, control=rpart.control(cp=.0001))
> rpart(cancel ~ experience, parms=list(split='information'))
> rpart(cancel ~ experience, parms=list(split='information'),
>       control=rpart.control(cp=.0001))
> rpart(cancel ~ experience,
>       parms=list(loss=matrix(c(0,1,10000,0), nrow=2, ncol=2)))
>
> Thanks a lot for your help.
> Best regards,
> Robert

Hi Robert,

Perhaps try a less extreme loss matrix:

rpart(cancel ~ experience,
      parms=list(loss=matrix(c(0,5,1,0), byrow=TRUE, nrow=2)))
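
For context, a rough sketch of why the default fit stops at the root. It uses only the counts from the table in the original post and plain arithmetic (no rpart call), and it assumes equal costs for both kinds of error, which is presumably what rpart uses when no loss matrix is given. The variable names are made up for illustration.

```r
# Counts from the table in the original post (rows: experience, cols: cancel).
counts <- matrix(c(5, 85, 5, 5), nrow = 2,
                 dimnames = list(experience = c("bad", "good"),
                                 cancel = c("no", "yes")))

# Under equal costs, labelling a node "no" misclassifies its "yes" cases,
# and labelling it "yes" misclassifies its "no" cases.
loss_if_no  <- counts[, "yes"]
loss_if_yes <- counts[, "no"]

# One label for all 100 cases versus the best label per experience level.
root_loss  <- min(sum(loss_if_no), sum(loss_if_yes))
split_loss <- sum(pmin(loss_if_no, loss_if_yes))

root_loss   # 10
split_loss  # 10
```

Splitting on experience leaves the number of misclassified cases at 10, so with equal costs the split buys nothing in terms of error, whatever value of cp is used. Unequal costs change that comparison.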

Output from Rattle:

Summary of the Tree model for Classification (built using rpart):

n= 100

node), split, n, loss, yval, (yprob)
      * denotes terminal node

1) root 100 50 no (0.90000000 0.10000000)
  2) experience=good 90 25 no (0.94444444 0.05555556) *
  3) experience=bad 10  5 yes (0.50000000 0.50000000) *

Classification tree:
rpart(formula = cancel ~ ., data = crs$dataset, method = "class",
    parms = list(loss = matrix(c(0, 5, 1, 0), byrow = TRUE, nrow = 2)),
    control = rpart.control(cp = 0.0001, usesurrogate = 0, maxsurrogate =

Variables actually used in tree construction:
[1] experience

Root node error: 50/100 = 0.5

n= 100

      CP nsplit rel error xerror xstd
1 0.4000      0       1.0    1.0 0.30
2 0.0001      1       0.6    0.6 0.22

TRAINING DATA Error Matrix - Counts

       Predicted
Actual  no yes
   no   85   5
   yes   5   5

TRAINING DATA Error Matrix - Percentages

       Predicted
Actual  no yes
   no   85   5
   yes   5   5

Time taken: 0.01 secs

Generated by Rattle 2009-08-02 08:24:50 gjw
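
The loss, rel error and CP figures in the output above can be reproduced by hand. The sketch below is plain arithmetic, not an rpart call. It assumes a cost of 5 for predicting "no" when the customer cancels and a cost of 1 for predicting "yes" when the customer stays; these two costs are inferred from the printed losses and are not stated in the thread. Which cell of the matrix passed to rpart corresponds to which cost is not checked here. The names cost_missed_yes, cost_false_alarm and node_loss are made up for illustration.

```r
# Counts from the table in the original post (rows: experience, cols: cancel).
counts <- matrix(c(5, 85, 5, 5), nrow = 2,
                 dimnames = list(experience = c("bad", "good"),
                                 cancel = c("no", "yes")))

cost_missed_yes  <- 5  # assumed: actual "yes" predicted as "no"
cost_false_alarm <- 1  # assumed: actual "no" predicted as "yes"

# Loss of a node when it is given whichever label is cheaper.
node_loss <- function(n_no, n_yes) {
  min(n_yes * cost_missed_yes, n_no * cost_false_alarm)
}

root_loss <- node_loss(sum(counts[, "no"]), sum(counts[, "yes"]))
good_loss <- node_loss(counts["good", "no"], counts["good", "yes"])
bad_loss  <- node_loss(counts["bad", "no"], counts["bad", "yes"])

# Error of the one-split tree relative to the root, and the drop per split.
rel_error <- (good_loss + bad_loss) / root_loss
cp_root   <- (root_loss - (good_loss + bad_loss)) / root_loss

c(root = root_loss, good = good_loss, bad = bad_loss)  # 50 25 5
c(rel_error = rel_error, cp = cp_root)                 # 0.6 0.4
```

With these costs the "bad" node is cheaper to label "yes" (loss 5 instead of 25), which is why the split on experience now reduces the loss from 50 to 30.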
