[R] Decision Tree: Am I Missing Anything?

Vik Rubenfeld Thu, 20 Sep 2012 19:49:01 -0700

I'm working with data from which a client would like to build a decision tree 
that predicts brand preference from inputs such as price, speed, etc. 
After running the decision tree analysis with rpart, it appears that this data 
is not capable of predicting brand preference.


Here's the data set:

BRND         PRI    PROM    FORM    FAMI    DRRE    FREC    MODE    SPED    REVW
Brand 1   0.6989  0.4731  0.7849  0.6989  0.7419  0.6022  0.8817  0.9032  0.6452
Brand 2   0.8621  0.3793  0.8621   0.931  0.7586  0.6897  0.8966  0.9655  0.8276
Brand 3      0.6     0.1     0.6     0.7     0.9     0.7     0.7     0.8     0.6
Brand 4   0.6429    0.25  0.5714     0.5  0.6071     0.5    0.75  0.8214     0.5
Brand 5   0.7586  0.4224  0.7328  0.6638  0.7328  0.6379  0.8621  0.8621  0.6897
Brand 6     0.75  0.0833  0.5833  0.4167     0.5  0.4167    0.75  0.6667     0.5
Brand 7   0.7742  0.4839  0.6129  0.5161  0.8065  0.6452  0.7742  0.9032  0.6129
Brand 8   0.6429  0.2679  0.6964  0.7143   0.875  0.5536  0.8036  0.9464  0.6607
Brand 9    0.575   0.175    0.65    0.55   0.625   0.375   0.825    0.85   0.475
Brand 10  0.8095  0.5238  0.6667  0.6429  0.6667  0.5952  0.8571  0.8095  0.5714
Brand 11  0.6308     0.3  0.6077  0.5846  0.6769  0.5231  0.7462  0.8846     0.6
Brand 12  0.7212  0.3152  0.7152  0.6545  0.6606   0.503  0.8061  0.8909     0.6
Brand 13  0.7419  0.2258  0.6129  0.5806  0.7097  0.6129   0.871  0.9677  0.3226
Brand 14  0.7176  0.2706  0.6353  0.5647  0.6941  0.4471  0.7176  0.9412  0.5176
Brand 15  0.7287  0.3437  0.5995  0.5788  0.8527  0.5478  0.8217  0.8941  0.6227
Brand 16     0.7     0.4     0.6     0.4       1     0.4     0.9     0.9     0.5
Brand 17  0.7193  0.3333  0.6667  0.6667  0.7018  0.5263  0.7719  0.8596  0.7018
Brand 18  0.7778  0.4127  0.6508  0.6349  0.7937  0.6032  0.8571  0.9206   0.619
Brand 19  0.8028  0.2817  0.6197  0.4366  0.7042  0.4366  0.7183  0.9155  0.5634
Brand 20  0.7736  0.2453  0.6226  0.3774  0.5849  0.3019   0.717  0.8679  0.4717
Brand 21  0.8481  0.2152  0.6329  0.4051  0.6329  0.4557  0.6962  0.8481  0.3418
Brand 22    0.75  0.3333  0.6667     0.5  0.6667  0.5833  0.9167  0.9167  0.4167
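For anyone who wants to reproduce this without test.csv, here is a minimal sketch that rebuilds the same data frame inline with read.table(text = ...). The assumption is that the values below are copied faithfully from the table above; reading from a text string instead of the CSV file is a substitution for convenience, and only the name test.df comes from the original commands.

```r
# Sketch: rebuild the posted data inline instead of calling read.csv("test.csv").
# Assumption: values are copied from the table in this post.
# Brand names contain a space, so they are quoted for read.table().
test.df <- read.table(header = TRUE, stringsAsFactors = TRUE, text = '
BRND          PRI    PROM    FORM    FAMI    DRRE    FREC    MODE    SPED    REVW
"Brand 1"  0.6989  0.4731  0.7849  0.6989  0.7419  0.6022  0.8817  0.9032  0.6452
"Brand 2"  0.8621  0.3793  0.8621   0.931  0.7586  0.6897  0.8966  0.9655  0.8276
"Brand 3"     0.6     0.1     0.6     0.7     0.9     0.7     0.7     0.8     0.6
"Brand 4"  0.6429    0.25  0.5714     0.5  0.6071     0.5    0.75  0.8214     0.5
"Brand 5"  0.7586  0.4224  0.7328  0.6638  0.7328  0.6379  0.8621  0.8621  0.6897
"Brand 6"    0.75  0.0833  0.5833  0.4167     0.5  0.4167    0.75  0.6667     0.5
"Brand 7"  0.7742  0.4839  0.6129  0.5161  0.8065  0.6452  0.7742  0.9032  0.6129
"Brand 8"  0.6429  0.2679  0.6964  0.7143   0.875  0.5536  0.8036  0.9464  0.6607
"Brand 9"   0.575   0.175    0.65    0.55   0.625   0.375   0.825    0.85   0.475
"Brand 10" 0.8095  0.5238  0.6667  0.6429  0.6667  0.5952  0.8571  0.8095  0.5714
"Brand 11" 0.6308     0.3  0.6077  0.5846  0.6769  0.5231  0.7462  0.8846     0.6
"Brand 12" 0.7212  0.3152  0.7152  0.6545  0.6606   0.503  0.8061  0.8909     0.6
"Brand 13" 0.7419  0.2258  0.6129  0.5806  0.7097  0.6129   0.871  0.9677  0.3226
"Brand 14" 0.7176  0.2706  0.6353  0.5647  0.6941  0.4471  0.7176  0.9412  0.5176
"Brand 15" 0.7287  0.3437  0.5995  0.5788  0.8527  0.5478  0.8217  0.8941  0.6227
"Brand 16"    0.7     0.4     0.6     0.4       1     0.4     0.9     0.9     0.5
"Brand 17" 0.7193  0.3333  0.6667  0.6667  0.7018  0.5263  0.7719  0.8596  0.7018
"Brand 18" 0.7778  0.4127  0.6508  0.6349  0.7937  0.6032  0.8571  0.9206   0.619
"Brand 19" 0.8028  0.2817  0.6197  0.4366  0.7042  0.4366  0.7183  0.9155  0.5634
"Brand 20" 0.7736  0.2453  0.6226  0.3774  0.5849  0.3019   0.717  0.8679  0.4717
"Brand 21" 0.8481  0.2152  0.6329  0.4051  0.6329  0.4557  0.6962  0.8481  0.3418
"Brand 22"   0.75  0.3333  0.6667     0.5  0.6667  0.5833  0.9167  0.9167  0.4167
')

head(test.df)   # compare with the head() output in the commands that follow
```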

Here are my R commands:

> test.df = read.csv("test.csv")
> head(test.df)
     BRND    PRI   PROM   FORM   FAMI   DRRE   FREC   MODE   SPED   REVW
1 Brand 1 0.6989 0.4731 0.7849 0.6989 0.7419 0.6022 0.8817 0.9032 0.6452
2 Brand 2 0.8621 0.3793 0.8621 0.9310 0.7586 0.6897 0.8966 0.9655 0.8276
3 Brand 3 0.6000 0.1000 0.6000 0.7000 0.9000 0.7000 0.7000 0.8000 0.6000
4 Brand 4 0.6429 0.2500 0.5714 0.5000 0.6071 0.5000 0.7500 0.8214 0.5000
5 Brand 5 0.7586 0.4224 0.7328 0.6638 0.7328 0.6379 0.8621 0.8621 0.6897
6 Brand 6 0.7500 0.0833 0.5833 0.4167 0.5000 0.4167 0.7500 0.6667 0.5000

> library(rpart)
> testTree = rpart(BRND ~ PRI + PROM + FORM + FAMI + DRRE + FREC + MODE +
+     SPED + REVW, method="class", data=test.df)

> printcp(testTree)

Classification tree:
rpart(formula = BRND ~ PRI + PROM + FORM + FAMI + DRRE + FREC + 
    MODE + SPED + REVW, data = test.df, method = "class")

Variables actually used in tree construction:
[1] FORM

Root node error: 21/22 = 0.95455

n= 22 

        CP nsplit rel error xerror xstd
1 0.047619      0   1.00000 1.0476    0
2 0.010000      1   0.95238 1.0476    0
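The numbers in the printcp output can be checked by hand, assuming (as in the table above) that every brand appears in exactly one row. With 22 rows and 22 distinct BRND values, a root-only tree that predicts any single brand gets 1 of 22 right, so the root node error is 21/22. After one split, each of the two leaves predicts one brand and gets one row right, leaving 20 errors, i.e. a relative error of 20/21 and a CP of 1/21. A short sketch of that arithmetic, with the brand labels regenerated by paste() rather than read from test.csv:

```r
# Assumption: one row per brand, as in the posted table.
brnd <- factor(paste("Brand", 1:22))
counts <- table(brnd)
n <- sum(counts)                              # 22 observations

root_errors <- n - max(counts)                # predict one brand: 21 misclassified
root_err <- root_errors / n                   # root node error, 21/22

split_errors <- n - 2                         # two leaves, one correct row each: 20
rel_err_split <- split_errors / root_errors   # rel error after 1 split, 20/21
cp_first <- 1 - rel_err_split                 # CP of the first split, 1/21

round(c(root_err = root_err, rel_err_split = rel_err_split, cp_first = cp_first), 5)
```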

I note that only one variable (FORM) was actually used in tree construction. 
When I run a plot using:

> plot(testTree)
> text(testTree)

...I get a tree with only a single split (on FORM).
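One detail that may bear on the single split: rpart()'s default stopping rules come from rpart.control(). Per the rpart documentation, minsplit (default 20) is the minimum number of observations a node must have before a split is attempted, and minbucket (the minimum leaf size) defaults to round(minsplit/3). The sketch below only prints those defaults and checks the node-size arithmetic for n = 22; it does not refit the tree.

```r
library(rpart)  # recommended package distributed with R

# Defaults used when rpart() is called without a control argument
ctrl <- rpart.control()
ctrl$minsplit    # 20: a node needs at least this many rows before a split is tried
ctrl$minbucket   # round(20/3) = 7: minimum number of rows in any leaf
ctrl$cp          # 0.01: splits that do not reduce lack of fit by this factor are not attempted

# With n = 22, only the root has >= minsplit rows. Any split of the root
# produces children of between minbucket and n - minbucket rows.
n <- 22
child_sizes <- ctrl$minbucket:(n - ctrl$minbucket)   # 7 through 15
can_split_again <- any(child_sizes >= ctrl$minsplit)
can_split_again  # FALSE: no child is large enough to be split again
```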

It looks to me like I'm doing everything right, and this data is just not 
capable of predicting brand preference. 

Am I missing anything?

Thanks very much in advance for any thoughts!

-Vik


______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
