Hi R users! I'm new to R, so I'm starting with a basic exercise in rpart.
I'm predicting whether a user will churn based on past order history. I calculated the probabilities in Excel: if the user is a single-order customer (single_order = 1), the probability of churn is 90%; if the user has multiple orders (single_order = 0), the probability of churn is 70%. In the R model, the probabilities come out as 100% and 53% instead.

In Excel I used the count of shopper_key to calculate the probabilities, so I'm wondering whether R needs shopper_key in order to count. It would be helpful if someone could suggest where I'm going wrong. Thank you!

Code:

    m1 <- rpart(churn ~ single_order, data = data2, method = "anova")

Output:

    n= 22041

    node), split, n, deviance, yval
          * denotes terminal node

    1) root 22041 3229.265 0.8216959
      2) single_order< 0.5  8407 2092.852 0.5325324 *
      3) single_order>=0.5 13634    0.000 1.0000000 *

Sample data:

    shopper_key churn single_order
              1     1            0
              2     1            1
              3     0            0
              4     1            0
              5     1            1
              6     1            1
              7     1            0
              8     1            1
              9     0            1
             10     1            1
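
One way to cross-check the Excel figures against R might be to compute the per-group churn rate directly, without a tree. With method = "anova", the yval that rpart reports for a node is the mean of the response in that node, so for a 0/1 churn flag it should equal the proportion of churners among the rows in that group. The sketch below uses only the ten sample rows shown above, reading the three values in each row as shopper_key, churn, single_order. The name sample_rows is made up for illustration, and the numbers it produces will not match either the Excel figures or the full-data tree.

```r
# Ten sample rows copied from the post; `sample_rows` is a hypothetical name.
sample_rows <- data.frame(
  shopper_key  = 1:10,
  churn        = c(1, 1, 0, 1, 1, 1, 1, 1, 0, 1),
  single_order = c(0, 1, 0, 0, 1, 1, 0, 1, 1, 1)
)

# Rows per group: each row is counted once, no key column is involved.
group_n <- table(sample_rows$single_order)

# Mean of a 0/1 churn flag per group = proportion of churners in that group.
churn_rate <- tapply(sample_rows$churn, sample_rows$single_order, mean)

print(group_n)
print(churn_rate)
```

Running the same two calls on data2 itself would show what rpart is averaging. Note also that a deviance of 0 with yval 1 in the single_order >= 0.5 node means every row in that node has churn = 1, whereas sample row 9 has single_order = 1 and churn = 0. So it may be worth checking whether data2 contains the same rows as the data used in Excel, for example whether it has one row per shopper or one row per order.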