Try the function ctree() in the party package or earth() in the earth package. You can use the factor variable as is, or you can transform the factor into binary variables (e.g., is_P is 0 or 1, is_D is 0 or 1). In the second case you can use any algorithm; earth() performs this transformation of factors into binary features automatically.
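A minimal base-R sketch of the second option, assuming the data frame layout shown in the quoted question. Only the four rows quoted there are used, so this illustrates the transformation, not a model fit. The is_<level> column names follow the is_P/is_D examples above and are otherwise hypothetical; the loop is just one way to build the indicators and is not how earth() does it internally.

```r
# Illustrative data: only the four rows quoted in the question.
# The factor levels list all five roles the question mentions (P, D, C, N, A).
df <- data.frame(
  role          = factor(c("P", "P", "P", "D"),
                         levels = c("P", "D", "C", "N", "A")),
  degree        = c(10, 7, 8, 42),
  strength      = c(82, 529, 609, 230),
  weight        = c(18017, 4345, 4382, 6910),
  count         = c(2, 60, 10, 88),
  disparity     = c(2.317073, 5.178466, 6.204535, 1.791153),
  intermittency = c(5.550314e-05, 6.904488e-03, 1.141031e-03, 6.367583e-03)
)

# Add one 0/1 indicator column per role level: is_P, is_D, is_C, is_N, is_A.
# (Hypothetical naming, following the is_P / is_D examples in the reply.)
for (lev in levels(df$role)) {
  df[[paste0("is_", lev)]] <- as.integer(df$role == lev)
}

print(df[, c("role", "is_P", "is_D")])
```

Each is_ column can then serve as the 0/1 response of a separate binary model (one-vs-rest). With the full 120-row data, the role factor can instead be used as is, as the response in a formula such as role ~ degree + strength + weight + count + disparity + intermittency passed to ctree().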
However, you may find that 120 individuals is not much data.

Andrew

-----Original Message-----
From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On Behalf Of Lorenzo Isella
Sent: Thursday, February 17, 2011 7:14 AM
To: r-help
Subject: [R] Categorical Variables and Machine Learning

Dear All,

Please consider a dataframe like the one below (I am showing only a few rows).

    role  degree  strength  weight  count  disparity  intermittency
    P         10        82   18017      2   2.317073   5.550314e-05
    P          7       529    4345     60   5.178466   6.904488e-03
    P          8       609    4382     10   6.204535   1.141031e-03
    D         42       230    6910     88   1.791153   6.367583e-03

You have a categorical variable (the role variable) which can take only a few values ("P", "D", "C", "N", "A"), referring to different individuals for whom you collect some extra properties (namely degree, strength, weight, disparity and intermittency, as in the table above).
My goal is to find the most suitable property (or combination of properties) to guess the role of an individual. It looks like a typical machine learning problem, but I have a categorical variable to predict.
I am drowning in the wealth of R packages for machine learning, but I really would like something simple and easy to use (consider that the dataset covers only 120 individuals, so performance is not a problem).
Any suggestions are appreciated.

Cheers

Lorenzo

______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.