[R] Hints for Data Mining

Lorenzo Isella Wed, 14 Sep 2011 16:13:59 -0700

Dear All,

I am recycling a previous email of mine in which I asked some questions about clustering mixed numerical/categorical data. This time I am more into data mining. I am given a set of known statistical indexes {s_i}, i=1,2,...,N, for N countries. In general these indexes are both numerical and categorical variables. For each country I also have a property x_i whose value is known, but which I would also like to be able to predict correctly using a model. This is needed in order to assess the importance of the various indexes in determining {x_i}.

There are two cases of interest:


(1) all the {x_i} are numerical variables, e.g. the average life expectancy

(2) all the {x_i} are categorical variables (e.g. the fact that the country joins treaty A, B or C). This reminds me of discrete choice models.
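
To make case (1) concrete, here is a minimal sketch in R of the kind of model I have in mind. Everything in it is an assumption made up for illustration: the data frame `countries`, the index names `gdp` and `region`, and the simulated response `life_exp`. It only shows that lm() accepts numerical and categorical predictors together; it is not meant as the right model for the real data.

```r
# Hypothetical data: 60 countries, one numerical and one categorical index.
set.seed(1)
n <- 60
countries <- data.frame(
  gdp    = rnorm(n, mean = 20, sd = 5),   # made-up numerical index
  region = factor(sample(c("north", "south", "east"), n, replace = TRUE))
)

# Simulated numerical response x_i (e.g. average life expectancy).
countries$life_exp <- 60 + 0.5 * countries$gdp +
  ifelse(countries$region == "north", 3, 0) + rnorm(n, sd = 1)

# lm() expands the factor 'region' into dummy variables automatically.
fit <- lm(life_exp ~ gdp + region, data = countries)

print(summary(fit))  # coefficient table
print(anova(fit))    # sequential F-tests: a rough view of each index's contribution
```

The anova() table is only one rough way of judging the importance of each index; it depends on the order of the terms in the formula.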

Any suggestions about how to tackle these problems? In the past I used mclust, but it is limited to the case where all the {s_i} are numerical variables.


I saw an example of the use of glm for predicting binary variables

http://www.ats.ucla.edu/stat/R/dae/probit.htm

which may be relevant for (2). In general I know that some people use Weka for this sort of task, but I wonder if I can use R to get a decision tree and a confusion matrix, and to predict how the {x_i} would change when the value of one statistical index is varied.
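
To show what I mean for case (2), here is a rough sketch using the rpart package, which is one possible choice and not necessarily the best one. The data are simulated and the names `index1`, `index2` and `treaty` are hypothetical. It fits a classification tree, builds a confusion matrix with table(), and then predicts the category over a grid in which only one index varies.

```r
library(rpart)  # recommended package shipped with standard R distributions

# Hypothetical data: one numerical and one categorical index for 200 cases.
set.seed(2)
n <- 200
d <- data.frame(
  index1 = runif(n, 0, 10),
  index2 = factor(sample(c("low", "high"), n, replace = TRUE))
)

# Simulated categorical response x_i (which treaty the country joins).
d$treaty <- factor(ifelse(d$index1 > 6, "A",
                   ifelse(d$index2 == "high", "B", "C")))

# Classification tree on mixed numerical/categorical predictors.
tree <- rpart(treaty ~ index1 + index2, data = d, method = "class")

# Confusion matrix: observed versus predicted categories (on the training data).
pred <- predict(tree, newdata = d, type = "class")
conf_mat <- table(observed = d$treaty, predicted = pred)
print(conf_mat)

# Vary index1 while holding index2 fixed at "high".
grid <- data.frame(
  index1 = seq(0, 10, by = 1),
  index2 = factor("high", levels = levels(d$index2))
)
grid$predicted <- predict(tree, newdata = grid, type = "class")
print(grid)
```

The confusion matrix here is computed on the same data used to fit the tree, so it is optimistic; a held-out set or cross-validation would give a fairer picture.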

Many thanks for your suggestions

Lorenzo

______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
