I have a huge data set with thousands of variables and one binary variable that I want to predict. I know that most of the variables are correlated with one another and that most are not good predictors.
It is very hard to start modeling with such a huge data set. What would you suggest? How can I make a first cut that eliminates most of the variables without ignoring potential interactions? For example, variable A may not be a good predictor on its own, and neither may variable B, but A and B together may be a good predictor. Any suggestions are welcome.
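To make the interaction concern concrete, here is a minimal simulated sketch in R. It is not my real data: the names A, B and y, the sample size, and the 10% label noise are all assumptions chosen for illustration. The outcome depends only on whether A and B differ, so each variable looks useless when screened alone, yet the pair is informative.

```r
# Simulated illustration only; A, B, y and all settings are hypothetical.
set.seed(1)
n <- 2000
A <- rbinom(n, 1, 0.5)
B <- rbinom(n, 1, 0.5)

# Outcome is 1 when exactly one of A, B is 1 (an XOR-type interaction),
# with roughly 10% of the labels flipped at random as noise.
y <- as.integer(A != B)
flip <- rbinom(n, 1, 0.1) == 1
y[flip] <- 1L - y[flip]

# Marginal screening: each variable alone is nearly uncorrelated with y.
marginal_cor <- c(A = cor(A, y), B = cor(B, y))

# Logistic regression with main effects only versus with the interaction.
fit_main <- glm(y ~ A + B, family = binomial)
fit_int  <- glm(y ~ A * B, family = binomial)

# In-sample classification accuracy at a 0.5 probability cutoff.
accuracy <- function(fit) mean((fitted(fit) > 0.5) == (y == 1))
acc_main <- accuracy(fit_main)
acc_int  <- accuracy(fit_int)

print(marginal_cor)
print(c(main_effects = acc_main, with_interaction = acc_int))
```

A one-variable-at-a-time filter would discard both A and B here, which is exactly the situation I would like to avoid.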