Thanks, Martin, for the lead. A new avenue to explore.

Dhruv

-----Original Message-----
From: Martin Maechler [mailto:maech...@stat.math.ethz.ch]
Sent: Wednesday, March 10, 2010 6:18 AM
To: kMan
Cc: Sharma, Dhruv; r-help@r-project.org
Subject: Re: [R] r code to generate interaction columns

>>>>> "k" == kMan <kchambe...@gmail.com>
>>>>>     on Tue, 9 Mar 2010 19:52:40 -0700 writes:

k> Dear Dhruv,
k> Your clarification helps, and I'm stumped. Sorry I cannot be of more help.
k> Sincerely,
k> KeithC.

I'd say *the* answer is to use model.matrix().
It lets you use R's powerful model formula language and produces the
'model matrix' (aka 'design matrix') X for you.

[ The Matrix package even contains a sparse.model.matrix() function,
  which can be useful for really large problems.
  E.g., the glmnet package, which uses Lasso-like methods
  {{instead of random forests; Trevor Hastie has quite a host of
    examples where glmnet methods perform better than random forests}},
  can make use of sparse matrices from the Matrix package like that.
]

Read help(model.matrix) and also look at the examples there, which you
can run in R with
   example(model.matrix)

Regards,
Martin Maechler, ETH Zurich

k> -----Original Message-----
k> From: Sharma, Dhruv [mailto:dhruv.sha...@penfed.org]
k> Sent: Monday, March 08, 2010 7:51 AM
k> To: kMan; r-help@r-project.org
k> Subject: RE: [R] r code to generate interaction columns

k> Thanks, Keith. I wanted generic code that checks each column's data type,
k> loops through the columns, and creates the interaction columns
k> automatically, as I want to test this out as a new algorithm for data mining.

k> Traditional regression may give misleading results under multicollinearity,
k> so I wanted to take the interaction terms and run them through random
k> forests and rpart, which would need the interaction terms to be created
k> manually.

k> Hope that clarifies.
k> Dhruv

k> -----Original Message-----
k> From: kMan [mailto:kchambe...@gmail.com]
k> Sent: Sunday, March 07, 2010 8:08 PM
k> To: Sharma, Dhruv; r-help@r-project.org
k> Subject: RE: [R] r code to generate interaction columns

k> Dear Dhruv,

k> You could create interaction variables manually (assuming A is your
k> dependent variable). Just multiply the variables together.

k> cd.int<-C*D
k> ce.int<-C*E
k> cde.int<-C*D*E # what about D*E, or interactions with B?

k> Include those in your model, such as
k> A~B+C+D+E+cd.int+ce.int+cde.int.

k> Then you can compare those models to the results you get when you specify
k> the interaction in the model formula directly using the documented syntax.
k> In your R console, type ?formula or help("formula") for details.

k> Sincerely,
k> KeithC.

k> -----Original Message-----
k> From: Sharma, Dhruv [mailto:dhruv.sha...@penfed.org]
k> Sent: Saturday, March 06, 2010 10:30 AM
k> To: r-help@r-project.org
k> Subject: [R] r code to generate interaction columns

k> Hi,
k> Is there a way to take a dataset, extract the numeric columns, and create
k> interaction columns from them automatically?

k> For example, there are 5 columns of data: A, B, C, D, E.
k> C, D, and E are numeric.

k> Can someone provide code to automatically create more columns such as:
k> 1) C*D, C*E, C*D*E, (C+E)/(D+.01), (D+E)/(C+.01), (C+D)/(E+.01)
k>    (the .01 is there to avoid dividing by zero)
k> ?

k> I know that in glm multiplying can create terms, but I want the columns to
k> be part of the data set so that I can feed it into a random forest to pick
k> out predictive interaction terms, as regression cannot reliably handle
k> correlated interaction terms.

k> If anyone has some simple code that can do this, that would be helpful.
k> thanks
k> Dhruv

______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
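
For anyone following this thread, here is a minimal sketch of the model.matrix() route applied to the original question. It is only one way to do it, not code from any of the posters. The toy data frame `dat`, the `_x_` separator, and the `ratio_` prefix are made up for illustration, and A is assumed to be the response, so it is left out of the generated columns.

```r
# Toy data (hypothetical): A is the response, B a factor, C/D/E numeric.
dat <- data.frame(
  A = c(0.3, -1.2, 0.8, 1.5, -0.4, 0.1),
  B = factor(c("x", "y", "x", "y", "x", "y")),
  C = c(1, 2, 3, 4, 5, 6),
  D = c(2, 0, 1, 3, 1, 2),
  E = c(0.5, 1.5, 2.5, 0, 1, 2)
)

# Pick the numeric predictors automatically (all numeric columns except A).
num <- setdiff(names(dat)[sapply(dat, is.numeric)], "A")

# Build a formula such as ~ (C + D + E)^3 - 1: in R's formula language this
# expands to the main effects plus all 2- and 3-way products; "- 1" drops
# the intercept column.
f <- as.formula(paste0("~ (", paste(num, collapse = " + "), ")^",
                       length(num), " - 1"))
X <- model.matrix(f, data = dat)

# Keep only the product columns (their names contain ":") and rename them
# so they are easier to use as data frame column names.
inter <- X[, grepl(":", colnames(X), fixed = TRUE), drop = FALSE]
colnames(inter) <- gsub(":", "_x_", colnames(inter), fixed = TRUE)

# Ratio columns as in the original post: for each numeric column v,
# (sum of the other numeric columns) / (v + 0.01).
# Note: the 0.01 offset only guards against an exact zero in v.
ratios <- sapply(num, function(v) {
  others <- setdiff(num, v)
  rowSums(dat[, others, drop = FALSE]) / (dat[[v]] + 0.01)
})
colnames(ratios) <- paste0("ratio_", num)

# Attach the new columns to the data set.
dat2 <- cbind(dat, inter, ratios)
```

The resulting `dat2` keeps the original columns and adds the product and ratio columns, so it can be handed to a tree-based method as an ordinary data frame. With many numeric columns the number of products grows quickly, so a lower power in the formula (for example `^2` for pairwise products only) may be the more practical choice.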