Hi R experts,
I am just wondering if something is already available (or easily adaptable) to
do the following.
I am planning to build linear models for all possible combinations of terms, so
for example if the terms are sent into a function as this string
" X1 + X2 + X3 + X4 + X1:X2"
I would want to build models for all possible combinations of these 5 terms,
e.g.
m1 <- lm( y ~ X1 + X3 )
and capture at least the residual sum of squares and total number of model
parameters from each model produced. This will become part of a Bayesian
approach to infer actual model probabilities when specialist prior knowledge is
also introduced into the problem.
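
As a minimal sketch of capturing those two quantities from a single fit: the data frame d below is simulated purely for illustration and is an assumption, not real data. For an lm fit, deviance() returns the residual sum of squares, and length(coef()) counts the estimated regression coefficients including the intercept. Whether the error variance should also count as a parameter is left open here.

```r
set.seed(42)
# Simulated data, purely illustrative; names follow the example above.
d <- data.frame(X1 = rnorm(30), X3 = rnorm(30))
d$y <- 0.5 + 1.5 * d$X1 + rnorm(30)

m1 <- lm(y ~ X1 + X3, data = d)

# Residual sum of squares: for an lm fit, deviance() returns the RSS.
rss <- deviance(m1)
# The same quantity computed from its definition, as a cross-check.
rss_check <- sum(residuals(m1)^2)

# Number of estimated regression coefficients (intercept included).
n_coef <- length(coef(m1))
```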
At a high level this particular problem requires something like:
1) the term 'string' to be broken down into its elements which are separated
by "+" and, I suppose, stored in a list for easier manipulation
2) a matrix with 2^5 rows and 5 columns to be formed with a 0 present if the
term is not included and 1 if it is. Then a model will be fitted to represent
every row of this matrix and the key statistics stored in vectors of length 2^5.
For N terms, of course, the number of models will be 2^N.
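
The two steps above could be sketched roughly as follows, using only base R. This is one possible approach rather than an established solution: the simulated data frame d and the names incl, rss and n_par are assumptions introduced for illustration. Note that the enumeration also fits models containing X1:X2 without both main effects, which may or may not be wanted.

```r
set.seed(1)
# Hypothetical simulated data; column names match the terms in the string.
d <- data.frame(X1 = rnorm(50), X2 = rnorm(50), X3 = rnorm(50), X4 = rnorm(50))
d$y <- 1 + 2 * d$X1 - d$X3 + rnorm(50)

term_string <- " X1 + X2 + X3 + X4 + X1:X2"

# Step 1: split on "+" and strip the surrounding whitespace.
terms <- trimws(strsplit(term_string, "+", fixed = TRUE)[[1]])

# Step 2: 2^N x N inclusion matrix, one row per candidate model.
n_terms <- length(terms)
incl <- as.matrix(expand.grid(rep(list(0:1), n_terms)))
colnames(incl) <- terms

# Fit one model per row and store the key statistics.
rss <- numeric(nrow(incl))
n_par <- integer(nrow(incl))
for (i in seq_len(nrow(incl))) {
  chosen <- terms[incl[i, ] == 1]
  # An empty selection corresponds to the intercept-only model y ~ 1.
  f <- if (length(chosen) == 0) y ~ 1 else reformulate(chosen, response = "y")
  fit <- lm(f, data = d)
  rss[i] <- deviance(fit)        # residual sum of squares
  n_par[i] <- length(coef(fit))  # coefficients, intercept included
}
```

Row i of incl then lines up with rss[i] and n_par[i], so the three objects can be bound together with cbind(incl, rss, n_par) for later use.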
Is there anything available already? This is a very similar problem to all
subsets regression.
My skill at manipulating strings in R is very limited; can anyone recommend
some links or available functions which would make the separations and
constructions required easy to achieve?
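
One way to avoid most of the manual string handling, sketched here under the assumption that the response is called y, is to let R's own formula machinery parse the terms: terms() on a formula exposes the term labels as an attribute, and paste() with collapse rebuilds a formula string from any subset.

```r
term_string <- " X1 + X2 + X3 + X4 + X1:X2"

# Build a full formula from the string; "y" as the response is an assumption.
full <- as.formula(paste("y ~", term_string))

# Let R parse the right-hand side into individual term labels.
labs <- attr(terms(full), "term.labels")

# Rebuild a formula string for a chosen subset, here the 1st and 3rd terms.
rebuilt <- paste("y ~", paste(labs[c(1, 3)], collapse = " + "))
sub_formula <- as.formula(rebuilt)
```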
Thanks in advance to all
Michael Hopkins
Algorithm and Statistical Modelling Expert
Upstream