And if you do have many variables in one dataframe, you might
wish to construct the formula first using paste():
nm <- c("0", names(d)[-c(1,2)])
fo <- as.formula(paste("~", paste(nm, collapse= "+")))
d <- cbind(d, model.matrix(fo, data=d))
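For illustration, here is a minimal, self-contained sketch of the paste() approach above. The small data frame and the second factor `sex` are assumptions added for the example, not part of the original post.

```r
# Hypothetical example data: two numeric columns followed by two factors
d <- data.frame(
  a   = c(1, 4, 3, 4, 5, 6),
  b   = c(5, 4, 5, 3, 4, 5),
  grp = factor(c("A", "B", "B", "C", "C", "C")),
  sex = factor(c("m", "m", "m", "f", "f", "f"))
)

# "0" drops the intercept; the other terms are every column after the first two
nm <- c("0", names(d)[-c(1, 2)])
fo <- as.formula(paste("~", paste(nm, collapse = "+")))
print(fo)

# Build the indicator columns and append them to the original data
mm <- model.matrix(fo, data = d)
d2 <- cbind(d, mm)
print(colnames(mm))
```

Note that with a formula like this only the first factor gets one column per level; later factors drop their first level as the reference category.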
-Peter Ehlers
On 2010-05-16 15:30, Thomas Stewart wrote:
Maybe this will lead you to an acceptable solution. Note that I changed how
the data set is created. (In your example, the numeric variables were being
converted to factor variables. It seems to me that you want something
different.) The key difference between my code and yours is that I use the
variable name in the model matrix function; that is, I use ~0+grp instead of
~0+d[,3]. As seen below, this change creates non-ugly results.
grp<- c("A", "B","B","C","C","C")
a<- c(1,4,3,4,5,6)
b<- c(5,4,5,3,4,5)
d<- data.frame(a=a,b=b,grp=grp)
str(d)
'data.frame': 6 obs. of 3 variables:
$ a : num 1 4 3 4 5 6
$ b : num 5 4 5 3 4 5
$ grp: Factor w/ 3 levels "A","B","C": 1 2 2 3 3 3
d<-cbind(d,model.matrix(~0+grp,data=d))
d
  a b grp grpA grpB grpC
1 1 5   A    1    0    0
2 4 4   B    0    1    0
3 3 5   B    0    1    0
4 4 3   C    0    0    1
5 5 4   C    0    0    1
6 6 5   C    0    0    1
str(d)
'data.frame': 6 obs. of 6 variables:
$ a : num 1 4 3 4 5 6
$ b : num 5 4 5 3 4 5
$ grp : Factor w/ 3 levels "A","B","C": 1 2 2 3 3 3
$ grpA: num 1 0 0 0 0 0
$ grpB: num 0 1 1 0 0 0
$ grpC: num 0 0 0 1 1 1
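A small sketch of why the way the data set is created matters: cbind() on a mix of numeric and character vectors produces a character matrix, so the numeric columns lose their type before data.frame() ever sees them. The names `d_bad` and `d_good` are made up for this illustration.

```r
group <- c("A", "B", "B", "C", "C", "C")
a <- c(1, 4, 3, 4, 5, 6)
b <- c(5, 4, 5, 3, 4, 5)

# cbind() returns a matrix, and a matrix holds a single type: here, character
m <- cbind(a, b, group)
print(is.character(m))

# Building the data frame from that matrix keeps a and b non-numeric
d_bad <- data.frame(m)
print(is.numeric(d_bad$a))

# Passing the vectors directly keeps each column's own type
d_good <- data.frame(a = a, b = b, group = group)
print(is.numeric(d_good$a))
```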
If you are trying to automate the process (converting factor variables to
dummy variables without the user typing in variable names), you have
several options. Here is a quick function I wrote that you may have to
alter for your own needs.
-tgs
grp<- c("A", "B","B","C","C","C")
sex<-c("m","m","m","f","f","f")
educ<-c("none","some","some","grad","law","med")
a<- c(1,4,3,4,5,6)
b<- c(5,4,5,3,4,5)
d<- data.frame(a=a,b=b,grp=grp,sex=sex,educ=educ)
Factors.to.dummies<-function(data){
  Factor.Flag<-sapply(data,is.factor)
  formula<-paste("~0+",paste(colnames(data)[Factor.Flag],collapse="+"),sep="")
  data2<-model.matrix(as.formula(formula),data=data)
  return(cbind(data,data2))
}
Factors.to.dummies(d)
  a b grp sex educ grpA grpB grpC sexm educlaw educmed educnone educsome
1 1 5   A   m none    1    0    0    1       0       0        1        0
2 4 4   B   m some    0    1    0    1       0       0        0        1
3 3 5   B   m some    0    1    0    1       0       0        0        1
4 4 3   C   f grad    0    0    1    0       0       0        0        0
5 5 4   C   f  law    0    0    1    0       1       0        0        0
6 6 5   C   f  med    0    0    1    0       0       1        0        0
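The output above has a column for every level of grp but drops the first level of sex and educ (there is no sexf or educgrad). If every level is wanted, one possible variant is sketched below. The name `factors_to_dummies` is hypothetical, and the sketch assumes that character columns should be treated as factors, which matters in R 4.0.0 and later, where data.frame() no longer converts strings to factors by default.

```r
factors_to_dummies <- function(data) {
  # Assumption: character columns are meant to be categorical
  is_chr <- vapply(data, is.character, logical(1))
  data[is_chr] <- lapply(data[is_chr], factor)

  is_fac <- vapply(data, is.factor, logical(1))
  if (!any(is_fac)) return(data)  # nothing to expand
  fac_names <- names(data)[is_fac]

  # contrasts = FALSE gives an identity matrix: one indicator per level
  contr <- lapply(data[fac_names], contrasts, contrasts = FALSE)
  fo <- as.formula(paste("~ 0 +", paste(fac_names, collapse = " + ")))
  dummies <- model.matrix(fo, data = data, contrasts.arg = contr)
  cbind(data, dummies)
}

d <- data.frame(
  a    = c(1, 4, 3, 4, 5, 6),
  b    = c(5, 4, 5, 3, 4, 5),
  grp  = c("A", "B", "B", "C", "C", "C"),
  sex  = c("m", "m", "m", "f", "f", "f"),
  educ = c("none", "some", "some", "grad", "law", "med")
)
out <- factors_to_dummies(d)
print(names(out))
```

The full set of indicators for each factor is linearly dependent, so this form suits models that want raw dummy columns rather than a regression design matrix with an intercept.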
On Sun, May 16, 2010 at 2:24 PM, Noah Silverman <n...@smartmediacorp.com> wrote:
I could, but with close to 100 columns, it's messy.
On 5/16/10 11:22 AM, Peter Ehlers wrote:
On 2010-05-16 11:06, Noah Silverman wrote:
Update,
I have it working, but now it's producing really ugly labels. Must be a
small adjustment to the code. Any ideas??
##Create example data.frame
group<- c("A", "B","B","C","C","C")
a<- c(1,4,3,4,5,6)
b<- c(5,4,5,3,4,5)
d<- data.frame(cbind(a,b,group))
#create new frame with discretized group
cbind(d[,1:2], model.matrix(~0+d[,3]) )
  a b d[, 3]A d[, 3]B d[, 3]C
1 1 5       1       0       0
2 4 4       0       1       0
3 3 5       0       1       0
4 4 3       0       0       1
5 5 4       0       0       1
6 6 5       0       0       1
So, as you can see, it works, but the labels for the groups don't
I then tried using the column name instead of number and still got ugly
results:
cbind(d[,1:2], model.matrix(~0+d[,"group"]) )
  a b d[, "group"]A d[, "group"]B d[, "group"]C
1 1 5             1             0             0
2 4 4             0             1             0
3 3 5             0             1             0
4 4 3             0             0             1
5 5 4             0             0             1
6 6 5             0             0             1
Any ideas?
Can't you just use names(...)<- c() on your final dataframe?
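A rough sketch of the renaming idea, assuming the labels carry the literal prefix "d[, 3]" as in the output shown earlier. Using sub() with fixed = TRUE is just one way to do it and avoids typing all the names by hand; `out` is a name made up for the example.

```r
group <- c("A", "B", "B", "C", "C", "C")
a <- c(1, 4, 3, 4, 5, 6)
b <- c(5, 4, 5, 3, 4, 5)
# stringsAsFactors = TRUE reproduces the factor columns of the original post
d <- data.frame(cbind(a, b, group), stringsAsFactors = TRUE)

out <- cbind(d[, 1:2], model.matrix(~0 + d[, 3]))

# Replace the literal text "d[, 3]" with "group"; fixed = TRUE turns off
# regular-expression matching, so the brackets need no escaping
names(out) <- sub("d[, 3]", "group", names(out), fixed = TRUE)
print(names(out))
```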
-Peter Ehlers
-N
On 5/15/10 11:02 AM, Noah Silverman wrote:
Hi,
I'm looking for an easy way to discretize factors in R.
I've noticed that the lm function does this automatically with a nice
result.
If I have
group<- c("A", "B","B","C","C","C")
and run:
lm(result ~ x1 + group)
The lm function has split the group into separate binary variables
{0,1}
before performing the regression. I now have:
groupA
groupB
groupC
Some of the other models that I want to try won't accept factors, so
they need to be discretized this way.
Is there a command in R for this, or some easy shortcut? (I tried
digging into the lm code, but couldn't find where this is being done.)
Thanks!
-N
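For reference, the expansion asked about here is done by model.matrix(), which lm() uses to build its design matrix. The sketch below uses made-up values for `result` and `x1` (they are not given in the post). With an intercept in the model, the first level (groupA) is absorbed as the reference category, so only groupB and groupC appear; removing the intercept gives all three indicators.

```r
# Hypothetical data: result and x1 are invented for illustration
group  <- factor(c("A", "B", "B", "C", "C", "C"))
x1     <- c(1, 4, 3, 4, 5, 6)
result <- c(5, 4, 5, 3, 4, 5)

# The design matrix that lm() actually fitted
fit <- lm(result ~ x1 + group)
X <- model.matrix(fit)
print(colnames(X))

# The same expansion without fitting a model, and without an intercept
X_full <- model.matrix(~ 0 + group)
print(colnames(X_full))
```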
______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.