Re: [R] Discretize factors?

Thomas Stewart Sun, 16 May 2010 14:32:34 -0700

Maybe this will lead you to an acceptable solution.  Note that I changed how
the data set is created.  (In your example, cbind() coerced the numeric
variables to character, so they ended up as factor variables.  It seems to me
that you want something different.)  The key difference between my code and
yours is that I use the variable name in the model.matrix formula and pass the
data frame through the data argument; that is, I use ~0+grp with data=d
instead of ~0+d[,3].  model.matrix builds each column name from the term as
written plus the level, so this change gives clean names (grpA, grpB, grpC),
as seen below.


> grp <- c("A", "B","B","C","C","C")
> a <- c(1,4,3,4,5,6)
> b <- c(5,4,5,3,4,5)
> d <- data.frame(a=a,b=b,grp=grp,stringsAsFactors=TRUE)  # keeps grp a factor in R >= 4.0.0
>
> str(d)
'data.frame':   6 obs. of  3 variables:
 $ a  : num  1 4 3 4 5 6
 $ b  : num  5 4 5 3 4 5
 $ grp: Factor w/ 3 levels "A","B","C": 1 2 2 3 3 3
>
> d<-cbind(d,model.matrix(~0+grp,data=d))
>
> d
  a b grp grpA grpB grpC
1 1 5   A    1    0    0
2 4 4   B    0    1    0
3 3 5   B    0    1    0
4 4 3   C    0    0    1
5 5 4   C    0    0    1
6 6 5   C    0    0    1
> str(d)
'data.frame':   6 obs. of  6 variables:
 $ a   : num  1 4 3 4 5 6
 $ b   : num  5 4 5 3 4 5
 $ grp : Factor w/ 3 levels "A","B","C": 1 2 2 3 3 3
 $ grpA: num  1 0 0 0 0 0
 $ grpB: num  0 1 1 0 0 0
 $ grpC: num  0 0 0 1 1 1
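As a side note on the formula itself, here is a minimal sketch of what the 0+ does and where the ugly labels come from. It assumes the same grp values as above; the names d0, mm_full, mm_ref, and mm_ugly are only illustrative. Without 0+, model.matrix keeps an intercept and drops the first level, which is the coding lm uses by default.

```r
d0 <- data.frame(grp = factor(c("A", "B", "B", "C", "C", "C")))

# 0+ removes the intercept, so every level gets its own indicator column
mm_full <- model.matrix(~0+grp, data = d0)
colnames(mm_full)

# With the intercept kept, the first level (A) becomes the reference level
mm_ref <- model.matrix(~grp, data = d0)
colnames(mm_ref)

# Indexing inside the formula puts the indexing expression into the names
mm_ugly <- model.matrix(~0+d0[, 1])
colnames(mm_ugly)
```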

If you are trying to automate the process (converting factor variables to
dummy variables without the user typing in variable names), you have
several options.  Here is a quick function I wrote that you may have to
alter for your own needs.

-tgs

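grp <- c("A", "B", "B", "C", "C", "C")
sex <- c("m", "m", "m", "f", "f", "f")
educ <- c("none", "some", "some", "grad", "law", "med")
a <- c(1, 4, 3, 4, 5, 6)
b <- c(5, 4, 5, 3, 4, 5)
# stringsAsFactors=TRUE keeps the character columns as factors in R >= 4.0.0
d <- data.frame(a=a, b=b, grp=grp, sex=sex, educ=educ, stringsAsFactors=TRUE)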

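Factors.to.dummies <- function(data) {
  # Flag the columns that are factors
  Factor.Flag <- sapply(data, is.factor)
  # Build a one-sided formula such as ~0+grp+sex+educ
  formula <- paste("~0+", paste(colnames(data)[Factor.Flag], collapse="+"), sep="")
  # Expand the factors into indicator columns and append them to the data
  data2 <- model.matrix(as.formula(formula), data=data)
  return(cbind(data, data2))
}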

Factors.to.dummies(d)
  a b grp sex educ grpA grpB grpC sexm educlaw educmed educnone educsome
1 1 5   A   m none    1    0    0    1       0       0        1        0
2 4 4   B   m some    0    1    0    1       0       0        0        1
3 3 5   B   m some    0    1    0    1       0       0        0        1
4 4 3   C   f grad    0    0    1    0       0       0        0        0
5 5 4   C   f  law    0    0    1    0       1       0        0        0
6 6 5   C   f  med    0    0    1    0       0       1        0        0
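
One thing to watch in the output above: only grp, the first factor in the formula, gets a column for every level. sex and educ each lose their first level (there is no sexf or educgrad column), because model.matrix applies treatment contrasts to every factor after the first. If a full set of indicator columns is wanted for each factor, one possible variation, offered as a sketch rather than a drop-in replacement, is to call model.matrix once per factor. The function name Factors.to.full.dummies and the example data frame d2 are hypothetical.

```r
# Hypothetical variant of Factors.to.dummies(): expand each factor on its own
Factors.to.full.dummies <- function(data) {
  Factor.Flag <- vapply(data, is.factor, logical(1))
  blocks <- lapply(names(data)[Factor.Flag], function(nm) {
    # reformulate(nm, intercept = FALSE) builds the one-sided formula ~ nm - 1
    model.matrix(reformulate(nm, intercept = FALSE), data = data)
  })
  # Append all indicator blocks to the original columns
  cbind(data, do.call(cbind, blocks))
}

d2 <- data.frame(a = c(1, 4, 3, 4, 5, 6),
                 grp = c("A", "B", "B", "C", "C", "C"),
                 sex = c("m", "m", "m", "f", "f", "f"),
                 stringsAsFactors = TRUE)
out <- Factors.to.full.dummies(d2)
names(out)
```

Keep in mind that the full set of columns for one factor sums to 1 in every row, so a model that also fits an intercept cannot use all of them at once.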

On Sun, May 16, 2010 at 2:24 PM, Noah Silverman <n...@smartmediacorp.com> wrote:

> I could, but with close to 100 columns, it's messy.
>
>
> On 5/16/10 11:22 AM, Peter Ehlers wrote:
> > On 2010-05-16 11:06, Noah Silverman wrote:
> >> Update,
> >>
> >> I have it working, but now it's producing really ugly labels.  Must be a
> >> small adjustment to the code.  Any ideas??
> >>
> >> ##Create example data.frame
> >> group<- c("A", "B","B","C","C","C")
> >> a<- c(1,4,3,4,5,6)
> >> b<- c(5,4,5,3,4,5)
> >> d<- data.frame(cbind(a,b,group))
> >>
> >> #create new frame with discretized group
> >>> cbind(d[,1:2], model.matrix(~0+d[,3]) )
> >>    a b d[, 3]A d[, 3]B d[, 3]C
> >> 1 1 5       1       0       0
> >> 2 4 4       0       1       0
> >> 3 3 5       0       1       0
> >> 4 4 3       0       0       1
> >> 5 5 4       0       0       1
> >> 6 6 5       0       0       1
> >>
> >>
> >> So, as you can see, it works, but the labels for the groups don't
> >>
> >> I then tried using the column name instead of number and still got ugly
> >> results:
> >>
> >>> cbind(d[,1:2], model.matrix(~0+d[,"group"]) )
> >>    a b d[, "group"]A d[, "group"]B d[, "group"]C
> >> 1 1 5             1             0             0
> >> 2 4 4             0             1             0
> >> 3 3 5             0             1             0
> >> 4 4 3             0             0             1
> >> 5 5 4             0             0             1
> >> 6 6 5             0             0             1
> >>
> >>
> >>
> >> Any ideas?
> >>
> >
> > Can't you just use names(...) <- c() on your final dataframe?
> >
> >  -Peter Ehlers
> >
> >> -N
> >>
> >>
> >>
> >> On 5/15/10 11:02 AM, Noah Silverman wrote:
> >>> Hi,
> >>>
> >>> I'm looking for an easy way to discretize factors in R
> >>>
> >>> I've noticed that the lm function does this automatically with a nice
> >>> result.
> >>>
> >>> If I have
> >>>
> >>> group<- c("A", "B","B","C","C","C")
> >>>
> >>> and run:
> >>>
> >>> lm(result ~ x1 + group)
> >>>
> >>> The lm function has split the group into separate binary variables
> >>> {0,1}
> >>> before performing the regression.  I now have:
> >>> groupA
> >>> groupB
> >>> groupC
> >>>
> >>> Some of the other models that I want to try won't accept factors, so
> >>> they need to be discretized this way.
> >>>
> >>> Is there a command in R for this, or some easy shortcut?  (I tried
> >>> digging into the lm code, but couldn't find where this is being done.)
> >>>
> >>> Thanks!
> >>>
> >>> -N
> >>>
>
> ______________________________________________
> R-help@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide
> http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>
