On Sat, Jul 17, 2010 at 9:03 PM, Peter Dalgaard <pda...@gmail.com> wrote:
> Ista Zahn wrote:
>> Hi,
>> On Fri, Jul 16, 2010 at 5:18 PM, CC <turtysm...@gmail.com> wrote:
>>> I am sure this is a very basic question:
>>>
>>> I have 600,000 categorical variables in a data.frame - each of which is
>>> classified as "0", "1", or "2"
>>>
>>> What I would like to do is collapse "1" and "2" and leave "0" by itself,
>>> such that after re-categorizing "0" = "0"; "1" = "1" and "2" = "1" --- in
>>> the end I only want "0" and "1" as categories for each of the variables.
>>
>> Something like this should work
>>
>> for (i in names(dat)) {
>> dat[, i] <- factor(dat[, i], levels = c("0", "1", "2"), labels =
>> c("0", "1", "1"))
>> }
>
> Unfortunately, it won't:
>
>> d <- 0:2
>> factor(d, levels=c(0,1,1))
> [1] 0 1 <NA>
> Levels: 0 1 1
> Warning message:
> In `levels<-`(`*tmp*`, value = c("0", "1", "1")) :
> duplicated levels will not be allowed in factors anymore
>
I stand corrected. Thank you Peter.

>
> This effect, I have been told, goes way back to design choices in S
> (that you can have repeated level names) plus compatibility ever since.
>
> It would make more sense if it behaved like
>
> d <- factor(d); levels(d) <- c(0,1,1)
>
> and maybe, some time in the future, it will. Meanwhile, the above is the
> workaround.
>
> (BTW, if there are 600000 variables, you probably don't want to iterate
> over their names, more likely "for(i in seq_along(dat))...")
>
> --
> Peter Dalgaard
> Center for Statistics, Copenhagen Business School
> Phone: (+45)38153501
> Email: pd....@cbs.dk Priv: pda...@gmail.com
>
> ______________________________________________
> R-help@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>

--
Ista Zahn
Graduate student
University of Rochester
Department of Clinical and Social Psychology
http://yourpsyche.org
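
For the archive, Peter's workaround applied column by column might look roughly like the sketch below. The data frame `dat` here is a made-up two-column stand-in for the original poster's 600,000 variables, and the column names `v1` and `v2` are assumptions for illustration only. It loops over column positions with seq_along(), as Peter suggests, and relies on the fact that assigning a repeated name through `levels<-` merges the corresponding levels.

```r
# Toy stand-in for the original poster's data (hypothetical values).
dat <- data.frame(
  v1 = c("0", "1", "2", "1"),
  v2 = c("2", "2", "0", "0"),
  stringsAsFactors = TRUE
)

# Iterate over column positions rather than column names.
for (i in seq_along(dat)) {
  # Fix the level set first, so every column has "0", "1", "2" in that order
  # even if some value never occurs in it.
  f <- factor(dat[[i]], levels = c("0", "1", "2"))
  # Assigning a repeated name merges levels: "1" and "2" both become "1".
  levels(f) <- c("0", "1", "1")
  dat[[i]] <- f
}

levels(dat$v1)   # the merged level set for one column
table(dat$v2)    # counts per merged category
```

Setting the full level set before the assignment matters here: without it, a column that happens to contain only "0" and "2" would have two levels, and a three-element replacement would not line up with them.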