On Sun, Dec 12, 2010 at 12:48:30AM +0200, Tal Galili wrote:
> Hello dear R-help mailing list,
>
> My question is *not* about how factors are implemented in R (which is, if I
> understand correctly, that factors keeps numbers and assign levels to them).
> My question *is* about why so many functions that work on factors don't
> treat them as characters by default?

Personally, I try to use factors only when there is a specific reason to do
so, and the character type otherwise. Factors are natural in data used to
build a classification model, for categorical attributes, and for preparing
input to table() and related functions.

> Here are two simple examples:
> Example one turning the characters inside a factor into numeric:
>
> x <- factor(4:6)
> as.numeric(x) # output: 1 2 3
> as.numeric(as.character(x)) # output: 4 5 6 # isn't this what we wanted?

If you are concerned with computing time, then applying as.numeric() only to
the levels is probably better, since the string-to-number conversion then runs
once per level (3 values here) instead of once per element (3000000 values):

  x <- factor(rep(4:6, times=1000000))
  cpu1 <- system.time( out1 <- as.numeric(as.character(x)) )
  cpu2 <- system.time( out2 <- as.numeric(levels(x))[as.integer(x)] )
  rbind(cpu1, cpu2)

       user.self sys.self elapsed user.child sys.child
  cpu1     0.570    0.031   0.601          0         0
  cpu2     0.042    0.027   0.070          0         0

> Is it that implementing a switch of factors to characters as the default in
> some of the basic function will cause old code to break?

I think that this is an important part of the reason.

Petr Savicky.
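
P.S. A minimal sketch of the levels-based conversion wrapped in a function, in case it is useful. The name factor_to_numeric is only an illustration (it is not part of base R), and the sketch assumes that every level can be parsed as a number. It uses an unsorted input so that the difference between the integer codes and the original values is visible.

```r
# Hypothetical helper (not in base R): recover the numeric values stored
# as the labels of a factor. Assumes every level parses as a number.
factor_to_numeric <- function(f) {
  stopifnot(is.factor(f))
  # Convert each distinct level once, then expand via the integer codes.
  as.numeric(levels(f))[as.integer(f)]
}

x <- factor(c(4, 6, 5, 6, 4))

codes  <- as.numeric(x)          # internal integer codes: 1 3 2 3 1
values <- factor_to_numeric(x)   # original values:        4 6 5 6 4

# Both routes should agree with the as.character() version.
same <- identical(values, as.numeric(as.character(x)))
print(codes)
print(values)
print(same)
```

Missing values should pass through unchanged, since indexing the converted levels with an NA code yields NA.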