Hi Heinz

OK, point taken. I must say I do not concatenate factors very often, so this feature does not bother me much.

Best regards
Petr

Heinz Tuechler <tuech...@gmx.at> wrote on 13.12.2010 13:52:17:

> Hello Petr,
>
> don't want to convince you. If you like the following:
>
> > x <- factor(1:4, labels=c("one", "two", "three", "four"))
> > y <- factor(3:5, labels=c("three", "four", "five"))
> > data.frame(character=c(as.character(x), as.character(y)), numeric=c(x, y))
>
>   character numeric
> 1       one       1
> 2       two       2
> 3     three       3
> 4      four       4
> 5     three       1
> 6      four       2
> 7      five       3
>
> For me the behaviour of character vectors is easier to follow and
> less error prone.
>
> > cx <- c("one", "two", "three", "four")
> > cy <- c("three", "four", "five")
> > c(cx, cy)
>
> [1] "one"   "two"   "three" "four"  "three" "four"  "five"
>
> >Anyway it is maybe more about personal habits than about bad factor
> >"features"
>
> I agree with you regarding personal habits. It's not the features of
> factors. For me it's the rather inconsistent use in functions like
> c() or print().
> If you print a factor, you see its levels, but if you combine it
> using c(), you combine the famous implementation-specific underlying
> integer vector.
>
> best regards,
>
> Heinz
>
> At 13.12.2010 08:50 +0100, Petr PIKAL wrote:
> >Hi
> >
> >r-help-boun...@r-project.org wrote on 12.12.2010 21:00:37:
> >
> > > At 12.12.2010 00:48 +0200, Tal Galili wrote:
> > > >Hello dear R-help mailing list,
> > > >
> > > >My question is *not* about how factors are implemented in R (which is, if I
> > > >understand correctly, that factors keep numbers and assign levels to them).
> > > >My question *is* about why so many functions that work on factors don't
> > > >treat them as characters by default?
> > > >
> > > >Here are two simple examples:
> > > >Example one, turning the characters inside a factor into numeric:
> > > >
> > > >x <- factor(4:6)
> > > >as.numeric(x) # output: 1 2 3
> > > >as.numeric(as.character(x)) # output: 4 5 6 # isn't this what we wanted?
> > > >
> > > >
> > > >Example two, using strsplit on a factor:
> > > >
> > > >x <- factor(paste(letters[4:6], 4:6, sep="A"))
> > > >strsplit(x, "A") # will result in an error: Error in strsplit(x, "A") : non-character argument
> > > >strsplit(as.character(x), "A") # will work and split
> > > >
> > > >
> > > >So what is the reason this is the case?
> > > >Is it that implementing a switch of factors to characters as the default in
> > > >some of the basic functions will cause old code to break?
> > > >Is it a better design in some other way?
> > > >
> > > >I am curious to know the reason for this.
> > >
> > > In my view the answer can be found implicitly in the language definition.
> > >
> > > "Factors are currently implemented using an integer array to specify
> > > the actual levels and a second array of names that are mapped to the
> > > integers. Rather unfortunately users often make use of the
> > > implementation in order to make some calculations easier."
> > >
> > > It is the "unfortunate" use of factors that seems generally accepted,
> > > even if the language definition continues:
> > >
> > > "This, however, is an implementation issue and is not guaranteed to
> > > hold in all implementations of R."
> > >
> > > Personally, like some others, I avoid factors, except in cases where
> > > they represent a statistical concept.
> >
> >On the contrary, I find factors quite useful.
> >Consider the possibility of changing its levels:
> >
> > > set.seed(111)
> > > x <- factor(sample(1:4, 20, replace=TRUE), labels=c("one", "two", "three", "four"))
> > > x
> > [1] three three two   three two   two   one   three two   one   three three
> >[13] one   one   one   two   one   four  two   three
> >Levels: one two three four
> > > levels(x)[3:4] <- "more"
> > > x
> > [1] more more two  more two  two  one  more two  one  more more one  one  one
> >[16] two  one  more two  more
> >Levels: one two more
> >
> >I believe that if x is a character vector this can also be done, but the
> >factor way seems more convenient to me. I also quite often distinguish
> >points in plots with pch=as.numeric(some.factor).
> >
> >Anyway it is maybe more about personal habits than about bad factor
> >"features"
> >
> >Regards
> >Petr
> >
> > >
> > > Certainly I would agree with you that, if only reading the "R
> > > Language Definition" and not the documentation of the function
> > > factor, one would rather expect functions like as.numeric or strsplit
> > > to operate on the levels of a factor and not on the underlying,
> > > implementation-specific integer array.
> > >
> > > Heinz
> > >
> > > >Thank you for your reading,
> > > >Tal
> > > >
> > > >----------------Contact Details:-------------------------------------------------------
> > > >Contact me: tal.gal...@gmail.com | 972-52-7275845
> > > >Read me: www.talgalili.com (Hebrew) | www.biostatistics.co.il (Hebrew) |
> > > >www.r-statistics.com (English)
> > > >----------------------------------------------------------------------------------------------
> > > >
> > > >        [[alternative HTML version deleted]]
> > > >
> > > >______________________________________________
> > > >R-help@r-project.org mailing list
> > > >https://stat.ethz.ch/mailman/listinfo/r-help
> > > >PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> > > >and provide commented, minimal, self-contained, reproducible code.
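
A note on the examples in this thread: the conversions discussed above can be collected into one small sketch. The variable names (codes, values, values2, a, b, combined, s, parts) are made up for illustration and do not come from any of the posters. As far as I know, since R 4.1.0 c() applied to factors returns a factor with the union of the levels, so the integer column in Heinz's data.frame output reflects the behaviour at the time of the thread. The sketch converts explicitly, which should give the same result in either case.

```r
# Recover numbers that were stored as factor labels.
x <- factor(4:6)
codes   <- as.numeric(x)                # internal integer codes: 1 2 3
values  <- as.numeric(as.character(x))  # labels converted back to numbers: 4 5 6
values2 <- as.numeric(levels(x))[x]     # same result, converting only the levels

# Combine two factors by label instead of by integer code.
a <- factor(1:4, labels = c("one", "two", "three", "four"))
b <- factor(3:5, labels = c("three", "four", "five"))
combined <- factor(c(as.character(a), as.character(b)),
                   levels = union(levels(a), levels(b)))

# strsplit() requires a character vector, so convert the factor first.
s <- factor(paste(letters[4:6], 4:6, sep = "A"))
parts <- strsplit(as.character(s), "A")
```

Converting through as.character() keeps the code independent of how c() treats factors in a given R version, at the cost of being a little more verbose.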