[EMAIL PROTECTED] wrote:
>> I used "which" to obtain a subset of values from my data.frame.
>> However, I find that there is a "trace" of the values I have removed.
>> Any suggestions would be greatly appreciated.
>>
>> Below is my data:
>>
>> d <- data.frame( val = 1:10,
>>                  group = sample(LETTERS[1:5], 10, repl=TRUE) )
>>
>> > d
>>    val group
>> 1    1     B
>> 2    2     E
>> 3    3     B
>> 4    4     C
>> 5    5     A
>> 6    6     B
>> 7    7     A
>> 8    8     E
>> 9    9     E
>> 10  10     A
>>
>> ## selecting everything that is not group "A"
>> d<-d[which(d$group !="A"),]
>>
>> > d
>>   val group
>> 1   1     B
>> 2   2     E
>> 3   3     B
>> 4   4     C
>> 6   6     B
>> 8   8     E
>> 9   9     E
>>
>> > levels(d$group)
>> [1] "A" "B" "C" "E"
>>
>
> The (imho) unintuitive behaviour is to do with the subsetting function
> [.factor, not which. There are a couple of workarounds:

In that case, your intuition needs readjustment....
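The retained level is what keeps a zero count visible after subsetting. A minimal sketch, rebuilding the example data frame from the printout above with fixed values instead of `sample()` (an assumption made only for reproducibility), and calling `factor()` explicitly because since R 4.0.0 `data.frame()` no longer converts character columns to factors by default:

```r
# Example data copied from the printout above (fixed values instead of
# sample(), so the result is reproducible).
d <- data.frame(
  val   = 1:10,
  group = factor(c("B", "E", "B", "C", "A", "B", "A", "E", "E", "A"))
)

# Drop the rows belonging to group "A", as in the original question.
sub <- d[which(d$group != "A"), ]

levels(sub$group)  # "A" "B" "C" "E": level "A" is kept
table(sub$group)   # A = 0, B = 3, C = 1, E = 3: the empty group is still counted
```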
There are other systems which (de facto) drop unused levels by default,
and it is a real pain to work around, especially for subgroup analyses.
E.g., there is no way to get PROC FREQ in SAS to include a count of zero,
and barplots of ratings from 0 to 10 lose columns "randomly" in SPSS
(this _can_ be worked around, though). Anyway, it is illogical: there's
no reason that a tabulation of gender distribution for (say) tenured CS
professors should suddenly pretend that the female gender does not exist!

> 1. Call factor to recreate the levels, and get rid of "A"
>    factor(d$group)
>
> 2. Redefine [.factor; see dropUnusedLevels in the Hmisc package.
>
> Regards,
> Richie.
>
> Mathematical Sciences Unit
> HSL

______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
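For the cases where dropping the unused level really is wanted, the workarounds can be sketched on the same reconstructed data (fixed values rather than `sample()`, an assumption for reproducibility). The first follows Richie's suggestion of calling `factor()` again. Instead of redefining `[.factor`, the second uses the documented `drop = TRUE` argument of `[.factor`, and the third uses `droplevels()`, which was added in R 2.12.0 and so postdates this thread:

```r
d <- data.frame(
  val   = 1:10,
  group = factor(c("B", "E", "B", "C", "A", "B", "A", "E", "E", "A"))
)
sub <- d[which(d$group != "A"), ]   # levels(sub$group) still includes "A"

# Workaround 1 (from the thread): call factor() again, which rebuilds the
# levels from the values actually present.
sub1 <- sub
sub1$group <- factor(sub1$group)
levels(sub1$group)  # "B" "C" "E"

# Alternative: the drop argument of [.factor removes unused levels when
# subsetting the factor itself.
g <- d$group[d$group != "A", drop = TRUE]
levels(g)           # "B" "C" "E"

# Alternative (R >= 2.12.0): droplevels() handles every factor column of a
# data frame at once.
sub2 <- droplevels(sub)
levels(sub2$group)  # "B" "C" "E"
```

All three return the same values as the unmodified subset; only the level set differs, so a subsequent `table()` no longer reports a zero for "A".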