That's easy! First

> str(test3)
 Factor w/ 2 levels "WITHOUT Contact",..: 2 2 2 2 1 1 1 1 1 1

tells you that the internal values are 1 and 2, and the labels are
"WITHOUT Contact" and "WITH Contact". If you read the help page for
factor() you'll see this:

levels: an optional vector of the values (as character strings) that
        ‘x’ might have taken. The default is the unique set of values
        taken by ‘as.character(x)’, sorted into increasing order _of
        ‘x’_. Note that this set can be specified as smaller than
        ‘sort(unique(x))’.

labels: _either_ an optional character vector of (unique) labels for
        the levels (in the same order as ‘levels’ after removing those
        in ‘exclude’), _or_ a character string of length 1.

So, when you create test3 you say that test1 can take the values 0 and
1, and that these should be labelled "WITHOUT Contact" and "WITH
Contact". So R internally codes 0 as 1 and 1 as 2 (internally R codes
factors as integers, which can be both useful and dangerous), and then
gives them the labels "WITHOUT Contact" and "WITH Contact". It no
longer cares that they were 0 and 1, because you've told it to change
the labels: test3 == 0 compares the labels with "0", which matches
nothing, so filter() returns no rows.

If you want to filter by the original values, then don't change the
labels (or at least not until after you've filtered by the original
values), or convert the filter to the new labels, e.g.
test3 == "WITHOUT Contact".

You're asking for a data structure with two sets of labels, which
sounds odd in general.

Bob

On 9 May 2017 at 12:12, <g.maub...@weinwolf.de> wrote:
> Hi All,
>
> I am using factors in a study for the social sciences.
>
> I discovered the following:
>
> -- cut --
>
> library(dplyr)
>
> test1 <- c(rep(1, 4), rep(0, 6))
> d_test1 <- data.frame(test1)
>
> test2 <- factor(test1)
> d_test2 <- data.frame(test2)
>
> test3 <- factor(test1,
>                 levels = c(0, 1),
>                 labels = c("WITHOUT Contact", "WITH Contact"))
> d_test3 <- data.frame(test3)
>
> d_test1 %>% filter(test1 == 0) # works OK
> d_test2 %>% filter(test2 == 0) # works OK
> d_test3 %>% filter(test3 == 0) # does not work, why?
>
> myf <- function(ds) {
>   print(levels(ds$test3))
>   print(labels(ds$test3))
>   print(as.numeric(ds$test3))
>   print(as.character(ds$test3))
> }
>
> # This shows that it is not possible to access the original
> # values which were the basis to build the factor:
> myf(d_test3)
>
> -- cut --
>
> Why is it not possible to use a factor with labels for filtering with the
> original values?
> Is there a data structure that works like a factor but also gives access
> to the original values?
>
> Kind regards
>
> Georg
>
> ______________________________________________
> R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.

--
Bob O'Hara
NOTE NEW ADDRESS!!!
Institutt for matematiske fag
NTNU
7491 Trondheim
Norway

Mobile: +49 1515 888 5440
Journal of Negative Results - EEB: www.jnr-eeb.org
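A minimal sketch of the two fixes suggested in the reply (rewrite the filter in terms of the new labels, or filter on the original values before relabelling), using the test1 vector from the question. It uses base R subset() so it runs without dplyr; the same conditions should work unchanged inside dplyr::filter(). The variable names none, by_label and by_value are illustrative only.

```r
test1 <- c(rep(1, 4), rep(0, 6))
test3 <- factor(test1,
                levels = c(0, 1),
                labels = c("WITHOUT Contact", "WITH Contact"))

# Internal integer codes follow the order of `levels`: 0 -> 1, 1 -> 2
as.integer(test3)   # 2 2 2 2 1 1 1 1 1 1
levels(test3)       # "WITHOUT Contact" "WITH Contact"

d_test3 <- data.frame(test3)

# Comparing with 0 compares the labels with "0", so nothing matches
none <- subset(d_test3, test3 == 0)

# Fix 1: write the condition in terms of the new labels
by_label <- subset(d_test3, test3 == "WITHOUT Contact")

# Fix 2: filter on the original values first, relabel afterwards
d_test1 <- data.frame(test1)
by_value <- subset(d_test1, test1 == 0)
by_value$test3 <- factor(by_value$test1,
                         levels = c(0, 1),
                         labels = c("WITHOUT Contact", "WITH Contact"))

c(nrow(none), nrow(by_label), nrow(by_value))  # 0 6 6
```

Passing levels = c(0, 1) again in the relabelling step keeps both levels even though only 0 survives the filter, so the result can still be combined with unfiltered data.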
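For the second question in the thread (keeping access to the original values), one possible approach, sketched here under the assumption that you keep the original codes yourself, is either to store the numeric column next to the labelled factor, or to map the factor back through its integer codes. The names codes, labs, d, no_contact and recovered are hypothetical; R does not store the pre-label values inside the factor.

```r
# Hypothetical lookup vectors, kept alongside the data by the user
codes <- c(0, 1)
labs  <- c("WITHOUT Contact", "WITH Contact")

test1 <- c(rep(1, 4), rep(0, 6))
d <- data.frame(test1 = test1,
                test3 = factor(test1, levels = codes, labels = labs))

# Option A: keep the original numeric column and filter on it
no_contact <- subset(d, test1 == 0)

# Option B: map the factor back through its integer codes.
# as.integer() gives the position in `levels`, which was set from `codes`,
# so indexing `codes` by that position recovers the original value.
recovered <- codes[as.integer(d$test3)]

identical(recovered, test1)  # TRUE
```

Option B only works as long as the codes vector is kept in the same order as the levels argument used to build the factor.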