Dear R-List,

My question concerns missing values. Specifically, is it possible to use different "types" of missingness in a dataset rather than a one-size-fits-all NA? For example, data may be missing because of an outright refusal by a respondent to answer a question, or because she didn't know the answer, or because the item simply did not apply. In later analysis it is sometimes useful to be able to distinguish between these cases, but nonetheless have them all treated as missing when using, say, lm().

In Stata this is possible by using different missing value indicators. The standard one is a period '.', whereas '.a', '.b', etc. are treated as missing too, but can all be distinguished from one another (they are even ordinal, such that . < .a < .b).

To give a simplistic example in R, let
> dat <- data.frame(
+   hours = c(36, 40, 40, 0, 37.5, 0, 36, 20, 40),
+   wage  = c(15.5, 7.5, 8, -1, 17.5, -1, -2, 13, -2))
> dat
  hours wage
1  36.0 15.5
2  40.0  7.5
3  40.0  8.0
4   0.0 -1.0
5  37.5 17.5
6   0.0 -1.0
7  36.0 -2.0
8  20.0 13.0
9  40.0 -2.0

where for wages -1 indicates "didn't work" and -2 indicates "refused to respond". How could I replace the negative values for wages with missingness indicators so that I can use the data frame in, for instance, lm(), but later operate only on those observations that "refused to respond"?

Of course I can always work around this somehow, especially in this easy example, but as data frames get larger and cases more complex, the workarounds seem more and more klutzy to me. So, if there is an easy way to do this that I have overlooked, I would be grateful for any advice or references.

Best,
Christian

--
Christian Raschke
Department of Economics and ISDS Research Lab (HSRG)
Louisiana State University
Patrick Taylor Hall, Rm 2128
Baton Rouge, LA 70803
cras...@lsu.edu
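
For reference, the kind of workaround alluded to above might look roughly like the following. This is only a minimal sketch, not a built-in R facility for typed missing values: it assumes the reason for missingness is kept in a separate factor column (the name wage_missing is hypothetical) before the coded values are overwritten with plain NA.

```r
# Example data from the post: -1 = "didn't work", -2 = "refused to respond"
dat <- data.frame(
  hours = c(36, 40, 40, 0, 37.5, 0, 36, 20, 40),
  wage  = c(15.5, 7.5, 8, -1, 17.5, -1, -2, 13, -2)
)

# Hypothetical helper column: record why each wage is missing
# before the numeric codes are overwritten (NA = wage was observed)
dat$wage_missing <- factor(
  ifelse(dat$wage == -1, "didn't work",
         ifelse(dat$wage == -2, "refused", NA)),
  levels = c("didn't work", "refused")
)

# Replace the coded values with a plain NA
dat$wage[dat$wage < 0] <- NA

# With the default na.action (na.omit), lm() drops rows where wage is NA
fit <- lm(wage ~ hours, data = dat)

# Later: operate only on the respondents who refused to answer
refused <- dat[which(dat$wage_missing == "refused"), ]
```

The cost of this approach is one extra column per variable with coded missingness, which is exactly the bookkeeping that grows unwieldy on larger data frames.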