On 18-Jul-10 05:47:03, Suresh Singh wrote:
> I have a data file in which one of the columns is country code and NA
> is the code for Namibia.
> When I read the data file using read.csv, NA for Namibia is being
> treated as null or "NA"
>
> How can I prevent this from happening?
>
> I tried the following but it didn't work
> input <- read.csv("padded.csv",header = TRUE,as.is = c("code2"))
>
> thanks,
> Suresh

I suppose this was bound to happen, and in my view it represents
a bit of a mess! With a test file temp.csv:

  Code,Country
  DE,Germany
  IT,Italy
  NA,Namibia
  FR,France

  X <- read.csv("temp.csv")
  X
  #   Code Country
  # 1   DE Germany
  # 2   IT   Italy
  # 3 <NA> Namibia
  # 4   FR  France

  which(is.na(X))
  # [1] 3

exactly as Suresh describes. It does not help to surround the NA
in temp.csv with quotes:

  Code,Country
  DE,Germany
  IT,Italy
  "NA",Namibia
  FR,France

leads to exactly the same result. And I have tried every variation
I can think of with "as.is" and "colClasses", still with exactly the
same result!

Conclusion: If an entry in a data file is intended to become the
character value "NA", there seems to be no way of reading it in
directly. This should not be so: it should be preventable!

As a cure, assuming that no other value in the Country Code is
actually missing (and so should be <NA>), then (with Suresh's
naming) I would suggest, subsequent to reading in the file,
something like the following. The complication is that the variable
code2 is now a factor, and you cannot simply assign a character
value "NA" to its <NA> value -- you will get an error message.
Hence:

  ix <- which(is.na(input$code2))   # positions read in as <NA>
  Y <- as.character(input$code2)    # factor -> character
  Y[ix] <- "NA"                     # restore the literal code "NA"
  input$code2 <- factor(Y)          # back to a factor

The corresponding code for my test example is:

  ix <- which(is.na(X$Code))
  Y <- as.character(X$Code)
  Y[ix] <- "NA"
  X$Code <- factor(Y)
  X
  #   Code Country
  # 1   DE Germany
  # 2   IT   Italy
  # 3   NA Namibia
  # 4   FR  France

  which(is.na(X))
  # integer(0)

So that works. There ought to be an option in read.csv() and
friends which suppresses the conversion of a string "NA" found
in input into an <NA> value. Maybe there is -- but, if so, it
is not visible in the documentation!

Ted.

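Note for readers with the same problem: read.csv() passes extra arguments on to read.table(), whose na.strings argument (default "NA") lists the strings to be treated as missing. The following is a minimal sketch, not part of the original message. It assumes a test file like temp.csv above, written here to a temporary path (csv_path is a made-up name), and it assumes that genuinely missing codes are left as empty fields.

```r
# Recreate the test file from the message at a temporary location
# (csv_path is a hypothetical name used only for this sketch).
csv_path <- tempfile(fileext = ".csv")
writeLines(c("Code,Country",
             "DE,Germany",
             "IT,Italy",
             "NA,Namibia",
             "FR,France"),
           csv_path)

# na.strings = "" means only an empty field counts as missing,
# so the literal string "NA" is kept as ordinary data.
# stringsAsFactors = FALSE keeps Code as a character column.
X <- read.csv(csv_path, na.strings = "", stringsAsFactors = FALSE)

X$Code
# [1] "DE" "IT" "NA" "FR"

which(is.na(X))
# integer(0)
```

The setting applies to every column, so a numeric column that uses the literal NA for missing values would then be read as character. This approach therefore suits files in which missing entries are left blank.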
--------------------------------------------------------------------
E-Mail: (Ted Harding) <ted.hard...@manchester.ac.uk>
Fax-to-email: +44 (0)870 094 0861
Date: 18-Jul-10                                       Time: 09:25:05
------------------------------ XFMail ------------------------------