Dear Mr. Grothendieck : thank you so much for your attention. You are the real expert here. the following is a mock text file: a b c aa 23 aaa 34 aaaa 77
note that both b and c column contain missing value (blank) I save it under my C drive and use both read.table and sqldf to import it to R and then use identical() function to compare the result. The following is the result: > setwd("c:/") > library(sqldf) > test <- file("test.txt") > testx <- sqldf("select * from test", + dbname = tempfile(), file.format = list(header = T, sep="\t", row.names = F)) > testy<- read.table("test.txt", header = T, sep="\t") > identical(testx, testy) [1] FALSE > testx a b c 1 aa 23.0 2 aaa 34.6 0.0 3 aaaa 77.8 > testy a b c 1 aa NA 23.0 2 aaa 34.6 NA 3 aaaa NA 77.8 > class(testx$b) [1] "factor" > class(testy$b) [1] "numeric" > read.table seems to get it right while sqldf treats b as factor (if I add method="raw", b become character). what is more troubling is that column C has number 0 at the second row while in the original file it is missing. In my real world situation with a much larger text file, the problem is that many cells are empty when they all actually have values in the original text file. I would greatly appreciate your help if you can shed some light on this. thanks -- View this message in context: http://r.789695.n4.nabble.com/a-question-on-sqldf-s-handling-of-missing-value-and-factor-tp3331007p3331662.html Sent from the R help mailing list archive at Nabble.com. ______________________________________________ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.