Hi,

I have a generic function to read ASCII data files. It is essentially a
wrapper around the read.table function. My function is used in a wide variety
of situations and has no a priori knowledge of the data file it is asked to
read. Nothing is known about file size, variable types, variable names, or
data table dimensions.

One argument of my function is na.strings, which is passed down to read.table.
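
To make the setup concrete, here is a simplified sketch of what the wrapper
does (the function and argument names below are illustrative, not the exact
ones in my code):

read_ascii <- function(file, na.strings = "NA", ...) {
    ## thin wrapper: everything relevant is passed straight to read.table
    read.table(file, header = TRUE, sep = "",
               na.strings = na.strings,
               stringsAsFactors = FALSE, ...)
}

## The problematic call was essentially:
## dat <- read_ascii("data.txt", na.strings = c("-99", "."))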

Recently, a user tried to read a data file of ~80 MB (~93,000 rows by ~160
columns) using na.strings = c('-99', '.'), with the intention of interpreting
the strings '.' and '-99' as the internal missing value NA. Dots were
converted to NA appropriately. However, not all -99 values in the data were
interpreted as NA: in some variables -99 was converted to NA, while in others
it was read as a number. More surprisingly, when the data file was cut into
smaller chunks (i.e., by dropping either rows or columns) and saved as
multiple files, the same function call applied to the new files converted the
-99 values to NA correctly.

In all cases, the data frames produced by read.table contained the expected 
number of records.
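
In case it helps, this is roughly the kind of check I used to see which
columns still contained -99 after the file was read (the file name below is a
placeholder):

dat <- read.table("data.txt", header = TRUE,
                  na.strings = c("-99", "."), stringsAsFactors = FALSE)

## TRUE for columns where -99 survived, either as a number or as a string
still_minus99 <- vapply(dat, function(x) any(x %in% c(-99, "-99")), logical(1))
names(dat)[still_minus99]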

At face value, it appears that file size affects how the na.strings argument
operates, but I am wondering if there is something else at play here.

Unfortunately, I cannot share the data file for confidentiality reasons, but
I was wondering if you could suggest some checks I could perform to get to
the bottom of this issue.

Thank you in advance for your help, and apologies for the lack of a
reproducible example.

