The data file is a CSV file, and some text variables contain spaces. Regarding "Check for extraneous spaces": are there specific locations where extra spaces would be more critical than others?
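As a rough illustration of why the location of a space can matter, here is a minimal sketch on made-up data (the toy CSV and its column names are assumptions, not taken from the real file). read.table compares each whole field with the na.strings entries, so a space directly before or after -99 within a field can prevent the match unless strip.white = TRUE is set.

```r
# Toy CSV (made up for illustration): the -99 sentinel appears bare,
# with a trailing space, and with a leading space.
csv_text <- "id,b,c\n0,0,0\n1,-99,-99\n2,-99 ,-99\n3, -99, -99\n"

# Default strip.white = FALSE: padded fields keep their spaces, so they are
# not an exact match for the na.strings entry "-99".
raw <- read.table(text = csv_text, header = TRUE, sep = ",",
                  na.strings = c("-99", "."))

# strip.white = TRUE trims each field before it is compared with na.strings.
trimmed <- read.table(text = csv_text, header = TRUE, sep = ",",
                      na.strings = c("-99", "."), strip.white = TRUE)

# Compare how many values per column ended up as NA in each case.
print(colSums(is.na(raw)))
print(colSums(is.na(trimmed)))
```

If the padded fields are the cause, the second call should report more NAs than the first in the affected columns.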
________________________________
From: Jeff Newmiller <jdnew...@dcn.davis.ca.us>
Sent: Thursday, November 14, 2019 10:52
To: Sebastien Bihorel <sebastien.biho...@cognigencorp.com>; Sebastien Bihorel via R-help <r-help@r-project.org>; r-help@r-project.org <r-help@r-project.org>
Subject: Re: [R] Can file size affect how na.strings operates in a read.table call?

Check for extraneous spaces. You may need more variations of the na.strings.

On November 14, 2019 7:40:42 AM PST, Sebastien Bihorel via R-help <r-help@r-project.org> wrote:
>Hi,
>
>I have this generic function to read ASCII data files. It is
>essentially a wrapper around the read.table function. My function is
>used in a large variety of situations and has no a priori knowledge
>about the data file it is asked to read. Nothing is known about file
>size, variable types, variable names, or data table dimensions.
>
>One argument of my function is na.strings which is passed down to
>read.table.
>
>Recently, a user tried to read a data file of ~80 MB (~93000 rows by
>~160 columns) using na.strings = c('-99', '.') with the intention of
>interpreting '.' and '-99' strings as the internal missing data NA.
>Dots were converted to NA appropriately. However, not all -99 values
>in the data were interpreted as NA. In some variables, -99 was
>converted to NA, while in others -99 was read as a number. More
>surprisingly, when the data file was cut into smaller chunks (i.e., by
>dropping either rows or columns) saved in multiple files, the function
>calls applied to the new data files resulted in the correct conversion
>of the -99 values into NAs.
>
>In all cases, the data frames produced by read.table contained the
>expected number of records.
>
>While, on face value, it appears that file size affects how the
>na.strings argument operates, I am wondering if there is something
>else at play here.
>
>Unfortunately, I cannot share the data file for confidentiality reasons
>but was wondering if you could suggest some checks I could perform to
>get to the bottom of this issue.
>
>Thank you in advance for your help and sorry for the lack of a
>reproducible example.
>
>
>______________________________________________
>R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
>https://stat.ethz.ch/mailman/listinfo/r-help
>PLEASE do read the posting guide
>http://www.R-project.org/posting-guide.html
>and provide commented, minimal, self-contained, reproducible code.

--
Sent from my phone. Please excuse my brevity.
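One check that fits the request for diagnostics in the quoted message, sketched under assumptions: read every column as character, then count, per column, the fields that equal a sentinel only after surrounding white space is trimmed. The helper name count_padded_sentinels and the toy data are hypothetical; only read.table, trimws and vapply are base R.

```r
# Hypothetical helper (the name is illustrative, not from the original post).
# All arguments in ... are passed on to read.table unchanged.
count_padded_sentinels <- function(..., sentinels = c("-99", ".")) {
  # Read every column as character so no field is converted or trimmed.
  chars <- read.table(..., colClasses = "character", strip.white = FALSE)
  vapply(chars, function(x) {
    # Fields that match a sentinel after trimming but not as stored.
    sum(trimws(x) %in% sentinels & !(x %in% sentinels))
  }, integer(1))
}

# Usage on made-up data: columns b and c contain space-padded -99 fields.
csv_text <- "id,b,c\n0,0,0\n1,-99,-99\n2,-99 ,-99\n3, -99, -99\n"
padded <- count_padded_sentinels(text = csv_text, header = TRUE, sep = ",")
print(padded)
```

Columns with a nonzero count are the ones where a padded -99 would not match na.strings exactly, which would be one candidate explanation for -99 surviving as a number in only some variables.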