I have a humongous CSV file containing census data, far too big to read into RAM. I have been trying to extract individual columns from this file using the colbycol package. This works for certain subsets of the columns, but not for others. I have not yet been able to pin down which columns cause the problem, as there are 731 columns and running colbycol on the file on my old, slow machine takes about 6 hours.
However, my suspicion is that there are some funky characters, either control characters or characters with some non-standard encoding, somewhere in this 14 GB file. Moreover, I am concerned that these characters may cause me trouble down the road even if I use a different approach to getting columns out of the file.

Is there an R utility that will search through my file, without trying to read it all into memory at one time, and find non-standard characters or misplaced (non-end-of-line) control characters? Or some R code to the same end? Even if the real problem ultimately proves to be different, it would be helpful to eliminate this possibility, and it is also something I would routinely run on files from external sources if I had it. I am working in a Windows XP environment, in case that makes a difference.

Any help anyone could offer would be greatly appreciated.

Sincerely,
andrewH
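P.S. In case it helps to make the question concrete, here is the sort of thing I have been picturing: read the file a chunk of lines at a time and flag any line containing a byte outside of tab plus printable ASCII. This is only an untested sketch, and the function name, the chunk size, the example file name, and my choice of "acceptable" characters are all just placeholders. I do not know whether this is the right way to go about it.

## Untested sketch: scan a big file in chunks and report lines that
## contain anything other than tab or printable ASCII (0x20-0x7E).
## The "acceptable" character set is only my guess at what should
## count as "standard" for this file.
findOddChars <- function(path, chunkLines = 100000) {
  con <- file(path, open = "r")
  on.exit(close(con))
  bad <- data.frame(line = integer(0), text = character(0),
                    stringsAsFactors = FALSE)
  offset <- 0L
  repeat {
    chunk <- readLines(con, n = chunkLines, warn = FALSE)
    if (length(chunk) == 0L) break
    ## useBytes = TRUE so we match raw bytes and sidestep encoding issues
    hits <- grep("[^\t\x20-\x7e]", chunk, useBytes = TRUE)
    if (length(hits) > 0L) {
      bad <- rbind(bad, data.frame(line = offset + hits,
                                   text = chunk[hits],
                                   stringsAsFactors = FALSE))
    }
    offset <- offset + length(chunk)
  }
  bad
}

## e.g. (file name is made up): odd <- findOddChars("mybigfile.csv"); head(odd)

Is that roughly the right idea, or is there a better or faster way (or an existing package) to do this?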