Re: [R] Reading large, non-tabular files

2011-09-14 Thread jim holtman
Here is how long it might take in R. I created a 558MB file, read it in, found the lines that had '76095' in them, and then wrote those out:
> system.time(x <- readLines('tempyy'))  # read in the 558MB file
   user  system elapsed
  65.91    0.82   67.40
> object.size(x)
63348864 bytes

Re: [R] Reading large, non-tabular files

2011-09-14 Thread Rainer Schuermann
That looks like a perfect job for (g)awk, which is included in every Linux distribution and is also available for Windows. It can be called with something like system( "awk -f script.awk inputfile.txt" ) and does its job silently and very fast. 650MB should not be an issue. I'm not proficient in awk but wou
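A minimal sketch of the awk idea from R, not the poster's actual script. It assumes a Unix-like shell with awk on the PATH. The sample file contents are hypothetical, and the pattern '76095' is borrowed from another reply in this thread. An inline awk pattern stands in for the script.awk file mentioned above, and system(..., intern = TRUE) is used so that only the matching lines come back to R as a character vector.

```r
# Hypothetical sample file, created only so the sketch runs on its own.
infile <- tempfile(fileext = ".txt")
writeLines(c("id 76095 keep", "id 11111 drop", "another 76095 line"), infile)

# awk prints every line matching the regular expression /76095/.
# shQuote() protects the pattern and the file path from the shell.
cmd <- paste("awk", shQuote("/76095/"), shQuote(infile))

# intern = TRUE captures awk's output as a character vector, one element per line.
selected <- system(cmd, intern = TRUE)
print(selected)
```

With this layout the full file never has to be held in R's memory; only the lines awk lets through are read.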

Re: [R] Reading large, non-tabular files

2011-09-14 Thread David Winsemius
On Sep 14, 2011, at 7:08 AM, Stefan McKinnon Høj-Edwards wrote: Dear R-help, I have a very large ascii data file, of which I only want to read in selected lines (e.g. one fourth of the lines); determining which lines depends on the line's content. So far, I have found two approaches for do

Re: [R] Reading large, non-tabular files

2011-09-14 Thread Gabor Grothendieck
2011/9/14 Stefan McKinnon Høj-Edwards:
> Dear R-help,
>
> I have a very large ascii data file, of which I only want to read in selected
> lines (e.g. one fourth of the lines); determining which lines depends on the
> line's content. So far, I have found two approaches for doing this in R; 1)
> Re

Re: [R] Reading large, non-tabular files

2011-09-14 Thread jim holtman
What is overkill about reading in a 650MB text file if you have the space? You are going to have to process it one way or another. I would use 'readLines' to read it in, then 'grepl' to determine which lines I want to keep, delete the rest, and then write the new file out. At this point
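The readLines / grepl / write-out sequence described above can be sketched as follows. This is an illustration under stated assumptions, not the poster's code: the file names, the sample lines, and the fixed pattern '76095' are hypothetical stand-ins, and the tiny sample file takes the place of the real 650MB input.

```r
# Hypothetical input and output files, created only for this sketch.
infile  <- tempfile(fileext = ".txt")
outfile <- tempfile(fileext = ".txt")
writeLines(c("id 76095 keep", "id 11111 drop", "another 76095 line"), infile)

x    <- readLines(infile)                 # read the whole file into memory
keep <- grepl("76095", x, fixed = TRUE)   # TRUE where the line contains the literal string
writeLines(x[keep], outfile)              # write only the matching lines to the new file

print(readLines(outfile))
```

Using fixed = TRUE treats the pattern as a literal string rather than a regular expression, which is usually enough for a simple substring filter.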

[R] Reading large, non-tabular files

2011-09-14 Thread Stefan McKinnon Høj-Edwards
Dear R-help, I have a very large ascii data file, of which I only want to read in selected lines (e.g. one fourth of the lines); determining which lines depends on the line's content. So far, I have found two approaches for doing this in R; 1) Read the file line by line using a repeat-loop and sa
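One possible shape for the repeat-loop approach is sketched below. It reads from an open connection in chunks rather than one line at a time, which is an assumption on my part and not necessarily what the poster did. The sample file, the chunk size, and the 'KEEP' marker used to select lines are all hypothetical.

```r
# Hypothetical sample file: every fourth line carries the marker "KEEP".
infile <- tempfile(fileext = ".txt")
n <- 1:10
writeLines(sprintf("line %d %s", n, ifelse(n %% 4 == 0, "KEEP", "skip")), infile)

chunk_size <- 3                    # deliberately tiny here; thousands of lines is more realistic
con <- file(infile, open = "r")    # an open connection remembers its position between reads
selected <- character(0)

repeat {
  chunk <- readLines(con, n = chunk_size)
  if (length(chunk) == 0) break    # end of file reached
  # keep only the lines whose content matches the selection rule
  selected <- c(selected, chunk[grepl("KEEP", chunk, fixed = TRUE)])
}
close(con)

print(selected)
```

Only one chunk plus the selected lines are in memory at any time, at the cost of more calls to readLines than a single whole-file read.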