Here is how long it might take to do in R. I created a file of 558MB,
read it in, found the lines that had '76095' in them, and then
wrote those out:
> system.time(x <- readLines('tempyy')) # read in the 558MB file
   user  system elapsed
  65.91    0.82   67.40
> object.size(x)
63348864 bytes
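The grep and write steps mentioned in the text are not shown above; a minimal
sketch of the rest of that session might look like the following (the output
file name is a placeholder):

keep <- grepl("76095", x, fixed = TRUE)   # flag the lines containing '76095'
writeLines(x[keep], "tempyy_selected")    # write only the matching lines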
That looks like a perfect job for (g)awk, which is included in every Linux
distribution and is also available for Windows.
It can be called with something like
system( "awk -f script.awk inputfile.txt" )
and does its job silently and very fast. 650MB should not be an issue. I'm not
proficient in awk, but ...
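For illustration only, the awk script for the filtering task discussed in this
thread could be a single pattern rule; the file names are placeholders, and the
redirection to 'selected.txt' assumes a Unix-like shell.

## script.awk would contain the one rule:  /76095/ { print }
## i.e. print every input line containing '76095'.  Called from R:
system("awk -f script.awk inputfile.txt > selected.txt")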
On Sep 14, 2011, at 7:08 AM, Stefan McKinnon Høj-Edwards wrote:
Dear R-help,
I have a very large ascii data file, of which I only want to read in
selected lines (e.g. one fourth of the lines); determining which
lines depends on the lines' content. So far, I have found two
approaches for doing this ...
2011/9/14 Stefan McKinnon Høj-Edwards :
> Dear R-help,
>
> I have a very large ascii data file, of which I only want to read in selected
> lines (e.g. one fourth of the lines); determining which lines depends on the
> lines' content. So far, I have found two approaches for doing this in R; 1)
> Read ...
What is overkill about reading in a 650MB text file if you have the
space? You are going to have to process it one way or another. I would
use 'readLines' to read it in, then 'grepl' to determine which
lines I want to keep, drop the rest, and then write the new
file out. At this point ...
Dear R-help,
I have a very large ascii data file, of which I only want to read in selected
lines (e.g. one fourth of the lines); determining which lines depends on the
lines' content. So far, I have found two approaches for doing this in R; 1) Read
the file line by line using a repeat-loop and ...
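A minimal sketch of the repeat-loop idea in approach 1), assuming the goal is
to keep the lines matching a pattern (the file names, chunk size and the
pattern '76095' are placeholders):

con <- file("input.txt", open = "r")
kept <- character(0)
repeat {
  chunk <- readLines(con, n = 100000)   # read up to 100,000 lines at a time
  if (length(chunk) == 0) break         # stop at end of file
  kept <- c(kept, chunk[grepl("76095", chunk, fixed = TRUE)])
}
close(con)
writeLines(kept, "selected.txt")        # write out only the selected lines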