I'll summarize the results in terms of total run time for the suggestions
that have been made, as well as post the code for those who come across this
post in the future. First the results (the code for which is provided
second):
What I tried to do using suggestions from Bert and Dan:
t1
# user
On Wed, Jun 6, 2012 at 12:54 PM, emorway wrote:
> useRs-
>
> I'm attempting to scan a text file of more than 1 GB and read and store the
> values that follow a specific key-phrase that is repeated multiple times
> throughout the file. A snippet of the text file I'm trying to read is
> attached. The t
Hello,
I've just read your follow-up question on regular expressions, and I
believe this, your original problem, can be made much faster. Just use
readLines() differently, reading a large number of lines at a time.
For this to work you will still need to know the total number of lines
in t
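A minimal sketch of that chunked approach might look like the following. The file name ("input.txt") and the key phrase ("KEY PHRASE =") are placeholders for the original poster's actual values, and it assumes the number of interest sits at the end of each matching line:

```r
## Read the file 10000 lines at a time rather than one line per call.
con <- file("input.txt", open = "r")
vals <- numeric(0)
repeat {
  chunk <- readLines(con, n = 10000)
  if (length(chunk) == 0) break               # end of file
  hits <- grep("KEY PHRASE =", chunk, fixed = TRUE)
  ## keep the number that follows the key phrase on each matching line
  ## (assumes the value runs to the end of the line)
  vals <- c(vals, as.numeric(sub(".*KEY PHRASE =\\s*", "", chunk[hits])))
}
close(con)
```

Reading in blocks like this avoids the per-call overhead of scanning one line at a time, which is usually where the run time goes on a 1 GB file.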
I think 1 GB is small enough that this can be easily and efficiently
done in R. The key is: regular expressions are your friend.
I shall assume that the text file has been read into R as a single
character string, named "mystring". The code below could easily be
modified to work on a vector of s
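One way that single-string approach can be sketched is with gregexpr() and regmatches(). The key phrase and file name here are placeholders, since the actual pattern is not shown in this archive snippet:

```r
## Read the whole file into one character string.
mystring <- paste(readLines("input.txt"), collapse = "\n")

## Find every occurrence of the key phrase followed by a number,
## then strip the phrase to leave only the numeric values.
m       <- gregexpr("KEY PHRASE =\\s*[-+0-9.Ee]+", mystring)
matches <- regmatches(mystring, m)[[1]]
vals    <- as.numeric(sub("KEY PHRASE =\\s*", "", matches))
```

Because gregexpr() scans the string once in C code, this tends to be far faster than looping over lines at the R level.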
R may not be the best tool for this.
Did you look at gawk? It is also available for Windows:
http://gnuwin32.sourceforge.net/packages/gawk.htm
Once gawk has written a new file that only contains the lines / data you want,
you could use R for the next steps.
You also can run gawk from within R wi
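Calling gawk from within R can be sketched with system2(); the gawk program, file names, and key phrase below are placeholders:

```r
## Have gawk keep only the last field of lines containing the key phrase,
## writing them to a small file that R can then read quickly.
system2("gawk",
        args   = c(shQuote("/KEY PHRASE/ {print $NF}"), shQuote("input.txt")),
        stdout = "subset.txt")
vals <- scan("subset.txt")
```

This keeps the heavy text filtering in gawk and leaves R with only a small numeric file to read. Note that shell quoting differs on Windows, so the shQuote() calls may need adjusting there.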