On 10/18/2012 09:57 AM, Fisher Dennis wrote:
R 2.15.1
OS X
Colleagues,
I am reading a 1 GB file into R using read.table. The file consists of 100
tables, each of which is headed by two lines of text.
The first of these lines is:
TABLE NO. 1
The second is a list of column headers.
For example:
TABLE NO. 1
COL1 COL2 COL3 COL4 COL5 COL6 COL7 COL8 COL9 COL10 COL11 COL12
1.0010E+05 0.0000E+00 1.0000E+00 1.0000E+03 -1.0000E+00 1.0000E+00 0.0000E+00 0.0000E+00 0.0000E+00 0.0000E+00 0.0000E+00 0.0000E+00
1.0010E+05 1.0001E+01 1.0000E+00 1.0000E+03 -1.0000E+00 1.0000E+00 2.2737E-14 -2.2737E-14 0.0000E+00 1.9281E-08 0.0000E+00 0.0000E+00
1.0010E+05 2.4000E+01 1.0000E+00 2.0000E+03 -1.0000E+00 1.0000E+00 5.7541E-15 -5.7541E-15 0.0000E+00 5.1115E-13 0.0000E+00 0.0000E+00
Later something similar appears:
TABLE NO. 1
COL1 COL2 COL3 COL4 COL5 COL6 COL7 COL8 COL9 COL10 COL11 COL12
1.0010E+05 0.0000E+00 1.0000E+00 1.0000E+03 -1.0000E+00 1.0000E+00 0.0000E+00 0.0000E+00 0.0000E+00 0.0000E+00 0.0000E+00 0.0000E+00
1.0010E+05 1.0001E+01 1.0000E+00 1.0000E+03 -1.0000E+00 1.0000E+00 2.2737E-14 -2.2737E-14 0.0000E+00 1.9281E-08 0.0000E+00 0.0000E+00
1.0010E+05 2.4000E+01 1.0000E+00 2.0000E+03 -1.0000E+00 1.0000E+00 5.7541E-15 -5.7541E-15 0.0000E+00 5.1115E-13 0.0000E+00 0.0000E+00
I will use the term "problematic lines" to refer to the repeated occurrences of
these two non-data lines.
read.table fails to read the file because of these problematic lines. (I
get around the first "TABLE NO." line with the skip argument.)
My workaround has been to:
1. read the file with readLines
2. remove the problematic lines
3. write the remaining lines to a new file on disk
4. read that file with read.table.
However, this process is slow.
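The four steps above can be sketched as follows. This is a minimal sketch, not the poster's actual code: the function name clean_and_read is hypothetical, and it assumes the column headers sit on line 2 of the file, each data row is on a single line, and no data line begins with a letter (for example NA or Inf). Instead of skipping the first "TABLE NO." line, it drops every non-data line and supplies the column names itself.

```r
## Hypothetical helper sketching the readLines -> filter -> write -> read.table
## workaround. Assumes headers on line 2, one data row per line, and no data
## line starting with a letter.
clean_and_read <- function(infile) {
  ## 1. read every line of the file into memory
  lines <- readLines(infile)
  hdr <- scan(text = lines[2], what = "", quiet = TRUE)  # column names
  ## 2. drop the "TABLE NO." and column-header lines, i.e. the lines
  ##    whose first non-blank character is a letter
  keep <- !grepl("^[[:space:]]*[A-Za-z]", lines)
  ## 3. write the remaining lines to a temporary file
  tmp <- tempfile()
  on.exit(unlink(tmp))
  writeLines(lines[keep], tmp)
  ## 4. read the cleaned file, supplying the column names ourselves
  read.table(tmp, header = FALSE, col.names = hdr)
}
```

Usage would be something like x <- clean_and_read("myfile.txt"), where the file name is a placeholder. Because read.table() also has a text argument, step 3 could be dropped by calling read.table(text = lines[keep], col.names = hdr), although the whole file then has to stay in memory as a character vector.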
I thought about using "comment.char" as a way to skip the problematic lines.
However, comment.char accepts only a single character, so a pattern such as "[A-Z]" cannot be used.
Are there any clever workarounds for this?
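Two details about the comment.char idea: read.table documents comment.char as a single character (or "" to turn comment handling off), so no regular expression can be passed there. And even with a regex-based filter, a bare "[A-Z]" would not work for this file, because the exponent marker "E" in the scientific notation of the data also matches. The short check below uses lines from the excerpt above; the anchored pattern "^[[:space:]]*[A-Za-z]" (a letter as the first non-blank character) is an assumption about the file layout, not something stated in the post.

```r
## Three sample lines taken from the file excerpt above.
lines <- c("1.0010E+05 0.0000E+00 1.0000E+00 1.0000E+03 -1.0000E+00 1.0000E+00 0.0000E+00 0.0000E+00 0.0000E+00 0.0000E+00 0.0000E+00 0.0000E+00",
           "TABLE NO. 1",
           "COL1 COL2 COL3 COL4 COL5 COL6 COL7 COL8 COL9 COL10 COL11 COL12")

## A bare "[A-Z]" also matches the exponent marker "E" in the data line.
naive <- grepl("[A-Z]", lines)
naive      # TRUE TRUE TRUE: every line matches, including the data

## Anchoring the letter to the start of the line (after optional blanks)
## matches only the "TABLE NO." and column-header lines.
anchored <- grepl("^[[:space:]]*[A-Za-z]", lines)
anchored   # FALSE TRUE TRUE: only the non-data lines match
```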
Create a connection with pipe() that reads the output of the grep command.
grep can exclude the problematic lines (with its -v option). Then pass the pipe
object to read.table in place of the file name.
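A minimal sketch of the pipe()/grep suggestion, assuming a Unix-like system with grep on the search path (as on OS X) and the layout from the original post: column headers on line 2, one data row per line, no data line starting with a letter, and all columns numeric. The function name read_via_grep is hypothetical. Because grep -v prints only non-matching lines, R never sees the problematic lines and no intermediate file is written.

```r
## Hypothetical helper: filter the file through grep and read the result.
read_via_grep <- function(infile) {
  ## Column names come from the first header line (line 2 of the file).
  hdr <- scan(infile, what = "", skip = 1, nlines = 1, quiet = TRUE)
  ## grep -v keeps only lines that do NOT start (after optional blanks) with
  ## a letter, dropping every "TABLE NO." line and every header line.
  cmd <- paste("grep -v '^[[:space:]]*[A-Za-z]'", shQuote(infile))
  ## read.table() accepts a connection; it opens and closes the pipe itself.
  ## colClasses = "numeric" (recycled to every column) assumes all columns
  ## are numeric and spares read.table from guessing types on a large file.
  read.table(pipe(cmd), header = FALSE, col.names = hdr,
             colClasses = "numeric")
}
```

The grep pattern is the same anchored one that separates data lines from text lines; a bare "[A-Z]" would also discard every data line because of the "E" in the exponents.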
______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.