Jason,

Are you suggesting grep in R or grep in the system? If the latter, this won't work because I need to implement the same procedure in Windows (sorry for not mentioning this), where grep does not exist. If in R, the syntax is not obvious -- could you provide an example?

Dennis

Dennis Fisher MD
P < (The "P Less Than" Company)
Phone: 1-866-PLessThan (1-866-753-7784)
Fax: 1-866-PLessThan (1-866-753-7784)
www.PLessThan.com

On Oct 18, 2012, at 7:10 AM, Jason Edgecombe wrote:

> On 10/18/2012 09:57 AM, Fisher Dennis wrote:
>> R 2.15.1
>> OS X
>>
>> Colleagues,
>>
>> I am reading a 1 GB file into R using read.table. The file consists of 100
>> tables, each of which is headed by two lines of characters.
>> The first of these lines is:
>> TABLE NO. 1
>> The second is a list of column headers.
>>
>> For example:
>> TABLE NO. 1
>> COL1 COL2 COL3 COL4 COL5 COL6
>> COL7 COL8 COL9 COL10 COL11 COL12
>> 1.0010E+05 0.0000E+00 1.0000E+00 1.0000E+03 -1.0000E+00 1.0000E+00
>> 0.0000E+00 0.0000E+00 0.0000E+00 0.0000E+00 0.0000E+00 0.0000E+00
>> 1.0010E+05 1.0001E+01 1.0000E+00 1.0000E+03 -1.0000E+00 1.0000E+00
>> 2.2737E-14 -2.2737E-14 0.0000E+00 1.9281E-08 0.0000E+00 0.0000E+00
>> 1.0010E+05 2.4000E+01 1.0000E+00 2.0000E+03 -1.0000E+00 1.0000E+00
>> 5.7541E-15 -5.7541E-15 0.0000E+00 5.1115E-13 0.0000E+00 0.0000E+00
>>
>> Later something similar appears:
>> TABLE NO. 1
>> COL1 COL2 COL3 COL4 COL5 COL6
>> COL7 COL8 COL9 COL10 COL11 COL12
>> 1.0010E+05 0.0000E+00 1.0000E+00 1.0000E+03 -1.0000E+00 1.0000E+00
>> 0.0000E+00 0.0000E+00 0.0000E+00 0.0000E+00 0.0000E+00 0.0000E+00
>> 1.0010E+05 1.0001E+01 1.0000E+00 1.0000E+03 -1.0000E+00 1.0000E+00
>> 2.2737E-14 -2.2737E-14 0.0000E+00 1.9281E-08 0.0000E+00 0.0000E+00
>> 1.0010E+05 2.4000E+01 1.0000E+00 2.0000E+03 -1.0000E+00 1.0000E+00
>> 5.7541E-15 -5.7541E-15 0.0000E+00 5.1115E-13 0.0000E+00 0.0000E+00
>>
>> I will use the term "problematic lines" to refer to the repeated occurrences
>> of the two non-data lines.
>>
>> read.table is not successful in reading the table because of these
>> problematic lines (I get around the first "TABLE NO." line using the skip
>> option).
>>
>> My workaround has been to:
>> 1. read the table with readLines
>> 2. remove the problematic lines
>> 3. write the file to disk
>> 4. read the file with read.table.
>> However, this process is slow.
>>
>> I thought about using "comment.char" as a means of avoiding reading the
>> problematic lines. However, comment.char does not accept a pattern such as
>> "[A-Z]".
>>
>> Are there any clever workarounds for this?
>>
> Create a connection to a pipe, where pipe reads from the grep command. Grep
> can exclude the problematic lines. Use the pipe object as your connection in
> read.table.

______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
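
For what it is worth, here is a minimal sketch of doing the filtering inside R, which should behave the same on OS X and Windows because it does not call the system grep. It is only an illustration under stated assumptions, not a tested solution for the real 1 GB file: the helper name read_stacked and the demo file are made up, and the pattern assumes that data lines contain nothing but digits, signs, ".", "E"/"e" and whitespace. It uses grepl to pick the data lines and the text= argument of read.table, so step 3 (writing the file back to disk) is skipped.

```r
# Hypothetical helper (not from the thread): read a file of stacked tables,
# keeping only the lines that look like numeric data.
read_stacked <- function(path) {
  lines <- readLines(path)
  # Assumption: data lines contain only digits, signs, ".", "E"/"e" and
  # whitespace; anything else ("TABLE NO. 1", column headers) is dropped.
  is_data <- grepl("^[-+0-9.eE[:space:]]+$", lines)
  # text= parses the kept lines in memory, so no temporary file is written.
  read.table(text = lines[is_data], header = FALSE)
}

# Small demonstration with lines shaped like the example in the post.
demo <- tempfile()
writeLines(c(
  "TABLE NO. 1",
  "COL1 COL2 COL3",
  "1.0010E+05 0.0000E+00 1.0000E+00",
  "1.0010E+05 1.0001E+01 1.0000E+00",
  "TABLE NO. 1",
  "COL1 COL2 COL3",
  "2.2737E-14 -2.2737E-14 0.0000E+00"
), demo)
dat <- read_stacked(demo)
```

Because the header lines are dropped, the columns come back with the default names V1, V2, and so on. Whether this is actually faster than the write-to-disk route on a file of that size would need to be timed, since text= reads through a text connection. On a system that does have grep, the same pattern could presumably be handed to pipe() as in Jason's suggestion.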