HI,
You can also try this. Should remove the problematic lines, but it may not match up to Rui's method in terms of speed. input<-readLines(textConnection("TABLE NO. 1 COL1 COL2 COL3 COL4 COL5 COL6 COL7 COL8 COL9 COL10 COL11 COL12 1.0010E+05 0.0000E+00 1.0000E+00 1.0000E+03 -1.0000E+00 1.0000E+00 0.0000E+00 0.0000E+00 0.0000E+00 0.0000E+00 0.0000E+00 0.0000E+00 1.0010E+05 1.0001E+01 1.0000E+00 1.0000E+03 -1.0000E+00 1.0000E+00 2.2737E-14 -2.2737E-14 0.0000E+00 1.9281E-08 0.0000E+00 0.0000E+00 1.0010E+05 2.4000E+01 1.0000E+00 2.0000E+03 -1.0000E+00 1.0000E+00 5.7541E-15 -5.7541E-15 0.0000E+00 5.1115E-13 0.0000E+00 0.0000E+00 TABLE NO. 1 COL1 COL2 COL3 COL4 COL5 COL6 COL7 COL8 COL9 COL10 COL11 COL12 1.0010E+05 0.0000E+00 1.0000E+00 1.0000E+03 -1.0000E+00 1.0000E+00 0.0000E+00 0.0000E+00 0.0000E+00 0.0000E+00 0.0000E+00 0.0000E+00 1.0010E+05 1.0001E+01 1.0000E+00 1.0000E+03 -1.0000E+00 1.0000E+00 2.2737E-14 -2.2737E-14 0.0000E+00 1.9281E-08 0.0000E+00 0.0000E+00 1.0010E+05 2.4000E+01 1.0000E+00 2.0000E+03 -1.0000E+00 1.0000E+00 5.7541E-15 -5.7541E-15 0.0000E+00 5.1115E-13 0.0000E+00 0.0000E+00 ")) dat1<-do.call(rbind,lapply(lapply(input[grep("\\d+",input)][grepl("[\\d+]",input[grep("\\d+",input)])],function(x) unlist(strsplit(x, split=" +"))),function(x) unlist(strsplit(x, split="+ ")))) dat2<-data.frame(dat1) dat2[]<-sapply(dat2,function(x) as.numeric(as.character(x))) names(dat2)<-lapply(strsplit(input[grepl("COL{0,1}",input)]," +"),function(x) unlist(strsplit(x,"+ ")))[[1]] dat2 # COL1 COL2 COL3 COL4 COL5 COL6 COL7 COL8 COL9 COL10 #1 100100 0.000 1 1000 -1 1 0.0000e+00 0.0000e+00 0 0.0000e+00 #2 100100 10.001 1 1000 -1 1 2.2737e-14 -2.2737e-14 0 1.9281e-08 #3 100100 24.000 1 2000 -1 1 5.7541e-15 -5.7541e-15 0 5.1115e-13 #4 100100 0.000 1 1000 -1 1 0.0000e+00 0.0000e+00 0 0.0000e+00 #5 100100 10.001 1 1000 -1 1 2.2737e-14 -2.2737e-14 0 1.9281e-08 #6 100100 24.000 1 2000 -1 1 5.7541e-15 -5.7541e-15 0 5.1115e-13 # COL11 COL12 #1 0 0 #2 0 0 #3 0 0 #4 0 0 #5 0 0 #6 0 0 A.K. ----- Original Message ----- From: Fisher Dennis <fis...@plessthan.com> To: r-h...@stat.math.ethz.ch Cc: Sent: Thursday, October 18, 2012 9:57 AM Subject: [R] speeding read.table R 2.15.1 OS X Colleagues, I am reading a 1 GB file into R using read.table. The file consists of 100 tables, each of which is headed by two lines of characters. The first of these lines is: TABLE NO. 1 The second is a list of column headers. For example: TABLE NO. 1 COL1 COL2 COL3 COL4 COL5 COL6 COL7 COL8 COL9 COL10 COL11 COL12 1.0010E+05 0.0000E+00 1.0000E+00 1.0000E+03 -1.0000E+00 1.0000E+00 0.0000E+00 0.0000E+00 0.0000E+00 0.0000E+00 0.0000E+00 0.0000E+00 1.0010E+05 1.0001E+01 1.0000E+00 1.0000E+03 -1.0000E+00 1.0000E+00 2.2737E-14 -2.2737E-14 0.0000E+00 1.9281E-08 0.0000E+00 0.0000E+00 1.0010E+05 2.4000E+01 1.0000E+00 2.0000E+03 -1.0000E+00 1.0000E+00 5.7541E-15 -5.7541E-15 0.0000E+00 5.1115E-13 0.0000E+00 0.0000E+00 Later something similar appears: TABLE NO. 1 COL1 COL2 COL3 COL4 COL5 COL6 COL7 COL8 COL9 COL10 COL11 COL12 1.0010E+05 0.0000E+00 1.0000E+00 1.0000E+03 -1.0000E+00 1.0000E+00 0.0000E+00 0.0000E+00 0.0000E+00 0.0000E+00 0.0000E+00 0.0000E+00 1.0010E+05 1.0001E+01 1.0000E+00 1.0000E+03 -1.0000E+00 1.0000E+00 2.2737E-14 -2.2737E-14 0.0000E+00 1.9281E-08 0.0000E+00 0.0000E+00 1.0010E+05 2.4000E+01 1.0000E+00 2.0000E+03 -1.0000E+00 1.0000E+00 5.7541E-15 -5.7541E-15 0.0000E+00 5.1115E-13 0.0000E+00 0.0000E+00 I will use the term "problematic lines" to refer to the repeated occurrences of the two non-data lines read.table is not successful in reading the table because of these problematic lines (I get around the first "TABLE NO." line using the skip option) My word-around has been to: 1. read the table with readLines 2. remove the problematic lines 3. write the file to disk 4. read the file with read.table. However, this process is slow. I though about using "comment.char" as a means of avoiding reading the problematic lines. However, comment.char does not accept ="[A-Z]" Are there any clever workarounds for this? Dennis Dennis Fisher MD P < (The "P Less Than" Company) Phone: 1-866-PLessThan (1-866-753-7784) Fax: 1-866-PLessThan (1-866-753-7784) www.PLessThan.com ______________________________________________ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. ______________________________________________ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.