Try one of these:

Lines <- readLines("myfile.dat")
# logical index, so nothing is dropped by accident when no line matches
Lines <- Lines[!grepl("whatever", Lines)]
DF <- read.table(textConnection(Lines), ...other.args...)

or

# use findstr /v instead of grep -v if you are on Windows
DF <- read.table(pipe("grep -v whatever myfile.dat"), ...other.args...)

On Sun, Oct 12, 2008 at 11:13 AM, Dennis Fisher <[EMAIL PROTECTED]> wrote:
> Colleagues,
>
> Using R 2.7.0 in OS X, I am having trouble understanding the command
> textConnection. My situation is as follows:
> 1. I am trying to read a lengthy file (45000 lines) that has headers
> ~ every 1000 lines. read.table (or its variants) fails because of the
> recurrent headers.
> 2. My present approach is the following:
> a. use readLines to read the file, save as an array
> b. use grep to find the recurrent headers (not including the first
> set)
> c. delete the recurrent headers from the array
> d. write the array to a temp file
> e. read the temp file using read.table
> f. delete the temp file
> 3. My understanding is that textConnection might enable me to replace
> steps d-f with a single step akin to
> read.table(textConnection(array)). This appears to work but it is
> very slow.
> I executed code on successively larger chunks of the array:
> for (Each in 1000 * 1:45)
> {
> cat("N lines =", Each, "\t", date(), "\n")
> A <- read.table(textConnection(Z[1:Each]), header=TRUE)
> }
> yielding:
> N lines = 1000     Sun Oct 12 07:09:48 2008
> N lines = 2000     Sun Oct 12 07:09:48 2008
> N lines = 3000     Sun Oct 12 07:09:48 2008
> N lines = 4000     Sun Oct 12 07:09:50 2008
> N lines = 5000     Sun Oct 12 07:09:52 2008
> N lines = 6000     Sun Oct 12 07:09:56 2008
> N lines = 7000     Sun Oct 12 07:10:01 2008
> N lines = 8000     Sun Oct 12 07:10:09 2008
> N lines = 9000     Sun Oct 12 07:10:18 2008
> N lines = 10000    Sun Oct 12 07:10:31 2008
> N lines = 11000    Sun Oct 12 07:10:46 2008
> N lines = 12000    Sun Oct 12 07:11:04 2008
> N lines = 13000    Sun Oct 12 07:11:25 2008
> N lines = 14000    Sun Oct 12 07:11:51 2008
> N lines = 15000    Sun Oct 12 07:12:20 2008
> N lines = 16000    Sun Oct 12 07:12:54 2008
> N lines = 17000    Sun Oct 12 07:13:32 2008
> N lines = 18000    Sun Oct 12 07:14:16 2008
> N lines = 19000    Sun Oct 12 07:15:04 2008
> N lines = 20000    Sun Oct 12 07:15:58 2008
> N lines = 21000    Sun Oct 12 07:16:58 2008
> N lines = 22000    Sun Oct 12 07:18:04 2008
> N lines = 23000    Sun Oct 12 07:19:17 2008
> N lines = 24000    Sun Oct 12 07:20:36 2008
> N lines = 25000    Sun Oct 12 07:22:02 2008
> N lines = 26000    Sun Oct 12 07:23:36 2008
>
> Any clever ideas will be greatly appreciated.
>
> Dennis
>
>
> Dennis Fisher MD
> P < (The "P Less Than" Company)
> Phone: 1-866-PLessThan (1-866-753-7784)
> Fax: 1-415-564-2220
> www.PLessThan.com
>
>
> [[alternative HTML version deleted]]
>
> ______________________________________________
> R-help@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
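
For anyone who wants to try the readLines / textConnection route on a small case first, here is a minimal, self-contained sketch. It is only an illustration under assumptions: the sample data, the column names `id` and `value`, and the idea that every repeated header is identical to the first line are all made up for the example and are not taken from the original file. It keeps the first header, drops the repeats with a logical index, and closes the connection explicitly.

```r
# Hypothetical sample data: a header line repeated after every 3 data rows.
header <- "id value"
rows   <- paste(1:9, (1:9) * 10)
sample_lines <- c(header, rows[1:3], header, rows[4:6], header, rows[7:9])

# Write the sample to a temporary file so the example starts from a file.
tmp <- tempfile(fileext = ".dat")
writeLines(sample_lines, tmp)

# Read the raw lines, then mark every line identical to the first one.
Lines <- readLines(tmp)
is_header <- Lines == Lines[1]
is_header[1] <- FALSE          # keep the first header for read.table
Lines <- Lines[!is_header]     # logical index: safe even if nothing matches

# Parse the remaining lines through a text connection and close it afterwards.
con <- textConnection(Lines)
DF <- read.table(con, header = TRUE)
close(con)

unlink(tmp)
print(dim(DF))
```

The logical index is used rather than Lines[-grep(...)] because a negative index built from an empty grep result selects nothing at all. Whether this is fast enough on 45000 lines is not something the sketch tries to show; the pipe() variant avoids the text connection entirely.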