Colleagues,

Using R 2.7.0 on OS X, I am having trouble understanding the function
textConnection.  My situation is as follows:
1.  I am trying to read a lengthy file (45000 lines) that has headers
roughly every 1000 lines.  read.table (or its variants) fails because
of the recurrent headers.
2.  My present approach is the following:
        a.  use readLines to read the file, save as an array
        b.  use grep to find the recurrent headers (not including the first  
set)
        c.  delete the recurrent headers from the array
        d.  write the array to a temp file
        e.  read the temp file using read.table
        f.   delete the temp file
3.  My understanding is that textConnection might enable me to replace
steps d-f with a single step akin to
read.table(textConnection(array)).  This appears to work but it is
very slow.  I ran the following code on successively larger chunks of
the array (Z is the array left after step c):
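for (Each in 1000 * 1:45)
        {
        # print the chunk size and the wall-clock time before each read
        cat("N lines =", Each, "\t", date(), "\n")
        # read the first 'Each' lines of Z through a text connection
        A <- read.table(textConnection(Z[1:Each]), header=TRUE)
        }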
yielding:
N lines = 1000   Sun Oct 12 07:09:48 2008
N lines = 2000   Sun Oct 12 07:09:48 2008
N lines = 3000   Sun Oct 12 07:09:48 2008
N lines = 4000   Sun Oct 12 07:09:50 2008
N lines = 5000   Sun Oct 12 07:09:52 2008
N lines = 6000   Sun Oct 12 07:09:56 2008
N lines = 7000   Sun Oct 12 07:10:01 2008
N lines = 8000   Sun Oct 12 07:10:09 2008
N lines = 9000   Sun Oct 12 07:10:18 2008
N lines = 10000          Sun Oct 12 07:10:31 2008
N lines = 11000          Sun Oct 12 07:10:46 2008
N lines = 12000          Sun Oct 12 07:11:04 2008
N lines = 13000          Sun Oct 12 07:11:25 2008
N lines = 14000          Sun Oct 12 07:11:51 2008
N lines = 15000          Sun Oct 12 07:12:20 2008
N lines = 16000          Sun Oct 12 07:12:54 2008
N lines = 17000          Sun Oct 12 07:13:32 2008
N lines = 18000          Sun Oct 12 07:14:16 2008
N lines = 19000          Sun Oct 12 07:15:04 2008
N lines = 20000          Sun Oct 12 07:15:58 2008
N lines = 21000          Sun Oct 12 07:16:58 2008
N lines = 22000          Sun Oct 12 07:18:04 2008
N lines = 23000          Sun Oct 12 07:19:17 2008
N lines = 24000          Sun Oct 12 07:20:36 2008
N lines = 25000          Sun Oct 12 07:22:02 2008
N lines = 26000          Sun Oct 12 07:23:36 2008
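
For reproducibility, here is a minimal self-contained sketch of the approach described above, run on synthetic data rather than my real file.  The column names (ID, TIME, CONC), the block sizes, and the header pattern "^ID" are made up for illustration; only the sequence of steps matches what I am doing.  It reads the same lines once through a temp file (steps d-f) and once through textConnection, so the two routes can be compared.

```r
# Synthetic stand-in for the real file: a header line repeated before every
# block of data rows (names and sizes are hypothetical, for illustration only).
header  <- "ID TIME CONC"
n.block <- 5      # number of blocks, each preceded by a header
n.rows  <- 10     # data rows per block
set.seed(1)
Z <- unlist(lapply(seq_len(n.block), function(b) {
    rows <- paste(b, seq_len(n.rows), round(runif(n.rows), 3))
    c(header, rows)
}))

# Steps b-c: find the header lines, then drop all but the first one.
# The length check matters: Z[-integer(0)] would return an empty vector.
IsHeader <- grep("^ID", Z)
if (length(IsHeader) > 1) Z <- Z[-IsHeader[-1]]

# Steps d-f: write to a temp file, read it back, delete it.
tmp <- tempfile()
writeLines(Z, tmp)
A.file <- read.table(tmp, header = TRUE)
unlink(tmp)

# Proposed replacement for steps d-f: read straight from the character vector.
# The connection is closed explicitly because it is created already open.
con <- textConnection(Z)
A.text <- read.table(con, header = TRUE)
close(con)

# Both routes should produce the same data frame.
stopifnot(identical(A.file, A.text))
```

On a toy input of this size both routes are effectively instantaneous; the slowdown shown above only appears as the number of lines grows.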

Any clever ideas will be greatly appreciated.

Dennis


Dennis Fisher MD
P < (The "P Less Than" Company)
Phone: 1-866-PLessThan (1-866-753-7784)
Fax: 1-415-564-2220
www.PLessThan.com

