Chuck,

Thanks so much, these both work like a charm. The first method, though, is very slow for a large dataset (<100,000), while the second is reasonable in terms of speed. If you or anyone else has ideas for speeding up the import, send them my way; otherwise the:

    con2 <- pipe( 'grep "^RD" tmp.dat' )
    dat2 <- read.csv( con2, sep='|', header=FALSE)

works well!

Thank you,

Zev

Charles C. Berry wrote:
> On Fri, 11 Apr 2008, Zev Ross wrote:
>
>> Hi All,
>>
>> Can anyone direct me to a read function in R that will allow me to only
>> read in rows of a text file that begin with a particular value such as
>> the data below. I would read the entire file in and then limit, but the
>> files were constructed such that the first two letters determine how
>> many variables are in the row (different letters mean different numbers
>> of columns and different column names/types).
>>
>> I can do this in SAS, but I'd prefer to use R. The approximate SAS code
>> is below with the key piece of code being "if rectype='RD'" then do.
>>
>> Thoughts?
>
> If your data are in 'tmp.dat':
>
> > txt <- readLines( "tmp.dat" )
> > con <- textConnection( grep( "^RD", txt, value=TRUE ) )
> > dat <- read.csv( con, sep='|', header=FALSE)
> > close(con)
> > summary( dat[ , 1:3 ] )
>   V1     V2          V3
>  RD:6   I:6   Min.   :1
>               1st Qu.:1
>               Median :1
>               Mean   :1
>               3rd Qu.:1
>               Max.   :1
>
> Alternatively, if you have 'grep' in your system and in the path:
>
> > con2 <- pipe( 'grep "^RD" tmp.dat' )
> > dat2 <- read.csv( con2, sep='|', header=FALSE)
>
> See
>  ?connection
>  ?textConnection
>  ?grep
>
> HTH,
>
> Chuck
>>
>> Zev
>>
>>
>> RD|I|01|073|0023|68103|5|7|017|810|20070103|00:00|0.6||3|||||||||||||
>> RD|I|01|073|0023|68103|5|7|017|810|20070106|00:00|9.5||3|||||||||||||
>> RD|I|01|073|0023|68103|5|7|017|810|20070109|00:00|2.5||3|||||||||||||
>> RD|I|01|073|0023|68103|5|7|017|810|20070112|00:00|13.7||3|||||||||||||
>> RD|I|01|073|0023|68103|5|7|017|810|20070115|00:00|7.3||3|||||||||||||
>> RA|I|01|073|0023|A334|5|7|017|810|20070118|00:00|3.7||3|||||||||||||
>> RD|I|01|073|0023|68103|5|7|017|810|20070121|00:00|6.9||3|||||||||||||
>> RC|I|01|073|0023|Quer|5|7|017|810|20070124|00:00|1.8||3|||||||||||||
>>
>>
>> infile 'C:\junk\RD_501_88101_2006-0.txt'
>> dlm='|' firstobs=3 missover;
>> rectype $2. @;
>> if rectype = 'RD' then do;
>>
>> --
>> Zev Ross
>> ZevRoss Spatial Analysis
>> 303 Fairmount Ave
>> Ithaca, NY 14850
>> 607-277-0004 (phone)
>> 866-877-3690 (fax, toll-free)
>> [EMAIL PROTECTED]
>>
>> ______________________________________________
>> R-help@r-project.org mailing list
>> https://stat.ethz.ch/mailman/listinfo/r-help
>> PLEASE do read the posting guide
>> http://www.R-project.org/posting-guide.html
>> and provide commented, minimal, self-contained, reproducible code.
>>
>
> Charles C. Berry                            (858) 534-2098
>                                             Dept of Family/Preventive Medicine
> E mailto:[EMAIL PROTECTED]                  UC San Diego
> http://famprevmed.ucsd.edu/faculty/cberry/  La Jolla, San Diego 92093-0901
>
>
>

--
Zev Ross
ZevRoss Spatial Analysis
303 Fairmount Ave
Ithaca, NY 14850
607-277-0004 (phone)
866-877-3690 (fax, toll-free)
[EMAIL PROTECTED]
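
On the open question of speeding up the import without relying on an external grep: one base-R possibility, offered only as an untimed sketch, is to filter the lines in R and then parse them from a temporary file instead of a textConnection. The helper name read_rectype() and its arguments are hypothetical and not from this thread; whether it actually beats the textConnection route on a given file would need to be checked with system.time().

```r
# Hypothetical helper (not part of base R or of this thread):
# keep only the lines whose leading characters equal `prefix`,
# write them to a temporary file, and let read.csv() parse that file.
read_rectype <- function(path, prefix = "RD", sep = "|", ...) {
  txt <- readLines(path)

  # Fixed-string comparison on the record-type field; no regex needed.
  keep <- txt[substr(txt, 1, nchar(prefix)) == prefix]

  # Note: read.csv() raises an error if no lines matched (empty file).
  tmp <- tempfile(fileext = ".txt")
  on.exit(unlink(tmp))          # remove the temporary file afterwards
  writeLines(keep, tmp)

  # Extra arguments (e.g. colClasses) are passed through to read.csv().
  read.csv(tmp, sep = sep, header = FALSE, ...)
}
```

With the data in 'tmp.dat', a call such as read_rectype("tmp.dat") should give the same kind of data frame as the two approaches above. Passing colClasses = "character" through the dots is one way to keep codes like 01 and 073 from being converted to numbers, if that matters for the data.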