This will read it all in, and then you can decide what you want to do with it:

Lines <- "DISKREAD,metadata about disks
MEM,metadata about memory
ZZZZ,observation-identifier,time,date
DISKREAD,observation-identifier,data about disks
MEM,observation-identifier,data about memory"

# fill = TRUE pads short rows with blanks; note that read.table takes the
# number of columns from the first five lines of input.
DF <- read.table(textConnection(Lines), sep = ",", fill = TRUE)

On Tue, Aug 11, 2009 at 2:55 PM, Allen S. Rout <a...@ufl.edu> wrote:
>
> Greetings, all.
>
> I've got a datafile I've been working with that has an idiosyncratic,
> heterogeneous format. It's grossly like:
>
> [...]
> DISKREAD,metadata about disks
> MEM,metadata about memory
>
> ZZZZ,observation-identifier,time,date
> DISKREAD,observation-identifier,data about disks
> MEM,observation-identifier,data about memory
>
> [ and repeat for each observation ]
>
> What I've done in the past was take the monolithic file, and
> preprocess it into files, one per observation type. The observation
> types are structurally self-similar, so once I have them split up,
> normal read.csv methods work just fine. Then I read the ZZZZ file to
> get timestamps, and whichever observation files I care about on this
> run.
>
> But ideally, I'd like to do this entire operation with R features, and
> without multiple passes through the file.
>
> The line lengths vary wildly, so a read.table doesn't help.
>
> I was visualizing the following:
>
> + create a FIFO for each desired observation class, including the ZZZZ
>   metadata
> + In one pass through the source file, populate the FIFOs with their data
> + read.csv the output sides of the FIFOs.
>
> But I have problems right out of the gate: when I set a data.frame
> element to the output of fifo(), what actually gets inserted seems to
> be an integer; I am guessing it's being turned into a factor.
>
> example:
> ----
> desired_slices=c("ZZZZ","DISKWRITE")
> temps = data.frame(slice=desired_slices,row.names=1,handle=I(""))
>
> temps["ZZZZ",] = fifo("./ZZZZ",open="w+")
> showConnections()
> ( you can see that the connection is open)
> temps
> ( you can see that the contents of the data.frame cell is the filehandle
> number)
> -----
>
> Am I just barking up the wrong tree?
>
> - Allen S. Rout
>
> ______________________________________________
> R-help@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
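
On the single-pass goal in the quoted message: one alternative to FIFOs is to read the lines once and split them by record type in memory. What follows is only a minimal sketch under assumptions that are not in the original post: the sample values in `raw` are invented, metadata lines are assumed to occur only before the first ZZZZ line, and `raw`, `type`, `in_obs`, `by_type` and `tables` are illustrative names.

```r
# Invented sample data shaped like the format described in the quoted post.
raw <- c(
  "DISKREAD,metadata about disks",
  "MEM,metadata about memory",
  "ZZZZ,T0001,00:00:01,11-AUG-2009",
  "DISKREAD,T0001,10.5,3.2",
  "MEM,T0001,2048,512",
  "ZZZZ,T0002,00:01:01,11-AUG-2009",
  "DISKREAD,T0002,11.0,2.9",
  "MEM,T0002,2040,520"
)

# The record type is everything before the first comma.
type <- sub(",.*$", "", raw)

# Assumption: metadata lines appear only before the first ZZZZ line,
# so everything from the first ZZZZ onward is observation data.
in_obs <- cumsum(type == "ZZZZ") > 0

# Group the observation lines by record type (one character vector each).
by_type <- split(raw[in_obs], type[in_obs])

# Each group is structurally uniform, so read.csv can parse it directly.
tables <- lapply(by_type, function(x) {
  read.csv(text = x, header = FALSE, stringsAsFactors = FALSE)
})

str(tables$MEM)
```

With a real file, `raw` would come from a single `readLines()` call instead of the literal vector. `tables$ZZZZ` then holds the timestamps, and any other table can be joined to it on the identifier column (V2 here) with `merge()`. If the FIFO route is still wanted, a plain list rather than a data.frame should be able to hold the connection objects without reducing them to integers.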