Dear R users,

I have a big text file formatted like this:

x      x_string
y      y_string
id1    id1_string
id2    id2_string
z      z_string
w      w_string
stuff  stuff  stuff
stuff  stuff  stuff
stuff  stuff  stuff
//
x      x_string1
y      y_string1
z      z_string1
w      w_string1
stuff  stuff  stuff
stuff  stuff  stuff
stuff  stuff  stuff
//
x      x_string2
y      y_string2
id1    id1_string1
id2    id2_string1
z      z_string2
w      w_string2
stuff  stuff  stuff
stuff  stuff  stuff
stuff  stuff  stuff
//
...
...


I'd like to parse this file, retrieve the x, y, id1, id2, z and w fields, and save them into a matrix object:

x          y          id1          id2          z          w
x_string   y_string   id1_string   id2_string   z_string   w_string
x_string1  y_string1  NA           NA           z_string1  w_string1
x_string2  y_string2  id1_string1  id2_string1  z_string2  w_string2
...
...

The id1 and id2 fields are not always present within a section (the interval between x and the last stuff line). When they are absent I'd like to insert an NA (see above), so that all columns have the same length: length(x) == length(y) == length(id1) == ... .

Without the id1 and id2 fields the task is easy: import the text file with readLines and retrieve each field with grep:

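# read the whole file, one element per line
input = readLines("file.txt")
# keep only the lines that start with the given field name
x = grep("^x\\s", input, value = TRUE)
id1 = grep("^id1\\s", input, value = TRUE)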
...

I'd like to accomplish this task entirely in R (no SQL, no perl script), possibly without using loops.
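
For illustration, here is one loop-free way the task could be sketched in base R. It is only a sketch under stated assumptions: every section ends with a line containing exactly "//", the field name is the first whitespace-delimited token on its line, and no "stuff" line starts with one of the field names. The sample input and the names fields, rec, key, val, hit and out are made up for the example.

```r
# Hypothetical sample input mirroring the layout shown above;
# with a real file this would be: input <- readLines("file.txt")
input <- c(
  "x      x_string", "y      y_string",
  "id1    id1_string", "id2    id2_string",
  "z      z_string", "w      w_string",
  "stuff  stuff  stuff", "//",
  "x      x_string1", "y      y_string1",
  "z      z_string1", "w      w_string1",
  "stuff  stuff  stuff", "//"
)

fields <- c("x", "y", "id1", "id2", "z", "w")

# Section number of every line: count the "//" separators seen so far.
# Separator lines themselves are excluded later through 'hit'.
is_sep <- input == "//"
rec <- cumsum(is_sep) + 1L

# Split each line into its first token (the field name) and the remainder
key <- sub("\\s.*$", "", input, perl = TRUE)
val <- sub("^\\S+\\s+", "", input, perl = TRUE)

# Lines that carry one of the wanted fields
hit <- !is_sep & key %in% fields

# Start from an all-NA matrix, then fill (section, field) cells in one step
out <- matrix(NA_character_,
              nrow = max(rec[!is_sep]), ncol = length(fields),
              dimnames = list(NULL, fields))
out[cbind(rec[hit], match(key[hit], fields))] <- val[hit]

print(out)
```

Because the matrix starts out filled with NA, sections without id1 or id2 keep NA in those columns, so every column has one entry per section. If a field name occurred twice within a section, the later value would overwrite the earlier one.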

Any suggestions are quite welcome!

Regards,
Paolo

______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
