[R] Parsing

Paolo Sonego Wed, 09 Jul 2008 02:43:53 -0700

Dear R users,

I have a big text file formatted like this:


x      x_string
y      y_string
id1    id1_string
id2    id2_string
z      z_string
w      w_string
stuff  stuff  stuff
stuff  stuff  stuff
stuff  stuff  stuff
//
x      x_string1
y      y_string1
z      z_string1
w      w_string1
stuff  stuff  stuff
stuff  stuff  stuff
stuff  stuff  stuff
//
x      x_string2
y      y_string2
id1    id1_string1
id2    id2_string1
z      z_string2
w      w_string2
stuff  stuff  stuff
stuff  stuff  stuff
stuff  stuff  stuff
//
...
...

I'd like to parse this file, retrieve the x, y, id1, id2, z, w fields, and save them into a matrix object:


x          y          id1          id2          z          w
x_string   y_string   id1_string   id2_string   z_string   w_string
x_string1  y_string1  NA           NA           z_string1  w_string1
x_string2  y_string2  id1_string1  id2_string1  z_string2  w_string2
...
...

The id1 and id2 fields are not always present within a section (the interval between x and the last stuff line), and I'd like to insert an NA when they are absent (see above), so that all fields end up with the same length: length(x) == length(y) == length(id1) == ... .

Without the id1 and id2 fields the task is easily solved by importing the text file with readLines and retrieving the individual fields with grep:


input <- readLines("file.txt")
x <- grep("^x\\s", input, value = TRUE)
id1 <- grep("^id1\\s", input, value = TRUE)
...
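
To make the difficulty concrete, here is a small illustration on an inlined two-section sample (the sample vector is an assumption standing in for readLines("file.txt")). Because grep returns only the matching lines, the optional fields come back shorter and lose their alignment with the sections:

```r
# Two sections; the second one has no id1/id2 lines.
input <- c(
  "x      x_string", "y      y_string", "id1    id1_string",
  "id2    id2_string", "z      z_string", "w      w_string",
  "stuff  stuff  stuff", "//",
  "x      x_string1", "y      y_string1", "z      z_string1",
  "w      w_string1", "stuff  stuff  stuff", "//"
)

x   <- grep("^x\\s", input, value = TRUE)
id1 <- grep("^id1\\s", input, value = TRUE)

# x has one entry per section, id1 only one entry in total,
# so the vectors can no longer be bound column-wise.
n_x   <- length(x)
n_id1 <- length(id1)
```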

I'd like to accomplish this task entirely in R (no SQL, no Perl script), if possible without using loops.
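
For what it's worth, a minimal sketch of one possible loop-free approach follows. It is not a definitive solution and rests on several assumptions: every section ends with a line that is exactly "//", each field occurs at most once per section, and no "stuff" line begins with one of the field names. The sample vector and the names fields, rec, key, value and result are made up for the illustration. The idea is to number the sections with cumsum and then fill an NA matrix by (section, field) index pairs:

```r
# Inlined sample; the real input would come from readLines("file.txt").
input <- c(
  "x      x_string", "y      y_string", "id1    id1_string",
  "id2    id2_string", "z      z_string", "w      w_string",
  "stuff  stuff  stuff", "//",
  "x      x_string1", "y      y_string1", "z      z_string1",
  "w      w_string1", "stuff  stuff  stuff", "//"
)

fields <- c("x", "y", "id1", "id2", "z", "w")

# Section number of each line: count the "//" terminators seen before it.
is_end <- input == "//"
rec <- cumsum(is_end) - is_end + 1L

# First token of each line is the key, the remainder is the value.
key   <- sub("\\s.*$", "", input)
value <- sub("^\\S+\\s+", "", input)

# Keep only the lines whose key is one of the wanted fields.
keep <- key %in% fields

# Start from an all-NA matrix so absent fields stay NA, then fill the
# cells addressed by (section, field column) pairs in one assignment.
result <- matrix(NA_character_, nrow = max(rec[keep]), ncol = length(fields),
                 dimnames = list(NULL, fields))
result[cbind(rec[keep], match(key[keep], fields))] <- value[keep]
result
```

On this sample the second row of result has NA in the id1 and id2 columns, which is the layout shown in the table above.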


Any suggestions are quite welcome!

Regards,
Paolo

