"out of your depth" does not serve as a legitimate excuse -- for me anyway. There are many good tutorials on regular expressions out there. Go through one. Ditto with R data handling. "An Introduction to R" (ships with R) is one that's right at hand.
Although others may be more inclined than I am to help, you would certainly increase the likelihood by first doing some homework and showing us code that you tried. By that time, though, you will probably have figured it out for yourself.

Cheers,
Bert

Bert Gunter
Genentech Nonclinical Biostatistics
(650) 467-7374

"Data is not information. Information is not knowledge. And knowledge is
certainly not wisdom."
Clifford Stoll

On Wed, Dec 17, 2014 at 12:14 PM, Robert Strother <rstro...@gmail.com> wrote:
> I have a large dataset (~50,000 rows, 96 columns), of hospital
> administrative data.
> many of the columns are clinical coding of inpatient event (using ICD-10).
> A simplified example of the data is below
>
> > dput(dat_unmatched)
> structure(list(ID = structure(c(4L, 3L, 2L, 1L), .Label = c("BCM3455",
> "BZD2643", "GDR2343", "MCZ4325"), class = "factor"), X.1 = structure(c(2L,
> 3L, 1L, 1L), .Label = c("B83.2", "C23.2", "F56.23"), class = "factor"),
> X.2 = structure(c(2L, 1L, 2L, 2L), .Label = c("M20.64", "T43.2"
> ), class = "factor"), X.3 = structure(c(2L, 3L, 3L, 1L), .Label = c("F56.23",
> "R23.1", "Y32.1"), class = "factor"), X.4 = structure(c(1L,
> 2L, 2L, 3L), .Label = c("M23.5", "T44.2", "Y32.1"), class = "factor"),
> X.5 = structure(c(1L, 2L, 1L, 2L), .Label = c("", "Q23.6"
> ), class = "factor")), .Names = c("ID", "X.1", "X.2", "X.3",
> "X.4", "X.5"), class = "data.frame", row.names = c(NA, -4L))
>
> I am interested in a set of codes that start with a "T" or a "Y", and link
> them to the preceding column that does not begin with a "T" or "Y". I
> suspect I will need to use regular expressions, and likely a loop, but I am
> really out of my depth at this point.
>
> I would like the final dataset to look like:
>
> > dput(dat_matched)
> structure(list(ID = structure(c(4L, 3L, 2L, 1L), .Label = c("BCM3455",
> "BZD2643", "GDR2343", "MCZ4325"), class = "factor"), X.1 = structure(c(2L,
> 3L, 1L, 1L), .Label = c("B83.2", "C23.2", "M20.64"), class = "factor"),
> X.2 = structure(c(1L, 2L, 1L, 1L), .Label = c("T43.2", "Y32.1"
> ), class = "factor"), X.3 = structure(c(1L, 4L, 2L, 3L), .Label = c("",
> "B83.2", "F56.23", "M20.64"), class = "factor"), X.4 = structure(c(1L,
> 2L, 3L, 3L), .Label = c("", "T44.2", "Y32.1"), class = "factor"),
> X.5 = structure(c(1L, 1L, 2L, 1L), .Label = c("", "B83.2"
> ), class = "factor"), X = structure(c(1L, 1L, 2L, 1L), .Label = c("",
> "T44.2"), class = "factor")), .Names = c("ID", "X.1", "X.2",
> "X.3", "X.4", "X.5", "X"), class = "data.frame", row.names = c(NA,
> -4L))
>
> Any help appreciated.
>
> Matthew
>
> [[alternative HTML version deleted]]
>
> ______________________________________________
> R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
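
For anyone landing on this thread later, the matching step described in the question (flag codes that start with "T" or "Y", then pair each with a preceding code that does not) can be sketched roughly as below. This is only a minimal sketch under stated assumptions, not a reproduction of dat_matched: it assumes "preceding" means the nearest earlier non-empty code in the same row, it rebuilds the example data as character columns rather than factors, and the names link_row and pairs are hypothetical. It returns one row per (primary, external) pair in long format; reshaping that into the wide layout shown in the question is left open.

```r
# Example data from the question, rebuilt as plain character columns
# (assumption: factors are converted to character before matching).
dat <- data.frame(
  ID  = c("MCZ4325", "GDR2343", "BZD2643", "BCM3455"),
  X.1 = c("C23.2", "F56.23", "B83.2", "B83.2"),
  X.2 = c("T43.2", "M20.64", "T43.2", "T43.2"),
  X.3 = c("R23.1", "Y32.1", "Y32.1", "F56.23"),
  X.4 = c("M23.5", "T44.2", "T44.2", "Y32.1"),
  X.5 = c("", "Q23.6", "", "Q23.6"),
  stringsAsFactors = FALSE
)

# Character matrix holding only the code columns (ID dropped).
codes <- as.matrix(dat[, -1])

# Hypothetical helper: walk one row left to right, remembering the most
# recent non-empty code that does NOT start with T or Y, and emit a
# (primary, external) pair for every T/Y code encountered.
link_row <- function(id, row_codes) {
  row_codes <- unname(row_codes)
  is_ty <- grepl("^[TY]", row_codes)   # regular expression: starts with T or Y
  out <- NULL
  last_primary <- NA_character_
  for (j in seq_along(row_codes)) {
    if (is_ty[j]) {
      out <- rbind(out, data.frame(ID = id,
                                   primary = last_primary,
                                   external = row_codes[j],
                                   stringsAsFactors = FALSE))
    } else if (nzchar(row_codes[j])) {
      last_primary <- row_codes[j]
    }
  }
  out
}

# Apply the helper to every row and stack the results.
pairs <- do.call(rbind, lapply(seq_len(nrow(dat)), function(i) {
  link_row(dat$ID[i], codes[i, ])
}))
rownames(pairs) <- NULL
print(pairs)
```

A T/Y code with no earlier non-T/Y code in its row gets NA as its primary, which makes such cases easy to spot. On the full 50,000-row dataset a row-wise loop like this may be slow, so a vectorised approach could be worth considering once the logic is confirmed.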