On Jun 15, 2015, at 1:12 PM, Federman, Douglas wrote:

> I'm trying to do the following: search each patient's list of diagnoses for a
> specific code then create a new column based upon the presence of the
> specific code.
> Simplified data follows:
>
> con <- textConnection("
> ID DX1 DX2 DX3
> 1 4109 4280 7102
> 2 734 311 490
> 3 4011 42822 4101
> ")
> df <- read.table(con, header = TRUE, strip.white = TRUE,
>                  colClasses="character")
> #
> # I would like to add a column such that the result of searching for 410
> # would give the table below. The search string would always be at the
> # start of a word and doesn't need regex.
> #
> # ID DX1  DX2   DX3  htn
> # 1  4109 4280  7102 1
> # 2  734  311   490  0
> # 3  4011 42822 4101 1
> #
> # The following works but is slow and returns NA if the search string is
> # not found:
>
> for (i in 1:nrow(df)) {
>     df[i,"htn"] <- any(sapply('410', function(x) which( grepl(x, df[i, 2:4],
>         fixed = TRUE) )))
> }
Is this any better?

> df$htn <- apply(df[-1], 1, function(r) max( substr(r, 1,3) == "410" ))
> df
  ID  DX1   DX2  DX3 htn
1  1 4109  4280 7102   1
2  2  734   311  490   0
3  3 4011 42822 4101   1

You can add na.rm=TRUE to the max call if warranted. `max` coerces logicals to integer, which is why htn comes out as 1/0 rather than TRUE/FALSE.

-- 
David Winsemius
Alameda, CA, USA

______________________________________________
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
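
If apply() is still too slow on a large table, a matrix-based variant of the same substr() comparison might be worth trying. This is only a sketch, under the assumptions that the diagnosis columns are character and that the code of interest always sits at the start of the field. The names `dx_cols`, `code` and `hits` are made up here for illustration and do not come from the thread.

```r
# Rebuild the example data from the original post
df <- read.table(text = "
ID DX1 DX2 DX3
1 4109 4280 7102
2 734 311 490
3 4011 42822 4101
", header = TRUE, strip.white = TRUE, colClasses = "character")

dx_cols <- c("DX1", "DX2", "DX3")  # illustrative name: columns to search
code    <- "410"                   # illustrative name: prefix to look for

# substr() keeps the dimensions of a character matrix, so this compares
# the leading characters of every diagnosis in one call and yields a
# logical matrix with one row per patient.
hits <- substr(as.matrix(df[dx_cols]), 1, nchar(code)) == code

# A patient is flagged when at least one diagnosis matches;
# na.rm = TRUE treats a missing diagnosis as "no match".
df$htn <- as.integer(rowSums(hits, na.rm = TRUE) > 0)
```

The result should match the apply() version on the example data; whether it is actually faster on real data would need to be timed.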