On Jun 15, 2015, at 1:12 PM, Federman, Douglas wrote:

> I'm trying to do the following: search each patient's list of diagnoses for a 
> specific code then create a new column based upon the presence of the 
> specific code.  
> Simplified data follows:
> 
> con <- textConnection("
> ID    DX1     DX2     DX3
> 1     4109    4280    7102
> 2     734     311     490
> 3     4011    42822   4101
> ")
> df <- read.table(con, header = TRUE, strip.white = TRUE, 
> colClasses="character")
> #
> # I would like to add a column such that searching for 410 would give the
> # result below. The search string would always be at the start of a word, so
> # no regex is needed.
> #
> # ID  DX1     DX2     DX3     htn
> # 1   4109    4280    7102    1
> # 2   734     311     490     0
> # 3   4011    42822   4101    1
> #
> # The following works but is slow and returns NA if the search string is not
> # found:
> 
> for (i in 1:nrow(df)) {
>    df[i,"htn"] <- any(sapply('410', function(x)  which( grepl(x, df[i, 2:4], 
> fixed = TRUE) )))
> }

Is this any better?

> df$htn <-  apply(df[-1], 1, function(r) max( substr(r, 1,3) == "410" ))
> df
  ID  DX1   DX2  DX3 htn
1  1 4109  4280 7102   1
2  2  734   311  490   0
3  3 4011 42822 4101   1


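You can add na.rm=TRUE to the max() call if the diagnosis columns may contain
missing values. `max` coerces logicals to integer, so each row ends up as 1 if
any code starts with "410" and 0 otherwise.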
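
If apply() over rows is still too slow on a large data frame, a column-wise
version may be worth trying. The following is only a sketch under some
assumptions: the diagnosis columns are named DX1 to DX3 as in the example, the
helper names dx_cols and starts_410 are made up for illustration, and missing
codes should count as "not found" (the %in% comparison returns FALSE for NA).

```r
# Rebuild the example data from the original question.
df <- read.table(text = "
ID    DX1     DX2     DX3
1     4109    4280    7102
2     734     311     490
3     4011    42822   4101
", header = TRUE, strip.white = TRUE, colClasses = "character")

# Assumed names of the diagnosis columns (hypothetical helper variable).
dx_cols <- c("DX1", "DX2", "DX3")

# For each column, test whether the first three characters are "410".
# %in% yields FALSE (not NA) when a code is missing.
starts_410 <- lapply(df[dx_cols], function(col) substr(col, 1, 3) %in% "410")

# Combine the per-column logical vectors with OR, then store as 0/1.
df$htn <- as.integer(Reduce(`|`, starts_410))
df
```

This works on whole columns rather than one row at a time, so it avoids the
per-row function calls that apply() makes. I have not timed it on real data.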



-- 
David Winsemius
Alameda, CA, USA

______________________________________________
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
