Hi R users, I really need help with subsetting data frames:

I have a large database of medical records and I want to be able to match patterns from a list of search terms. I've used this simplified data frame in a previous example:

    db <- structure(list(ind = c("ind1", "ind2", "ind3", "ind4"),
                         test1 = c(1, 2, 1.3, 3),
                         test2 = c(56L, 27L, 58L, 2L),
                         test3 = c(1.1, 28, 9, 1.2)),
                    .Names = c("ind", "test1", "test2", "test3"),
                    class = "data.frame", row.names = c(NA, -4L))

    terms_include <- c("1","2","3")
    terms_exclude <- c("1.1","1.2","1.3")

So in this example I want to include all the terms from terms_include as long as they don't occur with terms from terms_exclude in the same row of the data frame.

Previously I was given this function, which works very well if you want to match exactly:

    f <- function(x) !any(x %in% terms_exclude) && any(x %in% terms_include)
    db[apply(db[, -1], 1, f), ]

       ind test1 test2 test3
    2 ind2     2    27  28.0
    4 ind4     3     2   1.2

I would like to know if there is a way to write a similar function that looks for matches that start with the query string, as in grepl("^pattern", x).

I started writing a function but am not sure how to get it to return the data frame or matrix:

    for (i in 1:length(terms_include)){
      db_new <- apply(db, 2, grepl, pattern=i)
    }

Applying this function gives me:

    db_new <- structure(c(FALSE, FALSE, TRUE, FALSE,
                          FALSE, FALSE, TRUE, TRUE,
                          FALSE, FALSE, FALSE, FALSE,
                          FALSE, FALSE, FALSE, FALSE),
                        .Dim = c(4L, 4L),
                        .Dimnames = list(NULL, c("ind", "test1", "test2", "test3")))

So the above searches for the pattern anywhere in the data frame instead of just at the beginning of each string.

How would I look for the terms to include, but not return a row of the data frame if it also contains one of the terms to exclude, while using partial matching?

I hope that this makes sense.

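A minimal sketch of the prefix-matching filter described above, not a definitive answer. It assumes that "starts with" means a literal prefix (so the "." in "1.1" is not treated as a regex wildcard), and that each test column should be converted to character on its own so that 1 stays "1" rather than being padded to "1.0". The helper name starts_with_any and the objects vals and keep are hypothetical; startsWith() needs R >= 3.3.0.

```r
db <- data.frame(
  ind   = c("ind1", "ind2", "ind3", "ind4"),
  test1 = c(1, 2, 1.3, 3),
  test2 = c(56L, 27L, 58L, 2L),
  test3 = c(1.1, 28, 9, 1.2),
  stringsAsFactors = FALSE
)
terms_include <- c("1", "2", "3")
terms_exclude <- c("1.1", "1.2", "1.3")

# Hypothetical helper: TRUE if any element of x begins with any of the
# terms. startsWith() compares literally, so no regex escaping is needed.
starts_with_any <- function(x, terms) {
  any(vapply(terms,
             function(p) any(startsWith(x, p), na.rm = TRUE),
             logical(1)))
}

# Convert each test column to character separately and bind them into a
# character matrix (one row per record, one column per test).
vals <- do.call(cbind, lapply(db[-1], as.character))

# Keep a row if it matches an include term and no exclude term.
keep <- apply(vals, 1, function(x) {
  starts_with_any(x, terms_include) && !starts_with_any(x, terms_exclude)
})

result <- db[keep, ]
```

With the example data, this sketch keeps only ind2: rows 1, 3 and 4 each contain a value beginning with one of the exclude terms (1.1, 1.3 and 1.2 respectively). If regex matching is preferred, grepl(paste0("^", p), x) could replace startsWith(x, p), but then the "." in the terms would need escaping.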
Many thanks,
Natalie

-----
Natalie Van Zuydam
PhD Student
University of Dundee
nvanzuy...@dundee.ac.uk