Hi Elahe,

You could modify your count_verbs function from your previous post:

* use scan to extract the tokens (words) from Message
* use your previous grepl expression to index the tokens that are verbs
* paste the verbs together to form the entries of a new column.

Here is one solution:

>>>>>>>>>>>>>>>
library(openNLP)
library(NLP)

df <- data.frame(DocumentID = c(478920L, 510133L, 499497L, 930234L),
                 Message = structure(c(4L, 2L, 3L, 1L),
                   .Label = c("Thank you very much for your nice feedback.\n",
                              "THank you, added it",
                              "Thanks for the well explained article.",
                              "The solution has been updated"),
                   class = "factor"))
dput(df)

tagPOS <- function(x, ...) {
  s <- as.String(x)
  if(s=="") return(list())
  # tokenise the whole message as a single sentence, then tag each word
  word_token_annotator <- Maxent_Word_Token_Annotator()
  a2 <- Annotation(1L, "sentence", 1L, nchar(s))
  a2 <- annotate(s, word_token_annotator, a2)
  a3 <- annotate(s, Maxent_POS_Tag_Annotator(), a2)
  a3w <- a3[a3$type == "word"]
  POStags <- unlist(lapply(a3w$features, `[[`, "POS"))
  POStagged <- paste(sprintf("%s/%s", s[a3w], POStags), collapse = " ")
  list(POStagged = POStagged, POStags = POStags)
}

verbs <- function(x) {
  tagPOSx <- tagPOS(x)
  # split the message on whitespace
  scanx <- scan(text=as.character(x), what="character")
  n <- length(scanx)
  # keep the tokens whose POS tag contains "VB" and paste them together
  paste(scanx[(1:n)[grepl("VB", tagPOSx$POStags)]], collapse="|")
}

library(dplyr)
df %>% group_by(DocumentID) %>% summarise(verbs = verbs(Message))
<<<<<<<<<<<<<<<<<<<<<

I'll leave it to you to extract a column of verbs from the result and
cbind it to the original data.frame (or, more safely, join on DocumentID,
since summarise returns rows sorted by group).

Btw, I don't think this solution is efficient. I would guess that the
processing that scan does in the verbs function is duplicating work
already done in the tagPOS function by annotate. Also, scan splits only
on whitespace, so its tokens may not line up with the annotator's tags
when a message contains punctuation. You may therefore want to return a
list of tokens from tagPOS and use that instead of scan.

Rgds,
Robert

On 06/11/18 10:26, Elahe chalabi via R-help wrote:
> Hi all, In my df I would like to generate a new column which contains
> a string showing all the verbs in each row of df$Message.
>
> library(openNLP)
> library(NLP)
> dput(df)
> structure(list(DocumentID = c(478920L, 510133L, 499497L, 930234L),
>     Message = structure(c(4L, 2L, 3L, 1L), .Label = c(
>         "Thank you very much for your nice feedback.\n",
>         "THank you, added it",
>         "Thanks for the well explained article.",
>         "The solution has been updated"), class = "factor")),
>     class = "data.frame", row.names = c(NA, -4L))
>
> tagPOS <- function(x, ...) {
>   s <- as.String(x)
>   word_token_annotator <- Maxent_Word_Token_Annotator()
>   a2 <- Annotation(1L, "sentence", 1L, nchar(s))
>   a2 <- annotate(s, word_token_annotator, a2)
>   a3 <- annotate(s, Maxent_POS_Tag_Annotator(), a2)
>   a3w <- a3[a3$type == "word"]
>   POStags <- unlist(lapply(a3w$features, `[[`, "POS"))
>   POStagged <- paste(sprintf("%s/%s", s[a3w], POStags), collapse = " ")
>   list(POStagged = POStagged, POStags = POStags)
> }
>
> Any help? Thanks in advance!
> Elahe
>
> ______________________________________________
> R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
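
P.S. As a follow-up to the suggestion in the reply above (use the tokens from tagPOS instead of scan), here is a rough sketch of what the verb-pasting step might look like. It assumes tagPOS has been extended to also return its tokens (for example `tokens = s[a3w]`), which the posted version does not do. The helper name verbs_from_tags and the token/tag values below are made up for illustration and are not real tagger output, so the sketch runs without openNLP.

```r
# Sketch only: 'tokens' and 'tags' stand in for s[a3w] and POStags as an
# extended tagPOS might return them. The values used below are hand-written
# placeholders, not output from the Maxent tagger.
verbs_from_tags <- function(tokens, tags, sep = "|") {
  # tokens and tags come from the same annotation, so they must align 1:1
  stopifnot(length(tokens) == length(tags))
  # Penn Treebank verb tags all start with "VB" (VB, VBD, VBG, VBN, VBP, VBZ)
  paste(tokens[grepl("^VB", tags)], collapse = sep)
}

# Two of the messages, with hypothetical tokens and tags.
# Note that punctuation is a separate token with its own tag.
tagged <- list(
  list(tokens = c("The", "solution", "has", "been", "updated"),
       tags   = c("DT", "NN", "VBZ", "VBN", "VBN")),
  list(tokens = c("THank", "you", ",", "added", "it"),
       tags   = c("VBP", "PRP", ",", "VBD", "PRP"))
)

df <- data.frame(DocumentID = c(478920L, 510133L),
                 Message = c("The solution has been updated",
                             "THank you, added it"),
                 stringsAsFactors = FALSE)

# One string of verbs per row, added directly as a new column
df$verbs <- vapply(tagged,
                   function(t) verbs_from_tags(t$tokens, t$tags),
                   character(1))
print(df)
```

Because the tokens and tags come from the same annotation, the index produced by grepl always refers to the right token, and assigning to df$verbs row by row avoids any reordering that a grouped summarise might introduce.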