I used the "tm" package to import a set of text documents with the following command:
text <- Corpus(DirSource("."), readerControl = list(language = "ansi"))

I would like to extract only a certain portion of the text in each document, delimited by keywords. For example, I would like to keep all the text between the keywords <Start Text> and <End Text> and discard everything else. Is there any way to accomplish this with the 'tm' package?

Also, is there a quick way to remove all the HTML tags from the text?

Thank you.
Ravi
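
For illustration, here is a minimal sketch of one possible approach to both questions, using base R string functions only rather than anything specific to 'tm'. The helper names extract_between and strip_tags are hypothetical, the sample document is made up, and the tag pattern is a crude regular expression, not a real HTML parser. The extraction runs first because the markers themselves look like tags and would otherwise be stripped.

```r
# Hypothetical helper: keep only the text between two literal markers.
# Returns "" when either marker is missing.
extract_between <- function(x, start = "<Start Text>", end = "<End Text>") {
  x <- paste(x, collapse = "\n")              # treat the document as one string
  s <- regexpr(start, x, fixed = TRUE)        # first occurrence of the start marker
  if (s == -1) return("")
  rest <- substring(x, s + attr(s, "match.length"))
  e <- regexpr(end, rest, fixed = TRUE)       # first end marker after the start
  if (e == -1) return("")
  trimws(substring(rest, 1, e - 1))
}

# Hypothetical helper: remove anything that looks like <...>.
# Crude; it will not cope with malformed or unusual markup.
strip_tags <- function(x) gsub("<[^>]+>", "", x)

# Made-up sample document, one element per line of text.
doc <- c("header junk",
         "<Start Text> Keep <b>this</b> part. <End Text>",
         "footer junk")

body  <- extract_between(doc)   # extract first, while the markers still exist
clean <- strip_tags(body)
print(clean)
```

In recent versions of 'tm', a function like this can probably be applied to every document with tm_map(text, content_transformer(extract_between)); older versions may not provide content_transformer, so treat that call as an assumption to check against the installed version.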