Mike,

This is what I am looking for:
http://en.wikipedia.org/wiki/Automatic_summarization

I want to obtain a summary of a huge document as meaningful sentences, not a bag of words. I have thousands of documents, each running to 3-4 pages, and I plan to use R to cluster/classify them. Instead of working with the original documents, I think it would be better to work with their summaries, since this would avoid memory issues.

Thank you.

Ravi

On Tue, May 31, 2011 at 10:02 PM, Mike Marchywka <marchy...@hotmail.com> wrote:

> > Date: Tue, 31 May 2011 03:25:56 -0700
> > From: viora...@gmail.com
> > To: r-help@r-project.org
> > Subject: [R] Text Summarization
> >
> > Is there a text mining/NLP package in R that could do text summarization?
> > For example, take a huge text as input and provide a summary of the text.
> >
> > In package tm, summarization is defined more as high frequency terms, which
> > is not what I want. I actually want a summary of what is present in the huge
> > volume of text.
>
> Cliff's notes? Can you define it more precisely? There are some computational
> linguistics packages IIRC.
>
> > Any help on an R package would be helpful. Thank you.
> >
> > Ravi
> >
> > --
> > View this message in context: http://r.789695.n4.nabble.com/Text-Summarization-tp3562735p3562735.html
> > Sent from the R help mailing list archive at Nabble.com.
______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.