Lucene has support for ngrams during indexing and querying. The rest would have to be done for you.
<shamelessPlug>Taming Text chapter 7 has some basic implementations using Lucene to do categorization. http://www.manning.com/ingersoll</shamelessPlug> -Grant On Jul 24, 2011, at 12:38 PM, Saurabh Gokhale wrote: > Hi All, > > I need to work on the application where I have to categorize text (group of > sentences) into multiple pre-defined categories. > > As I understand from the searches on the internet, theoretically it is > possible with Ngram based index and matching the incoming text n-gram with > the known fingerprint of the category. > > I wanted to know if Lucene already has any contribution done in this regards > that I can find in the contrib directory or is there any example that I can > look at else where. > > Saurabh -------------------------- Grant Ingersoll --------------------------------------------------------------------- To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org For additional commands, e-mail: java-user-h...@lucene.apache.org