Lucene has support for n-grams during indexing and querying.  The rest would 
have to be done by you.  
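
To give a rough idea of what "the rest" might look like, here is a minimal 
sketch of rank-based n-gram fingerprint matching in plain Java.  It is not 
Lucene code and not the book's implementation: the class NGramFingerprint and 
its methods profile, distance and classify are hypothetical names made up for 
illustration, and the "out-of-place" rank distance is just one common way to 
compare fingerprints.  In a real setup the n-grams would more likely come from 
a Lucene analyzer (for example NGramTokenizer) than from substring().

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

// Hypothetical helper (not part of Lucene): rank-ordered n-gram fingerprints.
final class NGramFingerprint {

    // Returns up to topK character n-grams of the text, most frequent first.
    static List<String> profile(String text, int n, int topK) {
        String s = text.toLowerCase(Locale.ROOT).replaceAll("\\s+", " ").trim();
        Map<String, Integer> counts = new HashMap<>();
        for (int i = 0; i + n <= s.length(); i++) {
            counts.merge(s.substring(i, i + n), 1, Integer::sum);
        }
        List<String> grams = new ArrayList<>(counts.keySet());
        // Descending count; ties broken alphabetically so the order is deterministic.
        grams.sort((a, b) -> {
            int byCount = Integer.compare(counts.get(b), counts.get(a));
            return byCount != 0 ? byCount : a.compareTo(b);
        });
        return new ArrayList<>(grams.subList(0, Math.min(topK, grams.size())));
    }

    // "Out-of-place" distance: sum of rank differences between the two profiles.
    // An n-gram missing from the category profile costs the maximum penalty.
    static int distance(List<String> doc, List<String> category) {
        Map<String, Integer> rank = new HashMap<>();
        for (int i = 0; i < category.size(); i++) {
            rank.put(category.get(i), i);
        }
        int maxPenalty = category.size();
        int total = 0;
        for (int i = 0; i < doc.size(); i++) {
            Integer r = rank.get(doc.get(i));
            total += (r == null) ? maxPenalty : Math.abs(r - i);
        }
        return total;
    }

    // Picks the category whose fingerprint is closest to the text's profile.
    static String classify(String text, Map<String, List<String>> fingerprints,
                           int n, int topK) {
        List<String> doc = profile(text, n, topK);
        String best = null;
        int bestDistance = Integer.MAX_VALUE;
        for (Map.Entry<String, List<String>> e : fingerprints.entrySet()) {
            int d = distance(doc, e.getValue());
            if (d < bestDistance) {
                bestDistance = d;
                best = e.getKey();
            }
        }
        return best;
    }
}
```

You would build one fingerprint per category by calling profile() on that 
category's training text, keep them in a map, and call classify() on each 
incoming text.  The values of n and topK are tuning knobs you would have to 
pick for your own data.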

<shamelessPlug>Taming Text chapter 7 has some basic implementations using 
Lucene to do categorization.  http://www.manning.com/ingersoll</shamelessPlug>

-Grant

On Jul 24, 2011, at 12:38 PM, Saurabh Gokhale wrote:

> Hi All,
> 
> I need to work on an application where I have to categorize text (groups of
> sentences) into multiple pre-defined categories.
> 
> As I understand from searches on the internet, this is theoretically
> possible with an n-gram based index, by matching the n-grams of the incoming
> text against the known fingerprint of each category.
> 
> I wanted to know if Lucene already has any contribution in this regard
> that I can find in the contrib directory, or if there is any example that I
> can look at elsewhere.
> 
> Saurabh

--------------------------
Grant Ingersoll



