Do you want search result clustering or document clustering? My understanding of Carrot2 is that it isn't designed for the latter: it is built to work off shorter snippets of text, as opposed to the whole document. FWIW, you _might_ find some help over on the Mahout project (http://lucene.apache.org/mahout) in terms of different approaches. We have a couple of clustering approaches implemented there, but they just work off a matrix, and it is up to you to fill the matrix (presumably with some distance calculation). The Carrot2 list may also help clarify whether it is the appropriate tool.
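
As a rough illustration of what "filling the matrix" could look like, here is a minimal sketch that builds a symmetric pairwise distance matrix over a handful of documents. It is not Mahout's API; the function names (`jaccard_distance`, `distance_matrix`), the whitespace tokenization, and the choice of Jaccard distance are all assumptions made for the example, and Python is used only for brevity.

```python
def jaccard_distance(a, b):
    """Return 1 - |A & B| / |A | B| for two sets of terms (0.0 if both are empty)."""
    union = a | b
    if not union:
        return 0.0
    return 1.0 - len(a & b) / len(union)


def distance_matrix(docs):
    """Build a symmetric n x n matrix of pairwise distances between documents.

    Tokenization here is a naive lowercase whitespace split (an assumption);
    a real setup would more likely reuse the terms produced by an analyzer.
    """
    term_sets = [set(doc.lower().split()) for doc in docs]
    n = len(term_sets)
    matrix = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            d = jaccard_distance(term_sets[i], term_sets[j])
            matrix[i][j] = d
            matrix[j][i] = d  # distance is symmetric
    return matrix


if __name__ == "__main__":
    docs = ["lucene search index", "lucene search engine", "apple pie"]
    for row in distance_matrix(docs):
        print(row)
```

A matrix like this is what a generic clustering routine would consume; any other distance (for example one derived from cosine similarity) could be swapped in without changing the outer loop.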

-Grant

On May 15, 2008, at 4:34 PM, Otis Gospodnetic wrote:

Have you tried using Carrot2 with Lucene? They work quite well in tandem!

Otis
--
Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch


----- Original Message ----
From: Supheakmungkol SARIN <[EMAIL PROTECTED]>
To: java-user@lucene.apache.org
Sent: Wednesday, May 14, 2008 11:23:45 PM
Subject: Document clustering with Lucene

Dear all,

I'd like to do document clustering on full text with Lucene. In other words, I would like to group similar documents together. I searched the mailing list and found two approaches. The first is to represent one document as a query and search the collection with it. The other is to construct the term vector of each document and use the cosine distance function to compute the similarity. I found these methods here:

- http://www.mail-archive.com/[EMAIL PROTECTED]/msg04916.html
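
For reference, the second approach (term vectors compared by cosine) might be sketched roughly as follows. This is only an illustration, not Lucene code: the helper names `term_vector` and `cosine_similarity`, the raw term-frequency weighting, and the whitespace tokenization are assumptions, and Python is used only to keep the example short.

```python
import math
from collections import Counter


def term_vector(text):
    """Build a raw term-frequency vector from lowercased, whitespace-split text."""
    return Counter(text.lower().split())


def cosine_similarity(a, b):
    """Cosine of the angle between two sparse term-frequency vectors.

    Returns 0.0 if either vector is empty, so that empty documents
    never look similar to anything.
    """
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


if __name__ == "__main__":
    d1 = term_vector("lucene is a search library")
    d2 = term_vector("lucene is a java search library")
    d3 = term_vector("apple pie recipe")
    print(cosine_similarity(d1, d2))  # high: most terms are shared
    print(cosine_similarity(d1, d3))  # 0.0: no terms in common
```

Cosine distance is then usually taken as one minus this similarity. In practice the weights would more likely be TF-IDF than raw counts, which is a design choice rather than something this thread prescribes.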

I would like to know whether there is a better way, or whether there are any built-in functions for
clustering in the most recent release of Lucene.

Thank you.

Kind regards,

Supheakmungkol


---------------------------------------------------------------------
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]


--------------------------
Grant Ingersoll

Lucene Helpful Hints:
http://wiki.apache.org/lucene-java/BasicsOfPerformance
http://wiki.apache.org/lucene-java/LuceneFAQ






