Do you want search result clustering or document clustering? My
understanding is that Carrot2 isn't designed for the latter: it is
designed to work off shorter snippets of text rather than whole
documents. FWIW, you _might_ find some help over on the Mahout
project (http://lucene.apache.org/mahout) in terms of different
approaches. We have a couple of clustering approaches implemented
there, but they just work off a matrix, and it is up to you to fill
the matrix (presumably with some distance calculation). The Carrot2
list may also help clarify whether Carrot2 is the right fit.
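
To make the "fill the matrix" step concrete, here is a minimal sketch in
plain Java of one possible reading: a symmetric matrix of pairwise
cosine distances between document vectors. It does not use any Mahout
API; the class and method names (DistanceMatrixSketch, cosineDistance,
pairwiseDistances) are hypothetical, and it assumes each document has
already been turned into a dense double[] of equal length.

```java
class DistanceMatrixSketch {

    // Cosine distance = 1 - cosine similarity.
    // Assumption: an all-zero vector is treated as maximally distant (1.0).
    static double cosineDistance(double[] a, double[] b) {
        double dot = 0.0, normA = 0.0, normB = 0.0;
        for (int i = 0; i < a.length; i++) {
            dot += a[i] * b[i];
            normA += a[i] * a[i];
            normB += b[i] * b[i];
        }
        if (normA == 0.0 || normB == 0.0) {
            return 1.0;
        }
        return 1.0 - dot / Math.sqrt(normA * normB);
    }

    // Fill an n x n matrix where entry [i][j] is the distance between
    // document i and document j. The diagonal stays 0.0.
    static double[][] pairwiseDistances(double[][] docs) {
        int n = docs.length;
        double[][] matrix = new double[n][n];
        for (int i = 0; i < n; i++) {
            for (int j = i + 1; j < n; j++) {
                double d = cosineDistance(docs[i], docs[j]);
                matrix[i][j] = d;
                matrix[j][i] = d; // distance is symmetric
            }
        }
        return matrix;
    }
}
```

This is quadratic in the number of documents, so it only suits small
collections; whether a given clustering implementation wants distances,
similarities, or raw vectors is something to check against that
implementation.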
-Grant
On May 15, 2008, at 4:34 PM, Otis Gospodnetic wrote:
Have you tried using Carrot2 with Lucene? They work quite well in
tandem!
Otis
--
Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch
----- Original Message ----
From: Supheakmungkol SARIN <[EMAIL PROTECTED]>
To: java-user@lucene.apache.org
Sent: Wednesday, May 14, 2008 11:23:45 PM
Subject: Document clustering with Lucene
Dear all,
I'd like to do document clustering using full text with Lucene. In
other words, I would like to group similar documents together. I
searched the mailing list and found two approaches. The first is to
represent one document as a query and search the collection. The
other is to construct a term vector for each document and use the
cosine distance function to compute the similarity. I found these
methods here:
- http://www.mail-archive.com/[EMAIL PROTECTED]/msg04916.html
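
For reference, the term-vector approach described above can be sketched
in a few lines of plain Java. This is an illustration only, not Lucene
code: the class CosineSketch and its methods are hypothetical, the
tokenizer is a naive lowercase-and-split-on-whitespace stand-in for a
real analyzer, and the weights are raw term counts rather than TF-IDF.

```java
import java.util.HashMap;
import java.util.Map;

class CosineSketch {

    // Build a raw term-frequency vector from lowercased,
    // whitespace-separated text (a stand-in for a real analyzer).
    static Map<String, Integer> termFrequencies(String text) {
        Map<String, Integer> tf = new HashMap<>();
        for (String token : text.toLowerCase().split("\\s+")) {
            if (!token.isEmpty()) {
                tf.merge(token, 1, Integer::sum);
            }
        }
        return tf;
    }

    // Euclidean length of a term-frequency vector.
    static double norm(Map<String, Integer> v) {
        double sum = 0.0;
        for (int count : v.values()) {
            sum += (double) count * count;
        }
        return Math.sqrt(sum);
    }

    // Cosine similarity: dot product divided by the product of the norms.
    // Returns 0.0 if either vector is empty.
    static double cosine(Map<String, Integer> a, Map<String, Integer> b) {
        double dot = 0.0;
        for (Map.Entry<String, Integer> e : a.entrySet()) {
            Integer other = b.get(e.getKey());
            if (other != null) {
                dot += (double) e.getValue() * other;
            }
        }
        double normA = norm(a);
        double normB = norm(b);
        if (normA == 0.0 || normB == 0.0) {
            return 0.0;
        }
        return dot / (normA * normB);
    }
}
```

Two documents that share no terms score 0.0, identical documents score
1.0 (up to rounding), and a cosine distance can be taken as one minus
the similarity.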
I would like to know whether there is a better way, or whether there
are any built-in functions for clustering in the recent release of
Lucene.
Thank you.
Kind regards,
Supheakmungkol
---------------------------------------------------------------------
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]
--------------------------
Grant Ingersoll
Lucene Helpful Hints:
http://wiki.apache.org/lucene-java/BasicsOfPerformance
http://wiki.apache.org/lucene-java/LuceneFAQ