RE: Document clustering using lucene

2006-06-15 Thread John Hamilton
I've been thinking about a similar problem. However, it seems that the similarity score returned by a search is only meaningful within that search's results. You can't compare the similarity scores from two different searches. I think you will have to compute the similarities yourself using the t
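
For illustration, here is a minimal sketch of computing a similarity yourself, outside of any search. It is only one possible approach and makes several assumptions: each document is reduced to a bag of raw term counts, the similarity is the cosine of the two count vectors, and the helper names `term_frequencies` and `cosine_similarity` are hypothetical. It uses no Lucene API at all. Because the value depends only on the two documents, scores for different pairs can be compared with each other.

```python
import math
from collections import Counter


def term_frequencies(text):
    """Hypothetical tokenizer: lowercase, split on whitespace, count terms."""
    return Counter(text.lower().split())


def cosine_similarity(tf_a, tf_b):
    """Cosine of the angle between two term-count vectors (0.0 to 1.0)."""
    # Only terms present in both documents contribute to the dot product.
    shared_terms = tf_a.keys() & tf_b.keys()
    dot = sum(tf_a[term] * tf_b[term] for term in shared_terms)

    norm_a = math.sqrt(sum(count * count for count in tf_a.values()))
    norm_b = math.sqrt(sum(count * count for count in tf_b.values()))

    # An empty document has no direction, so treat it as dissimilar.
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


if __name__ == "__main__":
    # Made-up sample texts, purely for demonstration.
    doc_a = term_frequencies("lucene index search engine")
    doc_b = term_frequencies("lucene search library")
    doc_c = term_frequencies("cooking pasta at home")

    print(cosine_similarity(doc_a, doc_b))
    print(cosine_similarity(doc_a, doc_c))
```

Raw counts are the simplest choice; a weighting such as tf-idf could be substituted inside `term_frequencies` without changing `cosine_similarity`. Note that comparing every pair in a corpus of N documents takes on the order of N*N comparisons.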

Re: Document clustering using lucene

2006-06-15 Thread Paul Elschot
On Thursday 15 June 2006 13:50, Prasenjit Mukherjee wrote: > I want to do some document clustering on a corpus of ~100,000 > documents, with average doc size being ~7k. I have looked into carrot2 > but it seems to work only for relatively short documents and has some > scaling issues for lar