I use my own LSI implementation based on Lucene for text clustering. I've done some tests, but I believe that integrating LSI into the Lucene search subsystem (i.e. creating something like an LSISimilarity class) is not an easy task.
I start by analyzing the documents with Lucene, then extract the tf-idf values (again with Lucene) to build a documents/terms matrix. Then I use an implementation of LSI/SVD to analyze it. At this point I think that assigning the scores back to the Lucene documents is very difficult, but I'm trying to read the modified scores from the matrix in my LSISimilarity.

Clustering search results this way, on the other hand, is not too difficult: I just apply the algorithm (mostly HAC-like) to the modified matrix.

To search using LSI you must choose a small subset of the collection, apply LSI/SVD to it, and then extend the matrix by 'folding in' new documents. But how to choose the initial subset? Maybe just by searching the index and then using the first n documents retrieved.

Any ideas?

Lorenzo

On 10/7/05, Paul Libbrecht <[EMAIL PROTECTED]> wrote:
> I've met other people with such needs and we would also be interested.
> Unfortunately, this seems not to be available.
> A clear issue might be that LSI, in its original form at least, is
> covered by a US patent. But maybe someone will find another form which is
> not.
>
> paul
>
> On 5 Oct 05, at 14:59, <[EMAIL PROTECTED]> wrote:
> > I am looking for an LSI implementation in Lucene. Is it available? I
> > couldn't find it on the website. I searched the archives but found no
> > help. Could someone tell me whether it is available or not?
> >
> > Could you tell me where I can look to find out whether there are any
> > language processing tools for indexing and retrieval available?
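
To make the 'folding in' step mentioned above more concrete, here is a minimal sketch in plain Python. It is not my Lucene-based code. The toy terms x documents matrix, the function names (truncated_svd, fold_in, cosine) and the power-iteration SVD are all assumptions made for illustration; a real setup would take tf-idf values from the index and use a proper SVD library. The idea is only that a new document's term vector d is projected into the reduced space as S^-1 * U^T * d, without recomputing the SVD.

```python
import math
import random


def transpose(M):
    return [list(col) for col in zip(*M)]


def matvec(M, v):
    return [sum(m * x for m, x in zip(row, v)) for row in M]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def norm(v):
    return math.sqrt(dot(v, v))


def truncated_svd(A, k, iters=100, seed=0):
    """Top-k left singular vectors and singular values of A (terms x docs).

    Illustrative only: power iteration on A * A^T with deflation.
    Assumes rank(A) >= k and distinct leading singular values.
    """
    rng = random.Random(seed)
    At = transpose(A)
    U, S = [], []
    for _ in range(k):
        u = [rng.random() + 0.1 for _ in A]  # one entry per term
        for _ in range(iters):
            for p in U:  # keep u orthogonal to the vectors already found
                c = dot(u, p)
                u = [x - c * y for x, y in zip(u, p)]
            w = matvec(A, matvec(At, u))  # (A * A^T) * u
            n = norm(w)
            u = [x / n for x in w]
        U.append(u)
        S.append(norm(matvec(At, u)))  # singular value = |A^T u|
    return U, S


def fold_in(term_vector, U, S):
    """Project a term vector d into the k-dimensional space: S^-1 * U^T * d."""
    return [dot(u, term_vector) / s for u, s in zip(U, S)]


def cosine(a, b):
    return dot(a, b) / (norm(a) * norm(b))


# Hypothetical toy data: rows are terms, columns are documents.
# Documents 0-1 share terms 0-1, documents 2-3 share terms 2-3.
A = [
    [2.0, 1.0, 0.0, 0.0],
    [1.0, 2.0, 0.0, 0.0],
    [0.0, 0.0, 1.0, 1.0],
    [0.0, 0.0, 1.0, 1.0],
]

U, S = truncated_svd(A, k=2)

# Coordinates of the original documents in the reduced space.
docs = [fold_in(col, U, S) for col in transpose(A)]

# Fold in a new document that uses only terms 0 and 1.
new_doc = fold_in([1.0, 1.0, 0.0, 0.0], U, S)

for i, d in enumerate(docs):
    print(i, round(cosine(new_doc, d), 3))
```

In the setting I described, the columns of A would be the tf-idf vectors of the initial subset (for example the first n hits of a search), and every other document would be passed through fold_in. Folded-in documents do not change U and S, so the quality of the space depends on how representative the initial subset is, which is exactly the open question above.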