On 12/13/05, Ian Soboroff <[EMAIL PROTECTED]> wrote: > Paul Libbrecht <[EMAIL PROTECTED]> writes: > > > We're also thinking about implementing something similar to LSI within > > ActiveMath which is lucene-powered where both formulae and text > > searching would benefit of the latent-semantic-similarity. I've been > > refrained of doing "exactly this" at least since LSI is patented. This > > might also be a reason why there's no implementation in Lucene's > > sandbox. > > > > Have you looked at other vector-based approaches which are not exactly LSI ? > > Have you looked at InfoMap NLP ? > > Look for Thomas Hofmann's "probabilistic LSI", and other recent work > which cites it.
You might also be interested in "Latent Dirichlet Allocation (LDA)" by David Blei. In short, it is a more advanced version of "probabilistic LSI". I am currently writing some code to dump Lucene documents into a file format used by Blei's LDA implementation written in C. Regards, Dave. --------------------------------------------------------------------- To unsubscribe, e-mail: [EMAIL PROTECTED] For additional commands, e-mail: [EMAIL PROTECTED]