On 12/13/05, Dave Kor <[EMAIL PROTECTED]> wrote:
> On 12/13/05, Ian Soboroff <[EMAIL PROTECTED]> wrote:
> > Paul Libbrecht <[EMAIL PROTECTED]> writes:
> >
> > > We're also thinking about implementing something similar to LSI within
> > > ActiveMath, which is Lucene-powered, where both formula and text
> > > searching would benefit from latent semantic similarity. I've
> > > refrained from doing "exactly this", at least since LSI is patented. This
> > > might also be a reason why there's no implementation in Lucene's
> > > sandbox.
> > >
> > > Have you looked at other vector-based approaches which are not exactly
> > > LSI?
> > > Have you looked at InfoMap NLP?
> >
> > Look for Thomas Hofmann's "probabilistic LSI", and other recent work
> > which cites it.
>
> You might also be interested in "Latent Dirichlet Allocation (LDA)" by
> David Blei. In short, it is a more advanced version of "probabilistic
> LSI". I am currently writing some code to dump Lucene documents into a
> file format used by Blei's LDA implementation written in C.

Following up on my previous mail about LDA, here are a few links:

David Blei, Andrew Ng and Michael Jordan's paper on LDA
http://www.cs.berkeley.edu/~blei/papers/blei03a.pdf

David Blei's C implementation of LDA
http://www.cs.berkeley.edu/~blei/lda-c/
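
For anyone wanting to dump documents into the input format that lda-c
reads: each document is one line of the form
"[M] [term_1]:[count] ... [term_N]:[count]", where [M] is the number of
unique terms in the document and each term is an integer index into the
vocabulary. Below is a minimal sketch of the formatting step only. The
class and method names are made up for illustration, and how the tokens
are pulled out of a Lucene index is left out on purpose.

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical helper for illustration; not part of Lucene or lda-c.
final class LdaCLineFormatter {

    // Counts the tokens of one document, assigning a new integer id to
    // every term not yet present in the shared vocabulary map.
    static Map<Integer, Integer> countTerms(List<String> tokens,
                                            Map<String, Integer> vocabulary) {
        Map<Integer, Integer> counts = new TreeMap<>();
        for (String token : tokens) {
            Integer id = vocabulary.get(token);
            if (id == null) {
                id = vocabulary.size(); // next unused 0-based id
                vocabulary.put(token, id);
            }
            counts.merge(id, 1, Integer::sum);
        }
        return counts;
    }

    // Renders one document as "[M] id:count id:count ...".
    static String format(Map<Integer, Integer> termCounts) {
        // Sort by term id so the output is stable and easy to read.
        Map<Integer, Integer> sorted = new TreeMap<>(termCounts);
        StringBuilder line = new StringBuilder();
        line.append(sorted.size()); // [M]: number of unique terms
        for (Map.Entry<Integer, Integer> e : sorted.entrySet()) {
            line.append(' ').append(e.getKey()).append(':').append(e.getValue());
        }
        return line.toString();
    }
}
```

The vocabulary map has to be shared across all documents so that a term
keeps the same id throughout the file; writing the map out alongside the
data makes it possible to turn the ids back into words afterwards.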

Gregor Heinrich's port of LDA-c to Java
http://www.arbylon.net/projects/
Note: To use his code on a non-Windows platform, you will need to
replace his fast Mersenne Twister-based random number generator with
Java's standard random number generator.
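
As a rough illustration of what swapping in Java's standard generator
can look like, here is a small sketch that draws from a discrete
distribution using java.util.Random. The class and method names are
hypothetical and are not taken from Gregor's code, so the actual places
that need changing there may look different.

```java
import java.util.Random;

// Hypothetical sampler for illustration; names are not from the Java port.
final class DiscreteSampler {
    private final Random random;

    // A fixed seed makes runs repeatable.
    DiscreteSampler(long seed) {
        this.random = new Random(seed);
    }

    // Returns index i with probability weights[i] / sum(weights).
    // Weights must be non-negative with a positive sum.
    int sample(double[] weights) {
        double total = 0.0;
        for (double w : weights) {
            total += w;
        }
        double u = random.nextDouble() * total; // uniform in [0, total)
        double cumulative = 0.0;
        for (int i = 0; i < weights.length; i++) {
            cumulative += weights[i];
            if (u < cumulative) {
                return i;
            }
        }
        return weights.length - 1; // guard against floating-point round-off
    }
}
```

java.util.Random is slower and statistically weaker than a Mersenne
Twister, so this trades some quality and speed for portability.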



Regards,
Dave.
