Hi Kevin,

Yes, I saw that. Slycoder's package uses collapsed Gibbs sampling, which may run faster than a standard VB implementation of LDA. I hope to implement a collapsed VB algorithm for LDA soon, which should outperform both collapsed Gibbs sampling and standard VB.
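For context, here is a minimal sketch of collapsed Gibbs sampling for LDA — the scheme TopicModels.jl uses. This is an illustrative toy implementation, not code from either package; all names and hyperparameters are my own choices:

```python
import numpy as np

def collapsed_gibbs_lda(docs, V, K, alpha=0.1, beta=0.01, iters=200, seed=0):
    """Toy collapsed Gibbs sampler for LDA.

    docs: list of word-id lists; V: vocabulary size; K: number of topics.
    The topic-word and doc-topic distributions are integrated out
    ("collapsed"), so we only resample the topic assignment z of each token.
    """
    rng = np.random.default_rng(seed)
    D = len(docs)
    ndk = np.zeros((D, K))   # doc-topic counts
    nkw = np.zeros((K, V))   # topic-word counts
    nk = np.zeros(K)         # tokens per topic
    z = [rng.integers(K, size=len(doc)) for doc in docs]
    for d, doc in enumerate(docs):
        for i, w in enumerate(doc):
            k = z[d][i]
            ndk[d, k] += 1; nkw[k, w] += 1; nk[k] += 1
    for _ in range(iters):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                k = z[d][i]
                # remove this token's current assignment from the counts
                ndk[d, k] -= 1; nkw[k, w] -= 1; nk[k] -= 1
                # full conditional p(z = k | everything else), up to a constant
                p = (ndk[d] + alpha) * (nkw[:, w] + beta) / (nk + V * beta)
                k = rng.choice(K, p=p / p.sum())
                z[d][i] = k
                ndk[d, k] += 1; nkw[k, w] += 1; nk[k] += 1
    # point estimate of the topic-word distributions
    phi = (nkw + beta) / (nk[:, None] + V * beta)
    return phi, z
```

Each sweep touches every token once, and each update is a cheap categorical draw, which is why collapsed Gibbs can be faster per iteration than a standard VB implementation that maintains full variational distributions.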
On Saturday, July 2, 2016 at 5:22:13 PM UTC-7, Kevin Squire wrote:
>
> TopicModels.jl (https://github.com/slycoder/TopicModels.jl) has an implementation of LDA.
>
> Cheers,
> Kevin
>
> On Saturday, July 2, 2016, esproff <[email protected]> wrote:
>
>> Thanks!
>>
>> I know there is a Java implementation of LDA (the MALLET package), which I believe uses collapsed Gibbs sampling, and there are probably multiple C++ implementations as well; unfortunately, I don't know Java or C++, so I can't personally benchmark against those. However, there are also Matlab and R implementations, and those are two languages I probably know well enough to run some benchmarks against, so I may do that in the near future.
>>
>> On Saturday, July 2, 2016 at 6:30:34 AM UTC-7, Cedric St-Jean wrote:
>>>
>>> Impressive work, especially the documentation! Have you benchmarked it against other implementations?
>>>
>>> On Saturday, July 2, 2016 at 12:32:13 AM UTC-4, esproff wrote:
>>>>
>>>> Hi all!
>>>>
>>>> I have just released a new variational Bayes topic modeling package for Julia, which can be found here:
>>>>
>>>> https://github.com/esproff/TopicModelsVB.jl
>>>>
>>>> The models included are:
>>>>
>>>> 1. Latent Dirichlet Allocation (LDA)
>>>> 2. Filtered Latent Dirichlet Allocation (fLDA)
>>>> 3. Correlated Topic Model (CTM)
>>>> 4. Filtered Correlated Topic Model (fCTM)
>>>> 5. Dynamic Topic Model (DTM)
>>>> 6. Collaborative Topic Poisson Factorization (CTPF)
>>>>
>>>> This is, as far as I can tell, the best open-source topic modeling package to date.
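For readers new to these models: LDA, the base model in the list above, is defined by a simple generative story, which a few lines of code make concrete. This is the standard textbook model, not package code, and all names and sizes here are illustrative:

```python
import numpy as np

def generate_lda_corpus(D=5, V=10, K=3, doc_len=8, alpha=0.5, eta=0.1, seed=0):
    """Draw a toy corpus from the LDA generative model.

    For each topic k:  phi_k ~ Dirichlet(eta)    (distribution over words)
    For each doc d:    theta_d ~ Dirichlet(alpha) (distribution over topics)
    For each token:    z ~ Categorical(theta_d),  w ~ Categorical(phi_z)
    """
    rng = np.random.default_rng(seed)
    phi = rng.dirichlet(np.full(V, eta), size=K)   # K topic-word distributions
    docs = []
    for _ in range(D):
        theta = rng.dirichlet(np.full(K, alpha))   # this doc's topic mixture
        zs = rng.choice(K, size=doc_len, p=theta)  # a topic for each token
        docs.append([int(rng.choice(V, p=phi[z])) for z in zs])
    return docs, phi
```

Inference (by Gibbs sampling or variational Bayes) runs this story in reverse: given only the documents, it recovers the topic-word distributions `phi` and the per-document mixtures `theta`. The other models in the list extend this skeleton with topic correlations (CTM), time dynamics (DTM), and so on.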
>>>> It's still a bit rough around the edges, and I suspect a few edge-case bugs remain deep in the belly of one or two of the algorithms, but overall it's polished enough that it needs to be tried out by people besides myself.
>>>>
>>>> I'm open to collaborators, and I'm especially interested in adding GPGPU support. Formally speaking, I'm trained as a mathematician, not a computer scientist or software engineer, so if you're an expert in GPGPU I'd be very interested in talking to you about adding this functionality, as Bayesian learning can be *extremely* computationally intensive. (You can contact me on here or at [email protected].)
>>>>
>>>> On the other hand, if you're more into the applied math / machine learning side, there are still a number of models to implement, mostly nonparametric versions of the ones I've already implemented; I should warn you, though, that Bayesian nonparametrics is not for the faint of heart.
>>>>
>>>> Julia is a great language, and I hope you all like it as much as I do. The speed is the big seller, of course, but perhaps its best feature is the ease with which one can dig down into the internals of the language; considering how high-level the language is, that is truly a masterstroke by its creators.
