Hi Robert,

There's some existing work on doing LDA via Gibbs sampling in this JIRA: https://issues.apache.org/jira/browse/SPARK-1405 as well as in this one: https://issues.apache.org/jira/browse/SPARK-5556

It may make sense to have a more general Gibbs sampling framework, but it might be good to have a few desired applications in mind (e.g. higher level models that rely on Gibbs) to help API design, parallelization strategy, etc.

See the guide (https://cwiki.apache.org/confluence/display/SPARK/Contributing+to+Spark#ContributingtoSpark-ContributingNewAlgorithmstoMLLib) for information about contributing to MLlib.

- Evan

On Tue, Mar 3, 2015 at 5:51 PM, Robert Dodier <robert.dod...@gmail.com> wrote:
> Hi,
>
> I have some ideas for MLlib that I think might be of general interest
> so I'd like to see what people think and maybe find some collaborators.
>
> (1) Some form of Markov chain Monte Carlo such as Gibbs sampling
> or Metropolis-Hastings. Any kind of Monte Carlo method is readily
> parallelized so Spark seems like a natural platform for them.
> MCMC plays an important role in computational implementations
> of Bayesian inference.
>
> (2) A function to compute the calibration of a probabilistic classifier.
> The question this answers is, if the classifier outputs 0.x for some
> group of examples, is the actual proportion approximately 0.x ?
> This is useful to know if the classifier outputs are used to compute
> expected loss in some decision procedure.
>
> Of course (1) is much bigger than (2). Perhaps (2) is a one-person
> job but (1) will take a lot of teamwork. I am thinking that in the short
> term, we could at least make some progress on an outline or
> framework for (1).
>
> I am a newcomer to Scala and Spark but I have a lot of experience
> in statistical computing. I am thinking that maybe one or the other
> of these projects will be a good way for me to learn more about
> Spark and make a useful contribution. Thanks for your interest
> and I look forward to your comments.
>
> Robert Dodier
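
For item (2) in the quoted message, a minimal sketch of the calibration check might look like the following. It is plain Python rather than Spark code, and the function name `calibration_table`, the `n_bins` parameter, and the choice of equal-width bins are all assumptions made for illustration; nothing here is part of MLlib.

```python
from collections import defaultdict


def calibration_table(scores, labels, n_bins=10):
    """Compare predicted probabilities with observed outcomes, bin by bin.

    scores: predicted probabilities in [0, 1]
    labels: observed outcomes, 0 or 1
    Returns one tuple per non-empty bin:
        (bin index, mean predicted probability, observed positive rate, count)
    """
    if len(scores) != len(labels):
        raise ValueError("scores and labels must have the same length")

    # bin index -> [sum of scores, number of positives, number of examples]
    acc = defaultdict(lambda: [0.0, 0, 0])
    for s, y in zip(scores, labels):
        # Equal-width bins; a score of exactly 1.0 goes into the last bin.
        b = min(int(s * n_bins), n_bins - 1)
        acc[b][0] += s
        acc[b][1] += y
        acc[b][2] += 1

    return [
        (b, total / n, positives / n, n)
        for b, (total, positives, n) in sorted(acc.items())
    ]


if __name__ == "__main__":
    # Hypothetical toy data, only to show the shape of the output.
    scores = [0.15, 0.15, 0.85, 0.85]
    labels = [0, 0, 1, 1]
    for row in calibration_table(scores, labels):
        print(row)
```

A well-calibrated classifier would show the second and third columns roughly equal in every bin. Each bin only needs running sums and counts, so the same aggregation should in principle map onto a distributed reduce by key, though that is not shown here.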