Hi Robert,

There's some work in progress on LDA via Gibbs sampling in this JIRA:
https://issues.apache.org/jira/browse/SPARK-1405 as well as this one:
https://issues.apache.org/jira/browse/SPARK-5556

It may make sense to have a more general Gibbs sampling framework, but it
might be good to have a few desired applications in mind (e.g. higher level
models that rely on Gibbs) to help API design, parallelization strategy,
etc.

See the guide (
https://cwiki.apache.org/confluence/display/SPARK/Contributing+to+Spark#ContributingtoSpark-ContributingNewAlgorithmstoMLLib)
for information about contributing to MLlib.

- Evan

On Tue, Mar 3, 2015 at 5:51 PM, Robert Dodier <robert.dod...@gmail.com>
wrote:

> Hi,
>
> I have some ideas for MLlib that I think might be of general interest
> so I'd like to see what people think and maybe find some collaborators.
>
> (1) Some form of Markov chain Monte Carlo such as Gibbs sampling
> or Metropolis-Hastings. Any kind of Monte Carlo method is readily
> parallelized so Spark seems like a natural platform for them.
> MCMC plays an important role in computational implementations
> of Bayesian inference.

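To make the parallelization point in (1) concrete, here is a minimal sketch in plain Python, not Spark or MLlib code. It assumes the simplest strategy: several independent random-walk Metropolis-Hastings chains with different seeds, pooled afterwards. The names `log_target` and `run_chain`, the standard normal target, and the step size and burn-in values are all illustrative assumptions, not anything proposed in this thread.

```python
import math
import random


def log_target(x):
    # Unnormalized log-density of a standard normal (illustrative target only).
    return -0.5 * x * x


def run_chain(seed, n_steps=5000, step_size=1.0, burn_in=500):
    """Run one random-walk Metropolis-Hastings chain; return post-burn-in samples."""
    rng = random.Random(seed)
    x = 0.0
    samples = []
    for i in range(n_steps):
        proposal = x + rng.gauss(0.0, step_size)
        # The proposal is symmetric, so the acceptance ratio is just the
        # ratio of target densities, capped at 1.
        accept_prob = math.exp(min(0.0, log_target(proposal) - log_target(x)))
        if rng.random() < accept_prob:
            x = proposal
        if i >= burn_in:
            samples.append(x)
    return samples


# Each chain depends only on its seed, so this loop is the part that could
# be distributed, for example one chain per task.
chains = [run_chain(seed) for seed in range(4)]
pooled = [x for chain in chains for x in chain]
pooled_mean = sum(pooled) / len(pooled)
pooled_var = sum((x - pooled_mean) ** 2 for x in pooled) / len(pooled)
```

Steps within a single chain are still sequential; only the chains run independently of one another. A Gibbs sampler for a model such as LDA would need a different strategy, since there the updates within one chain are what has to be split up.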

> (2) A function to compute the calibration of a probabilistic classifier.
> The question this answers is, if the classifier outputs 0.x for some
> group of examples, is the actual proportion approximately 0.x ?
> This is useful to know if the classifier outputs are used to compute
> expected loss in some decision procedure.
>
> Of course (1) is much bigger than (2). Perhaps (2) is a one-person
> job but (1) will take a lot of teamwork. I am thinking that in the short
> term, we could at least make some progress on an outline or
> framework for (1).
>
> I am a newcomer to Scala and Spark but I have a lot of experience
> in statistical computing. I am thinking that maybe one or the other
> of these projects will be a good way for me to learn more about
> Spark and make a useful contribution. Thanks for your interest
> and I look forward to your comments.
>
> Robert Dodier

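For (2), the calibration check described in the quoted message can be sketched roughly as follows. This is plain Python and not an MLlib API. The function name `calibration_table`, the equal-width binning, and the default of 10 bins are assumptions made for illustration. For each group of examples with similar predicted probability, it compares the mean prediction with the observed fraction of positives.

```python
def calibration_table(probs, labels, n_bins=10):
    """Bin predictions into equal-width bins over [0, 1].

    Returns one tuple per non-empty bin:
    (mean predicted probability, observed positive fraction, count).
    """
    sums = [0.0] * n_bins
    positives = [0] * n_bins
    counts = [0] * n_bins
    for p, y in zip(probs, labels):
        b = min(int(p * n_bins), n_bins - 1)  # p == 1.0 falls in the last bin
        sums[b] += p
        positives[b] += y
        counts[b] += 1
    return [
        (sums[b] / counts[b], positives[b] / counts[b], counts[b])
        for b in range(n_bins)
        if counts[b] > 0
    ]


# Toy data: four examples scored 0.15 (one positive) and
# four examples scored 0.85 (three positive).
probs = [0.15, 0.15, 0.15, 0.15, 0.85, 0.85, 0.85, 0.85]
labels = [0, 0, 0, 1, 1, 1, 1, 0]
table = calibration_table(probs, labels)
```

A well-calibrated classifier gives rows where the first two entries are close. In the toy data the group scored 0.15 has an observed rate of 0.25, and the group scored 0.85 has an observed rate of 0.75. On a large data set the per-bin sums and counts are simple aggregates, so the same computation should map naturally onto a distributed reduce.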