Hi Sameer,
MLbase started out as a set of three ML components on top of Spark. The
lowest level, MLlib, is now a rapidly growing component within Spark and is
maintained by the Spark community. The two higher-level components (MLI and
MLOpt) are experimental components that serve as testbeds for
To add to the last point, multimodel training is something we've explored
as part of the MLbase Optimizer, and we've seen some nice speedups. This
feature will be added to MLlib soon (not sure if it'll make it into the 1.1
release though).
On Sat, Jul 26, 2014 at 11:27 PM, Matei Zaharia wrote:
Hi Pedro,
Yes: GBM (and other ensembles of decision trees) are currently under
active development, although they will probably not be included in the
next release (since the code freeze is ~2 weeks away). We're hoping
they'll make it into the subsequent release.
-Ameet
On Wed, Jul 16, 2014 at
Hi Joseph,
Thanks for your email. Many users have requested this functionality. While
it would be a stretch for it to appear in Spark 1.1, various people
(including Manish Amde and folks at the AMPLab, Databricks and Alpine Labs)
are actively working on developing ensembles of decision trees (rand
Hi Wanda,
As Sean mentioned, K-means is not guaranteed to find an optimal answer,
even for seemingly simple toy examples. A common heuristic for dealing with
this issue is to run k-means multiple times and choose the best answer. You
can do this by changing the runs parameter from the default value (1
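The multi-restart heuristic behind that parameter can be sketched in plain Python. This is an illustration of the idea only, not the Spark MLlib API: the function names (`kmeans_once`, `kmeans_best_of`) and the toy data are made up for the example.

```python
# Sketch of multi-restart k-means: run Lloyd's algorithm several times
# from different random initializations and keep the clustering with the
# lowest cost (sum of squared distances to the nearest center).
import random

def kmeans_once(points, k, seed, iters=20):
    rng = random.Random(seed)
    centers = rng.sample(points, k)  # random initialization
    for _ in range(iters):
        # Assignment step: group each point with its nearest center.
        clusters = [[] for _ in range(k)]
        for p in points:
            i = min(range(k),
                    key=lambda i: (p[0] - centers[i][0]) ** 2
                                  + (p[1] - centers[i][1]) ** 2)
            clusters[i].append(p)
        # Update step: move each center to its cluster mean
        # (keep the old center if the cluster went empty).
        for i, c in enumerate(clusters):
            if c:
                centers[i] = (sum(p[0] for p in c) / len(c),
                              sum(p[1] for p in c) / len(c))
    cost = sum(min((p[0] - cx) ** 2 + (p[1] - cy) ** 2
                   for cx, cy in centers)
               for p in points)
    return centers, cost

def kmeans_best_of(points, k, runs):
    # The heuristic: repeat with different seeds, keep the best answer.
    results = [kmeans_once(points, k, seed) for seed in range(runs)]
    return min(results, key=lambda r: r[1])

points = [(0.0, 0.0), (0.1, 0.2), (5.0, 5.0),
          (5.1, 4.9), (9.0, 0.0), (9.2, 0.1)]
centers, cost = kmeans_best_of(points, 2, runs=10)
```

Because each restart is independent, the best-of-n cost can only match or improve on any single run, which is why raising the number of runs helps on problems where a single random initialization often lands in a poor local optimum.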