Thank you Ram and Joseph.
I am also hoping to contribute to MLlib once my Scala gets up to snuff; this
is the guidance I needed for how to proceed when ready.
Best wishes,
Trevor
On Wed, May 20, 2015 at 1:55 PM, Joseph Bradley
wrote:
Hi Trevor,
I may be repeating what Ram said, but to second it, a few points:
We do want MLlib to become an extensive and rich ML library; as you said,
scikit-learn is a great example. To make that happen, we of course need to
include important algorithms. "Important" is hazy, but roughly means bei
Hi Trevor
I'm attaching the MLlib contribution guidelines here:
https://cwiki.apache.org/confluence/display/SPARK/Contributing+to+Spark#ContributingtoSpark-MLlib-specificContributionGuidelines
It speaks to widely known and accepted algorithms but not to whether an
algorithm has to be better than
Hi Trevor
Good point, I didn't mean that some algorithm has to be clearly better than
another in every scenario to be included in MLlib. However, even if someone
is willing to be the maintainer of a piece of code, it does not make sense
to accept every possible algorithm into the core library.
Th
Hey Ram,
I'm not speaking to Tarek's package specifically but to the spirit of
MLlib. There are a number of methods/algorithms for PCA, and I'm not sure by
what criterion the current one is considered 'standard'.
It is rare to find ANY machine learning algo that is 'clearly better' than
any other. Th
Hi Trevor, Tarek
You can make non-standard algorithms (PCA or otherwise) available to users of
Spark as Spark Packages.
http://spark-packages.org
https://databricks.com/blog/2014/12/22/announcing-spark-packages.html
With the availability of Spark Packages, adding powerful experimental /
alternative m
There are most likely advantages and disadvantages to Tarek's algorithm
against the current implementation, and different scenarios where each is
more appropriate.
Would we not offer multiple PCA algorithms and let the user choose?
Trevor
Trevor Grant
Data Scientist
*"Fortunate is he, who is a
Hi Tarek,
Thanks for your interest & for checking the guidelines first! On 2 points:
Algorithm: PCA is of course a critical algorithm. The main question is how
your algorithm/implementation differs from the current PCA. If it's
different and potentially better, I'd recommend opening up a JIRA