[
https://issues.apache.org/jira/browse/SPARK-17847?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Yanbo Liang updated SPARK-17847:
--------------------------------
Description:
Copy the {{GaussianMixture}} implementation from mllib to ml so that we can add
new features to it.
I left the mllib {{GaussianMixture}} untouched rather than making it wrap the
ml implementation, as was done for some other algorithms, for the following
reasons:
* mllib {{GaussianMixture}} allows k == 1, but ml does not.
* mllib {{GaussianMixture}} supports setting an initial model, but ml currently
does not. (We will definitely add this feature to ml in the future.)
This task also brings a big performance improvement for {{GaussianMixture}}.
Since the covariance matrix of a multivariate Gaussian distribution is
symmetric, we only need to store the upper triangular part of the matrix, which
greatly reduces the shuffled data size.
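
To illustrate the storage argument, here is a minimal sketch in plain Python. It
is not the code from this patch; the helper names ({{pack_upper}},
{{unpack_upper}}, {{packed_size}}) and the column-by-column packing order are
assumptions made for the example. For d features, a symmetric d x d covariance
matrix has only d(d+1)/2 distinct entries instead of d^2, so packing the upper
triangle roughly halves what has to be sent for each Gaussian.

```python
def packed_size(n):
    """Number of values in the upper triangle (diagonal included) of an n x n matrix."""
    return n * (n + 1) // 2


def pack_upper(matrix):
    """Flatten the upper triangle of a square matrix, column by column.

    The packing order is an assumption for this sketch; any fixed order works
    as long as packing and unpacking agree.
    """
    n = len(matrix)
    return [matrix[i][j] for j in range(n) for i in range(j + 1)]


def unpack_upper(packed, n):
    """Rebuild the full symmetric n x n matrix from its packed upper triangle."""
    if len(packed) != packed_size(n):
        raise ValueError("packed length does not match matrix size")
    full = [[0.0] * n for _ in range(n)]
    k = 0
    for j in range(n):
        for i in range(j + 1):
            # Mirror each stored entry across the diagonal.
            full[i][j] = packed[k]
            full[j][i] = packed[k]
            k += 1
    return full


# Hypothetical 3 x 3 covariance matrix, used only for illustration.
cov = [
    [4.0, 1.0, 0.5],
    [1.0, 3.0, 0.2],
    [0.5, 0.2, 2.0],
]
packed = pack_upper(cov)        # 6 values instead of 9
restored = unpack_upper(packed, 3)
```

The round trip is lossless only because the matrix is symmetric; the relative
saving approaches one half as the number of features grows.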
> Reduce shuffled data size of GaussianMixture & copy the implementation from
> mllib to ml
> ---------------------------------------------------------------------------------------
>
> Key: SPARK-17847
> URL: https://issues.apache.org/jira/browse/SPARK-17847
> Project: Spark
> Issue Type: Improvement
> Components: ML, MLlib
> Reporter: Yanbo Liang
>
> Copy the {{GaussianMixture}} implementation from mllib to ml so that we can
> add new features to it.
> I left the mllib {{GaussianMixture}} untouched rather than making it wrap the
> ml implementation, as was done for some other algorithms, for the following
> reasons:
> * mllib {{GaussianMixture}} allows k == 1, but ml does not.
> * mllib {{GaussianMixture}} supports setting an initial model, but ml
> currently does not. (We will definitely add this feature to ml in the future.)
> This task also brings a big performance improvement for {{GaussianMixture}}.
> Since the covariance matrix of a multivariate Gaussian distribution is
> symmetric, we only need to store the upper triangular part of the matrix,
> which greatly reduces the shuffled data size.