I think this is great info and context to put in the JIRA.
On Fri, Oct 19, 2018, 6:53 PM Matt Saunders wrote:
Hi Sean, thanks for your feedback. I saw this as a missing feature in the
existing PCA implementation in MLlib. I suspect the use case is a common
one: you have data from different entities (could be different users,
different locations, or different products, for example) and you need to
model the
It's OK to open a JIRA, though I generally doubt any new functionality will
be added. This might be viewed as a small, worthwhile enhancement; I haven't
looked at it. It's always more compelling if you can sketch the use case
for it and why it is more meaningful in Spark than outside it.
There is spar
Thanks, Erik. I went ahead and created SPARK-25782 for this improvement
since it is a feature that I and others have looked for in MLlib but that
doesn't seem to exist yet. Also, while searching for PCA-related issues in JIRA I
noticed that someone added grouping support for PCA to the MADlib project a
whil
For 3rd-party libs, I have been publishing independently, for example at
isarn-sketches-spark or silex:
https://github.com/isarn/isarn-sketches-spark
https://github.com/radanalyticsio/silex
Either of these repos provides some good working examples of publishing a
Spark UDAF or ML library for JVM an
Erik, is there a current location for approved/recommended third-party
additions? spark-packages seems to have been stale for years.
On Fri, Oct 19, 2018 at 07:06, Erik Erlandson <
eerla...@redhat.com> wrote:
Hi Matt!
There are a couple of ways to do this. If you want to submit it for inclusion
in Spark, you should start by filing a JIRA for it, and then a pull
request. Another possibility is to publish it as your own 3rd party
library, which I have done for aggregators before.
On Wed, Oct 17, 2018 at
I built an Aggregator that computes PCA on grouped datasets. I wanted to
use the PCA functions provided by MLlib, but they only work on a full
dataset, and I needed to do it on a grouped dataset (like a
RelationalGroupedDataset).
So I built a little Aggregator that can do that; here’s an example o
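
For readers following the thread, here is a rough sketch of the general idea of PCA as a per-group aggregation. It is not the Aggregator from this thread (that example is cut off above), and it does not use Spark or MLlib. It is plain Python that loosely mirrors the reduce/merge/finish shape of an aggregator: each group keeps a small buffer of sufficient statistics (row count, per-column sums, and sums of outer products), buffers can be merged across partitions, and the finish step derives the covariance matrix and its leading principal component. All names here (`PcaBuffer`, `first_component`, `grouped_pca`) are hypothetical, and power iteration is a stand-in for a full eigendecomposition.

```python
import math
from collections import defaultdict


class PcaBuffer:
    """Per-group running statistics: row count, column sums, sums of outer products."""

    def __init__(self, dim):
        self.n = 0
        self.sums = [0.0] * dim
        self.outer = [[0.0] * dim for _ in range(dim)]

    def reduce(self, row):
        # Fold one input row into the buffer.
        self.n += 1
        for i, xi in enumerate(row):
            self.sums[i] += xi
            for j, xj in enumerate(row):
                self.outer[i][j] += xi * xj
        return self

    def merge(self, other):
        # Combine two partial buffers for the same group (e.g. from two partitions).
        self.n += other.n
        for i in range(len(self.sums)):
            self.sums[i] += other.sums[i]
            for j in range(len(self.sums)):
                self.outer[i][j] += other.outer[i][j]
        return self

    def covariance(self):
        # Sample covariance recovered from the accumulated statistics.
        if self.n < 2:
            raise ValueError("need at least two rows per group")
        d = len(self.sums)
        mean = [s / self.n for s in self.sums]
        return [
            [(self.outer[i][j] - self.n * mean[i] * mean[j]) / (self.n - 1)
             for j in range(d)]
            for i in range(d)
        ]


def first_component(cov, iterations=200):
    """Leading eigenvector of cov by power iteration.

    Limitation of this sketch: the all-ones start vector fails if it happens
    to be orthogonal to the leading eigenvector.
    """
    d = len(cov)
    v = [1.0] * d
    for _ in range(iterations):
        w = [sum(cov[i][j] * v[j] for j in range(d)) for i in range(d)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0.0:
            break
        v = [x / norm for x in w]
    return v


def grouped_pca(rows, dim):
    """rows: iterable of (group_key, vector). Returns {group_key: first component}."""
    buffers = defaultdict(lambda: PcaBuffer(dim))
    for key, vec in rows:
        buffers[key].reduce(vec)
    return {key: first_component(buf.covariance()) for key, buf in buffers.items()}


if __name__ == "__main__":
    data = [
        ("a", [0.0, 0.0]), ("a", [1.0, 1.0]), ("a", [2.0, 2.0]),
        ("b", [0.0, 0.0]), ("b", [1.0, 0.0]), ("b", [2.0, 0.0]),
    ]
    for key, component in sorted(grouped_pca(data, dim=2).items()):
        print(key, component)
```

Because the buffer holds only sums, it stays the same size however many rows a group has, which is what makes the computation expressible as an aggregation over grouped data. The sum-of-products covariance formula can lose precision when values are large relative to their spread, so a real implementation would probably want a more numerically careful update.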