bytes (8 bytes per double)
You should try my new generalized k-means clustering package
<https://github.com/derrickburns/generalized-kmeans-clustering>, which
works on high-dimensional sparse data.
You will want to use the RandomIndexing embedding:
def sparseTrain(raw: RDD[
Here is a spark challenge for you!
I have a data set where each entry has a date. I would like to identify
gaps in the dates larger than a given length. For example, if the data
were log entries, then the gaps would tell me when I was missing log data
for long periods of time. What is the mos
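
One possible way to approach the gap question, as a minimal sketch rather than a definitive answer: sort the dates, pair each date with its successor, and keep the pairs that are further apart than the threshold. The sketch below uses plain Scala collections so it runs without a cluster; the names `DateGaps`, `Gap`, `findGaps` and `maxGapDays` are made up for illustration. On an RDD the same comparison would first need a global sort (for example with `sortBy`) so that neighbouring dates end up adjacent, and pairs that straddle partition boundaries would need separate handling.

```scala
import java.time.LocalDate
import java.time.temporal.ChronoUnit

object DateGaps {
  /** A gap between two consecutive observed dates (hypothetical helper type). */
  final case class Gap(start: LocalDate, end: LocalDate, days: Long)

  /** Returns every pair of neighbouring dates that are more than `maxGapDays` apart. */
  def findGaps(dates: Seq[LocalDate], maxGapDays: Long): Seq[Gap] = {
    // Sort chronologically and drop duplicates so that neighbours are adjacent.
    val sorted = dates.distinct.sortBy(_.toEpochDay)
    // Pair each date with its successor and keep only the long gaps.
    sorted.zip(sorted.drop(1)).collect {
      case (a, b) if ChronoUnit.DAYS.between(a, b) > maxGapDays =>
        Gap(a, b, ChronoUnit.DAYS.between(a, b))
    }
  }

  def main(args: Array[String]): Unit = {
    // Illustrative data only: four log dates with one week-long hole.
    val dates = Seq("2014-01-01", "2014-01-02", "2014-01-10", "2014-01-11")
      .map(s => LocalDate.parse(s))
    findGaps(dates, maxGapDays = 3).foreach(println)
  }
}
```

With the sample dates, only the jump from 2014-01-02 to 2014-01-10 exceeds the three-day threshold; the other neighbours are one day apart.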
This project generalizes the Spark MLlib K-Means clusterer to support
clustering of dense or sparse, low- or high-dimensional data using distance
functions defined by Bregman divergences.
https://github.com/derrickburns/generalized-kmeans-clustering
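
For readers unfamiliar with the term: the Bregman divergence of a differentiable convex function F is D_F(p, q) = F(p) - F(q) - <grad F(q), p - q>. The sketch below only illustrates that definition and is not the package's API; `BregmanSketch`, `Convex`, `SquaredNorm` and `NegEntropy` are hypothetical names. Choosing F(x) = sum of x_i^2 recovers squared Euclidean distance, and F(x) = sum of x_i log x_i (for positive x_i) recovers the generalized Kullback-Leibler divergence.

```scala
object BregmanSketch {
  /** Hypothetical trait (not from the package): a convex function F and its gradient. */
  trait Convex {
    def f(x: Array[Double]): Double
    def grad(x: Array[Double]): Array[Double]
  }

  /** D_F(p, q) = F(p) - F(q) - <grad F(q), p - q> */
  def divergence(c: Convex)(p: Array[Double], q: Array[Double]): Double = {
    require(p.length == q.length, "points must have the same dimension")
    val g = c.grad(q)
    val inner = g.indices.map(i => g(i) * (p(i) - q(i))).sum
    c.f(p) - c.f(q) - inner
  }

  /** F(x) = sum x_i^2 yields squared Euclidean distance. */
  object SquaredNorm extends Convex {
    def f(x: Array[Double]): Double = x.map(v => v * v).sum
    def grad(x: Array[Double]): Array[Double] = x.map(v => 2.0 * v)
  }

  /** F(x) = sum x_i log x_i (all x_i > 0) yields generalized KL divergence. */
  object NegEntropy extends Convex {
    def f(x: Array[Double]): Double = x.map(v => v * math.log(v)).sum
    def grad(x: Array[Double]): Array[Double] = x.map(v => math.log(v) + 1.0)
  }
}
```

For example, `BregmanSketch.divergence(BregmanSketch.SquaredNorm)(Array(1.0, 2.0), Array(3.0, 5.0))` computes the squared Euclidean distance between the two points.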