I believe I have a similar question to this. I would like to process an
offline event stream for testing/debugging. The stream is stored in a CSV
file where each row in the file has a timestamp. I would like to feed this
file into Spark Streaming and have the concept of time be driven by the
timestamps in the file.
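One way to sketch the replay (assuming the rows are already sorted by timestamp; the column layout here is made up for illustration): bucket the rows by timestamp, then push each bucket into the stream as one batch, e.g. via StreamingContext.queueStream, so "time" advances per bucket rather than per wall clock. The bucketing step itself needs no Spark:

```python
import csv
from itertools import groupby

def batches_by_timestamp(rows):
    """Group rows that share a timestamp into one batch each.

    rows: iterable of lists whose first element is the timestamp,
    already sorted by that timestamp (e.g. csv.reader(open(path))).
    Yields (timestamp, batch) pairs; each batch could then be
    parallelize()d and appended to the queue for queueStream.
    """
    for ts, group in groupby(rows, key=lambda r: r[0]):
        yield ts, [r[1:] for r in group]
```

Each yielded batch becomes one RDD in the queue, so the streaming batches line up with the timestamps in the file instead of arrival time.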
Have you solved this issue? I'm also wondering how to stream an existing file.
--
View this message in context:
http://apache-spark-user-list.1001560.n3.nabble.com/Spark-Streaming-tp14306p16406.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.
---
Hi,
Were you able to figure out how to choose a specific Hadoop version? I'm
having the same issue.
Thanks.
--
View this message in context:
http://apache-spark-user-list.1001560.n3.nabble.com/How-could-I-start-new-spark-cluster-with-hadoop2-0-2-tp10450p15939.html
Thanks for your response, Burak; it was very helpful.
I am noticing that if I run PCA before KMeans, KMeans actually takes longer
to run than if I had run it without PCA. I was hoping that running PCA first
would speed up KMeans.
I have foll
sowen wrote
> it seems that the singular values from the SVD aren't returned, so I don't
> know that you can access this directly
It's not clear to me why these aren't returned. The S matrix would be useful
for determining a reasonable value for k.
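For reference, the selection step the S matrix would enable is small once you have the singular values: the variance explained by component i is proportional to s_i squared, so you can pick the smallest k covering some fraction of the total (the 0.95 default here is just an example threshold):

```python
def choose_k(singular_values, threshold=0.95):
    """Smallest k whose leading components explain >= threshold
    of the total variance, given singular values in decreasing order.
    """
    total = sum(s * s for s in singular_values)
    running = 0.0
    for k, s in enumerate(singular_values, start=1):
        running += s * s
        if running / total >= threshold:
            return k
    return len(singular_values)
```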
I would like to reduce the dimensionality of my data before running KMeans.
The problem I'm having is that both RowMatrix.computePrincipalComponents()
and RowMatrix.computeSVD() return a DenseMatrix, whereas KMeans.train()
requires an RDD[Vector]. Does MLlib provide a way to do this conversion?
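If I remember the API correctly, one route is RowMatrix.multiply(pc), which returns another RowMatrix whose .rows field is the RDD[Vector] that KMeans.train() wants. The projection it performs is just a per-row matrix-vector product; a plain-Python sketch of that operation, no Spark involved:

```python
def project_row(row, components):
    """Project one data row onto principal components.

    components is a list of column vectors, mirroring the n x k
    matrix computePrincipalComponents returns; the result is the
    k-dimensional reduced representation of the row.
    """
    return [sum(x * c for x, c in zip(row, col)) for col in components]
```

In Spark this happens distributed across the RDD's rows; the sketch only shows what each row undergoes.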
---
Not sure if you figured this out, but I had a similar issue and resolved it.
In my case, the problem was that the ids of my items were of type Long and
could be very large (even though there are only a small number of distinct
ids, maybe a few hundred of them). KMeans will create a dense vector for the
cluster centers.
Does MLlib provide utility functions to do this kind of encoding?
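I'm not aware of a built-in for this at the time of this thread (later releases add a OneHotEncoder under spark.ml), but the encoding is small enough to roll by hand: index the distinct ids first so a few hundred distinct Long values produce vectors of length a-few-hundred rather than vectors indexed by the raw id values. A sketch:

```python
def one_hot(ids):
    """Encode arbitrary (possibly huge) ids as compact one-hot vectors.

    Builds an index over the distinct ids, then emits one vector per
    input id with a single 1.0 at that id's index position.
    """
    index = {v: i for i, v in enumerate(sorted(set(ids)))}
    dim = len(index)
    out = []
    for v in ids:
        vec = [0.0] * dim
        vec[index[v]] = 1.0
        out.append(vec)
    return out
```

In Spark the index would be collected once on the driver and broadcast, with the per-row vector construction done in a map over the RDD.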
--
View this message in context:
http://apache-spark-user-list.1001560.n3.nabble.com/Categorical-Features-for-K-Means-Clustering-tp9416p14394.html