I am trying to run an SVD on a DataFrame. I used the ml TF-IDF, which
produced the DataFrame.
For the singular value decomposition I am trying to use RowMatrix, which
takes an RDD of mllib.Vector, so I have to convert this DataFrame, which
I assumed holds ml.Vector.
However, the conversion
val convertedTermDocMatrix =
  MLUtils.convertMatrixColumnsFromML(termDocMatrix, "features")
fails with
java.lang.IllegalArgumentException: requirement failed: Column features
must be new Matrix type to be converted to old type but got
org.apache.spark.ml.linalg.VectorUDT
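
The message says the features column is a VectorUDT, that is, it holds
vectors rather than matrices. A minimal sketch of the vector variant of
the same conversion follows, assuming Spark 2.x, where
MLUtils.convertVectorColumnsFromML is available. The object SvdSketch
and the helper topSingularValues are hypothetical names used only for
illustration. The sketch still goes through RowMatrix and does nothing
about memory use.

```scala
import org.apache.spark.mllib.linalg.{Vector => OldVector}
import org.apache.spark.mllib.linalg.distributed.RowMatrix
import org.apache.spark.mllib.util.MLUtils
import org.apache.spark.rdd.RDD
import org.apache.spark.sql.DataFrame

object SvdSketch {
  // Hypothetical helper (not part of Spark): returns up to k singular
  // values of the matrix whose rows are the vectors stored in column `col`.
  def topSingularValues(termDocMatrix: DataFrame, col: String, k: Int): Array[Double] = {
    // Convert the ml.linalg vector column to the old mllib.linalg vector type.
    val converted = MLUtils.convertVectorColumnsFromML(termDocMatrix, col)

    // Pull the converted column out as an RDD of mllib vectors.
    val rows: RDD[OldVector] =
      converted.select(col).rdd.map(_.getAs[OldVector](0))

    // RowMatrix is the mllib distributed matrix that exposes computeSVD.
    // computeU = false skips materialising the left singular vectors.
    val svd = new RowMatrix(rows).computeSVD(k, computeU = false)
    svd.s.toArray
  }
}
```

With the names used in this post, the call would be
SvdSketch.topSingularValues(termDocMatrix, "features", k) for some
chosen k.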
So the question is: how do I perform SVD on a DataFrame? I assume not all
of the mllib functionality has been ported to ml.
I tried converting my entire project to use RDDs, but computeSVD on
RowMatrix throws out-of-memory errors, and in any case I would like to
stick with DataFrames.
Our corpus is around 55 GB of text.
Ganesh