I am trying to run SVD on a DataFrame. I used the ml TF-IDF, which produced a DataFrame. For the Singular Value Decomposition I am trying to use RowMatrix, which takes an RDD of mllib.Vector, so I have to convert the DataFrame column, which I assumed held ml.Vector values.

However the conversion

    val convertedTermDocMatrix = MLUtils.convertMatrixColumnsFromML(termDocMatrix, "features")

fails with

java.lang.IllegalArgumentException: requirement failed: Column features must be new Matrix type to be converted to old type but got org.apache.spark.ml.linalg.VectorUDT
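
The message says the column is a VectorUDT rather than a Matrix type, so the vector variant of the helper, MLUtils.convertVectorColumnsFromML, may be what is needed here. Below is a minimal sketch, not a tested solution. It assumes Spark 2.x, that termDocMatrix is the TF-IDF DataFrame above with a "features" column, and that toRowMatrix is a hypothetical helper name. The value k = 50 is an arbitrary placeholder.

```scala
import org.apache.spark.mllib.linalg.{Vector => OldVector}
import org.apache.spark.mllib.linalg.distributed.RowMatrix
import org.apache.spark.mllib.util.MLUtils
import org.apache.spark.sql.DataFrame

// Hypothetical helper: turn an ml.Vector column into a RowMatrix.
def toRowMatrix(df: DataFrame, col: String = "features"): RowMatrix = {
  // The *Vector* variant accepts VectorUDT columns; the *Matrix* variant
  // only accepts MatrixUDT columns, which is why the call above fails.
  val converted = MLUtils.convertVectorColumnsFromML(df, col)

  // Pull the (now mllib) vectors out of the DataFrame as an RDD.
  val rows = converted.select(col).rdd.map(_.getAs[OldVector](0))
  new RowMatrix(rows)
}

// Assumption: termDocMatrix is the DataFrame produced by the TF-IDF stage.
val mat = toRowMatrix(termDocMatrix)

// k = 50 is a placeholder; computeU = false skips materialising U.
val svd = mat.computeSVD(k = 50, computeU = false)
val singularValues = svd.s // mllib Vector of the top-k singular values
val rightVectors = svd.V   // local mllib Matrix, one column per singular vector
```

This still goes through RowMatrix, so it does not by itself address the memory usage of computeSVD.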


So the question is: how do I perform SVD on a DataFrame? I assume that not all of the mllib functionality has been ported to ml yet.


I tried converting my entire project to use RDDs, but computeSVD on RowMatrix throws out-of-memory errors, and in any case I would prefer to stick with DataFrames.

Our text corpus is around 55 GB.



Ganesh
