VectorUDT and ml.Vector

Ganesh Mon, 07 Nov 2016 05:34:08 -0800

I am trying to run an SVD on a DataFrame. I used the ml TF-IDF, which produced a DataFrame. For the Singular Value Decomposition I am trying to use RowMatrix, which takes an RDD of mllib.Vector, so I have to convert this DataFrame, whose column holds what I assumed was ml.Vector.


However the conversion

val convertedTermDocMatrix = MLUtils.convertMatrixColumnsFromML(termDocMatrix, "features")


fails with

java.lang.IllegalArgumentException: requirement failed: Column features must be new Matrix type to be converted to old type but got org.apache.spark.ml.linalg.VectorUDT

So the question is: how do I perform SVD on a DataFrame? I assume not all of the functionality of mllib has been ported to ml.

I tried converting my entire project to use RDDs, but computeSVD on RowMatrix throws out-of-memory errors, and in any case I would like to stick with DataFrames.


Our text corpus is around 55 GB of text data.
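
The requirement failure above says the "features" column is a vector column (VectorUDT), whereas convertMatrixColumnsFromML expects a matrix column. A minimal sketch of the vector-column route follows, assuming MLUtils.convertVectorColumnsFromML is the suitable counterpart here. The three-document corpus, the object name SvdOnDataFrameSketch, the local master, numFeatures = 16 and k = 2 are illustrative assumptions, not values from the original setup, and the sketch does not address the out-of-memory errors on the full corpus.

```scala
import org.apache.spark.ml.feature.{HashingTF, IDF, Tokenizer}
import org.apache.spark.mllib.linalg.{Vector => OldVector}
import org.apache.spark.mllib.linalg.distributed.RowMatrix
import org.apache.spark.mllib.util.MLUtils
import org.apache.spark.sql.{Row, SparkSession}

// Hypothetical object name; a sketch, not the poster's actual pipeline.
object SvdOnDataFrameSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("svd-sketch")
      .master("local[*]") // assumption: local run for illustration
      .getOrCreate()
    import spark.implicits._

    // Tiny stand-in corpus (assumption) in place of the real text data.
    val docs = Seq(
      "spark ml vector",
      "spark mllib vector svd",
      "row matrix svd"
    ).toDF("text")

    // ml TF-IDF: the "features" column holds ml vectors (VectorUDT).
    val words = new Tokenizer()
      .setInputCol("text").setOutputCol("words").transform(docs)
    val tf = new HashingTF()
      .setInputCol("words").setOutputCol("rawFeatures")
      .setNumFeatures(16).transform(words)
    val termDocMatrix = new IDF()
      .setInputCol("rawFeatures").setOutputCol("features")
      .fit(tf).transform(tf)

    // Convert the vector column (not a matrix column) from ml to mllib.
    val converted = MLUtils.convertVectorColumnsFromML(termDocMatrix, "features")

    // Extract the mllib vectors as an RDD and wrap them in a RowMatrix.
    val rows = converted.select("features").rdd.map {
      case Row(v: OldVector) => v
    }
    val mat = new RowMatrix(rows)

    // computeU = false returns only the singular values s and the matrix V.
    val svd = mat.computeSVD(2, computeU = false)
    println(svd.s)

    spark.stop()
  }
}
```

The work still leaves the DataFrame at the RowMatrix step, since computeSVD is defined on the mllib distributed matrix types; only the conversion call differs from the attempt above.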



Ganesh
