The reason behind this error can be inferred from the error log: *MLUtils.convertMatrixColumnsFromML* converts ml.linalg.Matrix columns to mllib.linalg.Matrix, but the column type in your case appears to be ml.linalg.Vector. Could you check the type of the column "features" in your DataFrame (Vector or Matrix)? I think it's ml.linalg.Vector, so you should use *MLUtils.convertVectorColumnsFromML* instead.
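A minimal sketch of what that could look like, assuming Spark 2.x and that "features" really is an ml.linalg.Vector column. The helper name svdOfFeatures and the parameter k are made up for illustration; termDocMatrix is the DataFrame from your message.

```scala
import org.apache.spark.mllib.linalg.{Matrix, SingularValueDecomposition, Vector => OldVector}
import org.apache.spark.mllib.linalg.distributed.RowMatrix
import org.apache.spark.mllib.util.MLUtils
import org.apache.spark.sql.DataFrame

// Hypothetical helper: converts the ml "features" column to the old mllib
// vector type and runs SVD on it. k is the number of singular values to keep.
def svdOfFeatures(termDocMatrix: DataFrame, k: Int): SingularValueDecomposition[RowMatrix, Matrix] = {
  // Check the column type first; a VectorUDT here means ml.linalg.Vector.
  termDocMatrix.printSchema()

  // Vector columns need the *Vector* variant of the converter,
  // not convertMatrixColumnsFromML.
  val converted = MLUtils.convertVectorColumnsFromML(termDocMatrix, "features")

  // RowMatrix expects an RDD of mllib.linalg.Vector.
  val rows = converted.select("features").rdd.map(_.getAs[OldVector]("features"))
  val mat = new RowMatrix(rows)

  // computeU = false skips materializing U and returns only s and V.
  mat.computeSVD(k, computeU = false)
}
```

This only addresses the type conversion. It may not help with the out-of-memory errors you saw in computeSVD, which depend on the number of columns and on k.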
Thanks
Yanbo

On Mon, Nov 7, 2016 at 5:25 AM, Ganesh <m...@ganeshkrishnan.com> wrote:

> I am trying to run a SVD on a dataframe and I have used ml TF-IDF which
> has created a dataframe.
> Now for Singular Value Decomposition I am trying to use RowMatrix which
> takes in RDD with mllib.Vector so I have to convert this Dataframe with
> what I assumed was ml.Vector
>
> However the conversion
>
> *val convertedTermDocMatrix =
> MLUtils.convertMatrixColumnsFromML(termDocMatrix,"features")*
>
> fails with
>
> java.lang.IllegalArgumentException: requirement failed: Column features
> must be new Matrix type to be converted to old type but got
> org.apache.spark.ml.linalg.VectorUDT
>
>
> So the question is: How do I perform SVD on a DataFrame? I assume all the
> functionalities of mllib has not be ported to ml.
>
>
> I tried to convert my entire project to use RDD but computeSVD on
> RowMatrix is throwing up out of Memory errors and anyway I would like to
> stick with DataFrame.
>
> Our text corpus is around 55 Gb of text data.
>
>
>
> Ganesh
>