Hi Raj,
If you choose JSON as the storage format, Spark SQL stores VectorUDT as an
array of doubles, so when you load it back into memory it cannot be
recognized as a Vector.
One workaround is to store the DataFrame in Parquet format; it will then be
loaded and recognized as expected.
df.write.format("parquet").save(path)
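A minimal sketch of the Parquet round-trip described above, using the same Spark 1.x APIs as the thread (the output path and column names here are hypothetical examples, not from the original messages):

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.SQLContext
import org.apache.spark.mllib.linalg.Vectors

object VectorIOSketch {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("VectorIOSketch").setMaster("local[*]")
    val sc = new SparkContext(conf)
    val sqlContext = new SQLContext(sc)
    import sqlContext.implicits._

    // A small DataFrame with a VectorUDT column, mirroring the setup in the thread.
    val df = Seq(
      (0.0, Vectors.dense(1.0, 2.0)),
      (1.0, Vectors.dense(3.0, 4.0))
    ).toDF("label", "features")

    // Parquet preserves the VectorUDT schema; "/tmp/vector-io" is a hypothetical path.
    df.write.format("parquet").mode("overwrite").save("/tmp/vector-io")

    // On reload, "features" comes back as a vector, not array<double>.
    val loaded = sqlContext.read.parquet("/tmp/vector-io")
    loaded.printSchema()

    sc.stop()
  }
}
```

With a JSON write/read in place of Parquet, the reloaded schema would instead show `array<double>` for the features column, which is the mismatch Raj is hitting.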
Thanks for the response Yanbo. Here is the source (it uses the
sample_libsvm_data.txt file from the mllib examples).
-Raj
-- IOTest.scala --
import org.apache.spark.{SparkConf,SparkContext}
import org.apache.spark.sql.SQLContext
import org.apache.spark.sql.DataFrame
object IOTest
Hi Raj,
Could you share your code? It would help others diagnose this issue.
Which version did you use?
I cannot reproduce this problem in my environment.
Thanks
Yanbo
2016-02-26 10:49 GMT+08:00 raj.kumar:
> Hi,
>
> I am using mllib. I use the ml vectorization tools to create the vectorize