Re: Saving and Loading Dataframes

2016-02-28 Thread Yanbo Liang
Hi Raj, If you choose JSON as the storage format, Spark SQL stores a VectorUDT column as an array of doubles, so when you load it back into memory it cannot be recognized as a Vector. One workaround is to store the DataFrame in Parquet format, which is loaded and recognized as expected. df.write.format("parqu
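
A minimal sketch of the Parquet round trip described in the reply above, assuming Spark 1.6 with the mllib Vector type. The object name, the sample rows, and the output path /tmp/vectors.parquet are hypothetical and not taken from the original messages.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.SQLContext
import org.apache.spark.mllib.linalg.Vectors

// Hypothetical example object; not the IOTest.scala from this thread.
object ParquetRoundTrip {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("ParquetRoundTrip").setMaster("local[2]")
    val sc = new SparkContext(conf)
    val sqlContext = new SQLContext(sc)
    import sqlContext.implicits._

    // Small made-up DataFrame with a Vector column (backed by VectorUDT).
    val df = sc.parallelize(Seq(
      (0.0, Vectors.dense(1.0, 2.0)),
      (1.0, Vectors.dense(3.0, 4.0))
    )).toDF("label", "features")

    // Assumed path, for illustration only.
    val path = "/tmp/vectors.parquet"

    // Write as Parquet, which keeps the user-defined type in the schema.
    df.write.format("parquet").mode("overwrite").save(path)

    // Read it back and print the schema to inspect the type of "features".
    val loaded = sqlContext.read.format("parquet").load(path)
    loaded.printSchema()

    sc.stop()
  }
}
```

Printing the schema after loading is a quick way to check whether the features column came back as a vector rather than as a plain array of doubles.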

Re: Saving and Loading Dataframes

2016-02-26 Thread Raj Kumar
Thanks for the response, Yanbo. Here is the source (it uses the sample_libsvm_data.txt file used in the mllib examples). -Raj — IOTest.scala - import org.apache.spark.{SparkConf,SparkContext} import org.apache.spark.sql.SQLContext import org.apache.spark.sql.DataFrame object IOTe

Re: Saving and Loading Dataframes

2016-02-25 Thread Yanbo Liang
Hi Raj, Could you share your code so that others can help diagnose this issue? Which version did you use? I cannot reproduce this problem in my environment. Thanks, Yanbo 2016-02-26 10:49 GMT+08:00 raj.kumar : > Hi, > > I am using mllib. I use the ml vectorization tools to create the vectorize