Hi,

I am using mllib. I use the ml vectorization tools to create the vectorized
input dataframe for
the ml/mllib machine-learning models with schema:
 
root
 |-- label: double (nullable = true)
 |-- features: vector (nullable = true)

To avoid repeated vectorization, I am trying to save and load this dataframe
using
   df.write.format("json").mode("overwrite").save( url )
    val data = Spark.sqlc.read.format("json").load( url )

However when I load the dataframe, the newly loaded dataframe has the
following schema:
root
 |-- features: struct (nullable = true)
 |    |-- indices: array (nullable = true)
 |    |    |-- element: long (containsNull = true)
 |    |-- size: long (nullable = true)
 |    |-- type: long (nullable = true)
 |    |-- values: array (nullable = true)
 |    |    |-- element: double (containsNull = true)
 |-- label: double (nullable = true)

which the machine-learning models do not recognize. 

Is there a way I can save and load this dataframe without the schema
changing. 
I assume it has to do with the fact that Vector is not a basic type. 

thanks
-Raj





--
View this message in context: 
http://apache-spark-user-list.1001560.n3.nabble.com/Saving-and-Loading-Dataframes-tp26339.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.

---------------------------------------------------------------------
To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
For additional commands, e-mail: user-h...@spark.apache.org

Reply via email to