Re: Saving and Loading Dataframes

Yanbo Liang Thu, 25 Feb 2016 21:47:48 -0800

Hi Raj,

Could you share your code so that others can help diagnose this issue?
Which version of Spark are you using?
I cannot reproduce this problem in my environment.


Thanks
Yanbo

2016-02-26 10:49 GMT+08:00 raj.kumar <raj.ku...@hooklogic.com>:

> Hi,
>
> I am using mllib. I use the ml vectorization tools to create the vectorized
> input dataframe for
> the ml/mllib machine-learning models with schema:
>
> root
>  |-- label: double (nullable = true)
>  |-- features: vector (nullable = true)
>
> To avoid repeated vectorization, I am trying to save and load this
> dataframe using
>     df.write.format("json").mode("overwrite").save( url )
>     val data = Spark.sqlc.read.format("json").load( url )
>
> However, when I load the dataframe, the newly loaded dataframe has the
> following schema:
> root
>  |-- features: struct (nullable = true)
>  |    |-- indices: array (nullable = true)
>  |    |    |-- element: long (containsNull = true)
>  |    |-- size: long (nullable = true)
>  |    |-- type: long (nullable = true)
>  |    |-- values: array (nullable = true)
>  |    |    |-- element: double (containsNull = true)
>  |-- label: double (nullable = true)
>
> which the machine-learning models do not recognize.
>
> Is there a way I can save and load this dataframe without the schema
> changing?
> I assume it has to do with the fact that Vector is not a basic type.
>
> thanks
> -Raj

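A possible workaround for the schema change described in the quoted message, offered as a minimal sketch rather than a confirmed fix: the JSON reader infers the schema from the data, so the vector column comes back as a plain struct, whereas Parquet stores the dataframe schema alongside the data. The sketch assumes a Spark 1.x SQLContext and org.apache.spark.mllib.linalg vectors; the object name, the sample rows, and the /tmp path are hypothetical stand-ins for Raj's df and url.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.SQLContext
import org.apache.spark.mllib.linalg.Vectors

// Hypothetical driver object; names and values are for illustration only.
object SaveLoadVectors {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("save-load-vectors").setMaster("local[*]")
    val sc = new SparkContext(conf)
    val sqlContext = new SQLContext(sc)

    // Stand-in for the vectorized dataframe (label: double, features: vector).
    val df = sqlContext.createDataFrame(Seq(
      (1.0, Vectors.dense(0.5, 1.5)),
      (0.0, Vectors.sparse(2, Array(1), Array(3.0)))
    )).toDF("label", "features")

    // Assumed output location; replace with the real url.
    val url = "/tmp/vectorized-df.parquet"

    // Same calls as in the original message, with "parquet" in place of "json".
    df.write.format("parquet").mode("overwrite").save(url)
    val data = sqlContext.read.format("parquet").load(url)

    // Print the reloaded schema to check whether features is still a vector.
    data.printSchema()

    sc.stop()
  }
}
```

If JSON output is required, another option to try is handing the original schema to the reader with read.schema(df.schema), so that it is not inferred; whether that handles the vector type may depend on the Spark version.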