I have used a very similar script. I think there are some extra steps needed
before it could be as robust as toPandas. If you look at
_to_corrected_pandas_type in toPandas
(https://github.com/apache/spark/blob/master/python/pyspark/sql/dataframe.py#L1869),
the same correction would have to be implemented here too. I agree that
serializing the data to a pandas DataFrame or NumPy array is faster and less
memory intensive.
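To illustrate the kind of correction _to_corrected_pandas_type performs: after
deserialization, integer columns that contained nulls (or were built from
generic Python objects) can come back as object or float dtype, so the schema's
types get re-applied where it is safe to do so. This is only a rough sketch,
not the actual Spark implementation; the type-name mapping and the
correct_pandas_types helper are my own illustrative assumptions.

```python
import numpy as np
import pandas as pd

# Illustrative mapping from Spark SQL type names to numpy dtypes
# (an assumption, not Spark's actual internal table).
_SPARK_TO_NUMPY = {
    "byte": np.int8,
    "short": np.int16,
    "integer": np.int32,
    "long": np.int64,
    "float": np.float32,
    "double": np.float64,
}

def correct_pandas_types(pdf, spark_schema):
    """Cast each column to the dtype implied by the Spark schema,
    skipping columns that contain nulls (NaN cannot be stored in an
    integer column, so those are left as-is)."""
    for name, spark_type in spark_schema.items():
        dtype = _SPARK_TO_NUMPY.get(spark_type)
        if dtype is not None and not pdf[name].isnull().any():
            pdf[name] = pdf[name].astype(dtype)
    return pdf

# Example: data deserialized without type correction arrives as object dtype.
pdf = pd.DataFrame({"a": [1, 2, 3]}, dtype=object)
pdf = correct_pandas_types(pdf, {"a": "integer"})
print(pdf["a"].dtype)  # int32
```

Without a step like this, downstream numeric code sees object columns and
silently falls back to slow, boxed-Python paths.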


