Re: Faster and Lower memory implementation toPandas

2017-11-20 Thread gmcrosh
I have used a very similar script. I think some extra steps might be needed before it could be as robust as toPandas. If you look at _to_corrected_pandas_type in toPandas (https://github.com/apache/spark/blob/master/python/pyspark/sql/dataframe.py#L1869), this would have to be im
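
For readers unfamiliar with the type-correction step mentioned in this message, here is a rough, dependency-free sketch of the idea: some Spark column types are narrower than the dtype pandas infers by default, so a conversion script would need to cast those columns afterwards. The mapping below is an assumption based on how the Spark helper behaved around that time, not a copy of it, and the names CORRECTED_DTYPES, corrected_dtype and plan_casts are hypothetical.

```python
# Illustrative sketch only: Spark types are represented by their class names
# as plain strings so the example runs without pyspark or numpy installed.
# The mapping is an assumption, not the exact contents of Spark's helper.
CORRECTED_DTYPES = {
    "ByteType": "int8",
    "ShortType": "int16",
    "IntegerType": "int32",
    "FloatType": "float32",
}


def corrected_dtype(spark_type_name):
    """Return the narrower dtype name, or None to keep the inferred dtype."""
    return CORRECTED_DTYPES.get(spark_type_name)


def plan_casts(schema):
    """Map column name to target dtype for the columns that need a cast.

    `schema` is a list of (column_name, spark_type_name) pairs.
    """
    casts = {}
    for name, type_name in schema:
        dtype = corrected_dtype(type_name)
        if dtype is not None:
            casts[name] = dtype
    return casts


if __name__ == "__main__":
    example_schema = [("id", "IntegerType"), ("score", "FloatType"), ("name", "StringType")]
    print(plan_casts(example_schema))
```

A dict of this shape could then be passed to pandas' DataFrame.astype to apply the casts in one call. Columns whose type is not in the mapping are left with whatever dtype pandas inferred.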

Re: Faster and Lower memory implementation toPandas

2017-11-16 Thread Reynold Xin
Please send a PR. Thanks for looking at this.

On Thu, Nov 16, 2017 at 7:27 AM Andrew Andrade wrote:
> Hello devs,
>
> I know a lot of great work has been done recently with pandas to spark
> dataframes and vice versa using Apache Arrow, but I faced a specific pain
> point on a low memory setup w