Re: [DISCUSS] PySpark Window UDF

2018-09-08 Thread Wes McKinney
hi Li,

These results are very cool. I'm excited to see you continuing to push this effort forward.

- Wes

On Wed, Sep 5, 2018 at 5:52 PM Li Jin wrote:
>
> Hello again!
>
> I recently implemented a proof-of-concept implementation of proposal above. I
> think the results are pretty exciting so I w

Re: toPandas very slow

2016-03-22 Thread Wes McKinney
hi all,

I recently did an analysis of the performance of toPandas.

summary: http://wesmckinney.com/blog/pandas-and-apache-arrow/
IPython notebook: https://gist.github.com/wesm/0cb5531b1c2e346a0007

One solution I'm planning for this is an alternate serializer for Spark DataFrames, with an optimize

Re: PySpark API divergence + improving pandas interoperability

2016-03-21 Thread Wes McKinney
ich changes the behavior of implicit type coercion and
> allows boolean to integral automatically.
>
>
> On Thursday, March 17, 2016, Wes McKinney wrote:
>>
>> hi everyone,
>>
>> I've recently gotten moving on solving some of the low-level data
>> i

PySpark API divergence + improving pandas interoperability

2016-03-19 Thread Wes McKinney
hi everyone,

I've recently gotten moving on solving some of the low-level data interoperability problems between Python's NumPy-focused scientific computing and data libraries like pandas and the rest of the big data ecosystem, Spark being a very important part of that.

One of the major efforts h