From a Python developer's perspective, this direction makes sense to me. Since pandas is practically the standard library in this area, having PySpark support the pandas API out of the box would take usability to a higher level.
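To illustrate what "pandas API out of the box" means in practice, here is a minimal sketch. The code below uses plain pandas; the idea behind Koalas (and a pandas API layer in PySpark) is that the same calls would run unchanged on Spark after only swapping the import (e.g. `import databricks.koalas as ks` instead of `import pandas as pd`). The DataFrame contents here are made up for illustration.

```python
# Ordinary pandas code; a pandas API layer on Spark aims to accept the
# same calls, so only the import line would need to change.
import pandas as pd

df = pd.DataFrame({"x": [1, 2, 3], "y": [10, 20, 30]})

# Familiar pandas-style filtering and aggregation.
result = df[df["x"] > 1]["y"].sum()
print(result)  # 50
```

The appeal is exactly this: existing pandas knowledge and much existing pandas code would carry over to distributed execution with minimal changes.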
As for maintenance cost, IIUC there are some Spark committers in the Koalas community and they are quite active, so it seems we don't need to worry about who will be interested in doing the maintenance. It is good that it is a separate package and does not break anything in the existing code.

How about the test code? Does it fit into the PySpark test framework?

Hyukjin Kwon wrote:
> Hi all,
>
> I would like to start the discussion on supporting a pandas API layer on
> Spark.
>
> If we have a general consensus on having it in PySpark, I will initiate
> and drive an SPIP with a detailed explanation of the implementation's
> overview and structure.
>
> I would appreciate it if I can know whether you guys support this or not
> before starting the SPIP.
>
> I do recommend taking a quick look at the blog posts and talks made for
> pandas on Spark:
> https://koalas.readthedocs.io/en/latest/getting_started/videos_blogs.html.
> They explain why we need this far better.