From a Python developer's perspective, this direction makes sense to me. Since pandas is practically the standard library in this area, having PySpark support the pandas API out of the box would take usability to a higher level.
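To illustrate what "pandas API out of the box" means in practice, here is a minimal sketch. The code below uses plain pandas; the idea behind Koalas (and a pandas API layer in PySpark) is that the same calls would run unchanged on Spark after only swapping the import (e.g. `import databricks.koalas as ks` instead of `import pandas as pd`). The DataFrame contents here are made up for illustration.

```python
# Ordinary pandas code; a pandas API layer on Spark aims to accept the
# same calls, so only the import line would need to change.
import pandas as pd

df = pd.DataFrame({"x": [1, 2, 3], "y": [10, 20, 30]})

# Familiar pandas-style filtering and aggregation.
result = df[df["x"] > 1]["y"].sum()
print(result)  # 50
```

The appeal is exactly this: existing pandas knowledge and much existing pandas code would carry over to distributed execution with minimal changes.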
As for maintenance cost, IIUC there are some Spark committers in the Koalas community and they are quite active, so it seems we don't need to worry about who will be interested in doing the maintenance. It is good that it is a separate package and does not break anything in the existing code.

How about the test code? Does it fit into the PySpark test framework?

Hyukjin Kwon wrote:
> Hi all,
>
> I would like to start the discussion on supporting a pandas API layer on
> Spark.
>
> If we have a general consensus on having it in PySpark, I will initiate
> and drive an SPIP with a detailed explanation of the implementation's
> overview and structure.
>
> I would appreciate it if I can know whether you guys support this or not
> before starting the SPIP.
>
> I do recommend taking a quick look at the blog posts and talks made for
> pandas on Spark:
> https://koalas.readthedocs.io/en/latest/getting_started/videos_blogs.html.
> They explain why we need this far better.