+1 from my side. This sounds good; it will help both users and contributors improve test coverage.

On Wed, Jun 14, 2023 at 8:27 AM Hyukjin Kwon <gurwls...@apache.org> wrote:

> Yeah, I have been thinking about this too, and Holden did some work here
> that this SPIP will reuse. I support this.
>
> On Wed, 14 Jun 2023 at 08:10, Amanda Liu <amanda....@databricks.com.invalid>
> wrote:
>
>> Hi all,
>>
>> I'd like to start a discussion about implementing an official PySpark
>> test framework. Currently, there's no official test framework, but only
>> various open-source repos and blog posts.
>>
>> Many of these open-source resources are very popular, which demonstrates
>> user-demand for PySpark testing capabilities. spark-testing-base
>> <https://github.com/holdenk/spark-testing-base> has 1.4k stars, and
>> chispa <https://github.com/MrPowers/chispa> has 532k downloads/month.
>> However, it can be confusing for users to piece together disparate
>> resources to write their own PySpark tests (see The Elephant in the
>> Room: How to Write PySpark Tests
>> <https://towardsdatascience.com/the-elephant-in-the-room-how-to-write-pyspark-unit-tests-a5073acabc34>
>> ).
>>
>> We can streamline and simplify the testing process by incorporating test
>> features, such as a PySpark Test Base class (which allows tests to share
>> Spark sessions) and test util functions (for example, asserting dataframe
>> and schema equality).
>>
>> Please see the SPIP document attached:
>> https://docs.google.com/document/d/1OkyBn3JbEHkkQgSQ45Lq82esXjr9rm2Vj7Ih_4zycRc/edit#heading=h.f5f0u2riv07v
>> And the JIRA ticket: https://issues.apache.org/jira/browse/SPARK-44042
>>
>> I would appreciate it if you could share your thoughts on this proposal.
>>
>> Thank you!
>> Amanda Liu
>>
>