IMHO, one thing we should consider before this is, refactoring the PySpark tests all to make them separate pairs for main codes. Now, we put all those unit tests into few several files, which makes hard to follow the tests.
2018년 8월 31일 (금) 오후 2:05, Felix Cheung <felixcheun...@hotmail.com>님이 작성: > +1 on what Li said. > > And +1 on getting more coverage in unit tests - however often times we > omit python unit tests deliberately if the python “wrapper” is trivial. > This is what I’ve learned over the years from the previous pyspark > maintainers. Admittedly gaps are there. > > > ------------------------------ > *From:* Imran Rashid <iras...@cloudera.com.invalid> > *Sent:* Wednesday, August 29, 2018 1:42 PM > *To:* ice.xell...@gmail.com > *Cc:* dev > *Subject:* Re: [DISCUSS] move away from python doctests > > (Also, maybe there are already good unit tests, and I just don't know > where to find them, as Bryan Cutler pointed out for the bit of code I was > originally asking about.) > > On Wed, Aug 29, 2018 at 3:26 PM Imran Rashid <iras...@cloudera.com> wrote: > >> Hi Li, >> >> yes that makes perfect sense. That more-or-less is the same as my view, >> though I framed it differently. I guess in that case, I'm really asking: >> >> Can pyspark changes please be accompanied by more unit tests, and not >> assume we're getting coverage from doctests? >> >> Imran >> >> On Wed, Aug 29, 2018 at 2:02 PM Li Jin <ice.xell...@gmail.com> wrote: >> >>> Hi Imran, >>> >>> My understanding is that doctests and unittests are orthogonal - >>> doctests are used to make sure docstring examples are correct and are not >>> meant to replace unittests. >>> Functionalities are covered by unit tests to ensure correctness and >>> doctests are used to test the docstring, not the functionalities itself. >>> >>> There are issues with doctests, for example, we cannot test arrow >>> related functions in doctest because of pyarrow is optional dependency, but >>> I think that's a separate issue. >>> >>> Does this make sense? >>> >>> Li >>> >>> On Wed, Aug 29, 2018 at 6:35 PM Imran Rashid >>> <iras...@cloudera.com.invalid> wrote: >>> >>>> Hi, >>>> >>>> I'd like to propose that we move away from such heavy reliance on >>>> doctests in python, and move towards more traditional unit tests. The main >>>> reason is that its hard to share test code in doc tests. For example, I >>>> was just looking at >>>> >>>> https://github.com/apache/spark/commit/82c18c240a6913a917df3b55cc5e22649561c4dd >>>> and wondering if we had any tests for some of the pyspark changes. >>>> SparkSession.createDataFrame has doctests, but those are just run with one >>>> standard spark configuration, which does not enable arrow. Its hard to >>>> easily reuse that test, just with another spark context with a different >>>> conf. Similarly I've wondered about reusing test cases but with >>>> local-cluster instead of local mode. I feel like they also discourage >>>> writing a test which tries to get more exhaustive coverage on corner cases. >>>> >>>> I'm not saying we should stop using doctests -- I see why they're >>>> nice. I just think they should really only be when you want that code >>>> snippet in the doc anyway, so you might as well test it. >>>> >>>> Admittedly, I'm not really a python-developer, so I could be totally >>>> wrong about the right way to author doctests -- pushback welcome! >>>> >>>> Thoughts? >>>> >>>> thanks, >>>> Imran >>>> >>>