I've had some offline discussion around this, so I'd like to clarify. SPARK-25344 may be non-trivial work, as it's a significant refactoring.
But can we agree on an *immediate* first step: all new Python tests should go into their own files? Is there some reason not to do that right away?

I understand that in some cases you'll want to add a test case that really is related to an existing test already in those giant files, and it makes sense to keep them close. It's fine to decide on a case-by-case basis whether to refactor the relevant bit at the same time or just put the new test in the same file. But we should still have this *goal* in mind, so you should do it in the cases where the new tests really are independent.

That avoids making the problem worse until we get to SPARK-25344, and furthermore it will allow work on SPARK-25344 to eventually proceed without never-ending merge conflicts with other changes that are also adding new tests.

On Wed, Sep 5, 2018 at 1:27 PM Imran Rashid <iras...@cloudera.com> wrote:

> I filed https://issues.apache.org/jira/browse/SPARK-25344
>
> On Fri, Aug 24, 2018 at 11:57 AM Reynold Xin <r...@databricks.com> wrote:
>
>> We should break it.
>>
>> On Fri, Aug 24, 2018 at 9:53 AM Imran Rashid <iras...@cloudera.com.invalid>
>> wrote:
>>
>>> Hi,
>>>
>>> another question from looking more at python recently. Is there any
>>> reason we've got a ton of tests in one humongous tests.py file, rather than
>>> breaking it out into smaller files?
>>>
>>> Having one huge file doesn't seem great for code organization, and it
>>> also makes the test parallelization in run-tests.py not work as well. On
>>> my laptop, tests.py takes 150s, and the next longest test file takes only
>>> 20s.
>>>
>>> can we at least try to put new tests into smaller files?
>>>
>>> thanks,
>>> Imran
>>>
>>