I've had some offline discussion about this, so I'd like to clarify.
SPARK-25344 may be non-trivial work, as it's a significant refactoring.

But can we agree on an *immediate* first step: all new Python tests should
go into their own files?  Is there any reason not to do that right away?

I understand that in some cases you'll want to add a test case that really
is related to an existing test already in those giant files, and it makes
sense to keep them close.  It's fine to decide on a case-by-case basis
whether to do the relevant refactoring for that bit at the same time or
just put the new test in the same file.  But we should still have this
*goal* in mind, so please use a new file whenever the new tests are really
independent.

That avoids making the problem worse until we get to SPARK-25344, and
furthermore it will allow work on SPARK-25344 to eventually proceed without
never-ending merge conflicts with other changes that are also adding new
tests.

On Wed, Sep 5, 2018 at 1:27 PM Imran Rashid <iras...@cloudera.com> wrote:

> I filed https://issues.apache.org/jira/browse/SPARK-25344
>
> On Fri, Aug 24, 2018 at 11:57 AM Reynold Xin <r...@databricks.com> wrote:
>
>> We should break it.
>>
>> On Fri, Aug 24, 2018 at 9:53 AM Imran Rashid <iras...@cloudera.com.invalid>
>> wrote:
>>
>>> Hi,
>>>
>>> Another question from looking more at Python recently.  Is there any
>>> reason we've got a ton of tests in one humongous tests.py file, rather than
>>> breaking it out into smaller files?
>>>
>>> Having one huge file doesn't seem great for code organization, and it
>>> also makes the test parallelization in run-tests.py not work as well.  On
>>> my laptop, tests.py takes 150s, and the next longest test file takes only
>>> 20s.
>>>
>>> Can we at least try to put new tests into smaller files?
>>>
>>> thanks,
>>> Imran
>>>
>>
