Test reporters do often contain some assumptions about the characters in
the test methods. Historically JUnit XML reporters have never sanitised the
method names so XML injection attacks have been fairly trivial. Haven't
tried this for a while.

That whole JUnit XML report "standard" was actually put together in the Ant
project with <Junitreport> doing the postprocessing of the JUnit run. It
was driven by the team's XSL skills than any overreaching strategic goal
about how to present test results of tests which could run for hours and
whose output you would really want to aggregate the locks from multiple
machines and processes and present in awake you can actually navigate. With
hindsight, a key failing is that we chose to store the test summaries (test
count, failure count...) as attributes on the root XML mode. Which is why
the whole DOM gets built up in the JUnit runner. Which is why when that
JUnit process crashes, you get no report at all.

It'd be straightforward to fix -except too much relies on that file
now...important things will break. And the maven runner has historically
never supported custom reporters, to let you experiment with it.

Maybe this is an opportunity to change things.

On Sun, Nov 17, 2019 at 1:42 AM Hyukjin Kwon <gurwls...@gmail.com> wrote:

> DisplayName looks good in general but actually here I would like first to
> find a existing pattern to document in guidelines given the actual existing
> practice we all are used to. I'm trying to be very conservative since this
> guidelines affect everybody.
>
> I think it might be better to discuss separately if we want to change what
> we have been used to.
>
> Also, using arbitrary names might not be actually free due to such bug
> like https://github.com/apache/spark/pull/25630 . It will need some more
> efforts to investigate as well.
>
> On Fri, 15 Nov 2019, 20:56 Steve Loughran, <ste...@cloudera.com.invalid>
> wrote:
>
>>  Junit5: Display names.
>>
>> Goes all the way to the XML.
>>
>>
>> https://junit.org/junit5/docs/current/user-guide/#writing-tests-display-names
>>
>> On Thu, Nov 14, 2019 at 6:13 PM Shixiong(Ryan) Zhu <
>> shixi...@databricks.com> wrote:
>>
>>> Should we also add a guideline for non Scala tests? Other languages
>>> (Java, Python, R) don't support using string as a test name.
>>>
>>> Best Regards,
>>> Ryan
>>>
>>>
>>> On Thu, Nov 14, 2019 at 4:04 AM Hyukjin Kwon <gurwls...@gmail.com>
>>> wrote:
>>>
>>>> I opened a PR - https://github.com/apache/spark-website/pull/231
>>>>
>>>> 2019년 11월 13일 (수) 오전 10:43, Hyukjin Kwon <gurwls...@gmail.com>님이 작성:
>>>>
>>>>> > In general a test should be self descriptive and I don't think we
>>>>> should be adding JIRA ticket references wholesale. Any action that the
>>>>> reader has to take to understand why a test was introduced is one too 
>>>>> many.
>>>>> However in some cases the thing we are trying to test is very subtle and 
>>>>> in
>>>>> that case a reference to a JIRA ticket might be useful, I do still feel
>>>>> that this should be a backstop and that properly documenting your tests is
>>>>> a much better way of dealing with this.
>>>>>
>>>>> Yeah, the test should be self-descriptive. I don't think adding a JIRA
>>>>> prefix harms this point. Probably I should add this sentence in the
>>>>> guidelines as well.
>>>>> Adding a JIRA prefix just adds one extra hint to track down details. I
>>>>> think it's fine to stick to this practice and make it simpler and clear to
>>>>> follow.
>>>>>
>>>>> > 1. what if multiple JIRA IDs relating to the same test? we just take
>>>>> the very first JIRA ID?
>>>>> Ideally one JIRA should describe one issue and one PR should fix one
>>>>> JIRA with a dedicated test.
>>>>> Yeah, I think I would take the very first JIRA ID.
>>>>>
>>>>> > 2. are we going to have a full scan of all existing tests and attach
>>>>> a JIRA ID to it?
>>>>> Yea, let's don't do this.
>>>>>
>>>>> > It's a nice-to-have, not super essential, just because ...
>>>>> It's been asked multiple times and each committer seems having a
>>>>> different understanding on this.
>>>>> It's not a biggie but wanted to make it clear and conclude this.
>>>>>
>>>>> > I'd add this only when a test specifically targets a certain issue.
>>>>> Yes, so this one I am not sure. From what I heard, people adds the
>>>>> JIRA in cases below:
>>>>>
>>>>> - Whenever the JIRA type is a bug
>>>>> - When a PR adds a couple of tests
>>>>> - Only when a test specifically targets a certain issue.
>>>>> - ...
>>>>>
>>>>> Which one do we prefer and simpler to follow?
>>>>>
>>>>> Or I can combine as below (im gonna reword when I actually document
>>>>> this):
>>>>> 1. In general, we should add a JIRA ID as prefix of a test when a PR
>>>>> targets to fix a specific issue.
>>>>>     In practice, it usually happens when a JIRA type is a bug or a PR
>>>>> adds a couple of tests.
>>>>> 2. Uses "SPARK-XXXX: test name" format
>>>>>
>>>>> If we have no objection with ^, let me go with this.
>>>>>
>>>>> 2019년 11월 13일 (수) 오전 8:14, Sean Owen <sro...@gmail.com>님이 작성:
>>>>>
>>>>>> Let's suggest "SPARK-12345:" but not go back and change a bunch of
>>>>>> test cases.
>>>>>> I'd add this only when a test specifically targets a certain issue.
>>>>>> It's a nice-to-have, not super essential, just because in the rare
>>>>>> case you need to understand why a test asserts something, you can go
>>>>>> back and find what added it in the git history without much trouble.
>>>>>>
>>>>>> On Mon, Nov 11, 2019 at 10:46 AM Hyukjin Kwon <gurwls...@gmail.com>
>>>>>> wrote:
>>>>>> >
>>>>>> > Hi all,
>>>>>> >
>>>>>> > Maybe it's not a big deal but it brought some confusions time to
>>>>>> time into Spark dev and community. I think it's time to discuss about
>>>>>> when/which format to add a JIRA ID as a prefix for the test case name in
>>>>>> Scala test cases.
>>>>>> >
>>>>>> > Currently we have many test case names with prefixes as below:
>>>>>> >
>>>>>> > test("SPARK-XXXXX blah blah")
>>>>>> > test("SPARK-XXXXX: blah blah")
>>>>>> > test("SPARK-XXXXX - blah blah")
>>>>>> > test("[SPARK-XXXXX] blah blah")
>>>>>> > …
>>>>>> >
>>>>>> > It is a good practice to have the JIRA ID in general because, for
>>>>>> instance,
>>>>>> > it makes us put less efforts to track commit histories (or even
>>>>>> when the files
>>>>>> > are totally moved), or to track related information of tests failed.
>>>>>> > Considering Spark's getting big, I think it's good to document.
>>>>>> >
>>>>>> > I would like to suggest this and document it in our guideline:
>>>>>> >
>>>>>> > 1. Add a prefix into a test name when a PR adds a couple of tests.
>>>>>> > 2. Uses "SPARK-XXXX: test name" format which is used in our code
>>>>>> base most
>>>>>> >       often[1].
>>>>>> >
>>>>>> > We should make it simple and clear but closer to the actual
>>>>>> practice. So, I would like to listen to what other people think. I would
>>>>>> appreciate if you guys give some feedback about when to add the JIRA
>>>>>> prefix. One alternative is that, we only add the prefix when the JIRA's
>>>>>> type is bug.
>>>>>> >
>>>>>> > [1]
>>>>>> > git grep -E 'test\("\SPARK-([0-9]+):' | wc -l
>>>>>> >      923
>>>>>> > git grep -E 'test\("\SPARK-([0-9]+) ' | wc -l
>>>>>> >      477
>>>>>> > git grep -E 'test\("\[SPARK-([0-9]+)\]' | wc -l
>>>>>> >       16
>>>>>> > git grep -E 'test\("\SPARK-([0-9]+) -' | wc -l
>>>>>> >       13
>>>>>> >
>>>>>> >
>>>>>> >
>>>>>>
>>>>>

Reply via email to