On 2015-11-23, Kornel Benko wrote:

> On Monday, 23 November 2015 at 13:09:40, Guenter Milde
> <mi...@users.sf.net> wrote:

>> On 2015-11-22, Kornel Benko wrote:
>> > We apparently don't understand each other.

>> Indeed, there is a misunderstanding, but I believe I am a bit wiser now.
>> Please correct me where I am still wrong.

I see we still have not reached an understanding. Another try.

>> > For me, an ignored test is ignored, that is, the test is not even created.
>> > The ctest machinery (not our usage of it) runs _all_ created tests with
>> > 'ctest' (without any parameter).

>> For me, a "test case" or "potential test" is a combination of
>> - document
>> - output format
>> - scripted changes (like systemF: "set the used font to non-TeX-fonts")

>> The general rule:
>> Create a "test instance" for
>> * every document matching lib/(doc|templates|examples)/*.lyx!

> All but ignored.

>> * with every output format (dvi.?|ps|pdf.?|html)

> Only if the output format is 'default'. (html + lyx12 are always included)

>> * with system fonts and TeX fonts (for dvi3|pdf4|pdf5)

> Yes.

>> This expands to a list of "potential tests" where we either
>> a) expect successful export,
>> b) expect an export error, or
>> c) don't care (because whether there is an export error does not
>>    correlate with a (new) problem in LyX).

>> Case a) is the default.
>> Exceptions are defined via regular expressions in
>> b) development/autotests/revertedTests
>> c) development/autotests/ignoredTests

>> The usual test run after a commit should detect regressions and should
>> therefore run all "potential tests" in a) and b) but not c).
>> This means it must not "create a test" for the "to be ignored
>> potential tests" in c).

>> Up to here, I hope we can agree.

> Partly. I made comments where I think it should be more precise.

Which of the following is true?
a) Whether a combination of document, output format, and scripted changes is
   tested or ignored is decided at configuration time.
b) This cannot be changed (adding or skipping certain combinations) when
   running the test suite.
c) It is not possible to list "ignored tests".

>> Now to the differences.
>> I hope we can get them sorted out -- maybe these are also just
>> misunderstandings?

>> On 2015-11-20, Kornel Benko wrote:
>> > On 20 November 2015 at 11:20:54, Guenter Milde
>> > <mi...@users.sf.net> wrote:

>> ...

>> >> * ignored ...
>> >>   - nonstandard  # requires packages or similar that are not on CTAN

>> > Do not ignore them. They _are_ compilable in the end.

>> >>   - suspended    # - non-LyX bugs that may be resolved (works depending
>> >>                  #   on the TeXLive version).
>> >>                  # - problems that we currently cannot solve but want to.

>> > Yes, but do not ignore them.

>> Here we have 2 categories where the suggestion to ignore them was met by
>> the objection

>> > To make it clear: everything ignored cannot be tested.
>> > If we want to see if anything changed (like XeTeX), we should be able
>> > to retest.

>> My response, that I consider this limitation a fundamental flaw in the test
>> machinery because

>> >> we need a category and rule-set for tests where:
>> >> * we don't care about the result, because it does not tell us anything
>> >>   about the "healthiness" of LyX, and hence don't run them normally, but
>> >> * we may want to run them on special request (because we know the phase
>> >>   of the moon, or have installed a special package, or want to check the
>> >>   status of upstream packages, or fixed a nofix bug),

>> was answered with

>> > In this case we can just compare the results from before with the results
>> > after clearing the ignoredTests file.

>> Well, then it should be possible to
>> * have regular expressions for "potential tests" that meet the criteria
>>   for "suspended" or "nonstandard" in development/autotests/ignoredTests

> You probably mean development/autotests/revertedTests here.
> suspended is a subset of reverted (and therefore inverted).
> nonstandard is a subset of all exports (but ignored).

I want suspended tests to be *ignored*, because running them does not tell us
anything about regressions. It does not make sense to expect a defined result
here.
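The three-way split discussed in this thread (a: expect success by default, b: expect failure via revertedTests, c: not created at all via ignoredTests) can be sketched roughly as below. The pattern-file contents, the test names, and the grep-based matching are invented for illustration only; the real files in development/autotests/ are consumed by the CMake configuration, not by a shell script like this.

```shell
#!/bin/sh
# Rough sketch of the categorization debated above. The patterns and
# test names are made-up examples, not the real file contents.
dir=$(mktemp -d)
printf '%s\n' 'export/doc/attic/.*_pdf4' > "$dir/revertedTests"
printf '%s\n' 'export/examples/.*_docbook' > "$dir/ignoredTests"

categorize() {
  # c) ignored:  the test is not even created (checked first, it wins)
  # b) inverted: the test is created, but export is expected to fail
  # a) normal:   the test is created, export is expected to succeed
  if printf '%s\n' "$1" | grep -E -q -f "$dir/ignoredTests"; then
    echo ignored
  elif printf '%s\n' "$1" | grep -E -q -f "$dir/revertedTests"; then
    echo inverted
  else
    echo normal
  fi
}

categorize export/doc/attic/eu_splash_pdf4   # inverted
categorize export/examples/chess_docbook     # ignored
categorize export/doc/UserGuide_pdf2         # normal
```

Checking ignoredTests first mirrors the point made above: an ignored combination never becomes a test instance, regardless of what the reverted patterns say.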
IMO, suspended tests should become a subset of "ignored": not run normally,
but only on special request.

> Nonstandard tests should not be part of reverted
> (and cannot be, even if a regex from revertedTests signals otherwise).
> It doesn't make sense to expect a defined result here.

True.

> I must confess that I don't understand what you mean by "potential
> tests". You can always select what you want to test.
> For instance, 'ctest -R texF' tests everything using TeX fonts, independent
> of whether the test is part of suspended, reverted or nonstandard.
> So after an update of, say, luatex, we could use 'ctest -R "dvi3|pdf5"' to
> check whether any relevant tests changed their compilation result.

Does this mean that with 'ctest -R texF' you can also test *ignored* tests?
How does this relate to "Everything ignored cannot be tested."?

..........................

>> My vision for the output of a "normal" test run would be
>> * report all tests that fail our expectations, i.e.
>>   - in category a) if there is an export error
>>   - in category b) if there is no export error
>>   ideally with the error message for a).

> That is not possible. All messages go to
> Testing/Temporary/LastTest.log. One has to extract them from there
> after the test(s) is/are run.

OK. But it should be possible to do this extraction in a post-processing
script or a wrapper.

>> These failing tests would be regressions calling for action
>> (solve the problem or suspend the test case).

> Yes.

>> * optionally, report "suspended" tests

> Yes, suspended/reverted tests are waiting for success.

Not all reverted tests are waiting for success. There are combinations of
document + output format where we must ensure that the export fails (because
we know for sure it cannot work, and otherwise this would be a hidden
failure).

>> This would be a summary of postponed TODO items.

> You didn't mention tests which work on some systems but not on others.

I don't want any output about them after a "normal" test run.

Günter
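[Editorial postscript] The extraction step discussed above (ctest sends all test output to Testing/Temporary/LastTest.log, so failure messages have to be pulled out afterwards) could be done by a small wrapper along these lines. The log excerpt is invented; the real LastTest.log layout differs, so the awk patterns are placeholders for whatever markers the actual log uses.

```shell
#!/bin/sh
# After e.g.  ctest -R "dvi3|pdf5"  (selection syntax quoted from the
# thread), post-process the log to list failing tests.
# The heredoc below is a made-up stand-in for Testing/Temporary/LastTest.log.
log=$(mktemp)
cat > "$log" <<'EOF'
Test: export/doc/UserGuide_pdf2
Test Passed.
Test: export/doc/attic/eu_splash_pdf4
! LaTeX Error: File `splash.sty' not found.
Test Failed.
EOF

# Remember the most recent "Test:" header; print its name whenever the
# section ends with "Test Failed."
awk '/^Test: /{name=$2} /^Test Failed\./{print name}' "$log"
```

The same awk pass could print the lines between the header and the failure marker to recover the error message itself, which is what the "ideally with the error message" wish above asks for.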