Re: categorizing export tests

Kornel Benko Thu, 26 Nov 2015 16:02:56 -0800

On Thursday, 26 November 2015 at 22:18:18, Guenter Milde
<mi...@users.sf.net> wrote:
> On 2015-11-26, Kornel Benko wrote:
> > On 26 November 2015 at 11:23:46, Guenter Milde <mi...@users.sf.net> wrote:
> 
> >> The following proposal for an export test case categorisation tries to
> >> avoid the controversial terms "inverted/reverted", "suspended", and
> >> "ignored".
> 
> >> Instead, the basic distinction is between "good" tests and "known 
> >> problems".
> 
> >> While the concept of "known problems" matches roughly to "inverted", there
> >> are some differences:
> 
> >> * tests with "known problems" usually fail, but may also pass.
> 
> >> * a line 
> 
> >>     KNOWN_PROBLEM.<subtag>.export/...
> 
> >>   is easier to understand than
> 
> >>     INVERTED-SEE-README.export/...
> 
> > Hm, yes. But the first entries are not in any subcategory (no subtag).
> 
> Even without subtag, 
> 
>   KNOWN_PROBLEM.export/...
> 
> >>   is easier to understand than
> 
> >>     INVERTED-SEE-README.export/...
> 
> ;-)
> 
> ...
> 
> >> * There is no need for a top-level category "unreliable".
> 
> > I added it to please you ... :(
> 
> Oh, I see. I thought it was to allow an easier description:


No, I really took your proposal from the mail mentioning unreliable and added
the description later.

> \begin_layout Description
> -nonstandard In primary sense such test means "requires non-standard resources
> - (LaTeX packages and document classes, fonts, ...
> - that are not a requirement for running this test suite".
> +nonstandard Requires non-standard resources (LaTeX packages and document
> + classes, fonts, ...) that are not a requirement for running this test suite.
>  \end_layout
>  
>  \begin_deeper
>  \begin_layout Standard
> -In a wider sense, it is currently used also for "not to be expected to succeed
> - on every site that runs this test suite".
> - This wider definition includes tests that have "arbitrary" result depending
> - on local configuration, OS, TeX distribution, package versions, or the
> - phase of the moon.
> -\end_layout
> -
> -\begin_layout Standard
>  These tests are labelled as 
>  \family typewriter
>  
>  
> ...
> 
> Unreliable test cases are test cases with a known problem. The correct,
> full hierarchy would be
> 
>   * known problems
>     ...
>     - unreliable
>       · nonstandard
>       · erratic
>     
> If we do not want 3 levels with subsubcategories, we can just remove the
> level "unreliable" and add its subcategories below "known_problems":
> 
>   * known problems
>     ...
>     - nonstandard
>     - erratic

I don't like "known problems". We know there is a problem, but we don't know
exactly which one (or whether it is the only one).

> or use "unreliable" on the same level as "known problems":
>   
>   * known problems
>     ...
>   * unreliable
>     - nonstandard
>     - erratic
> 
> whatever suits you more.    
> 


I am open to implementing something that fits with ctest. Some remarks on its
functionality are below.


> >> Export Test Categorisation
> >> --------------------------
> 
> >> To get a feel for the severity of a known problem, it makes sense to 
> >> sort known problems in sub-categories, e.g.
> 
> 
> >> * TODO            # problems we want to solve but currently cannot.
> 
> >> * minor           # problems that may be eventually solved
> 
> >> * wontfix         # LyX problems with cases so special we decided to 
> >>                   # leave them, or LaTeX problems that 
> >>                   # - can't be solved due to systematic limitations, or
> >>                   # - are bugs in "historic" packages no one works on.
> 
> >> * wrong output    # the output is corrupt, LyX should raise an error
> >>                   # but export returns success.
> 
> >> * LaTeX bug       # problems due to LaTeX packages or other "external"
> >>                   # reasons (someone else's problems).
> >>                   # that may be eventually solved
> >>                   # (In this case, the case goes to "unreliable" until
> >>                   # everyone has the version with the fix.)
> 
> >> * nonstandard     # requires packages or other resources that are not
> >>                   # on CTAN
> >>                   # (some developers may have them installed)
> 
> >> * erratic         # depending on local configuration, OS, TeX distribution,
> >>                   # package versions, or the phase of the moon.
> 
> 
> > Feels good, but who shall categorize?
> 
> This will be a collaborative work.  Normally, this would be done when
> addressing a new "known problem".
> 
> But first we need to agree on and set up the framework.
> 
> Proposal
> ========
> 
> * Rename "inverted" to "known_problem" and the file

problematic?

>   autotests/revertedTests to autotests/problematicTests.
>

For me 'reverted' clearly signals that the test result is inverted.

>   - in test mode (looking for regressions), the result of these test
>     cases is irrelevant

Yes
 
>   - in maintenance mode, the label should be removed from test cases
>     that pass.
> 
>   This means that `ctest -L export` should not run tests with
>   "known_problems".

Why not? It is already possible to run only non-inverted tests.
E.g. 'ctest -L export -LE reverted'
Make an alias.

>   Running `ctest` (without -L) should rather list the failing tests with
>   "known problems" than the passing ones. This is less confusing.

This is not possible. ctest ignores labels when called without '-L'.

>   Motivation: In "test mode",
>     · a test that fails to fail is no problem (we search regressions),
>     · a test that fails for a "known reason" is recognised as such by its
>       label and can be ignored by the user or a post-processing script.
>   
>   (BTW: in the list of failing tests recently sent by Scott, there were a
>   number of "INVERTED_SEE-README" tests. Does this mean these tests failed
>   or does it mean these tests failed to fail?)

They failed to fail. But they fail here. Therefore I moved them to nonstandard
(now unreliable).
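
The post-processing idea from the quoted motivation can be sketched in a few
lines. This is only an illustration under assumptions: the label names
"known_problem" and "unreliable" are the ones proposed in this thread, while
the test names, the shape of the results dictionary and both helper functions
are hypothetical and not part of the LyX test suite or of ctest.

```python
# Labels whose failures are not regressions (names taken from the proposal).
IGNORABLE_LABELS = {"known_problem", "unreliable"}


def regressions(results):
    """Test mode: failing tests that carry no ignorable label."""
    return sorted(
        name
        for name, (passed, labels) in results.items()
        if not passed and not (labels & IGNORABLE_LABELS)
    )


def relabel_candidates(results):
    """Maintenance mode: labelled tests that pass and may need a review."""
    return sorted(
        name
        for name, (passed, labels) in results.items()
        if passed and (labels & IGNORABLE_LABELS)
    )


# Hypothetical results: test name -> (passed, labels)
results = {
    "export/doc/A_pdf2": (False, {"export"}),
    "export/doc/B_pdf2": (True, {"export"}),
    "export/doc/C_dvi3": (False, {"export", "known_problem"}),
    "export/doc/D_pdf5": (True, {"export", "unreliable"}),
}

print(regressions(results))
print(relabel_candidates(results))
```

For an "unreliable" test, appearing in the second list is only a hint: as
discussed in this thread, one pass on one site is not enough to drop the label.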

> * Handle "unreliable" test cases similar to "known problems": create test
>   instances with a telling label (unless they are wontfix):
> 
>   - in test mode (looking for regressions), the result of these test
>     cases is irrelevant
>     
>   - in maintenance mode, the label should be removed from test cases
>     that pass everywhere and every time. 
>     
>   However:
>     The label "unreliable" is an indicator that the test is not "good" if
>     it passes at one site or only one time.

It passes because of 'better' configuration. Scott and I have similar
platforms, so this is the only difference.

>     To remove this label, you need confirmation from other developers
>     that the problem is really solved.

Sure.

> * Rename autotests/ignoredTests to autotests/wontfixTests and move
>   "wontfix" problems there.

We do not agree.

> * Rename autotests/suspendedTests to autotests/fragileTests.

Suspended is not fragile. It always fails, and we cannot do anything about it
yet. Maybe later, with a new luatex or xetex version.

>   Use this label for all fragile tests, not only inverted ones. 

No, tests which always pass are not fragile.

> > The problem is the huge number of tests which do not fail. They are not
> > categorized ATM.
> 
> The idea here is a file autotests/fragileTests with "wide" regular
> expressions, e.g.
> 
>    .*pdf4SystemF
>    .*Math.*
>
> This should apply to all tests, not only the ones with "known problems".   

No, same reason as above.

> Then, when there is a "regression" in one of the fragile tests, it will
> be shown with the "moderating" label "fragile" telling that the reason is
> more likely a surfacing problem with the document or export format than a
> new one.
> 

Such a test would go to unreliableTests (or whatever the name will be).
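
Whatever the file ends up being called, matching test names against "wide"
regular expressions could look roughly like the sketch below. The two patterns
are the ones quoted above; the file format (one regex per line, '#' comments
and blank lines skipped), the test names and the helper functions are
assumptions made for illustration only.

```python
import re


def parse_patterns(text):
    """Compile one regex per non-empty, non-comment line (assumed format)."""
    patterns = []
    for line in text.splitlines():
        line = line.strip()
        if line and not line.startswith("#"):
            patterns.append(re.compile(line))
    return patterns


def matches_any(test_name, patterns):
    """True if at least one pattern matches somewhere in the test name."""
    return any(p.search(test_name) for p in patterns)


# Hypothetical file content using the patterns quoted in this mail.
PATTERN_FILE = """
# wide patterns
.*pdf4SystemF
.*Math.*
"""

patterns = parse_patterns(PATTERN_FILE)

# Hypothetical test names.
for name in ("export/doc/Math_pdf2",
             "export/doc/Intro_pdf4SystemF",
             "export/doc/Intro_pdf2"):
    print(name, matches_any(name, patterns))
```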

> >> If we want to make sure that no "good fail" is transformed to a
> >> "wrong output" we would need a category "assert fail" and report
> >> export without failure:
> 
> >> * assert fail     # we know the export does not work for a permanent reason
> >>                   # and want to test whether LyX correctly fails
> >>                   # (e.g. pdflatex with a package requiring LuaTeX)
> 
> 
> > That is for later, used in autotests/export.
> > All other lyx-files (but attic) are distributed. Normally we expect
> > them in good shape.
> 
> Yes.
> 

Now to ctest:
Tests can be selected by test name:
        # ctest -R <some regex to select tests> -E <some regex to exclude tests>
In this form the labels are irrelevant.
You can give more than one '-R' parameter, but unfortunately they are connected
with 'or'.

The second kind of selection is through labels:
        # ctest -L <some regex to select labels> -LE <some regex to exclude labels>

You can also mix them, so for instance I use
        # ctest -L export -E "xhtml|lyx16"
(notice '-E' instead of '-LE')
to check only tex exports.
To check only tex regressions, you may want to use
        # ctest -L export -E "xhtml|lyx16" -LE reverted
> Günter

        Kornel
