On 2015-11-23, Kornel Benko wrote:
> On 23 November 2015 at 16:38:20, Guenter Milde <mi...@users.sf.net> wrote:

...

> Maybe you could propose different names?

The following proposal for an export test case categorisation tries to avoid
the controversial terms "inverted/reverted", "suspended", and "ignored".

Instead, the basic distinction is between "good" tests and "known problems".

We have to distinguish two modes of working:

a) test for regressions

   The results of all tests with "known problems" are irrelevant
   when testing for regressions.

b) maintaining the test suite

   Here we update the list of known problems, so we need to know which
   test cases with known problems fail and which pass.


While the concept of "known problems" roughly corresponds to "inverted", there
are some differences:

* tests with "known problems" usually fail, but may also pass.

* a line 

    KNOWN_PROBLEM.<subtag>.export/...

  is easier to understand than

    INVERTED-SEE-README.export/...

* There is no top-level category "ignored". 

  Whether test instances are created or not is a property of the
  subcategories, decided by practicality.
  
  "known problems.wontfix" is a rough equivalence but we could extempt other
  subcategories, too.

* There is no need for a top-level category "unreliable".

  As we allow test cases with known problems to pass, "nonstandard" and 
  "erratic" can be made sub-categories of "known problem".
  
  (If it eases maintenance, they could, however, also be sorted under
   a top-level "unreliable".)



Export Test Categorisation
--------------------------

Export tests are generated by taking sample documents, possibly modifying
them, and calling LyX to export them to a given output format.
This results in $N_{documents} \times N_{modifications} \times N_{output~formats}$
possible combinations (test cases), which can be sorted into two main
categories:

* good                # we expect the export to succeed

* known problems      # export may fail for a known reason
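
To illustrate only (all document, modification, format, and pattern names
below are made up, not the actual test suite), a minimal sketch of how the
combinations could be enumerated and split into the two categories:

    # Sketch: enumerate export test cases and sort them into
    # "good" vs. "known problems".  Names and patterns are hypothetical.
    import itertools
    import re

    documents     = ["Math", "Additional", "UserGuide"]   # sample documents
    modifications = ["plain", "lang_fr"]                   # modifications
    formats       = ["pdf2", "pdf4_texF", "dvi"]           # output formats

    # hypothetical (subtag, pattern) pairs marking known problems
    known_problem_patterns = [("wontfix", r"Math_.*_pdf4_texF")]

    good, known_problems = [], []
    for doc, mod, fmt in itertools.product(documents, modifications, formats):
        name = "export/%s_%s_%s" % (doc, mod, fmt)
        tags = [t for t, pat in known_problem_patterns if re.search(pat, name)]
        if tags:
            known_problems.append("KNOWN_PROBLEM.%s.%s" % (tags[0], name))
        else:
            good.append(name)

    # 3 documents x 2 modifications x 3 formats = 18 test cases in total
    assert len(good) + len(known_problems) == 18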

When testing for regressions, test cases with "known problems" can be
ignored. Creating/running tests with "known problems" is not required. We
don't need to know whether they fail or not.
If all "good" tests pass we have reached a clean state, while
"good" test cases that fail require action.


OTOH, to find out if any "known problem" is solved, we need to run the
respective test(s).

This means we have to find a compromise between resource efficiency (not
running tests whose result we are not interested in) and ease of use (making
it easy to re-check (some of) the tests with "known problems").
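
Assuming the tests are driven by CTest and the category is encoded in the
test name as proposed above (the pattern strings here are only illustrative),
the two modes of working a) and b) could then look like:

  # a) testing for regressions: skip everything marked as a known problem
  ctest -E "KNOWN_PROBLEM"

  # b) maintaining the test suite: re-run (a subset of) the known problems
  ctest -R "KNOWN_PROBLEM"
  ctest -R "KNOWN_PROBLEM\.wontfix"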



To get a feel for the severity of a known problem, it makes sense to
sort known problems into sub-categories, e.g.:


* TODO            # problems we want to solve but currently cannot.

* minor           # problems that may eventually be solved

* wontfix         # LyX problems with cases so special we decided to 
                  # leave them, or LaTeX problems that 
                  # - can't be solved due to systematic limitations, or
                  # - are bugs in "historic" packages no one works on.
                    
* wrong output    # the output is corrupt; LyX should raise an error,
                  # but export returns success.

* LaTeX bug       # problems due to LaTeX packages or other "external"
                  # reasons (someone else's problem) that may eventually
                  # be solved upstream.
                  # (In this case, the test case goes to "unreliable" until
                  # everyone has the version with the fix.)
                    
* nonstandard     # requires packages or other resources that are not on CTAN
                  # (some developers may have them installed)
                    
* erratic         # depending on local configuration, OS, TeX distribution,
                  # package versions, or the phase of the moon.
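
One possible way to keep this maintainable (purely a sketch; the file name
and patterns are hypothetical) would be one pattern file, or file section,
per sub-category:

  # knownProblems (hypothetical pattern file, one regex per line)

  # wontfix
  export/doc/Math_.*_pdf4_texF.*

  # nonstandard
  export/templates/.*_docbook5.*

  # erratic
  export/doc/.*/UserGuide_pdf.*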




The following label is independent and can be given in addition to the above
categories.

* fragile           # prone to break down

We could use "generic" regular expressions to give hints about documents or
export routes where simple changes may lead to failure.

These can be

  - problematic documents that use heavy ERT or preamble code or
    many/unstable packages (e.g. Math.lyx, Additional.lyx).
  
  - poorly supported and seldom used export formats 
    (e.g. XeTeX + TeX-fonts)

If a "fragile" test case fails that formerly was OK, chances are
high that this is not a regression but due to an existing problem.
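
For instance (hypothetical patterns, assuming test names follow the scheme
sketched above), "fragile" hints could be generic regular expressions like:

  # ERT-/preamble-heavy documents using many or unstable packages
  export/doc/(Math|Additional)_.*

  # poorly supported export route (e.g. XeTeX + TeX fonts)
  export/.*_pdf4_texF.*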



If we want to make sure that no "good fail" (a case where export correctly
fails) is silently transformed into "wrong output", we would need a category
"assert fail" that reports exports which do not fail:

* assert fail     # we know the export does not work for a permanent reason
                  # and want to test whether LyX correctly fails
                  # (e.g. pdflatex with a package requiring LuaTeX)
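
A minimal sketch of such an "assert fail" check, assuming LyX is invoked via
its command-line export option (lyx --export <format> <file>); the file and
format names are placeholders:

    # Sketch: an "assert fail" test passes only if the export *fails*.
    import subprocess
    import sys

    def assert_export_fails(lyx="lyx", fmt="pdf2",
                            document="luatex-only-feature.lyx"):
        """Return True if LyX correctly reports failure for this export."""
        result = subprocess.run([lyx, "--export", fmt, document])
        # Exit status 0 would mean LyX claims success although the output
        # cannot be correct -- then this test itself must fail.
        return result.returncode != 0

    if __name__ == "__main__":
        sys.exit(0 if assert_export_fails() else 1)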




Günter
