On 2015-12-01, Kornel Benko wrote:
> Am 30. November 2015 um 12:56:22, schrieb Guenter Milde <mi...@users.sf.net>
>> On 2015-11-28, Kornel Benko wrote:

Thanks for explaining the filter chain. The table in Development.lyx gives
a quite good overview. I still believe there is room for improvement both
in the actual test setup and naming and in the documentation.
(BTW: I reordered the headings in Development.lyx and pushed to git.)


First, a general question.

When tagging tests that fail:

Do we want tags for 

a) the reason for the failure (TeXBug, LyXBug, incompatibility, nonstandard, ...)

b) our treatment of the test case (ignored, inverted, suspended, ...), or

c) both?


Currently,
* the filter files have names from b),
* the labels are a mixture of test class (export, load, lyx2lyx), a), and b):

#> ctest --print-label
Test project /usr/local/src/lyxbuild
All Labels:
  chemgreek
  cmplyx
  erratic
  examples
  export
  layout
  load
  lyx2lyx
  manuals
  mathmacros
  module
  nonstandard
  reverted
  roundtrip
  suspended
  templates
  unreliable

For clarity, we could give them some structure, e.g.

  cmplyx
  examples
  export
  export:reverted
  export:reverted:chemgreek  # actually, I wouldn't make "chemgreek" a label
  export:suspended
  export:unreliable:erratic
  export:unreliable:nonstandard
  layout
  load
  lyx2lyx
  manuals
  mathmacros
  module
  roundtrip
  templates


Selection by label uses regular expressions, so instead of 

    ctest -L export -LE reverted

one could use

    ctest -L "export$"

to test only for new failures.
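To illustrate the difference, here is a minimal sketch (in Python, using a made-up label set from the proposal above) of how ctest's regex matching of `-L` patterns would behave with hierarchical labels:

```python
import re

# Hypothetical label set, following the "export:sublabel" proposal above.
labels = [
    "export",
    "export:reverted",
    "export:reverted:chemgreek",
    "export:suspended",
    "export:unreliable:erratic",
    "lyx2lyx",
]

def select(pattern):
    """Mimic ctest -L: keep every label the regex matches anywhere."""
    return [l for l in labels if re.search(pattern, l)]

print(select("export"))          # all export tests, sublabeled ones included
print(select("export$"))         # only the plain "export" label: new failures
print(select("export:reverted")) # just the reverted subset
```

So `ctest -L "export$"` would select only tests expected to pass, while `ctest -L export` keeps the current behaviour of selecting everything export-related.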

........................................................................

When creating the tests, I propose the following logic:

1. filter: sort out test cases to ignore

   ignoredTests -> Export combinations matching here are withdrawn

2. categorization: label the remaining tests

   a) *all* export tests get the label "export" to make them easy to
      distinguish from lyx2lyx, roundtrip, mathmacros, ...
      
   b) Set specific labels: test names are matched against regexps in
      all "*Tests" files.

      Matches regexp in  Label       Sublabels
      =================  ==========  ========================================
      InvertedTests¹     inverted²   wontfix, minor, TeXBug, assertError, ...
      UnreliableTests    unreliable  nonstandard, ...        
      SuspendedTests¹    suspended

      A test matching a regexp in UnreliableTests and
      InvertedTests, say, would get the labels "unreliable" and "inverted".
      
Combining this with the proposal above, there would not be independent
sublabels but one label, e.g. "export:inverted:TeXBug" or 
"export:unreliable:nonstandard".

¹ see also the discussion below.

² tests with label "inverted" will also get the test property "inverted", of
  course.      
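The two-step logic could be sketched like this (a Python sketch; the pattern lists and test names below are invented for illustration -- in the real setup they would come from ignoredTests and the "*Tests" files):

```python
import re

# Step 1: patterns for test cases to withdraw entirely (ignoredTests).
ignored_patterns = [r"^export/.*broken_example"]

# Step 2b: label-specific pattern files (contents made up here).
label_patterns = {
    "inverted":   [r".*_texF$"],         # from InvertedTests
    "unreliable": [r".*nonstandard.*"],  # from UnreliableTests
    "suspended":  [r".*pdf4_texF$"],     # from SuspendedTests
}

def classify(test_name):
    """Return None if the test is ignored, else its list of labels."""
    if any(re.search(p, test_name) for p in ignored_patterns):
        return None                      # step 1: withdrawn
    labels = ["export"]                  # step 2a: every export test
    for label, patterns in label_patterns.items():
        if any(re.search(p, test_name) for p in patterns):
            labels.append(label)         # step 2b: several files may match
    return labels
```

A test matching patterns in two files gets both labels, e.g. a nonstandard `_texF` test would carry "export", "inverted", and "unreliable".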



.....................................................................

Choice of names:


suspicious
==========

>> I am not happy with the naming "suspicious" (we know for sure there is a
>> problem and the test fail).

> > > I don't like "known problems". We know, there is a problem,
> > > but don't know exactly which (and if it is the only one)

However, we do not merely suspect a problem:
we only insert a regexp here after establishing that there is a problem we
cannot currently solve.

We could also say "known to fail" or "acknowledged problem" or just
"problematic". 


inverted/reverted
=================

> For me 'reverted' clearly signals that the testresult is inverted.

Actually, "reverted" has a related but somewhat different meaning;
inverting an argument $\ne$ reverting it:

  to revert = turn back, undo, ... (revert a commit), 
  to invert = turn round, turn upside down, negate, ...

Hence, using "revert" for inverted tests is confusing (at least for me).

Is there a reason the label cannot be called "inverted"?



.....................................................................


>> I am not happy with the naming "suspended" (what is its use case?)¹

> This one are tests which are failing, but we cannot do anything on them.
> See .*pdf4_texF

We actually solved a lot of "INV.*(dvi3|pdf4|pdf5)_texF" cases, and more
could be done (though it may not be worth the effort).

But what is this label useful for?

The effect of this label is that if a "suspended" test starts working again,
running the inverted export tests gives no feedback about it. Anything else?
I don't see the advantage.

For problems that are unlikely to be fixed soon, I propose one of

  inverted:minor    # problems that may eventually be solved
  inverted:wontfix  # problems that
                    #  - are minor and hard to solve (not worth the effort),
                    #  - can't be solved due to systematic limitations, or
                    #  - are bugs in "historic" packages no one works on.

However, not with "catchall" regular expressions but on a case-by-case
basis, and with an explanatory comment. This could be implemented via
sublabels.


tests with low signal/noise ratio
=================================

The export with a Unicode-aware TeX engine but TeX fonts,
(dvi3|pdf4|pdf5)_texF, is rather an example of a class of "unreliable"
tests -- however, the unreliability is not "fail/pass depending on the
site or the phase of the moon".
The inverted test cases fail for every developer running the tests, but 
> Many other tests with the same signature are not failing.

Rather, export success depends on the document content:
adding or removing one character (whose support depends on some autoloaded
package) can make a compilable document uncompilable and vice versa.
Also, the probability of hidden problems (erroneous output despite a
non-failing export) is high.

This means that "(dvi3|pdf4|pdf5)_texF"-tests have a low signal/noise
ratio. The statement
 
  The answer is that if a non-default route is broken it is often because
  a bug was introduced in LyX and not because a document-specific change
  was made that is not supported by the route. In other words, there is a
  high signal/noise ratio in the export tests for some non-default
  formats. 

in Development.lyx does not hold for them.

The questions are:

Do we want a label for all tests with low signal/noise ratio?

How to name this label?

How should we handle tests carrying this label?


Thanks,
Günter
