Hi Werner et al.,
I am thinking about a different approach, knowing that I am likely
missing some details...
I would like to bring up the question of whether it is really necessary
that we do all testing "end-to-end", i.e., from input .ly code to
pixel-based graphical output.
IIUC, we have an intermediate graphical language, consisting of the
various stencil commands, that is mostly backend-agnostic (modulo
things like `\image` or `\postscript`).
What if we resuscitate the SCM backend in a different form, say,
outputting all stencils in serialized form as JSON or XML, and compare
these files first? Only if there is a change there would we render and
compare the actual images.
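To make the two-phase idea a bit more concrete, here is a minimal
Python sketch. It assumes a (so far hypothetical) serialization
backend that writes one JSON stencil dump per test case; the file
layout and the `render_and_compare_images` callback are placeholders
I made up, not existing LilyPond or `output-distance.py` API.

```python
import json

def stencil_dumps_equal(path_a, path_b):
    """Structurally compare two serialized stencil dumps
    (hypothetical JSON files written by a resurrected
    serialization backend)."""
    with open(path_a) as a, open(path_b) as b:
        return json.load(a) == json.load(b)

def compare_test_case(old_json, new_json, render_and_compare_images):
    """Phase 1: diff the serialized stencils.  Phase 2 (rendering and
    pixel-based comparison) only runs if phase 1 found a difference."""
    if stencil_dumps_equal(old_json, new_json):
        return "identical"  # cheap path: nothing to render
    # Something changed at the stencil level -- only now do we pay for
    # rendering both versions and comparing the actual images.
    return render_and_compare_images(old_json, new_json)
```

Whether the dump format is JSON or XML matters little for this step,
as long as it is stable and diffable.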
A change like a missing object would immediately be noticed, regardless
of how small the visual effect is.
Complementarily, we would test each backend extensively to verify that
it outputs all stencils in exactly the intended way; a sketch of what
such a test could look like follows below.
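The `emit_svg` helper here is a toy stand-in, not LilyPond's real SVG
backend, and the expected markup is invented; the point is only that
one stencil command maps to one exactly known output string.

```python
def emit_svg(stencil_expr):
    # Toy translator for a single 'draw-line' stencil command -- a
    # stand-in for the real backend under test.
    kind, thickness, x1, y1, x2, y2 = stencil_expr
    assert kind == 'draw-line'
    return (f'<line stroke-width="{thickness}" '
            f'x1="{x1}" y1="{y1}" x2="{x2}" y2="{y2}"/>')

def test_draw_line_svg():
    # One stencil command in, one exactly known output string expected.
    expected = '<line stroke-width="0.1" x1="0" y1="0" x2="10" y2="0"/>'
    assert emit_svg(('draw-line', 0.1, 0, 0, 10, 0)) == expected
```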
If both test phases pass, I think we would have an equal, if not
higher, probability than with the current setup that everything (tm)
works well.
I am aware that this can only work if we have *very* good coverage in
these backend tests, especially where drawing commands rely on some
internal state in the backend and thus cannot be tested independently
of the other ones.
This would also take into account the fact that the correlation between
the severity of a problem and the amount of graphical change -
regardless of how it is eventually measured - is very hard to define.
A small change like a missing symbol or number can be a sign of a
severe problem, while a big change, e.g. a changed line break, may
simply be the effect of a changed default margin setting.
A challenge could be that there is no 1:1 mapping from stencils to a
particular graphical output, or, in other words,
there are many - if not infinitely many - ways to achieve the same
graphical output with different stencil combinations.
But even a change in the order of stencil commands (like the one you
and Carl noticed some months ago) is likely something we would like to
notice during testing. So maybe this is more of an advantage.
IMHO, such an approach would also scale better - with the current
approach we have low coverage, e.g., for the SVG backend. To improve
this, we would have to test all files in all backends, wouldn't we?
Let's imagine we added a MusicXML backend...
Excited to hear your thoughts!
Michael
On 27.07.2024 at 21:35, Werner LEMBERG wrote:
Yes, I might be a moment due to friends visiting and such, but
definitely can.
Great!
Could you get me going by pointing me to a few image pairs and an
indication (like you did on SE) of the defect you see?
The example I gave on SE is *the* example – a small object of about
the size of a note head that appears, disappears, or is slightly moved
while the rest of the image stays the same.
Python/NumPy/SciPy ok, yes?
For a demo, yes, of course.
I'll pick a random version to show the concept, and then we can
adapt to whatever dependencies lily is comfortable with.
Importantly: is it ok to make lilypond's test suite depend on NumPy?
It's not a small package (although it is easy and rock-solid to
install).
Let's see what you come up with – and thanks in advance! Python is ok
since the testing script (`output-distance.py`) is written in Python,
too. As you say, a dependency on NumPy shouldn't be too much of a
problem, but Jonas can evaluate this better than me.
Werner