On 06/21/2019 11:38 AM, Cleber Rosa wrote:
On Fri, Jun 21, 2019 at 09:03:33AM +0200, Philippe Mathieu-Daudé wrote:
On 6/21/19 8:09 AM, Cleber Rosa wrote:
It's a fact that some tests may not be 100% reliable in all
environments. While it's a tough call to remove a useful test
from the tree because it may fail every 1/100th time (or so), having
human attention drawn to known issues is very bad for humans and for
the projects they manage.
As a compromise solution, this marks tests that are known to have
issues, or that exercise known issues in QEMU or other components,
and excludes them from the entry point. As a consequence, tests
marked as "flaky" will not be executed as part of "make
check-acceptance".
Because such tests should be forgiven but never be forgotten, it's
possible to list them with (assuming "make check-venv" or "make
check-acceptance" has already initialized the venv):
$ ./tests/venv/bin/avocado list -t flaky tests/acceptance
It needs a Make target to run those flaky tests (if we ever agree on
this idea of flaky tests). Also, other Avocado flags are passed (e.g.
-t for tags) that can happen to fail tests when what they select is
absent. One clear example is the spice test in patch 02 of this
series...
Side note: check-acceptance seems to be growing in complexity, which I
worry will end up in pitfalls. Is a Make target the proper way to
implement complex test runs? (I don't think so.) Perhaps the Avocado
runner concept could help somehow?
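For what it's worth, the comma-separated terms inside a single
"--filter-by-tags" value act as a conjunction: every term must hold,
and a leading "-" requires the tag to be absent. The sketch below
mimics that semantics in plain Python purely for illustration; it is
not Avocado's actual implementation, and the function name is made up:

```python
# Illustrative sketch (NOT Avocado's real code) of how a filter
# expression like "arch:aarch64,-flaky" combines its terms.
def matches(tags, filter_expr):
    """Return True if the tag set satisfies every comma-separated term."""
    for term in filter_expr.split(","):
        negate = term.startswith("-")      # leading "-" means "must be absent"
        term = term.lstrip("-")
        present = term in tags
        if negate == present:              # required-but-absent, or excluded-but-present
            return False
    return True

print(matches({"arch:aarch64", "machine:virt"}, "arch:aarch64,-flaky"))  # True
print(matches({"arch:aarch64", "flaky"}, "arch:aarch64,-flaky"))         # False
```

With this reading, appending ",-flaky" to each arch filter (as the
Makefile hunk below does) excludes flaky tests from every per-arch run.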
The current list of tests marked as flaky are a result of running
the entire set of acceptance tests around 20 times. The results
were then processed with a helper script[1]. That either confirmed
known issues (in the case of aarch64 and arm)[2] or revealed new
ones (mips).
This also bumps the Avocado version to one that includes a fix to the
parsing of multiple and mixed "key:val" and simple tag values.
[1]
https://raw.githubusercontent.com/avocado-framework/avocado/master/contrib/scripts/summarize-job-failures.py
[2] https://bugs.launchpad.net/qemu/+bug/1829779
Signed-off-by: Cleber Rosa <cr...@redhat.com>
---
docs/devel/testing.rst | 17 +++++++++++++++++
tests/Makefile.include | 6 +++++-
tests/acceptance/boot_linux_console.py | 2 ++
tests/acceptance/linux_ssh_mips_malta.py | 2 ++
tests/requirements.txt | 2 +-
5 files changed, 27 insertions(+), 2 deletions(-)
diff --git a/docs/devel/testing.rst b/docs/devel/testing.rst
index da2d0fc964..ff4d8e2e1c 100644
--- a/docs/devel/testing.rst
+++ b/docs/devel/testing.rst
@@ -574,6 +574,23 @@ may be invoked by running:
tests/venv/bin/avocado run $OPTION1 $OPTION2 tests/acceptance/
+Tagging tests
+-------------
+
+flaky
+~~~~~
+
+If a test is known to fail intermittently, even if only every one
+hundredth time, it's highly advisable to mark it as a flaky test.
+This will prevent these individual tests from failing much larger
+jobs, will avoid human interaction and time wasted verifying a known
+issue, and will keep such failures from discrediting automated
+testing.
+
+To mark a test as flaky, add the following to its docstring::
+
+ :avocado: tags=flaky
I certainly disagree with this patch, failing tests have to be fixed.
Why not tag all the codebase flaky and sing "happy coding"?
That's a great idea! :)
Now, seriously, I also resisted this for quite a long time. The
reality, though, is that intermittent failures will continue to
appear, and letting tests (and jobs, and CI pipelines, and whatnot)
fail is a very bad idea. We all agree that real fixes are better than
this, but many times they don't come quickly.
It seems to me that flaky tests are just one case in a broader
scenario: running (or not running) groups of tests. You may have tests
that are indeed broken, or that take considerable time (those tagged
"slow"), which one may fairly want to exclude from `make
check-acceptance` as well. Thus some way to group tests, plus a way to
define inclusion/exclusion patterns for a run, seems the ultimate goal
here.
Anyway, if this gets accepted, 'flaky' tags must have the intermittent
failure well described, and a Launchpad/Bugzilla tracking ticket referenced.
And here you have a key point that I absolutely agree with. The
"flaky" approach can either poison a lot of tests and be seen as a
quick way out of a difficult issue revealed by a test, or it can
serve as an effective tool to keep track of these very important
issues.
If we add:
# https://bugs.launchpad.net/qemu/+bug/1829779
:avocado: tags=flaky
Topped with some human attention, I believe this can be very
effective. This goes without saying, but comments here are very much
welcome.
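As a rough sketch of how such a convention could be enforced, here is
a small checker that flags flaky-tagged docstrings lacking a
bug-tracker URL. This is purely hypothetical: the ":avocado: tags="
docstring convention is real, but this script, its function name, and
the "URL must appear in the docstring" rule are all assumptions, not
part of Avocado or QEMU:

```python
import re

def flaky_docstrings_missing_bug_link(source):
    """Return flaky-tagged docstrings that carry no bug-tracker URL.

    Hypothetical helper: scans triple-quoted docstrings in a source
    string and reports those tagged "flaky" without an http(s) URL.
    """
    offenders = []
    for doc in re.findall(r'"""(.*?)"""', source, re.DOTALL):
        tagged_flaky = re.search(r':avocado:\s+tags=.*\bflaky\b', doc)
        has_bug_url = re.search(r'https?://\S+', doc)
        if tagged_flaky and not has_bug_url:
            offenders.append(doc.strip())
    return offenders

good = '''
def test_mips_malta(self):
    """
    # https://bugs.launchpad.net/qemu/+bug/1829779
    :avocado: tags=flaky
    """
'''
# Same test, but with the bug-reference comment stripped out
bad = good.replace("# https://bugs.launchpad.net/qemu/+bug/1829779\n    ", "")

print(len(flaky_docstrings_missing_bug_link(good)))  # 0
print(len(flaky_docstrings_missing_bug_link(bad)))   # 1
```

A check along these lines could run in CI so that a bare "flaky" tag
without a tracking ticket is rejected before it lands.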
I agree that all flaky tests should have a tracking bug. In the end it
represents technical debt that we should address.
- Wainer
- Cleber.
+
Manual Installation
-------------------
diff --git a/tests/Makefile.include b/tests/Makefile.include
index db750dd6d0..4c97da2878 100644
--- a/tests/Makefile.include
+++ b/tests/Makefile.include
@@ -1125,7 +1125,11 @@ TESTS_RESULTS_DIR=$(BUILD_DIR)/tests/results
# Any number of command separated loggers are accepted. For more
# information please refer to "avocado --help".
AVOCADO_SHOW=app
-AVOCADO_TAGS=$(patsubst %-softmmu,-t arch:%, $(filter %-softmmu,$(TARGET_DIRS)))
+
+# Additional tags that are added to each occurrence of "--filter-by-tags"
+AVOCADO_EXTRA_TAGS := ,-flaky
+
+AVOCADO_TAGS=$(patsubst %-softmmu,--filter-by-tags=arch:%$(AVOCADO_EXTRA_TAGS), $(filter %-softmmu,$(TARGET_DIRS)))
ifneq ($(findstring v2,"v$(PYTHON_VERSION)"),v2)
$(TESTS_VENV_DIR): $(TESTS_VENV_REQ)
diff --git a/tests/acceptance/boot_linux_console.py b/tests/acceptance/boot_linux_console.py
index 32159503e9..6bd5c1ab53 100644
--- a/tests/acceptance/boot_linux_console.py
+++ b/tests/acceptance/boot_linux_console.py
@@ -249,6 +249,7 @@ class BootLinuxConsole(Test):
"""
:avocado: tags=arch:aarch64
:avocado: tags=machine:virt
+ :avocado: tags=flaky
"""
kernel_url = ('https://download.fedoraproject.org/pub/fedora/linux/'
'releases/29/Everything/aarch64/os/images/pxeboot/vmlinuz')
@@ -270,6 +271,7 @@ class BootLinuxConsole(Test):
"""
:avocado: tags=arch:arm
:avocado: tags=machine:virt
+ :avocado: tags=flaky
"""
kernel_url = ('https://download.fedoraproject.org/pub/fedora/linux/'
'releases/29/Everything/armhfp/os/images/pxeboot/vmlinuz')
diff --git a/tests/acceptance/linux_ssh_mips_malta.py b/tests/acceptance/linux_ssh_mips_malta.py
index aafb0c39f6..ae70b658e0 100644
--- a/tests/acceptance/linux_ssh_mips_malta.py
+++ b/tests/acceptance/linux_ssh_mips_malta.py
@@ -208,6 +208,7 @@ class LinuxSSH(Test):
:avocado: tags=machine:malta
:avocado: tags=endian:big
:avocado: tags=device:pcnet32
+ :avocado: tags=flaky
"""
kernel_url = ('https://people.debian.org/~aurel32/qemu/mips/'
'vmlinux-3.2.0-4-5kc-malta')
@@ -222,6 +223,7 @@ class LinuxSSH(Test):
:avocado: tags=machine:malta
:avocado: tags=endian:little
:avocado: tags=device:pcnet32
+ :avocado: tags=flaky
"""
kernel_url = ('https://people.debian.org/~aurel32/qemu/mipsel/'
'vmlinux-3.2.0-4-5kc-malta')
diff --git a/tests/requirements.txt b/tests/requirements.txt
index 3ae0e29ad7..58d63d171f 100644
--- a/tests/requirements.txt
+++ b/tests/requirements.txt
@@ -1,5 +1,5 @@
# Add Python module requirements, one per line, to be installed
# in the tests/venv Python virtual environment. For more info,
# refer to: https://pip.pypa.io/en/stable/user_guide/#id1
-avocado-framework==68.0
+avocado-framework==69.1
paramiko