On 11/03/2025 14.37, Daniel P. Berrangé wrote:
On Tue, Mar 11, 2025 at 11:13:26PM +1000, Nicholas Piggin wrote:
The NetBSD archive is currently failing part-way through downloads,
which results in no clean HTTP error but a short transfer and checksum
error. This is treated as fatal in the precache download, and it halts
an entire set of tests even if some others could run.

I hacked up this patch to get a bunch of CI tests going again for ppc
merge testing.

Don't treat any precaching failures as errors.
This causes tests to be skipped when they try to fetch their asset.
Some CI results before/after patching:

functional-system-fedora
https://gitlab.com/npiggin/qemu/-/jobs/9370860490 #bad
https://gitlab.com/npiggin/qemu/-/jobs/9373246826 #good

functional-system-debian
https://gitlab.com/npiggin/qemu/-/jobs/9370860479 #bad
https://gitlab.com/npiggin/qemu/-/jobs/9373246822 #good

This is making the tests skip. Is there a way to make the error more
prominent / obvious in the output? Should they fail instead? I think
there should be a more obvious indication of failure due to asset so
it does not go unnoticed.
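One way to make asset-caused skips harder to miss is to tag the skip reason with a fixed, greppable prefix and report those skips separately at the end of the run. Below is a minimal sketch using only the standard unittest API. The prefix string and the helper names `skip_for_asset` and `asset_skips` are hypothetical and do not exist in qemu_test.

```python
import logging
import unittest

log = logging.getLogger("asset-sketch")  # hypothetical logger name

# Hypothetical marker: any fixed prefix that CI output can be grepped for.
ASSET_SKIP_PREFIX = "ASSET UNAVAILABLE"


def skip_for_asset(test, url, err):
    """Log a warning, then skip `test` with a reason carrying the prefix."""
    log.warning("asset %s unavailable: %s", url, err)
    test.skipTest(f"{ASSET_SKIP_PREFIX}: {url}: {err}")


def asset_skips(result):
    """Return (test id, reason) for every skip that was caused by an asset."""
    return [(t.id(), reason) for t, reason in result.skipped
            if reason.startswith(ASSET_SKIP_PREFIX)]
```

A CI wrapper could print `asset_skips(result)` as a summary after the run, or even turn a non-empty list into a job failure, without changing how individual tests behave.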

Signed-off-by: Nicholas Piggin <npig...@gmail.com>
---
  tests/functional/qemu_test/asset.py | 9 +++------
  1 file changed, 3 insertions(+), 6 deletions(-)

diff --git a/tests/functional/qemu_test/asset.py b/tests/functional/qemu_test/asset.py
index f0730695f09..3134ccb10da 100644
--- a/tests/functional/qemu_test/asset.py
+++ b/tests/functional/qemu_test/asset.py
@@ -174,14 +174,11 @@ def precache_test(test):
                  try:
                      asset.fetch()
                  except HTTPError as e:
-                    # Treat 404 as fatal, since it is highly likely to
-                    # indicate a broken test rather than a transient
-                    # server or networking problem
-                    if e.code == 404:
-                        raise
-

Why are you removing this? The commit message above does not say the
problem is a missing URL (404 code). We want missing URLs to be fatal
so that we notice when images we rely on are deleted by their host,
as that is not a transient problem.

                      log.debug(f"HTTP error {e.code} from {asset.url} " +
                                "skipping asset precache")
+                except:
+                    log.debug(f"Error from {asset.url} " +
+                              "skipping asset precache")

This is the bit that actually deals with the exception you show in
the jobs above.

Best practice would be for us to define an 'AssetException' and use that
in asset.py when raising exceptions, or to wrap other exceptions in cases
where we propagate exceptions. Then this code can be more tailored, and
catch AssetException instead of Exception.
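A minimal sketch of that approach, assuming a pluggable `opener(url)` callable stands in for the real download code. `AssetException`, `fetch_wrapped` and `precache` are hypothetical names, not functions in qemu_test/asset.py. Low-level transfer errors are wrapped with `raise ... from e` so the original cause is kept, a 404 still propagates as fatal, and the precache loop only swallows AssetException rather than everything.

```python
import logging
from urllib.error import HTTPError, URLError

log = logging.getLogger(__name__)


class AssetException(Exception):
    """Hypothetical base class for every asset-related failure."""


def fetch_wrapped(url, opener):
    """Call opener(url), translating transfer errors into AssetException."""
    try:
        return opener(url)
    except HTTPError as e:
        # HTTPError subclasses URLError/OSError, so it must be handled first.
        if e.code == 404:
            raise  # keep missing URLs fatal, as the current code does
        raise AssetException(f"HTTP error {e.code} from {url}") from e
    except (URLError, OSError) as e:
        # e.g. the server resetting the connection part-way through
        raise AssetException(f"transfer error from {url}: {e}") from e


def precache(urls, opener):
    """Fetch each URL; only AssetException is treated as skippable."""
    fetched, skipped = [], []
    for url in urls:
        try:
            fetch_wrapped(url, opener)
            fetched.append(url)
        except AssetException as e:
            log.debug("%s, skipping asset precache", e)
            skipped.append(url)
    return fetched, skipped
```

Anything that is not an AssetException (a 404, or a genuine bug in the test code) still escapes the loop, which a bare `except:` would hide.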

At least we should distinguish between "the HTTP server bailed out early" (in which case we should likely skip the test) and "the checksum of the asset does not match", in which case we should rather fail the test: if the file has been changed on the server, that is a hard error that needs to be tackled (otherwise it would go unnoticed and the test would never run).

 Thomas

