On 13/05/2021 04:56, osstest service owner wrote:
> flight 161917 xen-unstable real [real]
> http://logs.test-lab.xenproject.org/osstest/logs/161917/
>
> Regressions :-(
>
> Tests which did not succeed and are blocking,
> including tests which could not be run:
>  test-arm64-arm64-examine      8 reboot                   fail REGR. vs. 161898
>  test-arm64-arm64-xl-thunderx  8 xen-boot                 fail REGR. vs. 161898
>  test-arm64-arm64-xl-credit1   8 xen-boot                 fail REGR. vs. 161898
>  test-arm64-arm64-xl-credit2   8 xen-boot                 fail REGR. vs. 161898
>  test-arm64-arm64-xl           8 xen-boot                 fail REGR. vs. 161898

I reported these on IRC, and Julien/Stefano have already committed a fix.

> Tests which are failing intermittently (not blocking):
>  test-xtf-amd64-amd64-3 92 xtf/test-pv32pae-xsa-286 fail in 161909 pass in 161917

While looking into the ARM issue above, I also spotted this one by chance.
There are two issues.

First, I have reverted bed7e6cad30 and edcfce55917.  The XTF test is
correct, and they really do reintroduce XSA-286.  It is a miracle of
timing that we don't need an XSA/CVE against Xen 4.15.

Given that I was unhappy with the changes in the first place, I don't
particularly want to see an attempt to resurrect them.  I never found
the claim that they were a perf improvement very convincing, and the
XTF test demonstrates that the reasoning about their safety was
incorrect.


Second, the unexplained OSSTest behaviour.

When I repro'd this on pinot1, the test-pv32pae-xsa-286 failure was
totally deterministic and repeatable (I tried 100 times, because the
test takes a fraction of a second).

From the log trawling which Ian already did, the first recorded failure
was flight 160912 on April 11th.  All failures (12, but this number is a
few flights old now) were on pinot*.

What would be interesting to see is whether there have been any passes
on pinot since 160912.
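
As a rough sketch of that query: assuming the per-flight results for
the test could be exported as (flight, host, status) records, something
like the following would list the passes.  The record layout, the
function name and the sample data are all hypothetical; this is not
OSSTest's actual schema or tooling.

```python
from typing import Iterable, List, NamedTuple


class Result(NamedTuple):
    """One test outcome. Hypothetical layout, not OSSTest's schema."""
    flight: int
    host: str
    status: str  # assumed to be "pass" or "fail"


def passes_on_host_since(results: Iterable[Result],
                         host_prefix: str,
                         first_flight: int) -> List[int]:
    """Return sorted flight numbers that passed on a matching host,
    at or after first_flight."""
    return sorted(
        r.flight
        for r in results
        if r.host.startswith(host_prefix)
        and r.flight >= first_flight
        and r.status == "pass"
    )


# Synthetic records purely for illustration; not real flight data.
sample = [
    Result(10, "pinot0", "pass"),     # before the cutoff, ignored
    Result(20, "pinot1", "fail"),
    Result(30, "otherhost0", "pass"), # wrong host, ignored
    Result(40, "pinot0", "pass"),
]
print(passes_on_host_since(sample, "pinot", 20))
```

An empty list for host_prefix="pinot" and first_flight=160912 on the
real data would point at the test being reliable and the heuristic
being at fault.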

I can't see any reason why the test would be reliable for me, but
unreliable for OSSTest, so I'm wondering whether it is actually
reliable, and something is wrong with the stickiness heuristic.

~Andrew

