https://bugs.freedesktop.org/show_bug.cgi?id=111747

Petri Latvala <petri.latv...@intel.com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
          Component|IGT                         |DRM/Intel
           Assignee|dri-devel@lists.freedesktop |intel-gfx-bugs@lists.freede
                   |.org                        |sktop.org
           Priority|not set                     |medium
           Severity|not set                     |normal
         QA Contact|                            |intel-gfx-bugs@lists.freede
                   |                            |sktop.org
      i915 features|GEM/Other                   |CI Infra

--- Comment #15 from Petri Latvala <petri.latv...@intel.com> ---
Happens on TGL in 5 of 16 runs (31.2%); last seen in the previous build.

(I mention TGL since this bug seems to be for the TGL occurrences, but it can
happen to any machine.)

User impact for this issue in particular is N/A since it's a CI issue. However,
incompletes reduce coverage for every test that doesn't get run because of
them, so the impact is potentially very dire. It doesn't happen on every run,
though, and it hits arbitrary tests, so the actual coverage loss stays below
that worst case.

What happens here is:

1) Jenkins connects to DUT through ssh and launches tests
2) Jenkins loses ssh connection
3) The Jenkins job for executing the test finishes, because the ssh command
completed
4) At the end of finishing a test, a reboot-and-collect job is executed
5) The reboot-and-collect job connects through ssh and reboots the machine

The remote reboot job got a logging step added: tests that die because the
reboot command was invoked prematurely now get a log entry in dmesg stating
that power.sh is taking this machine down. From that we can determine that the
network didn't completely die, just the ssh connection.
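
As an illustration of that logging step, here is a minimal sketch of how a
reboot script could leave a marker in the kernel log before taking the machine
down. It is not the actual power.sh code: the function names and the message
text are assumptions, and writing to /dev/kmsg normally requires root. The only
kernel behaviour relied on is that a line written to /dev/kmsg may carry a
"<N>" syslog priority prefix.

```python
def format_kmsg(message, priority=5):
    """Build one /dev/kmsg record: "<priority>" prefix, single line, newline.

    Priority 5 is the syslog "notice" level. Hypothetical helper, not part
    of power.sh.
    """
    # Collapse all whitespace so the record stays on one line in dmesg.
    single_line = " ".join(message.split())
    return f"<{priority}>{single_line}\n"


def log_to_kmsg(message, path="/dev/kmsg"):
    """Write the record to the kernel log buffer (or to `path` for testing)."""
    with open(path, "w") as kmsg:
        kmsg.write(format_kmsg(message))


if __name__ == "__main__":
    # Assumed message text; the real wording used by the CI may differ.
    print(format_kmsg("power.sh: taking this machine down"), end="")
```

A test that then ends with this line in its dmesg was killed by the reboot job
rather than by a dead machine or a dead network.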

There is a plan to solve this. igt_runner will be changed to expose an AF_LOCAL
socket for outside control, and the Jenkins job for executing tests will then
no longer be required to keep an ssh connection open for the duration of the
whole test round. Instead, tests will be launched in the background (with
screen or tmux or just nohup), and the Jenkins job will re-establish the ssh
connection if it drops and check through igt_runner's control channel whether
a test is still running.
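
To make the planned check concrete, here is a rough sketch of what asking the
runner "is a test still running?" over an AF_LOCAL socket might look like. The
control channel does not exist yet, so everything about it here is an
assumption: the socket path, the "status" request, and the "running" reply are
invented for illustration. Since AF_LOCAL sockets are local to the machine, the
Jenkins job would have to run something like this on the DUT through its
re-established ssh session.

```python
import socket


def runner_is_busy(socket_path, timeout=5.0):
    """Return True if the runner reports that a test is still running.

    Hypothetical protocol: send b"status\n", expect b"running" back.
    Any socket error (no socket file, refused connection, timeout) is
    reported as "not busy"; a real job would probably want to tell
    those cases apart before deciding to reboot.
    """
    try:
        with socket.socket(socket.AF_UNIX, socket.SOCK_STREAM) as conn:
            conn.settimeout(timeout)
            conn.connect(socket_path)
            conn.sendall(b"status\n")
            reply = conn.recv(64)
    except OSError:
        return False
    return reply.strip() == b"running"


if __name__ == "__main__":
    # Assumed path; the real runner may place its socket elsewhere.
    print(runner_is_busy("/tmp/igt_runner.sock"))
```

With a check like this, the job finishing because ssh dropped would no longer
trigger the reboot-and-collect step while a test is in progress.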

Moving this bug to CI infra.

-- 
You are receiving this mail because:
You are the assignee for the bug.
_______________________________________________
dri-devel mailing list
dri-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/dri-devel
