Hi Janusz,
On 2025-07-04 at 15:30:35 +0200, Janusz Krzysztofik wrote:
> In case of soft lockups, it might be helpful from root cause analysis
> perspective to see if the test was still able to complete despite
> triggering the soft lockup warning, or if that soft lockup seems not
> recoverable without killing the test. For that to be possible, igt_runner
> should not kill the test too promptly if a soft lockup related kernel
> taint is detected.
> 
> On kernel taints, igt_runner now decreases per test and inactivity
> timeouts by a factor of 10.  Let it check if the taint is caused by a
> soft lockup and decrease the timeouts only by the factor of 2 in those
> cases.
> 
> Signed-off-by: Janusz Krzysztofik <janusz.krzyszto...@linux.intel.com>
> ---
>  runner/executor.c | 13 +++++++++----
>  1 file changed, 9 insertions(+), 4 deletions(-)
> 
> diff --git a/runner/executor.c b/runner/executor.c
> index 13180a0a46..de9d29d28d 100644
> --- a/runner/executor.c
> +++ b/runner/executor.c
> @@ -871,10 +871,14 @@ static const char *need_to_timeout(struct settings *settings,
>       if (settings->abort_mask & ABORT_TAINT &&
>           is_tainted(taints)) {
>               /* list of timeouts that may postpone immediate kill on taint */
> -             if (settings->per_test_timeout || settings->inactivity_timeout)
> -                     decrease = 10;
> -             else
> +             if (settings->per_test_timeout || settings->inactivity_timeout) {
> +                     if (is_tainted(taints) == (1 << 9) && taints & (1 << 14))

Looks reasonable. IMHO there should be #defines or named constants
for (1 << 9) and (1 << 14), and maybe for the other taint bits
as well.
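
To illustrate what I mean, here is a rough sketch. The macro and helper
names are hypothetical, not existing IGT identifiers. The bit positions
are the ones listed in the kernel's tainted-kernels documentation, where
bit 9 is 'W' (warning) and bit 14 is 'L' (soft lockup):

```c
#include <stdbool.h>

/* Hypothetical names; bit positions follow the kernel taint flag table. */
#define TAINT_WARN_BIT        (1UL << 9)   /* 'W': kernel issued a warning */
#define TAINT_SOFTLOCKUP_BIT  (1UL << 14)  /* 'L': a soft lockup occurred */

/*
 * bad_taints: the subset of taints the runner treats as abort-worthy,
 *             i.e. what is_tainted(taints) returns in the patch.
 * taints:     the raw taint mask read from the kernel.
 *
 * True only when the warning bit is the sole "bad" taint and the
 * soft lockup bit is also set in the raw mask.
 */
static inline bool taint_is_soft_lockup_only(unsigned long bad_taints,
					     unsigned long taints)
{
	return bad_taints == TAINT_WARN_BIT &&
	       (taints & TAINT_SOFTLOCKUP_BIT);
}
```

With something like that, the condition above could read
taint_is_soft_lockup_only(is_tainted(taints), taints).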

Regards,
Kamil

> +                             decrease = 2;   /* only warn + soft lockup */
> +                     else
> +                             decrease = 10;
> +             } else {
>                       return "Killing the test because the kernel is tainted.\n";
> +             }
>       }
>  
>       if (settings->per_test_timeout != 0 &&
> @@ -1526,8 +1530,9 @@ static int monitor_output(pid_t child,
>                       sigfd = -1; /* we are dying, no signal handling for now */
>               }
>  
> +             igt_kernel_tainted(&taints);
>               timeout_reason = need_to_timeout(settings, killed,
> -                                              igt_kernel_tainted(&taints),
> +                                              taints,
>                                                igt_time_elapsed(&time_last_activity, &time_now),
>                                                igt_time_elapsed(&time_last_subtest, &time_now),
>                                                igt_time_elapsed(&time_killed, &time_now),
> -- 
> 2.50.0
> 
