Hi Shuah, 
Thanks for the review. I will send a v2 soon. 
Yifei

> On Sep 23, 2024, at 8:37 AM, Shuah Khan <sk...@linuxfoundation.org> wrote:
> 
> On 9/20/24 11:52, Yifei Liu wrote:
>> Hi Shuah,
>>> On Sep 20, 2024, at 8:07 AM, Shuah Khan <sk...@linuxfoundation.org> wrote:
>>> 
>>> On 9/19/24 17:09, Yifei Liu wrote:
>>>> Hi Shuah,
>>>>> On Sep 19, 2024, at 11:36 AM, Shuah Khan <sk...@linuxfoundation.org> 
>>>>> wrote:
>>>>> 
>>>>> On 9/17/24 16:48, Yifei Liu wrote:
>>>>>> We recently notice that the step_after_suspend_test would
>>>>>> fail on our plenty devices.  The test believesit failed to
>>>>> 
>>>>> What are "plenty devices"
>>>>> recently noticed?
>>>> We have seen this issue on more than one platforms, including bare metal 
>>>> device and virtual machine.
>>>>> Typo - believesit?
>>>> Yes, missing a space here. It should be
>>>> The test believes itself ffailed to enter suspend state.
>>>> Thank you for pointing out.
>>>>> 
>>>>>> enter suspend state with
>>>>>> $ sudo ./step_after_suspend_test
>>>>>> TAP version 13
>>>>>> Bail out! Failed to enter Suspend state
>>>>>> However, in the kernel message, I indeed see the system get
>>>>>> suspended and then wake up later.
>>>>>> [611172.033108] PM: suspend entry (s2idle)
>>>>>> [611172.044940] Filesystems sync: 0.006 seconds
>>>>>> [611172.052254] Freezing user space processes
>>>>>> [611172.059319] Freezing user space processes completed (elapsed 0.001 
>>>>>> seconds)
>>>>>> [611172.067920] OOM killer disabled.
>>>>>> [611172.072465] Freezing remaining freezable tasks
>>>>>> [611172.080332] Freezing remaining freezable tasks completed (elapsed 
>>>>>> 0.001 seconds)
>>>>>> [611172.089724] printk: Suspending console(s) (use no_console_suspend to 
>>>>>> debug)
>>>>>> [611172.117126] serial 00:03: disabled
>>>>>> --- some other hardware get reconnected ---
>>>>>> [611203.136277] OOM killer enabled.
>>>>>> [611203.140637] Restarting tasks ...
>>>>>> [611203.141135] usb 1-8.1: USB disconnect, device number 7
>>>>>> [611203.141755] done.
>>>>>> [611203.155268] random: crng reseeded on system resumption
>>>>>> [611203.162059] PM: suspend exit
>>>>>> After investigation, I notice that for the code block
>>>>>> if (write(power_state_fd, "mem", strlen("mem")) != strlen("mem"))
>>>>>> ksft_exit_fail_msg("Failed to enter Suspend state\n");
>>>>>> The write will return -1 and errno is set to 16 (device busy).
>>>>>> It should be caused by the write function is not successfully returned
>>>>>> before the system suspend and the return value get messed when waking up.
>>>>>> As a result, It may be better to check the time passed of those few 
>>>>>> instructions
>>>>>> to determine whether the suspend is executed correctly for it is pretty 
>>>>>> hard to
>>>>>> execute those few lines for 4 seconds, or even more if it is not long 
>>>>>> enough.
>>>>> 
>>>>> I don't think this is the right fix. Can you change this to do echo 
>>>>> instead.
>>>>> It does the same thing, but it goes through sysfs interface instead of 
>>>>> direct
>>>>> write:
>>>>> 
>>>>> ret = system("echo mem > /sys/power/state”);
>>>> Sure, I can do that.
>>>>> 
>>>>>> Fixes: bfd092b8c2728 ("selftests: breakpoint: add 
>>>>>> step_after_suspend_test")
>>>>>> Reported-by: Sinadin Shan <sinadin.s...@oracle.com>
>>>>>> Signed-off-by: Yifei Liu <yifei.l....@oracle.com>
>>>>>> ---
>>>>>>  .../selftests/breakpoints/step_after_suspend_test.c      | 9 +++++++--
>>>>>>  1 file changed, 7 insertions(+), 2 deletions(-)
>>>>>> diff --git 
>>>>>> a/tools/testing/selftests/breakpoints/step_after_suspend_test.c 
>>>>>> b/tools/testing/selftests/breakpoints/step_after_suspend_test.c
>>>>>> index dfec31fb9b30d..d615f091e5bae 100644
>>>>>> --- a/tools/testing/selftests/breakpoints/step_after_suspend_test.c
>>>>>> +++ b/tools/testing/selftests/breakpoints/step_after_suspend_test.c
>>>>>> @@ -18,6 +18,7 @@
>>>>>>  #include <sys/timerfd.h>
>>>>>>  #include <sys/types.h>
>>>>>>  #include <sys/wait.h>
>>>>>> +#include <time.h>
>>>>>>    #include "../kselftest.h"
>>>>>>  @@ -133,6 +134,7 @@ void suspend(void)
>>>>>>   int timerfd;
>>>>>>   int err;
>>>>>>   struct itimerspec spec = {};
>>>>>> + clock_t t;
>>>>>>     if (getuid() != 0)
>>>>>>   ksft_exit_skip("Please run the test as root - Exiting.\n");
>>>>>> @@ -152,8 +154,11 @@ void suspend(void)
>>>>>>   if (err < 0)
>>>>>>   ksft_exit_fail_msg("timerfd_settime() failed\n");
>>>>> 
>>>>> I don't think you will need to add clock() code. timerfd_settime()
>>>>> sets the time for 5 seconds and you can simply extend the alarm
>>>>> time.
>>>>> 
>>>>> There needs to be some logic to check timer elapse and poll the
>>>>> timer_fd
>>>> Yes, it is also a choice to check the remaining time of the timer. We 
>>>> could use timerfd_gettime() to get the remaining time. The timer would not 
>>>> rearm because the it_interval value is not set (set to 0). Then if the 
>>>> remaining time has both 0 or seconds and nano-seconds (tv_sec and 
>>>> tv_nsec), it should point out the time goes for at least 5 seconds. If the 
>>>> system fails to enter suspend state and waked up by the timer, it should 
>>>> not take 5 whole seconds or more to get to the check line.
>>> 
>>> Right, You will have to check the remaining time. There is an example in
>>> tools/testing/selftests/timers/alarmtimer-suspend.c you can refer to.
>> I checked the remaining value, if the suspend succeed, it would be all 0 (0s 
>> and 0 ns). If it failed to enter suspend by any chance, it would be a 
>> non-zero value.
>>> 
>>>> Would you prefer this methods?
>>> 
>>> I am curious what happens with starting suspend with sysfs first and then
>>> explore checking elapsed timer.
>>> 
>>> system("echo mem > /sys/power/state”);
>> I checked this, and find that it will give a similar failure.
>> sh: line 1: echo: write error: Device or resource busy
>> The console command will get its return result after the suspend and wake 
>> up. Maybe we could do
>> system("echo mem > /sys/power/state >/dev/nul”);
> 
> Yes. The idea is that we go through the sysfs interface as opposed
> to writing to the file directly.
> 
>> To avoid an error message in the console. And we are checking the success of 
>> suspend by the time, so it would not be necessary for us to see the output 
>> of this console command.
> 
> Sounds good to me.
> 
> thanks,
> -- Shuah


Reply via email to