Hello,

I need help debugging random total system lock-ups.
This is an Acer Aspire V3-572G-78A notebook running Debian Stretch with the
4.9.0-5-amd64 kernel.

When running on battery (it does not happen on AC power), usually after
resuming from suspend-to-RAM, the system suddenly locks up after a rather
random time (anywhere from a few minutes to hours): the screen freezes, the
keyboard and the click-pad stop responding, and the sound keeps repeating a
~2-second loop. The computer does not react to magic SysRq combos (probably
because the keyboard is dead too) or to pressing the power key. I cannot
ping it or ssh into it. The notebook appears to stay in this state
indefinitely (the screen does not blank). Only holding the power key for
~10 seconds or removing the battery forces a hard reset.

I believe this is a kernel-level lock-up in some hardware driver.
Unfortunately, I haven't been able to find out which one, because the log
files (I tried both syslog and journald) contain nothing out of the ordinary
just before the lock-up. Probably I/O locks up as well, so the last messages
never reach the disk.

Netconsole isn't really an easy option: I cannot reliably reproduce the
problem in a suitable controlled environment, and on top of that the
wireless interface lacks the polling support that netconsole requires.

My suspects:
- The integrated Intel graphics with the i915 driver: I have always had
issues with it (on linux-3.16 it used to crash/hang a lot), so maybe GPU
hangs are no longer being detected properly.
- The hard disk sometimes loses its APM level after suspend (I have to set
pm_async to 0 to prevent errors after every suspend). Maybe this points to
a larger suspend/power-management issue.
- My iwlwifi interface sometimes crashes, and only removing it from the PCI
bus and rescanning for it helps. This procedure does not hang the whole
system, though.
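For reference, the two sysfs workarounds mentioned above (disabling
asynchronous suspend/resume and the PCI remove + rescan for the wireless
card) can be sketched in Python. This is a minimal illustration, not the
exact commands I run: it assumes the standard /sys/power/pm_async and
/sys/bus/pci layout, the PCI address 0000:03:00.0 is a hypothetical example
(the real one can be looked up with `lspci -D`), and the sysfs root is a
parameter so the code can be tried against a scratch directory. Writing to
the real /sys requires root.

```python
"""Sketch of the two sysfs workarounds (hypothetical helper names).

Assumes the standard sysfs layout; the root directory is configurable so
the functions can be exercised against a scratch directory instead of /sys.
"""
import os

SYSFS_ROOT = "/sys"


def _write(path: str, value: str) -> None:
    # sysfs attributes are written as short text values
    with open(path, "w") as f:
        f.write(value)


def disable_async_pm(root: str = SYSFS_ROOT) -> None:
    """Make the kernel suspend/resume devices sequentially (pm_async = 0)."""
    _write(os.path.join(root, "power", "pm_async"), "0")


def pci_remove_and_rescan(address: str, root: str = SYSFS_ROOT) -> None:
    """Detach one PCI device, then ask the kernel to rescan the PCI bus."""
    dev = os.path.join(root, "bus", "pci", "devices", address)
    _write(os.path.join(dev, "remove"), "1")
    _write(os.path.join(root, "bus", "pci", "rescan"), "1")


if __name__ == "__main__":
    # Example only: 0000:03:00.0 is a placeholder PCI address.
    disable_async_pm()
    pci_remove_and_rescan("0000:03:00.0")
```

Once the device is removed, the rescan lets the kernel rediscover it and
probe iwlwifi again.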

Any help, suggestions, pointers will be appreciated.

Regards,
Ondrej G.
