Over the weekend, I had a number of occurrences of my laptop becoming
unresponsive for periods of up to several minutes.  gkrellm monitors showed
near 100% system time for both CPUs (and the way X and other userspace programs
were behaving was consistent with them being starved for CPU for extended
periods - at one point gkrellm was unable to update the display for over 4
minutes).  The problem would hit, everything would get sluggish/stop for
anywhere from 15 seconds or so to several minutes - and then it would just
as mysteriously go away for no apparent reason, only to return a bit later...

The only other clue I was able to get was running 'perf top' to see where
the CPU was going - anytime the kernel was in this state, I'd see the top
two entries were consistently delay_tsc and do_raw_read_lock:

 25.29%  [kernel]                                        [k] delay_tsc
 23.47%  [kernel]                                        [k] do_raw_read_lock
  8.01%  [kernel]                                        [k] 
ext4_es_reclaim_extents_count
  7.44%  [kernel]                                        [k] 
ftrace_likely_update
  2.80%  [kernel]                                        [k] lock_release
  2.79%  [kernel]                                        [k] do_raw_read_unlock
  2.65%  [kernel]                                        [k] lock_acquire

 15.33%  [kernel]                                        [k] delay_tsc
 14.02%  [kernel]                                        [k] do_raw_read_lock
  6.80%  libc-2.17.so                                    [.] re_search_internal
  5.55%  libmagic.so.1.0.0                               [.] 0x000000000000af9a
  5.16%  [kernel]                                        [k] 
ftrace_likely_update
  4.79%  [kernel]                                        [k] 
ext4_es_reclaim_extents_count

 22.94%  [kernel]                                        [k] delay_tsc
 19.84%  [kernel]                                        [k] do_raw_read_lock
  6.92%  [kernel]                                        [k] 
ext4_es_reclaim_extents_count
  6.79%  [kernel]                                        [k] 
ftrace_likely_update
  2.62%  libc-2.17.so                                    [.] re_search_internal

I'm not sure why ext4_es-reclaim_extents_count appears a lot - the workload
that was running at the time was seeing some memory pressure due to a large
Firefox process, and a 'find | xargs file' command causing a lot of inode and
data read I/O, but there shouldn't have been a lot of reclaims of extents as
not much was being written anyplace?

I can try to reproduce the issue - any suggestions how to efficiently tell
who/what is doing the contending in do_raw_read_lock?

Attachment: pgprst4VpzUOb.pgp
Description: PGP signature

Reply via email to