Hi,

I am seeing (what I believe is) severe process CPU starvation in
2.4.0-prerelease.  At first, I attributed it to semaphore troubles:
when I enable semaphore deadlock detection in IKD and set it to
5 seconds, it triggers 100% of the time on nscd when I do sequential
I/O (e.g. iozone).  In the meantime, I've done a slew of tracing, and
I think the holder of the semaphore I'm timing out on just flat isn't
being scheduled, so it can't release it.  In the usual case of nscd, I
_think_ it's another nscd holding the semaphore.  In no trace can I
go back far enough to catch the taker of the semaphore, or any user
task other than iozone, running between the __down() and the timeout 5
seconds later.  (The trace buffer covers ~8 seconds of kernel time.)
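For reference, here is what the 5-second watchdog boils down to, as a
userspace sketch with POSIX semaphores -- this is *not* the IKD code or
the kernel's __down(); only the 5-second timeout value comes from the
report above.  A timed down() that, when it fires, assumes the holder
never got scheduled to release the semaphore:

#include <errno.h>
#include <semaphore.h>
#include <stdio.h>
#include <time.h>

static sem_t lock;

/* down() with a watchdog: if the semaphore isn't released within
 * `seconds`, suspect that the holder never got the CPU to release it. */
static int down_with_watchdog(sem_t *sem, int seconds)
{
        struct timespec ts;

        clock_gettime(CLOCK_REALTIME, &ts);
        ts.tv_sec += seconds;

        while (sem_timedwait(sem, &ts) == -1) {
                if (errno == EINTR)
                        continue;               /* interrupted, retry */
                if (errno == ETIMEDOUT)
                        fprintf(stderr,
                                "held > %ds, holder starved?\n", seconds);
                return -1;
        }
        return 0;                               /* got the semaphore */
}

int main(void)
{
        sem_init(&lock, 0, 0);          /* start "held" to force a timeout */
        return down_with_watchdog(&lock, 5);
}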

I think the snippet below captures the gist of the problem.

c012f32e  nr_free_pages +<e/4c> (0.16) pid(256)
c012f37a  nr_inactive_clean_pages +<e/44> (0.22) pid(256)
c01377f2  wakeup_bdflush +<12/a0> (0.14) pid(256)
c011620a  wake_up_process +<e/58> (0.29) pid(256)
c012eea4  __alloc_pages_limit +<10/b8> (0.28) pid(256)
c012eea4  __alloc_pages_limit +<10/b8> (0.30) pid(256)
c012e3fa  wakeup_kswapd +<12/d4> (0.25) pid(256)
c0115613  __wake_up +<13/130> (0.41) pid(256)
c011527b  schedule +<13/398> (0.66) pid(256->6)
c01077db  __switch_to +<13/d0> (0.70) pid(6)
c01893c6  generic_unplug_device +<e/38> (0.25) pid(6)
c011527b  schedule +<13/398> (0.50) pid(6->256)
c01077db  __switch_to +<13/d0> (0.29) pid(256)
c012eea4  __alloc_pages_limit +<10/b8> (0.22) pid(256)
c012d267  reclaim_page +<13/408> (0.54) pid(256)
c012679e  __remove_inode_page +<e/74> (0.54) pid(256)
c0126fe0  add_to_page_cache_unique +<10/e4> (0.23) pid(256)
c0126751  add_page_to_hash_queue +<d/4c> (0.16) pid(256)
c012c9fa  lru_cache_add +<e/f4> (0.29) pid(256)
c0153ac5  ext2_prepare_write +<d/28> (0.15) pid(256)
c013697f  block_prepare_write +<f/4c> (0.15) pid(256)
c0136233  __block_prepare_write +<13/214> (0.17) pid(256)
c0135fd4  create_empty_buffers +<10/7c> (0.17) pid(256)
c0135d13  create_buffers +<13/1bc> (0.14) pid(256)
(repeats a zillion times)

Despite the wakeup_kswapd(0), we never schedule kswapd until much
later, when we hit a wakeup_kswapd(1).  In the interim, we bounce
back and forth between bdflush and iozone; no other task gets
scheduled through a great many passes of schedule().
(Per kdb, I had a few tasks which would have liked some CPU ;-)
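To make the inference concrete, here's a toy model in plain userspace C.
The "goodness" numbers and the pick loop are made up and are *not* the
2.4 scheduler; the point is only that waking a task just marks it
runnable, and a runnable task that always loses the pick never gets the
CPU while the I/O pair ping-pongs:

#include <stdio.h>

struct task {
        const char *name;
        int runnable;
        int goodness;           /* stand-in for the scheduler's ranking */
};

/* stand-in for schedule(): pick the runnable task with the best rank */
static struct task *pick_next(struct task *t, int n)
{
        struct task *best = NULL;
        int i;

        for (i = 0; i < n; i++)
                if (t[i].runnable && (!best || t[i].goodness > best->goodness))
                        best = &t[i];
        return best;
}

int main(void)
{
        struct task tasks[] = {
                { "iozone",  1, 10 },
                { "bdflush", 1,  9 },
                { "kswapd",  0,  5 },   /* asleep until "woken" */
        };
        int i;

        tasks[2].runnable = 1;          /* the wakeup_kswapd(0) moment */

        for (i = 0; i < 8; i++) {
                struct task *next = pick_next(tasks, 3);

                printf("schedule -> %s\n", next->name);
                /* model the I/O ping-pong: the task that just ran blocks
                 * on I/O and its partner becomes runnable; kswapd, though
                 * runnable the whole time, never wins the pick */
                if (next != &tasks[2]) {
                        next->runnable = 0;
                        (next == &tasks[0] ? &tasks[1] : &tasks[0])->runnable = 1;
                }
        }
        return 0;
}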

In addition, kswapd is quite the CPU piggie when it's doing
page_launder(), as this profiled snippet from one of kswapd's
running periods shows (60ms with no schedule):

  0.1093%          66.06       11.01        6 c010d9c2 timer_interrupt
  0.1462%          88.39        0.28      313 c0134ea2 __remove_from_lru_list
  0.1489%          90.02       11.25        8 c01b2077 do_rw_disk
  0.1599%          96.66       32.22        3 c018a9e7 elevator_linus_merge
  0.1624%          98.18        0.30      325 c0115613 __wake_up
  0.2016%         121.88        0.42      290 c012c01f kmem_cache_free
  0.2075%         125.43        0.66      189 c0189f1d end_buffer_io_sync
  0.2089%         126.32       15.79        8 c01ad533 ide_build_sglist
 34.8258%       21054.55        0.53    39486 c0137667 try_to_free_buffers
 62.4743%       37769.90        0.95    39796 c012f2b9 __free_pages
Total entries: 81970  Total usecs:     60456.69 Idle: 0.00%
