On 08/13/2016 09:32 AM, Oleg Nesterov wrote:
> On 08/12, Bart Van Assche wrote:
>> before I started testing. It took some time
>> before I could reproduce the hang in truncate_inode_pages_range().
>
> all I can say this contradicts with the previous testing results with
> my previous patch or with your change in abort_exclusive_wait().

Hello Oleg,

In my opinion, all this means is that we do not yet fully understand what is going on.

BTW, I have improved my page lock owner instrumentation patch so that it prints the call stack of the lock owner if lock_page() takes too long. The following call stack was reported:

__lock_page / pid 8549 / m 0x2: timeout - continuing to wait for 8549
  [<ffffffff8102b316>] save_stack_trace+0x26/0x50
  [<ffffffff81152bee>] add_to_page_cache_lru+0x7e/0x170
  [<ffffffff8121bfc5>] mpage_readpages+0xc5/0x170
  [<ffffffff81215548>] blkdev_readpages+0x18/0x20
  [<ffffffff81163a68>] __do_page_cache_readahead+0x268/0x310
  [<ffffffff811640a8>] force_page_cache_readahead+0xa8/0x100
  [<ffffffff81164139>] page_cache_sync_readahead+0x39/0x40
  [<ffffffff81153967>] generic_file_read_iter+0x707/0x920
  [<ffffffff81215920>] blkdev_read_iter+0x30/0x40
  [<ffffffff811d4b4b>] __vfs_read+0xbb/0x130
  [<ffffffff811d4f31>] vfs_read+0x91/0x130
  [<ffffffff811d62b4>] SyS_read+0x44/0xa0
  [<ffffffff816281e5>] entry_SYSCALL_64_fastpath+0x18/0xa8
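
For readers who have not seen the instrumentation patch, the idea can be sketched in userspace as follows. This is only an illustration and not the actual patch: struct tracked_lock, tracked_trylock(), tracked_unlock() and tracked_lock_poll() are hypothetical names, a plain string stands in for the saved stack trace, and a bounded poll count stands in for the lock_page() timeout.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical userspace analogue of a page lock with owner tracking. */
struct tracked_lock {
    int locked;             /* 1 while the lock is held */
    int owner_pid;          /* id of the holder, recorded at acquisition */
    const char *owner_site; /* where the lock was taken (stands in for a saved stack trace) */
};

/* Try to take the lock; on success record who took it and where. */
static int tracked_trylock(struct tracked_lock *l, int pid, const char *site)
{
    if (l->locked)
        return 0;
    l->locked = 1;
    l->owner_pid = pid;
    l->owner_site = site;
    return 1;
}

/* Release the lock and forget the recorded owner. */
static void tracked_unlock(struct tracked_lock *l)
{
    l->locked = 0;
    l->owner_pid = 0;
    l->owner_site = NULL;
}

/*
 * Poll for the lock at most max_polls times. If it is still held after
 * that, write a report naming the recorded owner into 'report' and
 * return 0; the caller is expected to keep waiting. Returns 1 once the
 * lock has been acquired.
 */
static int tracked_lock_poll(struct tracked_lock *l, int pid, const char *site,
                             int max_polls, char *report, size_t len)
{
    assert(report != NULL && len > 0);
    for (int i = 0; i < max_polls; i++)
        if (tracked_trylock(l, pid, site))
            return 1;
    snprintf(report, len, "timeout - continuing to wait for %d (locked at %s)",
             l->owner_pid, l->owner_site);
    return 0;
}
```

The point of recording the owner at acquisition time is that the waiter can report who holds the lock without any cooperation from the (possibly stuck) owner.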

My understanding of mpage_readpages() is that the page is unlocked only after the readahead I/O has completed (see also page_endio()). So this probably means that an I/O request submitted by the readahead code never completed. I will check whether I can find anything wrong in the block layer.

Bart.
