Re: ppoll() stuck on POLLIN while TCP peer is sending

2013-01-11 Thread Mel Gorman
On Fri, Jan 11, 2013 at 12:51:05AM +0000, Eric Wong wrote: > Mel Gorman wrote: > > mm: compaction: Partially revert capture of suitable high-order page > > > > > Reported-by: Eric Wong > > Cc: sta...@vger.kernel.org > > Signed-off-by: Mel Gorman > > Thanks, my original use case and test wor

Re: ppoll() stuck on POLLIN while TCP peer is sending

2013-01-10 Thread Eric Wong
Mel Gorman wrote: > mm: compaction: Partially revert capture of suitable high-order page > Reported-by: Eric Wong > Cc: sta...@vger.kernel.org > Signed-off-by: Mel Gorman Thanks, my original use case and test works great after several hours! Tested-by: Eric Wong Unfortunately, I also hi

Re: ppoll() stuck on POLLIN while TCP peer is sending

2013-01-10 Thread Eric Dumazet
On Thu, 2013-01-10 at 19:42 +0000, Mel Gorman wrote: > Thanks Eric, it's much appreciated. However, I'm still very much in favour > of a partial revert as in retrospect the implementation of capture took the > wrong approach. Could you confirm the following patch works for you? > It's should funct

Re: ppoll() stuck on POLLIN while TCP peer is sending

2013-01-10 Thread Eric Wong
Mel Gorman wrote: > Thanks Eric, it's much appreciated. However, I'm still very much in favour > of a partial revert as in retrospect the implementation of capture took the > wrong approach. Could you confirm the following patch works for you? > It's should functionally have the same effect as the

Re: ppoll() stuck on POLLIN while TCP peer is sending

2013-01-10 Thread Mel Gorman
On Thu, Jan 10, 2013 at 09:25:11AM +0000, Eric Wong wrote: > Mel Gorman wrote: > > page->pfmemalloc can be left set for captured pages so try this but as > > capture is rarely used I'm strongly favouring a partial revert even if > > this works for you. I haven't reproduced this using your workload

Re: ppoll() stuck on POLLIN while TCP peer is sending

2013-01-10 Thread Eric Wong
Mel Gorman wrote: > page->pfmemalloc can be left set for captured pages so try this but as > capture is rarely used I'm strongly favouring a partial revert even if > this works for you. I haven't reproduced this using your workload yet > but I have found that high-order allocation stress tests for

Re: ppoll() stuck on POLLIN while TCP peer is sending

2013-01-09 Thread Eric Wong
Mel Gorman wrote: > When I looked at it for long enough I found a number of problems. Most > affect timing but two serious issues are in there. One affects how long > kswapd spends compacting versus reclaiming and the other increases lock > contention meaning that async compaction can abort early.

Re: ppoll() stuck on POLLIN while TCP peer is sending

2013-01-09 Thread Mel Gorman
On Wed, Jan 09, 2013 at 01:37:46PM +0000, Mel Gorman wrote: > On Tue, Jan 08, 2013 at 11:23:25PM +0000, Eric Wong wrote: > > Mel Gorman wrote: > > > Please try the following patch. However, even if it works the benefit of > > > capture may be so marginal that partially reverting it and simplifying

Re: ppoll() stuck on POLLIN while TCP peer is sending

2013-01-09 Thread Mel Gorman
On Tue, Jan 08, 2013 at 06:32:29PM -0800, Eric Dumazet wrote: > On Tue, 2013-01-08 at 18:14 -0800, Eric Dumazet wrote: > > On Tue, 2013-01-08 at 23:23 +0000, Eric Wong wrote: > > > Mel Gorman wrote: > > > > Please try the following patch. However, even if it works the benefit of > > > > capture ma

Re: ppoll() stuck on POLLIN while TCP peer is sending

2013-01-09 Thread Mel Gorman
On Tue, Jan 08, 2013 at 11:23:25PM +0000, Eric Wong wrote: > Mel Gorman wrote: > > Please try the following patch. However, even if it works the benefit of > > capture may be so marginal that partially reverting it and simplifying > > compaction.c is the better decision. > > I already got my VM s

Re: ppoll() stuck on POLLIN while TCP peer is sending

2013-01-09 Thread Eric Wong
Eric Wong wrote: > Oops, I had to restart my test :x. However, I was able to reproduce the > issue very quickly again with your patch. I've double-checked I'm > booting into the correct kernel, but I do have more load on this > laptop host now, so maybe that made it happen more quickly... Oops,

Re: ppoll() stuck on POLLIN while TCP peer is sending

2013-01-09 Thread Eric Wong
Eric Wong wrote: > Eric Dumazet wrote: > > On Tue, 2013-01-08 at 18:32 -0800, Eric Dumazet wrote: > > > Hmm, it seems sk_filter() can return -ENOMEM because skb has the > > > pfmemalloc() set. > > > > > > > > One TCP socket keeps retransmitting an SKB via loopback, and TCP stack > > > drops th

Re: ppoll() stuck on POLLIN while TCP peer is sending

2013-01-08 Thread Eric Wong
Eric Dumazet wrote: > On Tue, 2013-01-08 at 18:32 -0800, Eric Dumazet wrote: > > Hmm, it seems sk_filter() can return -ENOMEM because skb has the > > pfmemalloc() set. > > > > > One TCP socket keeps retransmitting an SKB via loopback, and TCP stack > > drops the packet again and again. > > soc

Re: ppoll() stuck on POLLIN while TCP peer is sending

2013-01-08 Thread Eric Dumazet
On Tue, 2013-01-08 at 18:32 -0800, Eric Dumazet wrote: > > Hmm, it seems sk_filter() can return -ENOMEM because skb has the > pfmemalloc() set. > > One TCP socket keeps retransmitting an SKB via loopback, and TCP stack > drops the packet again and again. sock_init_data() sets sk->sk_allocatio

Re: ppoll() stuck on POLLIN while TCP peer is sending

2013-01-08 Thread Eric Dumazet
On Tue, 2013-01-08 at 18:14 -0800, Eric Dumazet wrote: > On Tue, 2013-01-08 at 23:23 +0000, Eric Wong wrote: > > Mel Gorman wrote: > > > Please try the following patch. However, even if it works the benefit of > > > capture may be so marginal that partially reverting it and simplifying > > > compa

Re: ppoll() stuck on POLLIN while TCP peer is sending

2013-01-08 Thread Eric Dumazet
On Tue, 2013-01-08 at 23:23 +0000, Eric Wong wrote: > Mel Gorman wrote: > > Please try the following patch. However, even if it works the benefit of > > capture may be so marginal that partially reverting it and simplifying > > compaction.c is the better decision. > > I already got my VM stuck on

Re: ppoll() stuck on POLLIN while TCP peer is sending

2013-01-08 Thread Eric Wong
Mel Gorman wrote: > Please try the following patch. However, even if it works the benefit of > capture may be so marginal that partially reverting it and simplifying > compaction.c is the better decision. I already got my VM stuck on this one. I had two toosleepy instances, 2774 was the one that

Re: ppoll() stuck on POLLIN while TCP peer is sending

2013-01-08 Thread Mel Gorman
On Mon, Jan 07, 2013 at 10:38:50PM +0000, Eric Wong wrote: > Mel Gorman wrote: > > Right now it's difficult to see how the capture could be the source of > > this bug but I'm not ruling it out either so try the following (untested > > but should be ok) patch. It's not a proper revert, it just dis

Re: ppoll() stuck on POLLIN while TCP peer is sending

2013-01-08 Thread Eric Wong
Eric Wong wrote: > Mel Gorman wrote: > > Right now it's difficult to see how the capture could be the source of > > this bug but I'm not ruling it out either so try the following (untested > > but should be ok) patch. It's not a proper revert, it just disables the > > capture page logic to see i

Re: ppoll() stuck on POLLIN while TCP peer is sending

2013-01-07 Thread Eric Wong
Eric Dumazet wrote: > It would not surprise me if sk_stream_wait_memory() have plain bug(s) or > race(s). > > In 2010, in commit 482964e56e132 Nagendra Tomar fixed a pretty severe > long standing bug. > > This path is not taken very often on most machines. > > I would try the following patch :

Re: ppoll() stuck on POLLIN while TCP peer is sending

2013-01-07 Thread Eric Wong
Mel Gorman wrote: > Right now it's difficult to see how the capture could be the source of > this bug but I'm not ruling it out either so try the following (untested > but should be ok) patch. It's not a proper revert, it just disables the > capture page logic to see if it's at fault. Things loo

Re: ppoll() stuck on POLLIN while TCP peer is sending

2013-01-07 Thread Eric Dumazet
On Mon, 2013-01-07 at 12:25 +0000, Mel Gorman wrote: > > > ===> 28014[28017]/stack <=== > > [] release_sock+0xe5/0x11b > > [] sk_stream_wait_memory+0x1f7/0x1fc > > [] autoremove_wake_function+0x0/0x2a > > [] tcp_sendmsg+0x710/0x86d > > [] sock_sendmsg+0x7b/0x93 > > [] sys_sendto+0xee/0x145 > > []

Re: ppoll() stuck on POLLIN while TCP peer is sending

2013-01-07 Thread Mel Gorman
On Sun, Jan 06, 2013 at 12:07:00PM +0000, Eric Wong wrote: > Mel Gorman wrote: > > Using a 3.7.1 or 3.8-rc2 kernel, can you reproduce the problem and then > > answer the following questions please? > > This is on my main machine running 3.8-rc2 > > > 1. What are the contents of /proc/vmstat at t

Re: ppoll() stuck on POLLIN while TCP peer is sending

2013-01-06 Thread Eric Wong
Mel Gorman wrote: > Using a 3.7.1 or 3.8-rc2 kernel, can you reproduce the problem and then > answer the following questions please? This is on my main machine running 3.8-rc2 > 1. What are the contents of /proc/vmstat at the time it is stuck? ===> /proc/vmstat <=== nr_free_pages 40305 nr_inact

Re: ppoll() stuck on POLLIN while TCP peer is sending

2013-01-04 Thread Eric Wong
Mel Gorman wrote: > On Wed, Jan 02, 2013 at 08:08:48PM +0000, Eric Wong wrote: > > Instead, I disabled THP+compaction under v3.7.1 and I've been unable to > > reproduce the issue without THP+compaction. > > > > Implying that it's stuck in compaction somewhere. It could be the case > that compact

Re: ppoll() stuck on POLLIN while TCP peer is sending

2013-01-04 Thread Eric Wong
Mel Gorman wrote: > On Wed, Jan 02, 2013 at 08:08:48PM +0000, Eric Wong wrote: > > Instead, I disabled THP+compaction under v3.7.1 and I've been unable to > > reproduce the issue without THP+compaction. > > > > Implying that it's stuck in compaction somewhere. It could be the case > that compact

Re: ppoll() stuck on POLLIN while TCP peer is sending

2013-01-04 Thread Eric Dumazet
On Fri, 2013-01-04 at 16:01 +0000, Mel Gorman wrote: > Implying that it's stuck in compaction somewhere. It could be the case > that compaction alters timing enough to trigger another bug. You say it > tests differently depending on whether TCP or unix sockets are used > which might indicate multi

Re: ppoll() stuck on POLLIN while TCP peer is sending

2013-01-04 Thread Mel Gorman
On Wed, Jan 02, 2013 at 08:08:48PM +0000, Eric Wong wrote: > (changing Cc:) > > Eric Wong wrote: > > I'm finding ppoll() unexpectedly stuck when waiting for POLLIN on a > > local TCP socket. The isolated code below can reproduces the issue > > after many minutes (<1 hour). It might be easier to

Re: ppoll() stuck on POLLIN while TCP peer is sending

2013-01-03 Thread Eric Wong
Eric Wong wrote: > Eric Wong wrote: > > I think this requires frequent dirtying/cycling of pages to reproduce. > > (from copying large files around) to interact with compaction. > > I'll see if I can reproduce the issue with read-only FS activity. > > Still successfully running the read-only tes

Re: ppoll() stuck on POLLIN while TCP peer is sending

2013-01-03 Thread Eric Wong
Eric Wong wrote: > I think this requires frequent dirtying/cycling of pages to reproduce. > (from copying large files around) to interact with compaction. > I'll see if I can reproduce the issue with read-only FS activity. Still successfully running the read-only test on my main machine, will pro

Re: ppoll() stuck on POLLIN while TCP peer is sending

2013-01-03 Thread Eric Wong
Eric Wong wrote: > Eric Dumazet wrote: > > With the following patch, I cant reproduce the 'apparent stuck' > > Right, the output is just an approximation and the logic there > was bogus. > > Thanks for looking at this. I'm still able to reproduce the issue under v3.8-rc2 with your patch for to

Re: ppoll() stuck on POLLIN while TCP peer is sending

2013-01-03 Thread Eric Wong
Eric Dumazet wrote: > On Wed, 2013-01-02 at 20:47 +0000, Eric Wong wrote: > > Eric Wong wrote: > > > [1] my full setup is very strange. > > > > > > Other than the FUSE component I forgot to mention, little depends on > > > the kernel. With all this, the standalone toosleepy can get stuc

Re: ppoll() stuck on POLLIN while TCP peer is sending

2013-01-03 Thread Eric Dumazet
On Wed, 2013-01-02 at 20:47 +0000, Eric Wong wrote: > Eric Wong wrote: > > [1] my full setup is very strange. > > > > Other than the FUSE component I forgot to mention, little depends on > > the kernel. With all this, the standalone toosleepy can get stuck. > > I'll try to reproduce

Re: ppoll() stuck on POLLIN while TCP peer is sending

2013-01-02 Thread Eric Wong
Eric Wong wrote: > [1] my full setup is very strange. > > Other than the FUSE component I forgot to mention, little depends on > the kernel. With all this, the standalone toosleepy can get stuck. > I'll try to reproduce it with less... I just confirmed my toosleepy processes will ge

Re: ppoll() stuck on POLLIN while TCP peer is sending

2013-01-02 Thread Eric Wong
(changing Cc:) Eric Wong wrote: > I'm finding ppoll() unexpectedly stuck when waiting for POLLIN on a > local TCP socket. The isolated code below can reproduces the issue > after many minutes (<1 hour). It might be easier to reproduce on > a busy system while disk I/O is happening. s/might be/

Re: ppoll() stuck on POLLIN while TCP peer is sending

2012-12-29 Thread Eric Wong
Eric Wong wrote: > Eric Wong wrote: > > I'm finding ppoll() unexpectedly stuck when waiting for POLLIN on a > > local TCP socket. The isolated code below can reproduces the issue > > after many minutes (<1 hour). It might be easier to reproduce on > > a busy system while disk I/O is happening.

Re: ppoll() stuck on POLLIN while TCP peer is sending

2012-12-27 Thread Eric Wong
Eric Wong wrote: > I'm finding ppoll() unexpectedly stuck when waiting for POLLIN on a > local TCP socket. The isolated code below can reproduces the issue > after many minutes (<1 hour). It might be easier to reproduce on > a busy system while disk I/O is happening. Ugh, I can't seem to reprod
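
The reproducer mentioned in the thread is not included in this listing. As a rough illustration only, the scenario being described (a reader waiting for POLLIN on a local TCP socket while its peer sends) might look like the sketch below. This is not Eric Wong's toosleepy program: it uses Python's select.poll() rather than ppoll(), and the helper names (make_loopback_pair, wait_readable, demo) are hypothetical. On a healthy kernel the wait should return promptly; the bug discussed above is the case where it does not.

```python
import select
import socket
import threading


def make_loopback_pair():
    """Return a connected (client, server) pair of TCP sockets on 127.0.0.1."""
    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    listener.bind(("127.0.0.1", 0))  # port 0: let the kernel pick a free port
    listener.listen(1)
    client = socket.create_connection(listener.getsockname())
    server, _addr = listener.accept()
    listener.close()
    return client, server


def wait_readable(sock, timeout_ms):
    """Wait for POLLIN on sock; return True if it became readable in time."""
    poller = select.poll()
    poller.register(sock, select.POLLIN)
    events = poller.poll(timeout_ms)
    return any(ev & select.POLLIN for _fd, ev in events)


def demo(payload=b"x" * 4096, timeout_ms=5000):
    """Send payload from one end while the other end polls for POLLIN."""
    client, server = make_loopback_pair()
    try:
        sender = threading.Thread(target=client.sendall, args=(payload,))
        sender.start()
        ready = wait_readable(server, timeout_ms)
        sender.join()
        received = b""
        # Only read if poll reported data, so a stuck wait cannot block here.
        while ready and len(received) < len(payload):
            chunk = server.recv(len(payload) - len(received))
            if not chunk:
                break
            received += chunk
        return ready, len(received)
    finally:
        client.close()
        server.close()


if __name__ == "__main__":
    print(demo())
```

The sketch uses a finite timeout so that a stuck wait shows up as a False result instead of a hang; a long-running stress test would loop over demo() while the system is under disk I/O and memory pressure, as the thread suggests.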