Re: Excessive stall times on ext4 in 3.9-rc2

2013-04-25 Thread Mel Gorman
On Wed, Apr 24, 2013 at 03:09:13PM -0400, Jeff Moyer wrote: > Mel Gorman writes: > > >> I'll also note that even though your I/O is going all over the place > >> (D2C is pretty bad, 14ms), most of the time is spent waiting for a > >> struct request allocation or between Queue and Merge: > >> > >

Re: Excessive stall times on ext4 in 3.9-rc2

2013-04-24 Thread Jeff Moyer
Mel Gorman writes: >> I'll also note that even though your I/O is going all over the place >> (D2C is pretty bad, 14ms), most of the time is spent waiting for a >> struct request allocation or between Queue and Merge: >> >> All Devices >> >>

Re: Excessive stall times on ext4 in 3.9-rc2

2013-04-23 Thread Mel Gorman
On Tue, Apr 23, 2013 at 11:50:19AM -0400, Theodore Ts'o wrote: > On Tue, Apr 23, 2013 at 04:33:05PM +0100, Mel Gorman wrote: > > That's a pretty big drop but it gets bad again for the second worst stall -- > > wait_on_page_bit as a result of generic_file_buffered_write. > > > > Vanilla kernel 133

Re: Excessive stall times on ext4 in 3.9-rc2

2013-04-23 Thread Theodore Ts'o
On Tue, Apr 23, 2013 at 04:33:05PM +0100, Mel Gorman wrote: > That's a pretty big drop but it gets bad again for the second worst stall -- > wait_on_page_bit as a result of generic_file_buffered_write. > > Vanilla kernel 1336064 ms stalled with 109 events > Patched kernel 2338781 ms stalled with

Re: Excessive stall times on ext4 in 3.9-rc2

2013-04-23 Thread Mel Gorman
On Sat, Apr 20, 2013 at 08:05:22PM -0400, Theodore Ts'o wrote: > An alternate solution which I've been playing around adds buffer_head > flags so we can indicate that a buffer contains metadata and/or should > have I/O submitted with the REQ_PRIO flag set. > I beefed up the reporting slightly and

Re: Excessive stall times on ext4 in 3.9-rc2

2013-04-23 Thread Mel Gorman
On Mon, Apr 22, 2013 at 06:42:23PM -0400, Jeff Moyer wrote: > > 3. The blktrace indicates that reads can starve writes from flusher > > > > While there are people that can look at a blktrace and find problems > > like they are rain man, I'm more like an ADHD squirrel when looking at > > a

Re: Excessive stall times on ext4 in 3.9-rc2

2013-04-23 Thread Jan Kara
On Mon 22-04-13 18:42:23, Jeff Moyer wrote: > Jan, if I were to come up with a way of promoting a particular async > queue to the front of the line, where would I put such a call in the > ext4/jbd2 code to be effective? As Ted wrote, the simplest might be to put this directly in __lock_buffer(). So

Re: Excessive stall times on ext4 in 3.9-rc2

2013-04-22 Thread Theodore Ts'o
On Mon, Apr 22, 2013 at 06:42:23PM -0400, Jeff Moyer wrote: > > Jan, if I were to come up with a way of promoting a particular async > queue to the front of the line, where would I put such a call in the > ext4/jbd2 code to be effective? Well, I thought we had discussed trying to bump a pending I

Re: Excessive stall times on ext4 in 3.9-rc2

2013-04-22 Thread Jeff Moyer
Mel Gorman writes: > (Adding Jeff Moyer to the cc as I'm told he is interested in the blktrace) Thanks. I've got a few comments and corrections for you below. > TLDR: Flusher writes pages very quickly after processes dirty a buffer. Reads > starve flusher writes. [snip] > 3. The blktrace indi

Re: Excessive stall times on ext4 in 3.9-rc2

2013-04-22 Thread Mel Gorman
(Adding Jeff Moyer to the cc as I'm told he is interested in the blktrace) On Fri, Apr 12, 2013 at 11:19:52AM -0400, Theodore Ts'o wrote: > On Fri, Apr 12, 2013 at 02:50:42PM +1000, Dave Chinner wrote: > > > If that is the case, one possible solution that comes to mind would be > > > to mark buffe

Re: Excessive stall times on ext4 in 3.9-rc2

2013-04-20 Thread Theodore Ts'o
As an update to this thread, we brought up this issue at LSF/MM, and there is a thought that we should be able to solve this problem by having lock_buffer() check to see if the buffer is locked due to a write being queued, to have the priority of the write bumped up in the write queues to resolve t

Re: Excessive stall times on ext4 in 3.9-rc2

2013-04-12 Thread Dave Chinner
On Fri, Apr 12, 2013 at 11:19:52AM -0400, Theodore Ts'o wrote: > On Fri, Apr 12, 2013 at 02:50:42PM +1000, Dave Chinner wrote: > > > If that is the case, one possible solution that comes to mind would be > > > to mark buffer_heads that contain metadata with a flag, so that the > > > flusher thread

Re: Excessive stall times on ext4 in 3.9-rc2

2013-04-12 Thread Theodore Ts'o
On Fri, Apr 12, 2013 at 02:50:42PM +1000, Dave Chinner wrote: > > If that is the case, one possible solution that comes to mind would be > > to mark buffer_heads that contain metadata with a flag, so that the > > flusher thread can write them back at the same priority as reads. > > Ext4 is already

Re: Excessive stall times on ext4 in 3.9-rc2

2013-04-12 Thread Tvrtko Ursulin
Hi all, On Thursday 11 April 2013 22:57:08 Theodore Ts'o wrote: > That's an interesting theory. If the workload is one which is very > heavy on reads and writes, that could explain the high latency. That > would explain why those of us who are using primarily SSD's are seeing > the problems, be

Re: Excessive stall times on ext4 in 3.9-rc2

2013-04-12 Thread Mel Gorman
On Thu, Apr 11, 2013 at 10:57:08PM -0400, Theodore Ts'o wrote: > On Thu, Apr 11, 2013 at 11:33:35PM +0200, Jan Kara wrote: > > I think it might be more enlightening if Mel traced which process in > > which function is holding the buffer lock. I suspect we'll find out that > > the flusher thread h

Re: Excessive stall times on ext4 in 3.9-rc2

2013-04-12 Thread Mel Gorman
On Thu, Apr 11, 2013 at 02:35:12PM -0400, Theodore Ts'o wrote: > On Thu, Apr 11, 2013 at 06:04:02PM +0100, Mel Gorman wrote: > > > If we're stalling on lock_buffer(), that implies that buffer was being > > > written, and for some reason it was taking a very long time to > > > complete. > > > > >

Re: Excessive stall times on ext4 in 3.9-rc2

2013-04-11 Thread Dave Chinner
On Thu, Apr 11, 2013 at 10:57:08PM -0400, Theodore Ts'o wrote: > On Thu, Apr 11, 2013 at 11:33:35PM +0200, Jan Kara wrote: > > I think it might be more enlightening if Mel traced which process in > > which function is holding the buffer lock. I suspect we'll find out that > > the flusher thread h

Re: Excessive stall times on ext4 in 3.9-rc2

2013-04-11 Thread Theodore Ts'o
On Thu, Apr 11, 2013 at 11:33:35PM +0200, Jan Kara wrote: > I think it might be more enlightening if Mel traced which process in > which function is holding the buffer lock. I suspect we'll find out that > the flusher thread has submitted the buffer for IO as an async write and > thus it takes a

Re: Excessive stall times on ext4 in 3.9-rc2

2013-04-11 Thread Jan Kara
On Thu 11-04-13 14:35:12, Ted Tso wrote: > On Thu, Apr 11, 2013 at 06:04:02PM +0100, Mel Gorman wrote: > > > If we're stalling on lock_buffer(), that implies that buffer was being > > > written, and for some reason it was taking a very long time to > > > complete. > > > > > > > Yes. > > > > > It

Re: Excessive stall times on ext4 in 3.9-rc2

2013-04-11 Thread Theodore Ts'o
On Thu, Apr 11, 2013 at 06:04:02PM +0100, Mel Gorman wrote: > > If we're stalling on lock_buffer(), that implies that buffer was being > > written, and for some reason it was taking a very long time to > > complete. > > > > Yes. > > > It might be worthwhile to put a timestamp in struct dm_crypt_

Re: Excessive stall times on ext4 in 3.9-rc2

2013-04-11 Thread Mel Gorman
On Wed, Apr 10, 2013 at 09:12:45AM -0400, Theodore Ts'o wrote: > On Wed, Apr 10, 2013 at 11:56:08AM +0100, Mel Gorman wrote: > > During major activity there is likely to be "good" behaviour > > with stalls roughly every 30 seconds roughly corresponding to > > dirty_expire_centiseconds. As you'd exp

Re: Excessive stall times on ext4 in 3.9-rc2

2013-04-10 Thread Theodore Ts'o
On Wed, Apr 10, 2013 at 11:56:08AM +0100, Mel Gorman wrote: > During major activity there is likely to be "good" behaviour > with stalls roughly every 30 seconds roughly corresponding to > dirty_expire_centiseconds. As you'd expect, the flusher thread is stuck > when this happens. > > 237 ?

Re: Excessive stall times on ext4 in 3.9-rc2

2013-04-10 Thread Mel Gorman
On Tue, Apr 02, 2013 at 11:06:51AM -0400, Theodore Ts'o wrote: > On Tue, Apr 02, 2013 at 03:27:17PM +0100, Mel Gorman wrote: > > I'm testing a page-reclaim-related series on my laptop that is partially > > aimed at fixing long stalls when doing metadata-intensive operations on > > low memory such a

Re: Excessive stall times on ext4 in 3.9-rc2

2013-04-08 Thread Theodore Ts'o
On Sun, Apr 07, 2013 at 05:59:06PM -0400, Frank Ch. Eigler wrote: > > semantic error: while resolving probe point: identifier 'kprobe' at > > /tmp/stapdjN4_l:18:7 > > source: probe kprobe.function("get_request_wait") > > ^ > > Pass 2: analysis failed. [man error::pas

Re: Excessive stall times on ext4 in 3.9-rc2

2013-04-08 Thread Frank Ch. Eigler
Hi, Mel - > > [...] git kernel developers > > should use git systemtap, as has always been the case. [...] > > At one point in the past this used to be the case but then systemtap had to > be compiled as part of automated tests across different kernel versions. It > could have been worked aroun

Re: Excessive stall times on ext4 in 3.9-rc2

2013-04-08 Thread Mel Gorman
On Sun, Apr 07, 2013 at 05:59:06PM -0400, Frank Ch. Eigler wrote: > > Hi - > > > tytso wrote: > > > So I tried to reproduce the problem, and so I installed systemtap > > (bleeding edge, since otherwise it won't work with development > > kernel), and then rebuilt a kernel with all of the necessa

Re: Excessive stall times on ext4 in 3.9-rc2

2013-04-07 Thread Frank Ch. Eigler
Hi - tytso wrote: > So I tried to reproduce the problem, and so I installed systemtap > (bleeding edge, since otherwise it won't work with development > kernel), and then rebuilt a kernel with all of the necessary CONFIG > options enabled: > > CONFIG_DEBUG_INFO, CONFIG_KPROBES, CONFIG_REL

Re: Excessive stall times on ext4 in 3.9-rc2

2013-04-06 Thread Theodore Ts'o
On Sat, Apr 06, 2013 at 09:29:48AM +0200, Jiri Slaby wrote: > > I'm not sure, as I am using -next like for ever. But sure, there was a > kernel which didn't have this problem. Any chance you could try rolling back to 3.2 or 3.5 to see if you can get a starting point? Even a high-level bisection

Re: Excessive stall times on ext4 in 3.9-rc2

2013-04-06 Thread Jiri Slaby
On 04/06/2013 09:37 AM, Jiri Slaby wrote: > On 04/06/2013 09:29 AM, Jiri Slaby wrote: >> On 04/06/2013 01:16 AM, Theodore Ts'o wrote: >>> On Sat, Apr 06, 2013 at 12:18:11AM +0200, Jiri Slaby wrote: Ok, so now I'm running 3.9.0-rc5-next-20130404, it's not that bad, but it still sucks. Upd

Re: Excessive stall times on ext4 in 3.9-rc2

2013-04-06 Thread Jiri Slaby
On 04/06/2013 09:29 AM, Jiri Slaby wrote: > On 04/06/2013 01:16 AM, Theodore Ts'o wrote: >> On Sat, Apr 06, 2013 at 12:18:11AM +0200, Jiri Slaby wrote: >>> Ok, so now I'm running 3.9.0-rc5-next-20130404, it's not that bad, but >>> it still sucks. Updating a kernel in a VM still results in "Your sy

Re: Excessive stall times on ext4 in 3.9-rc2

2013-04-06 Thread Jiri Slaby
On 04/06/2013 01:16 AM, Theodore Ts'o wrote: > On Sat, Apr 06, 2013 at 12:18:11AM +0200, Jiri Slaby wrote: >> Ok, so now I'm running 3.9.0-rc5-next-20130404, it's not that bad, but >> it still sucks. Updating a kernel in a VM still results in "Your system >> is too SLOW to play this!" by mplayer a

Re: Excessive stall times on ext4 in 3.9-rc2

2013-04-05 Thread Theodore Ts'o
On Sat, Apr 06, 2013 at 12:18:11AM +0200, Jiri Slaby wrote: > Ok, so now I'm running 3.9.0-rc5-next-20130404, it's not that bad, but > it still sucks. Updating a kernel in a VM still results in "Your system > is too SLOW to play this!" by mplayer and frame dropping. What was the first kernel wher

Re: Excessive stall times on ext4 in 3.9-rc2

2013-04-05 Thread Jiri Slaby
On 04/03/2013 12:19 PM, Mel Gorman wrote: > On Tue, Apr 02, 2013 at 11:14:36AM -0400, Theodore Ts'o wrote: >> On Tue, Apr 02, 2013 at 11:06:51AM -0400, Theodore Ts'o wrote: >>> >>> Can you try 3.9-rc4 or later and see if the problem still persists? >>> There were a number of ext4 issues especially

Re: Excessive stall times on ext4 in 3.9-rc2

2013-04-03 Thread Mel Gorman
On Tue, Apr 02, 2013 at 07:16:13PM -0400, Theodore Ts'o wrote: > I've tried doing some quick timing, and if it is a performance > regression, it's not a recent one --- or I haven't been able to > reproduce what Mel is seeing. I tried the following commands while > booted into 3.2, 3.8, and 3.9-rc3

Re: Excessive stall times on ext4 in 3.9-rc2

2013-04-03 Thread Mel Gorman
On Wed, Apr 03, 2013 at 08:05:30AM -0400, Theodore Ts'o wrote: > On Wed, Apr 03, 2013 at 11:19:25AM +0100, Mel Gorman wrote: > > > > I'm running with -rc5 now. I have not noticed much interactivity problems > > as such but the stall detection script reported that mutt stalled for > > 20 seconds op

Re: Excessive stall times on ext4 in 3.9-rc2

2013-04-03 Thread Theodore Ts'o
On Wed, Apr 03, 2013 at 11:19:25AM +0100, Mel Gorman wrote: > > I'm running with -rc5 now. I have not noticed much interactivity problems > as such but the stall detection script reported that mutt stalled for > 20 seconds opening an inbox and imapd blocked for 59 seconds doing path > lookups, ima

Re: Excessive stall times on ext4 in 3.9-rc2

2013-04-03 Thread Mel Gorman
On Tue, Apr 02, 2013 at 11:14:36AM -0400, Theodore Ts'o wrote: > On Tue, Apr 02, 2013 at 11:06:51AM -0400, Theodore Ts'o wrote: > > > > Can you try 3.9-rc4 or later and see if the problem still persists? > > There were a number of ext4 issues especially around low memory > > performance which were

Re: Excessive stall times on ext4 in 3.9-rc2

2013-04-02 Thread Theodore Ts'o
I've tried doing some quick timing, and if it is a performance regression, it's not a recent one --- or I haven't been able to reproduce what Mel is seeing. I tried the following commands while booted into 3.2, 3.8, and 3.9-rc3 kernels: time git clone ... rm .git/index ; time git reset I did thi

Re: Excessive stall times on ext4 in 3.9-rc2

2013-04-02 Thread Theodore Ts'o
So I tried to reproduce the problem, and so I installed systemtap (bleeding edge, since otherwise it won't work with development kernel), and then rebuilt a kernel with all of the necessary CONFIG options enabled: CONFIG_DEBUG_INFO, CONFIG_KPROBES, CONFIG_RELAY, CONFIG_DEBUG_FS, CO

Re: Excessive stall times on ext4 in 3.9-rc2

2013-04-02 Thread Mel Gorman
On Tue, Apr 02, 2013 at 11:03:36PM +0800, Zheng Liu wrote: > Hi Mel, > > Thanks for reporting it. > > On 04/02/2013 10:27 PM, Mel Gorman wrote: > > I'm testing a page-reclaim-related series on my laptop that is partially > > aimed at fixing long stalls when doing metadata-intensive operations on

Re: Excessive stall times on ext4 in 3.9-rc2

2013-04-02 Thread Theodore Ts'o
On Tue, Apr 02, 2013 at 11:06:51AM -0400, Theodore Ts'o wrote: > > Can you try 3.9-rc4 or later and see if the problem still persists? > There were a number of ext4 issues especially around low memory > performance which weren't resolved until -rc4. Actually, sorry, I took a closer look and I'm n

Re: Excessive stall times on ext4 in 3.9-rc2

2013-04-02 Thread Theodore Ts'o
On Tue, Apr 02, 2013 at 03:27:17PM +0100, Mel Gorman wrote: > I'm testing a page-reclaim-related series on my laptop that is partially > aimed at fixing long stalls when doing metadata-intensive operations on > low memory such as a git checkout. I've been running 3.9-rc2 with the > series applied b

Re: Excessive stall times on ext4 in 3.9-rc2

2013-04-02 Thread Zheng Liu
Hi Mel, Thanks for reporting it. On 04/02/2013 10:27 PM, Mel Gorman wrote: > I'm testing a page-reclaim-related series on my laptop that is partially > aimed at fixing long stalls when doing metadata-intensive operations on > low memory such as a git checkout. I've been running 3.9-rc2 with the >

Re: Excessive stall times on ext4 in 3.9-rc2

2013-04-02 Thread Jiri Slaby
On 04/02/2013 04:27 PM, Mel Gorman wrote: > I'm testing a page-reclaim-related series on my laptop that is partially > aimed at fixing long stalls when doing metadata-intensive operations on > low memory such as a git checkout. I've been running 3.9-rc2 with the > series applied but found that the