On Wed, Apr 24, 2013 at 03:09:13PM -0400, Jeff Moyer wrote:
> Mel Gorman writes:
>
> >> I'll also note that even though your I/O is going all over the place
> >> (D2C is pretty bad, 14ms), most of the time is spent waiting for a
> >> struct request allocation or between Queue and Merge:
> >>
> >
Mel Gorman writes:
>> I'll also note that even though your I/O is going all over the place
>> (D2C is pretty bad, 14ms), most of the time is spent waiting for a
>> struct request allocation or between Queue and Merge:
>>
>> All Devices
>>
>>
On Tue, Apr 23, 2013 at 11:50:19AM -0400, Theodore Ts'o wrote:
> On Tue, Apr 23, 2013 at 04:33:05PM +0100, Mel Gorman wrote:
> > That's a pretty big drop but it gets bad again for the second worst stall --
> > wait_on_page_bit as a result of generic_file_buffered_write.
> >
> > Vanilla kernel 133
On Tue, Apr 23, 2013 at 04:33:05PM +0100, Mel Gorman wrote:
> That's a pretty big drop but it gets bad again for the second worst stall --
> wait_on_page_bit as a result of generic_file_buffered_write.
>
> Vanilla kernel 1336064 ms stalled with 109 events
> Patched kernel 2338781 ms stalled with
On Sat, Apr 20, 2013 at 08:05:22PM -0400, Theodore Ts'o wrote:
> An alternate solution which I've been playing around adds buffer_head
> flags so we can indicate that a buffer contains metadata and/or should
> have I/O submitted with the REQ_PRIO flag set.
>
I beefed up the reporting slightly and
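A minimal sketch of the flag idea Ted describes above, assuming the
illustrative names BH_Meta/BH_Prio and a hypothetical submit helper;
this is not the actual patch:

/* Sketch only: mark metadata buffers so writeback can tag their I/O
 * with REQ_META/REQ_PRIO.  BH_Meta, BH_Prio and submit_meta_bh() are
 * illustrative names, not a committed interface. */
#include <linux/buffer_head.h>
#include <linux/blk_types.h>

#define BH_Meta		BH_PrivateStart		/* buffer holds metadata */
#define BH_Prio		(BH_PrivateStart + 1)	/* submit with REQ_PRIO */

BUFFER_FNS(Meta, meta)	/* generates set_buffer_meta()/buffer_meta() */
BUFFER_FNS(Prio, prio)	/* generates set_buffer_prio()/buffer_prio() */

static int submit_meta_bh(int rw, struct buffer_head *bh)
{
	if (buffer_meta(bh))
		rw |= REQ_META;
	if (buffer_prio(bh))
		rw |= REQ_PRIO;
	return submit_bh(rw, bh);
}

ext4/jbd2 would then call set_buffer_meta()/set_buffer_prio() on journal
and metadata buffers before submission, so the I/O scheduler can service
those requests ahead of plain async writes.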
On Mon, Apr 22, 2013 at 06:42:23PM -0400, Jeff Moyer wrote:
> > 3. The blktrace indicates that reads can starve writes from flusher
> >
> >While there are people that can look at a blktrace and find problems
> >like they are rain man, I'm more like an ADHD squirrel when looking at
> >a
On Mon 22-04-13 18:42:23, Jeff Moyer wrote:
> Jan, if I were to come up with a way of promoting a particular async
> queue to the front of the line, where would I put such a call in the
> ext4/jbd2 code to be effective?
As Ted wrote, the simplest might be to put this directly in
__lock_buffer(). So
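A rough sketch of that placement against 3.9-era fs/buffer.c;
blk_promote_bh_write() is a hypothetical hook, no such block-layer call
exists today:

/* Sketch only: before sleeping on a locked buffer, ask the block layer
 * to move any queued async write of this buffer to the head of the
 * queue.  blk_promote_bh_write() is hypothetical. */
#include <linux/buffer_head.h>
#include <linux/sched.h>

static int sleep_on_buffer(void *word)
{
	io_schedule();
	return 0;
}

void __lock_buffer(struct buffer_head *bh)
{
	if (buffer_locked(bh))
		blk_promote_bh_write(bh);	/* hypothetical hook */
	wait_on_bit_lock(&bh->b_state, BH_Lock, sleep_on_buffer,
			 TASK_UNINTERRUPTIBLE);
}

This placement would also serve the LSF/MM idea discussed elsewhere in
the thread, since every waiter on a buffer under async write ends up in
__lock_buffer().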
On Mon, Apr 22, 2013 at 06:42:23PM -0400, Jeff Moyer wrote:
>
> Jan, if I were to come up with a way of promoting a particular async
> queue to the front of the line, where would I put such a call in the
> ext4/jbd2 code to be effective?
Well, I thought we had discussed trying to bump a pending I
Mel Gorman writes:
> (Adding Jeff Moyer to the cc as I'm told he is interested in the blktrace)
Thanks. I've got a few comments and corrections for you below.
> TLDR: Flusher writes pages very quickly after processes dirty a buffer. Reads
> starve flusher writes.
[snip]
> 3. The blktrace indi
(Adding Jeff Moyer to the cc as I'm told he is interested in the blktrace)
On Fri, Apr 12, 2013 at 11:19:52AM -0400, Theodore Ts'o wrote:
> On Fri, Apr 12, 2013 at 02:50:42PM +1000, Dave Chinner wrote:
> > > If that is the case, one possible solution that comes to mind would be
> > > to mark buffe
As an update to this thread, we brought up this issue at LSF/MM, and
there is a thought that we should be able to solve this problem by
having lock_buffer() check whether the buffer is locked because a
write is queued, and if so have the priority of that write bumped up in the
write queues to resolve t
On Fri, Apr 12, 2013 at 11:19:52AM -0400, Theodore Ts'o wrote:
> On Fri, Apr 12, 2013 at 02:50:42PM +1000, Dave Chinner wrote:
> > > If that is the case, one possible solution that comes to mind would be
> > > to mark buffer_heads that contain metadata with a flag, so that the
> > > flusher thread
On Fri, Apr 12, 2013 at 02:50:42PM +1000, Dave Chinner wrote:
> > If that is the case, one possible solution that comes to mind would be
> > to mark buffer_heads that contain metadata with a flag, so that the
> > flusher thread can write them back at the same priority as reads.
>
> Ext4 is already
Hi all,
On Thursday 11 April 2013 22:57:08 Theodore Ts'o wrote:
> That's an interesting theory. If the workload is one which is very
> heavy on reads and writes, that could explain the high latency. That
> would explain why those of us who are using primarily SSD's are seeing
> the problems, be
On Thu, Apr 11, 2013 at 10:57:08PM -0400, Theodore Ts'o wrote:
> On Thu, Apr 11, 2013 at 11:33:35PM +0200, Jan Kara wrote:
> > I think it might be more enlightening if Mel traced which process in
> > which function is holding the buffer lock. I suspect we'll find out that
> > the flusher thread h
On Thu, Apr 11, 2013 at 02:35:12PM -0400, Theodore Ts'o wrote:
> On Thu, Apr 11, 2013 at 06:04:02PM +0100, Mel Gorman wrote:
> > > If we're stalling on lock_buffer(), that implies that buffer was being
> > > written, and for some reason it was taking a very long time to
> > > complete.
> > >
> >
On Thu, Apr 11, 2013 at 11:33:35PM +0200, Jan Kara wrote:
> I think it might be more enlightening if Mel traced which process in
> which function is holding the buffer lock. I suspect we'll find out that
> the flusher thread has submitted the buffer for IO as an async write and
> thus it takes a
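The distinction Jan is drawing, roughly: the flusher submits buffers
with plain WRITE, which CFQ treats as async and services behind sync
reads, while a process blocked on the buffer effectively wants
WRITE_SYNC semantics. An illustrative fragment, not a patch:

#include <linux/fs.h>
#include <linux/buffer_head.h>

/* Illustration: the rw flag chosen at submission decides whether CFQ
 * treats the write as sync or async; a later waiter in lock_buffer()
 * cannot upgrade an already-queued async request. */
static void submit_dirty_bh(struct buffer_head *bh, int someone_waiting)
{
	int rw = someone_waiting ? WRITE_SYNC : WRITE;

	lock_buffer(bh);
	bh->b_end_io = end_buffer_write_sync;
	get_bh(bh);		/* end_buffer_write_sync() drops this ref */
	submit_bh(rw, bh);
}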
On Thu 11-04-13 14:35:12, Ted Tso wrote:
> On Thu, Apr 11, 2013 at 06:04:02PM +0100, Mel Gorman wrote:
> > > If we're stalling on lock_buffer(), that implies that buffer was being
> > > written, and for some reason it was taking a very long time to
> > > complete.
> > >
> >
> > Yes.
> >
> > > It
On Thu, Apr 11, 2013 at 06:04:02PM +0100, Mel Gorman wrote:
> > If we're stalling on lock_buffer(), that implies that buffer was being
> > written, and for some reason it was taking a very long time to
> > complete.
> >
>
> Yes.
>
> > It might be worthwhile to put a timestamp in struct dm_crypt_
On Wed, Apr 10, 2013 at 09:12:45AM -0400, Theodore Ts'o wrote:
> On Wed, Apr 10, 2013 at 11:56:08AM +0100, Mel Gorman wrote:
> > During major activity there is likely to be "good" behaviour
> > with stalls roughly every 30 seconds roughly corresponding to
> > dirty_expire_centisecs. As you'd exp
On Wed, Apr 10, 2013 at 11:56:08AM +0100, Mel Gorman wrote:
> During major activity there is likely to be "good" behaviour
> with stalls roughly every 30 seconds roughly corresponding to
> dirty_expire_centisecs. As you'd expect, the flusher thread is stuck
> when this happens.
>
> 237 ?
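For anyone wanting to check that correlation locally, a trivial
userspace read of the tunable; the default of 3000 centisecs is the
~30 second period Mel observes:

#include <stdio.h>

/* Print vm.dirty_expire_centisecs; dirty data older than this is
 * written out by the flusher, so the default of 3000 (30s) matches
 * the observed stall period. */
int main(void)
{
	FILE *f = fopen("/proc/sys/vm/dirty_expire_centisecs", "r");
	unsigned long cs;

	if (!f)
		return 1;
	if (fscanf(f, "%lu", &cs) == 1)
		printf("dirty data expires after %lu.%02lus\n",
		       cs / 100, cs % 100);
	fclose(f);
	return 0;
}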
On Tue, Apr 02, 2013 at 11:06:51AM -0400, Theodore Ts'o wrote:
> On Tue, Apr 02, 2013 at 03:27:17PM +0100, Mel Gorman wrote:
> > I'm testing a page-reclaim-related series on my laptop that is partially
> > aimed at fixing long stalls when doing metadata-intensive operations on
> > low memory such a
On Sun, Apr 07, 2013 at 05:59:06PM -0400, Frank Ch. Eigler wrote:
> > semantic error: while resolving probe point: identifier 'kprobe' at
> > /tmp/stapdjN4_l:18:7
> > source: probe kprobe.function("get_request_wait")
> > ^
> > Pass 2: analysis failed. [man error::pas
Hi, Mel -
> > [...] git kernel developers
> > should use git systemtap, as has always been the case. [...]
>
> At one point in the past this used to be the case but then systemtap had to
> be compiled as part of automated tests across different kernel versions. It
> could have been worked aroun
On Sun, Apr 07, 2013 at 05:59:06PM -0400, Frank Ch. Eigler wrote:
>
> Hi -
>
>
> tytso wrote:
>
> > So I tried to reproduce the problem, and so I installed systemtap
> > (bleeding edge, since otherwise it won't work with development
> > kernel), and then rebuilt a kernel with all of the necessa
Hi -
tytso wrote:
> So I tried to reproduce the problem, and so I installed systemtap
> (bleeding edge, since otherwise it won't work with development
> kernel), and then rebuilt a kernel with all of the necessary CONFIG
> options enabled:
>
> CONFIG_DEBUG_INFO, CONFIG_KPROBES, CONFIG_REL
On Sat, Apr 06, 2013 at 09:29:48AM +0200, Jiri Slaby wrote:
>
> I'm not sure, as I am using -next like forever. But sure, there was a
> kernel which didn't have this problem.
Any chance you could try rolling back to 3.2 or 3.5 to see if you can
get a starting point? Even a high-level bisection
On 04/06/2013 09:37 AM, Jiri Slaby wrote:
> On 04/06/2013 09:29 AM, Jiri Slaby wrote:
>> On 04/06/2013 01:16 AM, Theodore Ts'o wrote:
>>> On Sat, Apr 06, 2013 at 12:18:11AM +0200, Jiri Slaby wrote:
Ok, so now I'm running 3.9.0-rc5-next-20130404, it's not that bad, but
it still sucks. Upd
On 04/06/2013 09:29 AM, Jiri Slaby wrote:
> On 04/06/2013 01:16 AM, Theodore Ts'o wrote:
>> On Sat, Apr 06, 2013 at 12:18:11AM +0200, Jiri Slaby wrote:
>>> Ok, so now I'm running 3.9.0-rc5-next-20130404, it's not that bad, but
>>> it still sucks. Updating a kernel in a VM still results in "Your sy
On 04/06/2013 01:16 AM, Theodore Ts'o wrote:
> On Sat, Apr 06, 2013 at 12:18:11AM +0200, Jiri Slaby wrote:
>> Ok, so now I'm running 3.9.0-rc5-next-20130404, it's not that bad, but
>> it still sucks. Updating a kernel in a VM still results in "Your system
>> is too SLOW to play this!" by mplayer a
On Sat, Apr 06, 2013 at 12:18:11AM +0200, Jiri Slaby wrote:
> Ok, so now I'm running 3.9.0-rc5-next-20130404, it's not that bad, but
> it still sucks. Updating a kernel in a VM still results in "Your system
> is too SLOW to play this!" by mplayer and frame dropping.
What was the first kernel wher
On 04/03/2013 12:19 PM, Mel Gorman wrote:
> On Tue, Apr 02, 2013 at 11:14:36AM -0400, Theodore Ts'o wrote:
>> On Tue, Apr 02, 2013 at 11:06:51AM -0400, Theodore Ts'o wrote:
>>>
>>> Can you try 3.9-rc4 or later and see if the problem still persists?
>>> There were a number of ext4 issues especially
On Tue, Apr 02, 2013 at 07:16:13PM -0400, Theodore Ts'o wrote:
> I've tried doing some quick timing, and if it is a performance
> regression, it's not a recent one --- or I haven't been able to
> reproduce what Mel is seeing. I tried the following commands while
> booted into 3.2, 3.8, and 3.9-rc3
On Wed, Apr 03, 2013 at 08:05:30AM -0400, Theodore Ts'o wrote:
> On Wed, Apr 03, 2013 at 11:19:25AM +0100, Mel Gorman wrote:
> >
> > I'm running with -rc5 now. I have not noticed much interactivity problems
> > as such but the stall detection script reported that mutt stalled for
> > 20 seconds op
On Wed, Apr 03, 2013 at 11:19:25AM +0100, Mel Gorman wrote:
>
> I'm running with -rc5 now. I have not noticed much interactivity problems
> as such but the stall detection script reported that mutt stalled for
> 20 seconds opening an inbox and imapd blocked for 59 seconds doing path
> lookups, ima
On Tue, Apr 02, 2013 at 11:14:36AM -0400, Theodore Ts'o wrote:
> On Tue, Apr 02, 2013 at 11:06:51AM -0400, Theodore Ts'o wrote:
> >
> > Can you try 3.9-rc4 or later and see if the problem still persists?
> > There were a number of ext4 issues especially around low memory
> > performance which were
I've tried doing some quick timing, and if it is a performance
regression, it's not a recent one --- or I haven't been able to
reproduce what Mel is seeing. I tried the following commands while
booted into 3.2, 3.8, and 3.9-rc3 kernels:
time git clone ...
rm .git/index ; time git reset
I did thi
So I tried to reproduce the problem, and so I installed systemtap
(bleeding edge, since otherwise it won't work with development
kernel), and then rebuilt a kernel with all of the necessary CONFIG
options enabled:
CONFIG_DEBUG_INFO, CONFIG_KPROBES, CONFIG_RELAY, CONFIG_DEBUG_FS,
CO
On Tue, Apr 02, 2013 at 11:03:36PM +0800, Zheng Liu wrote:
> Hi Mel,
>
> Thanks for reporting it.
>
> On 04/02/2013 10:27 PM, Mel Gorman wrote:
> > I'm testing a page-reclaim-related series on my laptop that is partially
> > aimed at fixing long stalls when doing metadata-intensive operations on
On Tue, Apr 02, 2013 at 11:06:51AM -0400, Theodore Ts'o wrote:
>
> Can you try 3.9-rc4 or later and see if the problem still persists?
> There were a number of ext4 issues especially around low memory
> performance which weren't resolved until -rc4.
Actually, sorry, I took a closer look and I'm n
On Tue, Apr 02, 2013 at 03:27:17PM +0100, Mel Gorman wrote:
> I'm testing a page-reclaim-related series on my laptop that is partially
> aimed at fixing long stalls when doing metadata-intensive operations on
> low memory such as a git checkout. I've been running 3.9-rc2 with the
> series applied b
Hi Mel,
Thanks for reporting it.
On 04/02/2013 10:27 PM, Mel Gorman wrote:
> I'm testing a page-reclaim-related series on my laptop that is partially
> aimed at fixing long stalls when doing metadata-intensive operations on
> low memory such as a git checkout. I've been running 3.9-rc2 with the
>
On 04/02/2013 04:27 PM, Mel Gorman wrote:
> I'm testing a page-reclaim-related series on my laptop that is partially
> aimed at fixing long stalls when doing metadata-intensive operations on
> low memory such as a git checkout. I've been running 3.9-rc2 with the
> series applied but found that the