On Wed, 3 Jul 2013 14:49:01 +1000 Dave Chinner wrote:
> On Tue, Jul 02, 2013 at 08:28:42PM -0700, Linus Torvalds wrote:
> > On Tue, Jul 2, 2013 at 8:07 PM, Dave Chinner wrote:
> > >>
> > >> Then that test would become
> > >>
> > >> if (wbc->sync_mode == WB_SYNC_SINGLE) {
> > >>
> > >> in
On Tue, Jul 02, 2013 at 08:28:42PM -0700, Linus Torvalds wrote:
> On Tue, Jul 2, 2013 at 8:07 PM, Dave Chinner wrote:
> >>
> >> Then that test would become
> >>
> >> if (wbc->sync_mode == WB_SYNC_SINGLE) {
> >>
> >> instead, and now "sync_mode" would actually describe what mode of
> >> syn
On Tue, Jul 2, 2013 at 8:07 PM, Dave Chinner wrote:
>>
>> Then that test would become
>>
>> if (wbc->sync_mode == WB_SYNC_SINGLE) {
>>
>> instead, and now "sync_mode" would actually describe what mode of
>> syncing the caller wants, without that hacky special "we know what the
>> caller _r
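For reference, a minimal sketch of what is being proposed above. WB_SYNC_SINGLE is only a name suggested in this thread; the mainline enum in include/linux/writeback.h has just WB_SYNC_NONE and WB_SYNC_ALL, and the function below is an illustration, not kernel code.

/* Illustrative only: WB_SYNC_SINGLE is the proposed third value. */
enum writeback_sync_modes {
	WB_SYNC_NONE,	/* don't wait on anything */
	WB_SYNC_ALL,	/* wait on every mapping (sync(2)-style) */
	WB_SYNC_SINGLE,	/* proposed: wait, but only for this one inode */
};

static void sketch_writeback_decision(struct writeback_control *wbc)
{
	if (wbc->sync_mode == WB_SYNC_SINGLE) {
		/* fsync()-style caller: data integrity for one inode */
	} else if (wbc->sync_mode == WB_SYNC_ALL) {
		/* sync(2)-style caller: data integrity for everything */
	}
}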
On Tue, Jul 02, 2013 at 10:38:20AM -0700, Linus Torvalds wrote:
> On Tue, Jul 2, 2013 at 9:57 AM, Jan Kara wrote:
> >
> > sync(2) was always slow in the presence of heavy concurrent IO so I
> > don't think this is stable material.
>
> It's not the "sync being slow" part I personally react to. I d
On Tue, Jul 2, 2013 at 9:57 AM, Jan Kara wrote:
>
> sync(2) was always slow in the presence of heavy concurrent IO so I
> don't think this is stable material.
It's not the "sync being slow" part I personally react to. I don't
care that much about that.
It's the "sync slows down other things" par
On Tue 02-07-13 09:13:43, Linus Torvalds wrote:
> On Tue, Jul 2, 2013 at 7:05 AM, Jan Kara wrote:
> > On Tue 02-07-13 22:38:35, Dave Chinner wrote:
> >>
> >> IOWs, sync is 7-8x faster on a busy filesystem and does not have an
> >> adverse impact on ongoing async data write operations.
> > The pa
On Tue, Jul 2, 2013 at 7:05 AM, Jan Kara wrote:
> On Tue 02-07-13 22:38:35, Dave Chinner wrote:
>>
>> IOWs, sync is 7-8x faster on a busy filesystem and does not have an
>> adverse impact on ongoing async data write operations.
> The patch looks good. You can add:
> Reviewed-by: Jan Kara
Ok, I
On Tue 02-07-13 22:38:35, Dave Chinner wrote:
> > As a bonus filesystems could also optimize their write_inode() methods when
> > they know ->sync_fs() is going to happen in the future. E.g. ext4 wouldn't have
> > to do the stupid ext4_force_commit() after each written inode in
> > WB_SYNC_ALL mode.
>
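The optimisation Jan sketches above could look roughly like the following. The wbc->for_sync hint used here is an assumption for illustration (a flag meaning "->sync_fs() will run after this writeback pass"); it is not part of struct writeback_control in 3.10.

/* Sketch only: assumes a hypothetical wbc->for_sync hint meaning the
 * WB_SYNC_ALL writeback was issued by sync(2), so ->sync_fs() follows. */
static int sketch_ext4_write_inode(struct inode *inode,
				   struct writeback_control *wbc)
{
	if (wbc->sync_mode == WB_SYNC_ALL && !wbc->for_sync) {
		/* standalone integrity writeback (e.g. an fsync-like path):
		 * force the journal commit covering this inode now */
		return ext4_force_commit(inode->i_sb);
	}
	/* sync(2) path: skip the per-inode commit; one commit in
	 * ext4_sync_fs() will cover all the inodes written here. */
	return 0;
}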
On Tue, Jul 02, 2013 at 10:19:37AM +0200, Jan Kara wrote:
> On Tue 02-07-13 16:29:54, Dave Chinner wrote:
> > > > We could, but we just end up in the same place with sync as we are
> > > > now - with a long list of clean inodes with a few inodes hidden in
> > > > it that are under IO. i.e. we still
On Tue 02-07-13 16:29:54, Dave Chinner wrote:
> > > We could, but we just end up in the same place with sync as we are
> > > now - with a long list of clean inodes with a few inodes hidden in
> > > it that are under IO. i.e. we still have to walk lots of clean
> > > inodes to find the dirty ones th
On Mon, Jul 01, 2013 at 02:00:37PM +0200, Jan Kara wrote:
> On Sat 29-06-13 13:39:24, Dave Chinner wrote:
> > On Fri, Jun 28, 2013 at 12:28:19PM +0200, Jan Kara wrote:
> > > On Fri 28-06-13 13:58:25, Dave Chinner wrote:
> > > > writeback: store inodes under writeback on a separate list
> > > >
> >
On Sat 2013-06-29 19:44:49, Dave Jones wrote:
> On Sat, Jun 29, 2013 at 03:23:48PM -0700, Linus Torvalds wrote:
>
> > > So with that patch, those two boxes have now been fuzzing away for
> > > over 24hrs without seeing that specific sync related bug.
> >
> > Ok, so at least that confirms that
On Sat 29-06-13 13:39:24, Dave Chinner wrote:
> On Fri, Jun 28, 2013 at 12:28:19PM +0200, Jan Kara wrote:
> > On Fri 28-06-13 13:58:25, Dave Chinner wrote:
> > > writeback: store inodes under writeback on a separate list
> > >
> > > From: Dave Chinner
> > >
> > > When there are lots of cached in
On Sun, Jun 30, 2013 at 12:05:31PM +1000, Dave Chinner wrote:
> On Sat, Jun 29, 2013 at 03:23:48PM -0700, Linus Torvalds wrote:
> > On Sat, Jun 29, 2013 at 1:13 PM, Dave Jones wrote:
> > >
> > > So with that patch, those two boxes have now been fuzzing away for
> > > over 24hrs without seeing that
On Sat, Jun 29, 2013 at 03:23:48PM -0700, Linus Torvalds wrote:
> On Sat, Jun 29, 2013 at 1:13 PM, Dave Jones wrote:
> >
> > So with that patch, those two boxes have now been fuzzing away for
> > over 24hrs without seeing that specific sync related bug.
>
> Ok, so at least that confirms that yes,
On Sat, 2013-06-29 at 19:44 -0400, Dave Jones wrote:
> Yeah, this is running as a user. Those don't sound like things that should
> be possible. What instrumentation could I add to figure out why
> that kthread got awakened ?
trace-cmd record -e sched_wakeup -f 'comm ~ "migrati*"'
Add "-O sta
On Sat, 2013-06-29 at 15:23 -0700, Linus Torvalds wrote:
> Does the machine recover? Because if it does, I'd be inclined to just
> ignore it. Although it would be interesting to hear what triggers this
> - normal users - and I'm assuming you're still running trinity as
> non-root - generally shoul
On Sat, Jun 29, 2013 at 03:23:48PM -0700, Linus Torvalds wrote:
> > So with that patch, those two boxes have now been fuzzing away for
> > over 24hrs without seeing that specific sync related bug.
>
> Ok, so at least that confirms that yes, the problem is the excessive
> contention on inode_
On Sat, Jun 29, 2013 at 1:13 PM, Dave Jones wrote:
>
> So with that patch, those two boxes have now been fuzzing away for
> over 24hrs without seeing that specific sync related bug.
Ok, so at least that confirms that yes, the problem is the excessive
contention on inode_sb_list_lock.
Ugh. There'
On Fri, Jun 28, 2013 at 01:58:25PM +1000, Dave Chinner wrote:
> > Oh, that's easy enough to fix. It's just changing the wait_sb_inodes
> > loop to use a spin_trylock(&inode->i_lock), moving the inode to
> > the end of the sync list, dropping all locks and starting again...
>
> New version be
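The trylock-and-requeue idea described above, as a rough sketch; the sync_list/list_lock parameters and the i_wb_list linkage are placeholders, not necessarily what the new version of the patch uses.

static void sketch_wait_on_sync_list(spinlock_t *list_lock,
				     struct list_head *sync_list)
{
	struct inode *inode;

restart:
	spin_lock(list_lock);
	list_for_each_entry(inode, sync_list, i_wb_list) {
		if (!spin_trylock(&inode->i_lock)) {
			/* i_lock is contended: move this inode to the tail,
			 * drop all locks and start the walk over */
			list_move_tail(&inode->i_wb_list, sync_list);
			spin_unlock(list_lock);
			goto restart;
		}
		/* ... wait for writeback on this inode here ... */
		spin_unlock(&inode->i_lock);
	}
	spin_unlock(list_lock);
}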
On Fri, Jun 28, 2013 at 12:28:19PM +0200, Jan Kara wrote:
> On Fri 28-06-13 13:58:25, Dave Chinner wrote:
> > writeback: store inodes under writeback on a separate list
> >
> > From: Dave Chinner
> >
> > When there are lots of cached inodes, a sync(2) operation walks all
> > of them to try to fi
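The core idea named in the patch title, as a hedged sketch: track inodes that currently have pages under writeback on their own per-superblock list, so sync(2) waits only on those instead of walking every inode on sb->s_inodes. The s_inodes_wb list and its lock below are illustrative names, not necessarily the patch's.

static void sketch_mark_inode_under_writeback(struct super_block *sb,
					      struct inode *inode)
{
	spin_lock(&sb->s_inode_wb_lock);	/* illustrative per-sb lock */
	if (list_empty(&inode->i_wb_list))
		list_add_tail(&inode->i_wb_list, &sb->s_inodes_wb);
	spin_unlock(&sb->s_inode_wb_lock);
}

static void sketch_wait_sb_inodes(struct super_block *sb)
{
	struct inode *inode;

	/* walk only the inodes that actually have I/O in flight */
	spin_lock(&sb->s_inode_wb_lock);
	list_for_each_entry(inode, &sb->s_inodes_wb, i_wb_list) {
		/* ... grab a reference, drop the lock, then
		 * filemap_fdatawait(inode->i_mapping), as in the real
		 * wait_sb_inodes() loop ... */
	}
	spin_unlock(&sb->s_inode_wb_lock);
}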
On Fri 28-06-13 13:58:25, Dave Chinner wrote:
> On Fri, Jun 28, 2013 at 11:13:01AM +1000, Dave Chinner wrote:
> > On Thu, Jun 27, 2013 at 11:21:51AM -0400, Dave Jones wrote:
> > > On Thu, Jun 27, 2013 at 10:52:18PM +1000, Dave Chinner wrote:
> > >
> > >
> > > > > Yup, that's about three of ord
On Thu 27-06-13 19:59:50, Linus Torvalds wrote:
> On Thu, Jun 27, 2013 at 5:54 PM, Dave Chinner wrote:
> > On Thu, Jun 27, 2013 at 04:54:53PM -1000, Linus Torvalds wrote:
> >>
> >> So what made it all start happening now? I don't recall us having had
> >> these kinds of issues before..
> >
> > Not
On Thu, Jun 27, 2013 at 10:22:45PM -1000, Linus Torvalds wrote:
> > It looks ok, but I still think it is solving the wrong problem.
> > FWIW, your optimisation has much wider application than just this
> > one place. I'll have a look to see how we can apply this approach
> > across all the inode l
On Thu, Jun 27, 2013 at 9:21 PM, Dave Chinner wrote:
>
> Besides, making the inode_sb_list_lock per sb won't help solve this
> problem, anyway. The case that I'm testing involves a filesystem
> that contains 99.97% of all inodes cached by the system. This is a
> pretty common situation
Yeah..
On Thu, Jun 27, 2013 at 07:59:50PM -1000, Linus Torvalds wrote:
> Also, looking some more now at that wait_sb_inodes logic, I have to
> say that if the problem is primarily the inode->i_lock, then that's
> just crazy.
Looks more like contention on inode_sb_list_lock, actually...
> And no, I don
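For context, this is roughly the wait_sb_inodes() that sync(2) runs around v3.10 (condensed from fs/fs-writeback.c, with some commentary trimmed): it takes the single global inode_sb_list_lock and walks every cached inode on sb->s_inodes, even though almost all of them have nothing under writeback. That walk is what the soft lockup reports are contending on.

static void wait_sb_inodes_condensed(struct super_block *sb)
{
	struct inode *inode, *old_inode = NULL;

	spin_lock(&inode_sb_list_lock);		/* global, not per-sb */
	list_for_each_entry(inode, &sb->s_inodes, i_sb_list) {
		struct address_space *mapping = inode->i_mapping;

		spin_lock(&inode->i_lock);
		if ((inode->i_state & (I_FREEING | I_WILL_FREE | I_NEW)) ||
		    mapping->nrpages == 0) {
			/* the overwhelmingly common case: nothing to wait on */
			spin_unlock(&inode->i_lock);
			continue;
		}
		__iget(inode);
		spin_unlock(&inode->i_lock);
		spin_unlock(&inode_sb_list_lock);

		/* drop the previous reference only after the lock is released */
		iput(old_inode);
		old_inode = inode;

		filemap_fdatawait(mapping);	/* wait for I/O on this inode */
		cond_resched();

		spin_lock(&inode_sb_list_lock);
	}
	spin_unlock(&inode_sb_list_lock);
	iput(old_inode);
}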
On Thu, Jun 27, 2013 at 07:59:50PM -1000, Linus Torvalds wrote:
> On Thu, Jun 27, 2013 at 5:54 PM, Dave Chinner wrote:
> > On Thu, Jun 27, 2013 at 04:54:53PM -1000, Linus Torvalds wrote:
> >>
> >> So what made it all start happening now? I don't recall us having had
> >> these kinds of issues befo
On Thu, Jun 27, 2013 at 5:54 PM, Dave Chinner wrote:
> On Thu, Jun 27, 2013 at 04:54:53PM -1000, Linus Torvalds wrote:
>>
>> So what made it all start happening now? I don't recall us having had
>> these kinds of issues before..
>
> Not sure - it's a sudden surprise for me, too. Then again, I have
On Fri, Jun 28, 2013 at 11:13:01AM +1000, Dave Chinner wrote:
> On Thu, Jun 27, 2013 at 11:21:51AM -0400, Dave Jones wrote:
> > On Thu, Jun 27, 2013 at 10:52:18PM +1000, Dave Chinner wrote:
> >
> >
> > > > Yup, that's about three orders of magnitude faster on this
> > > > workload
> >
On Thu, Jun 27, 2013 at 04:54:53PM -1000, Linus Torvalds wrote:
> On Thu, Jun 27, 2013 at 3:18 PM, Dave Chinner wrote:
> >
> > Right, that will be what is happening - the entire system will go
> > unresponsive when a sync call happens, so it's entirely possible
> > to see the soft lockups on inode
On Thu, Jun 27, 2013 at 3:18 PM, Dave Chinner wrote:
>
> Right, that will be what is happening - the entire system will go
> unresponsive when a sync call happens, so it's entirely possible
> to see the soft lockups on inode_sb_list_add()/inode_sb_list_del()
> trying to get the lock because of the
On Thu, Jun 27, 2013 at 10:30:55AM -0400, Dave Jones wrote:
> On Thu, Jun 27, 2013 at 05:55:43PM +1000, Dave Chinner wrote:
>
> > Is this just a soft lockup warning? Or is the system hung?
>
> I've only seen it completely lock up the box 2-3 times out of dozens
> of times I've seen this, and t
On Thu, Jun 27, 2013 at 11:21:51AM -0400, Dave Jones wrote:
> On Thu, Jun 27, 2013 at 10:52:18PM +1000, Dave Chinner wrote:
>
>
> > > Yup, that's about three orders of magnitude faster on this
> > > workload
> > >
> > > Lightly smoke tested patch below - it passed the first round of
On Thu, Jun 27, 2013 at 10:52:18PM +1000, Dave Chinner wrote:
> > Yup, that's about three orders of magnitude faster on this
> > workload
> >
> > Lightly smoke tested patch below - it passed the first round of
> > XFS data integrity tests in xfstests, so it's not completely
> > bu
On Thu, Jun 27, 2013 at 05:55:43PM +1000, Dave Chinner wrote:
> Is this just a soft lockup warning? Or is the system hung?
I've only seen it completely lock up the box 2-3 times out of dozens
of times I've seen this, and tbh that could have been a different bug.
> I mean, what you see here i
On Thu, Jun 27, 2013 at 08:06:12PM +1000, Dave Chinner wrote:
> On Thu, Jun 27, 2013 at 05:55:43PM +1000, Dave Chinner wrote:
> > On Wed, Jun 26, 2013 at 08:22:55PM -0400, Dave Jones wrote:
> > > On Wed, Jun 26, 2013 at 09:18:53PM +0200, Oleg Nesterov wrote:
> > > > On 06/25, Dave Jones wrote:
> >
On Thu, Jun 27, 2013 at 05:55:43PM +1000, Dave Chinner wrote:
> On Wed, Jun 26, 2013 at 08:22:55PM -0400, Dave Jones wrote:
> > On Wed, Jun 26, 2013 at 09:18:53PM +0200, Oleg Nesterov wrote:
> > > On 06/25, Dave Jones wrote:
> > > >
> > > > Took a lot longer to trigger this time. (13 hours of ru
On Wed, Jun 26, 2013 at 08:22:55PM -0400, Dave Jones wrote:
> On Wed, Jun 26, 2013 at 09:18:53PM +0200, Oleg Nesterov wrote:
> > On 06/25, Dave Jones wrote:
> > >
> > > Took a lot longer to trigger this time. (13 hours of runtime).
> >
> > And _perhaps_ this means that 3.10-rc7 without 8aac62
On Wed, 2013-06-26 at 16:00 -0400, Dave Jones wrote:
> On Wed, Jun 26, 2013 at 03:52:15PM -0400, Steven Rostedt wrote:
> Yeah, that's what I meant by "this patch".
> To reduce ambiguity, I mean the one below. There wasn't another patch
> that I missed, right?
>
No other patch, but I've found is
Hello,
On Wed, Jun 26, 2013 at 06:06:45PM -0700, Eric W. Biederman wrote:
> Just based on the last trace and your observation that it seems to be
> vfs/block layer related I am going to mildly suggest that Jens and Tejun
> might have a clue. Tejun made a transformation of the threads used for
> w
Dave Jones writes:
> On Wed, Jun 26, 2013 at 09:18:53PM +0200, Oleg Nesterov wrote:
> > On 06/25, Dave Jones wrote:
> > >
> > > Took a lot longer to trigger this time. (13 hours of runtime).
> >
> > And _perhaps_ this means that 3.10-rc7 without 8aac6270 needs more
> > time to hit the same
On Wed, Jun 26, 2013 at 09:18:53PM +0200, Oleg Nesterov wrote:
> On 06/25, Dave Jones wrote:
> >
> > Took a lot longer to trigger this time. (13 hours of runtime).
>
> And _perhaps_ this means that 3.10-rc7 without 8aac6270 needs more
> time to hit the same bug ;)
Ok, that didn't take long.
On Wed, Jun 26, 2013 at 03:52:15PM -0400, Steven Rostedt wrote:
> > > Hmm, no it needs a fix to make this work. I applied a patch below that
> > > should do this correctly (and will put this into my 3.11 queue).
> > >
> > > If you run the test again with this change and with the above fil
On Wed, 2013-06-26 at 01:23 -0400, Dave Jones wrote:
> On Tue, Jun 25, 2013 at 12:23:34PM -0400, Steven Rostedt wrote:
>
> > Now, what we can try to do as well, is to add a trigger to disable
> > tracing, which should (I need to check the code) stop tracing on printk.
> > To do so:
> >
> > #
On Wed, Jun 26, 2013 at 09:18:53PM +0200, Oleg Nesterov wrote:
> On 06/25, Dave Jones wrote:
> >
> > Took a lot longer to trigger this time. (13 hours of runtime).
>
> And _perhaps_ this means that 3.10-rc7 without 8aac6270 needs more
> time to hit the same bug ;)
>
> Dave, I am not going
On 06/25, Dave Jones wrote:
>
> Took a lot longer to trigger this time. (13 hours of runtime).
And _perhaps_ this means that 3.10-rc7 without 8aac6270 needs more
time to hit the same bug ;)
Dave, I am not going to "deny the problem". We should investigate it
anyway. And yes, 8aac6270 is not as tr
On Tue, Jun 25, 2013 at 12:23:34PM -0400, Steven Rostedt wrote:
> On Tue, 2013-06-25 at 11:35 -0400, Dave Jones wrote:
> > Took a lot longer to trigger this time. (13 hours of runtime).
> >
> > This trace may still not be from the first lockup, as a flood of
> > them happened at the same time
On Tue, Jun 25, 2013 at 12:23:34PM -0400, Steven Rostedt wrote:
> Now, what we can try to do as well, is to add a trigger to disable
> tracing, which should (I need to check the code) stop tracing on printk.
> To do so:
>
> # echo printk:traceoff > /sys/kernel/debug/tracing/set_ftrace_filter
On Tue, Jun 25, 2013 at 01:29:54PM -0400, Steven Rostedt wrote:
> On Tue, 2013-06-25 at 13:21 -0400, Steven Rostedt wrote:
> > On Tue, 2013-06-25 at 12:55 -0400, Dave Jones wrote:
> >
> > > While I've been spinning wheels trying to reproduce that softlockup bug,
> > > on another machine I've
On Tue, 2013-06-25 at 13:26 -0400, Dave Jones wrote:
>
> > What's the above saying? 880243288000->prev == 0088023c6cdd but
> > it should have been 88023c6cdd18? That is: 88023c6cdd18->next ==
> > 880243288001?
>
> It's saying something has done >>8 on a pointer, and stuck
On Tue, 2013-06-25 at 13:26 -0400, Dave Jones wrote:
> > Not sure how that would mess up. The ring-buffer code has lots of
> > integrity checks to make sure nothing like this breaks.
>
> My integrity checks can beat up your integrity checks.
I don't know. It looks like my code is beating up yo
On Tue, 2013-06-25 at 13:21 -0400, Steven Rostedt wrote:
> On Tue, 2013-06-25 at 12:55 -0400, Dave Jones wrote:
>
> > While I've been spinning wheels trying to reproduce that softlockup bug,
> > on another machine I've been refining my list-walk debug patch.
> > I added an ugly "ok, the ringbuffer
On Tue, Jun 25, 2013 at 01:21:30PM -0400, Steven Rostedt wrote:
> On Tue, 2013-06-25 at 12:55 -0400, Dave Jones wrote:
>
> > While I've been spinning wheels trying to reproduce that softlockup bug,
> > on another machine I've been refining my list-walk debug patch.
> > I added an ugly "ok, th
On Tue, 2013-06-25 at 13:21 -0400, Steven Rostedt wrote:
> Not sure how that would mess up. The ring-buffer code has lots of
> integrity checks to make sure nothing like this breaks.
See rb_check_pages() and rb_check_list().
-- Steve
On Tue, 2013-06-25 at 12:55 -0400, Dave Jones wrote:
> While I've been spinning wheels trying to reproduce that softlockup bug,
> on another machine I've been refining my list-walk debug patch.
> I added an ugly "ok, the ringbuffer is playing games with lower two bits"
> special case.
>
> But wh
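Background on the "lower two bits" remark: the ftrace ring buffer stores per-page state flags in the low bits of its buffer-page list pointers, so a generic list-walk checker will see "corrupt" pointers unless it masks those bits first (the next=...001 values quoted earlier are an example of such a flagged pointer). Roughly what kernel/trace/ring_buffer.c does before following such a pointer:

/* Roughly as in kernel/trace/ring_buffer.c: buffer-page list pointers
 * carry flag bits in their low two bits and must be masked before use. */
#define RB_FLAG_MASK	0x3UL

static struct list_head *rb_list_head_sketch(struct list_head *list)
{
	unsigned long val = (unsigned long)list;

	return (struct list_head *)(val & ~RB_FLAG_MASK);
}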
On Mon, Jun 24, 2013 at 01:04:36PM -0400, Steven Rostedt wrote:
> On Mon, 2013-06-24 at 12:51 -0400, Dave Jones wrote:
> > On Mon, Jun 24, 2013 at 12:24:39PM -0400, Steven Rostedt wrote:
> >
> > > > Ah, this is the first victim of my new 'check sanity of nodes during
> > > > list walks' patch.
On Tue, 2013-06-25 at 11:35 -0400, Dave Jones wrote:
> Took a lot longer to trigger this time. (13 hours of runtime).
>
> This trace may still not be from the first lockup, as a flood of
> them happened at the same time.
>
>
> # tracer: preemptirqsoff
> #
> # preemptirqsoff latency trace v1.1.5
Took a lot longer to trigger this time. (13 hours of runtime).
This trace may still not be from the first lockup, as a flood of
them happened at the same time.
# tracer: preemptirqsoff
#
# preemptirqsoff latency trace v1.1.5 on 3.10.0-rc7+
# -
On Mon, Jun 24, 2013 at 01:53:11PM -0400, Steven Rostedt wrote:
> > Also. watchdog_timer_fn() calls printk() only if it detects the
> > lockup, so I assume you hit another one?
>
> Probably.
Yeah, unfortunately it happened while I was travelling home to the box,
so I couldn't stop it after t
On Mon, 2013-06-24 at 19:35 +0200, Oleg Nesterov wrote:
> On 06/24, Dave Jones wrote:
> >
> > On Sun, Jun 23, 2013 at 06:04:52PM +0200, Oleg Nesterov wrote:
> > >
> > > Could you please do the following:
> > >
> > > 1. # cd /sys/kernel/debug/tracing
> > > # echo 0 >> options/function-trac
On Mon, Jun 24, 2013 at 07:35:10PM +0200, Oleg Nesterov wrote:
> > Not sure this is helpful, but..
>
> This makes me think that something is seriously broken.
>
> Or I do not understand this stuff at all. Quite possible too.
> Steven, could you please help?
>
> But this is already call
On 06/24, Dave Jones wrote:
>
> On Sun, Jun 23, 2013 at 06:04:52PM +0200, Oleg Nesterov wrote:
> >
> > Could you please do the following:
> >
> >1. # cd /sys/kernel/debug/tracing
> > # echo 0 >> options/function-trace
> > # echo preemptirqsoff >> current_tracer
> >
> >2.
On Mon, 2013-06-24 at 12:51 -0400, Dave Jones wrote:
> On Mon, Jun 24, 2013 at 12:24:39PM -0400, Steven Rostedt wrote:
>
> > > Ah, this is the first victim of my new 'check sanity of nodes during
> > > list walks' patch.
> > > It's doing the same prev->next next->prev checking as list_add and
> > > fr
On Mon, Jun 24, 2013 at 12:24:39PM -0400, Steven Rostedt wrote:
> > Ah, this is the first victim of my new 'check sanity of nodes during list
> > walks' patch.
> > It's doing the same prev->next next->prev checking as list_add and friends.
> > I'm looking at getting it into shape for a 3.12 m
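A rough approximation of the walk-time check being described, modelled on the prev->next/next->prev validation that __list_add() and __list_del_entry() in lib/list_debug.c already do; the actual (unmerged) debug patch may differ in detail.

static inline bool sketch_check_list_node(struct list_head *entry)
{
	if (unlikely(entry->next->prev != entry)) {
		WARN(1, "list corruption: next->prev should be %p, but was %p (next=%p)\n",
		     entry, entry->next->prev, entry->next);
		return false;
	}
	if (unlikely(entry->prev->next != entry)) {
		WARN(1, "list corruption: prev->next should be %p, but was %p (prev=%p)\n",
		     entry, entry->prev->next, entry->prev);
		return false;
	}
	return true;
}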
On Mon, Jun 24, 2013 at 06:37:08PM +0200, Oleg Nesterov wrote:
> On 06/24, Dave Jones wrote:
> >
> > On Mon, Jun 24, 2013 at 10:52:29AM -0400, Steven Rostedt wrote:
> >
> > > > > check_list_nodes corruption. next->prev should be prev
> > > > > (88023b8a1a08), but was 0088023b8a1a. (next=
On 06/24, Dave Jones wrote:
>
> On Mon, Jun 24, 2013 at 10:52:29AM -0400, Steven Rostedt wrote:
>
> > > > check_list_nodes corruption. next->prev should be prev
> > > > (88023b8a1a08), but was 0088023b8a1a. (next=880243288001).
> > >
> > > Can't find "check_list_nodes" in lib/list_debug.
On Mon, 2013-06-24 at 12:00 -0400, Dave Jones wrote:
> On Mon, Jun 24, 2013 at 10:52:29AM -0400, Steven Rostedt wrote:
>
> > > check_list_nodes corruption. next->prev should be prev
> > > (88023b8a1a08), but was 0088023b8a1a. (next=880243288001).
> > >
> > > Can't find "check_list
On Mon, Jun 24, 2013 at 10:52:29AM -0400, Steven Rostedt wrote:
> > > check_list_nodes corruption. next->prev should be prev
> > > (88023b8a1a08), but was 0088023b8a1a. (next=880243288001).
> >
> > Can't find "check_list_nodes" in lib/list_debug.c or elsewhere...
> >
> > >
On Sun, Jun 23, 2013 at 06:04:52PM +0200, Oleg Nesterov wrote:
> > [11054.897670] BUG: soft lockup - CPU#2 stuck for 22s! [trinity-child2:14482]
> > [11054.898503] Modules linked in: bridge stp snd_seq_dummy tun fuse hidp
> > bnep rfcomm can_raw ipt_ULOG can_bcm nfnetlink af_rxrpc llc2 ro
On Mon, 2013-06-24 at 16:39 +0200, Oleg Nesterov wrote:
> On 06/23, Dave Jones wrote:
> >
> > On Sun, Jun 23, 2013 at 06:04:52PM +0200, Oleg Nesterov wrote:
> >
> > > Could you please do the following:
> > >
> > > 1. # cd /sys/kernel/debug/tracing
> > > # echo 0 >> options/function-trace
>
On 06/23, Dave Jones wrote:
>
> On Sun, Jun 23, 2013 at 06:04:52PM +0200, Oleg Nesterov wrote:
>
> > Could you please do the following:
> >
> >1. # cd /sys/kernel/debug/tracing
> > # echo 0 >> options/function-trace
> > # echo preemptirqsoff >> current_tracer
>
> dammit.
>
> WA
On Sun, Jun 23, 2013 at 06:04:52PM +0200, Oleg Nesterov wrote:
> Could you please do the following:
>
> 1. # cd /sys/kernel/debug/tracing
> # echo 0 >> options/function-trace
> # echo preemptirqsoff >> current_tracer
dammit.
WARNING: at include/linux/list.h:385 rb_hea
On Sun, Jun 23, 2013 at 06:04:52PM +0200, Oleg Nesterov wrote:
> > [11018.927809] [sched_delayed] sched: RT throttling activated
> > [11054.897670] BUG: soft lockup - CPU#2 stuck for 22s! [trinity-child2:14482]
> > [11054.898503] Modules linked in: bridge stp snd_seq_dummy tun fuse hidp
On 06/23, Dave Jones wrote:
>
> On Sun, Jun 23, 2013 at 04:36:34PM +0200, Oleg Nesterov wrote:
>
> > > > Dave, I am sorry but all I can do is to ask you to do more testing.
> > > > Could you please reproduce the lockup again on the clean Linus's
> > > > current ? (and _without_ reverting 8aac
On Sun, Jun 23, 2013 at 04:36:34PM +0200, Oleg Nesterov wrote:
> > > Dave, I am sorry but all I can do is to ask you to do more testing.
> > > Could you please reproduce the lockup again on the clean Linus's
> > > current ? (and _without_ reverting 8aac6270, of course).
> >
> > I'll give
On 06/22, Dave Jones wrote:
>
> On Sat, Jun 22, 2013 at 07:31:29PM +0200, Oleg Nesterov wrote:
>
> > > [ 7485.261299] WARNING: at include/linux/nsproxy.h:63 get_proc_task_net+0x1c8/0x1d0()
> >
> > Hmm. The test case tries to create the symlink in /proc/*/net/ ?
>
> hit it with symlink, but al
On Sat, Jun 22, 2013 at 05:59:05PM -0400, Dave Jones wrote:
> On Sat, Jun 22, 2013 at 07:31:29PM +0200, Oleg Nesterov wrote:
>
> > > [ 7485.261299] WARNING: at include/linux/nsproxy.h:63 get_proc_task_net+0x1c8/0x1d0()
> > > [ 7485.262021] Modules linked in: 8021q garp stp tun fuse rfcomm bn
On Sat, Jun 22, 2013 at 07:31:29PM +0200, Oleg Nesterov wrote:
> > [ 7485.261299] WARNING: at include/linux/nsproxy.h:63 get_proc_task_net+0x1c8/0x1d0()
> > [ 7485.262021] Modules linked in: 8021q garp stp tun fuse rfcomm bnep hidp
> > snd_seq_dummy nfnetlink scsi_transport_iscsi can_bc
On 06/21, Dave Jones wrote:
>
> On Fri, Jun 21, 2013 at 09:59:49PM +0200, Oleg Nesterov wrote:
>
> > I am puzzled. And I do not really understand
> >
> > hardirqs last enabled at (2380318): [] restore_args+0x0/0x30
> > hardirqs last disabled at (2380319): [] apic_timer_interrupt+0x
On Fri, Jun 21, 2013 at 09:59:49PM +0200, Oleg Nesterov wrote:
> I am puzzled. And I do not really understand
>
> hardirqs last enabled at (2380318): [] restore_args+0x0/0x30
> hardirqs last disabled at (2380319): [] apic_timer_interrupt+0x6a/0x80
> softirqs last ena
On 06/21, Dave Jones wrote:
>
> On Thu, Jun 20, 2013 at 09:16:52AM -0700, Paul E. McKenney wrote:
> > > > > > I've been hitting this a lot the last few days.
> > > > > > This is the same machine that I was also seeing lockups during sync()
> > > > >
> > > > > On a whim, I reverted 9713
On Thu, Jun 20, 2013 at 09:16:52AM -0700, Paul E. McKenney wrote:
> > > > > I've been hitting this a lot the last few days.
> > > > > This is the same machine that I was also seeing lockups during sync()
> > > >
> > > > On a whim, I reverted 971394f389992f8462c4e5ae0e3b49a10a9534a3
On Thu, Jun 20, 2013 at 09:16:52AM -0700, Paul E. McKenney wrote:
> On Wed, Jun 19, 2013 at 08:12:12PM -0400, Dave Jones wrote:
> > On Wed, Jun 19, 2013 at 11:13:02AM -0700, Paul E. McKenney wrote:
> > > On Wed, Jun 19, 2013 at 01:53:56PM -0400, Dave Jones wrote:
> > > > On Wed, Jun 19, 2013
On Wed, Jun 19, 2013 at 08:12:12PM -0400, Dave Jones wrote:
> On Wed, Jun 19, 2013 at 11:13:02AM -0700, Paul E. McKenney wrote:
> > On Wed, Jun 19, 2013 at 01:53:56PM -0400, Dave Jones wrote:
> > > On Wed, Jun 19, 2013 at 12:45:40PM -0400, Dave Jones wrote:
> > > > I've been hitting this a lot
On Wed, Jun 19, 2013 at 11:13:02AM -0700, Paul E. McKenney wrote:
> On Wed, Jun 19, 2013 at 01:53:56PM -0400, Dave Jones wrote:
> > On Wed, Jun 19, 2013 at 12:45:40PM -0400, Dave Jones wrote:
> > > I've been hitting this a lot the last few days.
> > > This is the same machine that I was also
On Wed, Jun 19, 2013 at 11:13:02AM -0700, Paul E. McKenney wrote:
> > On a whim, I reverted 971394f389992f8462c4e5ae0e3b49a10a9534a3
> > (As I started seeing these just after that rcu merge).
> >
> > It's only been 30 minutes, but it seems stable again. Normally I would
> > hit these within
On Wed, Jun 19, 2013 at 01:53:56PM -0400, Dave Jones wrote:
> On Wed, Jun 19, 2013 at 12:45:40PM -0400, Dave Jones wrote:
> > I've been hitting this a lot the last few days.
> > This is the same machine that I was also seeing lockups during sync()
>
> On a whim, I reverted 971394f389992f8462c4e5
On Wed, Jun 19, 2013 at 12:45:40PM -0400, Dave Jones wrote:
> I've been hitting this a lot the last few days.
> This is the same machine that I was also seeing lockups during sync()
On a whim, I reverted 971394f389992f8462c4e5ae0e3b49a10a9534a3
(As I started seeing these just after that rcu merg
I've been hitting this a lot the last few days.
This is the same machine that I was also seeing lockups during sync()
Dave
BUG: soft lockup - CPU#1 stuck for 22s! [trinity-child9:6902]
Modules linked in: bridge snd_seq_dummy dlci bnep fuse 8021q garp stp hidp tun
rfcomm can_raw ipt_ULOG